Analysis of ClickDiary Data: Some Initial Results

Tso-Jung Yen
Institute of Statistical Science, Academia Sinica
Joint work with Ta-Chien Chan, Yang-chih Fu, and Jing-Shiang Hwang
July 7, 2015

Outline
• Introduction
  • Definition of the egocentric network
  • Data collection
• Data analysis
  • Main hypothesis
  • Model building strategy
  • Results
• Discussion

Egocentric Networks
• What is an egocentric network?
  • One node, called the ego, is at the center, surrounded by other nodes called alters.
• From a graph viewpoint: an egocentric network is a graph $G = (V, E)$, where $V = \{\text{ego}\} \cup \{\text{alter } 1, \text{alter } 2, \ldots, \text{alter } n\}$ is the node set and $E = E_{\text{ego,alter}} \cup E_{\text{alter,alter}}$ is the link set, with
  • $E_{\text{ego,alter}}$ = all links connecting alters to the ego,
  • $E_{\text{alter,alter}}$ = all links connecting alters to alters.

Egocentric Networks
Figure: Graphical representation of an egocentric network.

Data Collection
• Collecting egocentric network data:
  • Position generators (Lin and Dumin, 1986; Lin et al., 2001).
  • Name generators (Laumann, 1973; Wellman, 1979).
  • Contact diaries (de Sola Pool and Kochen, 1978; Freeman and Thompson, 1989; Fu, 2007; Chan et al., 2015):
    • Collect egocentric network data via self-reporting.
    • Collect egocentric network data on a daily basis, so the data are in longitudinal format.

Data Collection
• ClickDiary: an online platform for collecting egocentric network data using the contact diary method.
• Health diary (personal health information on a daily basis):
  • (1) sleep, (2) emotion, (3) diet, (4) exercise, (5) flu symptoms, (6) number of contacted people and physical distance from home, (7) blood pressure, and (8) weight.
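The graph definition of an egocentric network given in the Introduction can be made concrete with a small data structure. The sketch below is a minimal illustration, not the ClickDiary implementation; the class name `EgocentricNetwork`, the helper `add_alter_link`, and the choice to store links as undirected `frozenset` pairs are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EgocentricNetwork:
    """Hypothetical container for G = (V, E), with
    V = {ego} ∪ {alter 1, ..., alter n} and E = E_ego_alter ∪ E_alter_alter.
    Links are assumed undirected and stored as frozenset pairs."""

    ego: str
    alters: set[str]
    alter_alter: set[frozenset[str]] = field(default_factory=set)

    def __post_init__(self) -> None:
        if self.ego in self.alters:
            raise ValueError("the ego cannot also be one of its own alters")

    def add_alter_link(self, a: str, b: str) -> None:
        """Record a link between two distinct alters (an element of E_alter_alter)."""
        if a == b or a not in self.alters or b not in self.alters:
            raise ValueError("an alter-alter link needs two distinct known alters")
        self.alter_alter.add(frozenset((a, b)))

    @property
    def nodes(self) -> set[str]:
        # V: the ego plus every alter.
        return {self.ego} | self.alters

    @property
    def ego_alter(self) -> set[frozenset[str]]:
        # E_ego_alter: by definition every alter is linked to the ego.
        return {frozenset((self.ego, a)) for a in self.alters}

    @property
    def edges(self) -> set[frozenset[str]]:
        # E: the union of the two link sets.
        return self.ego_alter | self.alter_alter


g = EgocentricNetwork(ego="ego", alters={"alter 1", "alter 2", "alter 3"})
g.add_alter_link("alter 1", "alter 2")
print(len(g.nodes), len(g.ego_alter), len(g.edges))  # 4 3 4
```

Deriving $E_{\text{ego,alter}}$ from the alter set, rather than storing it, reflects the definition: in an egocentric network every alter is tied to the ego, so only the alter-alter links carry extra information.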
• Contact diary (personal contact information on a daily basis):
  • (1) contact type, content, time, duration, and location, (2) whether the contact was active or passive, (3) whether the contact felt beneficial, (4) emotion change, and (5) health information (e.g., flu-like symptoms).

Data Collection
• Collected between May 1, 2014 and October 31, 2014 (184 days).
• Hierarchical data (3 levels):
  • # of egos: 130.
  • # of alters: 13,409.
  • # of contacts: 110,394.
• # of alters in each egocentric network: minimum 3, maximum 1,115, mean 103.15.
• # of contact days per alter: minimum 1, maximum 184, mean 8.233.

Hypothesis
• Claim:
  • The quality of contact between the ego and an alter is associated with the alter's network position in the ego's personal network (egocentric network).
• Theories supporting this claim:
  • Theory of weak ties (Granovetter, 1973).
  • Theory of structural holes and embeddedness (Burt, 2001; 2004; 2009).

Model
• Quantifying the weak tie:
  • Ego i has a weak tie to alter j if i is not familiar with j (this also includes alters whom ego i did not know previously).

Model
• Quantifying the embeddedness of an alter in an egocentric network:
  • Normalized embeddedness score based on strong ties:
    $$\mathrm{NES}^{S}_{j} = \frac{\#\text{ of alters with whom alter } j \text{ is familiar}}{\#\text{ of alters} - 1}$$
  • Normalized embeddedness score based on weak ties:
    $$\mathrm{NES}^{W}_{j} = \frac{\#\text{ of alters whom alter } j \text{ knows but is unfamiliar with}}{\#\text{ of alters} - 1}$$
  • NES scores quantify the proportion of the ego's other alters (all alters except j itself, hence the "− 1") whom alter j also knows, i.e., alters known in common by alter j and the ego (a measure of triadic closure).

Model
• Quantifying the quality of contact:
  • To what extent did you (the ego) find contacting the alter beneficial?
    (1) loss (0.6%); (2) almost none (35.7%); (3) somewhat beneficial (46.0%); (4) very beneficial (17.7%).

Model
• Quantifying the quality of contact (cont'd):
  • The dependent variable:
    $$Y_{ijl} = I\{\text{ego } i \text{ felt very beneficial after contacting alter } j \text{ on record } l\}$$
    Here l indexes the contact records between ego i and alter j.
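The two NES scores can be computed directly from the ego's reports on how the alters know one another. The following is a minimal sketch under stated assumptions: acquaintance between alters is treated as symmetric, each reported pair carries one of two hypothetical labels, `"familiar"` or `"unfamiliar"` (knows but is unfamiliar with), and unlisted pairs are taken not to know each other. The function name `nes_scores` is illustrative only.

```python
from collections import Counter


def nes_scores(alters, ties):
    """Return {alter: (NES_S, NES_W)} for one egocentric network.

    alters: iterable of alter identifiers in the ego's network.
    ties:   dict mapping frozenset({a, b}) -> "familiar" or "unfamiliar"
            (assumed symmetric; unlisted pairs do not know each other).
    """
    alters = set(alters)
    n = len(alters)
    if n < 2:
        raise ValueError("NES needs at least two alters (denominator n - 1)")

    strong, weak = Counter(), Counter()
    for pair, label in ties.items():
        if len(pair) != 2 or not pair <= alters:
            raise ValueError(f"invalid alter pair: {set(pair)}")
        if label == "familiar":
            counter = strong
        elif label == "unfamiliar":
            counter = weak
        else:
            raise ValueError(f"unknown label: {label!r}")
        for a in pair:
            counter[a] += 1  # each tie counts once for both endpoints

    # Divide by n - 1: the number of *other* alters that alter j could know.
    return {j: (strong[j] / (n - 1), weak[j] / (n - 1)) for j in alters}


ties = {
    frozenset({"A", "B"}): "familiar",
    frozenset({"A", "C"}): "unfamiliar",
    frozenset({"B", "C"}): "familiar",
}
scores = nes_scores(["A", "B", "C", "D"], ties)
print(scores["B"])  # (0.6666666666666666, 0.0)
```

In this toy network alter B is familiar with two of the three other alters, so $\mathrm{NES}^{S}_{B} = 2/3$, while the isolated alter D scores zero on both measures.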
  • Weak ties may play important roles in achieving large gains, e.g., finding a job, but may be less important in achieving small or moderate gains.

Model
• Assume
  $$\operatorname{logit}[P(Y_{ijl} = 1)] = \beta_0 + \sum_{k=1}^{p} \alpha_k X_{ijlk} + \beta_1 \mathrm{WeakTie}_{ij} + \beta_2 \mathrm{NES}^{W}_{ij} + \beta_3 \mathrm{NES}^{S}_{ij} + \theta_i, \qquad (1)$$
  where
  • the $X_{ijlk}$'s are control variables;
  • $\mathrm{WeakTie}_{ij}$ is an indicator of whether alter j is weakly tied to ego i;
  • $\mathrm{NES}^{W}_{ij}$ is the normalized embeddedness score based on weak ties;
  • $\mathrm{NES}^{S}_{ij}$ is the normalized embeddedness score based on strong ties;
  • $\theta_i$ is the random intercept associated with ego i.

Model
• List of control variables:
  • (1) Did ego i find the previous contact with alter j very beneficial?
  • (2) Was this contact initiated by ego i? (3) Was this contact face-to-face? (4) Did the contact last longer than 1 hr?
  • (5) Homophily in sex and (6) homophily in age.
  • What is ego i's relationship with alter j? (11 types)
  • How long has ego i known alter j? (5 levels)
  • How frequently does ego i contact alter j? (5 levels)
  • How likely is ego i to discuss important issues with alter j? (5 levels)

Model
• Model estimation strategies:
  • Scenario I: estimation using a truncated sample. Egos whose cumulative number of contacted alters over a certain period is too small (the bottom 10%) are dropped from the sample.
  • Scenario II: estimation using a subsample of the truncated sample. Contact records with spouses, parents, children, and boyfriends/girlfriends are excluded from the data set.
    • Contacts with spouses, parents, children, and boyfriends/girlfriends usually generate non-instrumental gains. Removing these contacts allows us to detect the weak-tie and embeddedness effects on instrumental gains.

Model Estimation
Figure: Scatter plot of the cumulative number of contacted alters vs. the number of contact days. Number of egos M = 130.
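Model (1) maps the covariates to a probability through the inverse logit. The sketch below only evaluates the model for given parameter values; it does not fit it, since estimating a random-intercept logistic model needs a mixed-model routine not shown here. The function names and every number in the usage lines are hypothetical, chosen for illustration rather than taken from the ClickDiary results.

```python
import math
from typing import Sequence


def inv_logit(eta: float) -> float:
    """Numerically stable inverse of logit(p) = log(p / (1 - p))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def prob_very_beneficial(
    x: Sequence[float],      # control variables X_ijl1, ..., X_ijlp
    weak_tie: int,           # WeakTie_ij (0 or 1)
    nes_w: float,            # NES^W_ij
    nes_s: float,            # NES^S_ij
    theta_i: float,          # random intercept of ego i
    beta0: float,
    alpha: Sequence[float],  # alpha_1, ..., alpha_p
    beta: Sequence[float],   # (beta_1, beta_2, beta_3)
) -> float:
    """P(Y_ijl = 1) under model (1) for given (hypothetical) parameter values."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must both have length p")
    b1, b2, b3 = beta
    eta = (
        beta0
        + sum(a_k * x_k for a_k, x_k in zip(alpha, x))
        + b1 * weak_tie
        + b2 * nes_w
        + b3 * nes_s
        + theta_i
    )
    return inv_logit(eta)


# Hypothetical values: two controls, a weak tie, moderate embeddedness.
# The linear predictor works out to -1.5 + 0.8 - 0.4 + 0.1 + 0.6 + 0.1 = -0.3.
p = prob_very_beneficial(
    x=[1.0, 0.0], weak_tie=1, nes_w=0.2, nes_s=0.6, theta_i=0.1,
    beta0=-1.5, alpha=[0.8, 0.3], beta=[-0.4, 0.5, 1.0],
)
print(round(p, 4))  # 0.4256
```

The random intercept $\theta_i$ simply shifts the linear predictor for every record of ego i, which is how the model absorbs differences in how generously individual egos rate their contacts.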
Model Estimation
                              Truncated sample    Subsample*
# of egos (M)                 115                 115
# of alters (K)               13,091              12,563
# of contact records (N)      105,775             91,376
Table: Basic statistics for the two scenarios. * Sample excluding contacts with members of the immediate family and partners.

Model Estimation
                              Truncated sample    Subsample*
Y_ijl = 0                     86,361 (81.6%)      75,204 (82.3%)
Y_ijl = 1                     19,414 (18.4%)      16,172 (17.7%)
Total (N)                     105,775 (100%)      91,376 (100%)
Table: Basic statistics of the dependent variable for the two scenarios. * Sample excluding contacts with members of the immediate family and partners.

Regression Results
Figure: The probabilities of feeling very beneficial after contacting the alter. The results are based on the logistic regression model estimated from the ClickDiary data, truncated sample (number of egos M = 115; number of alters K = 13,091; number of contact records N = 105,775).

Regression Results
Figure: The probabilities of feeling very beneficial after contacting the alter. The results are based on the logistic regression model estimated from the ClickDiary data, subsample* (number of egos M = 115; number of alters K = 12,563; number of contact records N = 91,376).

Discussion
• Possible sources of bias:
  • NES scores only count interpersonal relations within the ego's personal network; they do not account for relations outside it.
  • Frequently contacted alters count most. This may create a "self-reinforcement" loophole: alters are contacted often precisely because the ego finds contacting them beneficial.
  • Enthusiastic respondents count most, yet they are a minority. This results in a highly unbalanced data structure.
• Modeling strategy: ordinal regression on the four-level benefit scale.
• Estimation technique: inverse probability weighted methods (Horvitz and Thompson, 1952; Robins et al., 1995; Tsiatis, 2006).

ClickDiary App
Figure: Available from January 2015.
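The Discussion proposes inverse probability weighting to counter the dominance of enthusiastic respondents. As a minimal illustration of the idea, the sketch below implements the classical Horvitz-Thompson estimator of a total and its normalized (Hájek) weighted mean, assuming each record's inclusion probability `pi` is already known. How such probabilities would be obtained for the ClickDiary data is not specified here, and the function names are hypothetical.

```python
from typing import Sequence


def _check(y: Sequence[float], pi: Sequence[float]) -> None:
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not 0.0 < p_i <= 1.0 for p_i in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")


def horvitz_thompson_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Design-unbiased estimate of a population total: sum of y_i / pi_i."""
    _check(y, pi)
    return sum(y_i / p_i for y_i, p_i in zip(y, pi))


def hajek_mean(y: Sequence[float], pi: Sequence[float]) -> float:
    """Weighted mean with weights 1 / pi_i, normalized by the total weight."""
    _check(y, pi)
    weights = [1.0 / p_i for p_i in pi]
    return sum(w * y_i for w, y_i in zip(weights, y)) / sum(weights)


# Hypothetical records: binary outcomes and their inclusion probabilities.
y = [1, 0, 1]
pi = [0.5, 0.25, 1.0]
print(horvitz_thompson_total(y, pi))  # 3.0
print(hajek_mean(y, pi))
```

Records that were unlikely to be observed (small `pi`) receive large weights, so under-represented respondents count more and over-represented ones count less, which is the rebalancing the Discussion has in mind.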