ORIGINAL ARTICLE

Deployment and the Citizen Soldier
Need and Resilience

E. Michael Foster, PhD

Background: The wars in Iraq and Afghanistan have made unprecedented demands on the nation’s citizen soldiers, the National Guard and Reserve. A major concern involves the repeated deployment of these forces overseas.

Objectives: Using data from the Department of Defense Survey of Health Related Behaviors among the Guard and Reserve Force, we examined the effects of deployment on 6 health outcomes.

Subjects: The Department of Defense Survey of Health Related Behaviors among the Guard and Reserve Force is a sample (n = 17,754) of all Reserve component personnel (including full time and/or activated Guard and Reservists) serving in all pay grades throughout the world.

Research Design: We relied on inverse probability of treatment weights to adjust for observed confounders and used sensitivity analyses to examine the sensitivity of our findings to potential unobserved confounding.

Results: Observed confounders explain much of the apparent effect of deployment. For men, the adjusted relationships could very well reflect further confounding involving unobserved factors. However, for women, effects of deployment on marijuana use, symptoms of post-traumatic stress disorder, and suicidal ideation are robust to adjustments for multiple testing and possible unobserved confounding.

Conclusions: These effects are large in practical terms and troubling but suggest that media reports of the harm caused by deployment may be overstated. Such exaggerations run the risk of stigmatizing those who serve.

Key Words: causal inference, deployment, mental health

(Med Care 2011;49: 301–312)

The movement to an all-volunteer force has been accompanied by a transformation of the role of the National Guard and Reserve (NGR).
Having historically served as a strategic reserve, the Guard and Reserve have become part of the nation’s operational force [1]. This evolution reached new heights during the wars in Iraq and Afghanistan [2,3]. The number of Reservists mobilized at the start of the Iraq war in late spring 2003 exceeded 200,000. As of late 2006, NGR component troops represented nearly half (46%) of the combat brigades in Iraq [3]. As of May 2007, more than 575,000 NGR members had been mobilized since September 11, 2001 [1].

From the Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC. Reprints: E. Michael Foster, PhD, Department of Maternal and Child Health, University of North Carolina at Chapel Hill, Rosenau Hall, Campus Box 7445, Chapel Hill, NC 27599. E-mail: [email protected]. Copyright © 2011 by Lippincott Williams & Wilkins. ISSN: 0025-7079/11/4903-0301. Medical Care • Volume 49, Number 3, March 2011.

As noted, not only are Reservists and Guard members more likely to be mobilized, but their involvement has also changed. In particular, the frequency of deployments has increased, and the time between deployments (or so-called dwell time) has decreased. As a result, many NGR troops have been deployed repeatedly. In a 2008 study of 2543 members of the New Jersey National Guard preparing for deployment, 1 in 4 had been deployed to Iraq or Afghanistan previously [4]. These changes have become “a point of deep concern both in and outside the military” [1].

Deployment is of concern because serving overseas may signify increased exposure to combat. However, deployment is not synonymous with combat; many Guard and Reserve members serve in support positions. Nonetheless, deployment may cause stress to the members and families of NGR for several reasons. Deployment may tax relationships with spouses or partners and children as members spend long periods away from home.
Other effects may be more subtle: deployment may jeopardize access to health insurance coverage among the member’s dependents [1]. Returning home will not eliminate these issues entirely, and so the related stress may continue postdeployment. Returning home may bring added adjustments at work and home.

Deployment is no doubt stressful for all military personnel, but this experience may be especially difficult for Guard and Reserve members as compared with active duty personnel [3,5]. Because they are primarily employed in the civilian sector, deployment of Guard and Reserve members involves transitions to and from their usual jobs. Members and their families may live farther from military posts, making it harder for their families to maintain access to health care. The members themselves may have relatively little postdeployment contact with other members of their units and may not benefit from the support of social and occupational peers [1,3].

Post-traumatic stress disorder (PTSD) among Vietnam veterans, so-called Gulf War Syndrome among those serving in the first Gulf War, and other aspects of the health of military personnel have interested researchers and concerned policy makers for decades [6–9]. Much of that research focuses on combat exposure [10–12] and negative outcomes, such as binge drinking; studies focusing on deployment per se are less common. A portion of the deployment literature considers deployment among Guard and Reservists, whereas other studies include only active-duty personnel [13]. One study examined the experiences of 522 Guardsmen in Mississippi about to be deployed. The study compared those preparing for their first deployment with those previously deployed.
The results revealed more PTSD, depressive, and somatic symptoms among the latter [3]. Two other articles considered the effects of deployment among Reservists in the United Kingdom [5,14]. These analyses suggested that deployment is associated with common mental disorders and fatigue. These analyses directly compared the effect of deployment for Reserve and regular personnel. The effect for the latter was weak and involved only physical symptoms.

A handful of other studies considered the effect of repeated deployments. Kline et al [4] examined 2543 Reservists preparing for deployment to Iraq in 2008. Previously deployed servicepersons were more than 3 times as likely as servicepersons with no deployments to screen positive for PTSD and major depression. (The title of that article is misleading. “Multiple deployments” refers to individuals who have a prior deployment measured before the 2008 deployment. In that case, the outcomes measured reflect only a single deployment for some of these individuals.)

A recent report highlighted health problems stemming from repeated deployments among members of the NGR. Data from that study, “Department of Defense survey of health related behaviors among the Guard and Reserve Force” (HRBGR), revealed that those Guard and Reservists deployed multiple times in the past 2 years were more likely to report symptoms of depression and PTSD. Deployment also was associated with suicidal ideation and limitation of activities due to poor mental health [15].

A key issue in assessing this literature involves the distinction between association and causal relationships. As discussed later, describing an association between health problems and deployment is not without utility (eg, screening); however, in the present study, our focus is on the effect of deployment per se, that is, on whether the association between deployment and health problems reflects the effect of the former on the latter.
Prior research suggests that confounding factors may amplify observed associations beyond the effects of deployment. For example, research from the first Gulf War indicates that individuals who were deployed were more likely to be in better health before deployment than those who were not deployed [16]. The deployed individuals were younger on average and also had lower levels of education. They also had higher levels of risk-taking. These factors represent obvious confounders for anyone interested in the effect of deployment: they likely influence not only deployment but also postdeployment health. Further complicating matters, the process of military screening and training likely ensures that only individuals who are relatively healthy remain in the military long enough to be deployed more than once. (This phenomenon has been labeled the “Healthy Warrior effect” [4,17].) The military may incorporate factors in screening that are relevant but not captured by the data available to researchers.

Existing research differs in whether and how it handles potential selection bias. The analyses in some studies ignore the issue entirely [3]. The 2 UK studies recognized the potential for confounding and adjusted comparisons of those who were and were not deployed for a range of potential confounders (age, gender, rank, educational and marital status, service branch, fitness to deploy, and Reservist status). Other studies relied on pre- or postdeployment comparisons with a comparison group of those who were not deployed [18].

Using data from the HRBGR, this article examines the relationship between deployment and health outcomes. We considered 3 health behaviors and 3 mental health problems. These outcomes were highlighted in the original report and are generally representative of the outcomes considered in prior research.
The original report did adjust between-group comparisons for a small number of covariates (gender, age, enlisted/officer indicator, marital status, education, and race/ethnicity) [15]. Our analyses in this study extend those findings in 4 ways. First, we adjusted for a more extensive list of covariates, such as whether a child lives with the respondent, the branch of military service, household income, and employment status. Second, we used propensity score methodology to better adjust for confounding factors. Propensity score methods have several advantages over regression, including the ability to check for covariate balance. Third, we examined the sensitivity of our findings to unobserved confounding, an ever-present challenge to inferring causality from observational data. Finally, and perhaps most importantly, we paid particular attention to gender differences in the effect of deployment.

Although most research on issues involving military personnel focuses on men or both genders combined, a smaller literature considers the experiences of female veterans [19,20]. The effect of deployment on men and women may differ for several reasons. Although the gap has narrowed in recent years, women still have more responsibilities in the household for child care, cooking meals, and housecleaning, among others [21]. The experiences of women who were deployed may differ as well (eg, sexual harassment [22]). Vogt et al [22] considered gender differences in the effect of combat and other stressors and found larger effects on women in both the NGR and active duty units.

Deployment is the focus of this article, but it is also important to be specific about what the article is not about. In particular, prior research largely focused on combat exposure, but these data lack that information. The focus in the present study is on deployment, and it remains true that the effects of combat exposure may be larger than those reported in this study.
We have considered the implications of these findings for combat exposure in the discussion section.

METHODS

Data

The HRBGR provides the first comprehensive, population-based picture of health behaviors in the total Guard and Reserve force. Conducted under the direction of the Office of the Assistant Secretary of Defense, this survey assesses lifestyle factors affecting health and readiness; identifies and tracks health-related trends and high-risk groups; targets groups and/or lifestyle factors for intervention; and identifies future directions for additional studies, Department of Defense programs, and policies.

The eligible population for the 2006 survey consisted of all Reserve component personnel (including full time and/or activated Guard and Reservists). Participants represent Reserve or Guard men and women in all pay grades throughout the world. The sampling was multistage. Sixteen geographic areas (clusters of zip codes where Department of Defense records indicated NGR lived) were selected in the first stage of sampling. Within these areas, the contractor attempted to sample 1000 persons evenly divided across the 6 NGR components (Army Reserve, Army National Guard, Navy Reserve, Marine Corps Reserve, Air Force Reserve, and Air National Guard). Generally, this task was accomplished by randomly selecting 1 relevant facility for each component in each area. Eighty percent of study participants were identified in this way, and these individuals were interviewed in person at the facility involved. An additional 20% of the sample was recruited from those living outside of the selected geographic areas. These individuals completed paper and pencil interviews.

The final sample consisted of 18,342 military personnel (2796 Army National Guard, 3215 Navy Reserve, 1159 Marine Corps Reserve, 6656 Air Force Reserve, 2851 Air National Guard, and 1665 Army Reserve).
The overall response rate was 55.3%. The data were weighted to represent all Reserve component personnel, using sampling weights that are a function of age, gender, education, and race/ethnicity [15].

Deployment Status

Exposure is the number of deployments in the past 24 months. The term “deployed” refers to deployment overseas or outside the continental United States. Participants were told that deployment does “not include drilling weekends and Annual Training.” We grouped responses into “not deployed,” “deployed once,” and “deployed multiple times.”

Note that we did not use location of deployment to define exposure. The data revealed that 37% of the respondents served in Iraq or Afghanistan (this figure was 40% for men and 25% for women). The definition of exposure was driven by 2 considerations. First, place of deployment may introduce added selection factors not included in our covariates. In particular, those who were deployed to Iraq or Afghanistan may differ in unobserved ways from those deployed elsewhere. Second, because combat exposure is so common in Iraq and Afghanistan, limiting our attention to these areas makes it more difficult to distinguish the effect of deployment from that of combat exposure.

Outcome Measures

Binge drinking was defined as having 5 or more drinks on a single occasion at least once in the past 30 days. Marijuana use was measured as the use of marijuana/hashish in the past 12 months. This construct was measured with a set of questions about the use of various illicit drugs. The wording of these questions is consistent with that used in the prior Department of Defense surveys of active duty personnel and has been used in other well-known surveys, such as the National Survey on Drug Use and Health. Smokers were defined as those who smoked at least 100 cigarettes during their lifetime and who last smoked a cigarette during the past 30 days.
Suicidal ideation involved suicidal thoughts in the past year.

The measures of depressive and PTSD symptoms were developed for use as a preliminary screening tool to identify individuals in need of further assessment and perhaps treatment. In particular, depressive symptoms were measured using an 8-item set of symptoms based on 6 items from the Center for Epidemiologic Studies–Depression Scale [23] and 2 items from the Diagnostic Interview Schedule [24]. Individuals were identified as needing further evaluation on the basis of a cut-point determined in prior research [25]. That research suggests that this screener has good sensitivity and high positive predictive value.

In the case of PTSD, the survey relies on the PTSD Checklist-Civilian version [26]. This measure consists of 17 questions involving experiences related to PTSD, such as loss of interest in activities the respondent used to enjoy, being “superalert” or watchful or on guard, having physical reactions when reminded of a stressful experience, and feeling jumpy or easily startled. Respondents were asked to indicate how much they had been bothered by each of the 17 conditions; response options were “not at all,” “a little bit,” “moderately,” “quite a bit,” and “extremely.” Each statement was scored 0 to 4 and a sum for all items was computed. The participant was classified as screening positive for PTSD symptoms if the sum of symptoms was greater than or equal to 50 [27].

Covariates

Covariates were selected to eliminate confounding: we identified variables that prior research revealed as predictive of deployment. In most instances, the presumption was that the variable also predicted the outcomes. For variables like education, this assumption involved no leap of faith as a large literature links education to health and health behaviors. For some variables, no effect on the outcome may exist (eg, rank).
In that case, the analyses adjust for a covariate needlessly, but given the sample size, the resulting effect on efficiency is likely very small. Data availability also limited our choice, and some desirable covariates were missing (such as health behaviors before deployment). These variables represent sources of unobserved confounding, and we considered their potential effect in sensitivity analyses described later. Overall, the covariates used resemble those in prior research.

Covariates include the respondents’ gender; age quantified using a set of 4 categories; race and ethnicity (7 categories: White; Black or African American; Latino; American Indian; Asian; Pacific Islander; other); whether a child was living in the respondent’s household; marital status (6 categories: married; living as married; widowed and not living as married; divorced or separated and not living as married; never married and not living as married); service (Army National Guard; Army Reserve; Naval Reserve; Air National Guard; Air Force Reserve; Marine Corps Reserve); enlistment status (enlisted or officer); annual income from all sources in the last 12 months (8 categories); infantry (yes/no); employment status (3 categories: employed part-time and not in school; in school; employed full time); and education (3 categories: high school graduate or less; some postsecondary education; college degree or more).

Statistical Model

Inverse Probability of Treatment Weights

Our analyses represent an application of propensity score methodology. For a binary exposure (exposed, not exposed), the propensity score is the predicted probability of exposure. It represents a weighted sum of the covariates; the weights are determined by the link between the covariates and the exposure of interest.
For example, one might use logistic regression with exposure as the dependent variable and the covariates as explanatory variables. The resulting coefficient estimates can be used to generate a predicted probability of exposure, the propensity score. As sample size grows, the propensity score captures all variation in the covariates related to exposure, that is, their potential to confound the link between the exposure and outcomes of interest [28]. The propensity score “balances” the distribution of the covariates across levels of exposure [29].

The propensity score is a statistical convenience, no more or less. Rather than adjusting comparisons of outcomes across exposures using several covariates, one can work with just one variable, the propensity score. One can use the propensity score in a variety of ways, such as matching [29].

One option involves calculating inverse probability of treatment weights (IPTW). The IPTW is one over the probability that the exposure was actually experienced. For the exposed, the IPTW is one over the propensity score; for the other cases, one over one minus the propensity score. Analyses using these weights essentially correspond to analyses of a pseudo-population in which exposure and the covariates are unrelated. In other words, in that pseudo-population, any association between exposure and outcomes represents a causal relationship.

The IPTW generalize to exposures involving 3 or more levels, such as not deployed, deployed once, or deployed multiple times [30,31]. As discussed in Appendix 1, with 3 levels of exposure, the IPTW is based on the probability that an individual was in the exposure group in which he or she appears. In that case, one might estimate a multinomial logit with 3 levels of the outcome (here, the exposure level), and use the resulting parameter estimates to generate 3 predicted probabilities, one for each level. Each individual will get one weight: one over the probability of the exposure actually experienced.
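The weighting rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation (the article reports using Stata): the function names, the level labels, and the linear-predictor values for the example respondent are all hypothetical, and the softmax step simply stands in for the predicted probabilities a fitted multinomial logit would supply.

```python
import math

# Exposure levels named in the text; the labels themselves are illustrative.
LEVELS = ("none", "once", "multiple")


def multinomial_probs(linear_predictors):
    """Turn per-level linear predictors into predicted probabilities (softmax).

    In a multinomial logit one level is the reference, so its linear
    predictor is fixed at 0.
    """
    largest = max(linear_predictors.values())  # subtract for numerical stability
    exps = {k: math.exp(v - largest) for k, v in linear_predictors.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def iptw(probs, observed_level):
    """IPTW = 1 / P(exposure level actually experienced | covariates)."""
    return 1.0 / probs[observed_level]


# Hypothetical respondent: linear predictors relative to the reference "none".
probs = multinomial_probs({"none": 0.0, "once": -0.8, "multiple": -1.6})
weight = iptw(probs, "once")  # this respondent was actually deployed once
```

With only two levels, the same rule reduces to the binary case in the text: one over the propensity score for the exposed and one over one minus the propensity score for everyone else.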
The balance that these weights produce reflects the nature of the propensity score as a “balancing score” [29]. However, this property holds only as sample size grows large, and in any given sample, comparisons across levels of the exposure may reflect confounding by the covariates. Residual imbalance may reflect mis-specification of the equations used to generate the propensity score and other factors. For that reason, a key step in using the propensity score involves checking “balance,” that is, making sure that the distribution of the covariates is the same across levels of the exposure, when adjusted with the propensity score [32]. Covariate balance should be checked regardless of the particular way in which propensity-score adjustments are operationalized. We also trimmed the IPTW at the fifth and 95th percentiles to improve the precision of the estimates. Doing so did not affect the balance of the covariates.

Regardless of how one uses the propensity score, making the jump from association to causation depends on the key assumption of ignorability: that the list of covariates included is comprehensive and that no residual, unobservable confounding remains. If such confounding exists, it might hide a real relationship because those who were deployed were in better health before deployment. Alternatively, a statistically significant finding could reflect the fact that those who were deployed were unhealthier. The 2 possibilities could not apply to a single finding, but 1 of the 2 would be relevant for each (depending on whether the estimate is significant or nonsignificant).

The assumption of no unobserved confounding is essential and cannot be tested. One alternative involves sensitivity analyses involving unobserved confounding of hypothetical strengths.
This approach has been used to demonstrate the effect of smoking on health, and we used it in this study [33]. Building on Rosenbaum [29], this analysis assumes that a hypothetical factor confounds comparisons across exposures even after adjusting for the covariates. This confounding factor has 2 levels, “healthy” and “unhealthy,” and predicts exposure perfectly. The sensitivity analyses further ask the question of how strongly the unobserved factor would have to be related to the outcome to explain the observed (but adjusted) finding. One can express this relationship as the log-odds ratio of the unobservable on the outcome (the literature refers to this relationship as Γ). Γ captures the sensitivity of a given finding to potential unobserved confounding. However, these analyses cannot demonstrate whether unobserved confounding exists or not.

The Effect of Deployment on the Outcomes

The outcomes considered in this study are dichotomous, and we used IPTW-weighted logistic regression to link deployment to the outcome. To assess the practical significance of the estimated effect of deployment, we calculated the so-called “marginal effect,” the effect of deployment on the predicted probability of the outcome [34].

These analyses were approved by the Institutional Review Board of the University of North Carolina.

Estimation

The first stage of the analysis involved generating the IPTW. The propensity score was calculated using coefficient estimates from a multinomial logit with deployment status as the dependent variable. The resulting parameter estimates were used to generate the predicted probabilities of exposure level and then the IPTW. These estimates were generated using gender-specific models. A range of factors predicted deployment, and these relationships were consistent with prior research (for example, older NGR members were less likely to be deployed).
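The “marginal effect” defined above, the change in predicted probability implied by a logit coefficient, can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the article's procedure: the function names are hypothetical, the baseline probability (0.43) and coefficient (−0.15) are chosen only as example inputs, and averaging the change over respondents is one common convention that may differ from the calculation the author ran in Stata.

```python
import math


def expit(x):
    """Inverse logit: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def marginal_effect(intercept, beta, other_xb=None):
    """Average change in predicted probability when an indicator goes 0 -> 1.

    other_xb holds each respondent's remaining linear-predictor contribution;
    it defaults to a single respondent at the reference value (0.0).
    """
    other_xb = other_xb or [0.0]
    changes = [
        expit(intercept + beta + xb) - expit(intercept + xb) for xb in other_xb
    ]
    return sum(changes) / len(changes)


# Illustrative inputs only: a reference-group probability of 0.43 and a
# logit coefficient of -0.15 for the comparison group.
example = marginal_effect(logit(0.43), -0.15)
```

Because the logistic curve is nonlinear, the same coefficient implies a smaller change in probability for rare outcomes than for common ones, which is why a marginal effect is more informative about practical size than the coefficient alone.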
We further checked “balance,” ie, whether the distribution of the covariates was equivalent across levels of exposure. As all covariates were categorical, this checking process involved survey-weighted cross-tabs of the covariate with deployment status. Finally, estimates of the effect of deployment were generated using Stata’s survey-weighted logistic regression; the IPTW were used as sampling weights with the outcomes as the dependent variables.

Estimation reflected the sampling design of the study. We included the probability weight for the data as a covariate in our model estimating deployment. (We also examined other possibilities such as building a combined weight reflecting the propensity score and the survey weight. Our results were insensitive to the handling of the sampling weight. This insensitivity likely reflects that the variables used to generate the weights were included as covariates.)

RESULTS

Table 1 describes the sample. These tabulations are unweighted to provide the reader with a sense of the actual cell sizes in the analyses (weighted tables are available from the author). Overall, 76% of female servicepersons were not deployed. Of those who were deployed, about 3 in 10 were deployed multiple times (7% of females in the study). Forty percent of male servicepersons were deployed. Of these, roughly 1 in 3 was deployed multiple times (14% of all men). Table 1 also provides descriptive information on the outcomes and on the other characteristics of the sample, including the covariates.

In the first stage of the analysis, the covariates as a group were highly significant predictors of deployment for both men and women (these estimates are available from the author). These results suggest that at least the potential for confounding was present in the earlier reports. Checking covariate balance revealed one comparison (enlistment status) for which a small but significant adjusted difference remained (P < 0.10).
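The two mechanical steps behind such a balance check, trimming the weights and tabulating a categorical covariate by deployment status, can be sketched as follows. This is a hedged illustration in plain Python: the data are made up, the function names are hypothetical, and "trimming" is assumed here to mean capping weights at the 5th and 95th percentiles, which the article does not spell out. It also ignores the survey design that the article's Stata cross-tabs accounted for.

```python
def percentile(sorted_vals, q):
    """Linear-interpolation percentile (q in [0, 100]) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def trim_weights(weights, lower=5, upper=95):
    """Cap weights at the given percentiles (assumed meaning of 'trimmed')."""
    ordered = sorted(weights)
    floor, ceiling = percentile(ordered, lower), percentile(ordered, upper)
    return [min(max(w, floor), ceiling) for w in weights]


def weighted_share(flags, groups, weights):
    """Weighted proportion with flag == 1 within each exposure group."""
    num, den = {}, {}
    for flag, group, w in zip(flags, groups, weights):
        num[group] = num.get(group, 0.0) + w * flag
        den[group] = den.get(group, 0.0) + w
    return {group: num[group] / den[group] for group in den}


# Made-up records: officer indicator, deployment status, and raw IPTW.
officer = [1, 0, 0, 1, 0, 0, 1, 0]
deployed = ["none", "none", "none", "once", "once", "once", "multiple", "multiple"]
raw_weights = [1.2, 1.5, 9.0, 3.0, 3.5, 2.8, 6.0, 0.4]

trimmed = trim_weights(raw_weights)
shares = weighted_share(officer, deployed, trimmed)  # officer share per group
```

If the weights balance a covariate, the weighted shares should be close to one another across the deployment groups; comparing them with the unweighted shares shows how much imbalance the weighting removed.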
The enlistment-status difference was very small in practical terms. The percentage of participants who were officers in the weighted data was 17%, 19%, and 17% for none, once, and multiple deployments, respectively. These figures were 15%, 12%, and 18% in the unweighted data. One alternative would be to modify the equation predicting exposure (eg, adding interactions). Instead, we included this covariate in the model predicting the outcome. The estimates of the effect of deployment were not affected.

Table 2 provides the beta coefficients for deployment on the 6 outcomes. For each outcome, Table 2 provides 3 numbers: the logit coefficient, the standard error, and the marginal effect. The marginal effect captures the size of the relationship between the outcome and the exposure [34]. For each outcome, there are 2 sets of figures, unadjusted and adjusted. The former is descriptive: it reflects the effect of deployment as well as confounders, both observed and unobserved. Deployment was associated with several of the outcomes. Among males, for example, those who had not been deployed were less likely than those who had been deployed once to be involved in 5 of the 6 behaviors. The unadjusted relationships were similar for women.

The adjusted between-group differences are provided in the second column. Two patterns emerge. First, in most instances, adjusted differences were smaller in absolute terms; ignoring the covariates inflated the between-group differences. A second pattern involves the standard errors; as one would expect, adjusting for the covariates increased the standard error. Taken together, these 2 trends reduced several of the unadjusted between-group differences to statistical insignificance.

In terms of significant findings, 3 of the 4 estimates for binge drinking were significant.
Among both men and women, those who had never been deployed were less likely to binge drink than those who had been deployed once. The effects were fairly substantial: being deployed once (but not multiple times) raised the probability of binge drinking by 4 and 6 percentage points for men and women, respectively (the estimates are negative because the reference category is “deployed once”). Compared with those who were deployed once, multiple deployments raised the probability of binge drinking still further, by 6 and 7 percentage points, respectively. Only the effect for men was statistically significant at conventional levels.

The effects for smoking and depressive symptoms tell a different story. When the covariates were considered, deployment was associated with neither outcome. For marijuana use, only 1 of the 4 adjusted effects was significant; those women who were deployed multiple times were much more likely to have used marijuana (9 percentage points).

For suicidal ideation, the effect of multiple deployments was significant for both men and women. The effect for women (10 percentage points) was much larger than that for men (2 percentage points). For women, the difference between those who had been deployed once and those not deployed was also statistically significant. Combining the 2 effects, women who had been deployed multiple times were 15 percentage points more likely to experience suicidal ideation than those who were not deployed.

Finally, for PTSD symptoms, 3 of the 4 adjusted effects were statistically significant. For both men and women, the effect of multiple deployments was statistically significant (2 and 6 percentage points, respectively).

Strength of Evidence

These results may mislead for 2 reasons: multiple testing and unobserved confounding. We considered both and reported the results in Table 3. The table lists each of the 24 effects (2 contrasts for deployment status × 2 for gender [male, female] × 6 outcomes).
The first column of numbers reports the P values corresponding to the estimates in Table 2, and these were used to rank the alternative effects. By a strict α = 0.05 criterion, 10 of the 24 effects were significant.

First, we considered the possibility that some findings were significant by chance. We handled this issue in 2 ways. First, we calculated the joint significance of the exposure across the 6 outcomes. For each gender, this involved a joint F test with 12 degrees of freedom. For both men and women, the effect was statistically significant at P < 0.0001. Next, we used the Benjamini-Hochberg procedure for controlling the false discovery rate (FDR).35,36 The FDR is the expected proportion of incorrectly rejected null hypotheses. Following the literature, we set the FDR to 10% and calculated a cutoff for α for each effect that keeps the FDR at or below this level. This cutoff is reported in the second column; any effect with a P value below the appropriate cutoff is then labeled “significant.” In this case, 2 effects (of the 10 with P < 0.05) do not meet this criterion and can no longer be considered “significant.” (The Appendix provides more information on this procedure.)

TABLE 1. Means by Gender and Deployment Status

                                        Female deployment             Male deployment
                                        None      Once      Multiple  None      Once      Multiple
                                        (76%)     (16%)     (7%)      (60%)     (27%)     (14%)
Outcomes
  Binge drinking                        24%       34%       37%       35%       43%       43%
  Smoking                               19%       26%       23%       19%       23%       21%
  Marijuana use                         3%        5%        11%       3%        4%        2%
  Depressive symptoms                   20%       23%       24%       13%       15%       14%
  Suicidal thoughts                     13%       18%       25%       9%        9%        9%
  PTSD symptoms                         5%        8%        14%       3%        6%        6%
Age
  21–25                                 20%       27%       23%       17%       21%       11%
  26–34                                 26%       25%       32%       22%       24%       22%
  35+                                   48%       44%       42%       54%       52%       67%
  17–20                                 7%        4%        3%        6%        2%        1%
Race/ethnicity
  Black or African American             22%       19%       20%       11%       9%        7%
  American Indian or Alaska native      15%       14%       17%       13%       13%       12%
  Asian                                 1%        0%        2%        1%        1%        1%
  Native Hawaiian or Pacific Islander   6%        9%        5%        7%        14%       6%
  Other                                 2%        3%        1%        1%        4%        2%
  White                                 54%       54%       55%       66%       59%       72%
Child in household (1 = yes; 0 = no)    78%       72%       73%       74%       73%       73%
Marital status
  Living as married                     9%        10%       8%        6%        6%        5%
  Widowed                               1%        2%        1%        0%        0%        0%
  Divorced or separated                 19%       18%       23%       9%        9%        11%
  Never married                         29%       36%       31%       26%       25%       18%
  Married                               42%       34%       37%       58%       59%       66%
Branch
  Army reserve                          13%       18%       6%        9%        10%       2%
  Naval reserve                         24%       10%       8%        24%       11%       5%
  Air national guard                    14%       15%       25%       14%       13%       26%
  Air force reserve                     40%       31%       48%       37%       26%       56%
  Marine corps reserve                  1%        1%        1%        7%        6%        2%
  Army national guard                   8%        25%       12%       9%        33%       9%
Officer (1 = yes; 0 = enlisted)         15%       14%       8%        15%       12%       19%
Income
  $15,000–$19,999                       7%        7%        6%        6%        5%        3%
  $20,000–$24,999                       7%        7%        7%        6%        7%        4%
  $25,000–$34,999                       12%       14%       14%       10%       13%       6%
  $35,000–$44,999                       12%       17%       13%       10%       13%       9%
  $45,000–$49,999                       7%        8%        9%        7%        9%        8%
  $50,000–$74,999                       19%       20%       19%       22%       24%       28%
  $75,000 or more                       25%       17%       21%       32%       25%       38%
  <$15,000                              12%       10%       10%       8%        5%        3%
Infantry (1 = yes; 0 = no)              2%        3%        5%        9%        17%       10%
Employment status
  Part-time                             13%       13%       10%       8%        8%        5%
  In school                             21%       26%       25%       17%       22%       30%
  Full-time                             66%       61%       65%       75%       70%       66%
Education
  Some college                          54%       50%       58%       50%       53%       50%
  College+                              36%       34%       30%       33%       26%       36%
  High school graduate or less          10%       17%       12%       17%       21%       13%
N                                       2632      564       248       7447      3326      1697

These calculations are unweighted. PTSD indicates post-traumatic stress disorder.

TABLE 2. Effect of Deployment, by Gender

Outcome and                 Unadjusted                    Adjusted
No. deployments*            Beta      SE      Mfx¶        Beta      SE      Mfx¶
Binge drinking
  Male, none                -0.43†    0.04    -0.10       -0.15§    0.06    -0.04
  Male, multiple            -0.11     0.06    -0.03        0.26†    0.07     0.06
  Male, N                   13,429                        13,191
  Female, none              -0.52†    0.09    -0.11       -0.28§    0.14    -0.06
  Female, multiple           0.14     0.15     0.03        0.30     0.18     0.07
  Female, N                 3608                          3520
Smoking
  Male, none                -0.36†    0.05    -0.06       -0.04     0.08    -0.01
  Male, multiple            -0.22‡    0.07    -0.03        0.17     0.09     0.02
  Male, N                   13,976                        13,769
  Female, none              -0.40†    0.09    -0.07       -0.11     0.14    -0.02
  Female, multiple          -0.26     0.17    -0.04       -0.10     0.20    -0.02
  Female, N                 3755                          3663
Marijuana use
  Male, none                -0.59†    0.10    -0.02       -0.20     0.19     0.00
  Male, multiple            -0.59†    0.17    -0.01        0.38     0.22     0.01
  Male, N                   13,926                        13,759
  Female, none              -0.79†    0.20    -0.03       -0.02     0.29     0.00
  Female, multiple           0.87†    0.25     0.04        1.70†    0.31     0.09
  Female, N                 3756                          3662
Depressive symptoms
  Male, none                -0.21†    0.06    -0.02       -0.05     0.09    -0.01
  Male, multiple            -0.11     0.08    -0.01        0.20     0.11     0.02
  Male, N                   13,332                        13,565
  Female, none              -0.19     0.10    -0.03        0.06     0.15     0.01
  Female, multiple           0.08     0.17     0.01        0.33     0.21     0.06
  Female, N                 3654                          3619
Suicidal ideation
  Male, none                -0.10     0.07    -0.01       -0.04     0.11     0.00
  Male, multiple             0.00     0.10     0.00        0.28§    0.13     0.02
  Male, N                   13,348                        13,584
  Female, none              -0.32‡    0.12    -0.04       -0.39§    0.17    -0.05
  Female, multiple           0.59†    0.17     0.09        0.61‡    0.21     0.10
  Female, N                 3619                          3598
PTSD symptoms
  Male, none                -0.58†    0.09    -0.03       -0.44‡    0.15    -0.02
  Male, multiple            -0.03     0.12     0.00        0.46‡    0.16     0.02
  Male, N                   13,315                        13,652
  Female, none              -0.62†    0.16    -0.04       -0.29     0.24    -0.02
  Female, multiple           0.42     0.23     0.03        0.82‡    0.28     0.06
  Female, N                 3651                          3643

*The reference category for the effect of deployment is “deployed once.”
† P < 0.01.
‡ 0.01 < P < 0.05.
§ 0.05 < P < 0.10.
¶ Mfx is the marginal effect of deployment on the predicted probability of the outcome relative to the reference category.
SE indicates standard error.

TABLE 3. Strength of Evidence: Effect Estimates, by Gender, Outcome, and Comparison

Gender   Outcome             Effect     P*      B-H Cutoff†   Exp (Gamma)‡
Stronger evidence of harm
Female   Marijuana use       Multiple   0.000   0.004         2.99
Female   PTSD                Multiple   0.010   0.025         1.31
Female   Suicidal ideation   Multiple   0.004   0.017         1.26
  Sensitivity parameters greater than 1.25 (3 effects relatively insensitive to unobserved confounding)
Male     PTSD                Multiple   0.003   0.013         1.19
Male     PTSD                None       0.005   0.021         1.15
Male     Binge drinking      Multiple   0.000   0.008         1.14
Female   Suicidal ideation   None       0.024   0.033         1.05
Male     Binge drinking      None       0.014   0.029         1.02
  Significant by B-H (8 effects significant adjusting for multiple testing)
Male     Suicidal ideation   Multiple   0.046   0.038         1.01
Female   Binge drinking      None       0.049   0.042         1.01
  P < 0.05 (10 effects statistically significant)
Weaker evidence
Male     Smoking             Multiple   0.065   0.046         0.99
Male     Depression          Multiple   0.080   0.050         0.98
Female   Binge drinking      Multiple   0.092   0.054         0.95
Female   Depression          Multiple   0.113   0.058         0.93
Male     Marijuana use       Multiple   0.134   0.063         0.93
Female   PTSD                None       0.235   0.067         0.83
Male     Marijuana use       None       0.317   0.071         0.88
Female   Smoking             None       0.356   0.075         0.85
Female   Smoking             Multiple   0.493   0.079         0.58
Male     Smoking             None       0.525   0.083         0.87
Male     Depression          None       0.617   0.088         0.88
Male     Suicidal ideation   None       0.617   0.092         0.84
Female   Depression          None       0.647   0.096         0.68
Female   Marijuana use       None       0.920   0.100         0.54
Stronger evidence of no effect

*P value for beta coefficient reported in Table 2.
† Cut-off value for Benjamini-Hochberg adjustment for multiple testing.
‡ Sensitivity to unobserved confounding as defined in text.
B-H indicates Benjamini-Hochberg; PTSD, post-traumatic stress disorder.

Second, we considered unobserved confounding. The third column of the table reports the sensitivity parameter Γ, which is defined in the Appendix. For effects with P < 0.05, Γ is greater than 1. The issue here is whether those who were deployed were, all else equal, more likely to experience these outcomes even if they had not been deployed. For effects with P > 0.05, Γ is less than 1, and the question is whether an unobserved factor is hiding significant effects.

The table reports the Γ corresponding to the effects reported above. This parameter is used to sort the rows of the table that met the Benjamini-Hochberg criterion. The top 3 rows are the most robust. All involve women and multiple deployments and have Γ parameters greater than 1.25.
The Γ for the effect of multiple deployments for women on marijuana use is roughly 3.00. For the apparent effect of multiple deployments to be spurious, the confounding effect of any unobserved factor (even one that predicts multiple deployments perfectly) must triple the odds of using marijuana.

Another way to gauge the size of Γ is to consider predicted probabilities. Suppose that the probability of the negative outcome among individuals in the healthy group was 20% (and that none of these individuals were deployed multiple times). A Γ of 1.25 would mean that 24% of those in the unhealthy group experienced that outcome (and all were deployed multiple times). A difference of this size would create the appearance that deployment itself caused the negative outcome when in fact predeployment differences were responsible. A Γ of 3 would imply that 42% of unhealthy individuals experienced the outcome. Overall, for the 3 top rows of the table, any unobserved confounding would have to involve a characteristic not measured but strongly related to both deployment and the outcome over and above the characteristics included as observed confounders.

The sensitivity analyses also identified nonsignificant findings that are potentially the result of unobserved confounding. For example, the difference among women in depressive symptoms between those who were deployed once and multiple times was not significant (P = 0.11). The Γ parameter is 0.93. This parameter implies that the probability of that outcome differs between the healthy and unhealthy by only 1.5 percentage points. (To be specific, the “healthy” are deployed multiple times, and their likelihood of elevated depressive symptoms is 1.5 percentage points lower.) In this case, a harmful effect of deployment could be masked by this predeployment difference. Other nonfindings are more robust.
The effect of 1 deployment on marijuana use for women has a Γ of 0.54. If a true effect is hidden by unobserved confounding, the effect of the unobservable must be enormous. The healthy must be deployed multiple times and be 34 percentage points less likely to use marijuana (20% vs. 54%).

Overall, the findings (and nonfindings) at the top and bottom of the table are likely not due to unobserved confounding. The top of the table is dominated by the effect of multiple deployments on women; the 5 rows at the bottom of the table involve comparisons between those who were deployed once and those never deployed.

DISCUSSION

The effects presented in the present study suggest that some, but not all, of the associations in the earlier report reflect confounding by observed and perhaps unobserved factors. This article attempts to identify causal relationships, a difficult task outside of randomization. The fundamental lesson of the literature on causal inference is that one cannot move from association to causation without making an assumption, generally untestable. In the present case that assumption is ignorability. We strengthen our inference by examining the sensitivity of our findings to violation of that assumption. The effect of multiple deployments on women can be explained by unobserved confounding only if that confounding is very powerful. In general, we do not find any effects for men that cannot be explained as chance findings or as reflecting unobserved confounding. As noted, some of the nonsignificant findings may reflect unobserved confounding as well. In that case, the mechanism at work would involve healthier individuals being deployed multiple times. It is not clear how some unobserved confounder would work to inflate some effects and deflate others.

Some readers may find the sensitivity analyses unpersuasive.
An alternative would be to model selection into deployment and allow for unobserved determinants of deployment to correlate with unobserved determinants of the outcome. As a general rule, such “selection” or Heckman models work best when one has an instrumental variable,37 and none was apparent. Sensitivity analysis is an alternative, but a large Γ is in the eye of the beholder. Sensitivity analyses give structure to, but do not replace, one’s subjective assessment of the role of unobserved confounding. Such analyses have proven persuasive in other situations, such as in studying the relationship between smoking and cancer.33

More generally, some readers may find the effort to identify causal relationships in observational data impossible or even misguided. And as one reviewer noted, many prior studies caution that the relationships identified are “only” associations. (This claim is not true for all prior research.38) Are these associations of any use? One possible use involves using deployment as a screening mechanism for these problems. In that case, an analysis would need to focus on a different set of concerns such as specificity and sensitivity. These issues are generally not the concern of prior research. Furthermore, it is not clear why one would adjust the relationship between deployment and these outcomes for other covariates. If one’s focus is screening, then women who had been deployed once or more should be screened for binge drinking, whether that association is causal or not. They need assistance whether they brought that problem with them to their deployment or the latter caused it.

Limitations

Several limitations temper our findings. Principal among these is the nature of the outcomes, which are all self-reported. However, reporting errors for the dependent variable do not necessarily bias the effect of deployment.
Such errors, however, do inflate the standard errors, and as a result, some of the effects that are larger in practical terms might be statistically significant with more accurate outcome reports. More sinister forms of bias are possible if the degree of reporting accuracy varied with deployment status. However, it is difficult to imagine why such bias would have occurred. It is also worth noting that all of our outcomes are undesirable; future analyses might consider whether and how deployment compromises positive outcomes (like productivity) or how protective factors moderate the effect of deployment.

A second principal limitation involves the lack of a measure of mediators, such as exposure to violence. It is possible that among those deployed, those with greater exposure to combat would manifest more and larger negative effects. However, incorporating such a mediating process is difficult,39 and many such regression-based analyses are incorrect. Furthermore, one can think of the estimates provided as “intention to treat.” The relatively modest effects of deployment reflect either limited exposure to violence or a modest link between that exposure and the outcomes considered in the present study.

A third limitation involves the relatively low response rate. Systematic patterns of nonresponse might have biased the coefficient estimates. In a sense, however, these analyses have considered that possibility: such patterns are relevant only to the extent they involve patterns of underlying health status and the outcomes of interest. For that reason, assessments of systematic nonresponse are subsumed under the analysis of unobserved confounding. The reader can consider systematic nonresponse as a mechanism that might have created the unobserved confounding assessed above.

One could argue that a final limitation involves the nature of the exposure itself.
As mentioned earlier in the article, we might have focused on deployment to Iraq and/or Afghanistan, which has been the focus of media reports. It is worth noting that supplemental analyses did analyze exposure in this manner, and the results were very similar to those presented in the present study.

CONCLUSIONS

In conclusion, the nation owes its brave men and women nothing but the utmost respect for their service. This debt is especially great for citizen soldiers. As noted, their involvement in the wars in Afghanistan and Iraq has exceeded that of their counterparts in prior wars. It seems hard to believe that anyone enlisting in the Guard or Reserve could ever have anticipated being deployed overseas multiple times.

Where deployment has harmed these men and women, appropriate services must be provided and those needing them identified. The harm suffered is as real as physical injury, and treatment should be provided without delay or stigma. The process of statistical adjustment suggests that some of the apparent associations are not truly causal, and the adjusted estimates provide greater focus on the group that seems most at risk: women who are deployed multiple times. The analyses of unobserved confounding bring additional focus to our findings.

All things considered, the findings in the present study are mixed, and some balance is needed. Media reports that potentially exaggerate the problems faced run the risk of stigmatizing those who serve. At the same time, our findings do identify areas of real concern, especially among women deployed multiple times.

APPENDIX 1: DETAILS OF ESTIMATION

Adjustment Using Propensity Score Weights

As noted in the text, the propensity score(s) is (are) a statistical convenience, summarizing the covariates’ potential to confound between-group comparisons.
The predicted probability can be generated using any technique appropriate for the prediction of a categorical outcome.29,40 Like ordinary regression, the propensity score assumes ignorability28: that is, the exposure is as if randomly assigned, conditioning on the propensity score(s). This assumption is also referred to as “no unobserved confounding.”41 If this assumption is true, then any association between exposure and the outcome(s) can be interpreted as causal.

Having calculated a propensity score, one can use it in any of several ways, such as a covariate or for matching. Our approach in the present study is to use the propensity score to calculate a sampling weight, the IPTW. Such weights represent an implicit form of matching. As noted in the text, they are calculated as one over the probability of the exposure actually received. For a dichotomous exposure, the weight for the exposed is one over the propensity score; for others, it is one over (one minus the propensity score). The result is to inflate the importance of individuals underrepresented in each group. Individuals who are underrepresented among the exposed have a smaller propensity score, so their weight in the analysis is inflated. When weighted, the data represent a sample of observations from a population where exposure and the confounders are unrelated. Because such weights have the same form as sampling weights, they can be incorporated in any analysis that can accommodate such weights.

An advantage of propensity score weights is that they can be generalized to multivalued treatments, like deployment.30,31 In that case, there are 3 predicted probabilities: the probability of no deployment, of being deployed once, and of being deployed multiple times. The propensity score weight is still calculated in the same manner, as one over the predicted probability of the exposure experienced. There is only one weight per observation.
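The weight calculation for a 3-level exposure can be sketched in a few lines of Python. This is a minimal illustration, not the article's estimation code: the function name `iptw_weights`, the level labels, and the predicted probabilities are all hypothetical, and the sketch assumes the probabilities have already been produced by some model for a categorical outcome.

```python
from typing import List, Sequence

# Deployment categories from the article; the label strings are illustrative.
LEVELS = ("none", "once", "multiple")


def iptw_weights(probs: Sequence[Sequence[float]],
                 exposure: Sequence[str]) -> List[float]:
    """Return one weight per person: 1 / P(exposure actually received).

    probs[i][k] is the predicted probability that person i is in LEVELS[k];
    exposure[i] is the level person i actually experienced.
    """
    weights = []
    for row, level in zip(probs, exposure):
        if abs(sum(row) - 1.0) > 1e-6:
            raise ValueError("predicted probabilities must sum to 1")
        p_received = row[LEVELS.index(level)]
        if p_received <= 0.0:
            raise ValueError("probability of the received exposure must be positive")
        weights.append(1.0 / p_received)
    return weights


# Hypothetical predicted probabilities for three people (made-up numbers).
example_probs = [
    [0.60, 0.30, 0.10],
    [0.60, 0.30, 0.10],
    [0.20, 0.30, 0.50],
]
example_exposure = ["multiple", "none", "multiple"]
example_weights = iptw_weights(example_probs, example_exposure)
print(example_weights)
```

In this made-up example, the first two people share the same covariate profile (the same predicted probabilities), but the first was deployed multiple times despite a predicted probability of only 0.10. That person is underrepresented among the multiply deployed and therefore receives the largest weight.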
In theory, either regression or propensity score methods should be effective in conditioning on potential confounders.42 However, in practice (and in finite samples), in either case, such conditioning might fail to “balance” the covariates, that is, to equalize the distribution of the covariates across the levels of exposure (deployed or not). One advantage of propensity scores is that they facilitate checking whether such balance has occurred; one conducts the analysis of each covariate in the same way as one assesses the effect of the exposure on the outcome. If balancing occurred, one cannot reject the null hypothesis of no treatment “effect” on the covariates. In the case of regression, a lack of balance is known as the “support” problem, and the standard regression package offers no hint that one does or does not have that problem. At the very least, one could argue that propensity scores are a nice companion to ordinary regression because they provide a check for this problem.

A second advantage of propensity score methodology also involves diagnostic checking. Among those who are deployed or not deployed, there could be a subset of individuals who are unique to either group; their propensity score would be very close to 1 or 0. Regression handles these individuals by implicitly extrapolating predicted outcomes to these individuals. Because these individuals have unusual combinations of the covariates, they may be particularly influential in determining the regression line. Propensity scores offer a better way of handling these cases. One can identify them on the basis of high or low values and potentially remove them or at least examine their effect on the analysis.

The use of propensity scores to form weights might precipitate a related problem. Weighted analyses are less efficient than unweighted analyses, and the loss of precision is a function of the variation of the weights. One can get quite noisy estimates if the weights are highly variable.
When survey samplers have this problem, they “trim” the weights. My discussions with them indicate that this involves a rather informal process (“we make the big ones smaller and the smaller ones bigger”). In the case of propensity-score weights, balance provides a reference point for trimming: one can trim the weights until covariate balance is lost. Balance provides a bottom line for trimming the weights. In the present case, we censored the weights at the 5th and 95th percentiles. The resulting estimates of the effect of exposure were more precise, but covariate balance was not eliminated.

Sensitivity Analyses: The Role of Unobserved Confounding

A key limitation of propensity score (and regression) methodology is the ignorability assumption. As noted, this assumption really cannot be tested formally; one only can replace it with another assumption (eg, an exclusion restriction and instrumental variables estimation) and further examine whether the effect of deployment changes as a result. Another possibility involves sensitivity analyses involving unobserved confounding of hypothetical strengths. Such an approach was used to demonstrate the effect of smoking on health.33 We used this strategy in the present study.

Building on Rosenbaum,29 this analysis assumes that there is a hypothetical confounding factor (H), above and beyond those captured by the covariates. This factor causes omitted variable bias. It can assume one of 2 values, “healthy” or “unhealthy.” (Rosenbaum29 showed that assuming this variable as dichotomous does not fundamentally change the problem but is a helpful convenience.) One can examine the potential effects of this bias in a regression framework. The true relationship is as follows:

$$Y_i = \alpha + \beta E_i + \Gamma H_i + \varepsilon_i \qquad (1)$$

In the case of dichotomous outcomes (and a logit model), $Y_i$ is the log-odds of the outcome. β is the (true) effect of exposure on the log-odds. If we assume ignorability (ie, that H does not exist when it does), then the resulting estimate of β ($\hat{b}$) is as follows:

$$\hat{b} = \beta + \Gamma \frac{\delta H_i}{\delta E_i} \qquad (2)$$

where Γ is the effect of $H_i$ on the outcome, and $\delta H_i / \delta E_i$ reflects the strength of the relationship between H and $E_i$. The second term on the right-hand side is the resulting bias. The bias is a function of 2 parameters, and a bias of a given magnitude could be generated by infinitely many combinations of the two. To simplify matters, Rosenbaum and others assumed that H predicts either exposure or the outcome perfectly. In our analyses, we assumed that this factor predicts deployment perfectly. In essence, P(E = 1 | H = 1) = P(E = 0 | H = 0) = 1, and this effectively sets $\delta H_i / \delta E_i$ to 1. So, now the issue of bias solely involves the Γ term.

How big would Γ have to be to change the interpretation of our findings? The key issue is whether the bias would make a relationship appear significant when it is not. In effect, the question is whether the bias pulled a statistically significant t statistic more than |t| - 1.96 standard errors away from 0. Suppose a parameter estimate is 3 and the standard error is 1; then the key issue is whether the bias could have pulled the t statistic 1.04 units from the critical value. In terms of the magnitude of the parameter itself, the bias would have to be 1.04 standard errors.

For a nonsignificant effect, the logic is reversed. Suppose the estimated parameter is 1 and the standard error is 1. The question now is whether a bias could have worked to make an otherwise significant finding appear insignificant: could the bias have pulled the t statistic 0.96 units toward zero? The corresponding bias would be 0.96 standard errors. The size of this bias depends on both the magnitude of the estimated effect and its precision. A larger, more precise effect will be more robust to possible unobserved confounding.
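The arithmetic in the two worked examples above can be sketched as follows. This is one reading of the appendix rather than the article's own code: the function names are hypothetical, the signed convention (negative values for nonsignificant effects) is an assumption made so that exponentiating gives a value below 1 when P > 0.05, and the inputs taken from Table 2 are rounded, so the result only approximates the published value.

```python
import math

Z_CRIT = 1.96  # two-sided 5% critical value used in the appendix


def bias_needed(beta: float, se: float, z_crit: float = Z_CRIT) -> float:
    """Bias on the log-odds scale, (|t| - z_crit) * se = |beta| - z_crit * se.

    Positive: bias that would have to be removed to lose significance.
    Negative: shortfall from significance (assumed sign convention).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    return abs(beta) - z_crit * se


def exp_gamma(beta: float, se: float, z_crit: float = Z_CRIT) -> float:
    """Exponentiate the bias so it reads as an odds ratio."""
    return math.exp(bias_needed(beta, se, z_crit))


# Worked examples from the text: estimate 3 (SE 1) and estimate 1 (SE 1).
significant_case = bias_needed(3.0, 1.0)      # about 1.04 standard errors
nonsignificant_case = bias_needed(1.0, 1.0)   # about -0.96 (0.96 short of 1.96)

# Adjusted effect of multiple deployments on marijuana use for women,
# using the rounded Table 2 values (beta = 1.70, SE = 0.31).
marijuana_gamma = exp_gamma(1.70, 0.31)       # about 2.98

print(significant_case, nonsignificant_case, marijuana_gamma)
```

With the rounded coefficients, the marijuana example comes out close to the 2.99 shown in Table 3, which suggests, but does not confirm, that the published sensitivity parameters were computed along these lines from unrounded estimates.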
In the present case, because Γ is the effect of H on the log-odds of the outcome, one can exponentiate that value and generate the corresponding odds ratio. This estimate has the same interpretation as any odds ratio, and this is the value we report in the article.

Adjustment for Multiple Testing

The Benjamini-Hochberg procedure assesses statistical significance in light of the FDR that one finds acceptable. The FDR is the expected proportion of incorrectly rejected null hypotheses. This test involves ranking the significance levels of the tests conducted from smallest to largest (Table 3). The term k is calculated as follows:

$$k = \max\left\{ i : p_{(i)} \le \frac{i}{m}\, q \right\} \qquad (3)$$

where i is the rank of a given test, $p_{(i)}$ is the corresponding P value, m is the number of tests (24 in the present case), and q is the FDR. One rejects all hypotheses from the smallest P value to the kth. Consistent with prior research, the FDR in this article is set at 0.10. This figure means that, on average, no more than 10% of the findings labeled “statistically significant” are expected to be chance findings (or “false discoveries”).

REFERENCES

1. Defense Science Board Washington DC. Defense Science Board Task Force on deployment of members of the National Guard and Reserve in the global war on terrorism. Washington, DC: Defense Science Board; 2007.
2. Lowenberg M. The Role of the National Guard in national defense and homeland security. Natl Guard. 2005;59:97.
3. Polusny MA, Erbes CR, Arbisi PA, et al. Impact of prior Operation Enduring Freedom/Operation Iraqi Freedom combat duty on mental health in a predeployment cohort of National Guard soldiers. Mil Med. 2009;174:353–357.
4. Kline A, Falca-Dodson M, Sussner B, et al. Effects of repeated deployment to Iraq and Afghanistan on the health of New Jersey Army National Guard troops: implications for military readiness. Am J Public Health. 2010;100:276.
5. Browne T, Hull L, Horn O, et al. Explanations for the increase in mental health problems in UK reserve forces who have served in Iraq. Br J Psychiatry. 2007;190:484.
6. Wolfe J, Erickson DJ, Sharkansky EJ, et al. Course and predictors of posttraumatic stress disorder among Gulf War veterans: a prospective analysis. J Consult Clin Psychol. 1999;67:520–528.
7. Boscarino JA. Post-traumatic stress and associated disorders among Vietnam veterans: the significance of combat exposure and social support. J Trauma Stress. 1995;8:317–336.
8. Marshall RP, Psych D, Jorm AF, et al. Posttraumatic stress disorder and other predictors of health care consumption by Vietnam Veterans. Psychiatr Serv. 1998;49:1609–1611.
9. Weiss DS, Marmar CR, Schlenger WE, et al. The prevalence of lifetime and partial post-traumatic stress disorder in Vietnam theater veterans. J Trauma Stress. 1992;5:365–376.
10. Jacobson IG, Ryan MA, Hooper TI, et al. Alcohol use and alcohol-related problems before and after military combat deployment. JAMA. 2008;300:663.
11. Smith TC, Ryan MA, Wingard DL, et al. New onset and persistent symptoms of post-traumatic stress disorder self reported after deployment and combat exposures: prospective population based US military cohort study. BMJ. 2008;336:366–371.
12. Killgore WD, Stetz MC, Castro CA, et al. The effects of prior combat experience on the expression of somatic and affective symptoms in deploying soldiers. J Psychosom Res. 2006;60:379–385.
13. Mental Health Advisory Team (MHAT) III Operation Iraqi Freedom. Final report. Office of the Surgeon Multinational Force-Iraq and Office of The Surgeon General United States Army Medical Command; 2006. http://www.armymedicine.army.mil/reports/mhat/mhat_iii/mhat-iii.cfm.
14. Hotopf M, Hull L, Fear NT, et al. The health of UK military personnel who deployed to the 2003 Iraq war: a cohort study. Lancet. 2006;367:1731–1741.
15. Hourani LL, Bray RM, Marsden ME, et al. Department of Defense survey of health related behaviors among the Guard and Reserve Force. Report prepared for the Assistant Secretary of Defense (Health Affairs). Research Triangle Park, NC: Research Triangle Institute; 2007.
16. Bell NS, Amoroso PJ, Williams JO, et al. Demographic, physical, and mental health factors associated with deployment of U.S. Army soldiers to the Persian Gulf. Mil Med. 2000;165:762–772.
17. Larson GE, Highfill-McRoy RM, Booth-Kewley S. Psychiatric diagnoses in historic and contemporary military cohorts: combat deployment and the healthy warrior effect. Am J Epidemiol. 2008;167:1269.
18. Vasterling JJ, Proctor SP, Amoroso P, et al. Neuropsychological outcomes of army personnel following deployment to the Iraq war. JAMA. 2006;296:519.
19. King DW, King LA, Gudanowski DM, et al. Alternative representations of war zone stressors: relationships to posttraumatic stress disorder in male and female Vietnam veterans. J Abnorm Psychol. 1995;104:184–196.
20. King DW, King LA, Foy DW, et al. Posttraumatic stress disorder in a national sample of female and male Vietnam veterans: risk factors, war-zone stressors, and resilience-recovery variables. J Abnorm Psychol. 1999;108:164–170.
21. Bianchi SM, Milkie MA, Sayer LC, et al. Is anyone doing the housework? Trends in the gender division of household labor. Soc Forces. 2000;79:191.
22. Vogt DS, Pless AP, King LA, et al. Deployment stressors, gender, and mental health outcomes among Gulf War I veterans. J Trauma Stress. 2005;18:115–127.
23. Radloff LS. The CES-D scale: a self-report depression scale for research in the general population. Appl Psychol Meas. 1977;1:385.
24. Robins LN, Helzer JE, Croughan J, et al. National Institute of Mental Health diagnostic interview schedule: its history, characteristics, and validity. Arch Gen Psychiatry. 1981;38:381.
25. Burnam MA, Wells KB, Leake B, et al. Development of a brief screening instrument for detecting depressive disorders. Med Care. 1988;26:775–789.
26. Weathers FW, Huska JA, Keane TM. The PTSD checklist: civilian version (PCL-C). Boston, MA: National Center for PTSD; 1994.
27. Lang AJ, Stein MB. An abbreviated PTSD checklist for use as a screening instrument in primary care. Behav Res Ther. 2005;43:585–594.
28. Rosenbaum PR, Rubin DB. The Central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41–55.
29. Rosenbaum PR. Observational Studies. New York, NY: Springer; 2002.
30. Imbens GW. The Role of the propensity score in estimating dose-response functions. Biometrika. 2000;87:706–710.
31. Foster EM. Is more treatment better than less? An application of propensity score analysis. Med Care. 2003;41:1183–1192.
32. Hansen BB. The essential role of balance tests in propensity-matched observational studies: comments on ‘A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003’ by Peter Austin. Stat Med. 2008;27:2050–2054.
33. Cornfield J, Haenszel W, Hammond EC, et al. Smoking and lung cancer: recent evidence and a discussion of some questions. J Natl Cancer Inst. 1959;22:173–203.
34. Greene WH. Econometric Analysis. Upper Saddle River, NJ: Prentice Hall; 2008.
35. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Stat Methodol. 1995;57:289–300.
36. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Ann Stat. 2001;29:1165–1188.
37. Bushway S, Johnson BD, Slocum LA. Is the magic still there? The use of the Heckman two-step correction for selection bias in criminology. J Quant Criminol. 2007;23:151–178.
38. Smith B, Ryan MA, Wingard DL, et al. Cigarette smoking and military deployment: a prospective evaluation. Am J Prev Med. 2008;35:539–546.
39. Sobel ME. Identification of causal parameters in randomized studies with mediating variables. J Educ Behav Stat. 2008;33:230–251.
40. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat. 1985;39:33–38.
41. Lee M. Micro-Econometrics for Policy, Program, and Treatment Effects. Oxford, United Kingdom: Oxford University Press; 2005.
42. Morgan SL, Winship C. Counterfactuals and Causal Inference: Methods and Principles for Social Research. Cambridge, United Kingdom: Cambridge University Press; 2007.