Deployment and the Citizen Soldier

ORIGINAL ARTICLE
Deployment and the Citizen Soldier
Need and Resilience
E. Michael Foster, PhD
Background: The wars in Iraq and Afghanistan have made unprecedented demands on the nation’s citizen soldiers, the National Guard
and Reserve. A major concern involves the repeated deployment of
these forces overseas.
Objectives: Using data from the Department of Defense Survey of
Health Related Behaviors among the Guard and Reserve Force, we
examined the effects of deployment on 6 health outcomes.
Subjects: The Department of Defense Survey of Health Related
Behaviors among the Guard and Reserve Force is a sample (n =
17,754) of all Reserve component personnel (including full time
and/or activated Guard and Reservists) serving in all pay grades
throughout the world.
Research Design: We relied on inverse probability of treatment
weights to adjust for observed confounders and used sensitivity
analyses to examine the sensitivity of our findings to potential
unobserved confounding.
Results: Observed confounders explain much of the apparent
effect of deployment. For men, the adjusted relationships could
very well reflect further confounding involving unobserved factors. However, for women, effects of deployment on marijuana
use, symptoms of post-traumatic stress disorder, and suicidal
ideation are robust to adjustments for multiple testing and possible unobserved confounding.
Conclusions: These effects are large in practical terms and troubling
but suggest that media reports of the harm caused by deployment
may be overstated. Such exaggerations run the risk of stigmatizing
those who serve.
Key Words: causal inference, deployment, mental health
(Med Care 2011;49: 301–312)
From the Gillings School of Global Public Health, University of North Carolina, Chapel Hill, NC.
Reprints: E. Michael Foster, PhD, Department of Maternal and Child Health,
University of North Carolina at Chapel Hill, Rosenau Hall, Campus Box
7445, Chapel Hill, NC 27599. E-mail: [email protected].
Copyright © 2011 by Lippincott Williams & Wilkins
ISSN: 0025-7079/11/4903-0301
Medical Care • Volume 49, Number 3, March 2011
The movement to an all-volunteer force has been accompanied by a transformation of the role of the National
Guard and Reserve (NGR). Having historically served as a
strategic reserve, the Guard and Reserve have become part of
the nation’s operational force.1 This evolution reached new
heights during the wars in Iraq and Afghanistan.2,3 Reservists
mobilized at the start of the Iraq war in late spring 2003
exceeded 200,000. As of late 2006, NGR component troops
represented nearly half (46%) of the combat brigades in Iraq.3
As of May 2007, more than 575,000 NGR had been mobilized since September 11, 2001.1
As noted, not only are Reservists and Guard members
more likely to be mobilized, but their involvement has also
changed. In particular, the frequency of deployments has
increased, and the time between deployments (or so-called
dwell time) has decreased. As a result, many NGR troops
have been deployed repeatedly. In a 2008 study of 2543
members of the New Jersey National Guard preparing for
deployment, 1 in 4 had been deployed to Iraq or Afghanistan
previously.4 These changes have become “a point of deep
concern both in and outside the military.”1
Deployment is of concern because serving overseas
may signify increased exposure to combat. However, deployment is not synonymous with combat; many Guard and
Reserve members serve in support positions. Nonetheless,
deployment may cause stress to the members and families of
NGR for several reasons. Deployment may tax relationships
with spouses or partners and children as members spend long
periods away from home. Other effects may be more subtle—
deployment may jeopardize access to health insurance coverage among the member’s dependents.1 Returning home will
not eliminate these issues entirely, and so the related stress
may continue postdeployment. Returning home may bring
added adjustments at work and home.
Deployment is no doubt stressful for all military personnel, but this experience may be especially difficult for
Guard and Reserve members as compared with active duty
personnel.3,5 Because they are primarily employed in the civilian sector,
deployment of Guard and Reserve members involves transitions to and from their usual jobs. Members and their families
may live farther from military posts, making it harder for
their families to maintain access to health care. The members
themselves may have relatively little postdeployment contact
with other members of their units and may not benefit from
the support of social and occupational peers.1,3
Post-traumatic stress disorder (PTSD) among Vietnam
veterans, so-called Gulf War Syndrome among those serving
in the first Gulf War, and other aspects of the health of
military personnel have interested researchers and concerned
policy makers for decades.6–9 Much of that research focuses
on combat exposure10–12 and negative outcomes, such as
binge drinking; studies focusing on deployment per se are
less common. A portion of the deployment literature considers deployment among Guard and Reservists, whereas other
studies include only active-duty personnel.13 One study examined the experiences of 522 Guardsmen in Mississippi
about to be deployed. The study compared those preparing
for their first deployment with those previously deployed.
The results revealed more PTSD, depressive, and somatic
symptoms among the latter.3 Two other articles considered
the effects of deployment among Reservists in the United
Kingdom.5,14 These analyses suggested that deployment is
associated with common mental disorders and fatigue. These
analyses directly compared the effect of deployment for
Reserve and regular personnel. The effect for the latter was
weak and involved only physical symptoms.
A handful of other studies considered the effect of
repeated deployments. Kline et al4 examined 2543 Reservists
preparing for deployment to Iraq in 2008. Previously deployed servicepersons were more than 3 times as likely as
servicepersons with no deployments to screen positive for
PTSD and major depression (The title of this article is
misleading. “Multiple deployments” refers to individuals
who have a prior deployment measured before the 2008
deployment. In that case, the outcomes measured reflect only
a single deployment for some of these individuals).
A recent report highlighted health problems stemming
from repeated deployments among members of the NGR.
Data from that study, “Department of Defense survey of
health related behaviors among the Guard and Reserve
Force,” (HRBGR) revealed that those Guard and Reservists
deployed multiple times in the past 2 years were more likely
to report symptoms of depression and PTSD. Deployment
also was associated with suicidal ideation and limitation of
activities due to poor mental health.15
A key issue in assessing this literature involves the
distinction between association and causal relationships. As
discussed later, describing an association between health
problems and deployment is not without utility (eg, screening); however, in the present study, our focus is on the effect
of deployment per se—that is, on whether the association
between deployment and health problems reflects the effect
of the former on the latter. Prior research suggests that
confounding factors may amplify observed associations beyond the effects of deployment. For example, research from
the first Gulf War indicates that individuals who were deployed were more likely to be in better health before deployment than those who were not deployed.16 The deployed
individuals were younger on average and also had lower
levels of education. They also had higher levels of risk-taking. These factors represent obvious confounders for anyone interested in the effect of deployment: these factors
likely influence not only deployment but also postdeployment
health. Further complicating matters is that the process of
military screening and training likely ensures that only individuals who are relatively healthy remain in the military long
enough to be deployed more than once (This phenomenon has
been labeled the “Healthy Warrior effect”4,17). The military
may incorporate factors in screening that are relevant but not
captured by the data available to researchers.
Existing research differs in whether and how to handle
potential selection bias. The analyses in some studies ignore
the issue entirely.3 The 2 UK studies recognized the potential
for confounding and adjusted comparisons of those who were
and were not deployed for a range of potential confounders
(age, gender, rank, educational and marital status, service
branch, fitness to deploy, and Reservist status). Other studies
relied on pre- or postdeployment comparisons with a comparison group of those who were not deployed.18
Using data from the HRBGR, this article examines the
relationship between deployment and health outcomes. We
considered 3 health behaviors and 3 mental health problems.
These outcomes were highlighted in the original report and
are generally representative of the outcomes considered in
prior research.
The original report did adjust between-group comparisons
for a small number of covariates (gender, age, enlisted/officer
indicator, marital status, education, and race/ethnicity).15 Our
analyses in this study extend those findings in 4 ways. First, we
adjusted for a more extensive list of covariates, such as whether
a child lives with the respondent, the branch of military service,
household income, and employment status. Second, we used
propensity score methodology to better adjust for confounding
factors. Propensity score methods have several advantages over
regression, including the ability to check for covariate balance.
Third, we examined the sensitivity of our findings to unobserved
confounding, an ever-present challenge to inferring causality
from observational data.
Finally, and perhaps most importantly, we pay particular attention to gender differences in the effect of deployment. Although most research on issues involving military
personnel focuses on men or both genders combined, a
smaller literature considers the experiences of female veterans.19,20 The effect of deployment on men and women may
differ for several reasons. Although the gap has narrowed in
recent years, women still have more responsibilities in the
household for child care, cooking meals, and housecleaning,
among others.21 The experiences of women who were deployed may differ as well (eg, sexual harassment22). Vogt et
al22 considered gender differences in the effect of combat and
other stressors and found larger effects on women in both the
NGR and active duty units.
Deployment is the focus of this article, but it is also
important to be specific about what the article is not about. In
particular, prior research largely focused on combat exposure,
but these data lack that information. The focus in the present
study is on deployment, and it remains true that the effects of
combat exposure may be larger than those reported in this
study. We have considered the implications of these findings
for combat exposure in the discussion section.
METHODS
Data
The HRBGR provides the first comprehensive, population-based picture of health behaviors in the total Guard and
Reserve force. Conducted under the direction of the Office of
the Assistant Secretary of Defense, this survey assesses
lifestyle factors affecting health and readiness; identifies and
tracks health-related trends and high-risk groups; targets
groups and/or lifestyle factors for intervention; and identifies
future directions for additional studies, Department of Defense programs, and policies.
The eligible population for the 2006 survey consisted of
all Reserve component personnel (including full time and/or
activated Guard and Reservists). Participants represent Reserve or Guard men and women in all pay grades throughout
the world. The sampling was multistage. Sixteen geographic
areas (clusters of zip codes where Department of Defense
records indicated NGR lived) were selected in the first stage
of sampling. Within these areas, the contractor attempted to
sample 1000 persons evenly divided across the 6 NGR
components (Army Reserve, Army National Guard, Navy
Reserve, Marine Corps Reserve, Air Force Reserve, and Air
National Guard). Generally, this task was accomplished by
randomly selecting 1 relevant facility for each component in
each area. Eighty percent of study participants were identified
in this way, and these individuals were interviewed in-person
at the facility involved. An additional 20% of the sample was
recruited from those living outside of the selected geographic
areas. These individuals completed paper and pencil interviews.
The final sample consisted of 18,342 military personnel (2796
Army National Guard, 3215 Navy Reserve, 1159 Marine Corps
Reserve, 6656 Air Force Reserve, 2851 Air National Guard, and
1665 Army Reserve). The overall response rate was 55.3%. The
data were weighted to represent all Reserve component personnel, using sampling weights that are a function of age, gender,
education, and race/ethnicity.15
Deployment Status
Exposure is the number of deployments in the past 24
months. The term “deployed” refers to deployment overseas
or outside the continental United States. Participants were
told that deployment does “not include drilling weekends and
Annual Training.” We grouped responses into “not deployed,” “deployed once,” and “deployed multiple times.”
One can note that we have not used location of deployment to define exposure. The data revealed that 37% of the
respondents served in Iraq or Afghanistan (This figure was
40% for men and 25% for women). The definition of exposure was driven by 2 considerations. First, place of deployment may introduce added selection factors not included in
our covariates. In particular, those who were deployed to Iraq
or Afghanistan may differ in unobserved ways from those
deployed elsewhere. Second, because combat exposure is so
common in Iraq and Afghanistan, limiting our attention to
these areas makes it more difficult to distinguish the effect of
deployment from that of combat exposure.
Outcome Measures
Binge drinking was defined as having 5 or more drinks
on a single occasion at least once in the past 30 days.
Marijuana use was measured as the use of marijuana/hashish
in the past 12 months. This construct was measured with a set
of questions about the use of various illicit drugs. The
wording of these questions is consistent with that used in the
prior Department of Defense surveys of active duty personnel
and has been used in other well-known surveys, such as the
National Survey on Drug Use and Health. Smokers were
defined as those who smoked at least 100 cigarettes during
their lifetime and who last smoked a cigarette during the past
30 days.
Suicidal ideation involved suicidal thoughts in the past
year. The measures of depressive and PTSD symptoms were
developed for use as a preliminary screening tool to identify
individuals in need of further assessment and perhaps treatment. In particular, depressive symptoms were measured
using an 8-item set of symptoms on the basis of 6 items from
the Center for Epidemiologic Studies–Depression Scale23 and
2 items from the Diagnostic Interview Schedule.24 Individuals were identified as needing further evaluation on the basis
of a cut-point determined in prior research.25 That research
suggests that this screener has good sensitivity and high positive
predictive value.
In the case of PTSD, the survey relies on the PTSD
Checklist-Civilian version.26 This measure consists of 17
questions involving experiences related to PTSD, such as loss
of interest in activities the respondent used to enjoy, being
“superalert” or watchful or on guard, having physical reactions when reminded of a stressful experience, and feeling
jumpy or easily startled. Respondents were asked to indicate
how much they had been bothered by each of the 17 conditions; response options were “not at all,” “a little bit,”
“moderately,” “quite a bit,” and “extremely.” Each statement
was scored 0 to 4 and a sum for all items was computed. The
participant was classified as screening positive for PTSD
symptoms if the sum of symptoms was greater than or equal
to 50.27
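The scoring rule just described can be sketched in a few lines. This is only an illustration of the rule as stated in the text (17 items, each scored 0 to 4 in the order the response options are listed, with a cut-point of 50); the function and constant names are hypothetical and are not taken from the survey documentation.

```python
# Response options as listed in the text, scored 0 to 4 in that order
# (assumption: the coding follows the order of the options as stated).
RESPONSE_SCORES = {
    "not at all": 0,
    "a little bit": 1,
    "moderately": 2,
    "quite a bit": 3,
    "extremely": 4,
}
N_ITEMS = 17    # number of checklist questions
CUT_POINT = 50  # screening threshold stated in the text


def ptsd_symptom_sum(responses):
    """Sum the item scores for one respondent's 17 checklist answers."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    return sum(RESPONSE_SCORES[r] for r in responses)


def screens_positive(responses, cut_point=CUT_POINT):
    """Return True when the symptom sum meets or exceeds the cut-point."""
    return ptsd_symptom_sum(responses) >= cut_point
```

Under this coding, a respondent answering "quite a bit" to every item has a sum of 51 and screens positive, whereas a respondent answering "moderately" throughout has a sum of 34 and does not.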
Covariates
Covariates were selected to eliminate confounding—we
identified variables that prior research revealed as predictive
of deployment. In most instances, the presumption was that
the variable also predicted the outcomes. For variables like
education, this assumption involved no leap of faith as a large
literature links education to health and health behaviors. For
some variables, no effect on the outcome may exist (eg,
rank). In that case, the analyses adjust for a covariate needlessly, but given the sample size, the resulting effect on
efficiency is likely very small. Data availability also limited
our choice, and some desirable covariates were missing (such
as health behaviors before deployment). These variables
represent sources of unobserved confounding, and we considered their potential effect on sensitivity analyses described
later. Overall, the covariates used resemble those in prior
research.
Covariates include the respondents’ gender; age quantified
using a set of 4 categories; race and ethnicity (7 categories:
White; Black or African American; Latino; American Indian;
Asian; Pacific Islander; other); whether a child was living in
the respondent’s household; marital status (6 categories: married; living as married; widowed and not living as married;
divorced or separated and not living as married; never married and not living as married); service (Army National
Guard; Army Reserve; Naval Reserve; Air National Guard;
Air Force Reserve; Marine Corps Reserve); enlistment status
(enlisted or officer); annual income from all sources in the
last 12 months (8 categories); infantry (yes/no); employment
status (3 categories: employed part-time and not in school; in
school; employed full time); and education (3 categories:
high school graduate or less; some postsecondary education;
college degree or more).
Statistical Model
Inverse Probability of Treatment Weights
Our analyses represent an application of propensity score
methodology. For a binary exposure (exposed, not exposed), the
propensity score is the predicted probability of exposure. It
represents a weighted sum of the covariates; the weights are
determined by the link between the covariates and the exposure
of interest. For example, one might use logistic regression with
exposure as the dependent variable and the covariates as explanatory variables. The resulting coefficient estimates can be used
to generate a predicted probability of exposure, the propensity
score. As sample size grows, the propensity score captures all
variation in the covariates related to exposure—that is, their
potential to confound the link between the exposure and outcomes of interest.28 The propensity score “balances” the distribution of the covariates across levels of exposure.29
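As an illustration of the binary case described above, the following sketch fits a logistic regression of exposure on the covariates and returns the predicted probabilities. It is a minimal pure-Python example on hypothetical toy data, fitted by simple gradient ascent; it is not the estimation code used in this study, and all function names and data values are assumptions made for the example.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Fit logistic regression by gradient ascent on the mean log-likelihood.

    X is a list of covariate rows (0/1 indicators here); y is a list of 0/1
    exposure flags. Returns [intercept, coef_1, ..., coef_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for row, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            resid = yi - p
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of exposure for each row of X."""
    return [
        sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
        for row in X
    ]


# Hypothetical toy data: one 0/1 covariate and a 0/1 exposure flag.
# 3 of the 4 members with covariate = 1 and 1 of the 4 with covariate = 0
# are exposed.
X = [[1], [1], [1], [1], [0], [0], [0], [0]]
y = [1, 1, 1, 0, 1, 0, 0, 0]
beta = fit_logistic(X, y)
scores = propensity_scores(X, beta)
# With a single binary covariate the fitted scores approach the observed
# exposure shares in each group (0.75 and 0.25).
```

With many covariates the same idea applies, although in practice one would rely on an established statistical package rather than a hand-written optimizer.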
The propensity score is a statistical convenience, no
more or less. Rather than adjusting comparisons of outcomes
across exposures using several covariates, one can work with
just one variable, the propensity score. One can use the
propensity score in a variety of ways, such as matching.29
One option involves calculating inverse probability of treatment weights (IPTW). The IPTW is one over the probability
that the exposure was actually experienced. For the exposed,
the IPTW is one over the propensity score; for the other
cases, one over one minus the propensity score. Analyses
using these weights essentially correspond to analyses of a
pseudo-population in which exposure and the covariates are
unrelated. In other words, in that pseudo-population, any
association between exposure and outcomes represents a
causal relationship.
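The weighting rule for a binary exposure can be written directly from the definition above. The sketch below is illustrative only; the function name and the toy stratum are hypothetical.

```python
def iptw(exposed, propensity):
    """Inverse probability of treatment weight for a binary exposure.

    Exposed cases receive 1 / propensity; unexposed cases receive
    1 / (1 - propensity).
    """
    if not 0.0 < propensity < 1.0:
        raise ValueError("propensity score must lie strictly between 0 and 1")
    return 1.0 / propensity if exposed else 1.0 / (1.0 - propensity)


# Hypothetical stratum whose members all have propensity score 0.75:
# 3 exposed and 1 unexposed.
stratum = [(1, 0.75), (1, 0.75), (1, 0.75), (0, 0.75)]
weighted_exposed = sum(iptw(e, p) for e, p in stratum if e == 1)
weighted_unexposed = sum(iptw(e, p) for e, p in stratum if e == 0)
# Both weighted totals equal 4, the size of the stratum, so within the
# pseudo-population exposure is unrelated to the covariates behind the score.
```

The equality of the two weighted totals is the sense in which exposure and the covariates are unrelated in the pseudo-population.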
The IPTW generalize to exposures involving 3 or more
levels, such as not deployed, deployed once, or deployed
multiple times.30,31 As discussed in Appendix 1, with 3 levels
of exposure, the IPTW is one over the probability that an individual was in
the exposure group in which he or she appears. In that case,
one might estimate a multinomial logit with the 3 levels of
exposure as the dependent variable and use the resulting parameter estimates to generate 3 predicted probabilities, one for each level. Each individual will get one weight: one over the probability of the
exposure actually experienced.
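For an exposure with 3 levels, the weight calculation can be sketched as follows. The example assumes that a multinomial logit has already produced one linear predictor per exposure level; the level names, the `softmax` helper, and the numeric values are hypothetical and serve only to illustrate the rule that each individual receives one over the predicted probability of the level actually experienced.

```python
import math


def softmax(linear_predictors):
    """Convert one linear predictor per exposure level into probabilities."""
    m = max(linear_predictors.values())  # subtract the max for stability
    exps = {k: math.exp(v - m) for k, v in linear_predictors.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def multilevel_iptw(observed_level, probabilities):
    """One over the predicted probability of the observed exposure level."""
    p = probabilities[observed_level]
    if p <= 0.0:
        raise ValueError("predicted probability must be positive")
    return 1.0 / p


# Hypothetical predicted probabilities for one respondent.
probs = {"none": 0.6, "once": 0.3, "multiple": 0.1}
weight_if_multiple = multilevel_iptw("multiple", probs)  # 1 / 0.1
weight_if_none = multilevel_iptw("none", probs)          # 1 / 0.6
```

Respondents observed in an exposure level that was unlikely given their covariates therefore receive the largest weights.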
This property of the IPTW reflects the nature of the
propensity score as a “balancing score.”29 However, this
property holds only as sample size grows large, and in any
given sample, comparisons across levels of the exposure may
reflect confounding by the covariates. Residual imbalance
may reflect mis-specification of the equations used to generate the propensity score and other factors. For that reason, a
key step in using the propensity score involves checking
“balance”—that is, making sure that the distribution of the
covariates is the same across levels of the exposure, when
adjusted with the propensity score.32 Covariate balance
should be checked regardless of the particular way in which
propensity-score adjustments are operationalized. It should
also be noted that we trimmed the IPTW at the fifth and 95th
percentiles to improve the precision of the estimates. Doing
so did not affect the balance of the covariates.
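One way to carry out the trimming step is sketched below. The text does not state whether weights outside the 5th and 95th percentiles were capped at those values or whether the cases were dropped; this sketch assumes capping, and the percentile rule (linear interpolation between order statistics) is likewise an assumption. Function names are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0 to 100), interpolating between order statistics."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def trim_weights(weights, lower_q=5.0, upper_q=95.0):
    """Cap each weight at the lower and upper percentile cut-offs.

    Assumption: 'trimming' means capping extreme weights, not dropping cases.
    """
    lo = percentile(weights, lower_q)
    hi = percentile(weights, upper_q)
    return [min(max(w, lo), hi) for w in weights]
```

Capping keeps every respondent in the analysis while limiting the influence of a few very large weights, which is the usual motivation for improved precision.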
Regardless of how one uses the propensity score, making the jump from association to causation depends on the
key assumption of ignorability—that the list of covariates
included is comprehensive and that no residual, unobservable
confounding remains. If such confounding exists, it might
hide a real relationship because those who were deployed
were in better health before deployment. Alternatively, a
statistically significant finding could reflect the fact that those
who were deployed were unhealthier. The 2 possibilities
could not apply to a single finding, but 1 of the 2 would be
relevant for each (depending on whether the estimate is
significant or nonsignificant). The assumption of no unobserved confounding is essential and cannot be tested.
One alternative involves sensitivity analyses involving
unobserved confounding of hypothetical strengths. This approach has been used to demonstrate the effect of smoking on
health, and we used it in this study.33 Building on Rosenbaum,29 this analysis assumes that a hypothetical factor
confounds comparisons across exposures even after adjusting
for the covariates. This confounding factor has 2 levels,
“healthy” and “unhealthy” and predicts exposure perfectly.
The sensitivity analyses further ask the question of how
strongly the unobserved factor would have to be related to the
outcome to explain the observed (but adjusted) finding. One
can express this relationship as the log-odds ratio of the
unobservable on the outcome (The literature refers to this
relationship as Γ). Γ captures the sensitivity of a given finding
to potential unobserved confounding. However, these analyses cannot demonstrate whether unobserved confounding
exists or not.
The Effect of Deployment on the Outcomes
The outcomes considered in this study are dichotomous, and we used IPTW-weighted logistic regression to link
deployment to the outcome. To assess the practical significance of the estimated effect of deployment, we calculated
the so-called “marginal effect”—the effect of deployment on
the predicted probability of the outcome.34
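The link between a logit coefficient and a change in predicted probability can be illustrated as follows. This sketch evaluates the change at a single baseline probability supplied by the user; whether the marginal effects reported in this article were evaluated at covariate means or averaged over the sample is not stated in this section, so the calculation below is an assumption-laden illustration, and the function names are hypothetical.

```python
import math


def expit(z):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(baseline_prob, beta):
    """Change in predicted probability when the log-odds shift by beta.

    baseline_prob is the predicted probability in the reference category;
    beta is the logit coefficient for the contrast of interest.
    """
    if not 0.0 < baseline_prob < 1.0:
        raise ValueError("baseline probability must lie strictly between 0 and 1")
    baseline_logit = math.log(baseline_prob / (1.0 - baseline_prob))
    return expit(baseline_logit + beta) - baseline_prob
```

For example, from a baseline probability of 0.50, a coefficient of log(3) (an odds ratio of 3) raises the predicted probability to 0.75, a marginal effect of 25 percentage points. The same coefficient implies a smaller change when the baseline probability is near 0 or 1, which is why the marginal effect is reported alongside the coefficient.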
These analyses were approved by the Institutional Review Board of the University of North Carolina.
Estimation
The first stage of the analysis involved generating the
IPTW. The propensity score was calculated using coefficient
estimates from a multinomial logit with deployment status as
the dependent variable. The resulting parameter estimates
were used to generate the predicted probabilities of exposure
level and then the IPTW. These estimates were generated
using gender-specific models. A range of factors predicted
deployment, and these relationships were consistent with
prior research (For example, older NGR members were less
likely to be deployed).
We further checked “balance,” ie, whether the distribution
of the covariates was equivalent across levels of exposure. As all
covariates were categorical, this checking process involved survey-weighted cross-tabs of the covariate with deployment status.
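A balance check of this kind can be sketched as a weighted cross-tabulation: within each exposure level, the weighted share of each covariate category is computed, and the shares are then compared across levels. The sketch below is illustrative; the function name and toy data are hypothetical, and it uses only the IPTW as weights, whereas the analyses reported here also reflected the survey design.

```python
from collections import defaultdict


def weighted_crosstab(exposure, covariate, weights):
    """Weighted share of each covariate category within each exposure level."""
    totals = defaultdict(float)
    cells = defaultdict(float)
    for e, c, w in zip(exposure, covariate, weights):
        totals[e] += w
        cells[(e, c)] += w
    return {(e, c): v / totals[e] for (e, c), v in cells.items()}


# Hypothetical data: 3 of 4 "young" members and 1 of 4 "older" members
# are exposed, so the propensity scores are 0.75 and 0.25.
exposure = [1, 1, 1, 0, 1, 0, 0, 0]
covariate = ["young"] * 4 + ["older"] * 4
weights = [1 / 0.75, 1 / 0.75, 1 / 0.75, 1 / 0.25,
           1 / 0.25, 1 / 0.75, 1 / 0.75, 1 / 0.75]

unweighted = weighted_crosstab(exposure, covariate, [1.0] * 8)
balanced = weighted_crosstab(exposure, covariate, weights)
# Unweighted, 75% of the exposed are "young"; after weighting, the share
# is 50% in both exposure groups.
```

If the weighted shares still differ across exposure levels, the model generating the propensity score may need to be revised, or the covariate can be added to the outcome model.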
Finally, estimates of the effect of deployment were generated using Stata’s survey-weighted logistic regression; the
IPTW were used as sampling weights with the outcomes as the
dependent variables.
Estimation reflected the sampling design of the study.
We included the probability weight for the data as a covariate
in our model estimating deployment (We also examined other
possibilities such as building a combined weight reflecting
the propensity score and the survey weight. Our results were
insensitive to the handling of the sampling weight. This
insensitivity likely reflects that the variables used to generate
the weights were included as covariates).
RESULTS
Table 1 describes the sample. These tabulations are
unweighted to provide the reader with a sense of the actual
cell sizes in the analyses (Weighted tables are available from
the author). Overall, 76% of female servicepersons were not
deployed. Of those who were deployed, about 3 in 10 were
deployed multiple times (7% of females in the study). Forty
percent of male servicepersons were deployed. Of these,
roughly 1 in 3 was deployed multiple times (14% of all men).
Table 1 also provides descriptive information on the outcomes and on the other characteristics of the sample, including the covariates.
In the first stage of the analysis, the covariates as a
group were highly significant predictors of deployment for
both men and women (These estimates are available from the
author). These results suggest that at least the potential for
confounding was present in the earlier reports.
Checking covariate balance revealed one comparison
(enlistment status) for which a small but significant adjusted
difference remained (P < 0.10). The difference was very
small in practical terms. The percentage of participants who
were officers in the weighted data was 17%, 19%, and 17%
for none, once, and multiple deployments, respectively.
These figures were 15%, 12%, and 18% in the unweighted
data. One alternative would be to modify the equation predicting exposure (eg, adding interactions). Instead, we included this covariate in the model predicting the outcome.
The estimates of the effect of deployment were not affected.
Table 2 provides the beta coefficients for deployment
on the 6 outcomes. For each outcome, Table 2 provides 3
numbers—the logit coefficient, the standard error, and the
marginal effect. The marginal effect captures the size of the
relationship between the outcome and the exposure.34 For
each outcome, there are 2 sets of figures, unadjusted and
adjusted. The former is descriptive: it reflects the effect of
deployment as well as confounders, both observed and unobserved. One can note that deployment was associated with
several of the outcomes. Among males, for example, those
who had not been deployed were less likely than those who
had been deployed once to be involved in 5 of the 6 behaviors. The unadjusted relationships were similar for women.
The adjusted between-group differences are provided in
the second column. One can observe 2 patterns. First, in most
instances, adjusted differences were smaller in absolute
terms—ignoring the covariates inflated the between-group
differences. A second pattern involves the standard errors; as
one would expect, adjusting for the covariates increased the
standard error. Taken together, these 2 trends reduced several
of the unadjusted between-group differences to statistical
insignificance.
In terms of significant findings, 3 of the 4 estimates for
binge drinking were significant. Among both men and
women, those who had never been deployed were less likely
to binge drink than those who had been deployed once. The
effects were fairly substantial: being deployed once (but not
multiple times) raised the probability of binge drinking by 4
and 6 percentage points for men and women, respectively
(The estimates are negative because the reference category is
“deployed once”). Compared with being deployed
once, multiple deployments raised the probability of binge
drinking still further, by 6 and 7 percentage points, respectively. Only the effect for men was statistically significant at
conventional levels.
The effects for smoking and depressive symptoms tell
a different story. When the covariates were considered, deployment was associated with neither outcome. For marijuana use, only
1 of the 4 adjusted effects was significant; those women who
were deployed multiple times were much more likely to have
used marijuana (by 9 percentage points).
For suicidal ideation, the effect of multiple deployments was significant for both men and women. The effect
for women (10 percentage points) was much larger than that
for men (2 percentage points). For women, the difference
between those who had been deployed once and those not
deployed was also statistically significant. Combining the 2
effects, women who had been deployed multiple times were
15 percentage points more likely to experience suicidal ideation than those who were not deployed.
Finally for PTSD symptoms, 3 of the 4 adjusted effects
were statistically significant. For both men and women, the
effect of multiple deployments was statistically significant—2
and 6 percentage points, respectively.
Strength of Evidence
These results may mislead for 2 reasons—multiple
testing and unobserved confounding. We considered both and
reported the results in Table 3. The table lists each of the 24
effects (2 contrasts for deployment status × 2 for gender
[male, female] × 6 outcomes). The first column of numbers
reports the P values corresponding to the estimates in Table
2, and these were used to rank the alternative effects. By a
strict α = 0.05 criterion, one can note that 10 of the 24 effects
were significant.
First, we considered the possibility that some findings
were significant by chance. We handled this issue in 2 ways.
First, we calculated the joint significance of the exposure
across the 6 outcomes. For each gender, this involved a joint
TABLE 1. Means by Gender and Deployment Status

                                             Female Deployment           Male Deployment
                                             None     Once Multiple     None     Once Multiple
                                              76%      16%       7%      60%      27%      14%
Outcomes
  Binge drinking                              24%      34%      37%      35%      43%      43%
  Smoking                                     19%      26%      23%      19%      23%      21%
  Marijuana use                                3%       5%      11%       3%       4%       2%
  Depressive symptoms                         20%      23%      24%      13%      15%      14%
  Suicidal thoughts                           13%      18%      25%       9%       9%       9%
  PTSD symptoms                                5%       8%      14%       3%       6%       6%
Age
  21–25                                       20%      27%      23%      17%      21%      11%
  26–34                                       26%      25%      32%      22%      24%      22%
  35+                                         48%      44%      42%      54%      52%      67%
  17–20                                        7%       4%       3%       6%       2%       1%
Race/ethnicity
  Black or African American                   22%      19%      20%      11%       9%       7%
  American Indian or Alaska native            15%      14%      17%      13%      13%      12%
  Asian                                        1%       0%       2%       1%       1%       1%
  Native Hawaiian or Pacific Islander          6%       9%       5%       7%      14%       6%
  Other                                        2%       3%       1%       1%       4%       2%
  White                                       54%      54%      55%      66%      59%      72%
Child in household (1 = yes; 0 = no)          78%      72%      73%      74%      73%      73%
Marital status
  Living as married                            9%      10%       8%       6%       6%       5%
  Widowed                                      1%       2%       1%       0%       0%       0%
  Divorced or separated                       19%      18%      23%       9%       9%      11%
  Never married                               29%      36%      31%      26%      25%      18%
  Married                                     42%      34%      37%      58%      59%      66%
Branch
  Army reserve                                13%      18%       6%       9%      10%       2%
  Naval reserve                               24%      10%       8%      24%      11%       5%
  Air national guard                          14%      15%      25%      14%      13%      26%
  Air force reserve                           40%      31%      48%      37%      26%      56%
  Marine corps reserve                         1%       1%       1%       7%       6%       2%
  Army national guard                          8%      25%      12%       9%      33%       9%
Officer (1 = yes; 0 = enlisted)               15%      14%       8%      15%      12%      19%
Income
  $15,000–$19,999                              7%       7%       6%       6%       5%       3%
  $20,000–$24,999                              7%       7%       7%       6%       7%       4%
  $25,000–$34,999                             12%      14%      14%      10%      13%       6%
  $35,000–$44,999                             12%      17%      13%      10%      13%       9%
  $45,000–$49,999                              7%       8%       9%       7%       9%       8%
  $50,000–$74,999                             19%      20%      19%      22%      24%      28%
  $75,000 or more                             25%      17%      21%      32%      25%      38%
  <$15,000                                    12%      10%      10%       8%       5%       3%
Infantry (1 = yes; 0 = no)                     2%       3%       5%       9%      17%      10%
Employment status
  Part-time                                   13%      13%      10%       8%       8%       5%
  In school                                   21%      26%      25%      17%      22%      30%
  Full-time                                   66%      61%      65%      75%      70%      66%
Education
  Some college                                54%      50%      58%      50%      53%      50%
  College+                                    36%      34%      30%      33%      26%      36%
  High school graduate or less                10%      17%      12%      17%      21%      13%
N                                            2632      564      248     7447     3326     1697

These calculations are unweighted.
PTSD indicates post-traumatic stress disorder.
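TABLE 2. Effect of Deployment, By Gender

| Outcome | Model | Gender | None*: Beta | None: SE | None: Mfx¶ | Multiple*: Beta | Multiple: SE | Multiple: Mfx¶ | N |
|---|---|---|---|---|---|---|---|---|---|
| Binge Drinking | Unadjusted | Female | -0.52† | 0.09 | -0.11 | 0.14 | 0.15 | 0.03 | 3608 |
| Binge Drinking | Unadjusted | Male | -0.43† | 0.04 | -0.10 | -0.11 | 0.06 | -0.03 | 13,429 |
| Binge Drinking | Adjusted | Female | -0.28§ | 0.14 | -0.06 | 0.30 | 0.18 | 0.07 | 3520 |
| Binge Drinking | Adjusted | Male | -0.15§ | 0.06 | -0.04 | 0.26† | 0.07 | 0.06 | 13,191 |
| Smoking | Unadjusted | Female | -0.40† | 0.09 | -0.07 | -0.26 | 0.17 | -0.04 | 3755 |
| Smoking | Unadjusted | Male | -0.36† | 0.05 | -0.06 | -0.22‡ | 0.07 | -0.03 | 13,976 |
| Smoking | Adjusted | Female | -0.11 | 0.14 | -0.02 | -0.10 | 0.20 | -0.02 | 3663 |
| Smoking | Adjusted | Male | -0.04 | 0.08 | -0.01 | 0.17 | 0.09 | 0.02 | 13,769 |
| Marijuana Use | Unadjusted | Female | -0.79† | 0.20 | -0.03 | 0.87† | 0.25 | 0.04 | 3756 |
| Marijuana Use | Unadjusted | Male | -0.59† | 0.10 | -0.02 | -0.59† | 0.17 | -0.01 | 13,926 |
| Marijuana Use | Adjusted | Female | -0.02 | 0.29 | 0.00 | 1.70† | 0.31 | 0.09 | 3662 |
| Marijuana Use | Adjusted | Male | -0.20 | 0.19 | 0.00 | 0.38 | 0.22 | 0.01 | 13,759 |
| Depressive Symptoms | Unadjusted | Female | -0.19 | 0.10 | -0.03 | 0.08 | 0.17 | 0.01 | 3654 |
| Depressive Symptoms | Unadjusted | Male | -0.21† | 0.06 | -0.02 | -0.11 | 0.08 | -0.01 | 13,332 |
| Depressive Symptoms | Adjusted | Female | 0.06 | 0.15 | 0.01 | 0.33 | 0.21 | 0.06 | 3619 |
| Depressive Symptoms | Adjusted | Male | -0.05 | 0.09 | -0.01 | 0.20 | 0.11 | 0.02 | 13,565 |
| Suicidal Ideation | Unadjusted | Female | -0.32‡ | 0.12 | -0.04 | 0.59† | 0.17 | 0.09 | 3619 |
| Suicidal Ideation | Unadjusted | Male | -0.10 | 0.07 | -0.01 | 0.00 | 0.10 | 0.00 | 13,348 |
| Suicidal Ideation | Adjusted | Female | -0.39§ | 0.17 | -0.05 | 0.61‡ | 0.21 | 0.10 | 3598 |
| Suicidal Ideation | Adjusted | Male | -0.04 | 0.11 | 0.00 | 0.28§ | 0.13 | 0.02 | 13,584 |
| PTSD Symptoms | Unadjusted | Female | -0.62† | 0.16 | -0.04 | 0.42 | 0.23 | 0.03 | 3651 |
| PTSD Symptoms | Unadjusted | Male | -0.58† | 0.09 | -0.03 | -0.03 | 0.12 | 0.00 | 13,315 |
| PTSD Symptoms | Adjusted | Female | -0.29 | 0.24 | -0.02 | 0.82‡ | 0.28 | 0.06 | 3643 |
| PTSD Symptoms | Adjusted | Male | -0.44‡ | 0.15 | -0.02 | 0.46‡ | 0.16 | 0.02 | 13,652 |

*No. deployments. The reference category for the effect of deployment is “deployed once.”
†P < 0.01.
‡0.01 < P < 0.05.
§0.05 < P < 0.10.
¶Mfx is the marginal effect of deployment on the predicted probability of the outcome relative to the reference category.
SE indicates standard error.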
TABLE 3. Strength of Evidence: Effect Estimates, by Gender, Outcome, and Comparison

| Row | Gender | Outcome | Effect | P* | B-H Cutoff† | Exp (Gamma)‡ |
|---|---|---|---|---|---|---|
| 1 | Female | Marijuana use | Multiple | 0.000 | 0.004 | 2.99 |
| 2 | Female | PTSD | Multiple | 0.010 | 0.025 | 1.31 |
| 3 | Female | Suicidal ideation | Multiple | 0.004 | 0.017 | 1.26 |
| 4 | Male | PTSD | Multiple | 0.003 | 0.013 | 1.19 |
| 5 | Male | PTSD | None | 0.005 | 0.021 | 1.15 |
| 6 | Male | Binge drinking | Multiple | 0.000 | 0.008 | 1.14 |
| 7 | Female | Suicidal ideation | None | 0.024 | 0.033 | 1.05 |
| 8 | Male | Binge drinking | None | 0.014 | 0.029 | 1.02 |
| 9 | Male | Suicidal ideation | Multiple | 0.046 | 0.038 | 1.01 |
| 10 | Female | Binge drinking | None | 0.049 | 0.042 | 1.01 |
| 11 | Male | Smoking | Multiple | 0.065 | 0.046 | 0.99 |
| 12 | Male | Depression | Multiple | 0.080 | 0.050 | 0.98 |
| 13 | Female | Binge drinking | Multiple | 0.092 | 0.054 | 0.95 |
| 14 | Female | Depression | Multiple | 0.113 | 0.058 | 0.93 |
| 15 | Male | Marijuana use | Multiple | 0.134 | 0.063 | 0.93 |
| 16 | Female | PTSD | None | 0.235 | 0.067 | 0.83 |
| 17 | Male | Marijuana use | None | 0.317 | 0.071 | 0.88 |
| 18 | Female | Smoking | None | 0.356 | 0.075 | 0.85 |
| 19 | Female | Smoking | Multiple | 0.493 | 0.079 | 0.58 |
| 20 | Male | Smoking | None | 0.525 | 0.083 | 0.87 |
| 21 | Male | Depression | None | 0.617 | 0.088 | 0.88 |
| 22 | Male | Suicidal ideation | None | 0.617 | 0.092 | 0.84 |
| 23 | Female | Depression | None | 0.647 | 0.096 | 0.68 |
| 24 | Female | Marijuana use | None | 0.920 | 0.100 | 0.54 |

Rows 1–3: sensitivity parameters greater than 1.25 (3 effects relatively insensitive to unobserved confounding).
Rows 1–8: significant by B-H (8 effects significant adjusting for multiple testing).
Rows 1–10: P < 0.05 (10 effects statistically significant).
Rows run from stronger evidence of harm (top), through weaker evidence, to stronger evidence of no effect (bottom).
*P value for beta coefficient reported in Table 2.
†Cut-off value for Benjamini-Hochberg adjustment for multiple testing.
‡Sensitivity to unobserved confounding as defined in text.
B-H indicates Benjamini-Hochberg; PTSD, post-traumatic stress disorder.
F test with 12 degrees of freedom. For both men and women,
the effect was statistically significant at P < 0.0001.
Next, we used the Benjamini-Hochberg procedure for
controlling the false discovery rate (FDR).35,36 The FDR is
the expected proportion of incorrectly rejected null hypotheses. Following the literature, we set the FDR to 10% and
calculated a cutoff for α for each effect that keeps the
FDR at or below this level. This cutoff is reported in the
second column; any effect with a P value below the appropriate cutoff is then labeled “significant.” In this case, 2
effects (of the 10 with P < 0.05) do not meet this criterion
and can no longer be considered “significant” (the Appendix
provides more information on this procedure).
The third column of the table reports the sensitivity parameter Γ (on the odds-ratio scale), which indicates how strong unobserved confounding would have to be to overturn each conclusion.
For effects with P < 0.05, Γ is greater than 1. The issue in
this study is whether those who were deployed were, all else
equal, more likely to experience these outcomes
even if they had not been deployed. For P > 0.05, Γ is
less than 1, and the question is whether the unobserved factor is
hiding significant effects.
The table reports Γ corresponding to the effects reported above. This parameter is used to sort the rows of the
table that met the Benjamini-Hochberg criterion. The top 3 rows are the most robust. All involve
women and multiple deployments and have Γ parameters
greater than 1.25. For the effect of multiple deployments on marijuana use among
women, Γ is roughly 3.00. For the apparent
effect of multiple deployments to be spurious, the confounding effect of any unobserved factor, even one that predicts
multiple deployments perfectly, must triple the odds of using marijuana.
Another way to gauge the size of Γ is to consider
predicted probabilities. Suppose that the probability of the
negative outcome among individuals in the healthy group was
20% (and that none of these individuals were deployed
multiple times). A Γ of 1.25 would mean that 24% of those
in the unhealthy group experienced that outcome (and all
were deployed multiple times). A difference of this size
would create the appearance that deployment itself caused the
negative outcome when in fact predeployment differences
were responsible. A Γ of 3 would imply that 42% of unhealthy individuals experienced the outcome. Overall, for the
3 top rows of the table, any unobserved confounding would
have to involve a characteristic not measured but strongly
related to both deployment and the outcome over and above
the characteristics included as observed confounders.
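The conversion from an odds ratio to a predicted probability used in this paragraph can be sketched as follows. This is an illustrative calculation, not code from the study; the function name is hypothetical, and the sketch assumes that Γ multiplies the baseline odds of the outcome.

```python
def shifted_probability(baseline_prob, odds_ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Baseline probability of 20% in the "healthy" group, as in the text.
for gamma in (1.25, 3.0):
    p = shifted_probability(0.20, gamma)
    print(f"odds ratio {gamma}: probability in the unhealthy group = {p:.3f}")
```

Under these assumptions, an odds ratio of 1.25 gives a probability of about 0.24, and an odds ratio of 3 gives about 0.43, close to the figures quoted above.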
The sensitivity analyses also identified nonsignificant
findings that are potentially the result of unobserved confounding. For example, the difference among women in
depressive symptoms between those who were deployed once
and multiple times was not significant (P = 0.11). The Γ
parameter is 0.93. This parameter implies that the probability
of that outcome differs between the healthy and unhealthy by
only 1.5 percentage points. (To be specific, the “healthy” are
deployed multiple times, and their likelihood of elevated
depressive symptoms is 1.5 percentage points lower.) In this
case, a harmful effect of deployment could be
masked by this predeployment difference. Other nonfindings
are more robust. The effect of 1 deployment on marijuana use
for women has a Γ of 0.54. If a true effect is hidden by
unobserved confounding, the effect of the unobservable must
be enormous. The healthy must be deployed multiple times
and be 34 percentage points less likely to use marijuana (20%
vs. 54%).
Overall, the findings (and nonfindings) at the top and
bottom of the table are likely not due to unobserved confounding. The top of the table is dominated by the effect of
multiple deployments on women; the 5 rows at the bottom of
the table involve comparisons between those who were deployed once and those never deployed.
DISCUSSION
The effects presented in the present study suggest that
some, but not all, of the associations in the earlier report
reflect confounding by observed and perhaps unobserved
factors. This article attempts to identify causal relationships,
a difficult task outside of randomization. The fundamental
lesson of the literature on causal inference is that one cannot
move from association to causation without making an assumption, generally untestable. In the present case that assumption is ignorability. We strengthen our inference by
examining the sensitivity of our findings to violation of that
assumption. The effect of multiple deployments on women
can be explained by unobserved confounding only if that
confounding is very powerful. In general, we do not find any
effects for men that cannot be explained as chance findings or
as reflecting unobserved confounding.
As noted, some of the nonsignificant findings may reflect
unobserved confounding as well. In that case, the mechanism at
work would involve healthier individuals being deployed multiple times. It is not clear how some unobserved confounder
would work to inflate some effects and deflate others.
Some readers may find the sensitivity analyses unpersuasive. An alternative would be to model selection into
deployment and allow for unobserved determinants of deployment to correlate with unobserved determinants of the
outcome. As a general rule, such “selection” or Heckman
models work best when one has an instrumental variable,37
and none were apparent. Sensitivity analysis is an alternative,
but a large Γ is in the eye of the beholder. Sensitivity analyses
give structure to, but do not replace, one’s subjective assessment of the role of unobserved confounding. Such analyses have proven persuasive in other situations, such as in
studying the relationship between smoking and cancer.33
More generally, some readers may find the effort to
identify causal relationships in observational data impossible
or even misguided. And as one reviewer noted, many prior
studies caution that the relationships identified are “only”
associations (This claim is not true for all prior research38).
Are these associations of any use? One possible use involves
using deployment as a screening mechanism for these problems. In that case, an analysis would need to focus on a
different set of concerns such as specificity and sensitivity.
These issues are generally not the concern of prior research.
Furthermore, it is not clear why one would adjust the relationship between deployment and these outcomes for other
covariates. If one’s focus is screening, then women who had
been deployed once or more should be screened for binge
drinking, whether that association is causal or not. They need
assistance whether they brought that problem with them to
their deployment or the latter caused it.
Limitations
Several limitations temper our findings. Principal among
these is the nature of the outcomes, which are all self-reported.
However, reporting errors for the dependent variable do not
necessarily bias the effect of deployment. Such errors, however,
do inflate the standard errors, and as a result, some of the effects
that are larger in practical terms might be statistically significant
with more accurate outcome reports. More sinister forms of bias
are possible if the degree of reporting accuracy varied with
deployment status. However, it is difficult to imagine why such
bias would have occurred. It is also worth noting that all of our
outcomes are undesirable—future analyses might consider
whether and how deployment compromises positive outcomes
(like productivity) or how protective factors moderate the effect
of deployment.
A second principal limitation involves the lack of a
measure of mediators, such as exposure to violence. It is
possible that among those deployed, those with greater exposure to combat would manifest more and larger negative
effects. However, incorporating such a mediating process is
difficult,39 and many such regression-based analyses are incorrect. Furthermore, one can think of the estimates provided
as “intention to treat.” The relatively modest effects of deployment reflect either limited exposure to violence or a
modest link between that exposure and the outcomes considered in the present study.
A third limitation involves the relatively low response
rate. Systematic patterns of nonresponse might have biased
the coefficient estimates. In a sense, however, these analyses
have considered that possibility—such patterns are relevant
only to the extent they involve patterns of underlying health
status and the outcomes of interest. For that reason, assessments of systematic nonresponse are subsumed under the
analysis of unobserved confounding. The reader can consider
systematic nonresponse as a mechanism that might have
created the unobserved confounding assessed above. One
could argue that a final limitation involves the nature of the
exposure itself. As mentioned earlier in the article, we might
have focused on deployment to Iraq and/or Afghanistan,
which has been the focus of media reports. It is worth noting
that supplemental analyses did analyze exposure in this manner, and the results were very similar to those presented in the
present study.
CONCLUSIONS
In conclusion, the nation owes its brave men and women
nothing but the utmost respect for their service. This debt is
especially great for citizen soldiers. As noted, their involvement
in the wars in Afghanistan and Iraq has exceeded that of their
counterparts in prior wars. It seems hard to believe that anyone
enlisting in the Guard or Reserve could ever have anticipated
being deployed overseas multiple times. Where deployment has
harmed these men and women, appropriate services must be
provided and those needing them identified. The harm suffered
is as real as physical injury, and treatment should be provided
without delay or stigma.
The process of statistical adjustment suggests that some
of the apparent associations are not truly causal, and the adjusted
estimates provide greater focus on the group that seems most at
risk—women who are deployed multiple times. The analyses of
unobserved confounding bring additional focus to our findings.
All things considered, the findings in the present study
are mixed, and some balance is needed. Media reports that
potentially exaggerate the problems faced run the risk of
stigmatizing those who serve. At the same time, our findings do
identify areas of real concern, especially among women
deployed multiple times.
APPENDIX 1: DETAILS OF ESTIMATION
Adjustment Using Propensity Score Weights
As noted in the text, the propensity score(s) is(are) a
statistical convenience, summarizing the covariates’ potential
to confound between-group comparisons. The predicted
probability can be generated using any technique appropriate
for the prediction of a categorical outcome.29,40 Like ordinary
regression, the propensity score assumes ignorability28—that
is, the exposure is as if randomly assigned conditioning on
the propensity score(s). This assumption is also referred to as
“no unobserved confounding.”41 If this assumption is true,
then any association between exposure and the outcome(s)
can be interpreted as causal.
Having calculated a propensity score, one can use it in
any of several ways, such as a covariate or for matching.
Our approach in the present study is to use the propensity
score to calculate a sampling weight, the inverse probability of treatment weight (IPTW). Such weights
represent an implicit form of matching. As noted in the text,
they are calculated as one over the probability of the exposure
actually received. For a dichotomous exposure, the weight for
the exposed is one over the propensity score; for the unexposed, it is one
over one minus the propensity score. The result is to inflate
the importance of individuals underrepresented in each
group. Individuals who are underrepresented among the exposed have a smaller propensity score, so their weight in the
analysis is inflated. When weighted, the data represent a
sample of observations from a population where exposure
and the confounders are unrelated. Because such weights
have the same form as sampling weights, they can be incorporated in any analysis that can accommodate such weights.
An advantage of propensity score weights is that
they can be generalized to multivalued treatments, like deployment.30,31 In that case, there are 3 predicted probabilities: the probability of no deployment, of being deployed
once, and of being deployed multiple times. The propensity
score weight is still calculated in the same manner, as one
over the predicted probability of the exposure experienced.
There is only one weight per observation.
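As a minimal sketch of this weighting rule for a three-level exposure, the following Python fragment computes one weight per person from predicted probabilities. The probabilities, labels, and function name are hypothetical illustrations, not values or code from the study; in practice the probabilities would come from a fitted model for a categorical outcome.

```python
def iptw_weights(received, predicted_probs):
    """Return one weight per person: 1 / P(exposure level actually received)."""
    weights = []
    for level, probs in zip(received, predicted_probs):
        p = probs[level]
        if not 0.0 < p <= 1.0:
            raise ValueError("predicted probabilities must lie in (0, 1]")
        weights.append(1.0 / p)
    return weights


# Hypothetical predicted probabilities for three reservists.
received = ["none", "once", "multiple"]
predicted_probs = [
    {"none": 0.80, "once": 0.15, "multiple": 0.05},
    {"none": 0.50, "once": 0.25, "multiple": 0.25},
    {"none": 0.40, "once": 0.40, "multiple": 0.20},
]

weights = iptw_weights(received, predicted_probs)
print(weights)  # one weight per person
```

The third person, who had only a 20% predicted chance of the exposure actually received, is weighted up the most, which is the sense in which underrepresented individuals gain importance.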
In theory, either regression or propensity score methods
should be effective in conditioning on potential confounders.42 However, in practice (and in finite samples), in either
case, such conditioning might fail to “balance” the covariates—that is, to equalize the distribution of the covariates
across the levels of exposure (deployed or not). One advantage of propensity scores is that they facilitate checking
whether such balance has occurred; one conducts the analysis
of each covariate in the same way as one assesses the effect of
the exposure on the outcome. If balancing occurred, one
cannot reject the null hypothesis of no treatment “effect” on
the covariates. In the case of regression, a lack of balance is
known as the “support” problem, and the standard regression
package offers no hint that one does or does not have that
problem. At the very least, one could argue that propensity
scores are a nice companion to ordinary regression because
they provide a check for this problem.
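A simple version of this balance check compares covariate means across exposure groups before and after weighting. The sketch below uses a small hypothetical data set (4 officers and 4 enlisted personnel, with known deployment propensities) and hypothetical function names; it compares means only and does not perform the formal test described above.

```python
def weighted_mean(values, weights):
    """Weighted average of a list of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def balance_table(covariate, exposure, weights):
    """Unweighted and weighted covariate means within each exposure level."""
    table = {}
    for level in sorted(set(exposure)):
        rows = [i for i, e in enumerate(exposure) if e == level]
        values = [covariate[i] for i in rows]
        table[level] = {
            "unweighted": sum(values) / len(values),
            "weighted": weighted_mean(values, [weights[i] for i in rows]),
        }
    return table


# Hypothetical data: 4 officers (1 deployed) and 4 enlisted (3 deployed).
officer = [1, 1, 1, 1, 0, 0, 0, 0]
deployed = [1, 0, 0, 0, 1, 1, 1, 0]
# Assumed propensity of deployment: 0.25 for officers, 0.75 for enlisted.
propensity = [0.25 if o == 1 else 0.75 for o in officer]
weights = [1 / p if d == 1 else 1 / (1 - p) for p, d in zip(propensity, deployed)]

result = balance_table(officer, deployed, weights)
for level, means in result.items():
    print(level, means)
```

In this toy example the share of officers differs sharply between the deployed and nondeployed groups before weighting and is equal after weighting, which is what balance means here.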
A second advantage of propensity score methodology
also involves diagnostic checking. Among those who are
deployed or not deployed, there could be a subset of individuals who are unique to either group—their propensity score
would be very close to 1 or zero. Regression handles these
individuals by implicitly extrapolating predicted outcomes to
these individuals. Because these individuals have unusual
combinations of the covariates, they may be particularly
influential in determining the regression line. Propensity
scores offer a better way of handling these cases. One can
identify them on the basis of high or low values and potentially remove them or at least examine their effect on the
analysis.
The use of propensity scores to form weights might
precipitate a related problem. Weighted analyses are less
efficient than unweighted analyses, and the loss of precision
is a function of the variation of the weights. One can get quite
noisy estimates if the weights are highly variable. When
survey samplers have this problem, they “trim” the weights.
My discussions with them indicate that this involves a rather
informal process (“we make the big ones smaller and the
smaller ones bigger”). In the case of propensity-score
weights, balance provides a reference point for trimming—
one can trim the weights until covariate balance is lost.
Balance provides a bottom line for trimming the weights. In
the present case, we censored the weights at the fifth and 95th
percentiles. The resulting estimates of the effect of exposure
were more precise, but covariate balance was not eliminated.
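Censoring weights at the 5th and 95th percentiles can be sketched as below. The percentile method (`statistics.quantiles` with the inclusive method) and the function names are assumptions made for illustration; the article does not say how the percentiles were computed. The effective sample size shown is Kish's common approximation, included only to illustrate how variable weights reduce precision; it is not a quantity reported in the article.

```python
import statistics


def censor_weights(weights, lower_pct=5, upper_pct=95):
    """Clip weights to the [lower_pct, upper_pct] percentile range."""
    # quantiles(n=100) returns 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(weights, n=100, method="inclusive")
    low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
    return [min(max(w, low), high) for w in weights]


def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical weights with one very small and one very large value.
raw = [0.1] + [1.0] * 18 + [50.0]
trimmed = censor_weights(raw)

print(effective_sample_size(raw))      # dominated by the single large weight
print(effective_sample_size(trimmed))  # larger after censoring
```

After censoring, covariate balance would still need to be rechecked, since balance is the reference point for how far the weights can be trimmed.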
Sensitivity Analyses: The Role of Unobserved
Confounding
A key limitation of propensity score (and regression)
methodology is the ignorability assumption.
As noted, this assumption really cannot be tested formally; one only can replace it with another assumption (eg,
an exclusion restriction and instrumental variables estimation) and further examine whether the effect of deployment
changes as a result.
Another possibility involves sensitivity analyses involving unobserved confounding of hypothetical strengths.
Such an approach was used to demonstrate the effect of
smoking on health.33 We used this strategy in the present
study. Building on Rosenbaum,29 this analysis assumes that
there is a hypothetical confounding factor (H), above and
beyond those captured by the covariates. This factor causes
omitted variable bias. It can assume one of 2 values,
“healthy” or “unhealthy.” (Rosenbaum29 showed that assuming this variable as dichotomous does not fundamentally
change the problem but is a helpful convenience.)
One can examine the potential effects of this bias in a
regression framework. The true relationship is as follows:
$$Y_i = \alpha + \beta E_i + \Gamma H_i + \varepsilon_i \qquad (1)$$
In the case of dichotomous outcomes (and a logit
model), $Y_i$ is the log-odds of the outcome. β is the (true)
effect of exposure on the log-odds. If we assume
ignorability (ie, that H does not exist when it does), then the
resulting estimate of β, $\hat{b}$, is as follows:
$$\hat{b} = \beta + \Gamma \frac{\delta H_i}{\delta E_i} = \beta + \frac{\Gamma}{\lambda} \qquad (2)$$
where Γ is the effect of $H_i$ on the outcome, and λ is the
strength of the relationship between H and $E_i$. The ratio of Γ
and λ is the resulting bias. The bias is a function of 2
parameters, and a bias of a given magnitude could be generated by infinite combinations of Γ and λ. To simplify matters,
Rosenbaum and others assumed that H predicts either exposure or the outcome perfectly. In our analyses, we assumed
that this factor predicts deployment perfectly. In essence,
$P(E = 1 \mid H = 1) = P(E = 0 \mid H = 0) = 1$, and this
effectively sets λ to 1. So, now the issue of bias solely
involves the Γ term.
How big would Γ have to be to change the interpretation of our findings? The key issue is whether the bias
would make a relationship appear significant when it is not. In
effect, the question is whether the bias pulled a statistically
significant t statistic more than |t| − 1.96 standard errors away
from 0. Suppose a parameter estimate is 3 and the standard error
is 1; then the key issue is whether the bias could have pulled the
t statistic 1.04 units from the critical value. In terms of the
magnitude of the parameter itself, the bias would have to be 1.04
standard errors. For a nonsignificant effect, the logic is reversed.
Suppose the estimated parameter is 1 and the standard error is 1. The
question now is whether a bias could have worked to make an
otherwise significant finding appear insignificant: could the
bias have pulled the t statistic 0.96 units toward zero? The
corresponding bias would be 0.96 standard errors.
One can note that the size of this bias depends on both
the magnitude of the estimated effect as well as its precision.
A larger, more precise effect will be more robust to possible
unobserved confounding.
In the present case, because Γ is the effect of H on the
log-odds of the outcome, one can exponentiate that value and
generate the corresponding odds ratio. This estimate has the
same interpretation as any odds ratio, and this is the value we
report in the article.
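The bias calculation described in this section can be sketched in a few lines of Python. This is one reading of the text rather than the author's code: it assumes the sensitivity parameter is obtained by taking the distance of the t statistic from 1.96, converting it to the log-odds scale by multiplying by the standard error, and exponentiating. The function names are hypothetical.

```python
import math

CRITICAL_Z = 1.96  # two-sided 5% critical value used in the text


def bias_in_standard_errors(beta, se):
    """Distance of |t| from the critical value, in standard-error units."""
    return abs(beta / se) - CRITICAL_Z


def sensitivity_odds_ratio(beta, se):
    """Exponentiated bias (log-odds scale) needed to reach the critical value."""
    bias_log_odds = bias_in_standard_errors(beta, se) * se
    return math.exp(bias_log_odds)


# Worked examples from the text: estimate 3 (significant) and 1 (not), SE of 1.
print(bias_in_standard_errors(3.0, 1.0))  # positive: bias needed to erase the effect
print(bias_in_standard_errors(1.0, 1.0))  # negative: bias that could be hiding an effect

# Adjusted effect of multiple deployments on marijuana use among women (Table 2).
print(sensitivity_odds_ratio(1.70, 0.31))
```

With the rounded coefficient and standard error reported in Table 2 (1.70 and 0.31), this calculation gives roughly 2.98, close to the value of 2.99 in Table 3; the small gap is consistent with rounding of the inputs.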
Adjustment for Multiple Testing
The Benjamini-Hochberg procedure assesses statistical
significance in light of the FDR that one finds acceptable. The
FDR is the expected proportion of incorrectly rejected null
hypotheses. This test involves ranking the significance levels
of the tests conducted from smallest to largest (Table 3). The
term k can then be calculated as follows:
$$k = \max\left\{ i : p_{(i)} \le \frac{i}{m}\, q \right\} \qquad (3)$$
where i is the rank of a given test, $p_{(i)}$ is the corresponding
P value, m is the number of tests (24 in the present case),
and q is the FDR. One rejects all hypotheses from the smallest
P value to the kth.
Consistent with prior research, the FDR in this article is
set at 0.10. This figure means that, on average, no more than 10% of the findings labeled as
“statistically significant” are expected to be chance
findings (or “false discoveries”).
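The procedure in equation (3) can be sketched as follows, applied to the 24 P values listed in Table 3 (rounded to 3 decimals, as reported). The function name and return format are hypothetical; the logic is the standard Benjamini-Hochberg step-up rule.

```python
def benjamini_hochberg(p_values, q=0.10):
    """Return (k, rejected) where k is the largest rank i with p_(i) <= (i / m) * q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k:
            rejected[idx] = True
    return k, rejected


# P values from Table 3, in the order the rows are listed.
p_values = [
    0.000, 0.010, 0.004, 0.003, 0.005, 0.000, 0.024, 0.014,
    0.046, 0.049, 0.065, 0.080, 0.092, 0.113, 0.134, 0.235,
    0.317, 0.356, 0.493, 0.525, 0.617, 0.617, 0.647, 0.920,
]

k, rejected = benjamini_hochberg(p_values, q=0.10)
print(k, sum(rejected))
```

With q = 0.10 and m = 24, the cutoff for rank i is i/240, which reproduces the cutoffs in Table 3 (for example, 0.004 for rank 1 and 0.033 for rank 8), and 8 of the 24 effects are rejected.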
REFERENCES
1. Defense Science Board Washington DC. Defense Science Board Task
Force on deployment of members of the National Guard and Reserve in
the global war on terrorism. Washington, DC: Defense Science Board;
2007.
2. Lowenberg M. The Role of the National Guard in national defense and
homeland security. Natl Guard. 2005;59:97.
3. Polusny MA, Erbes CR, Arbisi PA, et al. Impact of prior Operation
Enduring Freedom/Operation Iraqi Freedom combat duty on mental
health in a predeployment cohort of National Guard soldiers. Mil Med.
2009;174:353–357.
4. Kline A, Falca-Dodson M, Sussner B, et al. Effects of repeated deployment to Iraq and Afghanistan on the health of New Jersey Army
National Guard troops: implications for military readiness. Am J Public
Health. 2010;100:276.
5. Browne T, Hull L, Horn O, et al. Explanations for the increase in mental
health problems in UK reserve forces who have served in Iraq. Br J Psychiatry. 2007;190:484.
6. Wolfe J, Erickson DJ, Sharkansky EJ, et al. Course and predictors of
posttraumatic stress disorder among Gulf War veterans: a prospective
analysis. J Consult Clin Psychol. 1999;67:520–528.
7. Boscarino JA. Post-traumatic stress and associated disorders among
Vietnam veterans: the significance of combat exposure and social support. J Trauma Stress. 1995;8:317–336.
8. Marshall RP, Psych D, Jorm AF, et al. Posttraumatic stress disorder and
other predictors of health care consumption by Vietnam Veterans.
Psychiatr Serv. 1998;49:1609–1611.
9. Weiss DS, Marmar CR, Schlenger WE, et al. The prevalence of lifetime
and partial post-traumatic stress disorder in Vietnam theater veterans.
J Trauma Stress. 1992;5:365–376.
10. Jacobson IG, Ryan MA, Hooper TI, et al. Alcohol use and alcohol-related problems before and after military combat deployment. JAMA.
2008;300:663.
11. Smith TC, Ryan MA, Wingard DL, et al. New onset and persistent
symptoms of post-traumatic stress disorder self reported after deployment and combat exposures: prospective population based US military
cohort study. BMJ. 2008;336:366–371.
12. Killgore WD, Stetz MC, Castro CA, et al. The effects of prior combat
experience on the expression of somatic and affective symptoms in
deploying soldiers. J Psychosom Res. 2006;60:379–385.
13. Mental Health Advisory Team (MHAT) III Operation Iraqi Freedom.
Final report. Office of the Surgeon Multinational Force-Iraq and Office
of The Surgeon General United States Army Medical Command; 2006.
http://www.armymedicine.army.mil/reports/mhat/mhat_iii/mhat-iii.cfm.
14. Hotopf M, Hull L, Fear NT, et al. The health of UK military personnel
who deployed to the 2003 Iraq war: a cohort study. Lancet. 2006;367:
1731–1741.
15. Hourani LL, Bray RM, Marsden ME, et al. Department of Defense
survey of health related behaviors among the Guard and Reserve Force.
Report prepared for the Assistant Secretary of Defense (Health Affairs).
Research Triangle Park, NC: Research Triangle Institute; 2007.
16. Bell NS, Amoroso PJ, Williams JO, et al. Demographic, physical, and
mental health factors associated with deployment of U.S. Army soldiers
to the Persian Gulf. Mil Med. 2000;165:762–772.
17. Larson GE, Highfill-McRoy RM, Booth-Kewley S. Psychiatric diagnoses in historic and contemporary military cohorts: combat deployment
and the healthy warrior effect. Am J Epidemiol. 2008;167:1269.
18. Vasterling JJ, Proctor SP, Amoroso P, et al. Neuropsychological outcomes of army personnel following deployment to the Iraq war. JAMA.
2006;296:519.
19. King DW, King LA, Gudanowski DM, et al. Alternative representations
of war zone stressors: relationships to posttraumatic stress disorder in
male and female Vietnam veterans. J Abnorm Psychol. 1995;104:184–196.
20. King DW, King LA, Foy DW, et al. Posttraumatic stress disorder in a
national sample of female and male Vietnam veterans: risk factors,
war-zone stressors, and resilience-recovery variables. J Abnorm Psychol.
1999;108:164–170.
21. Bianchi SM, Milkie MA, Sayer LC, et al. Is anyone doing the housework? Trends in the gender division of household labor. Soc Forces.
2000;79:191.
22. Vogt DS, Pless AP, King LA, et al. Deployment stressors, gender, and
mental health outcomes among Gulf War I veterans. J Trauma Stress.
2005;18:115–127.
23. Radloff LS. The CES-D scale: a self-report depression scale for research
in the general population. Appl Psychol Meas. 1977;1:385.
24. Robins LN, Helzer JE, Croughan J, et al. National Institute of Mental
Health diagnostic interview schedule: its history, characteristics, and
validity. Arch Gen Psychiatry. 1981;38:381.
25. Burnam MA, Wells KB, Leake B, et al. Development of a brief screening
instrument for detecting depressive disorders. Med Care. 1988;26:775–789.
26. Weathers FW, Huska JA, Keane TM. The PTSD checklist-civilian
version (PCL-C). Boston, MA: National Center for PTSD; 1994.
27. Lang AJ, Stein MB. An abbreviated PTSD checklist for use as a
screening instrument in primary care. Behav Res Ther. 2005;43:585–594.
28. Rosenbaum PR, Rubin DB. The Central role of the propensity score in
observational studies for causal effects. Biometrika. 1983;70:41–55.
29. Rosenbaum PR. Observational Studies. New York, NY: Springer; 2002.
30. Imbens GW. The role of the propensity score in estimating dose-response functions. Biometrika. 2000;87:706–710.
31. Foster EM. Is more treatment better than less? An application of
propensity score analysis. Med Care. 2003;41:1183–1192.
32. Hansen BB. The essential role of balance tests in propensity-matched
observational studies: comments on ‘A critical appraisal of propensity-score matching in the medical literature between 1996 and 2003’ by
Peter Austin. Stat Med. 2008;27:2050–2054.
33. Cornfield J, Haenszel W, Hammond EC, et al. Smoking and lung cancer:
recent evidence and a discussion of some questions. J Natl Cancer Inst.
1959;22:173–203.
34. Greene WH. Econometric Analysis. Upper Saddle River, NJ: Prentice
Hall; 2008.
35. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a
practical and powerful approach to multiple testing. J R Stat Soc Ser B
Stat Methodol. 1995;57:289–300.
36. Benjamini Y, Yekutieli D. The control of the false discovery rate in
multiple testing under dependency. Ann Stat. 2001;29:1165–1188.
37. Bushway S, Johnson BD, Slocum LA. Is the magic still there? The use
of the Heckman two-step correction for selection bias in criminology.
J Quant Criminol. 2007;23:151–178.
38. Smith B, Ryan MA, Wingard DL, et al. Cigarette smoking and military
deployment: a prospective evaluation. Am J Prev Med. 2008;35:539–546.
39. Sobel ME. Identification of causal parameters in randomized studies
with mediating variables. J Educ Behav Stat. 2008;33:230–251.
40. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score.
Am Stat. 1985;39:33–38.
41. Lee M. Micro-Econometrics for Policy, Program, and Treatment Effects. Oxford, United Kingdom: Oxford University Press; 2005.
42. Morgan SL, Winship C. Counterfactuals and Causal Inference: Methods
and Principles for Social Research. Cambridge, United Kingdom: Cambridge University Press; 2007.