Child Health - American Statistical Association

Sequential Logistic
Regression: Modeling
Risk Factors and Child
Outcomes
Presented to NIC Chapter of
ASA
October 21, 2005
Logistic Regression Model
Statistical method for relating explanatory
variable(s) to the log odds of a binary
outcome measure.
- Dependent variable is always a binary
outcome.
- Independent variables may be categorical
or quantitative.

Logistic Regression Model
Log of the Odds (Logit)
\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k \]

p is the probability associated with the binary
outcome measure.
$e^{\beta_1}$ is the odds ratio for independent variable $x_1$:
the factor by which the odds are multiplied for a unit
increase in $x_1$, holding the other predictors fixed.
Statistical Inference for Logistic
Regression

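The confidence interval for the slope $\beta_1$ is
\[ b_1 \pm z^{*}\,SE_{b_1} \]

The confidence interval for the odds ratio $e^{\beta_1}$ is
\[ \left( e^{\,b_1 - z^{*}SE_{b_1}},\; e^{\,b_1 + z^{*}SE_{b_1}} \right) \]

where $z^{*}$ is the critical value from the standard normal
distribution for the chosen confidence level (for example, 1.96 for 95%).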
Statistical Inference for Logistic
Regression

To test the hypothesis $H_0: \beta_1 = 0$ we
compute the test statistic
\[ X^2 = \left( \frac{b_1}{SE_{b_1}} \right)^2 \]
which has approximately a Chi-Square
distribution with 1 df.
Logistic Regression with One
Predictor
In a large sample of college students, the proportion
who frequently engage in binge drinking is
p = 3,314/17,096 = 0.1938.
The odds for this outcome are thus:
\[ \frac{p}{1-p} = \frac{0.1938}{0.8062} = 0.24 \]
This example is borrowed from Introduction to the Practice of Statistics by
Moore and McCabe (2006).
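The step from a proportion to odds and log odds can be sketched in a few lines of Python. The function names below are illustrative and not part of the original slides.

```python
import math

def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1.0 - p)

def log_odds(p):
    """Natural log of the odds (the logit)."""
    return math.log(odds(p))

def prob_from_log_odds(eta):
    """Invert the logit: recover p from the log odds."""
    return 1.0 / (1.0 + math.exp(-eta))

p = 3314 / 17096            # proportion of frequent binge drinkers
print(round(p, 4))          # 0.1938
print(round(odds(p), 2))    # 0.24
```

Because the logit is invertible, a fitted log odds value can always be converted back to a probability with prob_from_log_odds.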
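Is Gender a Predictor?

Binge Drinking?     Male    Female     Total
Yes                1,630     1,684     3,314
No                 5,550     8,232    13,782
Total              7,180     9,916    17,096

Odds for Males:
\[ \frac{p}{1-p} = \frac{0.2270}{0.7730} = 0.294 \]
Log Odds:
\[ \ln(0.294) = -1.22 \]
Odds for Females:
\[ \frac{p}{1-p} = \frac{0.1698}{0.8302} = 0.205 \]
Log Odds:
\[ \ln(0.205) = -1.58 \]
Computed from the unrounded odds (1,630/5,550 and 1,684/8,232), the log
odds are -1.23 and -1.59, the values used in the fitted model.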
Interpreting the LogReg Model

Model for this example is:
\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 \]

For females ($x_1 = 0$) we have:
\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 \cdot 0 = \beta_0 \]

Thus the estimate of the intercept is equal
to $\beta_0$, which is the log odds for females.
\[ \ln\left(\frac{p}{1-p}\right) = \beta_0 = -1.59 \]
Interpreting the LogReg Model

The estimate of the slope is the difference
between the log odds for males and the log odds
for females:
\[ b_1 = \ln\left(\frac{p_1}{1-p_1}\right) - \ln\left(\frac{p_0}{1-p_0}\right) = -1.23 - (-1.59) = 0.36 \]
where $p_1$ is the proportion for males ($x_1 = 1$) and $p_0$ is the
proportion for females ($x_1 = 0$).

The fitted model is: log(ODDS) = -1.59 + 0.36x
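As a check on the fitted model, the sketch below estimates the intercept and slope by maximum likelihood from the grouped counts in the table, using a hand-written Newton-Raphson loop. The function name, the data layout, and the coding x = 1 for males are assumptions made for illustration; a statistical package would do this internally.

```python
import math

# Counts from the gender table: (x, yes, total); x = 1 for males, 0 for females.
groups = [(1, 1630, 7180), (0, 1684, 9916)]

def fit_logistic(groups, iters=25):
    """Newton-Raphson maximum likelihood for ln(p/(1-p)) = b0 + b1*x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, yes, n in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            r = yes - n * p            # score contribution of this group
            w = n * p * (1.0 - p)      # information weight of this group
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

b0, b1 = fit_logistic(groups)
print(round(b0, 2), round(b1, 2))   # -1.59 0.36
```

With a single binary predictor the model is saturated, so the maximum likelihood estimates coincide with the log odds for females and the difference in log odds computed by hand.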
Meaning of the Odds Ratio
The odds ratio is:
\[ \frac{\text{ODDS}_{\text{males}}}{\text{ODDS}_{\text{females}}} = \frac{e^{-1.59 + 0.36}}{e^{-1.59}} = e^{0.36} = 1.43 \]

Interpretation: the odds of being a frequent
binge drinker for males are 1.43 times the
odds for females.
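The inference formulas from the earlier slides can be applied to this example. The sketch below is illustrative only: the standard error uses the usual large-sample formula for a log odds ratio from a 2x2 table, which is an assumption added here and is not shown on the slides, and the variable names are hypothetical.

```python
import math

# Cell counts from the table: binge drinkers (yes) and others (no) by gender.
male_yes, male_no = 1630, 5550
female_yes, female_no = 1684, 8232

odds_ratio = (male_yes / male_no) / (female_yes / female_no)
b1 = math.log(odds_ratio)                  # slope = log odds ratio

# Assumption: large-sample standard error of a log odds ratio (2x2 table).
se_b1 = math.sqrt(1 / male_yes + 1 / male_no + 1 / female_yes + 1 / female_no)

z_star = 1.96                              # 95% standard normal critical value
ci_slope = (b1 - z_star * se_b1, b1 + z_star * se_b1)
ci_or = (math.exp(ci_slope[0]), math.exp(ci_slope[1]))
wald_chi2 = (b1 / se_b1) ** 2              # compare with Chi-Square, 1 df

print(f"b1 = {b1:.3f}, SE = {se_b1:.4f}")
print(f"95% CI for the odds ratio: ({ci_or[0]:.2f}, {ci_or[1]:.2f})")
print(f"Wald Chi-Square = {wald_chi2:.1f}")
```

The unrounded odds ratio is about 1.436; the value 1.43 on the slide comes from exponentiating the rounded slope 0.36.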
Multivariate Logistic Regression
The multivariate case has the same
statistical concepts but the computations
are more difficult because of the potential
correlation among multiple predictors.
It is easy to conduct the analysis using a
statistical software package.

Overview of Study


Children grow up within the context of personality, family,
neighborhood, and society.
They grow up with both disadvantages and opportunities, problems
and strengths, referred to here as risk and protective factors.

Examples of commonly understood risk factors include low birth weight,
child maltreatment, illness, neighborhood violence.
Examples of commonly understood protective factors include individual
verbal communication skills, the capacity for empathy, problem solving
skills, frustration tolerance, the presence of multiple and consistent
caregivers, access to health care and social services, and the concrete,
social, and affective support of family and friends.

The aim of this study was to empirically measure risk and protective
factors at the individual, family, and neighborhood level and to relate
them to poor short- and longer-term outcomes such as health
problems, behavioral and cognitive development, and maltreatment.
Methods -- Subjects



The 219 mother-infant dyads recruited for this
study were part of a larger cohort recruited in
waves over four years, beginning in 1990, as part
of the Capella Project, a twenty-year longitudinal
study funded by NIH.
Data used in the current analysis were collected
over a period of approximately 4-5 years.
Infants in the study were all under 18 months of
age when they entered the study.
Methods -- Instruments


Extensive information was collected during the primary maternal interview.
The main tools were the interview and self-report inventories.


Maternal Information






Use of alcohol and drugs.
Physical and psychological health.
Personal history of physical, sexual and emotional abuse.
Family functioning and daily life stressors.
Neighborhood conditions.
Child Information




Combination of study-developed and standardized instruments.
Behavior.
Health, accidents, hospitalizations.
Cognitive and emotional development.
Child maltreatment

Abuse or neglect in the child’s first year of life, obtained from an annual review of hotline
records of reports, and supplemented by case record review
Caregiver Intra-Personal
Functioning




CAGE—4 item rapid alcoholism screening scale. Subjects were
classified as having a possible alcohol problem if they endorsed 2 or
more items.
Center for Epidemiologic Studies Depression Scale—20-item scale
to measure depressive symptoms. Clinical cut-off score of 16 used
here.
Health Opinion Survey—20 item scale to assess neurotic or
psychosomatic symptoms. Higher scores indicate more symptoms.
A binary measure was computed using a median split, to reflect
above-average psychosomatic symptoms.
Service Utilization – report of a psychiatric or substance use
hospitalization.
Caregiver Inter-Personal
Functioning



Family and Neighborhood – The family APGAR is a 5-item
inventory of family function and satisfaction. The
Neighborhood Satisfaction Index is a 9-item inventory of
neighborhood characteristics.
Domestic Violence was defined by self-report in
conjunction with questions regarding childhood physical,
sexual and emotional abuse, and was further confirmed
as current by interviewer in the site-specific Trauma and
Violence scale.
Lifetime Stressors – An inventory of common stressors
such as marriage, divorce, death in the family, moving,
experiencing violence, etc.
Child Short-Term Outcomes




Child Health Status— items to assess general health, specific
conditions applying to child and other illness or problems.
Service Utilization Measures—to assess accidents and
hospitalizations of the child.
Child Abuse Neglect Tracking System—abuse or neglect in the
child’s first year of life, obtained from an annual review of hotline
records of reports, and supplemented by case record review.
Battelle Developmental Inventory Screening Test—96 items (out of
341 in complete battery) to assess five domains: personal-social
skills, adaptive behavior, psychomotor ability, communication and
cognitive. A child was considered to have delayed development if
the (standardized) Battelle total score was more than 1 standard
deviation from the mean.
Child Long-Term Outcomes



Child Health – items assessing general health,
specific conditions applying to child and other
illness or problems through caregiver report.
Child Behavior Checklist – 5 scale scores
assessing a child’s behavioral and social
development.
PRESS – A measure of intelligence for preschool children.
Hypotheses


The theoretical model guiding the analyses
posited a sequential model of the effect of
certain risk factors on child developmental
outcomes.
These risk factors were:
- Maternal history of loss and/or victimization.
- Maternal compromised emotional status.
- Domestic violence.
- Family and/or neighborhood problems.
Hypotheses




Maternal history of loss/victimization would be
associated with maternal compromised
emotional status.
Maternal compromised emotional status would
be associated with problems in the family and
neighborhood and/or domestic violence.
Problems in the family and neighborhood and
domestic violence would be associated with
poor short-term child outcomes.
Poor short-term outcomes would be associated
with poor longer-term child outcomes.
Visual Model of the Hypotheses
[Figure: path diagram of the hypothesized model, with boxes for Maternal History (victim of child abuse, lost a parent); Compromised Emotional Status (CAGE/CES-D, Health Opinion Survey, residential treatment); Domestic Violence; Family & Neighborhood (FAPGAR, Life Experiences, Neighborhood Short Form); Short-Term Outcomes (child abuse or neglect, AOD/Battelle, child health); and Long-Term Outcomes (PRESS/CBCL, Battelle, child health).]
Measures used in Analyses


Maternal loss/victimization history coded yes (1) if the
mother reported either a personal history of abuse or
losing a parent before the age of 18. Coded no (0)
otherwise.
Maternal compromised emotional status was coded yes
(1) if the mother had any of the following:




Score of 2 or higher on a 4-item rapid alcoholism screening
inventory (CAGE).
Score above cutoff of 16 on the depression inventory (CESD).
Score on inventory of psychosomatic symptoms above the
median.
Report of a substance or psychiatric hospitalization.
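The coding rule above can be sketched as a small function. All argument names are hypothetical, and the sample median of the Health Opinion Survey is assumed to be supplied by the caller.

```python
def compromised_emotional_status(cage, cesd, hos, hos_median, hospitalized):
    """Return 1 if any of the four criteria is met, else 0.

    cage: number of CAGE items endorsed (0-4)
    cesd: CES-D depression score
    hos: Health Opinion Survey score; hos_median: sample median of that score
    hospitalized: True if a substance or psychiatric hospitalization was reported
    """
    criteria = [
        cage >= 2,            # score of 2 or higher on the CAGE
        cesd > 16,            # score above the cutoff of 16 on the CES-D
        hos > hos_median,     # psychosomatic symptoms above the median
        bool(hospitalized),   # reported hospitalization
    ]
    return int(any(criteria))

print(compromised_emotional_status(cage=1, cesd=12, hos=30, hos_median=28, hospitalized=False))
```

In the usage line only the psychosomatic criterion is met, which is enough for the composite to be coded 1.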
Measures used in Analyses

Problems in the family or neighborhood was
coded yes (1) if the mother scored above the
median on two or more of the following
inventories:
- Family function and satisfaction.
- Neighborhood characteristics.
- Lifetime stressors.

Domestic violence coded yes (1) if the mother
reported domestic violence.
Measures used in Analyses

Poor short-term (1-2 Year) child outcomes
was coded yes (1) if the child had any two
of the following:
- Health Problem(s), accident or hospitalization.
- Delayed Development (BATTELLE).
- Presence of Alcohol or Drugs at birth.

OR there was a report of abuse or neglect.
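A minimal sketch of this rule follows, reading "any two" as "at least two of the three indicators" (an assumption). The argument names are hypothetical.

```python
def poor_short_term_outcome(health_problem, delayed_development,
                            aod_at_birth, maltreatment_report):
    """Return 1 for a poor short-term outcome, else 0."""
    indicators = [health_problem, delayed_development, aod_at_birth]
    two_or_more = sum(bool(flag) for flag in indicators) >= 2
    # A report of abuse or neglect is sufficient on its own.
    return int(two_or_more or bool(maltreatment_report))

print(poor_short_term_outcome(True, False, False, False))   # one indicator only
print(poor_short_term_outcome(True, True, False, False))    # two indicators
print(poor_short_term_outcome(False, False, False, True))   # maltreatment report
```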
Measures used in Analyses

Poor long-term (3-4 Year) child outcomes
was coded yes (1) if the child had any two
of the following:
- Health Problem(s), accident or hospitalization.
- Delayed Development (PRESS).
- Behavioral Problems (CBCL).
Logistic Regression #1



Maternal loss/victimization history entered as a single
predictor for maternal compromised emotional status.
This analysis was statistically significant (Chi-Square =
13.94, p < .001), and resulted in correct classification of
47% of cases without impaired caregiver status, 77% of
cases with caregiver status problems and 68% of cases
overall.
The odds ratio for the predictor (maternal victimization
history) was 3.1, with a 95% CI of 1.7 to 5.6.
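The reported odds ratio and interval can be related back to the slope and its standard error by inverting the confidence interval formula. This is a rough sketch: it assumes the interval is a Wald interval with z* = 1.96, and the results are approximate because the published values are rounded. The function name is illustrative.

```python
import math

def slope_and_se_from_or(odds_ratio, ci_low, ci_high, z_star=1.96):
    """Back out b1 and SE(b1) from a reported odds ratio and its Wald CI."""
    b1 = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_star)
    return b1, se

b1, se = slope_and_se_from_or(3.1, 1.7, 5.6)   # values reported for Logistic Regression #1
print(round(b1, 2), round(se, 2))               # 1.13 0.3
```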
SPSS
Output
The Model so Far
[Figure: path diagram of the hypothesized model with the first estimated odds ratio added. Maternal History to Compromised Emotional Status: 3.1.]
Logistic Regression #2




Maternal loss/victimization history and maternal
compromised emotional status entered together as
predictors for family/neighborhood problems.
This analysis was also statistically significant (Chi-Square
= 16.17, p < .001), and resulted in correct
classification of 60% of cases without
family/neighborhood problems, 65% of cases with
family/neighborhood problems, and 63% of cases
overall.
The odds ratio for maternal compromised emotional
status as a predictor (of family/neighborhood problems)
was 2.5, with a 95% CI of 1.4 to 4.6.
The odds ratio for maternal victimization history was not
statistically significant.
SPSS
Output
The Model so Far
[Figure: path diagram of the hypothesized model with estimated odds ratios. Maternal History to Compromised Emotional Status: 3.1. Compromised Emotional Status to Family & Neighborhood: 2.5.]
Logistic Regression #3





Maternal loss/victimization history, caregiver status, and
family/neighborhood problems entered in one step to predict
presence of domestic violence in the home.
This regression was statistically significant (Chi-Square = 16.36, p <
.001), and resulted in correct classification of 71% of cases without
domestic violence in the home, 51% of cases with domestic violence
in the home, and 62% of cases overall.
The odds ratio for the maternal compromised emotional
status as a predictor (of domestic violence) was 2.1, and
the 95% CI (1.4 to 4.6).
The odds ratio for family/neighborhood problems as a
predictor (of domestic violence) was 1.8, and the 95% CI
(>1.0 to 3.2).
The odds ratio for maternal victimization history was not
statistically significant.
SPSS
Output
The Model so Far
[Figure: path diagram of the hypothesized model with estimated odds ratios. Maternal History to Compromised Emotional Status: 3.1. Compromised Emotional Status to Family & Neighborhood: 2.5. Compromised Emotional Status to Domestic Violence: 2.1. Family & Neighborhood to Domestic Violence: 1.8.]
Logistic Regression #4




Maternal loss/victimization history, caregiver status,
family/neighborhood problems, and domestic violence entered in
one step to predict presence of poor short-term child outcomes.
The overall regression was not statistically significant (Chi-Square =
8.98, p = .062), and classification was less effective. Under this
model, all cases were classified into the poor short-term child
outcome group, correctly classifying only those subjects who did in
fact have poor short-term child outcomes (66%), and misclassifying
all the rest.
The odds ratio for domestic violence as a predictor (of poor
short-term child outcomes) was 2.1, with a 95% CI of 1.2
to 3.9. This was statistically significant.
The odds ratios for the other predictors were not
statistically significant.
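The classification result for this model can be illustrated with a small sketch of a classification table using the conventional 0.5 cutoff. The data are hypothetical and chosen so that every predicted probability exceeds the cutoff, in which case the overall percent correct simply equals the proportion of positive cases.

```python
def classification_table(probs, outcomes, cutoff=0.5):
    """Percent correctly classified among negatives, positives, and overall."""
    predicted = [1 if p >= cutoff else 0 for p in probs]
    neg_hits = [pr == 0 for pr, y in zip(predicted, outcomes) if y == 0]
    pos_hits = [pr == 1 for pr, y in zip(predicted, outcomes) if y == 1]

    def pct(hits):
        return 100.0 * sum(hits) / len(hits) if hits else float("nan")

    return pct(neg_hits), pct(pos_hits), pct(neg_hits + pos_hits)

# Hypothetical data: all probabilities above 0.5, two of every three cases positive.
probs = [0.55, 0.60, 0.70, 0.58, 0.66, 0.72]
outcomes = [0, 1, 1, 0, 1, 1]
neg_pct, pos_pct, overall_pct = classification_table(probs, outcomes)
print(neg_pct, pos_pct, round(overall_pct, 1))
```

No negative case is classified correctly, every positive case is, and the overall rate matches the base rate of the outcome.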
SPSS
Output
The Model so Far
[Figure: path diagram of the hypothesized model with estimated odds ratios. Maternal History to Compromised Emotional Status: 3.1. Compromised Emotional Status to Family & Neighborhood: 2.5. Compromised Emotional Status to Domestic Violence: 2.1. Family & Neighborhood to Domestic Violence: 1.8. Domestic Violence to Short-Term Outcomes: 2.1.]
Logistic Regression #5





Maternal loss/victimization history, caregiver status,
family/neighborhood problems, domestic violence, and poor
short-term child outcomes entered in one step to predict presence of poor
longer-term child outcomes.
The overall regression was statistically significant (Chi-Square =
16.67, p < .005), and resulted in correct classification of 39% of cases
without poor long-term child outcomes, 85% of cases having poor
long-term child outcomes, and 68% of cases overall.
The odds ratio for family/neighborhood problems as a predictor (of
poor long-term child outcomes) was 2.6, and the 95% CI (1.1 to 6.1).
The odds ratio for poor short-term outcomes as a predictor (of poor
long-term child outcomes) was 3.2, and the 95% CI (1.4 to 7.6).
The odds ratios for the other predictors were not statistically
significant.
SPSS
Output
The Final Model
[Figure: final path diagram with estimated odds ratios. Maternal History to Compromised Emotional Status: 3.1. Compromised Emotional Status to Family & Neighborhood: 2.5. Compromised Emotional Status to Domestic Violence: 2.1. Family & Neighborhood to Domestic Violence: 1.8. Domestic Violence to Short-Term Outcomes: 2.1. Family & Neighborhood to Long-Term Outcomes: 2.6. Short-Term Outcomes to Long-Term Outcomes: 3.2.]
Goodness of Fit


-2LL (LL = log likelihood) is 0 if the model fits
perfectly.
The Chi-Square statistic tests the change in -2LL from the
constant-only model to the model with the set of predictors.
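The change in -2LL can be illustrated with the binge drinking example from earlier in the talk, since the study data are not reproduced here. The sketch below is illustrative; the function name and data layout are assumptions.

```python
import math

def neg2_log_likelihood(cells):
    """-2LL for grouped binary data; cells = [(events, total, fitted_p), ...]."""
    ll = 0.0
    for events, total, p in cells:
        ll += events * math.log(p) + (total - events) * math.log(1.0 - p)
    return -2.0 * ll

# Constant-only model: every case gets the overall proportion.
p_all = 3314 / 17096
null = neg2_log_likelihood([(1630, 7180, p_all), (1684, 9916, p_all)])

# Model with gender: each group gets its own observed proportion.
model = neg2_log_likelihood([(1630, 7180, 1630 / 7180),
                             (1684, 9916, 1684 / 9916)])

model_chi2 = null - model      # change in -2LL, 1 df for one added predictor
print(round(null, 1), round(model, 1), round(model_chi2, 1))
```

The difference is referred to a Chi-Square distribution with degrees of freedom equal to the number of predictors added.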
Goodness of Fit

Quantification of the proportion of explained
variance.



Cox & Snell R2 & Nagelkerke R2
These are similar in intent to R2 in multiple linear regression.
For the current model, about 19.5%.
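Both measures can be computed from the -2LL of the constant-only model, the -2LL of the fitted model, and the sample size. The sketch below uses the standard definitions; the input values are hypothetical and are not taken from the study output.

```python
import math

def pseudo_r2(neg2ll_null, neg2ll_model, n):
    """Cox & Snell and Nagelkerke R-squared from the two -2LL values."""
    cox_snell = 1.0 - math.exp((neg2ll_model - neg2ll_null) / n)
    max_cox_snell = 1.0 - math.exp(-neg2ll_null / n)   # upper bound of Cox & Snell
    nagelkerke = cox_snell / max_cox_snell              # rescaled to reach 1.0
    return cox_snell, nagelkerke

# Hypothetical inputs for illustration only.
cs, nk = pseudo_r2(neg2ll_null=250.0, neg2ll_model=230.0, n=200)
print(round(cs, 3), round(nk, 3))   # 0.095 0.133
```

Nagelkerke's version is never smaller than Cox & Snell's, because the latter cannot reach 1.0 for a binary outcome.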
Discrimination and Calibration

Model Discrimination
- Ability of the model to discriminate
observations in the two groups.

Model Calibration
- How closely the observed and predicted
probabilities match.
Model Discrimination

SPSS provides a classification table.
- Shown earlier.
SPSS also provides a histogram of
estimated probabilities.
- Positive cases should be on the right and
negative cases on the left.
Model Discrimination
Discrimination here is not so good. One serious problem is that the sample
itself was quite biased towards poor outcomes because of poverty, etc.
Calibration

Hosmer-Lemeshow goodness-of-fit
- Cases divided into deciles based on estimated
probabilities.
- Compare observed to expected numbers (contingency table).
- Null hypothesis is that there is no difference
between the observed and predicted values.
- This statistic should be interpreted cautiously because
its value depends on the number of groups.
Hosmer and Lemeshow for Final
Model
The null hypothesis is not rejected, suggesting the model is OK.
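The Hosmer-Lemeshow calculation can be sketched as follows. This is a simplified version under stated assumptions: cases are sorted by estimated probability and cut into equal-sized groups, ties are not handled specially, and the data in the usage lines are hypothetical. SPSS may form its groups somewhat differently.

```python
def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow Chi-Square and its degrees of freedom (groups - 2)."""
    pairs = sorted(zip(probs, outcomes))      # order cases by estimated probability
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)   # observed events in the group
        expected = sum(p for p, _ in chunk)   # expected events = sum of probabilities
        size = len(chunk)
        # Events and non-events combined: (O - E)^2 / (E * (1 - E / size))
        stat += (observed - expected) ** 2 / (expected * (1.0 - expected / size))
    return stat, groups - 2

# Hypothetical data, four groups of two cases each.
probs = [0.1, 0.1, 0.3, 0.3, 0.6, 0.6, 0.9, 0.9]
outcomes = [0, 0, 0, 1, 1, 0, 1, 1]
hl_stat, hl_df = hosmer_lemeshow(probs, outcomes, groups=4)
print(round(hl_stat, 3), hl_df)
```

Changing the groups argument changes the statistic for the same data, which is why the slide urges cautious interpretation.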
The c-Statistic

c-Statistic
- Interpreted as the proportion of pairs of cases with different observed
outcomes where the model results in higher probability for cases with
the event than for cases without the event.
- Ranges in value from 0.5 to 1.0, where 0.5 means the model does no better
than chance and 1.0 means the model always assigns higher probability to
cases with the event than to those without the event.

In SPSS to get this you first have to save the predicted probabilities
along with the actual outcome measure into a new file, and then
group them into a reasonably large number of distinct groups using
an equation like this:


probcat = trunc(prob_1/.00005)
Next cross tabulate probcat with the outcome measure and calculate
Somers’ d.
Somers’ d
c-Statistic

The c-statistic is interpreted as the % of
possible pairs of cases in which one is
positive on the outcome and the other is
negative, that the logistic model assigns a
higher probability to the positive case.
\[ c = \frac{d}{2} + 0.5 = \frac{0.79}{2} + 0.5 = 0.895 \]
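Outside SPSS, Somers' d and the c-statistic can be computed directly from the saved probabilities by comparing every event case with every non-event case. The sketch below is illustrative, with hypothetical data; tied pairs count toward neither total, which is equivalent to giving them half credit in c.

```python
def somers_d_and_c(probs, outcomes):
    """Somers' d and the c-statistic from all event/non-event pairs."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = discordant = 0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1      # event case has the higher probability
            elif pe < pn:
                discordant += 1      # event case has the lower probability
    pairs = len(events) * len(non_events)
    d = (concordant - discordant) / pairs
    return d, d / 2 + 0.5

# Hypothetical probabilities and outcomes.
d, c = somers_d_and_c([0.9, 0.7, 0.4, 0.6, 0.2], [1, 1, 1, 0, 0])
print(round(d, 3), round(c, 3))
print(round(0.79 / 2 + 0.5, 3))      # 0.895, the value reported for the final model
```

This pairwise count avoids the grouping step needed in SPSS, at the cost of comparing every event with every non-event.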
Conclusion
These results provided general support for
the model overall.
Subsequent analyses (not reported here)
helped further refine the model and
explore relationships among risk factors
and child outcomes.
