Online Materials: Logistic Regression

Chapter 4
© 2014 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part.
The correlation and regression methods you have seen up to this point require that both
variables of interest be numerical. But what if the dependent variable in a study is not
numerical? This situation requires a different approach. Logistic regression can be used to
describe the way in which a dependent variable that is categorical with just two categories
(a binary variable) is related to a numerical predictor variable.
Example 4.16 Look Out for Those Wolf Spiders
The paper “Sexual Cannibalism and Mate Choice Decisions in Wolf Spiders: Influence of Male
Size and Secondary Sexual Characteristics” (Animal Behaviour [2005]: 83–94) described a
study in which researchers were interested in variables that might be related to a female
wolf spider’s decision to kill and consume her partner during courtship or mating. The
accompanying data (approximate values read from a graph in the paper) are values of
x = difference in body width (female − male) and y = cannibalism, coded as 0 for no
cannibalism and 1 for cannibalism, for 52 pairs of courting wolf spiders.
Size Difference (mm)   Cannibalism      Size Difference (mm)   Cannibalism
−1.0                   0                0.4                    0
−1.0                   0                0.4                    0
−0.8                   0                0.4                    0
−0.8                   0                0.4                    0
−0.6                   0                0.4                    1
−0.6                   0                0.6                    0
−0.4                   0                0.6                    0
−0.4                   0                0.6                    0
−0.4                   0                0.6                    0
−0.4                   0                0.6                    0
−0.2                   0                0.6                    1
−0.2                   0                0.6                    1
−0.2                   0                0.8                    0
−0.2                   0                0.8                    0
0.0                    0                0.8                    1
0.0                    0                0.8                    1
0.0                    0                0.8                    1
0.0                    0                1.0                    0
0.0                    0                1.0                    0
0.0                    0                1.0                    1
0.2                    0                1.0                    1
0.2                    0                1.2                    0
0.2                    0                1.4                    0
0.2                    0                1.6                    1
0.2                    0                1.8                    1
0.2                    0                2.0                    1
A Minitab scatterplot of the data is shown in Figure 4.31. Notice that the plot was
constructed so that if two points fell in exactly the same position, one was offset a bit so
that all observations would be visible. (This is called jittering.)
The scatterplot doesn’t look like others you have seen before. Its odd appearance is
due to the fact that all y values are either 0 or 1. But you can see from the plot that there
are more occurrences of cannibalism for large x values (where the female is bigger than the
male) than for smaller x values. In this situation, it makes sense to consider the proportion
of the time cannibalism would occur as being related to size difference. For example, you
might focus on a single x value, say x = 0, where the female and male are the same size.
Based on the data at hand, what can you say about the cannibalism proportion for pairs
where the size difference is 0? This question will be revisited after introducing the logistic
regression equation.

Figure 4.31 Scatterplot of the wolf spider data (cannibalism versus size difference).

A logistic regression equation is used to describe how the proportion of “successes”
(for example, cannibalism in the wolf spider example) changes as a numerical predictor
variable, x, changes.
With p denoting the proportion of successes, the logistic regression equation is

$$ p = \frac{e^{a + bx}}{1 + e^{a + bx}} $$

where a and b are constants.
The logistic regression equation looks complicated, but it has some very convenient
properties. For any x value, the value of $e^{a + bx}/(1 + e^{a + bx})$ is between 0 and 1. As x
changes, the graph of this equation has an “S” shape. Consider the two S-shaped curves of
Figure 4.32. The blue curve starts near 0 and increases toward 1 as x increases. This is the type
of behavior exhibited by $p = e^{a + bx}/(1 + e^{a + bx})$ when b > 0. The red curve starts near 1
for small x values and then decreases as x increases. This happens when b < 0 in the
logistic regression equation. The steepness of the curve (how quickly it rises or falls) also
depends on the value of b. The farther b is from 0, the steeper the curve.
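The properties just described can be checked numerically. The following is a minimal Python sketch, not part of the original materials; the function name `logistic` and the illustrative values a = 0 and b = 1, −1, 3 are assumptions chosen only to show the rising, falling, and steeper cases.

```python
import math

def logistic(x, a, b):
    """Proportion of successes p = e^(a + bx) / (1 + e^(a + bx))."""
    z = a + b * x
    return math.exp(z) / (1.0 + math.exp(z))

# Illustrative x values and constants (assumed, not taken from any data set).
xs = [-4, -2, 0, 2, 4]
rising = [logistic(x, a=0.0, b=1.0) for x in xs]    # b > 0: increases with x
falling = [logistic(x, a=0.0, b=-1.0) for x in xs]  # b < 0: decreases with x
steeper = [logistic(x, a=0.0, b=3.0) for x in xs]   # b farther from 0: steeper

for x, r, f, s in zip(xs, rising, falling, steeper):
    print(f"x = {x:>2}   b = 1: {r:.3f}   b = -1: {f:.3f}   b = 3: {s:.3f}")
```

Every printed value lies strictly between 0 and 1, and the b = 3 column moves from near 0 to near 1 over a much narrower range of x than the b = 1 column.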
Most statistics computer packages have the capability of using sample data to compute values for a and b in the logistic regression equation to produce an equation relating
the proportion of successes to the predictor x. An explanation of an alternate method for
computing reasonable values of a and b is given later in this section.
Figure 4.32 Two logistic regression curves (one with b > 0, one with b < 0).
Unless otherwise noted, all content on this page is © Cengage Learning.
Example 4.17 Cannibal Spiders II
Minitab was used to fit a logistic regression equation to the wolf spider data
of Example 4.16. The resulting Minitab output is given in Figure 4.33, and Figure 4.34
shows a scatterplot of the original data with the logistic regression curve superimposed.
Response Information

Variable      Value   Count
Cannibalism   1       11     (Event)
              0       41
              Total   52

Logistic Regression Table

                                                        Odds    95% CI
Predictor         Coef       SE Coef    Z       P       Ratio   Lower   Upper
Constant          −3.08904   0.828780   −3.73   0.000
Size difference    3.06928   1.00407     3.06   0.002   21.53   3.01    154.05

Figure 4.33 Minitab output for the data of Example 4.17.
With a = −3.08904 and b = 3.06928, the logistic regression equation is

$$ p = \frac{e^{-3.08904 + 3.06928x}}{1 + e^{-3.08904 + 3.06928x}} $$

To predict or estimate the proportion of cannibalism when the size difference between the
female and male is 0, substitute 0 for x in the logistic regression equation to obtain

$$ p = \frac{e^{-3.08904 + 3.06928(0)}}{1 + e^{-3.08904 + 3.06928(0)}} = \frac{e^{-3.08904}}{1 + e^{-3.08904}} = 0.044 $$

The proportions of cannibalism for other values of x = size difference can be computed
in a similar manner.
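As an illustration of computing these proportions, here is a short Python sketch that plugs the Minitab coefficients from Figure 4.33 into the logistic regression equation. The function name `cannibalism_proportion` and the particular x values in the loop are assumptions made for the example.

```python
import math

# Coefficients reported in the Minitab output (Figure 4.33).
A = -3.08904   # constant
B = 3.06928    # coefficient of size difference

def cannibalism_proportion(size_difference):
    """Predicted proportion of cannibalism for a given size difference (mm)."""
    z = A + B * size_difference
    return math.exp(z) / (1 + math.exp(z))

print(round(cannibalism_proportion(0.0), 3))  # 0.044, as in the text

# Other size differences (values chosen only for illustration).
for x in (0.5, 1.0, 1.5, 2.0):
    print(f"x = {x:.1f}  predicted proportion = {cannibalism_proportion(x):.3f}")
```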
Figure 4.34 Scatterplot and logistic regression curve for data of Example 4.17.

Consider an important question in drug development—what strength dose of a drug
is needed to elicit a response? For example, suppose that we are marketing a poison,
RatRiddance, to be used to eradicate rats. We want to use enough of the toxic agent to dispose of the little critters, but for safety and ecological reasons we don’t want to use more
poison than necessary. Imagine that an experiment is conducted to assess the toxicity of
RatRiddance, where the amount of the active ingredient is varied. Eleven different concentrations are tested, with about 500 rats in each treatment. The results of the experiment
are given in Table 4.3. A plot of the data is shown in Figure 4.35.
Table 4.3 Mortality Data for RatRiddance

Concentration   Number Exposed   Mortality Rate
20              440              0.225
40              462              0.236
60              500              0.398
80              467              0.628
100             515              0.678
120             561              0.795
140             469              0.853
160             550              0.860
180             542              0.921
200             479              0.940
240             497              0.968

The original data consisted of about 5,000 observations. For each individual rat, there
was a (dose, response) pair, where the response was categorical—survived or did not survive. The data were then summarized in Table 4.3 by computing the proportion that did
not survive (the mortality rate) for each dose. It is these proportions that were plotted in
the scatterplot of Figure 4.35 and that exhibit the typical “S” shape of the logistic regression curve.
Let’s use the logistic regression equation to describe the relationship between the
proportion of rats who did not survive (mortality rate) and dose. The logistic regression
equation is

$$ p = \frac{e^{a + bx}}{1 + e^{a + bx}} $$
Figure 4.35 Scatterplot of mortality rate versus dose (concentration).

Relating Logistic Regression to Simple Linear Regression
For data that have been converted into proportions, some tedious but straightforward
algebra demonstrates how you can use a transformation of the data and then fit the
least-squares regression line to obtain values for a and b in the logistic regression equation:

$$
\begin{aligned}
p &= \frac{e^{a + bx}}{1 + e^{a + bx}} && \text{multiply both sides by } 1 + e^{a + bx} \\
p\,(1 + e^{a + bx}) &= e^{a + bx} && \text{complete the multiplication on the left-hand side} \\
p + p\,e^{a + bx} &= e^{a + bx} && \text{subtract } p\,e^{a + bx} \text{ from each side} \\
p &= e^{a + bx} - p\,e^{a + bx} && \text{factor out } e^{a + bx} \text{ on the right-hand side} \\
p &= e^{a + bx}\,(1 - p) && \text{divide both sides by } (1 - p) \\
\frac{p}{1 - p} &= e^{a + bx} && \text{take the natural log of both sides} \\
\ln\!\left(\frac{p}{1 - p}\right) &= a + bx
\end{aligned}
$$
This means that if the logistic regression equation is a reasonable way to describe the
relationship between p and x, the relationship between $\ln\!\left(\frac{p}{1 - p}\right)$ and x is linear.
A consequence of this is that if you transform p using

$$ y' = \ln\!\left(\frac{p}{1 - p}\right) $$

you can use least squares to fit a line to the (x, y′) data.
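The transformation and its inverse can be sketched in a few lines of Python. The function names `logit` and `inverse_logit` are labels chosen for this illustration; the value p = 0.225 is the first mortality rate in Table 4.3.

```python
import math

def logit(p):
    """Transform a proportion p (strictly between 0 and 1) to y' = ln(p / (1 - p))."""
    return math.log(p / (1 - p))

def inverse_logit(y):
    """Undo the transformation: p = e^y / (1 + e^y)."""
    return math.exp(y) / (1 + math.exp(y))

p = 0.225
y_prime = logit(p)
print(round(y_prime, 3))                 # -1.237
print(round(inverse_logit(y_prime), 3))  # 0.225, the original proportion
```

Note that the transformation is undefined for p = 0 or p = 1, so it applies only to proportions strictly between those values.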
For the RatRiddance example, the transformed data are

x     p       p/(1 − p)   y′ = ln(p/(1 − p))
20    0.225   0.290       −1.237
40    0.236   0.309       −1.175
60    0.398   0.661       −0.414
80    0.628   1.688        0.524
100   0.678   2.106        0.745
120   0.795   3.878        1.355
140   0.853   5.803        1.758
160   0.860   6.143        1.815
180   0.921   11.658       2.456
200   0.940   15.667       2.752

The resulting least-squares line (using x and y′) is

$$ y' = a + bx = -1.6033 + 0.0221x $$
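The fit can be reproduced with a short script. This is a sketch under one stated assumption: that the line is fitted to all eleven concentrations in Table 4.3, including the 240 row. The helper `least_squares` is a hypothetical name that implements the usual slope and intercept formulas using only the standard library.

```python
import math

# Table 4.3: concentration and mortality rate (all eleven rows; an assumption).
concentration = [20, 40, 60, 80, 100, 120, 140, 160, 180, 200, 240]
mortality = [0.225, 0.236, 0.398, 0.628, 0.678, 0.795,
             0.853, 0.860, 0.921, 0.940, 0.968]

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Transform each proportion to y' = ln(p / (1 - p)), then fit a line to (x, y').
y_prime = [math.log(p / (1 - p)) for p in mortality]
a, b = least_squares(concentration, y_prime)
print(f"y' = {a:.4f} + {b:.4f}x")
```

Under that assumption the intercept comes out near −1.60 and the slope near 0.022. Substituting a and b into the logistic regression equation then gives the estimated mortality rate at any concentration.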
You can check the transformed linear fit in the customary way, checking the scatterplot
and the residual plot, as shown in Figure 4.36(a) and (b). Although there seems to be a hint
of curvature in the data, the linear model appears to fit quite well.
Figure 4.36 Plots for the mortality data: (a) scatterplot of ln(p/(1−p)) versus concentration; (b) residual plot (residuals versus concentration).

Example 4.18 The Call of the Wild Amazonian . . . Frog
The Amazonian tree frog uses vocal communication to call for a mate. In a study of
the relationship between calling behavior and the amount of rainfall (“How, When,
and Where to Perform Visual Displays: The Case of the Amazonian Frog Hyla parviceps,”
Herpetologica [2004]: 420–429), the daily rainfall (in mm) was recorded as well as
calling behavior of male Amazonian frogs. Calling behavior was used to compute the call
rate, which is the proportion of frogs exhibiting calling behavior. Data consistent with
the article are given in Table 4.4.
Table 4.4 Daily Rainfall (mm) and Proportion Calling

Rainfall   Call rate     Rainfall   Call rate     Rainfall   Call rate     Rainfall   Call rate
0.2        0.17          0.9        0.29          1.7        0.53          2.9        0.84
0.3        0.19          1.1        0.34          2.0        0.60          3.2        0.88
0.4        0.20          1.2        0.39          2.2        0.67          3.5        0.90
0.5        0.21          1.3        0.41          2.4        0.71          4.2        0.97
0.7        0.27          1.5        0.46          2.6        0.75          4.9        0.98
0.8        0.28          1.6        0.49          2.8        0.82          5.0        0.98

Inspection of the scatterplot in Figure 4.37(a) reveals a pattern that is consistent with
a logistic relationship between the daily rainfall and the proportion of frogs exhibiting
calling behavior. The transformed data in Figure 4.37(b) show a clearly linear pattern.
For the transformed data, the least-squares line is given by the equation

$$ y' = -1.871 + 1.177(\text{Rainfall}) $$

To predict calling proportion for a location with daily rainfall of 4.0 mm, you can use
the computed values of a and b in the logistic regression equation:

$$ p = \frac{e^{-1.871 + 1.177x}}{1 + e^{-1.871 + 1.177x}} = \frac{e^{-1.871 + 1.177(4.0)}}{1 + e^{-1.871 + 1.177(4.0)}} = 0.945 $$

Figure 4.37 Scatterplot of original and transformed data of Example 4.18: (a) calling rate (p) versus daily rain; (b) ln(p/(1−p)) versus daily rain, with linear fit ln(p/(1−p)) = −1.871 + 1.177 Daily Rain (RSquare = 0.996, RSquare Adj = 0.996, s = 0.103).

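The prediction for 4.0 mm of rain can be checked with a few lines of Python. This is only a sketch: the function name `call_proportion` is an assumption, and the coefficients are the rounded values from the fitted line in Example 4.18.

```python
import math

# Intercept and slope of the least-squares line fitted to the transformed frog data.
A = -1.871
B = 1.177

def call_proportion(rainfall_mm):
    """Predicted proportion of frogs calling for a given daily rainfall (mm)."""
    z = A + B * rainfall_mm
    return math.exp(z) / (1 + math.exp(z))

print(round(call_proportion(4.0), 3))  # 0.945
```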
Exercises
4.73 Anabolic steroid abuse has been increasing despite
increased press reports of adverse medical and psychiatric consequences. In a recent study, medical researchers studied the potential for addiction to testosterone in
hamsters (Neuroscience [2004]: 971–981). Hamsters were
allowed to self-administer testosterone over a period of
days, resulting in the death of some of the animals. The
given data are the proportion of hamsters surviving and the
peak self-administration of testosterone (μg). Fit a logistic
regression equation and use the equation to predict the survival proportion for hamsters with a peak intake of 40 μg.

Peak Intake     Survival
(micrograms)    Proportion (p)   p/(1 − p)   y′ = ln(p/(1 − p))
10              0.980            49.0000      3.8918
30              0.900            9.0000       2.1972
50              0.880            7.3333       1.9924
70              0.500            1.0000       0.0000
90              0.170            0.2048      −1.5856

4.74 Does high school GPA predict success in first-year
college English? The proportion with a grade of C or better
in freshman English for students with various high school
GPAs for freshmen at Cal Poly, San Luis Obispo, in fall
of 2007 is summarized in the accompanying table. Fit a
logistic regression equation that would allow you to predict
the proportion of freshmen passing English based on high
school GPA. Use the resulting equation to predict the proportion of freshmen with a high school GPA of 2.2 who
pass English.

High School   Proportion C
GPA           or Better      p/(1 − p)   y′ = ln(p/(1 − p))
3.36          0.95           19.00        2.94
2.94          0.90           9.00         2.20
2.68          0.85           5.67         1.73
2.49          0.80           4.00         1.39
2.33          0.75           3.00         1.10
2.19          0.70           2.33         0.85
2.06          0.65           1.86         0.62
1.94          0.60           1.50         0.41
1.83          0.55           1.22         0.20
1.72          0.50           1.00         0.00
1.61          0.45           0.82        −0.20
1.49          0.40           0.67        −0.41
1.38          0.34           0.52        −0.66
1.25          0.30           0.43        −0.85
1.11          0.25           0.33        −1.10
0.95          0.20           0.25        −1.39
0.75          0.15           0.18        −1.73
0.05          0.10           0.11        −2.20
0.08          0.05           0.05        −2.94

4.75 Some plant viruses are spread by insects and tend
to spread from the edges of a field inward. The data on
x = distance from the edge of the field (in meters) and
y = proportion of plants with virus symptoms that appeared in
the paper “Patterns of Spread of Two Non-Persistently Aphid-Borne
Viruses in Lupin Stands” (Annals of Applied Biology
[2005]: 337–350) were used to fit a least-squares regression line
to describe the relationship between x and y′ = ln(p/(1 − p)).
Minitab output resulting from fitting the least-squares line is
given below.

The regression equation is
ln(p/(1-p)) = −0.917 − 0.107 Distance

Predictor   Coef       SE Coef   T        P
Constant    −0.9171    0.1249    −7.34    0.000
Distance    −0.10716   0.01062   −10.09   0.000

S = 0.387646   R-Sq = 72.8%   R-Sq(adj) = 72.1%

a. What is the logistic regression equation relating x and the
proportion of plants with virus symptoms?
b. What would you predict for the proportion of plants with
virus symptoms at a distance of 15 meters from the edge
of the field? (Note: the x values in the data set ranged
from 0 to 20.)

4.76 The paper “The Shelf Life of Bird Eggs: Testing Egg
Viability Using a Tropical Climate Gradient” (Ecology [2005]:
2164–2175) investigated the effect of altitude and length of
exposure on the hatch rate of thrasher eggs. Data consistent
with the estimated proportion hatching after a number of days
of exposure given in the paper are shown here.

Exposure (days)              1      2      3      4      5       6       7       8
Proportion (lowland)         0.81   0.83   0.68   0.42   0.13    0.07    0.04    0.02
Proportion (mid-elevation)   0.73   0.49   0.24   0.14   0.037   0.040   0.024   0.030
Proportion (cloud forest)    0.75   0.67   0.36   0.31   0.14    0.09    0.06    0.07

a. Make a scatterplot of the proportion hatching versus
exposure for the lowland data. Also make a scatterplot
using the mid-elevation data. Are the plots generally the
shape you would expect from “logistic” plots?
b. Using the method introduced in this section, calculate
y′ = ln(p/(1 − p)) for each of the exposure times in the cloud
forest and fit the line y′ = a + b(Days). What is the
significance of the negative slope to this line?
c. Using your best-fit line from Part (b), what would you
estimate for the proportion of eggs that would hatch if
they were exposed to cloud forest conditions for 3 days?
5 days?
d. At what point in time does the estimated proportion of
hatching for cloud forest conditions seem to cross from
greater than 0.5 to less than 0.5?

4.77 As part of a study of the effects of timber management
strategies (Ecological Applications [2003]: 1110–1123)
investigators used satellite imagery to study abundance of the
lichen Lobaria oregana at different elevations. Abundance of
a species was classified as “common” if there were more than
10 individuals in a plot of land. In the table below, approximate proportions of plots in which Lobaria oregana were
common are given.

Elevation (m)                       400    600    800    1000   1200    1400    1600
Prop. of plots with lichen common   0.99   0.96   0.75   0.29   0.077   0.035   0.01

a. As elevation increases, does the proportion of plots for
which lichen is common become larger or smaller? What
aspect(s) of the table support your answer?
b. Using the method introduced in this section, calculate
y′ = ln(p/(1 − p)) for each of the elevations and fit the line
y′ = a + b(Elevation). What is the equation of the least-squares regression line?
c. Using the best-fit line from Part (b), estimate the proportion of plots of land on which Lobaria oregana are
classified as “common” at an elevation of 900 m.

4.78 The hypothetical data below are from a toxicity study
designed to measure the effectiveness of different doses of
a pesticide on mosquitoes. The table below summarizes the
concentration of the pesticide, the sample sizes, and the number of critters dispatched.

Concentration (g/cc)    0.10   0.15   0.20   0.30   0.50   0.70   0.95
Number of mosquitoes    48     52     56     51     47     53     51
Number killed           10     13     25     31     39     51     49

a. Make a scatterplot of the proportions of mosquitoes
killed versus the pesticide concentration.
b. Using the method introduced in this section, calculate
y′ = ln(p/(1 − p)) for each of the concentrations and fit the
line y′ = a + b(Concentration). What is the significance
of a positive slope for this line?
c. The dose for which 50% of the pests die is sometimes
called LD50, for “Lethal dose 50%.” What would you
estimate to be LD50 for this pesticide when used on
mosquitoes?

4.79 In the study of textiles and fabrics, the strength of
a fabric is an important consideration. Suppose that a large
number of swatches of a certain fabric are subjected to different “loads” or forces applied to the fabric. The data from such
an experiment might look as follows:

Hypothetical Data on Fabric Strength

Load (lb/sq in.)     5      15     35     50     70     80     90
Proportion failing   0.02   0.04   0.20   0.23   0.32   0.34   0.43

a. Make a scatterplot of the proportion failing versus the
load on the fabric.
b. Using the techniques introduced in this section, calculate
y′ = ln(p/(1 − p)) for each of the loads and fit the line
y′ = a + b(Load). What is the significance of a positive
slope for this line?
c. What proportion of the time would you estimate this
fabric would fail if a load of 60 lb/sq in. were applied?
d. In order to avoid a “wardrobe malfunction,” one would
like to use fabric that has less than a 5% chance of failing. Suppose that this fabric is our choice for a new
shirt. To have less than a 5% chance of failing, what
would you estimate to be the maximum “safe” load in
lb/sq in.?

Answers for Selected Exercises

4.73 $\ln\!\left(\frac{p}{1 - p}\right) = 4.589 - 0.0659x$ or
$p = \frac{e^{4.589 - 0.0659x}}{1 + e^{4.589 - 0.0659x}}$. When x = 40, p = 0.876.

4.74 Calculating the least-squares line for
y′ = ln(p/(1 − p)) against x = high-school GPA we get
y′ = −2.89399 + 1.70586x. Thus, the logistic regression
equation is $p = \frac{e^{-2.89399 + 1.70586x}}{1 + e^{-2.89399 + 1.70586x}}$. For x = 2.2, the
equation predicts $p = \frac{e^{-2.89399 + 1.70586(2.2)}}{1 + e^{-2.89399 + 1.70586(2.2)}} = 0.702$.

4.75 (a) The logistic regression equation is
$p = \frac{e^{-0.9171 - 0.10716x}}{1 + e^{-0.9171 - 0.10716x}}$.
(b) For x = 15, the equation predicts
$p = \frac{e^{-0.9171 - 0.10716(15)}}{1 + e^{-0.9171 - 0.10716(15)}} = 0.074$.

4.76 (a) [Figure: scatterplots of proportion hatching versus days of exposure for the lowland and mid-elevation data.]
(b)
Exposure (days) (x)   Cloud Forest Proportion (p)   y′ = ln(p/(1 − p))
1                     0.75                           1.09861
2                     0.67                           0.70819
3                     0.36                          −0.57536
4                     0.31                          −0.80012
5                     0.14                          −1.81529
6                     0.09                          −2.31363
7                     0.06                          −2.75154
8                     0.07                          −2.58669

The least-squares line relating y′ and x (where x is the
exposure time in days) is y′ = 1.51297 − 0.58721x.
The negative slope reflects the fact that as exposure time
increases, the hatch rate decreases.
(c) When x = 3, p = 0.438. When x = 5, p = 0.194.
(d) About 2.6 days.

4.77 (a) As elevation increases, the species becomes less
common. This is made clear in the table by the fact that the
proportion of plots where the lichen is common decreases
as the elevation values increase.
(b)
Elevation   Proportion of Plots with Lichen (p)   y′ = ln(p/(1 − p))
400         0.99                                   4.595
600         0.96                                   3.178
800         0.75                                   1.099
1000        0.29                                  −0.895
1200        0.077                                 −2.484
1400        0.035                                 −3.317
1600        0.01                                  −4.595

The least-squares line is y′ = 7.537 − 0.00788x, where
x = elevation.
(c) The logistic regression equation is
$p = \frac{e^{7.537 - 0.00788x}}{1 + e^{7.537 - 0.00788x}}$. For x = 900, the equation predicts
$p = \frac{e^{7.537 - 0.00788(900)}}{1 + e^{7.537 - 0.00788(900)}} = 0.609$.

4.78 (a)
Concentration   Number of    Number   Proportion
(g/cc)          Mosquitoes   Killed   Killed       y′ = ln(p/(1 − p))
0.10            48           10       0.208333     −1.33500
0.15            52           13       0.250000     −1.09861
0.20            56           25       0.446429     −0.21511
0.30            51           31       0.607843      0.43825
0.50            47           39       0.829787      1.58412
0.70            53           51       0.962264      3.23868
0.95            51           49       0.960784      3.19867

[Figure: scatterplot of proportion killed versus concentration.]
(b) The least-squares line relating y′ and x (where x is the
concentration in g/cc) is ŷ′ = −1.55892 + 5.76671x. The
positive slope reflects the fact that as the concentration
increases, the proportion of mosquitoes that die increases.
(c) When p = 0.5, y′ = ln(p/(1 − p)) = ln(0.5/(1 − 0.5))
= 0. So, solving −1.55892 + 5.76671x = 0 we get x =
1.55892/5.76671 = 0.270. LD50 is estimated to be around
0.270 g/cc.

4.79 (a) [Figure: scatterplot of proportion failing versus load.]
(b)
Load   Proportion Failing   y′ = ln(p/(1 − p))
5      0.02                 −3.892
15     0.04                 −3.178
35     0.2                  −1.386
50     0.23                 −1.208
70     0.32                 −0.754
80     0.34                 −0.663
90     0.43                 −0.282

The least-squares line is y′ = −3.579 + 0.03968x, where
x = load applied.
(c) The logistic regression equation is
$p = \frac{e^{-3.579 + 0.03968x}}{1 + e^{-3.579 + 0.03968x}}$. For x = 60, the equation predicts
$p = \frac{e^{-3.579 + 0.03968(60)}}{1 + e^{-3.579 + 0.03968(60)}} = 0.232$.
(d) For p = 0.05, y′ = ln(0.05/0.95) = −2.944. So, we
need −3.579 + 0.03968x = −2.944. Solving for x, we get
x = (−2.944 + 3.579)/0.03968 = 15.989 lb/sq in.