Sample project 2

Introduction:
The birth rate in the state of Massachusetts (MA) has been steadily decreasing for years,
according to the MA Executive Office of Health and Public Safety, which makes it an issue worth
investigating. Intrigued by this decrease, we wanted to examine whether and how certain
socioeconomic factors may be influencing the decline. We therefore studied whether there is a
statistically significant relationship between the average birth rate (live births per 1,000
population) in Massachusetts between 1990 and 2009 and three household economic factors. We
theorized that the decreasing annual birth rate may be tied to households having less
disposable income on average. The three independent variables we chose to represent these
household economic factors were median annual household income, annual unemployment rate and
average annual cost of a movie ticket (at a national chain). We examined the yearly changes in
these variables in MA from 1990 to 2009 to detect any patterns. To test the relationship, we
used multiple regression analysis, since both the dependent variable (annual average birth
rate) and the independent predictor variables (detailed below) are quantitative. The
independent variables are defined as follows:



- Median Annual Household Income in MA: households as reported by March of the following year;
  income in CPI-U-RS adjusted U.S. dollars.
- Average Movie Ticket Price in MA: a randomly selected social activity. The average cost was
  determined from national theater chain branches in MA, in U.S. dollars.
- MA Annual Unemployment Rate: percentage of the statewide labour force; not seasonally
  adjusted.
Test Assumptions:
1. Linearity
 Scatter plots were constructed to assess the linearity between each independent variable
and the dependent variable:
[Figure: scatter plots of BirthRate against MedianHouseholdIncome, UnemploymentRate and
MovieTicketPrice]
These plots show a negative linear relationship between birth rate and movie ticket price,
and between birth rate and median household income. The unemployment rate, however,
appears to have a very weak linear relationship with birth rate. Based on the graph,
there seems to be almost no linear relationship between the two variables, which is
contrary to our hypothesis that a higher unemployment rate would decrease the birth rate
in Massachusetts. We keep these observations in mind when running our analysis.
2. Zero Mean
 Always satisfied by the residuals of a least-squares multiple regression that includes
an intercept, since the residuals sum to zero by construction.
3. Constant Variance
 Based on the residual plot, the residuals appear to be scattered roughly randomly above
and below the zero line, suggesting constant variance. However, it is hard to reach a
clear-cut decision from the plot because our sample is so small (20 data points). There
does not seem to be a fanning ("plot thickens") or curved pattern in the plot, but there
is clearly one outlier at the top of the graph.
4. Normality
 Based on the normal probability plot (NPP) of the residuals, the residuals lie fairly
close to the reference line, so we presume they are approximately normally distributed,
although this is hard to judge (again, our small sample size may contribute to the
uncertainty). We therefore treat the normality assumption as met.
5. Independence
 We presume that the independence assumption is also met, since the rates/prices of the
independent variables in one year do not, in our opinion, substantially affect the
rates/prices of the same variables in other years. Annual income, unemployment rates and
ticket prices are driven by the events, economic conditions and political climate of each
particular year, which are not necessarily substantially affected by, or predictable from,
previous years. We also initially assumed that our predictor variables were independent of
each other; the multicollinearity analysis below shows that this does not hold for median
household income and movie ticket price.
6. Randomness
 The population for the study is the state of Massachusetts, and we selected a two-decade
period (1990-2009) to study trends between our chosen variables. We used all of the data
available to us for this period, so our data are not a random sample; we consider this
acceptable because this state and period are the focus of our study.
Test Analysis:
The model we fit is:
BirthRate = β0 + β1*MedianHouseholdIncome + β2*UnemploymentRate + β3*MovieTicketPrice + ε
(for the state of Massachusetts, by year, over 1990-2009)
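The coefficients of this model are estimated by ordinary least squares. As a minimal sketch of that step (not the SPSS procedure the study actually used), the Python code below solves the normal equations (X'X)b = X'y by Gaussian elimination. The function names `solve_linear_system` and `fit_ols` are hypothetical, and the data are hypothetical too: they are generated from known coefficients with no noise, so the fit should recover those coefficients exactly.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_ols(predictor_rows, y):
    """Return [b0, b1, ..., bp] for y = b0 + b1*x1 + ... + bp*xp."""
    design = [[1.0] + list(row) for row in predictor_rows]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical predictors for 20 "years" (NOT the study's data).
rows = [(i, (i * i) % 7, (3 * i) % 5) for i in range(20)]
true_b = [2.0, 0.5, -1.5, 3.0]
y = [true_b[0] + true_b[1] * a + true_b[2] * b + true_b[3] * c for a, b, c in rows]

coefs = fit_ols(rows, y)
print([round(v, 6) for v in coefs])  # [2.0, 0.5, -1.5, 3.0]
```

With the real data, `rows` would hold the 20 yearly (income, unemployment, ticket price) triples and `y` the birth rates; a statistics package would additionally report standard errors and p-values.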
Model Summary (b)
Model | R        | R Square | Adjusted R Square | Std. Error of the Estimate
1     | .908 (a) | .824     | .791              | .47044
a. Predictors: (Constant), MovieTicketPrice, UnemploymentRate, MedianHouseholdIncome
b. Dependent Variable: BirthRate
ANOVA (a)
Model 1    | Sum of Squares | df | Mean Square | F      | Sig.
Regression | 16.611         | 3  | 5.537       | 25.018 | .000 (b)
Residual   | 3.541          | 16 | .221        |        |
Total      | 20.152         | 19 |             |        |
a. Dependent Variable: BirthRate
b. Predictors: (Constant), MovieTicketPrice, UnemploymentRate, MedianHouseholdIncome
Coefficients (a)
Model 1               | Unstd. B   | Std. Error | Std. Beta | t      | Sig. | Tolerance | VIF
(Constant)            | 16.781     | 1.103      |           | 15.211 | .000 |           |
MedianHouseholdIncome | -6.987E-005| .000       | -.575     | -.868  | .398 | .025      | 40.029
UnemploymentRate      | .131       | .076       | .224      | 1.728  | .103 | .656      | 1.525
MovieTicketPrice      | -.233      | .588       | -.256     | -.396  | .697 | .026      | 38.151
(Unstd. B and Std. Error are the unstandardized coefficients; Std. Beta is the standardized
coefficient; Tolerance and VIF are the collinearity statistics.)
a. Dependent Variable: BirthRate
We can assess the effectiveness of the overall model:
H0: β1 = β2 = β3 = 0
Ha: at least one of β1, β2, β3 is not 0
The F-statistic of 25.018 (df = 3, 16) gives a p-value of < 0.01, so we reject the null
hypothesis and conclude that there is a useful relationship between average birth rate in
Massachusetts and our combination of predictor variables. Together, median household income,
unemployment rate and movie ticket price account for a significant amount of the variability
in birth rate. The adjusted R-square value of 0.791 indicates that the model explains about
79.1% of the variability in MA birth rates after correcting for sample size and the number of
predictors, which is quite good.
We can test the individual betas to determine their individual effectiveness:
H0: βi = 0
Ha: βi ≠ 0
Since the p-value is > 0.05 for all three predictors (.398, .103 and .697), we fail to reject
the null hypotheses: we do not have evidence of a significant linear relationship between any
single independent variable and average birth rate, after accounting for the other
predictors. So none of our individual predictor variables is significant, even though the
model as a whole is highly significant (p < 0.001, reported by SPSS as .000). This pattern
suggests a possible multicollinearity issue. Additionally, the coefficient for median
household income is very small (-6.987E-005) because income is measured in dollars; rescaling
the variable to units of $10,000 would make the coefficient easier to interpret.
To check for multicollinearity, we examined the correlations between the coefficient estimates:
Coefficient Correlations (a)
Model 1 (Correlations) | MovieTicketPrice | UnemploymentRate | MedianHouseholdIncome
MovieTicketPrice       | 1.000            | -.541            | -.986
UnemploymentRate       | -.541            | 1.000            | .571
MedianHouseholdIncome  | -.986            | .571             | 1.000
a. Dependent Variable: BirthRate
Based on the above table, the coefficient estimates for movie ticket price and median household
income are very highly correlated (r = -0.986), which indicates that these two predictors are
themselves very strongly related. This could be the source of the problem in our model and
explain why none of our individual predictor variables is significant: once median household
income is in the model, movie ticket price adds little and explains almost no new variability
(and vice versa). This is confirmed by the very large VIF values for these two variables:
VIF = 40.029 for median household income and VIF = 38.151 for movie ticket price.
Conclusion:
The fitted model is:
BirthRate = 16.781 - 6.987E-005*MedianHouseholdIncome + 0.131*UnemploymentRate
- 0.233*MovieTicketPrice
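To make the fitted equation concrete, the sketch below evaluates it for one set of inputs. The function name `predict_birth_rate` and the inputs ($60,000 income, 5.0% unemployment, $7.00 ticket) are hypothetical illustrations, not observed MA values; because of the multicollinearity discussed above, the individual coefficients are unstable, and predictions outside the range of the 1990-2009 data should not be trusted.

```python
def predict_birth_rate(median_income, unemployment_rate, ticket_price):
    """Fitted 1990-2009 MA model; returns predicted live births per 1,000 population."""
    return (16.781
            - 6.987e-05 * median_income     # income in dollars
            + 0.131 * unemployment_rate     # unemployment in percent
            - 0.233 * ticket_price)         # ticket price in dollars


# Hypothetical inputs (not observed values): $60,000 income, 5.0% unemployment, $7.00 ticket.
print(round(predict_birth_rate(60_000, 5.0, 7.00), 3))  # 11.613
```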
Based on the SPSS output, the following relationships are seen between our variables:
- For every $1 increase in MA annual median household income, the predicted average MA birth
rate for 1990-2009 decreases by 6.987E-005 (births per 1,000), holding the other predictors
constant.
- For every one-percentage-point increase in the MA unemployment rate, the predicted average MA
birth rate for 1990-2009 increases by 0.131, holding the other predictors constant.
- For every $1 increase in the MA movie ticket price, the predicted average MA birth rate for
1990-2009 decreases by 0.233, holding the other predictors constant.
In context, the model suggests that annual MA birth rates are negatively associated with median
household income and movie ticket prices, and positively associated with unemployment rate (the
negative associations with income and ticket prices are consistent with the scatter plots of our
data seen initially). This conclusion is highly surprising, as we expected increased income to
allow for an increased birth rate (greater disposable income would lend greater financial
stability to start a family), and an increased unemployment rate to decrease birth rates (less
financial ability to provide for a family). The negative relationship between movie ticket
prices and birth rates was the only one that matched our initial predictions; interestingly,
movie ticket price adds the least new explained variability to the model (it has the largest
p-value, .697) and should be taken out. These contrary findings lead us to conclude that our
model has multicollinearity issues and that incorporating other predictor variables (especially
a replacement for unemployment rate, which showed low linearity in the initial scatter plot)
might be useful. However, it is also important to keep our small sample size in mind when
interpreting these relationships and using them to predict or infer current changes in MA birth
rates.
The result of our overall model test (p-value < 0.01) leads us to conclude that our model is
useful in explaining the relationship between average annual birth rate in Massachusetts and the
combination of annual median household income, annual unemployment rate and annual average movie
ticket price for the years 1990-2009, or at least that our selected predictors are somewhat
useful when considered together. Together, these three predictors account for a significant
amount of the variability in birth rates; our adjusted R-squared value of 0.791, which is quite
high, further illustrates this (79.1% of the variability in MA birth rates is explained by the
combination of predictors, after correcting for sample size and the number of predictors).
However, the individual beta tests yield conflicting results.
The individual beta tests for our predictor variables returned p-values greater than 0.05,
making each predictor individually insignificant. Based on our initial scatter plots, this
conclusion shouldn't be surprising for unemployment rate (the scatter plot indicated that there
probably isn't a linear relationship between unemployment rate and birth rate at all), but it is
surprising for movie ticket price and median household income, since the scatter plots indicated
fairly strong linear relationships between these variables and our dependent variable. This can
be explained by the multicollinearity present in our model. Based on the coefficient
correlations and VIF scores, annual household income and average movie ticket price are strongly
correlated with each other and compete to explain the same variability in our response
variable. While the model is useful overall and the predictors seem generally good in
combination, we will need to refine the model to improve its predictive power and better explain
the relationship between MA average birth rates and socioeconomic factors. Based on the results,
we suggest removing the movie ticket price variable (because of the high collinearity), rescaling
the median household income term (since its coefficient is so small), removing the outlier, and
then re-running the model to see if there is any improvement. We might also investigate other
predictors that can help explain changing annual birth rates, especially as a replacement for
unemployment rate, since the scatter plot shows low linearity between unemployment rate and birth
rate. Additionally, since we were unclear on our assumptions of normality and constant variance
and we have a small sample size, non-parametric methods such as bootstrapping would be
beneficial.
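One way to follow up on the bootstrapping suggestion is a percentile bootstrap confidence interval for a regression slope, resampling (x, y) pairs with replacement. The sketch below handles a single predictor for brevity (the same resampling idea applies to the three-predictor model); the function names are hypothetical, and pairs resampling treats the 20 years as exchangeable, which ignores any year-to-year dependence. The usage line is only a sanity check on hypothetical noiseless data, where every resampled slope equals 2, so the interval collapses to about (2, 2).

```python
import random


def slope(xs, ys):
    """Least-squares slope of y on x (simple regression)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def bootstrap_slope_ci(xs, ys, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the slope, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    estimates = []
    while len(estimates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:  # slope is undefined if every resampled x is identical
            continue
        estimates.append(slope(bx, [ys[i] for i in idx]))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical noiseless data (NOT the study's): y = 1 + 2x.
xs = list(range(20))
ys = [1 + 2 * x for x in xs]
lo, hi = bootstrap_slope_ci(xs, ys)
print(lo, hi)
```

With the study's data, `xs` and `ys` would be one yearly predictor and the yearly birth rates; an interval that excludes zero would suggest a nonzero slope without relying on the normality assumption.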
This study was very interesting to construct and perform, and provided some useful insight into
factors that, on average, were associated with MA birth rates over the past two decades
(1990-2009), an issue we find socially important. It helps us narrow down some of the factors
that may influence state birth rates and furthers our understanding of this important social
trend. Among our findings, we were surprised to see the collinearity between movie ticket price
and median household income, as we had expected to see, if any, a collinearity issue between
median household income and unemployment rate. Additionally, the cost of an average movie ticket
almost doubled from 1990 to 2009, while median household income increased at a much slower rate;
evidently, the cost of a movie ticket in MA grew faster than median household income. It will be
very interesting to see how MA birth rates and our predictor variables change over the coming
years.