Lab 10 instructions

Stat 301 – Lab 10
Goals: In this lab, we will see how to:
fit models with indicator variables
create indicator variables
fit models to factor variables with automatically created indicator variables
fit models with both qualitative and quantitative variables, possibly including an interaction
Fit models with indicator variables for qualitative variables:
Load the BIDMAINT.txt data set. That data set has one qualitative factor, State, with 3 levels: Kansas, Kentucky, and
Texas. This data set already includes an X1 and X2 variable. The coding of X1 and X2 is the one used in the text and
emphasized in lecture.
1. Look at the data set and compare the value in the State variable and the value in the X1 variable. X1 is an
indicator variable. That means it has the value of 1 for some level of State. X1 is an indicator for which state?
X2 is also an indicator variable. For which state?
2. To fit a regression, use Analyze / Fit Model and use X1 and X2 as the X variables. The parameter estimates are
the values reported in the text. Interpretation of the regression slopes for this coding (0/1 values) was explained
in lecture and in the text.
To create indicator variables from a qualitative variable:
Load the BIDMAINT.txt data set. Here is how to create X1 and X2 if they weren't already there.
Select the column containing the values of the qualitative variable, then right-click and choose Cols / Utilities. The
Utilities menu opens.
Choose Make Indicator Columns. Three new columns will appear in the data set. Their names are the values of State
(so Kansas, Kentucky, and Texas here). Each is a red bar variable (qualitative) even though the values are 0 and 1. Turn
each into a quantitative variable by left-clicking on the red bar by the variable name and selecting continuous (instead of
nominal).
Note: This is a new feature in JMP 12 Pro (the version you should be using). If you don't see this Utility menu, check you
are using JMP 12 Pro. If the menu items are grayed out, you forgot to select the State column before selecting Cols /
Utilities.
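For readers who want to see what Make Indicator Columns produces, here is a minimal sketch in Python. It is not part of the JMP workflow. The State values are made up for illustration and are not the BIDMAINT.txt data, and the helper name make_indicators is hypothetical.

```python
# Made-up State values for illustration; not the actual BIDMAINT.txt column.
states = ["Kansas", "Kentucky", "Texas", "Kansas", "Texas"]


def make_indicators(values):
    """Return one 0/1 indicator column per distinct level, keyed by level name."""
    levels = sorted(set(values))
    return {level: [1 if v == level else 0 for v in values] for level in levels}


indicators = make_indicators(states)
for level, column in indicators.items():
    print(level, column)
```

Each row has exactly one indicator equal to 1, which is why only two of the three columns (X1 and X2 in the text's coding) are needed in a model that also has an intercept.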
Automatically create indicator variables for a qualitative variable:
JMP will automatically create +1/-1 indicator variables. Load the BIDMAINT.txt data set, then select Analyze / Fit Model.
1. You will notice a set of red bars by State. That tells you that State is a qualitative variable (which JMP calls a
nominal variable).
2. Select Cost as the Y variable and State as the X variable, then run the model.
3. Some parts of the output are identical to the multiple regression output we’re familiar with:
Summary of Fit: familiar, Root Mean Square Error is a very useful number
Analysis of Variance: familiar. Compares the full model (3 means) to the “intercept only” model.
Parameter Estimates: these are “hidden” (because they correspond to the specific coding used by JMP). To see
them, click the sideways open triangle by “Parameter Estimates”. You see three rows: Intercept, STATE[Kansas],
and STATE[Kentucky]. The two “slopes” are for indicator variables that JMP creates for you. These use the -1/+1
coding that I briefly discussed in lecture.
Residual by Predicted Plot: familiar
Some parts of the output are new:
Effect Tests: This is a comparison of the model that includes STATE to a model without. When there is only one
factor, the effect test for STATE is the same as the test in the Analysis of Variance box because both are
comparing the same pair of models.
Least Squares Means: When a model has a qualitative variable, the real interest is the means for each level of
the factor. These are calculated from the parameter estimates and reported in this table. For the models we’re
considering, the Least Squares Mean and Mean are the same quantity. The Standard Error is what you expect it
to be. You can also get a 95% confidence interval for each mean by clicking in the table, and selecting the Lower
95% and Upper 95% items.
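The quantities in these tables can be reproduced by hand for a one-factor model. The sketch below is a Python illustration, not JMP output. The costs are made-up numbers, not the BIDMAINT.txt values, and the standard error line assumes the usual pooled formula (Root Mean Square Error divided by the square root of the level's sample size).

```python
import math

# Made-up costs by state for illustration; not the BIDMAINT.txt values.
costs = {
    "Kansas": [10.0, 12.0, 14.0],
    "Kentucky": [20.0, 22.0],
    "Texas": [30.0, 34.0, 32.0, 36.0],
}

all_values = [v for vals in costs.values() for v in vals]
n = len(all_values)   # number of observations
k = len(costs)        # number of levels of the factor
grand_mean = sum(all_values) / n
level_means = {s: sum(vals) / len(vals) for s, vals in costs.items()}

# Full model (one mean per level) versus the intercept-only model.
ss_model = sum(len(vals) * (level_means[s] - grand_mean) ** 2
               for s, vals in costs.items())
ss_error = sum((v - level_means[s]) ** 2
               for s, vals in costs.items() for v in vals)

f_ratio = (ss_model / (k - 1)) / (ss_error / (n - k))
rmse = math.sqrt(ss_error / (n - k))

# Assumed pooled standard error of each level mean.
std_errors = {s: rmse / math.sqrt(len(vals)) for s, vals in costs.items()}

print("F ratio:", f_ratio)
print("Root Mean Square Error:", rmse)
print("Level means:", level_means)
print("Standard errors:", std_errors)
```

With only one factor in the model, this single F ratio is both the Analysis of Variance test and the Effect Test for the factor.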
To see what indicator variables JMP creates for you (optional):
1. After fitting the ANOVA model (using Analyze / Fit Model), click the red triangle by Response Cost, select Save
Columns and Save Coding Table (very last item on the menu). A new data set window opens with one row for
each observation and the indicator variables that JMP creates for you. If you look at them, you see that JMP
uses 1 / 0 / -1 values. This coding changes the parameter estimates for each indicator variable (which is why
they’re hidden) but doesn’t change the SS associated with each factor or the least squares means for each level.
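To see why the coding changes the parameter estimates but not the level means, here is a small Python sketch. It relies on the fact that a one-factor model fits each level's mean exactly, so the coefficients under either coding can be written in terms of those means. The data are made up, and treating Texas as the reference level (0/1 coding) and as the level coded -1 (1/0/-1 coding) is an assumption made only for illustration.

```python
# Made-up costs by state for illustration; not the BIDMAINT.txt values.
costs = {
    "Kansas": [10.0, 12.0, 14.0],
    "Kentucky": [20.0, 22.0],
    "Texas": [30.0, 34.0, 32.0, 36.0],
}
means = {s: sum(v) / len(v) for s, v in costs.items()}
coded_levels = ("Kansas", "Kentucky")  # levels that get their own indicator

# 0/1 coding, Texas assumed to be the reference level:
# intercept = reference mean, slopes = differences from the reference mean.
b0_dummy = means["Texas"]
b_dummy = {s: means[s] - means["Texas"] for s in coded_levels}

# 1/0/-1 coding, Texas assumed to be coded -1 in both columns:
# intercept = unweighted average of the level means,
# slopes = deviations of each level mean from that average.
b0_effect = sum(means.values()) / len(means)
b_effect = {s: means[s] - b0_effect for s in coded_levels}


def mean_from_dummy(state):
    """Level mean implied by the 0/1 coefficients."""
    return b0_dummy + b_dummy.get(state, 0.0)


def mean_from_effect(state):
    """Level mean implied by the 1/0/-1 coefficients."""
    if state in b_effect:
        return b0_effect + b_effect[state]
    return b0_effect - sum(b_effect.values())


for s in costs:
    print(s, mean_from_dummy(s), mean_from_effect(s))
```

The intercepts and slopes differ between the two codings, yet both sets of coefficients give back the same mean for every state.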
To fit a model with both qualitative and quantitative variables, and optionally their interaction:
1. Load the bear.csv data set. This has the chest, sex and weight variables for the black bear data set. It also has
other variables, which we won't use here. Sex is the qualitative (nominal) variable; Isex is the corresponding
indicator variable, with 1 for Males and 0 for Females.
2. Add the desired variables to the model box in Analyze / Fit Model. If you create the indicator variable, you can
specify which level is the reference level. If JMP creates the indicator for you (i.e. the model includes the
nominal variable), it will use +1 / -1 coding. There is little practical difference when the qualitative variable has
two levels. I find results from 0/1 coding a little easier to use.
3. When there are more than two levels for the nominal variable, there is a huge advantage to letting JMP create
the variables: it knows that the k-1 indicator variables (for k levels) are related. That means JMP automatically
gives you the test that all k-1 regression coefficients = 0. If you create the indicator variables yourself, you have to use a
custom test.
4. To include their interaction, the easiest way is to create the product on the fly using Cross. This can be used to
specify the interaction as the cross of a nominal and a quantitative variable.
5. To fit the different slopes line to the bear data, add Chest and Sex to the model box, then select both variables
again and click Cross. Sex*Chest is added to the Model Effects box. If you run the model and look at the
parameter estimates, you see different results from what I gave in lecture, because of the different coding.
6. Open the Effect Tests box to see the effect-specific tests (the Chest*Sex test is the test that all interaction
coefficients = 0).
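The different-slopes model can also be understood without JMP. With a 0/1 indicator and its product with Chest, the fitted model is equivalent to a separate straight line for each sex, so the four coefficients can be recovered from two simple regressions. The Python sketch below shows this. The chest and weight values are made up and are not the bear.csv data, and the helper name fit_line is hypothetical.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a simple linear regression."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope


# Made-up (chest, weight) values for illustration; not the bear.csv data.
female_chest = [30.0, 34.0, 38.0]
female_weight = [100.0, 140.0, 180.0]
male_chest = [32.0, 36.0, 40.0, 44.0]
male_weight = [110.0, 158.0, 206.0, 254.0]

f_int, f_slope = fit_line(female_chest, female_weight)
m_int, m_slope = fit_line(male_chest, male_weight)

# Coefficients of weight = b0 + b1*Chest + b2*Isex + b3*Chest*Isex,
# with Isex = 1 for males and 0 for females.
b0 = f_int              # female intercept
b1 = f_slope            # female slope
b2 = m_int - f_int      # difference in intercepts (male minus female)
b3 = m_slope - f_slope  # difference in slopes: the interaction coefficient

print("b0, b1, b2, b3 =", b0, b1, b2, b3)
```

The interaction coefficient b3 is the difference between the two slopes, so testing the interaction = 0 is testing whether the two lines are parallel.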
Self assessment: We'll use the bear data in bear.csv.
1. Use Fit Model to test whether males and females differ in average weight. (This is a t-test, but use Fit Model).
What is the p-value for this test?
2. Test whether males and females differ in average weight when compared at the same Chest size. What is the
p-value for the comparison of their weights?
3. What is the difference between average male and average female weight, when compared at the same chest
size?
4. What is the slope for Chest in the ANCOVA model with Chest and Sex?
5. Is there evidence of an interaction between chest size and sex?
Answers:
1. p = 0.20
2. p = 0.68
3. Males are 4.0 lbs heavier, when compared at the same chest size. The most straightforward way to get this
number is to switch to a 0/1 indicator variable and look at the regression coefficient. Isex is 1 when the bear is
male; the estimated regression coefficient is 4.0.
4. 12.65 lbs/inch. This is (and should be) the same value for models using Sex and models using Isex to describe
the bear's sex.
5. Yes, p = 0.029. This is the p-value for the test of sex*chest or Isex*chest = 0, or the Effect Test p-value for the
Sex*Chest interaction. If you had more than two levels for the qualitative variable, you would want to look at
the Effect Test to test all k-1 coefficients = 0 simultaneously.