Stat 401 - Lab 10, part b: Constructing indicator variables

Goals: In this second part of lab 10, we’ll see how to construct indicator variables ‘by hand’ and how to
use JMP’s automatically constructed indicator variables.
Constructing indicator variables by hand: A couple of ways:
1) edit the data set by hand and add a column with the indicator values.
2) write a formula to do this. The formula function that does this best is match(). JMP provides a ‘fill in
values automatically’ option, which is extremely helpful but hard to find. The JMP 11 documentation says
the option dialog you need appears automatically. That doesn’t happen for me. Here is what I find
works:
Create a new column one of the usual ways (e.g., double click on a blank column, select column
properties then formula). The usual formula construction box appears. Right click on the variable that
defines the groups (e.g. time in the light / flower production example).
THROUGHOUT THE NEXT TWO STEPS, hold down the shift key (i.e. shift-click all steps).
Right click Conditional, then select match (reminder, shift key is held down). You should see the
following template with some values partially filled in:
JMP has filled in the name of the defining variable (time) and the levels found in the data (“E” and “L”).
The third row is for ‘not in the above list’, which we won’t use.
You can release the shift key now.
left-click the box to the right of “E” => (with the text “then clause”) and fill in the indicator value for the
“E” group. To match the class example, this should be 0.
repeat for “L”, but the indicator value should be 1 for the “L” observations.
This is what that dialog looks like when filled in:
click OK. A new column will be added. You can change the column name to something more interesting
in any of the usual ways (double left click on the name, or right click on the name/column, select Column
info, then change the name, or select the column, then Cols / Column Info).
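For readers who want to check the result outside JMP, the filled-in Match template amounts to a lookup from group label to indicator value. The lines below are a minimal Python sketch of that idea, not JMP formula syntax; the data values and the names time, indicator_value and time_ind are made up for illustration.

```python
# Hypothetical data: the grouping variable "time" with levels "E" and "L".
time = ["E", "L", "L", "E", "L"]

# Lookup table playing the role of the filled-in Match template:
# "E" => 0 and "L" => 1, as in the class example.
indicator_value = {"E": 0, "L": 1}

# dict.get returns None for a level that is not in the table, which stands
# in for the unused 'not in the above list' row of the template.
time_ind = [indicator_value.get(level) for level in time]

print(time_ind)  # [0, 1, 1, 0, 1]
```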
Using an indicator variable in a regression:
Just add the variable to the model.
Creating an interaction variable: Two ways
1) use the formula dialog to create a variable with the product of the two variables (e.g. two continuous
variables or a continuous variable and an indicator):
bring up the formula dialog
double left click one of the two variables to put it in the formula box
left click X in the operations box (top center with various math operations)
double left click the second variable to put it in the equation
click ok
2) create the product “on the fly” when fitting the regression:
Analyze / Fit model to bring up the model dialog
Add the two variables to the Construct Model Effects box
Either: select both variables in the “Select columns” dialog (use shift- or ctrl-left click)
or: select one variable in the Construct Model Effects box and the other variable in the “Select
Columns” dialog.
left-click the “Cross” button.
Click the red triangle by Model Specification and click on “Center Polynomials” to unselect that
option. (We’ll talk about centering polynomials in lecture. Class examples will not use centering, but
JMP does it by default. That’s why, to match class examples, you have to switch off that default
behaviour.)
Run the model.
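To see what the product variable contains, and why centering changes it, here is a small Python sketch (not JMP syntax). The data are made up, and the centered version rests on an assumption: that centering means subtracting each column's mean before multiplying. Check the JMP documentation for exactly which columns it centers.

```python
# Hypothetical data: a continuous variable and a hand-made 0/1 indicator.
light = [150.0, 300.0, 450.0, 600.0]
time_ind = [0.0, 1.0, 0.0, 1.0]

# Uncentered product: what the formula dialog recipe produces, and what
# the class examples use.
product = [x * d for x, d in zip(light, time_ind)]

def centered(values):
    """Subtract the mean from every value."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Centered product. Assumption: each column has its mean subtracted
# before the two are multiplied.
centered_product = [x * d for x, d in zip(centered(light), centered(time_ind))]

print(product)           # [0.0, 300.0, 0.0, 600.0]
print(centered_product)  # [112.5, -37.5, -37.5, 112.5]
```

The two product columns differ, so the fitted coefficients for the other terms differ too. That is why the centering option has to be switched off to reproduce the class examples.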
Creating indicator variables automatically in JMP:
When you use a nominal variable (red bars) in a regression model, JMP automatically converts that to
one or more indicator variables. If the nominal variable has two levels (e.g. time with “E” and “L”), JMP
creates one indicator variable. If it has k levels, JMP creates k-1 indicators.
In lecture, I said (or will say) that the definition of indicator variables is arbitrary. Class and the text use
‘last level is 0’ indicators. JMP uses +1/-1 indicators. For two levels, the first, e.g., “E”, gets a +1 and the
last, e.g. “L”, gets a -1. To create this indicator automatically, Analyze / Fit model, then add the nominal
variable, e.g. time, (not the indicator version of time) to the model. Then run the model. My example
has time (nominal) and light (continuous) in the model.
You can see the indicator variable that JMP constructs after fitting the model by:
left click the red triangle by Response flowers (top left of the results box)
select Save Columns / Save Coding Table
You get a new data window with a new variable time[E]. This is the indicator JMP created.
If your nominal variable has k levels, you will get k-1 new indicator variables.
If you look at the coding table, you will see that the JMP-created indicator has values of +1 (for E) and -1
(for L).
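The +1/-1 scheme extends to k levels. The Python sketch below (not JMP syntax) assumes the usual convention for this kind of coding: each of the first k-1 levels gets its own column, with +1 for that level, -1 for the last level, and 0 otherwise. The function name effect_code and the three-level example are hypothetical.

```python
def effect_code(values, levels, name="time"):
    """Return k-1 columns of +1/0/-1 codes, one per level except the last."""
    last = levels[-1]
    columns = {}
    for level in levels[:-1]:
        columns[f"{name}[{level}]"] = [
            1 if v == level else (-1 if v == last else 0) for v in values
        ]
    return columns

# Two levels: one column, +1 for "E" and -1 for "L".
two = effect_code(["E", "L", "L", "E"], ["E", "L"])
print(two)  # {'time[E]': [1, -1, -1, 1]}

# Three hypothetical levels: two columns, the last level is -1 in both.
three = effect_code(["a", "b", "c", "a"], ["a", "b", "c"], name="grp")
print(three)  # {'grp[a]': [1, 0, -1, 1], 'grp[b]': [0, 1, -1, 0]}
```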
The parameter estimates include a Term labeled time[E]. This is the regression estimate for the
indicator variable with +1/-1 coding. This is shown in the Parameter Estimates box in the screen shot
below.
You get a more interpretable version of this, especially with more than two levels, by:
left click the red triangle by Response flowers (top left of the results box)
select Estimates / Expanded Estimates
The Expanded Estimates box has one value for each level of the nominal variable. This is the amount
that is added to the intercept when an observation is in the specified group.
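With +1/-1 coding, the value for the last level is not estimated separately. It is the negative of the sum of the other levels' estimates, so the values add to zero across levels. A minimal Python sketch of that bookkeeping follows; the function name expand_estimates is made up, and 6.079 is the time[E] estimate from this lab's output.

```python
def expand_estimates(estimates, last_level):
    """Add the last level's value: minus the sum of the other estimates."""
    expanded = dict(estimates)
    expanded[last_level] = -sum(estimates.values())
    return expanded

# Two levels: time[L] is simply the negative of time[E].
expanded = expand_estimates({"time[E]": 6.079}, "time[L]")
print(expanded)  # {'time[E]': 6.079, 'time[L]': -6.079}
```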
Here are both the Parameter Estimates and Expanded Estimates output:
The intercept for an “E” observation is Intercept + time[E] = 77.395 + 6.079 = 83.46 ; that for an “L”
observation is Intercept + time[L] = 77.395 + (-6.079) = 71.30.
The regression coefficient (e.g. the 6.079) changes for different choices of indicator coding. It’s 12.158
for 0/1 coding with last = 0. But, no matter what coding is used, the intercept for the E group is always
83.46; that for the L group is always 71.30.
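The same invariance can be checked numerically. The Python sketch below (not JMP) fits a one-predictor least-squares line under both codings. It uses made-up responses rather than the flower data, and it leaves out the continuous variable to keep the arithmetic short. Under those assumptions, the 0/1 coefficient should be twice the +1/-1 coefficient (as 12.158 is twice 6.079), while the fitted value for each group is the same under both codings.

```python
def fit_line(x, y):
    """Least-squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up responses for illustration only.
group = ["E", "E", "E", "L", "L"]
y = [10.0, 12.0, 14.0, 4.0, 6.0]

codings = {
    "+1/-1": {"E": 1, "L": -1},
    "0/1, last = 0": {"E": 1, "L": 0},
}

results = {}
for name, code in codings.items():
    b0, b1 = fit_line([code[g] for g in group], y)
    results[name] = {
        "coefficient": b1,
        "E": b0 + b1 * code["E"],  # fitted value for the E group
        "L": b0 + b1 * code["L"],  # fitted value for the L group
    }
    print(name, results[name])
```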