Lab 2 - SmartSite

Lab 2
Genes and Behavior
Fall Quarter 2012
Introduction
To do this lab you will be using four software packages:
1. Mouse Models. A set of mouse behavioral development models written by Jeff
Schank.
2. Microsoft Excel. A spreadsheet program that allows you to rearrange,
manipulate, and graph data (it also has limited statistical analysis capabilities).
3. Microsoft Word. A word processing program that will allow you to combine text
with tables and graphics.
4. Statistica. A statistical analysis package that allows you to perform
sophisticated statistical analyses of your data.
There are five main parts to this lab (under “The Lab” heading):
o Lab Setup
o The Lab: Parental handling in Two Strains of Mice Reared by Foster Parents
   - Introduction and Methods
   - Procedure
   - Hypotheses
   - Statistical Analysis
   - Write Up
o Hypothesis Testing
Lab Setup
Step 1: Create a lab group folder in the PSC113 folder within your lab group folder. Once you
open your lab group folder, it should look something like Figure 1. You will need to come up
with a unique name for your lab group. Don’t make it too simple: you will be sharing this space
with other groups, and no two folders can have the same name. Figure 2 illustrates an example
lab folder.
Step 2: Copy the contents of the folder “class files” and paste them into the new folder you
created.
Figure 1. Inside your lab-group folder.
Figure 2. A new folder is created with the name “LabGroup-JKM”.
The Lab: Parental handling in Two Strains of Mice Reared by Foster Parents
In this lab, we will be following the 3 P’s of science (problem posing, problem solving, and
persuasion). The problem posing phase has been set up for you, so the lab will focus mainly on
the problem solving and persuasion phases. In the persuasion phase (writing up the lab report),
you will be asked to describe problems posed by these results and how you might investigate
them experimentally.
Introduction and Methods
In the early 1960s, experiments in behavioral genetics using different highly inbred strains of
animals demonstrated behavioral differences among strains. Under the assumption discussed
above that environmental conditions were held relatively constant, the results of those early
studies have been interpreted as demonstrating direct effects of genetics on behavior (i.e.,
upward causation). However, as Robert Ressler (1962) argued, not all aspects of a mouse pup’s
developmental environment can be held constant in the laboratory. Each mother may provide a
different prenatal and postnatal environment for her litter of pups. Thus, the prenatal and
postnatal environments of the pups are confounded with genotype. (Confounded in this context
means that previous experiments with inbred strains were not designed to consider the
possibility that the prenatal environment, the postnatal environment, or both may influence the
pups’ subsequent behavioral development.) Failure to examine the effects of these environments
leaves the question of direct genetic influence indeterminate.
The simulated experiment you will run and analyze is designed to provide some information
about possible differences between the two inbred strains of mice in the parental handling of
pups (a component of the pups’ postnatal environment). Handling is an important variable to
examine because it is well known that variations in the handling an animal receives, either from
its natural parents or from human lab technicians, can have profound influences on a number of
behavioral characteristics in adulthood. Thus, considering the effects of handling in the postnatal
environment is particularly important in assessing whether there are purely direct genetic effects
on behavior.
To adequately evaluate differences in handling due to the strain of mice, a simulated
cross-fostering scheme will be used. Ten litters of C57 mice will be reared by foster parents of
their own strain (CC), and 10 litters will be raised by BALB foster parents (CB). Similarly, 10
litters of the BALB strain will be raised by foster parents of their own strain (BB), and 10 litters
by C57 foster parents (BC).
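The four cross-fostering conditions can be sketched as a small data structure. This is a minimal Python illustration, not part of the Mouse Models software; the field names (condition, pup_strain, foster_strain, litter) are hypothetical, and the two-letter codes follow the convention above: the first letter is the pups’ strain and the second is the foster parents’ strain.

```python
from itertools import product

STRAINS = ("C57", "BALB")
LITTERS_PER_CONDITION = 10


def condition_code(pup_strain, foster_strain):
    """Two-letter code: pup strain first, foster-parent strain second (e.g. CB)."""
    return pup_strain[0] + foster_strain[0]


def build_design(n_per_condition=LITTERS_PER_CONDITION):
    """Return one record per litter for the 2 x 2 cross-fostering design.

    Every litter is fostered, including CC and BB litters, which go to
    different foster parents of their own strain.
    """
    litters = []
    for pup, foster in product(STRAINS, repeat=2):
        code = condition_code(pup, foster)
        for i in range(1, n_per_condition + 1):
            litters.append({"condition": code, "pup_strain": pup,
                            "foster_strain": foster, "litter": i})
    return litters


design = build_design()
print(sorted({row["condition"] for row in design}))  # ['BB', 'BC', 'CB', 'CC']
print(len(design))                                   # 40
```

Crossing pup strain with foster strain is what lets the design separate the pups’ genotype from the strain of the parents who rear them.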
The day after litters are switched, parental handling will be measured for 10 successive days. On
each test day, assume that the foster parents are removed from their home cage and placed in an
empty cage. The pups will be taken from the nest in their home cage and placed at the other end
of the cage, and the foster parents will then be returned to their home cage. The simulation will
then record the total number of seconds the foster parents spend handling the pups. Handling is
defined as carrying, dragging, or oral manipulation of a pup by either foster parent. The total
amount of handling recorded for a litter on each day will be divided by the number of pups in the
litter to produce a measure of the average amount of handling per pup each day.
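The per-pup measure is a simple division. The Python sketch below applies it to one hypothetical litter of 6 pups; the daily totals are invented for illustration and are not output from the simulation.

```python
def handling_per_pup(total_seconds, n_pups):
    """Average seconds of handling per pup for one litter on one test day."""
    if n_pups <= 0:
        raise ValueError("a litter must contain at least one pup")
    return total_seconds / n_pups


# Hypothetical totals (seconds of handling of the whole litter) for days 1 to 10.
daily_totals = [180, 171, 150, 144, 126, 120, 96, 90, 72, 60]
per_pup = [handling_per_pup(t, n_pups=6) for t in daily_totals]
print(per_pup[0], per_pup[-1])  # 30.0 10.0
```

Dividing by litter size presumably keeps litters of different sizes comparable, since a larger litter would otherwise tend to accumulate more total handling.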
Hypotheses:
H01: If the prenatal and postnatal environments are the same for these two inbred strains of mice,
then there should be no difference among the cross-fostering conditions in the mean time spent
handling each pup.
H11: If these environments differ, then there should be differences among conditions.
H02: The mean amount of handling each pup receives should decrease over time as the pups
mature and become more independent of their parents, but this decrease should not differ among
conditions if the prenatal and postnatal environments are the same for these two inbred strains of
mice.
H12: If there are differences, they may appear as a condition × day interaction effect.
Procedure
To run a simulation, double-click the file “Run Mouse Models” as illustrated in Figure 3. A
simulation window and control panel will open as illustrated in Figure 4.
Figure 3. File to double click for a simulation.
Figure 4. Simulation environment with the model tab circled in red.
Click the “Model” tab on the control panel in Figure 4. This tab contains several features, as
illustrated in Figure 5.
Figure 5. Control panel.
In the control panel, you can specify the model to be run. In this case, “Model 1” comes up by
default. The “delay” slider allows you to slow down or speed up a simulation. The variable
“N_per_Treatment” is the number of subjects (animals/litters) in each experimental condition.
When “saveDataToFile” is checked, the data from a simulation are saved to a data file. For now,
leave it unchecked. Finally, “dataFileName” allows you to specify the name of your data file. By
default, it is named “data.”
Let’s run a sample simulation without saving the data. Figure 5 shows the arrow in the lower left
corner (circled in red) that you click to start a simulation. You can pause a simulation by
clicking the vertical bars next to the start arrow and stop it by clicking the square to the right of
the vertical bars.
Now, let’s run a real simulation, just once! Check “saveDataToFile” as illustrated in Figure 5 and
start the simulation. Run it only once: if you run more than one simulation, the data from each
run will be appended to your data file, which will make it difficult to work with.
Figure 6. Snapshot of a simulation running but on pause.
Once you have run your simulation, you will find a new folder in your lab folder as depicted in
Figure 7.
Figure 7. Snapshot of the simulation folder in your lab folder.
When you open the “dataModel1” folder, you will find the data file named “data” (unless you
changed the name of the file before the simulation, in which case it will have the name you gave
it). To prepare this file for statistical analysis, it is best to convert it into an Excel file. Open
Excel and, in its Open dialog box, choose the “all files” option as indicated in Figure 8 so that the
data file is listed.
Figure 8. Snapshot of the Open File dialog box in Excel. The “all files” selection is circled in red.
Once you select the file named “data” and open it, you will get the dialog box in Figure 9. Click
“Finish” and it should open as an Excel file.
Figure 9. Dialog box in Excel for opening a text file with data.
If you have opened it up successfully, it will look like the file illustrated in Figure 10.
Figure 10. Data file opened in Excel.
The next step is to save the file as an Excel workbook. Choose “Save As” from the Excel File
menu. In the Save dialog box, make sure you save it in the “dataModel1” folder inside your lab
folder, and change the file type to Excel Workbook as illustrated in Figure 11.
Figure 11. Excel dialog box for saving a file. The file type selection is circled in red.
If you have done this successfully, then inside your “dataModel1” folder you should see the two
files depicted in Figure 12.
Figure 12. Contents of the folder “dataModel1”.
Statistical Analysis
You will be using a two-factor (parent strain and pup strain), repeated measures (handling each
day) analysis of variance to analyze these data (this is a mouthful! But you only need to know
what the results mean and how to interpret them). The two most difficult aspects of this part of
the lab will be getting the data into Statistica and running the analysis. After you have analyzed
the data in Statistica, you will have to interpret the results (see the section below on hypothesis
testing).
First, we need to open the data in Statistica. Open Statistica, choose the “Classical View”, and
look for the dialog box illustrated in Figure 13. Make sure it is set to open an Excel Workbook
(Figure 13) and then click “OK”.
Figure 13. Statistica dialog box. The “Open an Excel Workbook” is circled in red.
A Statistica dialog box will open as illustrated in Figure 14. Make sure it is set to open all file
types as indicated in Figure 14 and click “Open”.
Figure 14. Statistica dialog box. The “Files of type” is circled in red.
You will get another dialog box as illustrated in Figure 15. Select “Import selected sheet to a
Spreadsheet” as illustrated in Figure 15.
Figure 15. Another Statistica dialog box. The “Import selected sheet to a Spreadsheet” is circled
in red.
You will then get yet another dialog box! Check “Get variable names from first row” and click
“OK” (Figure 16).
Figure 16. Yet another Statistica dialog box.
When another dialog box comes up, click “Import as Text Labels” (Figure 17).
Figure 17. Another Statistica dialog box.
You should now see a file in Statistica that looks like Figure 18.
Figure 18. Data file opened in Statistica.
Now we are ready to set up the statistical analysis. From the Statistics menu, find the general
linear model option (Figure 19).
Figure 19. Statistics menu in the classical view of Statistica.
After you select the general linear model, choose the “repeated measures” option as illustrated in
Figure 20. Click “OK” and the dialog box in Figure 21 will appear.
Figure 20. General linear model dialog box with “repeated measures” selected.
Start by clicking the “Variables” button as illustrated in Figure 21. Another dialog box will
appear with two windows (Figure 22). In the first window, select the dependent variables D1 to
D10. In the second window, select the categorical variable “condition”. Once you have made the
selections, click “OK”.
Figure 21. GLM repeated measures dialog box.
Figure 22. Dialog box for assigning dependent and categorical variables.
Now you will be able to click the “Categorical Factors” button in Figure 21. When you do, the
dialog box illustrated in Figure 23 will appear. Select “All” and click “OK”.
Figure 23. Dialog box for categorical factors.
Finally, click the “Within Effects” button in Figure 21. The dialog box in Figure 24 will appear.
Enter 10 as the number of levels, use the factor name “DAYS”, and click “OK”.
Figure 24. Within-subjects dialog box.
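The layout used here, one row per litter with a “condition” column and the ten day columns D1 to D10 serving as the levels of DAYS, can also be summarized outside Statistica as a cross-check. The Python sketch below assumes the data file is tab-delimited text with a header row containing “condition” and “D1” to “D10”; that format is an assumption (the text-import dialog in Figure 9 suggests a delimited text file, but check your own file). It computes the mean handling per pup for each condition on each day, which are the values a condition × day graph would show.

```python
import csv
import io
from collections import defaultdict
from statistics import fmean

DAY_COLUMNS = [f"D{d}" for d in range(1, 11)]  # assumed column names D1..D10


def condition_day_means(text, days=DAY_COLUMNS):
    """Mean of each day column for each condition, from tab-delimited text.

    Assumes one row per litter, a 'condition' column, and numeric day
    columns. Any extra columns in the file are ignored.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    by_condition = defaultdict(list)
    for row in reader:
        by_condition[row["condition"]].append([float(row[d]) for d in days])
    return {
        cond: [fmean(litter[i] for litter in litters) for i in range(len(days))]
        for cond, litters in by_condition.items()
    }


# Tiny made-up example with three days instead of ten.
sample = "condition\tD1\tD2\tD3\nCC\t30\t20\t10\nCC\t20\t10\t0\nBB\t5\t5\t5\n"
print(condition_day_means(sample, days=["D1", "D2", "D3"]))
# {'CC': [25.0, 15.0, 5.0], 'BB': [5.0, 5.0, 5.0]}
```

To use it on a real file, read the file’s contents into a string first (for example with open(path, newline="").read()) and pass that string in.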
Now you are ready to analyze your data. The dialog box should now look like Figure 25. Click
“OK”.
Figure 25. The dialog box in Figure 21 with all variables assigned.
Figure 26. GLM control panel.
To view your main results, click “All effects/Graphs” in Figure 26. Your results should look
something like Figure 27. You can, for example, view graphs of your data by clicking on the
rows of the effects table. It is a good idea to uncheck the “Close dialog on OK” check box
illustrated in Figure 27.
Figure 27. Table of all effects.
Hypothesis Testing
We assume that you already understand some basic statistical concepts, such as the mean and
variance of a population. We also assume that you have the basic idea of a probability
distribution, such as a binomial or a normal distribution. If you do not, you can discuss these
concepts with members of your group, other groups, or us. (This is one reason to break up into
groups of 3: in your interactions you can share knowledge with each other, and your lab reports
can be truly emergent!)
In classical hypothesis testing, which you will be doing in this lab, a statistical hypothesis is
typically a statement about one or more population distributions. It is important to remember that
a statistical hypothesis is always about the population distributions and not about the sample you
are using to test the hypothesis. Statistical hypotheses are also rarely, if ever, exactly the same as
"real-world" hypotheses, which are statements about phenomena of interest or their causes.
However, when a problem is posed in an appropriate way (i.e., by developing an appropriate
research design), statistical hypotheses can be inferred from real-world hypotheses, with the
limitation that we must be prepared to relate the structure (i.e., the research design) we imposed
on the problem back to the real world. This is the important issue of how carefully designed
research applies to real-world contexts, and this aspect of science belongs to the persuasion stage
of scientific discovery.
For convenience of exposition, we can represent statistical hypotheses by the letter H. A
hypothesis that completely specifies the population may, for example, have the form:
H: The weights of the population in question (e.g., BALB mice) are normally distributed with
mean = 25 g and standard deviation = 5 g.
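Because this H specifies the whole distribution, it assigns a definite probability to any range of weights. Here is a minimal Python sketch using the standard library’s statistics.NormalDist; the 20 g and 30 g cut-offs are arbitrary examples.

```python
from statistics import NormalDist

# H: body weight ~ Normal(mean = 25 g, standard deviation = 5 g)
H = NormalDist(mu=25, sigma=5)

p_heavier_than_30 = 1 - H.cdf(30)             # more than one SD above the mean
p_between_20_and_30 = H.cdf(30) - H.cdf(20)   # within one SD of the mean
print(round(p_heavier_than_30, 4), round(p_between_20_and_30, 4))  # 0.1587 0.6827
```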
Often, (1) our hypotheses are less than completely specified and (2) they are stated as
comparisons between parameters (e.g., the means of two populations). For example:
H: The populations of BALB and C57 mice are normally distributed, and BALB mice have
larger mean litter sizes than C57 mice (i.e., µ(BALB) > µ(C57)).
To test hypotheses using classical hypothesis testing, we set up alternative hypotheses from
which to choose: one stating the difference we expect to find and the other stating that there is
no difference. It is conventional to label the hypothesis we are making H1 and the hypothesis that
states there is no difference H0 (i.e., the null hypothesis). Thus, for example, we would state the
hypothesis above as:
H1: The populations of BALB and C57 mice are normally distributed, and BALB mice have
larger mean litter sizes than C57 mice (i.e., µ(BALB) > µ(C57)).
And the null hypothesis as:
H0: The populations of BALB and C57 mice are normally distributed, and BALB mice have the
same mean litter size as C57 mice (i.e., µ(BALB) = µ(C57)).
We can almost never look at a data sample and, from that sample alone, determine which
hypothesis is most likely correct. The means of the two populations may be essentially the same,
but because our samples do not include the whole population of these mice, the sample means
may not be equal (even though the actual population means may be virtually the same).
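A quick simulation makes this concrete. In the Python sketch below, both "populations" are the same normal distribution (a hypothetical mean litter size of 8 pups with standard deviation 2, treated as a continuous variable for simplicity), yet two samples of 10 litters will almost never have identical means.

```python
import random
from statistics import fmean


def two_sample_means(n=10, mu=8.0, sigma=2.0, seed=1):
    """Draw two samples of size n from the SAME normal population; return their means."""
    rng = random.Random(seed)
    sample_a = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_b = [rng.gauss(mu, sigma) for _ in range(n)]
    return fmean(sample_a), fmean(sample_b)


mean_a, mean_b = two_sample_means()
print(f"sample A: {mean_a:.2f}, sample B: {mean_b:.2f}, difference: {mean_a - mean_b:.2f}")
```

Changing seed gives a different pair of samples; the difference between the sample means fluctuates around zero from run to run even though the population means are identical.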
One approach is to determine whether these data are improbable, assuming H0 is true. To
calculate the probability of the data given that H0 is true, we assume that both populations are
the same and then calculate the probability of observing the sample data under that assumption.
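One concrete way to carry out this calculation, swapped in here for illustration (it is not the ANOVA you will run in Statistica), is a permutation test. If H0 is true and both samples come from the same population, the group labels are arbitrary, so we can shuffle them many times and see how often a difference at least as large as the observed one arises. Strictly, what this computes is the probability of data at least as extreme as those observed, which is how p-values are defined. The Python sketch below tests the one-sided H1 µ(BALB) > µ(C57); the litter sizes are made up.

```python
import random
from statistics import fmean


def permutation_p_value(x, y, n_perm=10_000, seed=0):
    """One-sided permutation p-value for H1: mean(x) > mean(y).

    Under H0 the labels 'x' and 'y' are exchangeable, so we shuffle the
    pooled data and count how often the shuffled difference in means is
    at least as large as the observed difference.
    """
    rng = random.Random(seed)
    observed = fmean(x) - fmean(y)
    pooled = list(x) + list(y)
    k = len(x)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fmean(pooled[:k]) - fmean(pooled[k:]) >= observed:
            at_least_as_extreme += 1
    # The +1 counts the observed arrangement itself, so p is never exactly 0.
    return (at_least_as_extreme + 1) / (n_perm + 1)


balb = [9, 10, 8, 11, 10, 9]   # hypothetical BALB litter sizes
c57 = [7, 6, 8, 7, 6, 7]       # hypothetical C57 litter sizes
print(permutation_p_value(balb, c57))
```

A small printed value means that, if both strains really had the same population mean, a difference this large would rarely appear by chance alone.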
If the probability of the data given H0 is sufficiently small, then we should reject H0, based on
the decision rule: if the data are improbable given H0, reject H0. If we reject H0, then, by the
logic of the way we have set up the problem, we should be inclined to accept the view that H1 is
more plausible.
It is important to realize that there are many possible decision rules for rejecting H0 in favor of
H1. In many areas of the biological and social sciences, it is conventional to make this decision
based on a threshold probability below which we believe that the data are too improbable and
therefore that we should reject H0. This probability is the rejection threshold, α, for the data
assuming H0.
In classical hypothesis testing, errors, of course, can be made! The kinds of errors recognized by
statisticians can be summarized in a 2 X 2 table comparing the truth of the statistical hypotheses
we make and our decisions:
                                        True Situation
Decision                                H0 true      H1 true
Don't reject H0  (p(data | H0) > α)     Correct      Error
Reject H0        (p(data | H0) ≤ α)     Error        Correct
Here p(data | H0) is the probability of the data given that H0 is true. If p(data | H0) > α, then we
conclude that the data are sufficiently probable that we should not reject H0. Keep in mind that
our calculated probability is based on the data! If the data do not adequately represent the
population, then our estimate can be wrong: there may really be a difference between the two
populations (i.e., H1 is true) even though we did not reject H0. Similarly, we may by chance have
obtained data such that p(data | H0) ≤ α even though H0 is in fact true.
Statisticians distinguish between Type I errors (in which the null hypothesis is rejected when in
fact it is true) and Type II errors (in which the null hypothesis is not rejected when in fact it is
false). The probabilities of making these two types of errors are α and β, respectively.
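α can be read as the long-run rate of Type I errors. The Python sketch below checks this by simulation under assumptions of its own: both groups are drawn from the same normal population with a known standard deviation, and each simulated experiment is tested with a two-sided z-test (a simpler test than the ANOVA in this lab). Because H0 is true in every simulated experiment, every rejection is a Type I error.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=10, trials=5000, seed=42):
    """Fraction of simulated experiments that reject a TRUE H0 at level alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    sigma = 1.0  # known population SD, shared by both groups
    rejections = 0
    for _ in range(trials):
        a = [rng.gauss(0.0, sigma) for _ in range(n)]
        b = [rng.gauss(0.0, sigma) for _ in range(n)]  # same population: H0 true
        z = (fmean(a) - fmean(b)) / (sigma * sqrt(2 / n))
        p = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided p-value
        if p <= alpha:
            rejections += 1
    return rejections / trials


print(type_i_error_rate(0.05), type_i_error_rate(0.01))
```

With 5000 simulated experiments, the two printed rates should come out near 0.05 and 0.01, the chosen α levels.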
The two types of errors are both important, and they are not independent: decreasing the
likelihood of one error can increase the likelihood of the other. In classical hypothesis testing,
however, the null hypothesis is given the "benefit of the doubt," and we set the probability α
relatively low, typically α = 0.05 or 0.01. In the simulated experiments you will be conducting,
you will use α = 0.05.
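The trade-off between α and β can be computed exactly in a simple case. The Python sketch below assumes a one-sided z-test of H0: µ1 = µ2 against H1: µ1 > µ2, with a known standard deviation, n subjects per group, and a true difference of effect standard deviations; these settings are illustrative, not values from this lab.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal


def type_ii_error(alpha, effect, n):
    """beta = P(do not reject H0 | H1 true) for a one-sided two-sample z-test.

    effect is the true difference in means divided by the population SD.
    """
    z_crit = Z.inv_cdf(1 - alpha)    # reject H0 when z > z_crit
    shift = effect * sqrt(n / 2)     # mean of the z statistic when H1 is true
    return Z.cdf(z_crit - shift)


for alpha in (0.05, 0.01):
    print(alpha, round(type_ii_error(alpha, effect=1.0, n=10), 3))
# 0.05 0.277
# 0.01 0.536
```

In this example, lowering α from 0.05 to 0.01 roughly doubles β: protecting more strongly against Type I errors makes Type II errors more likely unless the sample size is increased.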
The hypotheses you are testing, as mentioned above, involve two factors (parent strain and pup
strain) and 10 repeated measures (handling over days). Below is what a Statistica data sheet
would look like for a repeated measures analysis of variance with only one factor (parental strain)
and 10 repeated measures. Note that time is a nested variable with 10 measures. Your Statistica
data sheet will be more complicated because you will have two factors: pup strain in addition to
parental strain.
Figure 27 is an analysis of variance table. This will be the main output of your analysis, as
described earlier. The first column lists the effects (Condition, Days, and the interaction of Days
and Condition), the second column is the sum of squares, the third is the degrees of freedom, the
fourth is the mean square, the fifth is the F-value, and the sixth is the p-value. Each p-value is the
probability of the data (more precisely, of data at least as extreme as those observed) given the
null hypothesis for that effect. The p-value is determined by calculating the F-value. For the
Condition effect, F is the mean square of Condition divided by the subject (error) mean square;
the Days and interaction effects are tested against the within-subjects error mean square. (Note
that a mean square is a sum of squares divided by its degrees of freedom.) Since the α criterion
for rejecting the null hypothesis is 0.05, all three effects in Figure 27 are statistically significant.
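The arithmetic behind such a table can be sketched for a balanced design with one between-subjects factor (condition) and one within-subjects factor (days), which matches the Statistica setup above. The Python sketch below is a textbook split-plot partition of the sums of squares, written under its own assumptions: equal numbers of litters per condition, no missing days, and no sphericity corrections. It is not a description of Statistica’s internal algorithm, the effect labels are hypothetical, and the example numbers are invented. It stops at the F-values because Python’s standard library has no F distribution; a library such as SciPy would be needed for p-values.

```python
from statistics import fmean


def mixed_anova(data):
    """Sums of squares, df, mean squares and F for a balanced mixed design.

    data[g][s][d] is the score of subject (litter) s in condition g on day d.
    Every condition must have the same number of subjects and days.
    """
    a, n, b = len(data), len(data[0]), len(data[0][0])
    G, S, D = range(a), range(n), range(b)
    grand = fmean(x for grp in data for row in grp for x in row)
    cond = [fmean(x for row in data[g] for x in row) for g in G]
    subj = [[fmean(data[g][s]) for s in S] for g in G]
    day = [fmean(data[g][s][d] for g in G for s in S) for d in D]
    cell = [[fmean(data[g][s][d] for s in S) for d in D] for g in G]

    ss = {
        "condition": n * b * sum((cond[g] - grand) ** 2 for g in G),
        "subjects_within": b * sum((subj[g][s] - cond[g]) ** 2 for g in G for s in S),
        "days": a * n * sum((day[d] - grand) ** 2 for d in D),
        "condition_x_days": n * sum((cell[g][d] - cond[g] - day[d] + grand) ** 2
                                    for g in G for d in D),
        "days_x_subjects": sum((data[g][s][d] - subj[g][s] - cell[g][d] + cond[g]) ** 2
                               for g in G for s in S for d in D),
    }
    df = {
        "condition": a - 1,
        "subjects_within": a * (n - 1),
        "days": b - 1,
        "condition_x_days": (a - 1) * (b - 1),
        "days_x_subjects": a * (n - 1) * (b - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}  # mean square = SS / df
    f = {
        # Between-subjects effect is tested against subjects-within-conditions error.
        "condition": ms["condition"] / ms["subjects_within"],
        # Within-subjects effects are tested against the days x subjects error.
        "days": ms["days"] / ms["days_x_subjects"],
        "condition_x_days": ms["condition_x_days"] / ms["days_x_subjects"],
    }
    return ss, df, ms, f


# Invented handling-per-pup scores: 2 conditions x 3 litters x 4 days.
example = [
    [[30, 25, 20, 14], [28, 24, 17, 15], [33, 26, 21, 12]],
    [[22, 20, 15, 11], [25, 19, 16, 9], [21, 18, 13, 10]],
]
ss, df, ms, f = mixed_anova(example)
for effect, value in f.items():
    print(effect, round(value, 2))
```

For the lab data, data would have 4 conditions (CC, CB, BB, BC), 10 litters per condition, and 10 days. In a balanced design the five sums of squares add up to the total sum of squares, which is a useful sanity check on any such table.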