
Statistics allow biologists to support the findings of their experiments.
“Why is this Biology?”
Variation in populations and variability in results affect confidence in conclusions.
The key methodology in Biology is hypothesis
testing through experimentation.
Carefully-designed and controlled
experiments and surveys give us quantitative
(numeric) data that can be compared.
We can use the data collected to test our
hypothesis and form explanations of the
processes involved… but only if we can be
confident in our results.
We therefore need to be able to evaluate the
reliability of a set of data and the significance
of any differences we have found in the data.
Image: 'Transverse section of part of a stem of a Dead-nettle (Lamium sp.) showing a vascular bundle and part of the cortex'
http://www.flickr.com/photos/71183136@N08/6959590092 Found on flickrcc.net
“Which medicine should I prescribe?”
Image from: http://www.msf.org/international-activity-report-2010-sierra-leone
Donate to Médecins Sans Frontières through Biology4Good: http://i-biology.net/about/biology4good/
Generic drugs are out-of-patent, and are
much cheaper than the proprietary
(brand-name) equivalents. Doctors need to
balance needs with available resources.
Which would you choose?
Hummingbirds are nectarivores (herbivores
that feed on the nectar of some species of
flower).
In return for food, they pollinate the flower.
This is an example of mutualism –
benefit for all.
As a result of natural selection,
hummingbird bills have evolved.
Birds with a bill best suited to
their preferred food source have
the greater chance of survival.
Photo: Archilochus colubris, from wikimedia commons, by Dick Daniels.
Researchers studying comparative anatomy collect
data on bill-length in two species of hummingbirds:
Archilochus colubris
(ruby-throated hummingbird) and
Cynanthus latirostris (broad-billed hummingbird).
To do this, they need to collect sufficient
relevant, reliable data so they can test
the Null hypothesis (H0) that:
“there is no significant difference
in bill length between the two species.”
Photo: Archilochus colubris (male), wikimedia commons, by Joe Schneid
The Null hypothesis presumes that there is
NO STATISTICAL DIFFERENCE
between the two samples.
The ALTERNATIVE hypothesis presumes that there is
a STATISTICAL DIFFERENCE
between the two samples.
The t-test provides a probability (P) of seeing a difference at least
this large if the two samples really came from populations with the same mean.
A P < 0.05 is conventionally accepted as a low enough probability
to reject the NULL hypothesis.
The sample size must
be large enough to provide
sufficient reliable data and for us
to carry out relevant statistical
tests for significance.
We must also be mindful of
uncertainty in our measuring tools
and error in our results.
Photo: Broadbilled hummingbird (wikimedia commons).
The mean is a measure of the central tendency
of a set of data.
Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

        Bill length (±0.1mm)
n       A. colubris     C. latirostris
1       13.0            17.0
2       14.0            18.0
3       15.0            18.0
4       15.0            18.0
5       15.0            19.0
6       16.0            19.0
7       16.0            19.0
8       18.0            20.0
9       18.0            20.0
10      19.0            20.0
Mean
s
Calculate the mean using:
• Your calculator (sum of values / n)
• Excel: =AVERAGE(highlight raw data)
n = sample size. The bigger the better.
In this case n=10 for each group.
All values should be centred in the cell, with
decimal places consistent with the measuring
tool uncertainty.
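The same calculation can be sketched outside Excel. This is a minimal Python example using the raw values from Table 1; the variable names are illustrative, not part of the original slides.

```python
import statistics

# Raw bill lengths (mm) from Table 1
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

# "Calculator" method: sum of values / n
mean_a_manual = sum(a_colubris) / len(a_colubris)

# Library method, equivalent to Excel's =AVERAGE(...)
mean_a = statistics.mean(a_colubris)
mean_c = statistics.mean(c_latirostris)

# Report to one decimal place, matching the ±0.1 mm measuring uncertainty
print(f"A. colubris mean:    {mean_a:.1f} mm")
print(f"C. latirostris mean: {mean_c:.1f} mm")
```

Both routes give the same mean; the library call simply saves the arithmetic.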
Standard deviation is a measure of the spread of
most of the data.
Table 1: Raw measurements of bill length in A. colubris and C. latirostris.

        Bill length (±0.1mm)
n       A. colubris     C. latirostris
1       13.0            17.0
2       14.0            18.0
3       15.0            18.0
4       15.0            18.0
5       15.0            19.0
6       16.0            19.0
7       16.0            19.0
8       18.0            20.0
9       18.0            20.0
10      19.0            20.0
Mean    15.9            18.8
s       1.91            1.03

Which of the two sets of data has:
a. The longest mean bill length?
b. The greatest variability in the data?
Excel: =STDEV(highlight RAW data)
Standard deviation can have one more decimal place.
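As a cross-check on the spreadsheet values, here is a short Python sketch of the same calculation. It assumes the sample standard deviation (dividing by n − 1), which is what Excel's STDEV uses; the variable names are illustrative.

```python
import statistics

# Raw bill lengths (mm) from Table 1
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]

# statistics.stdev is the sample standard deviation (n - 1 in the denominator)
s_a = statistics.stdev(a_colubris)
s_c = statistics.stdev(c_latirostris)

# One more decimal place than the raw data
print(f"A. colubris    s = {s_a:.2f}")
print(f"C. latirostris s = {s_c:.2f}")
```

The larger value of s for A. colubris reflects the wider spread of its measurements around the mean.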
Answers:
a. The longest mean bill length: C. latirostris
b. The greatest variability in the data: A. colubris
Standard deviation is a measure of the spread of
most of the data. Error bars are a graphical
representation of the variability of data.
Error bars could represent standard deviation, range or confidence intervals.
Which of the two sets of data has:
a. The highest mean? A
b. The greatest variability in the data? B
The overlap of a set of error bars gives a clue as to the
significance of the difference between two sets of data.
Large overlap:
• Lots of shared data points within each data set.
• Results are not likely to be significantly different from each other.
• Any difference is most likely due to chance.
No overlap:
• No (or very few) shared data points within each data set.
• Results are more likely to be significantly different from each other.
• The difference is more likely to be ‘real’.
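As a rough numerical version of the overlap idea, the sketch below compares the mean ± one standard deviation ranges for the two species in Table 1. It assumes the error bars represent one sample standard deviation; the helper name `sd_range` is hypothetical.

```python
import statistics

# Raw bill lengths (mm) from Table 1
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def sd_range(values):
    """Return (low, high) for mean ± one sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return mean - s, mean + s


low_a, high_a = sd_range(a_colubris)
low_c, high_c = sd_range(c_latirostris)

# Positive value = the error bars overlap by that many mm; negative = a gap
overlap = min(high_a, high_c) - max(low_a, low_c)

print(f"A. colubris:    {low_a:.2f} to {high_a:.2f} mm")
print(f"C. latirostris: {low_c:.2f} to {high_c:.2f} mm")
print(f"Overlap: {overlap:.2f} mm")
```

The overlap is only a clue. A small positive or negative value does not settle significance, which is why a statistical test is still needed.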
Graph 1: Comparing mean bill lengths in two hummingbird species, A. colubris and C. latirostris (error bars = standard deviation).
[Bar chart. x-axis: Species of hummingbird. y-axis: Mean bill length (±0.1mm). A. colubris, 15.9mm (n=10); C. latirostris, 18.8mm (n=10).]
Our results show a very small overlap
between the two sets of data.
So how do we know if the difference is
significant or not?
We need to use a statistical test.
The t-test is a statistical test that helps us determine
the significance of the difference between the
means of two sets of data.
The Null Hypothesis (H0):
“There is no significant
difference.”
This is the ‘default’ hypothesis that we always test.
In our conclusion, we either accept the null hypothesis or reject it.
A t-test can be used to test whether the difference between two means is significant.
• If we accept H0, then the means are not significantly different.
• If we reject H0, then the means are significantly different.
Remember:
• We are never ‘trying’ to get a difference. We design carefully-controlled experiments and
then analyse the results using statistical analysis.
Excel can jump straight to a value of P for our results.
One function (=TTEST) compares both sets of data.
As it calculates P directly (the probability that a difference
this large would arise by chance alone), we can determine
significance directly.
In this case, P=0.00051.
This is much smaller than 0.05, so we are confident that we can reject H0.
The difference is unlikely to be due to chance.
Conclusion:
There is a significant difference in bill length between
A. colubris and C. latirostris.
Two tails: we assume data are normally distributed, with two ‘tails’ moving away from mean.
Type 2 (unpaired): we are comparing one whole population with the other whole population.
(Type 1 pairs the results of each individual in set A with the same individual in set B).
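For readers without Excel, the two-tailed, unpaired test can be sketched in plain Python. This is an illustration under stated assumptions rather than the spreadsheet's own method: it assumes equal variances (the pooled form, matching Excel's type 2) and obtains P by numerically integrating the t-distribution. The function names are hypothetical.

```python
import math
import statistics

# Raw bill lengths (mm) from Table 1
a_colubris = [13.0, 14.0, 15.0, 15.0, 15.0, 16.0, 16.0, 18.0, 18.0, 19.0]
c_latirostris = [17.0, 18.0, 18.0, 18.0, 19.0, 19.0, 19.0, 20.0, 20.0, 20.0]


def pooled_t(x, y):
    """Return (t, degrees of freedom) for an unpaired, equal-variance t-test."""
    n1, n2 = len(x), len(y)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(x)
                  + (n2 - 1) * statistics.variance(y)) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / std_err, df


def t_pdf(x, df):
    """Probability density of Student's t-distribution."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10000):
    """Two-tailed P: area in both tails beyond |t| (Simpson's rule, even steps)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = total * h / 3
    return 1 - 2 * area_zero_to_t


t_stat, df = pooled_t(a_colubris, c_latirostris)
p_value = two_tailed_p(t_stat, df)

print(f"t = {t_stat:.2f}, df = {df}, P = {p_value:.5f}")
print("Reject H0" if p_value < 0.05 else "Accept H0")
```

The sign of t only reflects which species is listed first; the two-tailed P depends on its size alone.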
Correlation does not imply causation, but it does waggle its eyebrows
suggestively and gesture furtively while mouthing "look over there."
Cartoon from: http://www.xkcd.com/552/
Correlation does not imply causality.
Pirates vs global warming, from http://en.wikipedia.org/wiki/Flying_Spaghetti_Monster#Pirates_and_global_warming
Where correlations exist, we must then design solid scientific experiments to determine the
cause of the relationship. Sometimes a correlation exists because of confounding variables:
conditions that the correlated variables have in common, even though those variables do not
directly affect each other.
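The effect of a confounding variable can be illustrated with simulated numbers. The sketch below uses entirely synthetic data (not real measurements): two variables that never influence each other are both driven by a shared third variable. All names and noise levels are assumptions chosen for illustration.

```python
import math
import random

random.seed(1)  # fixed seed so the sketch is repeatable


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Synthetic confounder that drives both measured variables
confounder = [random.gauss(0, 1) for _ in range(1000)]

# Each variable = confounder + its own independent noise (no direct link)
var_a = [z + random.gauss(0, 0.5) for z in confounder]
var_b = [z + random.gauss(0, 0.5) for z in confounder]

r_raw = pearson_r(var_a, var_b)

# Remove the confounder's contribution: only independent noise is left
resid_a = [a - z for a, z in zip(var_a, confounder)]
resid_b = [b - z for b, z in zip(var_b, confounder)]
r_controlled = pearson_r(resid_a, resid_b)

print(f"Correlation with confounder present: {r_raw:.2f}")
print(f"Correlation with confounder removed: {r_controlled:.2f}")
```

The strong correlation largely disappears once the shared variable is accounted for, which is the reason for strict control of other variables in an experiment.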
To be able to determine causality through experimentation we need:
• One clearly identified independent variable
• Carefully measured dependent variable(s) that can be attributed to change in the
independent variable
• Strict control of all other variables that might have a measurable impact on the
dependent variable.
We need: sufficient relevant, repeatable and statistically significant data.
Some known causal relationships:
• Atmospheric CO2 concentrations and global warming
• Atmospheric CO2 concentrations and the rate of photosynthesis
• Temperature and enzyme activity
Flamenco Dancer, by Steve Corey
http://www.flickr.com/photos/22016744@N06/7952552148