Session 4

Data Analysis in Paleontology
Using R
26 Jan 2006
Gene Hunt
Dept. of Paleobiology
NMNH, SI
Looping Basics
Situation: you have a set of objects (sites, species,
measurements, etc.) and want to do some computation to
each one
femur <- c(10, 8, 14, 22, 32)     # 5 femora lengths
log.femur <- array(dim=5)         # want to log transform them
for (i in 1:5)
  log.femur[i] <- log(femur[i])
Here, looping is not necessary, because log() operates on a whole vector at once:
log.femur <- log(femur)           # save to a new variable
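A minimal sketch of the same comparison for a vector whose length is not typed in by hand. The names loop.result and vec.result are illustrative only; seq_along() supplies the loop indices and numeric() preallocates the result.

```r
femur <- c(10, 8, 14, 22, 32)          # 5 femora lengths, as above

# loop version: preallocate the result, then fill one element at a time
loop.result <- numeric(length(femur))
for (i in seq_along(femur)) {
  loop.result[i] <- log(femur[i])
}

# vectorized version: log() is applied to every element in one call
vec.result <- log(femur)

# both approaches should give the same values
print(all.equal(loop.result, vec.result))
```

Loops are still the right tool when each step depends on the result of the previous one, or when the computation has no vectorized equivalent.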
If / else statements
Commands can be executed depending on some
condition being TRUE, using if() and else
x <- 4
if (x==4)
  print("Oh")
[1] "Oh"
if (x==3)
  print("Oh")
# Nothing happens
if (x==3)
  print("Oh") else print("Hey")
[1] "Hey"
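A short sketch combining if / else with a loop, assuming we want to label each femur as "large" or "small". The cutoff of 12 and the names size and size.vec are made up for illustration. Note that with braces, else must sit on the same line as the closing brace when typed at the prompt.

```r
femur <- c(10, 8, 14, 22, 32)

# loop version: test each element in turn
size <- character(length(femur))
for (i in seq_along(femur)) {
  if (femur[i] > 12) {
    size[i] <- "large"
  } else {
    size[i] <- "small"
  }
}

# vectorized alternative: ifelse() tests all elements at once
size.vec <- ifelse(femur > 12, "large", "small")

print(size)
print(identical(size, size.vec))
```

if() expects a single TRUE or FALSE, which is why the loop version tests one element per pass; ifelse() is the built-in counterpart for whole vectors.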
Writing Functions
• There are many functions built-in to R
• Sometimes, need to do something for
which no function exists
• For example: people who wrote vegan
wanted to rarefy and compute diversity
metrics
• If it is a general enough task, it can be
useful to write your own function
A totally unnecessary function…
The function to make functions is called function()
times5 <- function(x)       # times5 = function name, x = argument
{
  result <- x*5
  return (result)           # result gets returned as the function output
}
Once defined, it can be used just like built-in functions:
times5(10)
[1] 50
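As a sketch of how the same pattern extends to more than one argument, here is a hypothetical variant, timesN(), in which the multiplier is a second argument with a default value. The name and the default are illustrative, not part of the course materials.

```r
# hypothetical generalization of times5(): n defaults to 5
timesN <- function(x, n = 5)
{
  result <- x * n
  return(result)
}

timesN(10)            # uses the default n = 5, gives 50
timesN(10, n = 3)     # overrides the default, gives 30
timesN(c(1, 2, 3))    # works on vectors too, gives 5 10 15
```

Because x * n is itself vectorized, the function accepts a vector without any loop inside it.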
A more useful function: RMA
• Ordinary Least-squares
regression assumes all
error is in y-variable
• Often, x-variable has error
too
• Reduced Major Axis draws
a line that allows for error
in both x and y
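slope:      $b_1 = \pm s_y / s_x$   (sign matches the sign of the correlation between x and y)
intercept:  $b_0 = \bar{y} - b_1 \bar{x}$
[Figure: scatterplot comparing the RMA and LS (least-squares) lines]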
A new function: rma()
Need 5 pieces of information:
mean of x
mean of y
stand. dev. of x
stand. dev. of y
correlation between x and y
rma <- function(x,y)
{
  mx <- mean(x)
  my <- mean(y)
  sx <- sd(x)
  sy <- sd(y)
  rxy <- cor(x,y)
  b1 <- sy/sx*sign(rxy)     # slope:     b1 = ±sy/sx
  b0 <- my - b1*mx          # intercept: b0 = mean(y) - b1*mean(x)
  result <- c(b0,b1)
  return(result)
}
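A quick sanity check of rma() on simulated data, as a sketch. The data, the seed, and the names w.rma and w.ls are made up for illustration; rma() is repeated in compact form so the block runs on its own. Since the least-squares slope equals r * sy/sx and |r| is at most 1, the RMA slope should never be shallower than the LS slope.

```r
# compact copy of rma(), returning c(intercept, slope)
rma <- function(x, y)
{
  b1 <- sd(y) / sd(x) * sign(cor(x, y))
  b0 <- mean(y) - b1 * mean(x)
  return(c(b0, b1))
}

# simulated data (illustrative only)
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)
y <- 3 + 0.5 * x + rnorm(50, sd = 1)

w.rma <- rma(x, y)            # RMA intercept and slope
w.ls  <- coef(lm(y ~ x))      # least-squares intercept and slope

print(w.rma)
print(w.ls)

# the RMA slope is at least as steep as the LS slope
print(abs(w.rma[2]) >= abs(w.ls[2]))

# both lines pass through the point (mean(x), mean(y))
print(all.equal(w.rma[1] + w.rma[2] * mean(x), mean(y)))
```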
Sourcing R scripts
• We have been entering commands, one at a
time, at the interactive R prompt
• We can also write commands in a text file,
and then tell R to do all of them consecutively
• The text file of commands = “script”
• Running commands from a file = “sourcing”
                 Mac                       Win
Open Script      File > Open Document      File > Open Script
Source Script    File > Source File        File > Source R Code
Sourcing R scripts
Example script file: sample.R
One difference: expressions not printed by default
Side note: syntax highlighting is really nice!
Win Users: see TINN-R (http://www.sciviews.org/Tinn-R/)
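Sourcing can also be done from the prompt with source(). The sketch below writes a tiny script to a temporary file so that it runs anywhere; the script contents and the name script.file are made up for illustration, and in practice you would pass the path to your own .R file.

```r
# write a three-line script to a temporary file (stand-in for a real script)
script.file <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10",
             "m <- mean(x)",
             "m"),
           script.file)

# run every command in the file; the bare `m` on the last line prints nothing
source(script.file)

# the variables created by the script are now available
print(m)

# echo = TRUE shows each command and its value, as at the interactive prompt
source(script.file, echo = TRUE)
```

Inside a script, wrap anything you want to see in print() if you source without echo = TRUE.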
Saving your Work
1. Save the Workspace
– All variables / data that have been created
– When loaded, have access to same
variables again
– Does not save commands (you get “x”, but
not command that created it)
– Save/Load via Menus
         Save                                 Load
Mac      Workspace > Save Workspace File      Workspace > Load Workspace File
Win      File > Save Workspace                File > Load Workspace
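The same thing can be done with commands. save.image() writes the whole workspace to a file and load() reads it back; the sketch below uses save() on two named variables instead, so that it does not touch anything else in your session. The temporary file name is a stand-in for a real path.

```r
ws.file <- tempfile(fileext = ".RData")   # stand-in for a real file path

femur <- c(10, 8, 14, 22, 32)
log.femur <- log(femur)

# save selected variables (save.image(ws.file) would save all of them)
save(femur, log.femur, file = ws.file)

# remove the variables, then restore them from the file
rm(femur, log.femur)
load(ws.file)

print(femur)
print(log.femur)
```

As noted above, only the variables come back; the commands that created them are not stored in the file.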
Saving your Work
2. Save the History
– The History is the set of commands
entered during an R session
– The variables are not saved/loaded, but the
commands you used are
– Save/Load via Menus
         Save                                     Load
Mac      Open Command Sidebar > Save History      Open Command Sidebar > Load History
Win      File > Save History                      File > Load History
Saving your Work
3. Write / copy results to file
– Functions write.table(), write(), …
– Copy output/results from R Console to Text
editor
4. Build / Save analyses as R Scripts
– Can source these to reconstruct analysis
– Start with data import, and build up
analysis, testing commands using the
interactive prompt
– Also useful for making figures
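A minimal sketch of writing results to a file with write.table() and reading them back to check. The data frame, its column names, and the temporary file are made up for illustration.

```r
# made-up results table
results <- data.frame(site = c("A", "B", "C"),
                      richness = c(12, 7, 20))

out.file <- tempfile(fileext = ".txt")    # stand-in for a real file path

# tab-delimited text, no quotes around strings, no row-name column
write.table(results, file = out.file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# read the file back in to confirm what was written
check <- read.table(out.file, header = TRUE, sep = "\t")
print(check)
```

A tab-delimited file like this opens directly in a spreadsheet or text editor.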
Miscellanea: Math
Matrix Algebra
Matrix Multiplication: X %*% Y
Matrix Transpose: t(X)
Matrix Inverse: solve(X)
Extract / replace diagonal: diag(X)
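A small worked example of these four operations, using made-up 2 x 2 matrices. Note that X * Y (without the percent signs) multiplies element by element, which is not matrix multiplication.

```r
X <- matrix(c(2, 1, 1, 3), nrow = 2)   # filled column by column
Y <- matrix(1:4, nrow = 2)

XY <- X %*% Y        # matrix multiplication
tY <- t(Y)           # transpose
Xinv <- solve(X)     # inverse

print(XY)
print(tY)
print(X %*% Xinv)    # identity matrix, up to rounding error

d <- diag(X)         # extract the diagonal: 2 3
print(d)

diag(X) <- c(10, 20) # replace the diagonal in place
print(X)
```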
Probability Distributions
• Many probability distributions built-in to R: normal, t, F,
exponential, binomial, Poisson, …
• For each, can generate random variates, access
statistical tables
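The built-in distributions share a naming scheme: a prefix r (random variates), d (density), p (cumulative probability), or q (quantile), followed by the distribution name. A few illustrative calls, with an arbitrary seed:

```r
set.seed(42)                  # arbitrary seed, for repeatable random numbers

z <- rnorm(5)                 # 5 random normal variates (mean 0, sd 1)
counts <- rpois(3, lambda = 4)  # 3 random Poisson variates

dnorm(0)                      # normal density at 0: 0.3989423
pnorm(1.96)                   # P(Z <= 1.96): 0.9750021
qnorm(0.975)                  # 97.5th percentile of the normal: 1.959964
qt(0.975, df = 10)            # same percentile for t with 10 df: 2.228139
dbinom(2, size = 5, prob = 0.5)   # P(2 successes in 5 trials): 0.3125
```

The p and q functions replace the statistical tables at the back of a textbook.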
Miscellanea: Graphing
Open new plot window:
quartz()       # Mac
windows()      # Win
Windows version also has plot history: <Ins> to add plot,
<Page Up> <Page Down> to scroll through
Multiple graph layout
Can place multiple graphs on same device:
layout(1:2)            # lots of options
plot (rnorm(10))       # see also par(mfrow), par(mfcol)
plot (rnorm(200))
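A sketch of the par(mfrow) alternative mentioned in the comment above. It draws to a PDF file rather than a screen window so that it runs the same way on Mac and Windows; the file name and plot titles are made up for illustration.

```r
plot.file <- tempfile(fileext = ".pdf")   # stand-in for a real file path
pdf(plot.file)                            # open a PDF graphics device

old.par <- par(mfrow = c(1, 2))           # 1 row, 2 columns of plots
hist(rnorm(200), main = "200 normal variates")
plot(rnorm(10), main = "10 normal variates")
par(old.par)                              # restore the previous layout

dev.off()                                 # close the device, writing the file
```

Leaving out the pdf() and dev.off() lines sends the same two-panel figure to the current plot window instead.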
Exercise 12. Functions and Sourcing
1. Open the script file rma.R, and look at it. Note the use of comments to
   explain what the function is doing. Source this file to make the
   function rma() available in your R session.
2. Type data(mtcars) and then attach(mtcars) to make the car
   data available. Plot horsepower (hp) as a function of engine
   displacement (disp). Perform a linear regression of hp on disp,
   saving the result to a variable called w.ls. Use the rma() function to
   calculate the reduced major axis for these variables, and save the
   results to a variable called w.rma. Use abline() to add the
   regression and RMA lines to the scatterplot; use different colors for
   the two lines.
3. One measure of evenness is calculated as E = H / log(S), where H is
   the Shannon-Wiener diversity and S is species richness. Write a
   function to calculate E for a community data matrix. Note: you should
   take advantage of the vegan functions to do the hard parts! The body of
   the function needs no more than 4 lines. Use your new function to
   calculate E for the BCI data set (remember you’ll have to use
   data(BCI) to make this accessible).
Answers to Exercises [4]
Exercise 12. Functions and Sourcing
1. Pretty self-explanatory.
2. plot(disp, hp)
   w.ls <- lm(hp~disp)
   w.rma <- rma(disp, hp)
   abline(w.ls, col="blue")
   abline(w.rma, col="red")
3. # assumes the vegan package is already loaded
   Evenness <- function(X)
   {
     H <- diversity(X, index="shannon")
     S <- specnumber(X)
     ee <- H/log(S)
     return(ee)
   }
   data(BCI)
   Evenness(BCI)       # prints E values for all sites in BCI
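For checking the logic without vegan installed, here is a base R sketch of the same calculation. The function name evenness.base and the community matrix comm are made up for illustration; H is computed as -sum(p * log(p)) over the species present at a site, with natural logs.

```r
# made-up community matrix: rows = sites, columns = species (counts)
comm <- rbind(site1 = c(10, 10, 10, 10),
              site2 = c(37, 1, 1, 1),
              site3 = c(5, 5, 0, 0))

# hypothetical base R version of the evenness calculation
evenness.base <- function(X)
{
  apply(X, 1, function(counts) {
    p <- counts[counts > 0] / sum(counts)   # proportions of species present
    H <- -sum(p * log(p))                   # Shannon diversity
    S <- length(p)                          # species richness
    H / log(S)                              # evenness
  })
}

E <- evenness.base(comm)
print(E)
```

Sites where all species present are equally abundant (site1 and site3 here) have E = 1, while a site dominated by one species (site2) has a lower value. A site with only one species gives 0/0, so E is undefined there.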