Using R for scientific computing - ANSWERS

Karline Soetaert
Centre for Estuarine and Marine Ecology
Netherlands Institute of Ecology
The Netherlands
June 2009
Abstract
The answers to the exercises from the document
"Using R for scientific computing" (Soetaert 2008).
Keywords: scientific computing, lecture notes, R.
This document gives the answers to the exercises in the lecture notes
"Using R for scientific computing" (Soetaert 2008).
These notes are an introduction to R at beginner's level. They can be found in package
marelacTeaching.
In R, write:
require(marelacTeaching)
browseURL(paste(system.file(package="marelacTeaching"),
"/lecture/R_for_scientific_computing.pdf", sep=""))
In order to make this vignette more readable, the questions are repeated.
Chapter 1
No exercises in this chapter
Chapter 2 - R as a scientific calculator
> (4/6*8-1)^(2/3)
[1] 2.657958
> log(20)
[1] 2.995732
> log2(4096)
[1] 12
> 2*pi*3
[1] 18.84956
> exp(2+cos(0.5*pi))
[1] 7.389056
> # length of 3rd side of a triangle with sides 2.3 and 5.4 and angle pi/8
> sqrt(2.3^2+5.4^2-2*2.3*5.4*cos(pi/8))
[1] 3.391288
Chapter 3 - computing with R-variables
Chapter 3.8.1
Use R-function mean to estimate the mean of two numbers, 9 and 17.
> mean(c(9,17))
[1] 13
• Create a vector, called V, with even numbers, between 16 and 56. Do not use loops.
• Display this vector
• What is the sum of all elements of V?
• Display the first 4 elements of V
• Calculate the product of the first 4 elements of V
• Display the 4th, 9th and 11th element of V.
> (V<-seq(16,56,by=2)) # creates AND displays the vector
[1] 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56
> # or:
> V <- 16+2*(0:20) ; V
[1] 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56
> sum(V)
[1] 756
> V[1:4]
[1] 16 18 20 22
> prod(V[1:4])
[1] 126720
> V[c(4,9,11)]
[1] 22 32 36
• Create a new vector, W, which equals vector V, multiplied with 3; display its content.
• How many elements of W are smaller than 100?
> W<-V*3; W
 [1]  48  54  60  66  72  78  84  90  96 102 108 114 120 126 132 138
[17] 144 150 156 162 168
> W100<-W[W<100] ; length(W100)
[1] 9
> # or
> length(W[W<100])
[1] 9
• Create a sequence that contains the values (1,1/2,1/3,1/4,...,1/10)
• Compute the square root of each element
• Compute the square (^2) of each element
• Create a sequence with values (0/1,1/2,2/3,3/4,...,9/10)
> 1/1:10
[1] 1.0000000 0.5000000 0.3333333 0.2500000 0.2000000 0.1666667
[7] 0.1428571 0.1250000 0.1111111 0.1000000
> sqrt(1/1:10)
[1] 1.0000000 0.7071068 0.5773503 0.5000000 0.4472136 0.4082483
[7] 0.3779645 0.3535534 0.3333333 0.3162278
> (1/1:10)^2
[1] 1.00000000 0.25000000 0.11111111 0.06250000 0.04000000 0.02777778
[7] 0.02040816 0.01562500 0.01234568 0.01000000
> (0:9)/(1:10)   # or : 0:9/1:10
[1] 0.0000000 0.5000000 0.6666667 0.7500000 0.8000000 0.8333333
[7] 0.8571429 0.8750000 0.8888889 0.9000000
• Create a vector, U, with 100 random numbers, uniformly distributed between -1 and 1.
• Check the range of U; all values should be within -1 and +1.
• Calculate the sum and the product of the elements of U
• How many elements of U are positive?
• Zero all negative values of U.
• Sort U
> U <- runif(100,-1,1)
> range(U)
[1] -0.9502918  0.9954595
> sum(U);prod(U)
[1] -0.7583097
[1] 7.387896e-40
> length(U[U>0]) # or: sum(U>0)
[1] 46
> U[U<0]<-0
> sort(U)
  [1] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
  [6] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [11] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [16] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [21] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [26] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [31] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [36] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [41] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [46] 0.00000000 0.00000000 0.00000000 0.00000000 0.00000000
 [51] 0.00000000 0.00000000 0.00000000 0.00000000 0.05141517
 [56] 0.10203900 0.10306845 0.12784833 0.13305956 0.17595446
 [61] 0.17819833 0.18654360 0.20716692 0.24500929 0.30097985
 [66] 0.32233289 0.34161667 0.34610485 0.37439649 0.38009024
 [71] 0.38054464 0.38474568 0.39734014 0.44241280 0.45446726
 [76] 0.49369212 0.52899283 0.65078140 0.69210200 0.70053326
 [81] 0.71292092 0.71399717 0.72350425 0.74408004 0.74555211
 [86] 0.75031026 0.75825391 0.76891759 0.77040642 0.77670676
 [91] 0.79026967 0.82444906 0.84358471 0.89822813 0.89953952
 [96] 0.90929318 0.92537636 0.95395961 0.96087621 0.99545950
• Create two vectors: vector x, with the elements: 2,9,0,2,7,4,0 and vector y with the
elements 3,5,0,2,5,4,6 (in that order).
• Divide all the elements of y by the elements of x.
• Select all values of y that are larger than the corresponding values of x
• Select all values of y for which the corresponding values of x are 0.
• Remove all values of y for which the corresponding values of x equal 0.
• Zero all elements of x that are larger than or equal to 7. Show x.
> x<- c(2,9,0,2,7,4,0)
> y<- c(3,5,0,2,5,4,6)
> y/x
[1] 1.5000000 0.5555556       NaN 1.0000000 0.7142857 1.0000000
[7]       Inf
> x>y
[1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
> x==0
[1] FALSE FALSE  TRUE FALSE FALSE FALSE  TRUE
> y[y>x]
[1] 3 6
> y[x==0]
[1] 0 6
> y<-y[x!=0]
> x[x>=7]<-0 ; x
[1] 2 0 0 2 0 4 0
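The division y/x above silently produces Inf and NaN where x is zero. A minimal sketch, using the same x and y as in the exercise, of how such non-finite results could be filtered with is.finite (the names ratio and ok are illustrative only, not part of the lecture notes):

```r
x <- c(2, 9, 0, 2, 7, 4, 0)
y <- c(3, 5, 0, 2, 5, 4, 6)

ratio <- y / x              # contains NaN (0/0) and Inf (6/0)
ok    <- is.finite(ratio)   # TRUE only for ordinary numbers

ratio[ok]                   # the well-defined ratios
sum(!ok)                    # number of divisions by zero
```

Filtering on the result rather than on x==0 keeps the test independent of why a value became non-finite.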
Chapter 3.8.2
• Use R-function "matrix" to create a matrix with the following contents:
3 9
7 4
• Display it to the screen
• Use R-function "matrix" to create a matrix called "A":
1   1/2 1/3
1/4 1/5 1/6
1/7 1/8 1/9
• Take the transpose of A.
• Create a new matrix, B, by extracting the first two rows and first two columns of A. Display it to the screen.
> A<-matrix(nrow=2,data=c(3,7,9,4)) ; A
     [,1] [,2]
[1,]    3    9
[2,]    7    4
> A<-matrix(nrow=3,data=1/1:9,byrow=TRUE)   # or: 1/matrix(nrow=3,data=1:9,byrow=TRUE)
> t(A)
          [,1]      [,2]      [,3]
[1,] 1.0000000 0.2500000 0.1428571
[2,] 0.5000000 0.2000000 0.1250000
[3,] 0.3333333 0.1666667 0.1111111
> B <- A[1:2,1:2] ; B
     [,1] [,2]
[1,] 1.00  0.5
[2,] 0.25  0.2
Matrix D
• Use diag to create the following matrix, called "D":
$$D = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 2 & 0 \\ 0 & 0 & 3 \end{pmatrix}$$
• Use cbind and rbind to augment this matrix, such that you obtain:
$$\begin{pmatrix} 1 & 0 & 0 & 4 \\ 0 & 2 & 0 & 4 \\ 0 & 0 & 3 & 4 \\ 5 & 5 & 5 & 5 \end{pmatrix}$$
• Remove the second row and second column of the previous matrix
> D <- diag(nrow=3,c(1,2,3))
> DD <- cbind(D,rep(4,3)) # or: cbind(D,4)
> DDD <- rbind(DD,rep(5,4)) # or: rbind(DD,5)
> DDD
     [,1] [,2] [,3] [,4]
[1,]    1    0    0    4
[2,]    0    2    0    4
[3,]    0    0    3    4
[4,]    5    5    5    5
> # same, in one sentence
> DD <- rbind(cbind(D,4),5)
> DD[-2,-2]
     [,1] [,2] [,3]
[1,]    1    0    4
[2,]    0    3    4
[3,]    5    5    5
Chapter 3.8.3 - nematode diversity
• Select the data from station M160a (the 2nd column of Nemaspec); put these
data in a vector called "dens".
• Remove from vector dens, the densities that are 0. Display this vector on the
screen.
• Calculate N, the total nematode density of this station.
• Divide the values in vector dens by the total nematode density N. Put the results
in vector p. The sum of all values in p should equal 1.
• Calculate S, the number of species.
• Estimate the values of diversity indices N1 and N2 and Ni (written $N_\infty$ below), given by the following
formulae:
$$N_1 = e^{-\sum_i p_i \log_e(p_i)}$$
$$N_2 = 1 / \sum_i p_i^2$$
$$N_\infty = 1 / \max_i(p_i)$$
• The expected number of species in a sample of size n, drawn from a population
of size N which has S species, is given by:
$$ES(n) = \sum_{i=1}^{S} \left[ 1 - \frac{\binom{N-N_i}{n}}{\binom{N}{n}} \right]$$
where $N_i$ is the density of species i.
What is the expected number of species per 100 individuals ?
• Print all diversity indices to the screen, which should look like:
> head(Nemaspec)
                SPECIES M160a    M160b    M280a    M280b    M530a
1         Acantholaimus     0 6.580261 0.000000 1.120782 1.315487
2 Acantholaimus elegans     0 0.000000 1.439706 0.000000 3.956836
3 Acantholaimus iubilus     0 0.000000 0.000000 0.000000 0.000000
4      Acantholaimus M1     0 5.919719 0.000000 3.628518 0.000000
5     Acantholaimus M10     0 0.000000 0.000000 1.120782 0.000000
6     Acantholaimus M11     0 0.000000 0.000000 0.000000 0.000000
     M530b    M820a    M820b    M990a    M990b   M1220a   M1220b
1 1.727387 3.417313 3.748096 2.447545 3.728838 4.369345 4.787512
2 0.000000 0.000000 2.198407 5.080900 5.330997 3.644567 3.494481
3 1.193131 0.000000 0.000000 1.270450 1.372789 0.000000 0.000000
4 1.193131 1.155307 0.000000 5.052036 6.225598 0.000000 1.166825
5 0.000000 0.000000 0.000000 0.000000 0.000000 1.115981 0.000000
6 0.000000 0.000000 0.000000 0.000000 2.285694 0.000000 0.000000
> dens <- Nemaspec[,2]
> dens <- dens[dens>0]
> N   <- sum(dens);
> p   <- dens/N
> N0  <- length(p)
> N1  <- exp(sum(-p*log(p)))
> N2  <- sum(p*p)^(-1)
> Ni  <- 1/max(p)
> ESS <- N0-1/choose(N,100)*sum(choose(n=(N-dens),k=100))
> c(N=N,N0=N0,N1=N1,N2=N2,Ni=Ni,ESS=ESS)
         N         N0         N1         N2         Ni        ESS
576.000000  97.000000  27.782793   8.364525   3.162870  40.502318
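The index formulae can be checked on a case where the answer is known in advance. The sketch below is an illustration only: the function name hill and the count vector are hypothetical and not part of the lecture notes. It uses a perfectly even community, for which N0, N1, N2 and Ni should all equal the number of species, and for which sampling every individual should return all species.

```r
# Diversity indices for one vector of integer counts (hypothetical helper)
hill <- function(counts, n = 10) {
  counts <- counts[counts > 0]          # species present
  N <- sum(counts)                      # total number of individuals
  p <- counts / N                       # relative proportions
  c(N    = N,
    N0   = length(p),
    N1   = exp(-sum(p * log(p))),
    N2   = 1 / sum(p^2),
    Ninf = 1 / max(p),
    ES   = sum(1 - choose(N - counts, n) / choose(N, n)))
}

even <- hill(rep(10, 5), n = 50)   # 5 species, 10 individuals each
even
```

With uneven counts the indices separate, with N0 largest and Ninf smallest, which is the ordering visible in the station results above.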
Chapter 4 user-defined functions
Chapter 4.4.1
> ## Sphere function
> Sphere <- function(radius)
+ {
+   vol  <- 4/3*pi*radius^3
+   surf <- 4 *pi*radius^2
+   circ <- 2*pi*radius
+   return(list(volume=vol,surface=surf,circumference=circ))
+ }
> Sphere(6371)
$volume
[1] 1.083207e+12
$surface
[1] 510064472
$circumference
[1] 40030.17
Chapter 4.4.2
The saturated oxygen concentration in water (µmol kg^-1), as a function of temperature (T)
and salinity (S), can be calculated by $SatOx = e^A$, where:
$$A = -173.9894 + 25559.07/T + 146.4813 \log_e(T/100) - 22.204\,T/100 + S\,(-0.037362 + 0.016504\,T/100 - 0.0020564\,(T/100)^2)$$
and T is temperature in Kelvin ($T_{kelvin} = T_{celsius} + 273.15$).
• Make a function that implements this formula; the default values for temperature and
salinity are 20 °C and 35 respectively.
• What is the saturated oxygen concentration at the default conditions?
• Estimate the saturated oxygen concentration for a range of temperatures from 0 to
30 °C, and salinity 35.
> SatOx <- function(T=20,S=35)
+ {
+   T <- T+273.15
+   A= -173.9894 + 25559.07/T + 146.4813* log(T/100) -22.204*T/100 + S *
+      (-0.037362+0.016504*T/100-0.0020564 *T/100*T/100)
+   exp(A)
+ }
> SatOx()
[1] 225.2346
> SatOx(0:30)
 [1] 349.6542 340.6019 331.9557 323.6924 315.7901 308.2286 300.9890
 [8] 294.0533 287.4051 281.0288 274.9098 269.0344 263.3897 257.9638
[15] 252.7452 247.7235 242.8884 238.2306 233.7412 229.4118 225.2346
[22] 221.2020 217.3070 213.5431 209.9038 206.3833 202.9759 199.6764
[29] 196.4796 193.3808 190.3755
Chapter 4.4.3
The Fibonacci numbers are calculated by the following relation: $F_n = F_{n-1} + F_{n-2}$, with
$F_1 = F_2 = 1$.
• Compute the first 50 Fibonacci numbers; store the results in a vector.
• For large n, the ratio $F_n/F_{n-1}$ approaches the "golden mean".
• What is the value of $F_{50}/F_{49}$; is it equal to the golden mean?
• When is n large enough? (i.e. sufficiently close (<1e-6) to the golden mean)
> Fibo<-vector()
> Fibo[1:2]<-1
> for (i in 3:50) Fibo[i]<-Fibo[i-1]+Fibo[i-2]
> (1+sqrt(5))/2
[1] 1.618034
> Fibo[50]/Fibo[49]
[1] 1.618034
> Fibo[2:50]/Fibo[1:49]- (1+sqrt(5))/2
 [1] -6.180340e-01  3.819660e-01 -1.180340e-01  4.863268e-02
 [5] -1.803399e-02  6.966011e-03 -2.649373e-03  1.013630e-03
 [9] -3.869299e-04  1.478294e-04 -5.646066e-05  2.156681e-05
[13] -8.237677e-06  3.146529e-06 -1.201865e-06  4.590718e-07
[17] -1.753498e-07  6.697766e-08 -2.558319e-08  9.771908e-09
[21] -3.732537e-09  1.425702e-09 -5.445699e-10  2.080072e-10
[25] -7.945178e-11  3.034772e-11 -1.159184e-11  4.427569e-12
[29] -1.691314e-12  6.459278e-13 -2.466916e-13  9.414691e-14
[33] -3.597123e-14  1.376677e-14 -5.329071e-15  1.998401e-15
[37] -8.881784e-16  2.220446e-16 -2.220446e-16  0.000000e+00
[41]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
[45]  0.000000e+00  0.000000e+00  0.000000e+00  0.000000e+00
[49]  0.000000e+00
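The last question (when is n large enough?) can be read off the differences listed above, but it can also be computed. A minimal sketch, in which the names golden, err and k are illustrative choices:

```r
Fibo <- numeric(50)
Fibo[1:2] <- 1
for (i in 3:50) Fibo[i] <- Fibo[i-1] + Fibo[i-2]

golden <- (1 + sqrt(5)) / 2
err <- abs(Fibo[2:50] / Fibo[1:49] - golden)  # err[k] belongs to F(k+1)/F(k)
k   <- which(err < 1e-6)[1]                   # first ratio within the tolerance
c(k = k, n = k + 1)                           # n is the index of the numerator
```

Because element k of the ratio vector is F(k+1)/F(k), the position returned by which has to be shifted by one to obtain n.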
Chapter 4.4.5 - nematode diversity all stations
• Make a function that will calculate the diversity indices for any data matrix.
• Calculate and show diversity on species level
> Diversity <- function (Dens,     # density, each column a station
+                        S=100)    # common number of individuals on which
+                                  # to estimate expected number of species
+ {
+   nstat <- NCOL(Dens)            # number of stations
+   if(is.vector(Dens)) Dens <- matrix(ncol=nstat,Dens)
+   div <- matrix(nrow=nstat,ncol=6,data=NA)   # create matrix for results
+   rownames(div) <- colnames(Dens)
+   colnames(div) <- c("N","N0","N1","N2","Ninf",paste("ESS",S,sep=""))
+   for (i in 1:nstat)
+   {
+     dens<- Dens[,i]
+     dens<- dens[dens>0]          # selection of species present
+     N   <- sum(dens)             # N, total density
+     p   <- dens/N                # relative proportion
+     N0  <- length(p)             # N0 = number of species present
+     N1  <- exp(sum(-p*log(p)))   # N1 = exp(Shannon-Wiener)
+     N2  <- sum(p*p)^(-1)         # Na = sum(p^a)^(1/(1-a))
+     Ni  <- 1/max(p)              # Ninf
+     ESS <- N0-1/choose(N,S)*sum(choose(n=(N-dens),k=S))
+     div[i,] <- c(N,N0,N1,N2,Ni,ESS)
+   }
+   return(div)
+ }
> summary(Nemaspec)   # calculate summary characteristics
                  SPECIES         M160a              M160b
 Acantholaimus        :  1   Min.   :  0.000   Min.   : 0.000
 Acantholaimus elegans:  1   1st Qu.:  0.000   1st Qu.: 0.000
 Acantholaimus iubilus:  1   Median :  0.000   Median : 0.000
 Acantholaimus M1     :  1   Mean   :  1.226   Mean   : 1.487
 Acantholaimus M10    :  1   3rd Qu.:  0.000   3rd Qu.: 1.341
 Acantholaimus M11    :  1   Max.   :182.113   Max.   :30.982
 (Other)              :464
     M280a            M280b            M530a            M530b
 Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.0000
 1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.0000
 Median : 0.000   Median : 0.000   Median : 0.000   Median : 0.0000
 Mean   : 1.143   Mean   : 1.047   Mean   : 0.849   Mean   : 0.8553
 3rd Qu.: 1.183   3rd Qu.: 1.121   3rd Qu.: 0.000   3rd Qu.: 0.0000
 Max.   :54.031   Max.   :33.500   Max.   :45.938   Max.   :29.2128
     M820a             M820b             M990a
 Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.0000
 1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.: 0.0000
 Median : 0.0000   Median : 0.0000   Median : 0.0000
 Mean   : 0.9234   Mean   : 0.9383   Mean   : 0.9596
 3rd Qu.: 0.0000   3rd Qu.: 0.0000   3rd Qu.: 1.1771
 Max.   :78.3521   Max.   :38.0481   Max.   :47.8478
     M990b            M1220a            M1220b
 Min.   : 0.000   Min.   : 0.0000   Min.   : 0.0000
 1st Qu.: 0.000   1st Qu.: 0.0000   1st Qu.: 0.0000
 Median : 0.000   Median : 0.0000   Median : 0.0000
 Mean   : 1.217   Mean   : 0.7213   Mean   : 0.7085
 3rd Qu.: 1.213   3rd Qu.: 0.0000   3rd Qu.: 0.0000
 Max.   :51.271   Max.   :37.5895   Max.   :25.6069
> # remove species names
> (divspec<-Diversity(Nemaspec[,-1]))
         N  N0       N1        N2      Ninf   ESS100
M160a  576  97 27.78279  8.364525  3.162870 40.50232
M160b  699 126 90.15358 66.778414 22.561569 60.68971
M280a  537 148 83.57356 43.779249  9.938717 59.44306
M280b  492 140 87.18598 51.988253 14.686371 61.17590
M530a  399 107 61.03511 29.166239  8.685560 54.97142
M530b  402 105 66.41245 44.308181 13.761081 54.25870
M820a  434 102 47.47238 20.190796  5.539098 48.29562
M820b  441 115 60.48626 33.178177 11.590593 52.14167
M990a  451 121 72.89122 40.725866  9.425726 57.11996
M990b  572 148 88.37314 47.535985 11.156321 61.16281
M1220a 339 106 65.53017 37.188517  9.018488 55.97925
M1220b 333  96 63.58659 41.009214 13.004317 54.90511
Chapter 4.4.6 - rarefaction diversity
An alternative way of estimating the number of species per 100 individuals is by taking
random 'subsamples' of 100 individuals and estimating the number of species from this subsample.
> dens <- Nemaspec[,2]
> dens <- dens[dens>0]   # selection of species present
> cs   <- round(dens)    # rarefaction method can only work with integer numbers
> ind  <- NULL           # individual organisms; each one belonging to a species
> for (i in 1:length(cs)) ind <- c(ind,rep(i,times=cs[i]))
> ind100 <-sample(ind,size=100)   # take 100 random individuals
> Spec   <-table(ind100)          # table of counts: speciesnr versus nr ind
> ESS100 <-length(Spec)           # length of Spec = number of species
> # or, three sentences combined in 1!
> length(table(sample(ind,size=100)))
[1] 39
> ESS100 <- vector()
> for (i in 1:1000) ESS100[i] <- length(table(sample(ind,size=100)))
> mean (ESS100)
[1] 40.512
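The loop that builds the vector of individuals and the loop over 1000 subsamples can both be written without explicit indexing. The following is a sketch under stated assumptions: the counts in cs are hypothetical (the Nemaspec data are not reproduced here), and set.seed is added only to make the run repeatable.

```r
set.seed(1)                             # reproducible illustration only
cs  <- c(40, 25, 10, 5, 3, 1, 1)        # hypothetical integer counts per species
ind <- rep(seq_along(cs), times = cs)   # one entry per individual organism

# 1000 random subsamples of 20 individuals; count distinct species in each
ess <- replicate(1000, length(unique(sample(ind, size = 20))))
mean(ess)
```

rep with a vector-valued times argument does in one call what the for loop above does element by element, and length(unique(...)) gives the same count as length(table(...)).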
Chapter 5 - statistics
• Perform a hierarchical clustering of the Nemaspec dataset and plot the dendrogram
• Perform a principal component analysis (PCA) and plot the results
• Repeat the PCA, with the first two stations removed
> nemaspec <- Nemaspec[,-1]
> hc <- hclust(dist(t(nemaspec)), "ave")
> par(mfrow=c(2,2))
> plot(hc)
> plot(hc, hang = -1)
> x <- prcomp(t(nemaspec))
> biplot(x)
> x2 <- prcomp(t(nemaspec[,-(1:2)]))
> biplot(x2)
> par(mfrow=c(1,1))
Figure 1: Cluster analysis and PCA of nematode data
(panels: two cluster dendrograms of dist(t(nemaspec)), hclust (*, "average"); two PCA biplots, PC1 versus PC2)
Chapter 6 - graphics
• Create a script file which draws a curve of the function $y = x^3 \sin^2(3 \pi x)$ in the interval
[-2, 2].
• Make a curve of the function $y = 1/\cos(1 + x^2)$ in the interval [-5,5].
• The relative importance of ammonia:
$$p[NH_3] = \frac{K_N}{K_N + [H^+]}$$
• Plot the relative fraction of toxic ammonia to the total ammonia concentration as a
function of pH, where pH = -log10([H+]) and for a temperature of 30 °C. Use a range
of pH from 4 to 9. The value of $K_N$ is $8 \cdot 10^{-10}$ at a temperature of 30 °C.
• Add to this plot the relative fraction of ammonia at 0 °C; the value of $K_N$ at that
temperature is $8 \cdot 10^{-11}$ mol kg^-1.
• For the US, the population density in 1900 (N0) was 76.1 million; the population growth
can be described as:
$$N(t) = \frac{K}{1 + \left[\frac{K - N_{t0}}{N_{t0}}\right] e^{-a \cdot (t - t0)}}$$
a = 0.02 yr^-1, K = 500 million people.
Actual population values are:
Year:       1900  1910  1920  1930  1940  1950  1960  1970  1980
Population: 76.1  92.4 106.5 123.1 132.6 152.3 180.7 204.9 226.5
• Plot the population density curve as a thick line, using the US parameter values.
• Add the measured population values as points. Finish the graph with titles, labels etc.
> par(mfrow=c(2,2))
> # simple curves
> curve(x^3*sin(3*pi*x)^2,-2,2)
> curve(1/cos(1+x^2),-5,5)
> # ammonia
> pN <- function(pH,Kn=8*10^-10) Kn/(Kn+10^-pH)
> curve(pN(x),4,9,main="fraction toxic ammonium")
> curve(pN(x,Kn=8*10^-11),4,9,add=TRUE,col="red")
> legend("topleft",lty=1,col=c("black","red"),c("30 dg","0 dg"))
> # US population
> K  <- 500
> N0 <- 76.1
> a  <- 0.02
> curve(K/(1+((K-N0)/N0*exp(-a*(x-1900)))),1900,1980,main="US population",
+       xlab="year",ylab= "million",lwd=2)
> N <- matrix(ncol=2,data=c(
+   seq(1900,1980,by=10), 76.1,92.4,106.5,123.1,132.6,152.3,180.7,204.9,226.5
+   ) )
> points(N)
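The population curve can also be wrapped in a function, which makes it easy to compare model and data numerically as well as graphically. A minimal sketch; the function name logistic and the argument t0 are assumptions introduced here, while the parameter values are those given in the exercise:

```r
# Logistic growth curve; defaults are the US parameter values of the exercise
logistic <- function(t, K = 500, N0 = 76.1, a = 0.02, t0 = 1900)
  K / (1 + (K - N0) / N0 * exp(-a * (t - t0)))

years    <- seq(1900, 1980, by = 10)
observed <- c(76.1, 92.4, 106.5, 123.1, 132.6, 152.3, 180.7, 204.9, 226.5)

round(logistic(years) - observed, 1)   # model minus census values, in millions
```

At t = t0 the exponential term equals 1, so the curve passes through N0 exactly; for large t it tends to the carrying capacity K.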
• Have a look at the iris data; What is the class and dimension of the data set?
• Produce a scatter plot of petal length against petal width
• Repeat the same graph, using different symbol colours for the three species.
• Create a box-and whisker plot for sepal length where the data values are split into
species groups
• Now produce a similar box-and whisker plot for all four morphological measurements,
arranged in two rows and two columns.
> head(iris)
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa
> class(iris)
[1] "data.frame"
> dim(iris)
[1] 150   5
> par(mfrow=c(2,2))
> plot(iris$Petal.Length,iris$Petal.Width,cex=1.5,pch=15,
+      xlab="Petal length", ylab=" Petal width")
> plot(iris$Petal.Length,iris$Petal.Width,cex=1.5,pch=15,
+      xlab="Petal length", ylab=" Petal width",
+      col=c("red","blue","green")[iris$Species])
> legend("bottomright",pch=15,col=c("red","blue","green"),
+        legend=levels(iris$Species))
> boxplot(Petal.Width~Species,data=iris)
> par(mfrow=c(2,2))
> boxplot(Sepal.Length~Species, data=iris,main="sepal length")
> boxplot(Sepal.Width~Species, data=iris,main="sepal width")
> boxplot(Petal.Length~Species, data=iris,main="petal length")
> boxplot(Petal.Width~Species, data=iris,main="petal width")
> mtext(outer=TRUE,side=3,line=-2,"Iris data set",cex=1.5)
Figure 2: Use of R-function curve to plot simple functions
Figure 3: The iris data set
Figure 4: The iris data set
Chapter 7 - matrix algebra
Chapter 7.1.1
• Create matrices called "A" and "B":
$$A = \begin{pmatrix} 1 & 2 & 3 \\ 6 & 4 & 1 \\ -2 & 1 & -1 \end{pmatrix} \qquad B = \begin{pmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{pmatrix}$$
• Take the inverse of A and the transpose of A.
• Multiply A with B.
• Estimate the eigenvalues and eigenvectors of A.
• For a matrix A, x is an eigenvector, and $\lambda$ the eigenvalue of A, if $Ax = \lambda x$. Test
it!
> A <- matrix(nrow=3, data=c(1,6,-2,2,4,1,3,1,-1))
> B <- matrix(nrow=3, data=1:9)
> solve(A); t(A)
            [,1]       [,2]       [,3]
[1,] -0.11111111  0.1111111 -0.2222222
[2,]  0.08888889  0.1111111  0.3777778
[3,]  0.31111111 -0.1111111 -0.1777778
     [,1] [,2] [,3]
[1,]    1    6   -2
[2,]    2    4    1
[3,]    3    1   -1
> A%*%B
     [,1] [,2] [,3]
[1,]   14   32   50
[2,]   17   50   83
[3,]   -3   -9  -15
> eigen(A)
$values
[1]  6.366696+0.000000i -1.183348+2.380697i -1.183348-2.380697i
$vectors
               [,1]                  [,2]                  [,3]
[1,] -0.36275602+0i -0.0725936-0.5240033i -0.0725936+0.5240033i
[2,] -0.93146469+0i -0.2632991+0.4856291i -0.2632991-0.4856291i
[3,] -0.02795726+0i  0.6441961+0.0000000i  0.6441961+0.0000000i
> ee<-eigen(A)
> A%*%ee$vectors[,1]
[,1]
[1,] -2.3095572+0i
[2,] -5.9303521+0i
[3,] -0.1779954+0i
> ee$values[1]*ee$vectors[,1]
[1] -2.3095572+0i -5.9303521+0i -0.1779954+0i
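The test above covers the first eigenpair only. A sketch of how all three pairs, including the complex ones, might be checked in one step; the names V and resid are illustrative:

```r
A  <- matrix(nrow = 3, data = c(1, 6, -2, 2, 4, 1, 3, 1, -1))
ee <- eigen(A)
V  <- ee$vectors

# A V should equal V diag(lambda), column by column
resid <- A %*% V - V %*% diag(ee$values)
max(Mod(resid))    # expected to be close to machine precision
```

Mod is used instead of abs to make explicit that the residuals are complex numbers.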
Chapter 7.1.2 killer whale model
• Create a matrix, called P:
$$P = \begin{pmatrix} 0 & 0.0043 & 0.1132 & 0 \\ 0.9775 & 0.9111 & 0 & 0 \\ 0 & 0.0736 & 0.9534 & 0 \\ 0 & 0 & 0.0452 & 0.9804 \end{pmatrix}$$
• What is the value of the largest eigenvalue (the so-called dominant eigenvalue)
and the corresponding eigenvector?
• Create a new matrix, T, which equals P, except for the first row, where the
elements are 0.
• Now estimate $N = (I - T)^{-1}$, where I is the identity matrix.
> A<- matrix (nrow=4,data=c(0,     0.9775,0,     0,
+                           0.0043,0.9111,0.0736,0,
+                           0.1132,0,     0.9534,0.0452,
+                           0,0,0,0.9804))
> eigen(A)
$values
[1] 1.025441326 0.980400000 0.834222976 0.004835698
$vectors
           [,1] [,2]       [,3]         [,4]
[1,] 0.06634512    0 -0.0659050  0.678780909
[2,] 0.56718211    0  0.8379894 -0.732135578
[3,] 0.57945357    0 -0.5175160  0.056807091
[4,] 0.58149491    1  0.1600233 -0.002631995
> T <- A
> T[1,] <- 0
> N <- solve(diag(4)-T) ; N
         [,1]     [,2]     [,3]     [,4]
[1,]  1.00000  0.00000  0.00000  0.00000
[2,] 10.99550 11.24859  0.00000  0.00000
[3,] 17.36628 17.76602 21.45923  0.00000
[4,] 40.04878 40.97062 49.48761 51.02041
Chapter 7.1.3. System of equations
Solve the following system of linear equations for the unknown $x_i$:
$$3x_1 + 4x_2 + 5x_3 = 0$$
$$6x_1 + 2x_2 + 7x_3 = 5$$
$$7x_1 + x_2 = 6$$
Check the results.
> A <- matrix(nrow=3,data=c(3,6,7,4,2,1,5,7,0))
> B <- c(0,5,6)
> x <- solve(A,B)
> A %*% x - B
[,1]
[1,] 5.551115e-16
[2,] 0.000000e+00
[3,] 0.000000e+00
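The residual above is not exactly zero in its first element, which is a rounding effect of floating-point arithmetic. A sketch of a check that tolerates such rounding, using all.equal rather than an exact comparison:

```r
A <- matrix(nrow = 3, data = c(3, 6, 7, 4, 2, 1, 5, 7, 0))
B <- c(0, 5, 6)
x <- solve(A, B)

all.equal(as.vector(A %*% x), B)   # compares within a small numerical tolerance
```

as.vector drops the matrix dimensions of A %*% x so that the two objects compared have the same shape.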
Figure 5: The root of a simple function
Chapter 8 - roots of functions
Chapter 8.3.1 simple root of equations
• Find the root of the equation $e^x = 4x^2$ in the interval [0,1]. First draw the function
curve.
• Solve the equation $1000 = y \cdot (3 + x) \cdot (1 + y)^4$ for y and with x varying over the range
from 1 to 100. Plot the root as a function of x.
> root<-uniroot(f=function(x) exp(x)-4*x^2,interval=c(0,1))
> curve(exp(x)-4*x^2,0,1)
> abline(h=0,lty=2)
> points(root$root,0,pch=16,cex=2)
> res<-vector()
> for (x in 1:100)
+ res[x]<-uniroot (f=function(y) y*(3+x)*(1+y)^4-1000,c(-1000,1000))$root
> plot(1:100,res)
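The loop above relies on the anonymous function finding x in the enclosing environment. An alternative sketch passes x to the function explicitly through the extra arguments of uniroot; the function name f, the narrower interval c(0,10) and the tighter tolerance are choices made here, not taken from the lecture notes:

```r
f <- function(y, x) y * (3 + x) * (1 + y)^4 - 1000

# one root per value of x; f(0) < 0 and f(10) > 0 for every x in 1:100
res <- sapply(1:100, function(x)
  uniroot(f, interval = c(0, 10), x = x, tol = 1e-10)$root)

max(abs(f(res, 1:100)))   # largest residual over the 100 roots
```

Since y*(1+y)^4 is not positive for negative y, the only real root lies at y > 0, which is why the search interval can start at 0.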
Figure 6: Roots of an equation y=f(x) for a sequence of x-values
Chapter 8.3.2. pCO2 rises increase acidity
• Estimate the pH at equilibrium with alkalinity 2300 µmol kg^-1 and the current pCO2 of
360 ppm.
• Use package seacarb to estimate the dissociation constants and Henry's constant at
temperature 20 °C, salinity 0, and pressure 0.
• Estimate pH as a function of pCO2, varying between 200 and 1250.
• What is the value of pH at pCO2 = 1250?
(see lecture notes for formulae)
> require(seacarb)
> k1 <- K1(S=0,T=20,P=0)
> k2 <- K2(S=0,T=20,P=0)
> kh <- Kh(S=0,T=20,P=0)
> nonlinfun <- function(pH,pco2=360,alk=2300)   # alk in umol/kg, as in the calls below
+ {
+   H    <- 10^(-pH)
+   CO2  <- pco2*kh
+   HCO3 <- k1*CO2/H
+   CO3  <- k2*HCO3/H
+   return( HCO3+2*CO3-H*1.e6 - alk)
+ }
> uniroot(nonlinfun,interval=c(2,12),pco2=360,alk=2300,tol=1e-30)
$root
[1] 8.317286
$f.root
[1] 2.728484e-12
attr(,"unit")
[1] "mol/kg-soln"
attr(,"pH scale")
[1] "total scale"
$iter
[1] 16
$estim.prec
[1] 3.552714e-15
> pHseq   <- vector()
> pco2seq <-200:1250
> for (i in 1:length(pco2seq))
+   pHseq[i]<-uniroot(nonlinfun,interval=c(2,12),
+                     pco2=pco2seq[i],alk=2300,tol=1e-30)$root
> # max drop of pH
> pHseq[length(pHseq)]
[1] 7.81045
> plot(pco2seq,pHseq,type="l",lwd=2,main="Effect of pCO2 on pH")
Figure 7: pH as a function of pCO2
Figure 8: The interpolated wind data
Chapter 9 - interpolating, smoothing, curve fitting
Chapter 9.1 interpolating wind data
Wind velocities are: 5,6,7,9,4,6,3,7,9 at time 0, 3, ..., 24 o'clock respectively.
• Interpolate the three-hourly measurements to hourly measurements.
• Make a plot of the interpolated values
> t3 <- seq(3,24,by=3)
> wind3 <- c(6,7,9,4,6,3,7,9)
> plot(approx(t3,wind3,xout=3:24),type="b" ,xlab="time",ylab="wind speed")
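The answer above starts the interpolation at 3 o'clock. A sketch that uses all nine measurements listed in the exercise, including the one at time 0, and inspects the interpolated values instead of plotting them; the names hours3, wind9 and hourly are illustrative:

```r
hours3 <- seq(0, 24, by = 3)                   # nine measurement times
wind9  <- c(5, 6, 7, 9, 4, 6, 3, 7, 9)         # all nine values from the exercise
hourly <- approx(hours3, wind9, xout = 0:24)   # linear interpolation

head(data.frame(time = hourly$x, wind = hourly$y))
```

approx returns a list with components x and y, which is why it can be handed directly to plot as in the answer above.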
Chapter 9.2 Fitting primary production data
Primary production (pp) is given at different light intensities. Fit the production estimates (pp) as a function of light intensity (ll) with the 3-parameter Eilers-Peeters
equation. The primary production is calculated as:
$$pp = p_{max} \cdot \frac{2 \cdot (1 + \beta) \cdot I/I_{opt}}{(I/I_{opt})^2 + 2 \cdot \beta \cdot I/I_{opt} + 1}$$
where I is light and $p_{max}$, $\beta$ and $I_{opt}$ are parameters.
Add the best-fit line to the graph (note: use coef to retrieve the best parameter values).
> ll <- c(0.,1,10,20,40,80,120,160,300,480,700)
> pp <- c(0.,1,3,4,6,8,10,11,10,9,8)
> plot(ll,pp,xlab= expression("light, µEinst"~ m^{-2}~s^{-1}),
+      ylab="production",pch=15,cex=1.5)
> fit<-nls(pp ~pmax*2*(1+b)*(ll/iopt)/
+          ((ll/iopt)^2+2*b*ll/iopt+1),
+          start=c(pmax=max(pp),b=0.005,iopt=ll[which.max(pp)]))
> summary(fit)
Formula: pp ~ pmax * 2 * (1 + b) * (ll/iopt)/((ll/iopt)^2 + 2 * b * ll/iopt +
    1)
Parameters:
     Estimate Std. Error t value Pr(>|t|)
pmax  10.4351     0.3171  32.909 7.93e-10 ***
b      1.5998     0.4353   3.676  0.00626 **
iopt 209.6325    15.8052  13.264 9.96e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.5445 on 8 degrees of freedom
Number of iterations to convergence: 8
Achieved convergence tolerance: 1.340e-06
> pars <- as.list(coef(fit))
> with(pars,
+      curve(pmax*2*(1+b)*(x/iopt)/((x/iopt)^2+2*b*x/iopt+1),
+            add=TRUE,lwd=2))
> title(expression (frac(pmax%*%2%*%(1+beta)%*%I/Iopt,
+       (I/Iopt)^2+2%*%beta%*%I/Iopt+1)),cex.main=0.8)
Figure 9: Primary production data with fit
Chapter 10 differential equations
Chapter 10.1 Lotka-Volterra model
• Solve the following system of ODEs
$$\frac{dx}{dt} = a \cdot x \cdot \left(1 - \frac{x}{K}\right) - b \cdot x \cdot y$$
$$\frac{dy}{dt} = g \cdot b \cdot x \cdot y - e \cdot y$$
for initial values x=300, y=10 and parameter values: a=0.05, K=500, b=0.0002, g=0.8,
e=0.03
• Make three plots, one for x and one for y as a function of time, and one plot expressing
y as a function of x. Arrange these plots in 2 rows and 2 columns.
• Run the model with other initial values (x=200, y=50); add the (x,y) trajectories to the
phase-plane plot
> require(deSolve)
> model <- function (time,VAR,pars)
+ {
+   with (as.list(c(VAR,pars)), {
+     # the rate of change of the state variables
+     dx <- a*x*(1-x/K)-b*x*y
+     dy <- g*b*x*y - e*y
+     return(list(c(dx,dy)))
+   })
+ }
> pars  <- c(a=0.05,b=0.0002,K=500,g=0.8,e=0.03)
> VAR   <- c(x=300,y=10)
> times <- seq(0,1000,1)
> out   <- as.data.frame(lsoda(VAR,times,model,pars))
> plot(out$x,out$y,type="l")
> VAR   <- c(x=200,y=50)
> out2  <- as.data.frame(lsoda(VAR,times,model,pars))
> lines(out2$x,out2$y,lty=2)
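As a cross-check on the phase-plane plot, the interior steady state of the model can be computed by hand: setting dy/dt = 0 with y > 0 gives x = e/(g·b), and setting dx/dt = 0 with x > 0 then gives y = a·(1 − x/K)/b. The sketch below evaluates these expressions in base R only; the names p, eq and rates are illustrative and not part of the lecture notes.

```r
p <- list(a = 0.05, b = 0.0002, K = 500, g = 0.8, e = 0.03)

# interior steady state (both species present)
eq <- with(p, {
  xs <- e / (g * b)              # from dy/dt = 0 with y > 0
  ys <- a * (1 - xs / K) / b     # from dx/dt = 0 with x > 0
  c(x = xs, y = ys)
})
eq

# both rates of change should vanish at this point
rates <- with(p, c(
  dx = a * eq[["x"]] * (1 - eq[["x"]] / K) - b * eq[["x"]] * eq[["y"]],
  dy = g * b * eq[["x"]] * eq[["y"]] - e * eq[["y"]]))
rates
```

With the parameter values of the exercise this point lies at x = 187.5, y = 156.25, which can be compared with where the two trajectories in the phase-plane plot end up.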
Figure 10: Result of the Lotka-Volterra model
Chapter 10.2 Lorenz Butterfly
Solve the Lorenz equations:
$$\frac{dx}{dt} = -\frac{8}{3} \cdot x + y \cdot z$$
$$\frac{dy}{dt} = -10 \cdot (y - z)$$
$$\frac{dz}{dt} = -x \cdot y + 28 y - z$$
Use as initial conditions x=y=z=1; create output for a time sequence ranging from 0 to 100,
and with a time step of 0.005.
> require(scatterplot3d)
> model<-function(t,state,parameters)
+ {
+   with(as.list(c(state)),{
+     dx <- -8/3*x+y*z
+     dy <- -10*(y-z)
+     dz <- -x*y+28*y-z
+     list(c(dx,dy,dz))
+   })
+ } # end of model
> state <-c(x=1, y=1, z=1)
> times <-seq(0,100,0.005)
> out   <-as.data.frame(lsoda(state,times,model,0))
> scatterplot3d(out$x,out$y,out$z,type="l",
+               main="Lorenz butterfly",ylab="",grid=FALSE,box=FALSE)
Figure 11: Results of the Lorenz model
References
Soetaert K (2008). Using R for scientific computing. NIOO-CEME, Yerseke.
Affiliation:
Karline Soetaert
Centre for Estuarine and Marine Ecology (CEME)
Netherlands Institute of Ecology (NIOO)
4401 NT Yerseke, Netherlands E-mail: [email protected]
URL: http://www.nioo.knaw.nl/users/ksoetaert
