2. Data sets
Sihua Peng, PhD
Shanghai Ocean University
2016.10
Contents
1. Introduction to R
2. Data sets
3. Introductory Statistical Principles
4. Sampling and experimental design with R
5. Graphical data presentation
6. Simple hypothesis testing
7. Introduction to Linear models
8. Correlation and simple linear regression
9. Single factor classification (ANOVA)
10. Nested ANOVA
11. Factorial ANOVA
12. Simple Frequency Analysis
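R Functions
Each function performs a specific task.
A function is called by its name followed
by parentheses, for example:
mean(): average value
sum(): summation
plot(): plotting
sort(): sorting
log(): natural logarithm; log2(): base-2 logarithm; log10(): base-10 logarithm
exp(): exponential; sin(), cos(): sine and cosine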
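As a brief illustration of how such functions are called, the sketch below applies several of them to a small vector. The vector x and the variable names are made up for this example only.

```r
# A small made-up vector, used only for illustration
x <- c(4, 1, 3, 2)

avg      <- mean(x)   # average value
total    <- sum(x)    # summation
sorted_x <- sort(x)   # values in ascending order

# log() is the natural logarithm; log2() and log10() use bases 2 and 10
ln_e       <- log(exp(1))
log2_8     <- log2(8)
log10_1000 <- log10(1000)

print(c(avg = avg, total = total))
print(sorted_x)
print(c(ln_e = ln_e, log2_8 = log2_8, log10_1000 = log10_1000))
```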
R Packages
What is an R package?
An R package is a collection of functions
with detailed descriptions and examples.
Each package contains R functions, data,
help files, description files, and so on.
Why use packages?
Packages extend the functions built into R:
specific kinds of analysis are carried out
with the functions of the corresponding
package.
For example, the ape package is commonly used
for phylogenetic analysis, and the vegan package
for community ecology.
Common R packages (1)
Name         Description
ade4         Analysis of ecological data using exploratory and Euclidean methods
ape          Phylogenetic and evolutionary analysis
apTreeshape  Analysis of phylogenetic tree shape
cluster      Cluster analysis
geiger       Speciation rates and evolutionary analysis
ouch         Phylogenetic comparative analysis
pgirmess     Ecological data analysis
Common R packages (2)
Name         Description
phangorn     Phylogenetic analysis
picante      Analysis of the phylogenetic diversity of communities
seqinr       DNA sequence analysis
SDMTools     Species distribution model tools
vegan        Ordination of vegetation and plant communities, and
             biodiversity calculation
graphics     Plotting figures
lattice      Lattice (trellis) graphics
How to install a package?
Example: installing “ade4”
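One common way to install a package from the console is the install.packages() function. The sketch below wraps it in a helper so that nothing is downloaded unless the package is actually missing. The helper name ensure_package is hypothetical (it is not part of R), and the sketch assumes an internet connection and a configured CRAN mirror.

```r
# Hypothetical helper (not part of R): install a package only if it is
# missing, then report whether it is available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Downloads from CRAN; needs an internet connection and a CRAN mirror
    install.packages(pkg)
  }
  requireNamespace(pkg, quietly = TRUE)
}

# Usage (commented out so that nothing is downloaded unintentionally):
# ensure_package("ade4")
```

Calling install.packages("ade4") directly works as well; the check simply avoids reinstalling a package that is already present.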
Using Packages
The functions in a package must be loaded before they
can be used, so loading the package is the first step.
In the console, enter the following command:
> library(vegan)
Once the package is loaded, its functions are used just
like the basic functions built into R.
Data frames: An example
First, generate the three variables (excluding the site
labels, as they are not variables) separately:
> HABITAT <- factor(c("Mixed", "Gipps.Manna",
+ "Gipps.Manna", "Gipps.Manna", "Mixed", "Mixed",
+ "Mixed", "Mixed"))
> GST <- c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6)
> EYR <- c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)
Next, use the names of the vectors as arguments in the
data.frame() function to amalgamate the three separate
variables into a single data frame (data set) which we will
call MACNALLY.
> MACNALLY <- data.frame(HABITAT, GST, EYR)
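As a quick check on the result, the sketch below rebuilds the data frame from the three vectors above and inspects its size and structure. The inspection calls (dim(), names(), str()) are additions for illustration and are not part of the original example.

```r
# The three variables from the example above
HABITAT <- factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                    "Mixed", "Mixed", "Mixed", "Mixed"))
GST <- c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6)
EYR <- c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)

# Combine them into one data frame: one column per variable
MACNALLY <- data.frame(HABITAT, GST, EYR)

size      <- dim(MACNALLY)    # number of rows, number of columns
col_names <- names(MACNALLY)  # column names are taken from the vector names
print(size)
print(col_names)
str(MACNALLY)                 # type of each column (factor, numeric, numeric)
```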
Notice that each vector (variable)
becomes a column in the data frame
and that each row represents a single
sampling unit.
By default, the rows are named using
numbers corresponding to the
number of rows in the data frame.
However, these can be altered to
reflect the names of the sampling
units by assigning a vector of alternative
names with the row.names() function.
> row.names(MACNALLY) <- c("Reedy Lake", "Pearcedale",
+ "Warneet", "Cranbourne", "Lysterfield", "Red Hill",
+ "Devilbend", "Olinda")
Access the data in a data frame
MACNALLY$HABITAT      access column 1
MACNALLY$GST          access column 2
MACNALLY$EYR          access column 3
MACNALLY[1,]          first row
MACNALLY[,3]          third column
MACNALLY[3,2]         element in the third row and second column
i=1:4; MACNALLY[i,]   rows 1 to 4
MACNALLY[,2:3]        columns 2 to 3
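The access patterns above can be tried on the eight-site example data. The sketch below rebuilds the data frame so that it runs on its own. Indexing by row name in the last line is an extra illustration that does not appear in the list above.

```r
# Rebuild the example data frame, including the site names as row names
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)
)
row.names(MACNALLY) <- c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                         "Lysterfield", "Red Hill", "Devilbend", "Olinda")

gst_column <- MACNALLY$GST         # a column, by name
first_row  <- MACNALLY[1, ]        # first row (still a data frame)
third_col  <- MACNALLY[, 3]        # third column (EYR), as a vector
one_value  <- MACNALLY[3, 2]       # third row, second column: GST at Warneet
first_four <- MACNALLY[1:4, ]      # rows 1 to 4
two_cols   <- MACNALLY[, 2:3]      # columns 2 to 3 (GST and EYR)

# Extra illustration: rows and columns can also be selected by name
by_name <- MACNALLY["Warneet", "GST"]
print(one_value)
print(first_four)
```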
Importing (reading) data
> # comma-separated file
> MACNALLY <- read.table(
+ 'macnally.csv', header=TRUE,
+ row.names=1, sep=',')
> # tab-delimited file
> MACNALLY <- read.table(
+ 'macnally.txt', header=TRUE,
+ row.names=1, sep='\t')
Reviewing a data frame - fix()
A data frame can also be viewed as a simple
spreadsheet in a separate window by using the
name of the data frame as an argument in the fix()
function.
The fix() function also enables simple editing of the
data frame.
> fix(MACNALLY)
Saving and loading of R objects
Any object in R (including data frames) can be saved
into a native R workspace image file (*.RData) with the
save() function, either individually or as a collection
of objects. For example:
> save(MACNALLY, file='macnally.RData')
The saved object(s) can be loaded in subsequent
sessions by providing the name of the saved workspace
image file as an argument to the load() function. For
example:
> load("macnally.RData")
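The save and load steps above can be checked with a round trip. The sketch below is a minimal version that writes to a temporary file (an assumption made so that no file is left in the working directory) rather than to macnally.RData.

```r
# Rebuild the example data frame
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)
)

# Save to a temporary .RData file instead of the working directory
rdata_file <- tempfile(fileext = ".RData")
save(MACNALLY, file = rdata_file)

# Keep a copy for comparison, then remove the object as if starting afresh
original <- MACNALLY
rm(MACNALLY)

# load() restores the object under its original name and returns that name
restored_names <- load(rdata_file)
print(restored_names)
print(identical(MACNALLY, original))
```

Note that load() recreates objects under the names they were saved with, so it can silently overwrite an existing object of the same name.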
Exporting (writing) data
The write.table() function is used
to save data frames to text files.
> write.table(MACNALLY, "macnally.csv",
+ quote = FALSE, row.names = TRUE, sep = ",")
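To see that exporting and importing fit together, the sketch below writes the example data frame to a comma-separated file and reads it back. Two details are assumptions of this sketch rather than part of the example above: the file is a temporary one, and col.names = NA is added so that the header line gets an empty first field above the row names.

```r
# Rebuild the example data frame with site names as row names
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)
)
row.names(MACNALLY) <- c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                         "Lysterfield", "Red Hill", "Devilbend", "Olinda")

# Export: col.names = NA writes a blank header field for the row names
csv_file <- tempfile(fileext = ".csv")
write.table(MACNALLY, csv_file, quote = FALSE, sep = ",",
            row.names = TRUE, col.names = NA)

# Import: the first column of the file becomes the row names again
MACNALLY_back <- read.table(csv_file, header = TRUE, row.names = 1, sep = ",")
print(MACNALLY_back)
```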
Dummy data sets
- generating random data
Normal
> # generate 5 random numbers from a normal
> # distribution with a mean of 10 and a standard
> # deviation of 1
> rnorm(5,mean=10,sd=1)
[1] 11.564555 9.732885 8.357070 8.690451 12.272846
Log-Normal
> # generate 5 random numbers from a log-normal
> # distribution whose logarithm has a mean of 2 and a
> # standard deviation of 1
> rlnorm(5,meanlog=2,sdlog=1)
[1] 8.157636 30.914781 20.175299 5.071559 16.364014
Poisson
> # generate 5 random numbers from a Poisson
> # distribution with a lambda parameter of 4
> rpois(5,lambda=4)
[1] 4 4 2 6 1
Binomial
> # generate 5 random numbers from a binomial
> # distribution based on 10 Bernoulli trials and
> # a prob. of 0.5
> rbinom(5,size=10,prob=.5)
[1] 4 4 1 4 6
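Because these functions draw random numbers, the values differ from run to run and will not match the output shown above. The sketch below uses set.seed() to make the draws repeatable. The seed value 42 is arbitrary and chosen only for illustration.

```r
# Fixing the seed makes the random draws repeatable
set.seed(42)
first_draw <- rnorm(5, mean = 10, sd = 1)

set.seed(42)
second_draw <- rnorm(5, mean = 10, sd = 1)

# The two draws are the same because the seed was reset in between
same_draws <- identical(first_draw, second_draw)
print(same_draws)

# The other generators follow the same pattern
lognormal <- rlnorm(5, meanlog = 2, sdlog = 1)     # always positive
counts    <- rpois(5, lambda = 4)                  # non-negative integers
successes <- rbinom(5, size = 10, prob = 0.5)      # integers from 0 to 10
```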
Manipulating data sets
Subsets of data frames – data frame indexing
> # extract all the bird densities from sites that have
> # GST values greater than 3
> subset(MACNALLY, GST>3)
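The same selection can also be written with logical indexing. The sketch below compares the two forms on the eight-site example data. The comparison is an addition for illustration only.

```r
# Rebuild the example data frame with site names as row names
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)
)
row.names(MACNALLY) <- c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                         "Lysterfield", "Red Hill", "Devilbend", "Olinda")

# subset() evaluates the condition inside the data frame
by_subset <- subset(MACNALLY, GST > 3)

# Logical indexing needs the column to be named in full
by_index <- MACNALLY[MACNALLY$GST > 3, ]

# Cranbourne has a GST of exactly 3, so "greater than 3" leaves it out
print(rownames(by_subset))
print(nrow(by_index))
```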
The %in% matching operator
Select the rows of the MACNALLY dataset whose
HABITAT is 'Montane Forest' or
'Foothills Woodland'
> MACNALLY[MACNALLY$HABITAT %in%
+ c("Montane Forest", "Foothills Woodland"),]
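The eight-site example built earlier contains only the habitats "Mixed" and "Gipps.Manna", so the sketch below applies %in% to those data instead. The vector wanted is made up for illustration, and it deliberately includes one habitat that is absent from the eight rows to show that unmatched values are simply ignored.

```r
# Rebuild the example data frame with site names as row names
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)
)
row.names(MACNALLY) <- c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                         "Lysterfield", "Red Hill", "Devilbend", "Olinda")

# Habitats of interest; "Montane Forest" does not occur in these eight rows
wanted <- c("Gipps.Manna", "Montane Forest")

# %in% returns TRUE for each row whose HABITAT is one of the wanted values
keep     <- MACNALLY$HABITAT %in% wanted
selected <- MACNALLY[keep, ]
print(selected)
```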
Sorting datasets
> # sort by HABITAT, then by GST within each habitat
> MACNALLY[order(MACNALLY$HABITAT,
+ MACNALLY$GST), ]
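To make the effect of order() concrete, the sketch below sorts the eight-site example data and also shows a descending sort on the second key. Negating GST to reverse its order is an addition for illustration and works only for numeric columns.

```r
# Rebuild the example data frame with site names as row names
MACNALLY <- data.frame(
  HABITAT = factor(c("Mixed", "Gipps.Manna", "Gipps.Manna", "Gipps.Manna",
                     "Mixed", "Mixed", "Mixed", "Mixed")),
  GST = c(3.4, 3.4, 8.4, 3, 5.6, 8.1, 8.3, 4.6),
  EYR = c(0, 9.2, 3.8, 5, 5.6, 4.1, 7.1, 5.3)
)
row.names(MACNALLY) <- c("Reedy Lake", "Pearcedale", "Warneet", "Cranbourne",
                         "Lysterfield", "Red Hill", "Devilbend", "Olinda")

# order() returns row positions: sorted by HABITAT first, ties broken by GST
sorted <- MACNALLY[order(MACNALLY$HABITAT, MACNALLY$GST), ]
print(sorted)

# Same habitat order, but GST from largest to smallest within each habitat
sorted_desc <- MACNALLY[order(MACNALLY$HABITAT, -MACNALLY$GST), ]
print(sorted_desc)
```

Factor columns such as HABITAT are sorted by the order of their levels, which is alphabetical by default.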