Using R for Elementary Statistics Part 1

Chia-hung Tsai
Election Study Center, Graduate Institute of East Asian Studies
NCCU
July 16, 2014
CHT (ESC and GIEAS)
IPM R course, part 1
Strength of R
1. Learning statistics while you code.
2. Many ready-made packages with PDF documentation on the internet.
3. Integration with LaTeX; easy to write scientific notation and to make elegant tables.
4. Analyzing multiple datasets in one workspace.
5. Storing datasets, functions, and results in a workspace.
The R Console
The R Script
RStudio
Hashes
Working Directory
Loading Libraries
Reading Dataset
Running a Model
Saving the Results
R environment
In R, you can save the objects in your workspace to an .RData file.
> brr<-rnorm(3,0,1)
> brr
[1]  1.2405116  0.6713496 -1.3300341
> save.image("/Users/Apple/practice.Rdata")
You can load the .RData file later and keep working with the saved objects.
> load("/Users/Apple/practice.Rdata")
R also lets you set the working directory.
> setwd("/Users/Apple/Dropbox")
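The save-and-load cycle can be tried without touching any real folder. The following is a minimal sketch, assuming a temporary file instead of the /Users/Apple paths on the slide; the file name under tempdir() and the object name saved.copy are illustrative choices. It uses save() for a single object, whereas save.image() stores the whole workspace.

```r
# Work in a temporary directory so no real folder is touched (an assumption
# made for illustration; the slide uses /Users/Apple/...).
path <- file.path(tempdir(), "practice.RData")

set.seed(1)                 # make the random draw reproducible
brr <- rnorm(3, 0, 1)       # three draws from a standard normal
saved.copy <- brr           # keep a copy to compare against later

save(brr, file = path)      # write only the object brr to the file
rm(brr)                     # remove it from the workspace
load(path)                  # restore brr from the file

identical(brr, saved.copy)  # TRUE if the object came back unchanged
getwd()                     # print the current working directory
```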
Packages, Functions, and Data
Many packages are available for download, such as arm and UsingR.
Use install.packages("") to install a package, and
attach it by typing library().
> install.packages("car")
> library(car)
search() lists the packages currently attached.
To load a dataset bundled with a package, type data().
You can list the functions in a package by typing, for example,
ls("package:car")
Every package has a PDF reference manual on the CRAN website; you can also open the help
from the console with a command like ?car or help(package="car").
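The same commands can be tried on packages that ship with R, so nothing has to be installed first. This is a small sketch that assumes a standard session in which stats and datasets are attached by default; the object names attached, stats.objects, and bundled are illustrative.

```r
# Packages currently attached to the session.
attached <- search()
print(attached)

# Functions and other objects exported by an attached package.
stats.objects <- ls("package:stats")
head(stats.objects)

# Names of the datasets bundled with the datasets package.
bundled <- data(package = "datasets")$results[, "Item"]
head(bundled)

# Open the help index of a package (commented out: it opens a viewer).
# help(package = "stats")
```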
Using data
After loading the data, you can inspect the first rows and the variables by typing
head()
> library(car) ; data(USPop)
> head(USPop)
  year population
1 1790   3.929214
2 1800   5.308483
3 1810   7.239881
4 1820   9.638453
5 1830  12.860702
6 1840  17.063353
To examine one of the variables, or part of it, use $ to select it from
the dataset:
> USPop$population[1:5]
[1] 3.929214 5.308483 7.239881 9.638453 12.860702
Using data
If you want to know the meaning of the variables, open the dataset's help page, for example ?mtcars
table() shows the frequency of each value of a variable.
> data(mtcars)
> table(mtcars$am)
 0  1
19 13
colnames() shows the names of the variables, and rownames()
returns the names of the rows.
> colnames(mtcars)
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
[11] "carb"
> rownames(mtcars)[1:5]
[1] "Mazda RX4"         "Mazda RX4 Wag"     "Datsun 710"
[4] "Hornet 4 Drive"    "Hornet Sportabout"
Exercise 1
1. Please install the package MASS.
2. Please list the datasets in the package.
3. Load the Boston data and use head(Boston) to show the variables.
4. Compute the mean of one of the variables, for example age.
Arithmetic operation
R can be used as a calculator.
> 103+148-52*.75
[1] 212
> 2^3
[1] 8
> (12-7.5)/40
[1] 0.1125
The binomial coefficient $\binom{3}{2}$ is computed with choose():
> choose(3,2)
[1] 3
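A few more operators follow the same pattern. The sketch below is illustrative only; the names a and b are not from the slides. It shows that multiplication is evaluated before addition and subtraction unless parentheses say otherwise.

```r
# Multiplication and division are evaluated before addition and subtraction.
a <- 103 + 148 - 52 * 0.75      # 212
b <- (103 + 148 - 52) * 0.75    # parentheses change the order: 149.25
a; b

# Other common operators and functions.
17 %/% 5      # integer division: 3
17 %% 5       # remainder: 2
sqrt(16)      # square root: 4
exp(1)        # Euler's number e
factorial(3)  # 3! = 6
```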
Object Assignment
We use "<-" to assign values to objects.
> x<-c(1,2,3,100,50); x
[1]   1   2   3 100  50
> y<-c("Europe","Africa","Asia"); y
[1] "Europe" "Africa" "Asia"
> c(10.4, 5.6, 3.1, 6.4, 21.7) -> x ; x
[1] 10.4  5.6  3.1  6.4 21.7
Vector arithmetic
We can write our own commands to do arithmetic or use built-in
functions.
For example, we want to compute the variance, $\mathrm{var}(x) = \frac{\sum (X - \bar{X})^2}{n-1}$, and the standard
deviation.
> x<-c(2,6,10,1,3,15,4)
> mean(x)
[1] 5.857143
> squaremean<-(x-mean(x))^2
> sum(squaremean)/(length(x)-1)
[1] 25.14286
> sd(x)
[1] 5.014265
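The hand computation can be checked against the built-in functions var() and sd(). This is a minimal sketch using the same x; the name var.by.hand is illustrative.

```r
x <- c(2, 6, 10, 1, 3, 15, 4)

# Variance by hand: sum of squared deviations divided by n - 1.
var.by.hand <- sum((x - mean(x))^2) / (length(x) - 1)

# Built-in equivalents.
var(x)             # sample variance
sqrt(var.by.hand)  # standard deviation from the hand-computed variance
sd(x)              # built-in standard deviation

# all.equal() compares with a small numerical tolerance.
all.equal(var.by.hand, var(x))
all.equal(sqrt(var.by.hand), sd(x))
```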
Probability
If we have n objects and we want to pick k < n from them with
replacement, the number of choices is
$n \times n \times \cdots \times n = n^k$
If we pick k objects without replacement, the counting rule is
$n \times (n-1) \times (n-2) \times \cdots \times (n-k+1) = \frac{n!}{(n-k)!}$
We may also pick k objects without replacement and without regard to order. For
example, there are 3 possible combinations if we pick 2
students among 3. The counting rule is
$\binom{n}{k} = \frac{n!}{k!(n-k)!} = \frac{3!}{2!\,1!} = 3$
We can use choose() to do this computation.
> choose(4,2)
[1] 6
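The three counting rules can be compared side by side in R. The sketch below uses the example of picking 2 students among 3; the labels "A", "B", "C" are made up for illustration.

```r
n <- 3  # number of objects (students)
k <- 2  # number picked

n^k                               # ordered, with replacement: 9
factorial(n) / factorial(n - k)   # ordered, without replacement: 6
choose(n, k)                      # unordered, without replacement: 3

# combn() lists the combinations themselves, one per column.
combn(c("A", "B", "C"), k)
```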
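Exercise 2
1. $2^3 + 3^3 + 4^3 = ?$
2. $\Pr(y<3 \mid 10, 0.7) = \binom{10}{0} 0.7^{0} \cdot 0.3^{10-0} + \binom{10}{1} 0.7^{1} \cdot 0.3^{10-1} + \binom{10}{2} 0.7^{2} \cdot 0.3^{10-2} = ?$
3. $\log(\frac{14}{5}) = ?$
4. $1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} = ?$
5. $1 \times 2 \times 3 \times \cdots \times 8 = ?$
6. $\frac{1.5/3.7}{103/210} = ?$
7. $y = \{10, 15, 21, 28, 30, 40, 100\}$, $\bar{y} = ?$
8. Find $\frac{DIPC_{101} - DIPC_{100}}{DIPC_{100}}$, where $DIPC_{101} = 537$ and $DIPC_{100} = 485$.
9. Find $\frac{1}{1+\exp(b)}$, where $b = 1.3$.
10. Find $\frac{1}{\sum P_i^2}$, where $P_i = \{0.48, 0.24, 0.1, 0.06, 0.07, 0.05\}$.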
Plots
Suppose we have a frequency table of each party’s seats.
Party  Seats
KMT    48
DPP    27
Other  4
We can translate the numbers to a vector.
> seats<-c(48,27,4)
> names(seats)<-c("KMT","DPP","Other")
Bar Chart
> barplot(seats,ylab="Number of Seats")
[Figure: bar chart with y-axis "Number of Seats" (0 to 40) and one bar each for KMT, DPP, and Other]
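A self-contained version of the bar chart might look as follows. It is a sketch, not the slide's exact code: the percentage vector seats.pct, the ylim setting, and the text() labels are additions for illustration.

```r
seats <- c(KMT = 48, DPP = 27, Other = 4)   # named vector from the table

# Share of seats in percent, rounded to one decimal place.
seats.pct <- round(100 * seats / sum(seats), 1)
seats.pct

# barplot() returns the x positions of the bar midpoints, which can be
# used to write the counts above the bars.
mids <- barplot(seats, ylab = "Number of Seats", ylim = c(0, 55))
text(mids, seats, labels = seats, pos = 3)
```

Passing seats.pct instead of seats to barplot() would draw the shares rather than the counts.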
Pie Chart
> seats.percent<-seats/sum(seats)
> pie(seats.percent,
+   main="Seats Distribution (%)",
+   col=c("blue","darkgreen","gray60"), cex=1.5)
[Figure: pie chart titled "Seats Distribution (%)" with slices for KMT, DPP, and Other]
Dot Chart
Suppose a dataset contains two variables. We can use
order() to sort the dataset by the DIPC variable.
> city<-c("Yilan","Taoyuan","Hsinchu","Miaoli","Changhua")
> DIPC<-c(23995,268841,274737,213792,210870)
> df<-data.frame(city,DIPC) ; df
      city   DIPC
1    Yilan  23995
2  Taoyuan 268841
3  Hsinchu 274737
4   Miaoli 213792
5 Changhua 210870
> df<-df[with(df,order(-DIPC)),]
> df$city<-reorder(df$city,df$DIPC)
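The two steps above can be unpacked as follows. This sketch rebuilds the data from the printed data frame and uses the illustrative names ord and df.sorted, which do not appear on the slide.

```r
# Data taken from the printed data frame above.
city <- c("Yilan", "Taoyuan", "Hsinchu", "Miaoli", "Changhua")
DIPC <- c(23995, 268841, 274737, 213792, 210870)
df <- data.frame(city, DIPC)

# order() returns row positions; a minus sign sorts from high to low.
ord <- order(-df$DIPC)
ord
df.sorted <- df[ord, ]
df.sorted

# reorder() turns city into a factor whose levels follow DIPC
# (lowest first), which controls the order of categories in plots.
df.sorted$city <- reorder(df.sorted$city, df.sorted$DIPC)
levels(df.sorted$city)
```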
Dot Chart
> library(lattice)
> dotplot(city ~ DIPC,data=df,pch=16,cex=2)
[Figure: dot plot of DIPC (x-axis, 50000 to 250000) by city; from top to bottom: Hsinchu, Taoyuan, Miaoli, Changhua, Yilan]
Line Chart
> x<-c(1998,2001,2004,2008,2012)  # years stored as numbers
> y<-c(2.46,3.47,3.25,1.75,2.24)
> plot(x,y,xaxt="n", pch=20 , xlab="Years",ylab="ENEP")
> axis(1,at=x,labels=x) ; lines(x,y, type="l")
> abline(v=c(2004,2008), lty=2, col="gray60")
[Figure: line chart of ENEP (y-axis, ticks from 2.0 to 3.5) by year (x-axis: 1998, 2001, 2004, 2008, 2012), with dashed vertical lines at 2004 and 2008]
Exercise 3
1. Please draw a bar plot showing the density (proportion) of each party's
number of seats.
2. Suppose you randomly draw 32 people from a list. 14 people support
increasing social welfare, 8 people support decreasing social welfare,
and 10 people give no response.
Please draw a bar plot to show the percentage of each response.
3. Use the dataset Anscombe in car and draw a dot plot to show the
income of each state.
Objects
Vector
Matrix: matrix(x, nrow=2, ncol=2)
Array: array(x, dim=length(x))
Data frame: data.frame(x)
List: list(A, B, C)
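A short sketch that builds one object of each kind from the same vector may help to see the differences. The names x, m, a, d, and l and the list components are illustrative.

```r
x <- 1:4                                  # vector
m <- matrix(x, nrow = 2, ncol = 2)        # 2 x 2 matrix, filled by column
a <- array(x, dim = length(x))            # one-dimensional array
d <- data.frame(x)                        # data frame with one column
l <- list(A = x, B = "text", C = m)       # list mixing different objects

# class() reports what kind of object each one is.
class(x); class(m); class(a); class(d); class(l)
dim(m)      # 2 2
str(l)      # compact summary of the list's components
```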
Types of Data
Numeric: 1, 0.1, pi
Character: "A", "b", "1"
Factor: factor("A"), factor("Group A")
Integer: 1L, 2L, 3L
Logical: x==4
Use class() to check the type of an object.
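class() can be applied to a value of each type. The following sketch also shows that a plain number such as 3 is stored as numeric unless the L suffix marks it as an integer.

```r
class(0.1)                  # "numeric"
class(3)                    # "numeric": plain numbers are stored as doubles
class(3L)                   # "integer": the L suffix marks an integer
class("1")                  # "character"
class(factor("Group A"))    # "factor"
x <- 4
class(x == 4)               # "logical"

# Conversion between types.
as.numeric("1") + 1         # 2
as.character(10)            # "10"
```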
Vector
The most common object in R is the vector. For example:
> r<-c(30.9, 42.3, 0.024) # exchange rates
> r*1200
[1] 37080.0 50760.0    28.8
> r-c(0.2, 0.05, 0.001)
[1] 30.700 42.250  0.023
We can compute statistics of a numeric vector, such as the mean.
> y<-c(320, 402, 430, 288, 508, 177, 234, 269)
> sum(y)/length(y)
[1] 328.5
We can index a vector as follows:
> y[2]; y[-c(3,4)]; y[length(y)]
[1] 402
[1] 320 402 508 177 234 269
[1] 269
Logical values
A logical comparison returns TRUE or FALSE for each element.
> y<-c(320, 500, 430, 288, 500, 177, 234, 500)
> y>300
[1] TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE
> y[y>300]
[1] 320 500 430 500 500
which() returns the positions of the TRUE values of a logical vector. The
equality operator, ==, compares each element of a vector with a value.
> which(y>300)
[1] 1 2 3 5 8
> y[which(y>300)]
[1] 320 500 430 500 500
> which(y==500)
[1] 2 5 8
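Logical vectors can also be counted and combined. This is a small sketch using the same y; the name above is illustrative.

```r
y <- c(320, 500, 430, 288, 500, 177, 234, 500)

above <- y > 300
sum(above)                  # number of TRUE values: 5
mean(above)                 # proportion of TRUE values: 0.625

# Combine conditions with & (and) and | (or).
y[y > 300 & y < 500]        # 320 430
which(y < 200 | y == 500)   # 2 5 6 8
```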
Sequence numbers
We can generate a sequence of numbers by typing:
> 1:10
 [1]  1  2  3  4  5  6  7  8  9 10
> a=2; b=15 ; a+(10:b)
[1] 12 13 14 15 16 17
> rev(6:1)
[1] 1 2 3 4 5 6
seq() generates a sequence of numbers with a specified interval.
> seq(10,15)
[1] 10 11 12 13 14 15
> seq(0.1,1,by=.2)
[1] 0.1 0.3 0.5 0.7 0.9
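seq() accepts either a step size or a target length, and the colon operator has higher precedence than +. The sketch below illustrates both points with made-up values.

```r
seq(0, 1, by = 0.25)          # fixed step: 0.00 0.25 0.50 0.75 1.00
seq(0, 1, length.out = 5)     # fixed number of values: same result
seq_len(4)                    # 1 2 3 4
seq(10, 1, by = -3)           # decreasing: 10 7 4 1

# The colon operator binds more tightly than +, so parentheses matter.
a <- 2; b <- 15
a + (10:b)                    # 12 13 14 15 16 17
(a + 10):b                    # 12 13 14 15
```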
Repeat
rep() repeats elements a specified number of times, as in
> rep(02139, 5)
[1] 2139 2139 2139 2139 2139
> rep(1:5, 2)
[1] 1 2 3 4 5 1 2 3 4 5
> rep(c("red","green","white"), c(1,2,1))
[1] "red"   "green" "green" "white"
rep
We can also create a vector of repeated missing values and fill it in
later.
> s<-rep(NA, 5)
> num<-15:19
> for (i in 1:5){
+   s[i]<-num[i]^2
+   cat(num[i], "squared is", s[i], "\n")
+ }
15 squared is 225
16 squared is 256
17 squared is 289
18 squared is 324
19 squared is 361
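The placeholder vector is useful when each result is stored in its own position with s[i]. The sketch below shows that pattern, using seq_along() for the loop index (an illustrative choice), and compares it with the vectorized form.

```r
num <- 15:19
s <- rep(NA, length(num))      # placeholder vector of missing values

for (i in seq_along(num)) {
  s[i] <- num[i]^2             # store each result in its own slot
}
s                              # 225 256 289 324 361

# The same result without a loop, because arithmetic is vectorized.
num^2
```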
Matrix
We can arrange data elements as a matrix.
> set.seed(999)
> N=3; M=3;
> r = matrix(round(runif(N*M,1,9),0), N, M)
> r
     [,1] [,2] [,3]
[1,]    4    8    6
[2,]    6    7    2
[3,]    2    2    4
Matrix
We can do element-wise multiplication (*) or matrix multiplication (%*%).
> I.mat=matrix(c(1,0,0,0,1,0,0,0,1),3,3)
> I.mat
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
> r*I.mat
     [,1] [,2] [,3]
[1,]    4    0    0
[2,]    0    7    0
[3,]    0    0    4
Matrix
> r%*%I.mat
     [,1] [,2] [,3]
[1,]    4    8    6
[2,]    6    7    2
[3,]    2    2    4
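The difference between * and %*% can be reproduced without random numbers. This sketch enters the matrix r from the slides by hand and assumes diag(3) as a shortcut for the identity matrix.

```r
# Matrix from the slides, entered column by column.
r <- matrix(c(4, 6, 2,  8, 7, 2,  6, 2, 4), nrow = 3, ncol = 3)
I.mat <- diag(3)          # 3 x 3 identity matrix

r * I.mat                 # element-wise: keeps only the diagonal of r
r %*% I.mat               # matrix product with the identity: r itself

t(r)                      # transpose
dim(r)                    # 3 3
```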
Arrays
As with a matrix, we can give a vector dimensions with array(). The
elements are filled in column by column.
> i <- array(c(1:3,3:1), dim=c(3,2))
> i
     [,1] [,2]
[1,]    1    3
[2,]    2    2
[3,]    3    1
If there are not enough elements to fill the R × C cells, the
elements are recycled from the beginning.
> A<-array(c(1:9),c(2,5))
> A
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8    1
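Recycling and higher dimensions can be checked by indexing single cells. In this sketch, the three-dimensional array B is an added illustration that does not appear on the slides.

```r
# Nine values for ten cells: R reuses the first value for the last cell.
A <- array(1:9, dim = c(2, 5))
A[2, 5]                   # 1

# A three-dimensional array: 2 rows, 3 columns, 2 layers.
B <- array(1:12, dim = c(2, 3, 2))
dim(B)                    # 2 3 2
B[1, 2, 2]                # row 1, column 2, layer 2: 9
```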
Exercise 4
1. Open the mtcars data and obtain the indices of the cars with mpg greater than 20.
2. Find out how many cars have a manual transmission.
3. Generate a sequence of numbers from 0 to 1,000 in steps of 25.
4. Create a 4 by 4 matrix of 16 positive integers.
Problems
1. Suppose seven restaurants offer lunchboxes and the prices are
55, 50, 65, 59, 49, 70, and 75. Please find the average price.
2. Suppose the housing prices in a village are 6, 11.99, 15.3, 23, 8,
10, 4.5, 7.5, and 12 million dollars. Please find the median
price.
3. Suppose you pump 35, 33, 31, 25.8, 30.5, 29.99, and 39.44
liters of gasoline after driving your car 430, 231, 210, 200, 189,
220, and 310 kilometers. Please find the kilometers per liter
for each run and the median of the results.
4. Suppose you want to tell your American friend about the
temperature here, which is 19 °C. The conversion formula from
Celsius to Fahrenheit is °C × 1.8 + 32. Please convert the local
temperature for your friend.
Problems
5. Find the average per capita crime rate in the Boston data.
6. Draw a bar plot to show the distribution of the number of cylinders
in the mtcars dataset.
7. Draw a line chart to show the trend of the US population (USPop).
8. Find the top 5 automobiles in terms of mpg (the higher the mpg,
the better the performance) (mtcars).