Computer lab 1 Course: Statistics using R

February 13, 2017
These exercises assume that you have installed R. See the instructions on
the course home page.
1 Using R as a calculator
The most trivial way of using R is as a calculator: Write down the expression
you would like to compute, using numbers and symbols like
+ - * / ^ ( )
and press return. (If you don’t know what the symbol “^” does, try it out with,
for example,
> 2^1
> 2^2
> 2^3
or similar combinations.) To use standard functions, such as the logarithm,
sine, or square root, use functional notation with parentheses, e.g.,
> log(3)
and similar with “sqrt” for square root and “sin” for the sine. Note that “log”
by default computes the natural logarithm. Numbers can be stored in named
objects, using an operator that combines the character “<” with the character
“-”:
> A <- 42
They can then be used in expressions
> A + 5
Objects can be listed using the function
> ls()
and removed using the function
> rm(A)
When you use RStudio you can also see the list of objects and information about
them in the upper right pane (in the tab named “Environment”, called “Workspace”
in older versions of RStudio). Note that “ls()” lists only the objects you have
created. R contains many other objects; each function, for example, is itself an
object.
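As a small illustration of the points above, the following sketch combines arithmetic, a few standard functions, and named objects. The object names “radius” and “area” and all the numbers are made up for this example; they are not part of the exercises.

```r
# Arithmetic follows the usual precedence rules; ^ is exponentiation
(1 + 2) * 3^2          # 27

# log() is the natural logarithm by default; exp() is its inverse
log(exp(1))            # 1
sqrt(16)               # 4

# Store a number in a named object (hypothetical name) and reuse it
radius <- 2
area <- pi * radius^2
area

# List the objects created so far, then remove one of them
ls()
rm(radius)
ls()
```

After the call to “rm”, the object “radius” is gone, but “area” keeps the value that was computed from it.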
1. Compute the standard deviation of the numbers
3.5, 2, 6.1, 4
If you use several steps, store your intermediate results in named objects.
The formula for the standard deviation of numbers $x_1, \ldots, x_n$ is
$$ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}, $$
where $\bar{x}$ is the mean of the numbers.
2. Try writing each of the following commands
> log()
> log(2)
> log
> log(2,2)
and explain what happens in each case.
3. Try writing the commands below, with spaces exactly as written, and
explain what happens in each case:
> A<-19
> A
> A< -19
> A<--19
> -22->AA
2 Accessing help; learning about R
In R you need, to some extent, to remember the names of the functions you use.
That makes it particularly important to learn quickly how to use the help
functions efficiently.
1. Open the help with
> help.start()
In the web page shown to you in the lower right pane (in RStudio), the
links “An Introduction to R” and “Packages” are particularly important.
Take a few minutes to familiarize yourself with the information in “An
Introduction to R”. This document is, in itself, a reasonably good
textbook on R. Under “Packages” you can find information about some of
the packages currently available to you; we will look more at this later.
2. Use either
> ?log
or
> help(log)
or simply type “log” in the search box in the help tab (in RStudio) to
open the help information about the log function. Read through it and
understand as much as you can; help pages generally share the same sections
(Description, Usage, etc.).
3. Find out how you can compute the standard deviation of the four numbers from the previous section in a simpler way, starting with the command
> help.search("standard deviation")
Alternatively, the same search can be done with
> ??"standard deviation"
or with the search box in RStudio. To be able to use the function you
find, you may have to combine your data into a “vector”, using “c(3.5, 2,
6.1, 4)”.
4. Learn more about the different help functions
help, help.search, example, demo
by applying them to each other (e.g., using “help(example)”) and to other
functions and words (e.g., using “help(sum)”, “help.search("sum")” or “example(sd)”).
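The standard deviation formula from the first section can be compared, step by step, with a built-in function. The following is only a sketch on made-up numbers: the vector “x” and its values are not from the exercises, and it assumes that the function the search leads to is “sd”. The subtraction “x - xbar” is applied to every element of the vector at once.

```r
# Hypothetical data, combined into a vector with c()
x <- c(2, 4, 9)

n <- length(x)                  # number of values: 3
xbar <- sum(x) / n              # mean: 5
ss <- sum((x - xbar)^2)         # sum of squared deviations: 9 + 1 + 16 = 26
s <- sqrt(ss / (n - 1))         # sqrt(26 / 2) = sqrt(13)

s
sd(x)                           # the built-in function gives the same value
```

Storing each intermediate result in its own object makes it easy to check every step against the formula.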
3 Data types
Objects in R can have many types; they are not only numbers. Two other
important types are “logical” and “character”. The usefulness of each type will
become more apparent soon.
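A minimal sketch of the three types mentioned above, using made-up object names (“height”, “isTall”, “label”). The function “class” reports the type of an object.

```r
height <- 1.82             # a number
isTall <- height > 1.80    # a comparison produces a logical value
label <- "tall"            # text in quotes is a character value

class(height)              # "numeric"
class(isTall)              # "logical"
class(label)               # "character"

# Conversion functions such as as.character() change the type
as.character(height)       # "1.82"
```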
1. Try out the following, and explain exactly what happens for each command:
> AA <- TRUE
> AA
> class(AA)
> class(3.1)
> BB <- T
> BB
> !BB
> BB & !AA
> AA | !BB
> as.numeric(AA)
> as.numeric(F)
> AA + 3
> as.logical(1)
2. Try out the following commands, and explain what happens. Remember
that you can use the different help functions in R!
> CC <- "This is some text"
> CC
> class(CC)
> myAnswer <- 42
> paste("The answer is", myAnswer, "as far as I know")
> as.character(myAnswer)
> as.numeric("42")
> as.numeric("42, I think")
> "42" + 3
> TRUE + 3
> "TRUE" + 3
> as.logical("TRUE") + 3
4 Vectors and matrices
Most of your computations in R will be done with vectors and matrices. We
saw above how to construct a vector, using the function “c”.
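Arithmetic on vectors works element by element, and a single number is reused for every element. A short sketch with two hypothetical vectors “v” and “w” (not the data used in the exercises):

```r
v <- c(10, 20, 30, 40)       # a vector built with c()
w <- seq(1, 4, by = 1)       # the sequence 1, 2, 3, 4

v + 1                        # 11 21 31 41: 1 is added to every element
v * 2                        # 20 40 60 80
v + w                        # 11 22 33 44: elements are added pairwise
rep(0, 4)                    # 0 0 0 0: the value 0 repeated four times
length(v)                    # 4
```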
1. Place the following data in a vector you call “myData”:
34, 52, 32, 63, 41, 16, 32, 45, 11, 35
2. Try out the commands
> myData + 1
> myData*2
> moreData <- seq(0.5,5,by=0.5)
> myData + moreData
> 1:10
> myData + 1:10
> rep(0,10)
and understand what happens in each case.
3. Explain what happens when applying the following functions to your data:
“sum”, “summary”, “mean”, “sd”, “var”, “length”.
4. Also try applying “t.test”; we will return to t-tests later.
5. Find the two largest values (without manually looking!) (Hint: try “help(sort)”).
6. Try out the commands
> myData > 40
> sum(myData > 40)
and explain what happens. Compute how many of the data values are less
than 35 (without manually counting!).
7. Write a command that outputs
[1] "There are XX data values less than 35"
where XX is replaced by the actual number, computed in the same command.
8. Being able to access specific values or subsets of values in a data set is very
important and something you will probably use a lot. This is commonly
called “indexing”. Try out and explain the commands
> myData[3]
> myData[c(3,5,6)]
> myData[3] <- 33
> myData[c(3,5,6)] <- 33
> myData[myData > 40]
> myData[myData > 40] <- 40
Then, create a data vector where all values smaller than 35 in the original
data vector are replaced by 35. (Note: If you want to re-do old commands,
for example to re-create the old myData vector, use the arrow-up key to
get to older commands. You may of course also store the values of objects
in backup objects, so that they are not lost.)
9. Starting with the original myData object, construct a matrix that looks
like
34 52 32 63 41
16 32 45 11 35
Use
> help(matrix)
to get started. To use the function “matrix()” you need
to understand how function arguments work. If you just type
> matrix(myData)
you can see that the result is a matrix with only one column. To get
a matrix with two rows that is filled row by row, additional
arguments must be specified. Read the help page and try to figure out
which arguments have to be specified (i.e., set to values other than
the defaults). Call the matrix you create myMatrix.
10. Try out, and explain the result of each of the following commands:
> myMatrix[2,3]
> myMatrix[1, c(3,5)]
> myMatrix[,3]
> myMatrix == 32
> myMatrix[myMatrix == 32] <- 30
> r1 <- myMatrix[1,]
> r1 > 40
> myMatrix[,r1>40]
11. Construct a sequence of commands that replaces all columns in (the original) myMatrix where the top number is larger than the bottom number
with zeros. Hint: Try what happens if you use
> vector1 <- c(1,2,3,4,5)
> vector2 <- c(5,4,3,2,1)
> vector1 > vector2
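The indexing ideas in this section can be summarised in a short sketch. The vector “v” and the matrix “m” are made up for illustration and have a different shape from the data in the exercises; the arguments “nrow” and “byrow” are described in “help(matrix)”.

```r
v <- c(5, 8, 2, 9, 4, 7)

v[2]                       # 8: the second element
v[c(1, 3)]                 # 5 2: the first and third elements
v > 6                      # FALSE TRUE FALSE TRUE FALSE TRUE
sum(v > 6)                 # 3: TRUE counts as 1, FALSE as 0
v[v > 6]                   # 8 9 7: only the elements where the test is TRUE

# A 3 x 2 matrix filled row by row
m <- matrix(v, nrow = 3, byrow = TRUE)
m[2, 1]                    # 2: row 2, column 1
m[, 2]                     # 8 9 7: the whole second column
m[m[, 1] > 3, ]            # the rows whose first entry is larger than 3
```

A logical vector inside the square brackets selects the positions where it is TRUE. The same idea works for the rows or columns of a matrix.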
5 A first look at graphics
An essential part of any data analysis is visualisation. We start with some basic
possibilities:
1. The function “plot”, given two numerical vectors of the same length, will
produce a plot where the first numbers are used as x-coordinates and the
second numbers are used as y-coordinates. Use this function on the rows
of myMatrix (i.e using the first row as x-values and the second as y-values),
to illustrate the data.
2. Run the same plot command again, but now with extra parameters (arguments) added to the function, using
main = "... some heading ..."
xlab = "... something describing the x variable ..."
ylab = "... something describing the y variable ..."
3. If the data is not paired according to the columns, it may be more reasonable to illustrate each row using a boxplot: Use the two rows as two
separate arguments to the boxplot function.
4. Make a histogram of the data in the vector myData. (Hint: If you for
example run
> help.search("histogram")
you will get a list of possible functions to use; each function name is followed by the name of the package containing it, in parentheses. The package “graphics” is a fundamental package, so you should choose a function
contained in it. Try to figure out which by using help.)
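A minimal sketch of the “plot” and “boxplot” calls discussed in this section, on made-up data. The vectors “x” and “y” and all titles and labels are placeholders, not the data from the exercises.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Scatter plot: the first argument gives x-coordinates, the second y-coordinates
plot(x, y,
     main = "A made-up example",
     xlab = "x (placeholder label)",
     ylab = "y (placeholder label)")

# One box per argument
boxplot(x, y)
```

Arguments such as “main”, “xlab” and “ylab” are given by name, so their order does not matter.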