ANTHRO 309, semester 2, 2016
Cochrane
Page 1 of 9
An Introduction to Using R
This lab* will familiarize you with R and the RStudio IDE (Integrated Development Environment,
i.e., software used to develop computer programs). R is a programming language aimed primarily at
statistics and the associated graphical outputs. It includes both a coding environment (where
programs are written and then run) and a command line interface (where commands are entered and
return values immediately). In addition, R and its packages (i.e., extensions) are freely available,
platform neutral, and supported by a large user base. Finally, an entire analysis done in R can be
written in a text file, also known as a script, and anyone, anywhere can rerun your analysis as long as
they have a computer with R installed and a copy of your data. Your R script is thus a shareable,
repeatable, annotatable record of your analysis.
This lab is not marked and in it you will learn how to:
Navigate the RStudio IDE
Use basic commands in R
Distinguish different object types in R
Read data into R and store data efficiently
Write and annotate a short R script
Navigating the RStudio IDE
We will use RStudio to write, manipulate, and run R code to perform a variety of statistical
operations. There are other IDEs within which to run R, and you can run R through a basic text-editor
that is typically downloaded as part of R, so look around the internet if you are curious. RStudio is
used to operate R, organise R code, and handle graphic output.
(1) To begin, open RStudio. You should see a window that looks like this:
[Screenshot of the RStudio window with four labelled panes: R script/code pane, Console, Environment and History Pane, Navigation Pane]
When first opened, the screen may not show the R script/code pane. Minimize the left pane (typically
the Console) and you will see the R script pane.
(2) Familiarize yourself with:


Dropdown Menus: These function in much the same way that menus in other programs do.
The File menu, for example, is where you can create, open, save, and close files.
Environment/History Pane: two tabs which show either the objects stored in the current
session (Environment), or the history of commands entered at the command line (History).
* This lab is modified from Ben Davies ANTHRO 370 Lab 1, and uses many ideas from Beckerman and
Petchey (2012, Getting Started with R, Oxford University Press).
Console Pane: Contains the command line, where commands are passed to R by the user.
Navigation Pane: Includes five tabs where outputs and other R features can be viewed.
The Code Pane: This is the text file containing R script, i.e., the sequence of commands read
by R. You will save this file to save your analyses/work.
Depending on what actions you are undertaking in the Console Pane, different things may occur in the
Navigation Pane.
(3) To see how this works, try typing the following command into the command line (indicated
by an >) in the Console Pane, and then press enter:
> plot(hist(rnorm(50,0,1)))
This code uses the plot function to plot whatever is inside the outermost brackets. The next bit of
code uses the hist function to generate a histogram of some data referenced inside the next set of
brackets. That data is a set of 50 random numbers drawn from a normal distribution using the rnorm
command, with a mean of 0 and a standard deviation of 1. The navigation pane should now jump to
Plots with the resulting histogram displayed. You might try changing the number of random numbers
generated, or the mean, or the standard deviation to see the different plots made. Just copy and paste
new lines in the console, change numbers inside the innermost brackets, and hit enter.
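As a rough sketch of the experiment suggested above, the following draws a larger sample with a different mean and standard deviation. The set.seed() call and the object names x and h are additions for illustration and are not part of the lab's own command; set.seed() simply makes the "random" draw repeatable.

```r
# Fix the random number generator so the same numbers are drawn on every run
# (an addition for repeatability; the lab's command above does not do this).
set.seed(309)

# 500 random numbers from a normal distribution with mean 10 and sd 2
x <- rnorm(500, mean = 10, sd = 2)

# hist() draws the histogram and also returns the bin counts it used
h <- hist(x, main = "500 draws, mean = 10, sd = 2")

# Every value falls into exactly one bin, so the counts add up to the sample size
sum(h$counts)
```

Naming the arguments (mean = and sd =) is optional here, but it makes it easier to see which number you are changing.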
In addition to Plots, other tabs in the Navigation Pane include:




Files provides a file explorer for your computer, which by default opens to the working
directory (where R looks for data and scripts).
Packages shows which function packages (also called libraries) are installed on this computer
and active in this session.
Help provides access to R help documentation. It opens automatically when calls for help are
made at the command line using ? or ??.
Viewer displays local web content (html, etc).
Use Basic Commands in R
(4) Learn to use the command line with some simple expressions.
Type
> 8 + 2
Hit enter and this appears
[1] 10
R can identify these as numbers, as well as the operator + for addition, so it will perform the addition
function on the two numbers. R uses a number of standard operators for basic maths operations,
including:
> 8+2
[1] 10
> 8*2
[1] 16
> 8^2
[1] 64
> 8-2
[1] 6
> 8/2
[1] 4
R can do simple arithmetic without much/any programming on your part. However if you type:
> a+2
Error: object 'a' not found
At this point R does not know what ‘a’ is or what to do with ‘a’. All R knows is that we have asked it to add
2 to something unfamiliar labelled ‘a’.
But ‘a’ can stand for something else, such as a variable in algebra. For R to know what the variable ‘a’ is, we
have to define ‘a’ as an object in memory. To define an object in R use the ‘less than’ sign and a hyphen,
which taken together resemble an arrow like so: <-
For example, if we wanted to define ‘a’ as having a value of 8:
> a<-8
R will read this command as “Let ‘a’ have a value of 8”. Now if we ask the computer to tell us what a is, it
gives us this:
> a
[1] 8
R now interprets the letter a as being equivalent to the number 8. If we go back to our original expression of
a + 2, we get the same answer as 8 + 2:
> a + 2
[1] 10
We can also assign a new value to the object a using an expression, like this:
> a<-a + 1
> a
[1] 9
> a + 2
[1] 11
Here, we have updated the previous value of a (which was 8) to that value plus 1 (which is 9). And when we
add this new value to 2, we get 11. However, it is important to note this last expression (a + 2) does not
assign a new value of 11 to a; a is still holding at 9.
At the moment, this may not seem very impressive, but it is one of the keys of computer programming:
updating values of stored objects.
We can also perform a number of built-in functions on objects that are numbers or variables. Functions
operate typically by stating the function name in R, followed by a set of input values (or arguments) in
parentheses. For example, if we wanted to find the square root of the current value of a (9), we could use the
sqrt function like so:
> sqrt(a)
[1] 3
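The pattern of a function name followed by arguments in parentheses can be sketched with a few more built-in functions. The object name b and the particular numbers are illustrative only.

```r
a <- 9

# One argument: the square root of the value stored in a
sqrt(a)

# Arguments can be named, which makes the call easier to read:
# round the square root of 8 to three decimal places
round(sqrt(8), digits = 3)

# Function calls can be nested; the innermost expression is evaluated first
b <- sqrt(a + 7)   # a + 7 is 16, so b holds 4
b
```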
NB: if you get a + at the start of a line in the console, R is waiting for you to finish an incomplete command,
for example a function call that is missing its closing parenthesis. Hit ‘escape’ to get back to the >.
NBB: you can always type ? then a command name into the console and this will bring up the appropriate
help in the Navigation pane. Try it:
> ?sqrt
Distinguish Different Object Types in R
There are many different kinds of objects you can use in R depending on the kinds of things you want R to
do. Think of an object as something held in R’s brain. You do things to objects, like datasets, using functions
(like plot). This section discusses a number of the commonly-encountered object types in R.
(5) Try out the examples, but don’t worry if it doesn’t make perfect sense right away. If you run into a
problem with object types, it’s helpful just to know that different types of objects exist.
Atomic Objects
Atomic objects are individual elements that can be used in expressions and functions, or combined into
compound Class Objects. The most common of these are numeric, logical, and character objects. You can
determine what type an object is using the typeof() function (more below).
Numeric
As shown above, R can interpret numbers in the same way as a calculator. There are two categories
of numeric objects: integer and double. An integer is a whole number, while a double can be any
number, including those with a decimal part. A numeric object defaults to the double type unless
specified as an integer. Some mathematical constants that count as numeric objects are either built
into R, such as pi for π, or require some minor coding, like exp(1) for e.
> pi
[1] 3.141593
> typeof(pi)
[1] "double"
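To see the integer/double distinction for yourself, a short sketch like the following may help. The L suffix and the as.integer() function are standard R, but they are not otherwise used in this lab.

```r
# A plain number is stored as a double, even when it looks like a whole number
typeof(5)                 # "double"

# Appending L asks R to store the value as an integer
typeof(5L)                # "integer"

# as.integer() converts a double to an integer, dropping any decimal part
as.integer(5.9)           # 5
typeof(as.integer(5.9))   # "integer"

# exp(1) returns the constant e
exp(1)
```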
Logical
A logical object in R has a value of either TRUE or FALSE, always expressed in capital letters.
Some operations in R will return values as either TRUE or FALSE, while others may require input
in the form of a logical value.
> 2==2
[1] TRUE
> 2==3
[1] FALSE
> s<-2==3
> typeof(s)
[1] "logical"
Note: when determining equivalence, a double equals sign (==) is used. Remember <- tells R that
the thing on the left is defined by the thing on the right.
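Other comparisons also return logical values, and logical values can be combined. The following is a small sketch; the object names big and small are made up for illustration.

```r
# Comparison operators return TRUE or FALSE
5 > 3      # TRUE
5 <= 3     # FALSE
5 != 3     # TRUE ("not equal to")

# Logical values can be stored as objects and combined
big <- 5 > 3      # TRUE
small <- 5 < 3    # FALSE
big & small       # FALSE: "and" needs both to be TRUE
big | small       # TRUE: "or" needs at least one to be TRUE
!big              # FALSE: "not" flips the value
```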
Character
A character object (also referred to as a string), is a block of text meant to be read as text. R cannot
interpret the meaning of the text directly, but some functions require character objects for input.
These are normally enclosed by quotes and can be used as categorical data in some class objects (see
below).
> text<-"text"
> typeof(text)
[1] "character"
Class Objects
Class objects are compound objects that can contain multiple objects and, sometimes, multiple types of
objects. Frequently used class objects include vectors, matrices, and dataframes.
Vectors
A vector is a collection of objects of a single type (e.g., numeric), presented as an ordered row. Each
object in a vector has a value as well as a position (1st, 5th, etc). Vectors can be created by entering
the objects, separated by commas, inside parentheses with a c (for ‘combine values’) at the front.
>pipi<-c(20.1,21.2,25,22.1,19.3,22.2,20.6,29.9,26.4)
>tuatua<-c(65.5,56.3,58.2,57.1,54.3,59.3,65,61.4,58.8)
Individual objects or groups of objects in a vector can be accessed by entering the name of the vector
with their position number in a set of square brackets ([ ]). For multiple objects, the individual
positions can be combined with c() inside the brackets (e.g., pipi[c(1,3,5)]), or a run of values can be
expressed using the first and last positions separated by a colon (:). The type of vector is determined
by the kinds of atomic objects it contains.
> pipi[5]
[1] 19.3
> tuatua[1:3]
[1] 65.5 56.3 58.2
> typeof(tuatua)
[1] "double"
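A few more things you can do with a vector are sketched below, using the pipi values from above. The name pipi_dev is made up for this example.

```r
pipi <- c(20.1, 21.2, 25, 22.1, 19.3, 22.2, 20.6, 29.9, 26.4)

# How many objects does the vector hold?
length(pipi)            # 9

# Pick out several positions at once by wrapping them in c()
pipi[c(1, 3, 5)]        # 20.1 25.0 19.3

# Pick out values that meet a condition (here, greater than 25)
pipi[pipi > 25]         # 29.9 26.4

# Functions and arithmetic work on the whole vector at once
mean(pipi)
pipi_dev <- pipi - mean(pipi)   # each value's difference from the mean
pipi_dev
```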
Matrices
A matrix is a set of objects of the same type, structured in a rectangular grid. Matrices are
particularly useful when using spatial data. Matrices can be created by using the matrix function, or
by binding vectors together as columns (using cbind) or rows (using rbind).
> mat<-matrix(data = pipi, nrow=3, ncol=3)
> mat
     [,1] [,2] [,3]
[1,] 20.1 22.1 20.6
[2,] 21.2 19.3 29.9
[3,] 25.0 22.2 26.4
> mat<-cbind(pipi,tuatua)
> mat
pipi tuatua
[1,] 20.1 65.5
[2,] 21.2 56.3
[3,] 25.0 58.2
[4,] 22.1 57.1
[5,] 19.3 54.3
[6,] 22.2 59.3
[7,] 20.6 65.0
[8,] 29.9 61.4
[9,] 26.4 58.8
Now try rbind on the tuatua and pipi vectors. Note: when vectors are bound together as columns or
rows, the names of those vectors become column or row names by default.
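If you want to check your result, the sketch below shows one way to compare the two bindings and to pull values out of a matrix by row and column. The name mat2 is made up for this example.

```r
pipi <- c(20.1, 21.2, 25, 22.1, 19.3, 22.2, 20.6, 29.9, 26.4)
tuatua <- c(65.5, 56.3, 58.2, 57.1, 54.3, 59.3, 65, 61.4, 58.8)

# Bound as columns: 9 rows, 2 columns
mat <- cbind(pipi, tuatua)
dim(mat)              # 9 2

# Matrix positions are given as [row, column]
mat[5, "tuatua"]      # the 5th tuatua value, 54.3
mat[1:3, ]            # the first three rows, all columns

# Bound as rows: 2 rows, 9 columns, with the vector names as row names
mat2 <- rbind(pipi, tuatua)
dim(mat2)             # 2 9
rownames(mat2)        # "pipi" "tuatua"
```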
Dataframes
A dataframe is like a matrix, only it can accept columns of different object types (e.g. numeric,
character, logical). Dataframes may use row and column numbers or names to distinguish between
the objects. Dataframes are similar to what you think of as a datasheet, i.e., each row can be an item
and each column is a kind of observation you make on that item (weight, length, sex, etc.). Many
data operations in R make use of dataframes to organise data. You can create a dataframe from
scratch using the data.frame function. However, this is a cumbersome approach and it is much easier to
set up your data in a program such as Excel and import it into R (see below).
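For reference, building a small dataframe from scratch might look like the sketch below. The values are the first four rows of three columns from the ceramic data file used later in this lab; the object name sherds is made up, and stringsAsFactors = FALSE is set explicitly so the text column stays as character in any version of R.

```r
# Each argument becomes a column; all columns must have the same length
sherds <- data.frame(
  CAT.NUMB = c(3000, 4025, 2049, 4022),
  SHERD.TY = c("rim", "rim", "rim", "rim"),
  BODY.THI = c(8.3, 7.9, 9.1, 6.5),
  stringsAsFactors = FALSE   # keep text as character rather than factor
)

# A single column is accessed with $ and its name
sherds$BODY.THI

# Rows and columns can also be accessed as [row, column]
sherds[2, ]        # the whole second row

nrow(sherds)       # 4
ncol(sherds)       # 3
```

Even for four rows this is a lot of typing, which is why importing a file is usually the better route.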
Converting object types
Sometimes you may find yourself in a situation where you have data stored as one object type, but a function
that requires another. For example, create a vector called values that contains a set of numbers stored as
strings.
> values<-c("2","4","10","5","2","11")
In theory, mathematical operations could be conducted on these numbers, but R will treat them as strings
because they are enclosed in quotes. For example, try obtaining the average of values using the mean
command:
> mean(values)
[1] NA
Warning message:
In mean.default(values) :
argument is not numeric or logical: returning NA
We could rewrite the values vector to contain numerical items, but this would be a pain, even more so with a
vector that had thousands of items. A quicker way to deal with this is to ask R to coerce the strings into
numbers using the as.numeric function.
> mean(as.numeric(values))
[1] 5.666667
Here you are asking for the mean of an object, and that object is the vector of strings ‘values’ (i.e., text or
character values) changed to a vector of numeric values by the function as.numeric. Coercion works when R
can recognise the data type and has a clear way of converting it to another type. While this code provides the
correct answer, it doesn’t permanently change values to a vector of numbers. To do that, you would need
to reassign the coerced vector to the values object stored in memory:
> values<-as.numeric(values)
> values
[1] 2 4 10 5 2 11
To convert them back to strings, use as.character.
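The round trip, and what happens when R has no clear way to convert a value, can be sketched as follows. The names nums, back and mixed are made up for this example.

```r
values <- c("2", "4", "10", "5", "2", "11")

# Strings to numbers
nums <- as.numeric(values)
typeof(nums)               # "double"

# Numbers back to strings
back <- as.character(nums)
identical(back, values)    # TRUE

# If a string does not look like a number, R cannot coerce it and returns NA.
# Without suppressWarnings(), R also warns "NAs introduced by coercion".
mixed <- c("2", "four", "10")
suppressWarnings(as.numeric(mixed))   # 2 NA 10
```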
Read data into R and store data efficiently
As mentioned, R is not particularly good software for entering data, but it is good for analysing data. To get
data into R you should save your data as a comma separated values (CSV) file. This is a common file type
offered under “save as” or “export” in many software applications (e.g., Excel, Access, Numbers, etc.). So that
R can easily read your csv file, construct it this way:
Make the first row of your spreadsheet your variable names. Use informative, short, simple names.
Don’t use spaces or special symbols. R can deal with these, but it just adds complications.
Enter data so each row is a case and the columns are observations (variables) made on that case. For
example you might generate data on people with each row being one person and the columns
signifying sex, age, ethnicity, height, mean income, highest educational qualification, etc.
If you have categorical variables (e.g., short, tall), use the text entries, not a numerical equivalent, in
your data file.
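To make the layout concrete, the sketch below writes a tiny CSV file laid out this way and reads it back. The people and their values are invented purely for illustration, and the file is written to a temporary location so nothing on your drive is touched.

```r
# What the text inside a well-formed CSV file looks like:
# first row = variable names, then one row per case
csv_lines <- c(
  "id,sex,age,height",
  "1,female,34,tall",
  "2,male,51,short",
  "3,female,27,tall"
)

# Write the lines to a temporary file, then read that file as a CSV
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)
people <- read.csv(tmp)

names(people)   # "id" "sex" "age" "height"
dim(people)     # 3 rows, 4 columns
people
```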
One key to working with R is keeping your data in a secure area along with your analysis scripts.
(6) Create a folder on your drive for R documents and clear memory. Create a folder in MyDocuments
(PC) or Documents (Mac), preferably on a university server you can access from anywhere, and title it
something informative such as 309Ranalyses. Within that you might make folders for 309Rscripts and
309Rdata. You can choose whatever names work best for you. Put your own data or the 309introdata.csv file
into your 309Rdata folder.
Now you need to determine what data are currently held in R, and clear these data if necessary.
> ls()
[1] "a"         "mat"       "maxlength" "pipi"      "tuatua"    "values"
ls() instructs R to list all objects held in R’s memory. To remove everything in R’s memory:
> rm(list=ls())
rm stands for remove. Its list argument takes the names of the objects to remove, and here you supply the
names returned by ls(). A bit confusing, but you now have the command line input to remove all data currently
in R’s brain (you should do this before you start any new analysis). Type:
> ls()
character(0)
There is nothing in memory.
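You do not have to remove everything at once. As a small sketch (the object names x and y are made up), rm can also remove objects one at a time, either by name or through its list argument:

```r
x <- 1
y <- "two"

# Both objects are now listed in memory
"x" %in% ls()       # TRUE
"y" %in% ls()       # TRUE

# Remove a single object by giving its name
rm(x)
"x" %in% ls()       # FALSE

# The list argument takes object names as strings;
# rm(list = ls()) simply passes it every name that ls() returns
rm(list = c("y"))
"y" %in% ls()       # FALSE
```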
(7) Now tell R where to look to load your data. First, figure out where R is currently looking by typing:
> getwd()
[1] "C:/Users/Ethan/Documents/R/testanalyses"
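getwd stands for “get working directory” and the result you get by typing this shows where R is looking.
Note that your result will show a different path. If you leave off the parentheses and type only getwd, R
prints the function itself rather than running it, and you will see something strange like: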
function ()
.Internal(getwd())
<bytecode: 0x000000000c03d020>
<environment: namespace:base>
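If R is not looking in the right folder, tell it where to look with the function setwd. setwd itself prints
nothing, so follow it with getwd() to confirm:
> setwd("H:/Current Research/309Ranalyses")
> getwd()
[1] "H:/Current Research/309Ranalyses"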
Note that setwd requires a string input value (within quotes) that is the file path to your directory. There are
several ways to determine your file path depending on if you are using a Mac or a PC. This hand-out will not
go into great detail on this:

In Windows, you can generally use Explorer and the address bar to determine the path of the
currently viewed file.
In Mac (OSX) open a Finder window, go to View -> Show Path Bar. Now a new bar will appear at
the bottom of the Finder window showing the path of the currently active folder or directory. The
path will update automatically as you navigate to different folders.
In RStudio, in the Navigation pane, select the Files tab. Click on the ellipsis (…) at the upper right and
this will bring up a dialog box with a folder tree. Select the folder you will use as your working
directory and hit OK. This puts the folder in the Files tab view, and you can then click the More button
and choose Set as Working Directory.
Note that in R paths are designated with single forward slashes (/) while PCs use backslashes (\), so you must
change the backward- to forward-slashes in your path in R when using a PC.
After you have used one of these methods to set your working directory, use getwd() to make sure R is
looking in the right place.
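The whole sequence can be sketched as below. So that it runs on any computer, this example creates a practice folder inside R's temporary directory rather than using a real path; the names old_wd, demo_dir and win_path are made up. The last lines show one way to convert a path copied from Windows.

```r
# Remember where R was looking so we can go back afterwards
old_wd <- getwd()

# Build a path with forward slashes and create the practice folder
demo_dir <- file.path(tempdir(), "309Ranalyses")
dir.create(demo_dir, showWarnings = FALSE)

# Point R at the folder, then confirm
setwd(demo_dir)
basename(getwd())     # "309Ranalyses"

# Return to the original working directory
setwd(old_wd)

# A path copied from Windows uses backslashes. Inside an R string each
# backslash must be typed twice, or you can swap them for forward slashes:
win_path <- "H:\\Current Research\\309Ranalyses"
gsub("\\", "/", win_path, fixed = TRUE)   # "H:/Current Research/309Ranalyses"
```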
(8) Have R load your data. Now that R is looking in the right place, you need to have R load the data you
want to analyse.
You can use the read.csv() function to have R read your data and it will list it all for you.
> read.csv("Tafuna_ceram_data_test.csv")
  CAT.NUMB SHERD.NU SHERD.TY BODY.THI RIM.THIC PASTE
1     3000        2      rim      8.3       11  fine
2     4025       72      rim      7.9        9  fine
3     2049       93      rim      9.1       10  fine
4     4022      149      rim      6.5       10  fine
These data are not actually in R’s memory. It has just read your file. To hold these data in R you must assign
the data to an object.
testdata<-read.csv("Tafuna_ceram_data_test.csv")
Use ls() to see what is in R’s memory now. NB: testdata is just a made-up name for the object that you
have created by the read.csv function.
(9) Check that you really have the correct data in R. It’s wise to check your data. Here are a set of
functions you can use to look at various aspects of the loaded data. Make sure they return values you would
expect: names(), head(), dim(). Typing these in the console with the name of your data as the argument
(e.g., names(testdata)) will give you the column head names, the first six rows of data, and the number
of rows and columns, respectively.
The str() function tells you the structure of the data and is an excellent way to check on your data.
> str(testdata)
'data.frame':   261 obs. of 6 variables:
$ CAT.NUMB: int 3000 4025 2049 4022 3032 3032 3025 4048 4048 4048 ...
$ SHERD.NU: int 2 72 93 149 4 10 47 53 54 61 ...
$ SHERD.TY: Factor w/ 2 levels "body","rim": 2 2 2 2 2 2 2 2 2 2 ...
$ BODY.THI: num 8.3 7.9 9.1 6.5 14.1 10.3 6.8 10.7 8 7 ...
$ RIM.THIC: int 11 9 10 10 14 10 9 10 9 8 ...
$ PASTE   : Factor w/ 2 levels "coarse","fine": 2 2 2 2 1 1 1 1 1 1 ...
The first line states the object is a dataframe with 261 observations (cases or rows) of six variables each. The
next lines show the names of the variables, the kinds of variables (integer, factor, etc.) and the first several
values of the variable.
Try the function summary() and see what happens.
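If you do not have the data file to hand, you can practise these checks on a stand-in. The sketch below rebuilds only the first four rows shown above as a small dataframe, so the numbers it reports describe those four rows and not the full 261. Note also that in R version 4.0 and later, read.csv stores text columns as character rather than Factor unless you add stringsAsFactors = TRUE, so your own str() output may differ slightly from the one printed above.

```r
# A four-row stand-in for the ceramic data (first four rows only)
testdata <- data.frame(
  CAT.NUMB = c(3000L, 4025L, 2049L, 4022L),
  SHERD.NU = c(2L, 72L, 93L, 149L),
  SHERD.TY = factor(rep("rim", 4), levels = c("body", "rim")),
  BODY.THI = c(8.3, 7.9, 9.1, 6.5),
  RIM.THIC = c(11L, 9L, 10L, 10L),
  PASTE    = factor(rep("fine", 4), levels = c("coarse", "fine"))
)

names(testdata)        # the six column names
dim(testdata)          # 4 rows, 6 columns
head(testdata, 2)      # head() takes an optional number of rows to show

# summary() gives minimum, quartiles, mean and maximum for numeric columns,
# and counts per category for factors
summary(testdata)

# Individual columns can be checked too
mean(testdata$BODY.THI)    # 7.95 for these four rows
```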
Many people like to be able to look at their datasheet as they work on it. You can do this by toggling back
and forth between R and a program like Excel. You can also look at your data in RStudio: in the Navigation
pane select the Files tab. Navigate to your data file and click on it. A text version of the data file will open as
another tab in the Code pane.
Write and annotate a short R script
Thus far you have done everything in the console. But one of the benefits of R is that you can write down all
your commands in a script, save the script as a text file and you (or anyone) can run the analysis again. Also
it is easy to go back to your script and make a few changes to perform some different analyses of your data.
By the way, the data you analyse in R is never changed. The csv file is not modified in any way.
(10) You can annotate a script by typing # at the beginning of any line. R sees a # and skips over the text
after the # on that line without doing anything. This is a great tool and allows you to take notes within your
script without it affecting what R does.
In the code pane write/copy this code, but replace the arguments in parentheses where necessary:
# ------------------------------------------------------------
# Ethan Cochrane
# January 1, 2000
# Intro R Lab for ANTHRO 309
# ------------------------------------------------------------
#Removing all objects from R’s memory
rm(list=ls())
# getwd tells you where R is currently looking
getwd()
# setwd tells R where to look
setwd("C:/Users/Ethan/Documents/R/testanalyses")
# use getwd to confirm that R is now looking here
getwd()
# Read in the data and assign it a name
# note that we assigned the path above using setwd()
testdata<-read.csv("Tafuna_ceram_data_test.csv")
# Check the data and confirm it is what you expected
names(testdata) # returns the names of the columns
head(testdata) # returns the first 6 rows
dim(testdata) # returns the number of rows and columns
str(testdata) # a powerful compilation of the above
With this in the code pane, select it with your mouse and hit the run button. Check the output in the console;
it will appear from wherever the cursor was last left.
(11) Finally, save your script in your working directory by hitting the save button (or pull-down menu)
and naming the script. RStudio will give it the file extension “.R”.