Anthony's R tutorial

R Tutorial
Anthony G. Peña
December 20, 2012
Introduction
The following tutorial was designed for life scientists interested in learning the basics of the R
programming language (with no previous programming experience) as well as experienced computer
science students who may want to explore the potential biological applications that lie beyond
the realm of computer science. In no way is this tutorial comprehensive. There are countless
additional features that are not discussed here and that could not possibly be condensed
into a single tutorial anyway. Throughout the tutorial I will use single quotes ' ' around commands
that are to be typed into R, and underlined words will represent functions.
Part I
Vectors
Vectors are the fundamental building blocks when "speaking" the language of R, so we will look at vectors
first. A vector can be assigned using a backwards-pointing arrow '<-' or '=', but it is good practice
to use '<-' for reasons outside the scope of this tutorial. In R even a single value (e.g. a character
string or an integer) is stored as a vector of length one, so a variable can hold
not only a single value but a series of values. When assigning a series of values to a vector, you
can use 'c' to combine the arguments, followed by parentheses surrounding your desired values. For
example, the vectors 'experimental_trial_num' and 'patients_first' have a series of numbers and characters assigned to
them, respectively:
experimental_trial_num <- c(7, 8, 9, 10)
experimental_trial_num
patients_first <- c('Bob', 'Anthony', 'Jill', 'Margaret')
patients_first
After you've created a vector (or dataframe or function), you can produce a vector of character
strings giving the names of the objects in your current environment by typing
the command 'ls()'. Conversely, to remove vectors (or dataframes or functions), you can type the
command 'rm()', where the parentheses contain the name(s) of the vector(s) (or dataframe(s) or function(s))
that you wish to remove from your environment. To remove several objects at once, separate their names
with commas inside 'rm()'. Alternatively, just as you use the 'c' function to combine
arguments when you're creating a vector, you can use 'c' to combine the object names, written as quoted
strings, and pass the result to the 'list' argument of 'rm()'.
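As a minimal sketch of listing and removing objects (the object names below are made up for illustration and are not part of the tutorial's data):

```r
# Hypothetical objects, created only so there is something to list and remove
trial_a <- c(1, 2, 3)
trial_b <- c('x', 'y')
trial_c <- 10
trial_d <- 'control'
trial_e <- c(TRUE, FALSE)

ls()                                # names of the objects in the current environment

rm(trial_a)                         # remove a single object
rm(trial_b, trial_c)                # remove several objects, separated by commas
rm(list = c('trial_d', 'trial_e'))  # or pass quoted names combined with 'c'

ls()                                # the five objects above are no longer listed
```

Note that 'rm()' gives no confirmation and cannot be undone, so it is worth running 'ls()' first to check what you are about to remove.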
The ability to assign a series of values to a single vector suggests that we must have a way of
accessing particular values within the vector itself, and this is achieved with brackets '[ ]'. If
we use the vector 'experimental_trial_num' that we just created, we can access the first value:
experimental_trial_num[1]
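If we take this further, we can take advantage of the colon operator ':', which generates a run of
consecutive integers. It gives us a shortcut for re-writing our first vector 'experimental_trial_num' as 'c(7:10)', and it also lets us select a range of positions, such as the first three elements of 'patients_first':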
patients_first[1:3]
In contrast, we can use '-' in front of our numeric values as one way of indicating which
elements we wish to exclude from a vector, or which rows or columns we wish to exclude from a
dataframe (which we will discuss soon). Using vectors as an example:
experimental_trial_num[c(-1, -3)] # exclude the first and third elements
patients_first[-1:-3] # exclude elements 1 through 3
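To see what the colon and the minus sign are doing inside the brackets, here is a short sketch that reuses the numeric vector from above. Note that a vector takes a single index between the brackets, so several positions are first combined with 'c':

```r
experimental_trial_num <- c(7, 8, 9, 10)

7:10                               # the colon generates consecutive integers
-1:-3                              # the same idea with negative numbers: -1, -2, -3

kept_last <- experimental_trial_num[-1:-3]       # drops positions 1, 2 and 3
kept_even <- experimental_trial_num[c(-1, -3)]   # drops positions 1 and 3 only

kept_last
kept_even
```

Negative indexing returns a new vector; 'experimental_trial_num' itself is unchanged unless you assign the result back to it.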
Part II
Booleans
Another way of excluding certain elements from a series of elements stored in a vector is to use
the boolean test '=='. If we only want elements within the vector 'patients_first' that are equal to
the characters 'Anthony', then we write the test in the following way:
patients_first == 'Anthony'
Running this boolean test results in a series of 'TRUE/FALSE' values that correspond to the
positions of the elements within the vector or dataframe column (which we discuss in the next section)
that you tested.
In fact, we can use the boolean test results to serve as input for indexing another vector. To
illustrate:
Anthony_first_patients <- patients_first == 'Anthony'
experimental_trial_num[Anthony_first_patients]
Only where the boolean test result is TRUE will the corresponding elements within 'experimental_trial_num' be returned/printed. In the example above,
the test for 'Anthony' was only TRUE for the second element within the vector 'patients_first'.
This TRUE second element corresponds to the second element of 'experimental_trial_num' since
we are using the brackets to index the 'experimental_trial_num' vector.
Checkpoint I
1. Create a vector of your choice (containing characters or numerics) with at least 5 elements. Be
sure to use the 'c' function to combine all your arguments!
2. Using the vector you created, access the third element of your vector. Now print the elements
of your vector, excluding the third element.
3. See if it is possible to re-write your vector using the colon ':' as a shortcut.
4. Create another vector but this time it must be a vector containing a series of numerical
values. If possible, try to make your second vector the same length as your first. NOTE:
To get the length of your vector without having to count each element, use the length() function, where
the parentheses will contain the name of your vector (without quotes).
5. Finally, list all the vectors in your environment and delete them.
Problem I
Using the very little we know about R, we can already begin to see some potential applications.
For example, imagine you are performing a human isotyping immunoassay in which you want to
document the different classes of immunoglobulins as numbers. Say you have results from an ELISA
experiment (to be discussed later) that consist of numbers ranging from 1-7 that correspond to
the 7 different immunoglobulin classes (IgG1, IgG2, IgG3, IgG4, IgM, IgA, IgE) that are of interest to
you in the experiment. We can convert these numerical values into meaningful labels by using what
we know about vectors.
1. Create a vector 'Ig_molecules' that stores a string of characters for each class of immunoglobulin.
2. Next, create another vector 'ELISA_results' containing the following sequence of random integers between 1-7 that will represent our ELISA results: 1,2,2,7,5,4,4,3,2,5,6,5,2,2,5,4,5,5,7,7,1,3. NOTE:
We will not have to enter data in this archaic fashion in the future. Later we will learn how to
simply read a file with the data already in it.
3. With both of these vectors created, use the output of one vector ('ELISA_results') as the input
of the other ('Ig_molecules'), as discussed earlier, to produce a list of Ig molecules that correspond
to the numerical values that represent your ELISA results.
4. Create a boolean test in which you test for the presence of 'IgM' immunoglobulin in the list
you generated in #3. HINT: It may be easier to create another vector that will contain the list you
generated in #3.
CHALLENGE: Using what we have learned so far, develop a way to utilize the boolean
TRUE/FALSE output produced in #4 to determine how many instances of 'IgM' occurred without
having to count the number of 'TRUE' values yourself. Your solution should produce a single numerical
value.
Part III
Dataframes
Now that we have an understanding of vectors we can begin to talk about another tool with more
capabilities: data frames. Data frames allow one to create rows and columns containing multiple
values, much like vectors. To create a data frame, use the 'data.frame' function in
the following format:
my_table <- data.frame(column_1 = c('row1', 'row2', 'row3'))
Notice that we used the 'c' function to create the contents of rows within a column in this data
frame, much in the way we used it to create the contents of a vector. We could have included an
additional column by simply adding a comma after the first closing parenthesis ')' followed
by a definition of a second column in the same format as we used to define the first column, or we
can add a column to an already existing table by using the '$' sign in the following way:
my_table$add_column <- c('row_a', 'row_b', 'row_c')
The '$' sign follows immediately after the name of your table, and then you assign the desired
contents of your rows to a column name in the same way as before. Similarly, we can also use the
dollar sign to access a specific column in a preexisting table:
my_table$add_column
my_table$column_1
Alternatively, we can access entire rows and columns or single entries of a preexisting table with
brackets '[ ]' in the following format: '[row_number,column_number]'. Using the same dataframe,
'my_table', we will access the first column, the first row, and a single entry in this format:
my_table[, 1] # access an entire column
my_table[1, ] # access an entire row
my_table[1, 1] # access a specific entry (row 1 in column 1)
Here, the brackets '[ ]' behave in a similar fashion to when we used them with vectors to
access specific values. This shorthand way of referring to specific rows and columns is
also very useful when you want to remove a row or column. To remove a row and/or
column:
my_table[, -1] # remove a column by adding a '-' in front of the column number
my_table[-1, ] # remove a row by adding a '-' in front of the row number
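One point that is easy to miss: these commands print the table without the chosen row or column, but 'my_table' itself is left untouched unless the result is assigned to a name. A minimal sketch, where 'trimmed' and 'one_col' are names invented for this example:

```r
my_table <- data.frame(column_1 = c('row1', 'row2', 'row3'),
                       add_column = c('row_a', 'row_b', 'row_c'),
                       stringsAsFactors = FALSE)

trimmed <- my_table[-1, ]                 # copy of the table without row 1
nrow(my_table)                            # the original still has all its rows
nrow(trimmed)

# With only one column left, R simplifies the result to a plain vector
# unless drop = FALSE is added to keep it as a dataframe.
one_col <- my_table[, -1, drop = FALSE]
class(one_col)
```

To change 'my_table' permanently you would assign the result back to the same name, for example 'my_table <- my_table[-1, ]'.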
You may have noticed that when we accessed 'column_1', the contents of the rows appeared
and 'Levels:' followed just underneath (you will see this in R versions before 4.0.0; from 4.0.0
onward strings are kept as characters by default). Without going into much detail, the reason for this is that
when we originally created the dataframe, R (by default in those older versions) turned the entries that we assigned to our
column into what are called factors. Factors, in short, are data types for holding categories,
stored internally as integer codes plus a set of labels rather than as the names themselves. For the sake of our
discussion, however, we will not discuss factors further, and for convenience we will not want them
in our dataframes. So, to ensure that we have a factor-free dataframe regardless of R version, we will set the option
to false by adding 'stringsAsFactors=FALSE' after the definition of the final column of any dataframe we
create in the future, like so:
my_table <- data.frame(column_1 = c('row1', 'row2', 'row3'), add_column = c('row_a',
'row_b', 'row_c'), stringsAsFactors = FALSE)
my_table$column_1
my_table$add_column
Now, if we access 'column_1' again, we are not shown 'Levels:' beneath
the row contents. Besides not wanting 'Levels:' beneath our output, we do not
want factors in our dataframe for another, more important, reason: plain character columns behave
predictably when we test and modify them. Let's say that
I want to see if the character string 'row3' exists in 'column_1' of 'my_table'. I can check for this simply
by:
my_table$column_1 == 'row3'
# this isolates a column and looks for a match
The return is a set of boolean values indicating that the first 2 rows of 'column_1' do not contain
a match for 'row3', as denoted by 'FALSE', and that the third row does contain a match, as shown
by 'TRUE'. Note that this particular test also works on a factor column, because '==' compares the
string against the factor's labels. Where factors do cause surprises is when the column is modified
or converted: for example, assigning an entry that is not one of the existing levels produces 'NA'
with a warning, and 'as.numeric' applied to a factor returns its internal integer codes rather than
the labels. Feel free to re-create the first version of 'my_table' with 'stringsAsFactors=TRUE',
then try the 'row3' test and assign a new, unseen label to an entry of
'column_1' to convince yourself of this.
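A small sketch for inspecting how each version stores the column and how it responds to the '==' test. The names 'as_factor' and 'as_char' are invented for this comparison, and 'stringsAsFactors' is set explicitly in both so that the result does not depend on your R version:

```r
as_factor <- data.frame(column_1 = c('row1', 'row2', 'row3'),
                        stringsAsFactors = TRUE)
as_char <- data.frame(column_1 = c('row1', 'row2', 'row3'),
                      stringsAsFactors = FALSE)

class(as_factor$column_1)      # how the column is stored in each version
class(as_char$column_1)

levels(as_factor$column_1)     # the category labels held by the factor
as.integer(as_factor$column_1) # the integer codes stored behind those labels

factor_test <- as_factor$column_1 == 'row3'
char_test <- as_char$column_1 == 'row3'
factor_test
char_test
```

Running both tests side by side is a quick way to check for yourself where the two storage types agree and where they differ.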
Exercise I: Documenting ELISA Results
With our understanding of dataframes, we can begin to imagine more useful applications such as the next scenario. You are working in a research lab and are assigned the task of
performing an enzyme-linked immunosorbent assay (ELISA) to determine whether the mixture of
immunoglobulins you received contains the immunoglobulin G subclass IgG4. You decide to perform a serial dilution of the mixture in the first column of wells of a 96-well microtiter plate. If the subclass
is present, then at the end of the ELISA your solution should turn yellow; otherwise the
solution should remain clear. To document our results, we decide to create a dataframe in R that
will include: well number, well letters, color of the wells, and the corresponding dilution. While
the creation of the dataframe can be achieved in a single line of code, we will break it into multiple
lines to review what we have learned so far and introduce some commonly used shortcuts and tools
that exist within R. First, let's create a dataframe with the first column, titled 'well_num':
ELISA_IgG4 <- data.frame(well_num = c(rep(1, 8)), stringsAsFactors = FALSE)
# Recall: set stringsAsFactors to FALSE when creating the dataframe ELISA_IgG4
So, we have created a dataframe by following the 'data.frame' syntax, with a 'well_num'
column containing '1' in each row since we stay in the same column of wells as we perform our serial
dilution. Notice, however, that instead of typing out the number eight times (e.g. well_num=c(1,1,1,1,1,1,1,1))
we used the 'rep' tool to repeat the number '1' a total of '8' times using this format: rep(number
or character, number of times to repeat). As a quick example:
rep('Repeat', 10)
Returning to our present table, we still need to add columns (well letters, colors of wells, and
corresponding dilutions) to our 'ELISA_IgG4' dataframe, which we will do now. For this, recall that
we use the '$' sign to add columns to a preexisting dataframe:
ELISA_IgG4$well_lett <- c(LETTERS[1:8])
Here, we have added a column 'well_lett' that lists the first 8 letters of the alphabet as we
descend down the column; however, instead of typing the letters of the alphabet one by one, we availed ourselves of the built-in tool 'LETTERS', which holds every letter of
the alphabet. You may have guessed already, based on the fact that our dataframe contains only
capital letters and that we used 'LETTERS' in all capitals in our line of code, that 'letters' gives
you the letters of the alphabet in lowercase and 'LETTERS' gives you the letters
of the alphabet in all capitals.
letters
LETTERS
Next, we use the brackets '[ ]' to place a constraint on the number of letters we want to use.
Within these brackets we give a range using the colon ':', which we described back in our discussion
of vectors. In short, we are using the 'LETTERS' tool to list the first through the eighth letter of the
alphabet as capital letters and assign this to the column 'well_lett'. Now that we have our first
two columns, we will add the remaining two columns to the 'ELISA_IgG4' dataframe:
ELISA_IgG4$well_color <- c('clear', 'clear', 'yellow', 'yellow', 'yellow', 'clear',
'yellow', 'clear')
ELISA_IgG4$Dilutions <- c(10^0, 10^-1, 10^-2, 10^-3, 10^-4, 10^-5, 'POS CNTRL',
'NEG CNTRL')
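One detail worth checking here: a vector holds values of a single type, so combining numbers with the control labels stores every entry of that vector as a character string. A small sketch, using a shortened stand-in for the 'Dilutions' column ('numeric_only' and 'mixed' are names made up for the comparison):

```r
numeric_only <- c(10^0, 10^-1, 10^-2)
mixed <- c(10^0, 10^-1, 10^-2, 'POS CNTRL', 'NEG CNTRL')

class(numeric_only)   # type when only numbers are combined
class(mixed)          # type once text labels are combined with the numbers
mixed                 # the numbers are now shown in quotes
```

This is fine for documenting results, but arithmetic on such a column would first require converting the numeric entries back with 'as.numeric'.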
Now that we have created all of our rows and columns we can see the complete dataframe by
simply calling it by name:
ELISA_IgG4
Using what we learned earlier about accessing rows, columns, and checking for matching values,
we can apply the same commands to our 'ELISA_IgG4' dataframe. As we have seen already,
we can use '$' and '==' to reference a specific column and check for any matches to the character string
'yellow', respectively:
ELISA_IgG4$well_color == 'yellow'
We can store this boolean string of 'TRUE/FALSE' values in a vector titled 'yellow_occurences' to
document which rows returned 'TRUE' for a 'yellow' 'well_color' by assigning the code above to it
using the '<-' arrow:
yellow_occurences <- ELISA_IgG4$well_color == 'yellow'
What would be even more useful than just a string of boolean values showing where 'yellow' is observed
is the contents of the neighboring columns, such as the well letter and dilution. This can be
achieved by using the boolean output of the R code above that we stored in 'yellow_occurences' as
the input for accessing rows and columns within the 'ELISA_IgG4' dataframe, following the same
format as before: 'dataframe_name[row_number,column_number]'. Below, instead of giving row
numbers, we will call the rows that contain 'yellow' in the column 'well_color' and show all the
neighboring columns by not placing a value to the right of the comma for 'column_number'.
ELISA_IgG4[yellow_occurences, ]
The table produced above reveals all the information that we have gathered for the wells that
turned yellow, which indicates the presence of the immunoglobulin subclass IgG4. Furthermore,
this table may suggest which dilutions are ideal for future detection of this particular immunoglobulin.
Now, what can we learn about the clear wells? We could follow the same steps we took to isolate
the 'yellow' wells by creating a vector 'clear_occurences', or we can save ourselves time by using the
'!' symbol and placing it in front of 'yellow_occurences', essentially extracting every row whose 'well_color'
is not 'yellow', producing the following table:
ELISA_IgG4[!yellow_occurences, ]
Again, we can extract information from this way of visualizing the dataframe.
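The row and column positions can also be combined. As a sketch (rebuilding a shortened, four-well version of the table so that it runs on its own; 'elisa_demo' and 'is_yellow' are hypothetical stand-ins, and the values are illustrative only), a vector of column names after the comma keeps just the well letter and dilution:

```r
elisa_demo <- data.frame(well_lett = LETTERS[1:4],
                         well_color = c('clear', 'yellow', 'yellow', 'clear'),
                         Dilutions = c('1', '0.1', '0.01', 'NEG CNTRL'),
                         stringsAsFactors = FALSE)

is_yellow <- elisa_demo$well_color == 'yellow'

# Rows chosen by the boolean vector, columns chosen by name
yellow_wells <- elisa_demo[is_yellow, c('well_lett', 'Dilutions')]
clear_wells <- elisa_demo[!is_yellow, c('well_lett', 'Dilutions')]

yellow_wells
clear_wells
```

Selecting columns by name rather than by number keeps the code readable and still works if the column order changes later.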
Saving & Reading Your Dataframe
The next logical step is to save our dataframe. Using 'write.table' we can quickly create a file that
contains our dataframe, but first we will want to know where it will be saved and be able to change
the directory if necessary. Simply typing getwd() exactly as it appears will output the current
working directory, which is where any files that you write will be saved. If you need to change the directory,
use setwd('file_path'), where 'file_path' is the precise location on your computer where you wish
to save your file (containing your dataframe). Once we have set our working directory to where we
want to save our dataframe or any other file, the next step is to actually save the dataframe using
'write.table':
write.table(ELISA_IgG4, 'ELISA_Results_Table.txt')
Notice that within the parentheses, we first state the name of the dataframe (in this case
'ELISA_IgG4') followed by the file name that you wish to give it, in this case 'ELISA_Results_Table.txt'.
A quick way not only to verify that your file has been saved but also to reveal other documents in
your working directory is the list.files() function. If you are in the same working directory that you
saved your dataframe in and your dataframe saved successfully, then you should see your file name
among the list of files. Once you have located the file containing your dataframe, you can open it or
any other dataframe-containing file that R recognizes using read.table('file_name'). Additionally,
you can read a .csv file using read.csv('csv_file_name').
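The save-and-reload cycle can be sketched as follows. To avoid writing into your real working directory, this example saves into R's temporary folder via 'tempdir()'; that choice, and the small 'results' table, are assumptions made for illustration:

```r
results <- data.frame(well_lett = LETTERS[1:3],
                      well_color = c('clear', 'yellow', 'yellow'),
                      stringsAsFactors = FALSE)

out_file <- file.path(tempdir(), 'ELISA_Results_Table.txt')
write.table(results, out_file)            # save the dataframe as a text file

list.files(tempdir())                     # the new file should appear in this list

# Read the file back in; stringsAsFactors = FALSE keeps text columns as characters
reloaded <- read.table(out_file, stringsAsFactors = FALSE)
reloaded
```

Printing 'reloaded' next to 'results' is a simple check that nothing was lost or altered on the way to disk and back.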
Part IV
The 'Plot' Function
Equipped with our fundamental understanding of vectors and dataframes, we can begin to explore
the highly useful 'plot' function to visualize our data. The plot function is polymorphic, meaning
that plot(x) yields different results depending on whether x is a vector, a factor, or even a data
frame. Let's take a look at a vector first. Imagine we are performing another ELISA, but this
time we are measuring the concentration (in pg/ml) of a cytokine (IL-12) in relation to the
color intensity as measured by optical density (OD). We will create two vectors, one named 'optical_density'
and the other 'IL12_concent', with the values indicated below.
optical_density <- c(0.05, 0.09, 0.2, 0.45, 0.65, 0.90, 1.1)
IL12_concent <- c(0.8, 1, 3, 6, 9, 11, 13)
We can independently plot a single vector using plot(x), where 'x' is a vector:
plot(optical_density)
We could have just as easily produced a similar plot if we had plotted the vector 'IL12_concent' alone.
In either case, notice that the plot function will use the 'index' as the 'x' axis when only one vector
is entered. More commonly, and perhaps more usefully, we will give both 'x' and
'y' values in the form of vectors. To plot these values we simply use plot(x,y) as follows:
plot(IL12_concent, optical_density)
This is a very basic plot and we have more options available to us, so we will look at some of
those options right now. In the plot above, we have points to represent our values, but say we
wanted lines instead. By adding the argument type='l' we can change the representation to lines:
plot(IL12_concent, optical_density, type = 'l')
Likewise, we can assign other one-letter abbreviations to 'type' to change the way we represent
our values in the plot. Here are a few: 'p' for points, 'l' for lines, 'h' for histogram-like vertical lines,
and 's' for steps. Take a moment to explore these different types of plot drawings. Changing the
color of the plot is just as easy: add the argument col='red'.
plot(IL12_concent, optical_density, type = 'l', col = 'red')
The line appears a little too thin so let's increase the thickness a bit using the argument 'lwd=5'
to give us a thicker line:
plot(IL12_concent, optical_density, type = 'l', col = 'red', lwd = 5)
The arguments you will probably use most, however, are 'xlab', 'ylab', and 'main', which allow
one to assign a label for the x axis, a label for the y axis, and a main title, respectively.
By default, the plot function automatically uses the vector
names you put into the plot function as labels on their respective axes, as you may have noticed
in the previous plots. Now, we will override the default labels using the aforementioned
arguments:
plot(IL12_concent, optical_density, type = 'l', col = 'red', lwd = 5, xlab =
'Concentration of IL-12 (pg/ml)', ylab = 'Optical Density (OD)', main = 'IL-12 Assay')
Another useful function that is related to 'plot' is the 'abline' function, which will add one or
more straight lines through a given plot. If we wanted to add a best fit line to the plot we have
above, we can use 'abline' in the following way:
abline(lm(optical_density ~ IL12_concent)) # abline adds a line to an existing plot
Within the abline function we used the function 'lm', which stands for linear model and
carries out linear regression for the values that we input, in our case the two vectors we created for the
ELISA assay, separated by the '~' symbol.
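To see the numbers behind the fitted line, the model can be stored and inspected. A minimal sketch, re-creating the two vectors so that it runs on its own ('fit' is a name chosen here for illustration):

```r
optical_density <- c(0.05, 0.09, 0.2, 0.45, 0.65, 0.90, 1.1)
IL12_concent <- c(0.8, 1, 3, 6, 9, 11, 13)

# The formula reads 'y ~ x': optical density modelled as a function of concentration
fit <- lm(optical_density ~ IL12_concent)

coef(fit)   # the intercept and the slope of the best fit line
```

Passing the stored model to 'abline(fit)' after a plot draws the same line as wrapping the 'lm' call directly inside 'abline'.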
Part V
Getting 'Help'
As you can see, our command line can get lengthy with all these arguments, and remembering
the arguments for color, thickness, etc. can become overwhelming, so rather than memorize all the
arguments this is a good time to recall that 'help' is available. If you cannot remember the details
of a function, such as what arguments you can use, simply type '?' immediately followed by the
name of the function (e.g. '?plot'), which will open the R documentation for the 'plot' function in
this case, but this will work for any function if you can remember the function name. In the event
that you cannot remember the exact spelling of the function name, the apropos() function
will produce a list of objects whose names contain the character string you pass into it. Typing
'plot' into apropos(), for example, will give the following results:
apropos('plot') # don't forget to put quotes around your character string
Part VI
Plotting Dataframes
We had a chance to look at how we can plot and manipulate the plots of vectors, but as we
suggested earlier you can also plot dataframes. On this note, recall that the 'plot' function is
polymorphic, meaning it will behave differently depending on what is passed into it. Here, we will
be passing a dataframe into the 'plot' function and, conveniently, the syntax is identical, but it is
important to understand that the function treats the input differently than it would a vector. For
this demonstration, we will be using a table adapted from a primary research article published by
researchers at Cornell University studying the genetic diversity of MHC class I in six species of
frogs. As review, we will walk through the steps to load the table (saved as a file on a computer)
before plotting it.
setwd('C:/Users/R tutorial')
list.files() # verify file exists at location by listing all files
MHCI_frogs <- read.table('MHCI_frogs.txt') # store the table as a dataframe
MHCI_frogs # call your table to verify storage
plot(MHCI_frogs)
When we plot a dataframe we do not specify which columns should go on which axis, so the way
the 'plot' function handles dataframes is to plot every column against every other column, one panel per pair. This
is different behavior and treatment of the data compared to plotting vectors and is what makes the
'plot' function, as we have said before, polymorphic. Alternatively, we could call specific columns
from the dataframe and plot them as we would vectors. For example, let's plot the 'Exon2' column
against 'No.unique_sequences'.
plot(MHCI_frogs$Exon2, MHCI_frogs$No.unique_sequences, type = 'h', pch = 17, main =
'Exon2 vs No.unique_sequences', xlab = 'Exon2', ylab = 'Unique Sequences', col = 'blue')
We have most of the same arguments that we were able to add to our vector plots, including:
adding a main title (main='titlename'), coloring the values that appear in the plot (col='colorname'),
and changing the characters used in the plots (e.g. pch=20) by varying the numeric value between 1-25.
Exercise II: IL-12 Production
Your lab is studying interleukin 12 (IL-12) production by peripheral blood mononuclear cells
(PBMC) following engagement with Streptococcus mutans. You are interested in determining
which receptor may be responsible for inducing the influx of IL-12, so you design an experiment
that will measure the concentration of IL-12 after subjecting three different possible receptors to
neutralizing antibodies. You anticipate that the receptor responsible for stimulation of IL-12 will
yield little to no IL-12 once inhibited.
The three possible receptors are: toll-like receptor-2 (TLR-2), TLR-4, and cluster of differentiation 14 (CD14). Your neutralizing antibodies will be IgG2a isotype to block TLR-2 and TLR-4
and IgG1 isotype to block CD14.
There will be 6 conditions under which we will measure IL-12 concentration. The first condition
is the PBMC under normal conditions, the second is when the TLR-2 receptor is subjected
to neutralizing antibodies, the third when TLR-4 is in the presence of neutralizing antibodies,
the fourth when CD14 is subjected to neutralizing antibodies, and conditions five and six will be
controls for the two IgG isotypes. You run your experiment and record your results into the vectors
below:
conditions <- c('PBMC', 'TLR-2', 'TLR-4', 'CD14', 'IgG2a isotype', 'IgG1')
IL12production <- c(515, 479, 524, 13, 438, 492)
With our IL-12 concentration values, we can plot the single vector using the 'plot' function as
we have learned in this section. However, because we are only plotting a single vector, a bar graph
is probably more appropriate. In fact, there is a 'barplot' function
that is similar to the 'plot' function that we will use here. We will use the same arguments
that we have introduced already with one addition, 'names.arg'. The argument 'names.arg' will
allow us to assign our own labels under each bar that appears along the x axis; these labels
do not take the place of the x axis label but are rather like x axis sub-labels. Our
code will look something like this:
barplot(IL12production, lwd = 5, col = 'red', xlab = 'Conditions', ylab = 'IL-12 (pg/ml)',
main = 'IL-12 Production', names.arg = conditions)
Plotting our data using the barplot() function makes it visually clear that IL-12 production is
markedly lower when the CD14 receptor is blocked by neutralizing antibodies. This suggests
that IL-12 production in our particular experimental pathway is CD14-dependent.
Part VII
Multiple 'Plot' Windows
Not long after you begin exploring the 'plot' function, you will probably realize you want to
compare multiple plots side by side. Here, we will mention a few of the
'dev' (for device) functions that allow you to manage multiple plot windows open simultaneously.
First, when we say device we are speaking more specifically about the graphical device that generates
the plots we have seen thus far. The default device is the 'null device' and is always assigned as device
1, but according to the R documentation for the 'dev' functions, this is simply a "placeholder", so
any attempt to use it will result in a new device being opened. This will become clearer once
you become more familiar with the functions.
The following steps are for initializing multiple windows. First, if a plot window is already open, turn off the device: dev.off().
Next, open a new device: dev.new(). Plot your first vector or dataframe. To plot another vector
or dataframe in a separate window, use the dev.new() function again, followed by your second plot.
To reveal a list of all your device windows, type the command 'dev.list()'. To make a particular graphical device the active one,
use the command 'dev.set(device_number)', where 'device_number' is the number of the
device window that you wish to draw in.
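The steps above can be sketched as one short session. This is only an illustration: it uses 'graphics.off()' to close every open device at once (so it is safe to run even when no window is open), and in a non-interactive session R may write the plots to PDF files instead of opening windows. The names 'open_devices' and 'active_device' are made up for the example:

```r
graphics.off()               # close any open devices so we start from a clean slate

dev.new()                    # open the first new device
plot(1:10)
dev.new()                    # open a second device for a second plot
plot(10:1)

open_devices <- dev.list()   # named vector of the device numbers now open
open_devices

dev.set(open_devices[1])     # make the first of the two devices active again
active_device <- dev.cur()   # report which device is currently active
active_device

graphics.off()               # tidy up by closing everything
```

Any plotting command issued after 'dev.set()' is drawn in the device that was made active, which is how you return to an earlier window to add to it or redraw it.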
Part VIII
Statistical Applications
T-Test
R has a built-in t.test() function that, like the many built-in functions we have seen already (e.g. the plot
function), allows you to enter data and pass additional arguments. The general format of the
t.test() function is: t.test(x, y, alternative='two.sided', mu=0, paired=FALSE, var.equal=FALSE,
conf.level=0.95). 'x' is a vector that contains the data itself or is indexed from a dataframe. 'y' can be
another data set itself or a different column of the same or a different dataframe. Alternatively,
if running a one-sample t-test (which, as the name implies, involves only one sample dataset), we do not
enter another dataset into the function and it will automatically set 'y=NULL'. We use the 'mu'
argument to set the hypothesized value of the mean when dealing with a one-sample t-test, and the hypothesized difference
in means when performing a two-sample t-test. Based on what we set for 'mu', we set 'alternative'
as the alternative hypothesis, either 'two.sided' (default), 'greater' or 'less'. To illustrate this point,
suppose we were to perform the following t-test in R: t.test(x, alternative='less', mu=10). This command
will perform a one-sample t-test on 'x' with the null hypothesis that the true mean equals 10, so that the
alternative is that the true mean is less than 10.
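As a minimal sketch of the call just described, here is a one-sample test on a small vector of made-up measurements (the values of 'x' are invented purely for illustration and do not come from any experiment in this tutorial):

```r
# Hypothetical measurements, invented for this example
x <- c(9.1, 9.8, 10.2, 8.7, 9.5, 9.9, 9.3, 8.9)

# Null hypothesis: the true mean is 10; alternative: the true mean is less than 10
result <- t.test(x, alternative = 'less', mu = 10)

result$estimate    # the sample mean
result$statistic   # the t statistic
result$p.value     # the p-value for the one-sided test
result             # the full printed summary
```

Storing the output in 'result' lets you pull out individual pieces with '$', in the same way we accessed columns of a dataframe.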
Checkpoint VIII
1. Go ahead and practice varying the different arguments available in the t.test() function that we
have mentioned so far to clear up any confusion.
Exercise III
You are a quality control technician for a company that packages and ships iceberg lettuce. On
the lookout for possible Listeria contamination, you take nine samples at random from a single
lot and test them using a method that produces values in units of MPN/g. The following are the
values you obtain: 0.593,0.142,0.329,0.691,0.231,0.793,0.519,0.392, 0.418.
1. Create a vector named 'listeria_results' that will contain these numerical values.
For your results to suggest Listeria contamination, the mean level of your readings must be
greater than 0.3 MPN/g (i.e. mu will be set to 0.3 as one of our arguments in the t.test function).
2. Apply the t.test() function to the data contained in the vector you just created in #1. Note:
you only have to add the arguments 'alternative' and 'mu'. With 'alternative' set to 'greater'
and 'mu' set to 0.3, the sample mean is 0.4564, and the result is
significant at the 0.05 level since p (about 0.03) is less than 0.05, which supports the alternative
hypothesis that the mean exceeds 0.3 MPN/g.