R Basics v8.pptx

R Basics / Course Business
We’ll be using a sample dataset in class today:
-  CourseWeb: Course Documents → Sample Data → Week 2
-  You can download it to your computer before class
CourseWeb survey on research/stats background still available. Thanks to everyone who’s responded!
R Basics
R commands & functions
Reading in data
Saving R scripts
Descriptive statistics
Subsetting data
Assigning new values
Referring to specific cells
Types & type conversion
NA values
Getting help
R Commands
Simplest way to interact with R is by typing in commands at the > prompt:
(Screenshots of the > prompt in R Studio and in R)
R as a Calculator
Typing in a simple calculation shows us the result:
608 + 28
What’s 11527 minus 283?
Some more examples:
400 / 65   (division)
2 * 4      (multiplication)
5 ^ 2      (exponentiation)
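A quick sketch of the same operators, with the values R prints shown as comments. The last two lines (not on the original slide) show that R applies the usual order of operations unless parentheses say otherwise.

```r
608 + 28        # 636
11527 - 283     # 11244
400 / 65        # 6.153846 (printed to 7 significant digits)
2 * 4           # 8
5 ^ 2           # 25
2 + 3 * 4       # 14: multiplication happens before addition
(2 + 3) * 4     # 20: parentheses change the order
```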
Functions
More complex calculations can be done with functions:
sqrt(64)
-  sqrt: what the function is (square root)
-  64, in parentheses: what we want to perform the function on
-  Can often read these left to right (“square root of 64”)
What do you think this means?
abs(-7)
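For reference, the values these two calls return; abs() is R's absolute-value function:

```r
sqrt(64)   # square root of 64: 8
abs(-7)    # absolute value of -7: 7
abs(7)     # positive numbers are unchanged: 7
```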
Arguments
Some functions have settings (“arguments”) that we can adjust:
round(3.14)
-  Rounds off to the nearest integer (zero decimal places)
round(3.14, digits=1)
-  One decimal place
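A few more calls, with results as comments. The last line is a detail worth knowing: per ?round, R follows the IEC 60559 convention of rounding an exact .5 to the even digit, so round(2.5) gives 2 rather than 3.

```r
round(3.14)                  # 3: zero decimal places by default
round(3.14, digits = 1)      # 3.1
round(3.14159, digits = 3)   # 3.142
round(2.5)                   # 2: an exact .5 goes to the even digit
```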
Nested Functions
We can use multiple functions in a row, one inside another:
sqrt(abs(-16))
-  “Square root of the absolute value of -16”
Don't get scared when you see multiple parentheses!
-  Can often just read left to right
-  R first figures out the innermost function (the one nested in the middle), then works outward
Can you round off the square root of 7?
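Worked through in R, with results as comments (one possible answer to the question above, assuming "round off" means to the nearest integer or to two decimals):

```r
sqrt(abs(-16))               # abs(-16) is 16 first, then sqrt(16): 4
sqrt(7)                      # 2.645751
round(sqrt(7))               # 3
round(sqrt(7), digits = 2)   # 2.65
```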
Using Multiple Numbers at Once
When we want to use multiple numbers, we concatenate them with c():
c(2,6,16)
-  A vector of the numbers 2, 6, and 16
Sometimes a computation requires multiple numbers:
mean(c(2,6,16))
Also a quick way to do the same thing to multiple different numbers:
sqrt(c(16,100,144))
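A small sketch of what c() builds and how functions treat it; length() (not on the slide) counts the elements:

```r
c(2, 6, 16)             # a numeric vector with 3 elements
length(c(2, 6, 16))     # 3
mean(c(2, 6, 16))       # (2 + 6 + 16) / 3 = 8
sqrt(c(16, 100, 144))   # applied to each element: 4 10 12
```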
Course Documents: Sample Data: Week 2
Reading plausible versus implausible sentences
-  “Scott chopped the carrots with a knife.”
-  “Scott chopped the carrots with a spoon.”
-  Measure reading time on the final word
Note: Simulated data; not a real experiment.
-  Reading time on critical word
-  36 subjects
-  Each subject sees 30 items (sentences): half plausible, half implausible
-  Interested in changes over time, so we’ll track serial position (trial 1 vs trial 2 vs trial 3…)
Reading in Data
Make sure you have the dataset at this point if you want to follow along:
-  Course Documents → Sample Data → Week 2
Reading in Data – RStudio
Navigate to the folder in the lower-right pane
-  More -> Set as Working Directory
Open a “comma-separated value” file:
experiment <- read.csv('week2.csv')
-  experiment: name of the “dataframe” we’re creating (whatever we want to call this dataset)
-  read.csv: the function name
-  'week2.csv': the file name
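The menu step also has a code equivalent, setwd(). So that the sketch runs anywhere, it writes a tiny made-up stand-in for week2.csv (hypothetical columns Subject and RT) to a temporary folder; with the real data you would pass your own folder to setwd() instead of tempdir().

```r
# Hypothetical stand-in for week2.csv, written to a temporary folder
folder <- tempdir()
write.csv(data.frame(Subject = c("S1", "S1", "S2"),
                     RT      = c(512, 734, 1890)),
          file.path(folder, "week2.csv"), row.names = FALSE)

setwd(folder)                         # code version of "Set as Working Directory"
getwd()                               # prints the current working directory
experiment <- read.csv("week2.csv")   # file name is looked up in that folder
nrow(experiment)                      # 3 rows
```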
Reading in Data – Regular R
Read in a “comma-separated value” file:
experiment <- read.csv('/Users/scottfraundorf/Desktop/week2.csv')
-  experiment: name of the “dataframe” we’re creating (whatever we want to call this dataset)
-  read.csv: the function name
-  Inside the quotes: folder & file name
-  Drag & drop the file into R to get the full folder & filename
Looking at the Data: Summary
A “big picture” of the dataset:
summary(experiment)
summary() is a very important function!
-  Basic info & descriptive statistics
-  Check to make sure the data are correct
We can use $ to refer to a specific
column/variable in our dataset:
summary(experiment$ItemName)
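To see what summary() reports without the course file, here is a sketch on a hypothetical four-row miniature of experiment (made-up subjects, items, and RTs):

```r
# Hypothetical miniature version of the experiment dataframe
experiment <- data.frame(
  Subject  = factor(c("S1", "S1", "S2", "S2")),
  ItemName = factor(c("Ducks", "Hornet", "Ducks", "Hornet")),
  RT       = c(512, 734, 1890, 655)
)
summary(experiment)            # one summary per column
summary(experiment$RT)         # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary(experiment$ItemName)   # for a factor: how many rows per category
```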
Looking at the Data: Raw Data
Let’s look at the data!
experiment
Ack! That’s too much! How about just a few rows?
head(experiment)
-  First 6 rows (the default)
head(experiment, n=10)
-  First 10 rows
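A sketch on a hypothetical 20-row dataframe showing how many rows each call returns; tail() and nrow() are base R companions to head() that are not on the slide.

```r
# Hypothetical 20-row dataframe, just to show how many rows each call returns
experiment <- data.frame(Trial = 1:20, RT = seq(500, 1450, by = 50))
head(experiment)           # first 6 rows (the default)
head(experiment, n = 10)   # first 10 rows
tail(experiment, n = 3)    # last 3 rows
nrow(experiment)           # total number of rows: 20
```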
Reading in Data: Other Formats
Excel:
library(gdata)
experiment <- read.xls('/Users/scottfraundorf/Desktop/week2.xls')
SPSS:
library(foreign)
experiment <- read.spss('/Users/scottfraundorf/Desktop/week2.spss', to.data.frame=TRUE)
-  Packages need to be installed once before library() can load them, e.g., install.packages('gdata')
R Scripts
Save & reuse commands with a script
-  R: File -> New Document
-  R Studio: File -> New File -> R Script
R Scripts
Run commands without typing them all again
R Studio:
-  Code -> Run Region -> Run All: Run entire script
-  Code -> Run Line(s): Run just what you’ve highlighted/selected
R:
-  Highlight the section of script you want to run
-  Edit -> Execute
Keyboard shortcut for this:
-  Ctrl+Enter (PC), ⌘+Enter (Mac)
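Scripts can also be run entirely from code with base R's source(). In this self-contained sketch, analysis.R is a hypothetical script written to a temporary file first:

```r
# Write a hypothetical two-line script to a temporary file
script <- file.path(tempdir(), "analysis.R")
writeLines(c("x <- sqrt(64)",
             "y <- x * 2"), script)

source(script)   # runs every line of the script, top to bottom
y                # 16
```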
R Scripts
Saves time when re-running analyses
Other advantages? Some:
-  Documentation for yourself
-  Documentation for others
-  Reuse with new analyses/experiments
-  Quicker to run: can automatically perform one analysis after another
R Scripts: Comments
Add # before a line to make it a comment
-  Not commands to R, just notes to self (or other readers)
Can also add a # to make the rest of a line a comment
summary(experiment$Subject) #awesome
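Both kinds of comment in one short sketch (the RT values are made up):

```r
# Whole-line comment: R skips this line entirely
rts <- c(512, 734, 1890)   # end-of-line comment: the code before # still runs
mean(rts)                  # 1045.333
```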
Descriptive Statistics
Remember how we referred to a particular variable in a dataframe with $?
Combine that with functions:
mean(experiment$RT)
median(experiment$RT)
sd(experiment$RT)
Or, for a categorical variable:
levels(experiment$ItemName)
summary(experiment$Subject)
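A sketch of these functions on a hypothetical miniature dataframe; ItemName is created as a factor explicitly, because levels() only has something to report for factors:

```r
# Hypothetical miniature dataframe with made-up RTs
experiment <- data.frame(
  ItemName = factor(c("Ducks", "Hornet", "Panther", "Ducks")),
  RT       = c(512, 734, 1890, 655)
)
mean(experiment$RT)           # 947.75
median(experiment$RT)         # 694.5 (average of the two middle values)
sd(experiment$RT)             # standard deviation
levels(experiment$ItemName)   # "Ducks" "Hornet" "Panther"
```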
Descriptive Statistics
We often want to look at a dependent variable as a function of some independent variable(s)
tapply(experiment$RT, experiment$Condition, mean)
-  “Split up the RTs by Condition, then get the mean”
Try getting the mean RT for each item
How about the median RT for each subject?
To combine multiple results into one table, “column bind” them with cbind():
cbind(
  tapply(experiment$RT, experiment$Condition, mean),
  tapply(experiment$RT, experiment$Condition, sd)
)
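A sketch with made-up RTs for two conditions. Naming the arguments to cbind() (Mean =, SD =) labels the columns of the resulting table; groups come out in alphabetical order.

```r
# Hypothetical miniature dataframe
experiment <- data.frame(
  Condition = c("Plausible", "Implausible", "Plausible", "Implausible"),
  RT        = c(500, 700, 600, 900)
)
# Split RT by Condition, then apply mean (or sd) to each group
means <- tapply(experiment$RT, experiment$Condition, mean)
sds   <- tapply(experiment$RT, experiment$Condition, sd)
means                               # Implausible 800, Plausible 550
tab <- cbind(Mean = means, SD = sds)
tab                                 # one row per condition, one column per statistic
```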
Descriptive Statistics
Can have 2-way tables...
tapply(experiment$RT,
  list(experiment$Subject, experiment$Condition), mean)
-  1st variable is rows, 2nd is columns
...or more!
tapply(experiment$RT,
  list(experiment$ItemName, experiment$Condition, experiment$TestingRoom), mean)
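A two-way sketch on hypothetical data: rows come from the first variable in list() (Subject), columns from the second (Condition), and each cell is the mean RT for that combination.

```r
# Hypothetical miniature dataframe
experiment <- data.frame(
  Subject   = c("S1", "S1", "S2", "S2", "S1"),
  Condition = c("Plausible", "Implausible", "Plausible", "Implausible", "Plausible"),
  RT        = c(500, 700, 600, 900, 540)
)
tab <- tapply(experiment$RT, list(experiment$Subject, experiment$Condition), mean)
tab        # S1/Plausible is the mean of 500 and 540: 520
dim(tab)   # 2 subjects x 2 conditions
```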
Descriptive Statistics
Contingency tables for categorical variables:
xtabs(~ Subject + Condition, data=experiment)
-  Counts how many observations fall in each Subject × Condition combination
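A sketch on hypothetical data: xtabs() with nothing on the left of ~ counts how many rows fall in each Subject x Condition combination.

```r
# Hypothetical miniature dataframe
experiment <- data.frame(
  Subject   = c("S1", "S1", "S1", "S2", "S2"),
  Condition = c("Plausible", "Implausible", "Plausible", "Implausible", "Plausible")
)
counts <- xtabs(~ Subject + Condition, data = experiment)
counts   # S1 has 2 Plausible and 1 Implausible trial
```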
Subsetting Data
Often, we want to examine or use just part of a dataframe
Remember how we read our dataframe?
experiment <- read.csv(...)
Create a new dataframe that's just a subset of experiment:
experiment.LongRTsRemoved <- subset(experiment, RT < 2000)
-  experiment.LongRTsRemoved: new dataframe name
-  experiment: original dataframe
-  RT < 2000: inclusion criterion (RT less than 2000 ms)
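A sketch with made-up RTs, showing that subset() returns a new dataframe and leaves the original untouched:

```r
# Hypothetical miniature dataframe
experiment <- data.frame(Subject = c("S1", "S1", "S2", "S2"),
                         RT      = c(150, 734, 2450, 655))
# Keep only rows where RT is below 2000 ms
experiment.LongRTsRemoved <- subset(experiment, RT < 2000)
experiment.LongRTsRemoved   # 3 rows; the 2450 ms trial is gone
nrow(experiment)            # the original still has 4 rows
```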
Subsetting Data: Logical Operators
Try getting just the observations with RTs 200 ms or more:
experiment.ShortRTsRemoved <- subset(experiment, RT >= 200)
Why not just delete the bad RTs from the spreadsheet?
-  Easy to make a mistake / miss some of them
-  Faster to have the computer do it
-  We’d lose the original data
-  No documentation of how we subsetted the data
Subsetting Data: AND and OR
What if we wanted only RTs between 200 and 2000 ms?
-  Could do two steps:
experiment.Temp <- subset(experiment, RT >= 200)
experiment.BadRTsRemoved <- subset(experiment.Temp, RT <= 2000)
-  One step with & for AND:
experiment2 <- subset(experiment, RT >= 200 & RT <= 2000)
| means OR:
experiment.BadRTs <- subset(experiment, RT < 200 | RT > 2000)
-  Logical OR (“either or both”)
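A sketch on five made-up RTs: the two-step and one-step versions keep the same rows, and the OR subset picks up exactly the rows the AND subset leaves out (assuming no NA values in RT).

```r
# Hypothetical miniature dataframe
experiment <- data.frame(RT = c(150, 200, 734, 2000, 2450))
# Two steps...
experiment.Temp          <- subset(experiment, RT >= 200)
experiment.BadRTsRemoved <- subset(experiment.Temp, RT <= 2000)
# ...or one step with & (AND): both tests must be TRUE
experiment2 <- subset(experiment, RT >= 200 & RT <= 2000)   # 200 734 2000
# | (OR): at least one test must be TRUE
experiment.BadRTs <- subset(experiment, RT < 200 | RT > 2000)   # 150 2450
```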
Subsetting Data: == and !=
Get a match / equals:
experiment.FirstTrials <- subset(experiment, SerialPosition == 1)
-  Note DOUBLE equals sign
Words/categorical variables need quotes:
experiment.ImplausibleSentences <- subset(experiment, Condition=='Implausible')
!= means “not equal to”:
experiment.BadSubjectRemoved <- subset(experiment, Subject != 'S23')
-  Drops subject “S23”
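The three subsets above on a hypothetical miniature dataframe (made-up values for Subject, SerialPosition, and Condition):

```r
# Hypothetical miniature dataframe
experiment <- data.frame(
  Subject        = c("S1", "S23", "S1", "S23"),
  SerialPosition = c(1, 1, 2, 2),
  Condition      = c("Plausible", "Implausible", "Implausible", "Plausible")
)
experiment.FirstTrials <- subset(experiment, SerialPosition == 1)          # 2 rows
experiment.ImplausibleSentences <- subset(experiment, Condition == 'Implausible')
experiment.BadSubjectRemoved <- subset(experiment, Subject != 'S23')      # only S1 left
```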
Subsetting Data: %in%
Sometimes our inclusion criteria aren't so mathematical
-  Suppose I just want the “Ducks”, “Hornet”, and “Panther” items
We can check against any arbitrary list:
experiment.SpecialItems <- subset(experiment, ItemName %in% c('Ducks', 'Hornet', 'Panther'))
Or, keep just things that aren't in a list:
experiment.NonNativeSpeakersRemoved <- subset(experiment, Subject %in% c('S10', 'S23') == FALSE)
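A sketch on hypothetical data; the first line shows the TRUE/FALSE test that %in% produces for each row before subset() uses it. "Otter" and the extra subject IDs are made up.

```r
# Hypothetical miniature dataframe
experiment <- data.frame(
  Subject  = c("S10", "S11", "S23", "S24"),
  ItemName = c("Ducks", "Otter", "Panther", "Hornet")
)
experiment$ItemName %in% c('Ducks', 'Hornet', 'Panther')   # TRUE FALSE TRUE TRUE
experiment.SpecialItems <- subset(experiment,
                                  ItemName %in% c('Ducks', 'Hornet', 'Panther'))
# == FALSE flips the test; !(Subject %in% ...) is an equivalent way to write it
experiment.NonNativeSpeakersRemoved <- subset(experiment,
                                              Subject %in% c('S10', 'S23') == FALSE)
```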
Logical Operators Review
Summary:
-  >      Greater than
-  >=     Greater than or equal to
-  <      Less than
-  <=     Less than or equal to
-  &      AND
-  |      OR
-  ==     Equal to
-  !=     Not equal to
-  %in%   Is this included in a list?
Assignment
Remember the pointing arrow used to create dataframes and subsets?
-  e.g., experiment <- read.csv(...)
This is the assignment operator. It saves results or values in a variable:
x <- sqrt(64)
CriticalTrialsPerSubject <- 30
Remember, typing a name by itself shows you the current value:
CriticalTrialsPerSubject
Assigning a new value overwrites the old one
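A short sketch of assigning, displaying, and overwriting a value (24 is a hypothetical new value):

```r
x <- sqrt(64)
x                                # 8
CriticalTrialsPerSubject <- 30
CriticalTrialsPerSubject         # 30
CriticalTrialsPerSubject <- 24   # hypothetical new value: the old 30 is gone
CriticalTrialsPerSubject         # 24
x * 2                            # stored values can be reused later: 16
```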
Assignment
We can use this to create new columns in our dataframe:
experiment$ExperimentNumber <- 1
-  Here, the same number (1) is assigned to every trial
Or, compute a value for each row:
experiment$LogSerialPosition <- log(experiment$SerialPosition)
-  For each trial, takes the log of the serial position and saves it into LogSerialPosition
-  Similar to an Excel formula
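A sketch on a hypothetical three-row dataframe. Note that log() is the natural logarithm; log10() gives base 10.

```r
# Hypothetical miniature dataframe
experiment <- data.frame(SerialPosition = c(1, 2, 10))
experiment$ExperimentNumber  <- 1                               # one value, repeated for every row
experiment$LogSerialPosition <- log(experiment$SerialPosition)  # natural log, row by row
experiment   # LogSerialPosition: 0, 0.693, 2.303
```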
ifelse()
IF YOU WANT DESSERT, EAT YOUR PEAS… OR ELSE!
ifelse(): Use a test to decide which of two values to assign:
experiment$Half <- ifelse(           # ifelse is the function name
  experiment$SerialPosition <= 15,   # If serial position IS <= 15...
  1,                                 # ..."Half" is 1
  2)                                 # If it's NOT, "Half" is 2
Possible to nest ifelse() if we need more than 2 categories:
experiment$Third <- ifelse(experiment$SerialPosition <= 10, 1,
  ifelse(experiment$SerialPosition <= 20, 2, 3))
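A sketch on 30 hypothetical trial positions; table() (not on the slide) counts how many trials landed in each category, which is a quick way to confirm the cut-offs:

```r
# Hypothetical dataframe: one row per serial position 1 to 30
experiment <- data.frame(SerialPosition = 1:30)
experiment$Half  <- ifelse(experiment$SerialPosition <= 15, 1, 2)
experiment$Third <- ifelse(experiment$SerialPosition <= 10, 1,
                           ifelse(experiment$SerialPosition <= 20, 2, 3))
table(experiment$Half)    # 15 trials in each half
table(experiment$Third)   # 10 trials in each third
```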
ifelse()
Instead of specific numbers, can use other columns or a formula:
experiment$RT.Fixed <- ifelse(
  experiment$TestingRoom==2,
  experiment$RT + 100,
  experiment$RT)
How can we check if this worked?
head(experiment)
-  Fixed RTs are indeed 100 ms longer for TestingRoom 2
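head() only shows the first few rows. A sketch of a check that covers every row, on a hypothetical miniature dataframe: the difference between the two columns should be 100 in Room 2 and 0 everywhere else.

```r
# Hypothetical miniature dataframe
experiment <- data.frame(TestingRoom = c(1, 2, 3, 2),
                         RT          = c(500, 600, 700, 800))
experiment$RT.Fixed <- ifelse(experiment$TestingRoom == 2,
                              experiment$RT + 100,
                              experiment$RT)
# TRUE only if every Room 2 row gained exactly 100 ms and all other rows are unchanged
all(experiment$RT.Fixed - experiment$RT ==
      ifelse(experiment$TestingRoom == 2, 100, 0))
```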
Which do you like better?
experiment$Half <- ifelse(experiment$SerialPosition <= 15, 1, 2)
-  Shorter & faster to write
vs:
CriticalTrialsPerSubject <- 30
experiment$Half <- ifelse(experiment$SerialPosition <= (CriticalTrialsPerSubject / 2), 1, 2)
-  Explains where the 15 comes from: helpful if we come back to this script later
-  We can also refer to the CriticalTrialsPerSubject variable later in the script & this ensures it’s consistent
-  Easy to update if we change the number of critical trials
Referring to Specific Cells
So far, we’ve seen how to:
-  Create a new dataframe that’s a subset of an existing dataframe
-  Modify a dataframe by creating an entire column
What if we want to modify a dataframe by adjusting some existing values?
-  e.g., replace all RTs above 2000 ms with the number 2000 (“fencing”)
-  Creating a new subset won’t work because we want to change the original dataframe
-  Need a way to edit specific values
Referring to Specific Cells
Use square brackets [ ] to refer to specific entries in a dataframe:
experiment[3,7]
-  Row, column
Omit the row or column number to mean all rows or all columns:
-  experiment[3,]      Row 3, all columns
-  experiment[,4]      All rows in column 4
Can also use column names:
-  experiment[,'RT']   All rows in the RT column
Remember c()? We can check multiple rows:
experiment[c(1:4),]
-  1:4 means the numbers 1 through 4
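A sketch on a hypothetical three-column dataframe, so the column numbers differ from the slide's experiment[3,7] and experiment[,4]:

```r
# Hypothetical miniature dataframe: RT is column 3 here
experiment <- data.frame(
  Subject   = c("S1", "S1", "S2", "S2", "S3"),
  Condition = c("Plausible", "Implausible", "Plausible", "Implausible", "Plausible"),
  RT        = c(512, 734, 1890, 655, 480)
)
experiment[3, 3]       # row 3, column 3: 1890
experiment[3, ]        # row 3, all columns
experiment[, 3]        # all rows of column 3
experiment[, 'RT']     # the same column, by name
experiment[c(1:4), ]   # rows 1 through 4
```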
Logical Indexing
We can look at rows or columns that meet a specific criterion...
experiment[experiment$RT < 200,]
Can use this as another way to subset:
experiment.ShortRTsRemoved <- experiment[experiment$RT >= 200, ]
-  Actually, subset() does essentially this (it also drops rows where the test comes out NA)
But we can also set values this way:
experiment[experiment$RT < 200, 'RT'] <- 200
-  In the dataframe experiment, find the rows where RT < 200, and set the column RT to 200
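A sketch of fencing at both ends on made-up RTs: the low end as on the slide, and the high end as in the 2000 ms example earlier. This assumes RT contains no NA values; a logical test that comes out NA is not allowed when assigning into a dataframe, and wrapping the test in which() is one way around that.

```r
# Hypothetical miniature dataframe
experiment <- data.frame(RT = c(150, 734, 2450, 655))
experiment[experiment$RT < 200, 'RT']  <- 200    # fence the low end
experiment[experiment$RT > 2000, 'RT'] <- 2000   # fence the high end
experiment$RT   # 200 734 2000 655
```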
Types
R treats continuous & categorical variables
differently:
These are different data types:
-  Numeric
-  Factor: Variable w/ fixed set of
categories (e.g., treatment vs. placebo)
-  Character: Freely entered text (e.g.,
open response question)
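R's heuristic when reading in data:
-  Letters (or any non-numeric text) anywhere in the column → factor in R before 4.0.0; character in R 4.0.0 and later, unless you add stringsAsFactors=TRUE to read.csv()
-  No letters, purely numbers → numeric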
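A sketch showing how read.csv() assigns types, using a tiny hypothetical CSV written to a temporary file. Setting stringsAsFactors explicitly makes the result the same on any R version:

```r
# Hypothetical CSV: Age has one non-numeric entry
path <- tempfile(fileext = ".csv")
writeLines(c("Subject,TestingRoom,Age",
             "S1,2,19",
             "S2,4,unknown"), path)

as_text    <- read.csv(path, stringsAsFactors = FALSE)   # default in R >= 4.0.0
as_factors <- read.csv(path, stringsAsFactors = TRUE)    # default before R 4.0.0
class(as_text$Subject)       # "character"
class(as_factors$Subject)    # "factor"
class(as_text$TestingRoom)   # "integer": numbers only, so numeric
class(as_factors$Age)        # "factor": one non-numeric entry makes the whole column text
```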
Type Conversion: Numeric → Factor
Sometimes we need to correct this
-  Room 4 is not “twice as much” as Room 2
Create a new column that's the factor (categorical) version of TestingRoom:
experiment$Room.Factor <- as.factor(experiment$TestingRoom)
Or, just overwrite the old column:
experiment$TestingRoom <- as.factor(experiment$TestingRoom)
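A sketch on hypothetical room numbers, showing the type before and after conversion:

```r
# Hypothetical miniature dataframe
experiment <- data.frame(TestingRoom = c(2, 4, 2, 3))
experiment$Room.Factor <- as.factor(experiment$TestingRoom)
class(experiment$TestingRoom)    # "numeric"
class(experiment$Room.Factor)    # "factor"
levels(experiment$Room.Factor)   # "2" "3" "4": categories, no longer quantities
```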
Type Conversion: Character → Factor
When ifelse() results in words, R creates a character variable rather than a factor
-  Need to convert it
Wrong:
experiment$FaveRoom <- ifelse(experiment$TestingRoom==3, 'My favorite room', 'Not favorite')
Right:
experiment$FaveRoom <- as.factor(ifelse(experiment$TestingRoom==3, 'My favorite room', 'Not favorite'))
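The difference on a hypothetical three-row dataframe; class() reports which type each version ended up as:

```r
# Hypothetical miniature dataframe
experiment <- data.frame(TestingRoom = c(3, 1, 3))
wrong <- ifelse(experiment$TestingRoom == 3, 'My favorite room', 'Not favorite')
right <- as.factor(ifelse(experiment$TestingRoom == 3, 'My favorite room', 'Not favorite'))
class(wrong)    # "character"
class(right)    # "factor"
levels(right)   # "My favorite room" "Not favorite"
```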
Type Conversion: Factor → Numeric
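To change a factor to a number, need to turn it into a character first:
experiment$Age.Numeric <- as.numeric(as.character(experiment$Age))
-  as.numeric() on a factor by itself returns the internal category codes (1, 2, 3...), not the original values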
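Why the as.character() step matters, on a hypothetical factor of ages. The levels are sorted as text, so the internal codes are 1 for "18", 2 for "25", and 3 for "30":

```r
# Hypothetical ages stored as a factor
Age <- factor(c("25", "18", "25", "30"))
levels(Age)                     # "18" "25" "30"
as.numeric(Age)                 # 2 1 2 3: the internal codes, not the ages
as.numeric(as.character(Age))   # 25 18 25 30: the actual ages
```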
NA
We might have run into some problems trying to change Age into a numerical variable...
NA means “not available”... It can come from:
-  Characters that don't convert to numbers
-  Missing data in a spreadsheet
-  Invalid computations
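A sketch of the first and third sources with made-up values. suppressWarnings() only hides the "NAs introduced by coercion" warning R would otherwise print; 0 / 0 gives NaN, which is.na() also treats as missing.

```r
# Hypothetical ages as text, as they might come out of a spreadsheet
ages <- c("19", "23", "unknown", "")
Age.Numeric <- suppressWarnings(as.numeric(ages))   # "unknown" and "" cannot become numbers
Age.Numeric          # 19 23 NA NA
is.na(Age.Numeric)   # FALSE FALSE TRUE TRUE
0 / 0                # NaN ("not a number") from an invalid computation
is.na(0 / 0)         # TRUE
```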
If we try to do computations on a set of numbers where any of them is NA, we get NA as a result...
sd(experiment$Age.Numeric)
R wants you to think about how you want to treat these missing values
NA – Solutions
To ignore the NAs when doing a specific computation, use na.rm=TRUE:
mean(experiment$Age.Numeric, na.rm=TRUE)
To get a copy of the dataframe that excludes all rows with an NA (in any column):
experiment.NoNAs <- na.omit(experiment)
Change NAs to something else with logical indexing:
experiment[is.na(experiment$Age.Numeric), 'Age.Numeric'] <- 23
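The three approaches on a hypothetical four-row dataframe with one missing age (23 is just the replacement value from the slide):

```r
# Hypothetical miniature dataframe with one missing age
experiment <- data.frame(Subject     = c("S1", "S2", "S3", "S4"),
                         Age.Numeric = c(19, NA, 25, 22))
mean(experiment$Age.Numeric)                 # NA: one missing value spoils the result
mean(experiment$Age.Numeric, na.rm = TRUE)   # 22: mean of 19, 25, and 22
experiment.NoNAs <- na.omit(experiment)      # copy without row 2
nrow(experiment.NoNAs)                       # 3
experiment[is.na(experiment$Age.Numeric), 'Age.Numeric'] <- 23
experiment$Age.Numeric                       # 19 23 25 22
```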
Getting Help
Get help on a specific known function:
?sqrt
?write.csv
Try to find a function on a particular topic:
??logarithm
External resources:
-  ling-R-lang-L <https://mailman.ucsd.edu/mailman/listinfo/ling-r-lang-l>: mailing list for using R in language research
Wrap-Up
Can use R for:
-  Reading in data
-  Descriptive statistics
-  Subsetting data
-  Creating new variables
Additional practice:
-  Baayen (2008) p. 20
-  Use commands on pp. 1-2 to get the data
Next week: Mixed effects models!
-  Baayen, Davidson, & Bates (2008): Introduced mixed effects models to cog psych. Covers fixed and random effects