STAT-100
Importing Data into R
November, 2012
1 Introduction
By far the most common and versatile way to import data into R (and into
most serious statistical software) is the CSV (comma-separated values) file
format. It is the simplest format possible: to save an n × k table of numbers,
a CSV file has n rows, each row holding k values separated by commas
(“,”).
The format of a CSV file is thus:
column_1,column_2,...,column_k
x11,x12,...,x1k
...
xn1,xn2,...,xnk
The first line is optional and is usually called the header (or heading).
The header contains the names of your columns, i.e. your variables. The rest
of the file contains only the data (numbers or alphanumeric strings). One
example of a CSV file is:
state,police,corrections
Alabama,159,103
Alaska,412,273
Arizona,231,206
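To see how R interprets a file like this, the same lines can be parsed directly from a string. This is only a minimal sketch using base R's read.csv with its text argument; the names csv_text and justice are chosen here for illustration and are not part of the tutorial's dataset.

```r
# The example CSV from above, held in a string instead of a file
csv_text <- "state,police,corrections
Alabama,159,103
Alaska,412,273
Arizona,231,206"

# header = TRUE tells R that the first line holds the column names
justice <- read.csv(text = csv_text, header = TRUE)

dim(justice)    # number of rows and columns
names(justice)  # column names taken from the header line
```

The header line becomes the column names, and each remaining line becomes one row of the resulting data frame.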
There are generally two ways to obtain such a file: (1) download it from
the web (see Section 2.1) or (2) convert a file (e.g. an Excel spreadsheet)
into CSV (see Section 2.2). Note: this tutorial uses RStudio
(http://www.rstudio.com/ide/) as the default IDE for R programming.
2 Importing file types
2.1 Comma-separated files (CSV)
Once you have the CSV file, there are two methods to import it into RStudio.
Both use the “Import Dataset” button in RStudio (see Figure 1).
Figure 1: “Import Dataset” button in RStudio
(a) From a text file:
1. Click Import Dataset
2. Click From Text File
3. Navigate to your CSV file and select it
(b) From a web URL:
1. Copy the link of the CSV file
2. Click Import Dataset
3. Click From Web URL
4. Paste the web link in the provided box and press OK
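The same two imports can also be typed at the R console instead of using the button. The following is a sketch, assuming base R's read.csv; the file created in the temporary directory and the names csv_path, from_file and from_web are placeholders for illustration, and the URL in the comment is not a real dataset.

```r
# Create a small CSV file so the example is self-contained
# (in practice this would be your own file)
csv_path <- file.path(tempdir(), "example.csv")
writeLines(c("state,police,corrections",
             "Alabama,159,103",
             "Alaska,412,273"), csv_path)

# (a) Import from a text file on disk
from_file <- read.csv(csv_path, header = TRUE)

# (b) Import from a web URL: read.csv also accepts a URL string.
# The address below is a placeholder, so the line is left commented out.
# from_web <- read.csv("http://example.com/data.csv", header = TRUE)

head(from_file)
```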
2.2 Converting spreadsheet (Excel) files to CSV
If you have to use a proprietary data format such as Excel, the best way to
import the data is to convert the spreadsheet into a CSV file. Assume you
have a spreadsheet with your data spread across multiple columns:
1. Click File -> Export (or Download As in other apps)
2. In the dialog box, make sure the extension of the file is “.csv”
3. Type a name for the file and click Save
There are a few more things to be careful about:
1. Make sure your CSV conforms to the proper format, i.e. there is at
most one row as a “header” (see Introduction) and there are no
extra lines before or after your numbers. For example, in an Excel file
there might be a “Total” row at the end of the spreadsheet. Do not
include such rows.
2. If your CSV file has a header, you will probably need to check the
“Yes” radio button in the “Heading” option right after you import
your data (see Figure 3).
After you are done, you can use 2.1.(a) to import the CSV file into R.
Note that conversion to a “.csv” file should be an option in every spreadsheet
application you use. For example, if you are using “Google Docs”, the
option is under File -> Download As -> Comma-separated files (.csv).
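Returning to the caution about extra rows: if a summary row does slip into the exported file, it can also be dropped after the import. The following is a sketch under stated assumptions, namely that the stray row is labelled “Total” in the first column; the data and the names raw and clean are made up for illustration.

```r
# A CSV that still contains a spreadsheet "Total" row at the end
csv_text <- "state,police,corrections
Alabama,159,103
Alaska,412,273
Total,571,376"

raw <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Keep only the rows whose first column is not the summary label
clean <- raw[raw$state != "Total", ]

nrow(raw)    # rows including the summary row
nrow(clean)  # rows after dropping it
```

Removing the row in the spreadsheet before exporting, as described above, remains the simpler option.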
2.3 Online RStudio
If you are using the online version of RStudio (http://www.beta.rstudio.org),
there are again two ways. First, you can use a link through the Import
Dataset button (see 2.1.(b)). Second, you can upload the file through
the Files tab in the bottom right pane (see Figure 2) and then use the Import
Dataset -> From Text File option (see 2.1.(a)).
Figure 2: Upload file in online RStudio
3 After importing
After following any of the above methods, you will be shown a dialog like the
one in Figure 3. There you can review the data to be imported,
the name of the dataset, and the available variables (columns) in the dataset.
You have two important options: (1) set or change the name of
the dataset and (2) select whether the dataset has a header.
After you click Import, a new variable is created that contains the data as
an R data.frame.
Figure 3: Import dataset dialog
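The effect of the header option can be illustrated at the console. This is a sketch, assuming the dialog's choice corresponds to the header argument of base R's read.csv; the names csv_text, with_header and no_header are illustrative only.

```r
csv_text <- "state,police,corrections
Alabama,159,103
Alaska,412,273"

# Header treated as column names
with_header <- read.csv(text = csv_text, header = TRUE)

# Header treated as an ordinary data row
no_header <- read.csv(text = csv_text, header = FALSE)

names(with_header)  # names come from the first line of the file
names(no_header)    # R falls back to default names V1, V2, V3
nrow(with_header)   # data rows only
nrow(no_header)     # one more row, because the header line is counted as data
class(with_header)  # the imported object is a data.frame
```

If the imported columns show up as V1, V2, and so on, the header option was most likely set incorrectly.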
In the example of Figure 3, the dataset link is
http://isites.harvard.edu/fs/docs/icb.topic1148787.files/criminal_justice.csv.
After importing, we can see that the dataset name is criminal_justice. To
get an idea of this dataset, type in R:
head(criminal_justice)
This will output:
       state police judicial corrections violent_crime
1    Alabama    159       71         103           486
2     Alaska    412      204         273           567
3    Arizona    231      120         206           532
4   Arkansas    149       72         137           445
5 California    290      201         257           622
6   Colorado    238       86         190          3346
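Besides head, a few other base R functions give a quick overview of an imported data frame. The sketch below rebuilds only the first three rows shown above by hand, as a stand-in for the real import, so that it runs without the original file.

```r
# Stand-in for the imported dataset: the first three rows of the output above
criminal_justice <- data.frame(
  state = c("Alabama", "Alaska", "Arizona"),
  police = c(159, 412, 231),
  judicial = c(71, 204, 120),
  corrections = c(103, 273, 206),
  violent_crime = c(486, 567, 532)
)

str(criminal_justice)             # column types and a preview of the values
summary(criminal_justice$police)  # minimum, quartiles, mean, and maximum

# State with the largest value in the police column
top_state <- as.character(
  criminal_justice$state[which.max(criminal_justice$police)]
)
top_state
```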
4 Extras
Say you have a CSV file and for some reason you need to create a link to
it (e.g. to share it easily with collaborators). One quick way to do this is to
use a service like Dropbox (www.dropbox.com). To share with Dropbox:
1. Save the file in the Public folder of your Dropbox
2. Navigate to the file
3. Right-click -> Dropbox -> Copy public link
4. Paste the link and save/share
Caution: Safety first! Make sure the data you save in the Public folder
are not sensitive.