STAT-100
Importing Data into R
November, 2012

1 Introduction

By far the most common and versatile way to import data into R (and into most serious statistical software) is the CSV file format. It is the simplest format possible: to save an n × k table of numbers, a CSV file has n rows, each with k values separated by a comma “,”. The format of a CSV file is thus:

    column_1,column_2,...,column_k
    x11,x12,x13,...,x1k
    ...
    xn1,xn2,...,xnk

The first line is optional and is usually called the header (or heading). The header contains the names of your columns, i.e. your variables. The rest of the file contains only the data (numbers or alphanumeric strings). One example of a CSV file is:

    state,police,corrections
    Alabama,159,103
    Alaska,412,273
    Arizona,231,206

There are generally two ways to obtain such a file: (1) download it from the web (see Section 2.1) and (2) convert a file (e.g. Excel) into CSV (see Section 2.2).

(Note: This tutorial uses RStudio (http://www.rstudio.com/ide/) as the default IDE for R programming.)

2 Importing file types

2.1 Comma-separated files (CSV)

Once you have the CSV file, there are two methods to import it into RStudio. Both use the “Import Dataset” button in RStudio (see Figure 1).

Figure 1: “Import Dataset” button in RStudio

(a) From a text file:
  1. Click Import Dataset
  2. Click From Text File
  3. Navigate to your CSV file and select it

(b) From a Web URL:
  1. Copy the link of the CSV file
  2. Click Import Dataset
  3. Click From Web URL
  4. Paste the web link in the provided box and press OK

2.2 Converting spreadsheet (Excel) files to CSV

If you have to use proprietary data formats like Excel, then the best way to import data is to convert the spreadsheet into a CSV file. Assume you have a spreadsheet with your data spread across multiple columns:

  1. Click File-> Export (or Download As in other apps)
  2. In the dialog box, make sure the extension of the file is “.csv”
  3.
     Type a name for the file and click Save

There are a few more things to be careful about:

  1. Make sure your CSV conforms to the proper format, i.e. there is at most one row as a “header” (see Introduction) and there are no extra lines before or after your data. For example, in an Excel file there might be a “Total” row at the end of the spreadsheet. Do not include such rows.
  2. If your CSV file has a header, you will probably need to check the “Yes” radio button in the “Heading” option right after you import your data (see Figure 3).

After you are done, you can use 2.1.(a) to import the CSV file into R. Note that conversion to a “.csv” file should be an option in every spreadsheet application you use. For example, if you are using “Google Docs”, the option is in File-> Download As-> Comma-separated files (.csv).

2.3 Online RStudio

If you are using the online version of RStudio (http://www.beta.rstudio.org), there are again two ways. First, you can use a link through the Import Dataset button (see 2.1.(b)). The other way is to upload the file through the Files tab in the bottom right pane (see Figure 2) and then use the Import Dataset-> From Text File option, see 2.1.(a).

Figure 2: Upload file in online RStudio

3 After importing

After following any of the above methods you will be given a dialog like the one shown in Figure 3. There you can review the data to be imported, the name of the dataset and the available variables (columns) in the dataset. There are two important options: (1) set/change the name of the dataset and (2) select whether the dataset has a header or not. After clicking Import, a new variable is created that contains the data as an R data.frame.

Figure 3: Import dataset dialog

In the example of Figure 3, the dataset link is (http://isites.harvard.edu/fs/docs/icb.topic1148787.files/criminal_justice.csv). After importing, we can see that the dataset name is criminal_justice.
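
The dialog options described in this section have console counterparts. As a minimal sketch (not part of the original handout), base R's read.csv() creates the same kind of data.frame, and its header argument plays the role of the “Heading” option. The file contents below are the three-row example from the Introduction; the names csv_path, justice_small and no_header are hypothetical and chosen only for illustration.

```r
# Write the three-row example from the Introduction to a temporary file
# (a stand-in for a CSV you downloaded or exported yourself).
csv_lines <- c(
  "state,police,corrections",
  "Alabama,159,103",
  "Alaska,412,273",
  "Arizona,231,206"
)
csv_path <- tempfile(fileext = ".csv")
writeLines(csv_lines, csv_path)

# header = TRUE corresponds to choosing "Yes" under "Heading":
# the first line supplies the column names.
justice_small <- read.csv(csv_path, header = TRUE, stringsAsFactors = FALSE)

# header = FALSE treats the first line as data, so the columns
# receive the default names V1, V2, V3.
no_header <- read.csv(csv_path, header = FALSE, stringsAsFactors = FALSE)

class(justice_small)  # "data.frame"
names(justice_small)  # "state" "police" "corrections"
nrow(justice_small)   # 3
nrow(no_header)       # 4 (the header line is counted as a data row)
```

read.csv() also accepts a complete URL in place of a file path, which is the console analogue of the From Web URL option.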
To get an idea of this dataset, type in R:

    head(criminal_justice)

This will output:

           state police judicial corrections violent_crime
    1    Alabama    159       71         103           486
    2     Alaska    412      204         273           567
    3    Arizona    231      120         206           532
    4   Arkansas    149       72         137           445
    5 California    290      201         257           622
    6   Colorado    238       86         190          3346

4 Extras

Say you have a CSV file and for some reason you need to create a link to it (e.g. to share it easily with collaborators). One quick way to do this is to use a service like Dropbox (www.dropbox.com). To share with Dropbox:

  1. Save the file in the Public folder of your Dropbox
  2. Navigate to the file
  3. Right-click-> Dropbox-> Copy public link
  4. Paste the link and save/share

Caution: Safety first! Make sure the data you save in the Public folder are not sensitive.
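
A collaborator who receives such a link can also read it from the R console instead of the Import Dataset dialog. The following is a hedged sketch, assuming the link points directly at the raw CSV file. read_shared_csv is a hypothetical helper, and a local file:// URL stands in for a real public link so that the example runs without a network connection.

```r
# Hypothetical helper: read a CSV with a header row from a link
# (or a file path) into a data.frame.
read_shared_csv <- function(link) {
  read.csv(link, header = TRUE, stringsAsFactors = FALSE)
}

# Offline stand-in for a public link: a temporary local file
# addressed through a file:// URL.
local_csv <- tempfile(fileext = ".csv")
writeLines(
  c("state,police,corrections",
    "Alabama,159,103",
    "Alaska,412,273"),
  local_csv
)
demo_link <- paste0("file://", normalizePath(local_csv, winslash = "/"))

shared <- read_shared_csv(demo_link)
head(shared)
```

With a real shared link, the call would simply be read_shared_csv("<your link>"), where the link is whatever you copied in step 3.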