Computerlab: Global health
Katarina Selling, IMCH
Tutorial 1: How to download R and R commander
INSTALLING R: It is always best to start at the R website, www.r-project.org. Look under
‘Download, Packages’ and choose the CRAN-mirror closest to your geographic location. If you are
using R in Sweden, go to http://ftp.sunet.se/pub/lang/CRAN/. Now select the version for your
operating system (Mac, Windows, Linux) and start installing R. It should not take more than a few
minutes. When the installation is finished, you can open R as you usually open programs on your
computer. When you open R, the window called R console will open automatically (see below).
INSTALLING THE PACKAGE R COMMANDER (Rcmdr): Go to the menu Packages – Install
Packages (see above). A new window called CRAN-mirror will appear (see below). Select the
CRAN-mirror closest to your current location (here: Sweden) and click OK.
If you have a Mac, it may get a bit more complicated; see
http://wiki.math.yorku.ca/index.php/R:Installing_R_and_Rcmdr_on_a_MAC
Now a new window opens, called Packages (see below). Here all >4,000 R packages are listed by their
short name in alphabetical order. Find Rcmdr, select it and click OK. This is how you install R
commander (and packages in general) in R.
NOTE: A package only has to be installed once on a computer, as long as you don’t uninstall it.
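The same installation can also be done by typing a command in the R console instead of using the
menus. The lines below are only a minimal sketch and not part of the original instructions: the
helper name ensure_package and the choice of the mirror https://cloud.r-project.org are assumptions
made for this example, while installed.packages() and install.packages() are standard R functions.

```r
# Hypothetical helper (not part of R or Rcmdr): install a package only
# if it is not already installed on this computer.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  installed <- pkg %in% rownames(installed.packages())
  if (!installed) {
    # Assumption: the CRAN cloud mirror; replace it with the mirror closest to you.
    install.packages(pkg, repos = repos)
    installed <- pkg %in% rownames(installed.packages())
  }
  installed  # TRUE if the package is available after the attempt
}

# "stats" always comes with R, so this call downloads nothing.
ensure_package("stats")
```

Typing ensure_package("Rcmdr") would, under the same assumptions, install R commander the first
time and do nothing new on later calls.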
Tutorial 2: How to open R commander (Rcmdr)
When you open R, a window called R console will automatically open (see first view on page 1). In
the R console window, write: library(Rcmdr) and press Enter. The R commander opens in a
new window:
From now on you will be working strictly in the R commander window and do not have to care about
the R console window. The R commander window, in turn, consists of drop-down menus, a Script
window (where all the syntax or code generated by the commands that you select in the menus
appears; you do not have to think about that window for now), and the Output window (where all
results appear). Note that all graphics (and the Data editor) will appear in new, separate windows
(you will see this later). At the beginning of every new session, just write library(Rcmdr), as
explained above, and press Enter, and the R commander window will open.
…but what if I can’t open R commander?
If you can’t open the R commander window, you have either misspelled the first command,
library(Rcmdr), OR the package Rcmdr has not been (properly) installed on your computer. In
the latter case, follow the instructions on how to install R commander (Tutorial 1).
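To find out which of the two problems applies, you can ask R whether the package is installed.
This is only a small sketch: the function name check_package is made up for this example, while
installed.packages() is a standard R function. Remember that R is case-sensitive, so
library(rcmdr) is not the same as library(Rcmdr).

```r
# Hypothetical helper: suggest what to do next, depending on whether
# a package is installed on this computer.
check_package <- function(pkg = "Rcmdr") {
  if (pkg %in% rownames(installed.packages())) {
    paste0("'", pkg, "' is installed: check the spelling of library(", pkg, ")")
  } else {
    paste0("'", pkg, "' is not installed: follow the instructions in Tutorial 1")
  }
}

check_package("Rcmdr")
```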
Tutorial 3: Getting your data into R and R commander
Throughout this session we are going to work with a data set called Leinhardt. This is real data from
105 countries around the world in the 1970s (each country is one observation). The following
variables were measured:
Income (per-capita income in US dollars)
Infant (infant mortality rate per 1000 live births)
Region (Africa; Americas; Asia; Europe)
Oil (Oil exporting country yes/no)
Source: The New York Times, 28 September 1975, p. E-3, Table 3.
1. The most common way: opening already existing data. The Leinhardt data set is stored in an R
file called Computerlab_Reinhardt. To retrieve it, go to the menu Data, Load data set and
browse through the folders until you find the right file. Now, go to Edit active data set
(which opens the data editor) and check that the data seem to have been loaded correctly and
that you have the right variable type (numeric or character) for the included variables. This
should always be the first thing you do!
a. Importing data from other programs, for example STATA. Go to the menu Data,
Import data..., From STATA data set..., change the name of the data set (otherwise
it is simply called “Dataset”) and click OK. Now, go to Edit active data set and follow
the instructions above to check the data. You can import data from Excel, text files etc.
in the same manner; just follow the instructions.
2. If you wish to enter the data by hand into R commander, do so by clicking on the menu
Data and selecting New data set. Rename this data set as you like and click OK. Now a
new window called Data editor appears. Start by renaming the variables so that they are
more informative (var1 etc. is not so informative…). Make sure that you define the variables
correctly as either character or numeric. NOTE: R uses . as the decimal separator, not ,!
If , is incorrectly used as a decimal separator, R will automatically read the variable as a
character variable. Remember this in future use; it will save you a lot of trouble!
If you wish to save your data in R format, do so by selecting the menu Data, Active data
set, Save active data set.
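For those who are curious about what happens behind the menus, saving, loading and checking a data
set can be sketched in a few lines of R. This is an illustration under stated assumptions, not the
course's own script: it assumes that the carData package (normally installed together with Rcmdr) is
available and uses its copy of the Leinhardt data, it writes to a temporary file instead of the
course file, and the values in the decimal separator example are invented.

```r
# Assumption: the carData package is installed; it contains the Leinhardt data.
data(Leinhardt, package = "carData")

# Save the data set in R format. tempfile() is used only so that the
# example does not leave files in your folders.
path <- tempfile(fileext = ".RData")
save(Leinhardt, file = path)

# Remove the data set from memory and load it again from the file.
rm(Leinhardt)
load(path)

# Check the variable types (numeric, integer or factor).
str(Leinhardt)

# Decimal separator: the values below are invented, not real data.
with_point <- read.table(text = "x\n1.5\n2.5", header = TRUE)
with_comma <- read.table(text = "x\n1,5\n2,5", header = TRUE)
is.numeric(with_point$x)  # TRUE
is.numeric(with_comma$x)  # FALSE: the comma makes R read text, not numbers

# Telling R that , is the decimal separator gives numbers again.
fixed <- read.table(text = "x\n1,5\n2,5", header = TRUE, dec = ",")
is.numeric(fixed$x)       # TRUE
```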
Tutorial 4: Basic descriptive statistics (tables and graphs)
1. The first thing that I recommend is to check for apparent errors in the data by displaying a
summary of the entire data set. Go to Statistics, Summaries, Active data
set. The results will now be displayed in the Output window. For numerical variables,
the minimum, maximum, median, mean etc. are displayed; for character/factor
variables the frequencies are displayed. If you see errors you can, at any time, click on Edit
data set and check and/or correct individual values (but remember to close the Data
editor window afterwards so that the changes in the data set are updated).
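The summary display corresponds roughly to the R command summary(). A minimal sketch, assuming
that the carData package (normally installed together with Rcmdr) is available; note that in the
carData copy of the data the variable names are written in lowercase (income, infant, region, oil).

```r
# Assumption: the carData package is installed; it contains the Leinhardt data.
data(Leinhardt, package = "carData")

# Summary of every variable in the data set.
summary(Leinhardt)

# Number of missing values in each variable.
colSums(is.na(Leinhardt))
```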
Suggestions for factor variables (non-numerical variables)
2. Frequency tables (contingency tables).
a. One variable at a time: Go to Statistics, Summaries, and Frequency
distributions. Select Region as your study variable and click OK. Now the
frequencies (n and %) of the selected variable are displayed in the Output window.
Which regions have the highest/lowest frequencies of observations (= individual
countries)?
b. Two variables at a time: Go to Statistics, Contingency tables, Two-way
table. Pick one row and one column variable; this time we select Region as the
row variable and Oil as the column variable. Also select Row percent (under
Statistics). Which region has the highest/lowest percentage of oil exporting
countries?
3. Bar graph and pie charts. Go to Graphs, Bar graph or Pie chart and select Region.
Interpret the output. If you have the time, do the same for the variable Oil. Do you see that
the frequency tables and the graphs are really expressing the same thing (distribution of n or
%)?
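The frequency tables and graphs above can also be sketched with plain R commands. This is an
illustration using base R functions, not necessarily the exact code that R commander writes in the
Script window. It assumes that the carData package is installed and uses its copy of the data, where
the variable names are written in lowercase.

```r
# Assumption: the carData package is installed; it contains the Leinhardt data.
data(Leinhardt, package = "carData")

# One variable at a time: frequencies (n) and percentages (%).
n_region <- table(Leinhardt$region)
n_region
round(100 * prop.table(n_region), 1)

# Two variables at a time: region in the rows, oil in the columns.
two_way <- xtabs(~ region + oil, data = Leinhardt)
two_way

# Row percentages: each row adds up to 100.
row_percent <- 100 * prop.table(two_way, margin = 1)
round(row_percent, 1)

# Bar graph and pie chart of the same frequencies.
barplot(n_region, xlab = "region", ylab = "Frequency")
pie(n_region, main = "region")
```

Because the graphs are drawn from the same table of counts, they show exactly the distribution
that the frequency table lists.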
Suggestions for numerical variables
4. Numerical summaries (by group (factor variable)).
a. One numerical variable at a time: Go to Statistics, Summaries, Numerical
summaries, and select Infant and Income as your study variables (by holding
down Ctrl you can select more than one variable at a time) and click OK. What are
the mean and standard deviation (SD) of the two variables? Do they have missing
values?
b. One numerical variable by group: Go to Statistics, Summaries, Numerical
summaries, and select Infant and Income as your study variables (like above),
but now also select Summarize by group and select Oil, and click OK. Do oil
exporting countries have higher or lower income per capita and infant mortality,
respectively, compared to countries not exporting oil? If you have the time, do
the same for the variable Region.
5. Histograms, Boxplots (by groups), Scatterplots.
a. One numerical variable at a time:
i. Go to Graphs, Histogram, and select Income as your study variable and
click OK. How is the income per capita distributed?
ii. Go to Graphs, Boxplot, and select Infant as your study variable and
click OK. What do you see? Are there any countries with extreme infant
mortality (these are called outliers in statistics)? If you do not know what a
boxplot is displaying, then look it up on Wikipedia, for example.
If you have the time, do a histogram of the variable Infant and a boxplot
of Income (reverse the above).
b. One numerical variable by group: Go to Graphs, Boxplot, and select Infant as
your study variable like above, now also select Plot by group and select
Region, and click OK. In which region would you say that the infant mortality was
the highest and lowest, respectively? If you have the time, do the same for the
variable Oil.
c. Two numerical variables (relationship): Go to Graphs, Scatterplot, and select
Income as X-variable and Infant as Y-variable. De-select Smooth
line (one of the Options). Would you say that there is a relationship between
these variables? Try to interpret it. Do you see any outliers?
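The numerical summaries and graphs above can be sketched with base R commands as follows. R
commander itself may use other functions for some of these graphs, so treat this as an
illustration and not as the code that the menus generate. It assumes that the carData package is
installed and uses its copy of the data, where the variable names are written in lowercase.

```r
# Assumption: the carData package is installed; it contains the Leinhardt data.
data(Leinhardt, package = "carData")

# Mean, SD and number of missing values for the two numerical variables.
# na.rm = TRUE leaves out missing values when computing the mean and SD.
num_vars <- Leinhardt[, c("income", "infant")]
sapply(num_vars, mean, na.rm = TRUE)
sapply(num_vars, sd, na.rm = TRUE)
colSums(is.na(num_vars))

# The same kind of summary by group (here: oil).
mean_income_by_oil <- tapply(Leinhardt$income, Leinhardt$oil, mean, na.rm = TRUE)
mean_infant_by_oil <- tapply(Leinhardt$infant, Leinhardt$oil, mean, na.rm = TRUE)
mean_income_by_oil
mean_infant_by_oil

# Histogram and boxplot of one numerical variable.
hist(Leinhardt$income, xlab = "income", main = "")
boxplot(Leinhardt$infant, ylab = "infant")

# Boxplot of one numerical variable by group.
boxplot(infant ~ region, data = Leinhardt)

# Scatterplot of two numerical variables (income on the x-axis).
plot(infant ~ income, data = Leinhardt)
```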
Now, if you would like to save the data set that you have been working on, click Data, Active
data set, Save active data set. You can also save the Script window or the Output
window via File. Also, if you would like to save a graph (or import it into Microsoft Word), right-click
on it and choose, for example, Save as metafile. However, the above is not necessary for
teaching purposes since we will not use this data set again.
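A graph can also be saved to a file with R commands instead of right-clicking on it. The sketch
below writes a boxplot to a PNG file, which can be inserted into Microsoft Word. It assumes that
the carData package is installed and that your R installation can produce PNG files; the temporary
file is used only for the example and should be replaced with a path of your own choice.

```r
# Assumption: the carData package is installed; it contains the Leinhardt data.
data(Leinhardt, package = "carData")

# Assumption: a temporary file; replace it with a file name of your choice.
graph_file <- tempfile(fileext = ".png")

# Open the PNG file, draw the graph into it, and close the file again.
png(graph_file, width = 800, height = 600)
boxplot(infant ~ region, data = Leinhardt,
        ylab = "Infant mortality per 1000 live births")
dev.off()

# Check that the file has been written.
file.exists(graph_file)
```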
When you are finished, go to File, Exit, From Commander and R to exit the program. You do
not have to save output, scripts etc.