Computerlab: Global health
Katarina Selling, IMCH

Tutorial 1: How to download R and R commander

INSTALLING R:
It is always best to start your search at the R website, www.r-project.org. Look under ‘Download, Packages’ and choose the CRAN mirror closest to your geographic location. If you are using R in Sweden, go to http://ftp.sunet.se/pub/lang/CRAN/. Select the version for your operating system (Mac, Windows or Linux) and start installing R. It should not take more than a few minutes. When the download and installation are finished you can open R as you usually open programs on your computer. When you open R, a window called R console opens automatically (see below).

INSTALLING THE PACKAGE R COMMANDER (Rcmdr):
In the R console window, go to the menu Packages – Install Packages (see above). A new window called CRAN-mirror will appear (see below). Select the CRAN mirror closest to your current location (here: Sweden) and click OK.

Note: If you have a Mac, it may get a bit more complicated; see http://wiki.math.yorku.ca/index.php/R:Installing_R_and_Rcmdr_on_a_MAC

Now a new window opens, called Packages (see below). Here all >4,000 R packages are listed by their short names in alphabetical order. Find Rcmdr, select it and click OK. This is how you install R commander (and packages in general) in R.

NOTE: A package only has to be installed once on a computer, as long as you don’t uninstall it of course.

Tutorial 2: How to open R commander (Rcmdr)

When you open R, a window called R console will automatically open (see the first view in Tutorial 1). In the R console window, write: library(Rcmdr) and press Enter. The R commander opens in a new window.

From now on you will be working strictly in the R commander window and do not have to care about the R console window.
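For those who prefer typing to clicking, the installation and start-up steps can also be written directly in the R console. The lines below are a minimal sketch, not a required part of the lab: the variable name rcmdr_installed is only an illustration, and the install and load step is wrapped in interactive() so that it runs only when you are sitting at the R console.

```r
# Check whether the package Rcmdr is already installed on this computer.
# rcmdr_installed is an illustrative name, not something R commander defines.
rcmdr_installed <- "Rcmdr" %in% rownames(installed.packages())

if (interactive()) {
  # Install only once per computer; R may ask you to pick a CRAN mirror.
  if (!rcmdr_installed) {
    install.packages("Rcmdr")
  }
  # Loading the package opens the R commander window.
  library(Rcmdr)
}
```

The menu route described in Tutorial 1 does the same job; use whichever you find easier.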
The R commander window, in turn, consists of drop-down menus, a Script window (where all the syntax or code generated by the commands that you select in the menus appears; you do not have to think of that window for now), and the Output window (where all results appear). Note, however, that all graphics (and the Data editor) appear in new, separate windows (you will see this later).

At the beginning of every new session, you only need to write library(Rcmdr), as explained above, and press Enter, and the R commander window will open.

…but what if I can’t open R commander?
If you can’t open the R commander window, you have either misspelled the first command, library(Rcmdr), OR the package Rcmdr has not been (properly) installed on your computer. In the latter case, follow the instructions on how to install R commander (Tutorial 1).

Tutorial 3: Getting your data into R and R commander

Throughout this session we are going to work with a data set called Leinhardt. This is real data from 105 countries around the world in the 1970s (each country constitutes one observation). The following variables were measured:

Income (per-capita income in US dollars)
Infant (infant mortality rate per 1000 live births)
Region (Africa; Americas; Asia; Europe)
Oil (oil-exporting country yes/no)

Source: The New York Times, 28 September 1975, p. E-3, Table 3.

1. The most common way: opening already existing data. Go to the menu Data. The Leinhardt data set is stored in an R file called Computerlab_Reinhardt. To retrieve it, go to Load data set and browse through the folders until you find the right file. Now, go to Edit active data set (which opens the Data editor) and check that the data seem to have been loaded correctly and that each variable has the right type (numeric or character). This should always be the first thing you do!
a. Importing data from other programs, for example STATA. Go to the menu Data, Import data..., From STATA data set..., change the name of the data set (otherwise it is simply called “Dataset”) and click OK. Now, go to Edit active data set and follow the instructions above to check the data. (You can import data from Excel, text files etc. in the same manner; follow the instructions.)

2. If you wish to enter the data by hand into R commander, click on the menu Data and select New data set. Rename the data set as you like and click OK. A new window called Data editor appears. Start by renaming the variables so that they are more informative (var1 etc. is not so informative…). Make sure that you define each variable correctly as either character or numeric.

NOTE: R uses . as the decimal separator, not ,! If , is incorrectly used as a decimal separator, R will automatically read the variable as a character variable. Remember this in future use; it will save you a lot of trouble!

If you wish to save your data in R format, select the menu Data, Active data set, Save active data set.

Tutorial 4: Basic descriptive statistics (tables and graphs)

1. The first thing that I recommend is to check for apparent errors in the data by displaying a summary of the entire data set. Go to Statistics, Summaries, Active data set. The results will be displayed in the Output window. For numerical variables the minimum, maximum, median, mean etc. are displayed; for character/factor variables the frequencies are displayed. If you see errors you can, at any time, click on Edit data set and check and/or correct individual values (but remember to close the Data editor window afterwards so that the changes in the data set are updated).

Suggestions for factor variables (non-numerical variables)

2. Frequency tables (contingency tables).
a. One variable at a time: Go to Statistics, Summaries, Frequency distributions. Select Region as your study variable and click OK.
Now the frequencies (n and %) of the selected variable are displayed in the Output window. Which regions have the highest/lowest frequencies of observations (= individual countries)?
b. Two variables at a time: Go to Statistics, Contingency tables, Two-way table. Pick one row and one column variable; this time we select Region as the row variable and Oil as the column variable. Also select Row percent (under Statistics). Which region has the highest/lowest percentage of oil-exporting countries?

3. Bar graphs and pie charts. Go to Graphs, Bar graph or Pie chart and select Region. Interpret the output. If you have the time, do the same for the variable Oil. Do you see that the frequency tables and the graphs are really expressing the same thing (the distribution of n or %)?

Suggestions for numerical variables

4. Numerical summaries (by group (factor variable)).
a. One numerical variable at a time: Go to Statistics, Summaries, Numerical summaries, select Infant and Income as your study variables (by holding down Ctrl you can select more than one variable at a time) and click OK. What are the mean and standard deviation (SD) of the two variables? Do they have missing values?
b. One numerical variable by group: Go to Statistics, Summaries, Numerical summaries, and select Infant and Income as your study variables (as above), but now also select Summarize by group, select Oil, and click OK. Do oil-exporting countries have higher or lower per-capita income and infant mortality, respectively, compared to countries that do not export oil? If you have the time, do the same for the variable Region.

5. Histograms, Boxplots (by groups), Scatterplots.
a. One numerical variable at a time:
i. Go to Graphs, Histogram, select Income as your study variable and click OK. How is the income per capita distributed?
ii. Go to Graphs, Boxplot, select Infant as your study variable and click OK. What do you see?
Are there any countries with extreme infant mortality (such values are called outliers in statistics)? If you do not know what a boxplot displays, then look it up, for example on Wikipedia. If you have the time, do a histogram of the variable Infant and a boxplot of Income (the reverse of the above).
b. One numerical variable by group: Go to Graphs, Boxplot, and select Infant as your study variable as above; now also select Plot by group, select Region, and click OK. In which region would you say that infant mortality was the highest and lowest, respectively? If you have the time, do the same for the variable Oil.
c. Two numerical variables (relationship): Go to Graphs, Scatterplot, and select Income as the X-variable and Infant as the Y-variable. De-select Smooth line (one of the Options). Would you say that there is a relationship between these variables? Try to interpret it. Do you see any outliers?

Now, IF you would like to save the data set that you have been working on, click Data, Active data set, Save active data set. You can also save the Script window or the Output window via File. Also, if you would like to save a graph (or import it into Microsoft Word), right-click on it and choose, for example, Save as metafile. However, none of this is necessary for teaching purposes, since we will not use this data set again.

When you are finished, go to File, Exit, From Commander and R to exit the program. You do not have to save output, scripts etc.
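If you are curious what the menu choices in Tutorial 4 correspond to in plain R, the sketch below walks through the same steps with base R functions only. It is not the code that R commander writes to the Script window, which may use other functions. The data frame toy is an assumption made for illustration: its values are invented and are NOT the real Leinhardt data; only the variable names follow the tutorial.

```r
# Toy stand-in for the Leinhardt data. All values are INVENTED for
# illustration; only the variable names follow the tutorial.
toy <- data.frame(
  Income = c(3000, 2500, 400, 200, 5000, 100, 1500, 900),
  Infant = c(25, 20, 60, 120, 10, 150, NA, 65),
  Region = factor(c("Americas", "Europe", "Africa", "Asia",
                    "Europe", "Africa", "Asia", "Americas")),
  Oil    = factor(c("no", "no", "yes", "no", "no", "no", "yes", "no"))
)

# Step 1: summary of the whole data set (also reports missing values).
print(summary(toy))

# Step 2a: frequency table (n and %) for one factor variable.
region_n   <- table(toy$Region)
region_pct <- round(100 * prop.table(region_n), 1)
print(region_n)
print(region_pct)

# Step 2b: two-way table, Region in rows and Oil in columns,
# with row percentages (margin = 1 means "within each row").
region_oil <- table(Region = toy$Region, Oil = toy$Oil)
row_pct    <- round(100 * prop.table(region_oil, margin = 1), 1)
print(row_pct)

# Step 3: bar graph and pie chart of the same frequencies.
barplot(region_n, main = "Countries per region")
pie(region_n, main = "Countries per region")

# Step 4: mean and SD, overall and by group.
# na.rm = TRUE leaves out missing values before computing.
infant_mean      <- mean(toy$Infant, na.rm = TRUE)
infant_sd        <- sd(toy$Infant, na.rm = TRUE)
n_missing_infant <- sum(is.na(toy$Infant))
infant_by_oil    <- tapply(toy$Infant, toy$Oil, mean, na.rm = TRUE)
income_by_oil    <- tapply(toy$Income, toy$Oil, mean)
print(infant_by_oil)
print(income_by_oil)

# Step 5: histogram, boxplot by group, and scatterplot.
hist(toy$Income, main = "Income", xlab = "Per-capita income (US dollars)")
boxplot(Infant ~ Region, data = toy,
        ylab = "Infant mortality per 1000 live births")
plot(Infant ~ Income, data = toy)
```

To try this on the real data, replace toy with the name of your active data set in R commander and check that the variable names match exactly, since R is case sensitive.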