Wednesday, June 29: Brief Introduction to Course Software

Lecture 2: Software Introduction
Regression III:
Advanced Methods
William G. Jacoby
Department of Political Science
Michigan State University
[email protected]
Getting Started with R
• What is R?
• A tiny R session
• Resources
• Setup under Windows
• Getting data into R
• Graphs and Statistical Models
What is R?
• A free, open-source implementation of the S language for
data analysis and graphics
• Available for various operating systems (including Linux, Mac
and Windows)
• A complete programming language
• Supported by a comprehensive help system and a large
international community of users
• Increasingly used in advanced social-science research, as well
as in many other disciplines
• In constant flux
• Not guaranteed by anyone to be fit for any purpose!
R Resources
• The main source for R and everything connected with it
is the Comprehensive R Archive Network (CRAN):
– http://cran.r-project.org
• There one can find:
– Binary versions of R to download and install
– The source code
– Extensive documentation and contributed guides
– Information about many add-on packages
Add-on Packages
• Add-on packages are easily installed from the menus within
R (We will use a number of packages in this course)
• Once installed, the package must be loaded into the current
interactive R session
• Many packages contain datasets. These must also be
loaded and attached to be used
• The number symbol “#” is used to insert comments. R ignores
everything after it on that line (a comment cannot span lines)
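The steps above can also be typed at the R prompt rather than chosen from the menus. The following is a minimal sketch: it assumes the MASS package (normally shipped with R) is installed, and the choice of the Boston dataset is only for illustration.

```r
# Installing a package needs a network connection, so the call is
# shown as a comment here.
# install.packages("car")

# Load an installed package into the current interactive session.
library(MASS)

# Load a dataset that comes with the package, then attach it so its
# variables can be referred to by name.
data(Boston, package = "MASS")
attach(Boston)
mean(medv)       # medv is a column of Boston
detach(Boston)   # detach when finished, to avoid name clashes
```

Attaching is a convenience only; writing Boston$medv works without it.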
Documentation for R
• Installed as part of the R help system are the following
documents:
– An Introduction to R (about 100 pages). Gives an
introduction to the language and how to use R for
doing statistical analysis and graphics
– R Data Import/Export (about 35 pages). Describes
the import and export facilities available in R itself or
via the foreign package
– Writing R Extensions (about 75 pages). Covers how
to create your own packages, write R help files, etc.
• There are also various ‘unofficial’ guides on CRAN under
‘contributed’
• Finally, Fox (2002) provides a great introduction,
focusing on the use of R for regression analysis.
Getting Help in R
• A number of different types of help are available by
clicking the help menu:
– Documentation on all installed packages is available in
a web browser by clicking Help → Html help
– The ‘official’ manuals can be loaded in PDF format by
clicking Help → Manuals
• Help about individual functions and objects can also be
obtained within R by typing
– help(data) or ?data, for help on something whose
name is known
– help.search("ordinal"), to search all the installed
help files for occurrences of a particular text string
– apropos("stem"), to look for ‘stem’ in the names of
objects available in the current R session
• If all else fails, use the R-help email list:
http://www.r-project.org/mail
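A short sketch of the three help commands used together. The results are stored in objects (the names pg, hits and found are made up for this example) so that the script also runs non-interactively; at the prompt one would normally type the calls on their own.

```r
# Help on something whose name is known. Typing help(data) or ?data
# alone at the prompt displays the page; here the result is stored.
pg <- help(data)

# Search all installed help files for a text string.
hits <- help.search("ordinal")

# Look for "stem" in the names of objects in the current session.
found <- apropos("stem")
print(found)
```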
Getting set up
• Download the installer from CRAN
• A good way of working with R is:
1. For each project, make a directory containing data
and other materials.
2. Make a copy of the Rgui.exe shortcut. Rename the
new shortcut. Right-click “Properties” and replace the
entry in the “Start In” field with the new directory.
3. To use R for the project, double-click on the newly created
shortcut. Data files can be loaded easily, and
you can keep projects separate from each other.
4. It is also useful to have a good text editor. Notepad
will do, but there are much better alternatives
– The R plug-in for WinEdt is called RWinEdt
– An alternative is the Emacs Speaks Statistics (ESS)
package
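The “Start In” field sets the working directory for the session. The same thing can be checked or done from within R, as in this rough sketch; the folder name myproject is hypothetical, and a temporary directory stands in for a real project folder.

```r
# A hypothetical project directory (placed under tempdir() here so
# that the example leaves nothing behind).
proj <- file.path(tempdir(), "myproject")
dir.create(proj, showWarnings = FALSE)

old <- setwd(proj)   # what the shortcut's "Start In" field does
getwd()              # confirm where R is looking for files

# Files are now read and written relative to the project directory.
write.table(data.frame(x = 1:3), "dataname.txt")
list.files()

setwd(old)           # return to the previous directory
```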
Using a Text Editor with R
• It is useful to have a good text editor when working in
R. This will facilitate easier preparation of R commands
and saving program sessions in R Scripts.
• Windows Notepad provides minimal editing capabilities,
but there are much better alternatives.
• The WinEdt shareware program (also frequently used
with LaTeX) can be configured to work well with R.
• The RWinEdt plug-in sets everything up, and it is loaded
as an R package.
• An alternative is the Emacs Speaks Statistics (ESS)
package
• A text editor can be loaded into R automatically at the
beginning of each session, by adding a statement to the
Rprofile.
Getting data in (1): Entering data directly
• The concatenate function, c, combines individual cases
together into a vector
• The cbind (columns bind) and rbind (rows bind)
functions combine vectors together into a matrix
• The data.frame function converts the matrix into a data
frame object
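A small illustration of the three functions, using made-up values and object names:

```r
# Combine individual values into vectors.
x <- c(2, 4, 6, 8)
y <- c(1, 3, 5, 7)

# Bind the vectors together as columns or as rows of a matrix.
m.cols <- cbind(x, y)   # 4 rows, 2 columns
m.rows <- rbind(x, y)   # 2 rows, 4 columns

# Convert the matrix into a data frame with variables x and y.
mydata <- data.frame(m.cols)
mydata
```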
Getting data in (2): External datasets
• For rectangular data in a text file, use read.table():
Mydata <- read.table("dataname.txt", header=TRUE)
– header=TRUE signifies that the first row contains variable
names
• The foreign package imports data files from other formats,
such as SPSS files via read.spss()
• In read.spss(), use.value.labels=TRUE converts SPSS value
labels to categories. If you specify FALSE, all variables will be
treated as quantitative
• All SPSS variable names will be imported in upper case
letters
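A sketch of both import routes. The text file is created on the spot so the example is self-contained, and its contents are invented. The SPSS part assumes the foreign package is installed and uses the sample file electric.sav distributed with it; both steps are skipped if either is missing.

```r
# Write a small rectangular text file with variable names in row 1.
txt <- file.path(tempdir(), "dataname.txt")
writeLines(c("id score group",
             "1 3.5 a",
             "2 4.0 b",
             "3 2.5 a"), txt)

Mydata <- read.table(txt, header = TRUE)
str(Mydata)

# Import an SPSS file with the foreign package.
if (requireNamespace("foreign", quietly = TRUE)) {
  sav <- system.file("files", "electric.sav", package = "foreign")
  if (nzchar(sav)) {
    spss <- foreign::read.spss(sav,
                               use.value.labels = TRUE,  # labels -> factors
                               to.data.frame = TRUE)     # list -> data frame
    str(spss)
  }
}
```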
Re-specifying variables after importing to R
• To make a numerically coded variable into an unordered
factor (categorical variable):
• To make a numerical variable into an ordered factor:
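Both conversions can be sketched with base R's factor() and ordered() functions. The variables region and educ, and their labels, are hypothetical.

```r
# A numerically coded variable turned into an unordered factor.
region <- c(1, 2, 3, 2, 1)
region.f <- factor(region, levels = 1:3,
                   labels = c("North", "South", "West"))

# A numerical variable turned into an ordered factor.
educ <- c(1, 3, 2, 3, 1)
educ.o <- ordered(educ, levels = 1:3,
                  labels = c("low", "medium", "high"))

table(region.f)
educ.o[1] < educ.o[2]   # comparisons are meaningful for ordered factors
```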
Recoding Variables using the recode
function in the car package
• Recoding into a quantitative variable:
• Recoding into an unordered factor:
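A minimal sketch of both kinds of recode, with an invented variable x. It assumes the car package is installed and is skipped otherwise.

```r
x <- c(1, 2, 3, 4, 5, 1, 3)

if (requireNamespace("car", quietly = TRUE)) {
  # Recoding into a quantitative variable: collapse 5 codes into 3.
  x3 <- car::recode(x, "1:2 = 1; 3 = 2; 4:5 = 3")

  # Recoding into an unordered factor with named categories.
  xf <- car::recode(x, "1:2 = 'low'; 3 = 'mid'; 4:5 = 'high'",
                    as.factor = TRUE)

  print(x3)
  print(xf)
}
```

The recode specification is a single character string, with the individual recodes separated by semicolons.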
S Modeling Language
• The S modeling language is convenient in that it has a
similar notation for most types of models
• Model specification generally takes the following form:
Response ~ Independent Variables
• Where the tilde sign (~) is interpreted as “regressed on”
• For the general linear model, terms represent additive
components as in the regression equation itself
• Some examples of formulas are:
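A few illustrative formulas follow. The names y, x1 and x2 are placeholders, and the fitted model uses R's built-in mtcars data purely as an example.

```r
f1 <- y ~ x1             # simple regression of y on x1
f2 <- y ~ x1 + x2        # two additive terms
f3 <- y ~ x1 * x2        # main effects plus interaction: x1 + x2 + x1:x2
f4 <- y ~ x1 + I(x1^2)   # I() protects arithmetic inside a formula
f5 <- y ~ .              # all other variables in the data frame

# The same notation is passed to a model-fitting function such as lm().
fit <- lm(mpg ~ wt + hp, data = mtcars)
coef(fit)
```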
Graphs in R
• Graphs are drawn on the current device until a new device is
started or the device is closed with
dev.off()
• Some commonly used graphics devices are
postscript("mygraphs.ps")
– Necessary for LaTeX
pdf("mygraph.pdf")
– Necessary for PDF LaTeX
trellis.device()
– Used for the “Lattice” graphics system
windows()
– The default graphics device under Windows
• Graphs in R are very flexible.
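The open, draw, close cycle for a file device can be sketched as follows. The plot uses the built-in mtcars data, and the file is written to a temporary directory; both are choices made only for this example.

```r
# Open a PDF device; everything drawn from here on goes to the file.
out <- file.path(tempdir(), "mygraph.pdf")
pdf(out)

plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight", ylab = "Miles per gallon",
     main = "A simple scatterplot")

# Close the device so that the file is written out completely.
dev.off()
```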
A small graph example (1)
A small graph example (2)
[Figure: scatterplot titled “Florida votes by county”, with BUSH on the vertical axis and BUCHANAN on the horizontal axis; the points for PALM.BEACH and DADE are labeled]
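A plot like “Florida votes by county” could be drawn along the following lines. This is only a sketch, not the code used for the slide: it assumes the carData package is installed and that its Florida data frame (with columns BUSH and BUCHANAN and counties as row names) matches the data shown. It is skipped if the package is missing.

```r
if (requireNamespace("carData", quietly = TRUE)) {
  Florida <- carData::Florida

  out <- file.path(tempdir(), "florida.pdf")
  pdf(out)

  plot(Florida$BUCHANAN, Florida$BUSH,
       xlab = "BUCHANAN", ylab = "BUSH",
       main = "Florida votes by county")

  # Label the two counties singled out in the figure, if present.
  big <- rownames(Florida) %in% c("PALM.BEACH", "DADE")
  if (any(big)) {
    text(Florida$BUCHANAN[big], Florida$BUSH[big],
         labels = rownames(Florida)[big], pos = 2)
  }

  dev.off()
}
```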
Arc Software (A Brief Look)
• A user-friendly software package for regression analysis,
focusing on regression graphics
• Available for various operating systems (including Linux, Mac
and Windows)
• Written in the Xlisp-Stat language
• Software can be downloaded from:
http://www.stat.umn.edu/arc/software.html
• Documentation for Arc is contained in the Cook and Weisberg
(1999) text
• Arc reads data from text files with internal formatting
commands. The easiest way to prepare data is to copy from the
examples distributed with the software.
• Arc is a simple, but extremely powerful, tool for dynamic
statistical graphics.
Preparing Data for Arc
• Arc can read data in several formats. The easiest way is to
“load” an ASCII text file. Arc will prompt you (via dialog
boxes) for additional information about the dataset name,
variable names, and so on.
• As with the “read.table” function in R, you can include variable
names in the first row of a text data file. Variable names will
be converted to upper case unless they are enclosed in
double quotes.
• Arc data files are stored with the extension “.lsp”. These data
files are also ASCII text files, but they include internal
formatting commands along with the data. The structure is
very simple, and I recommend preparing Arc data files in a
text processor before reading them in.
• Extensions to Arc (available on the web site) can read MS
Excel spreadsheets and SAS datasets.
Output from Arc
• Arc will respond to some Xlisp-Stat commands, but most
interaction with the software occurs through menu selections
and dialog boxes.
• Two different methods for saving printed output:
– Material in the Arc window can be cut and pasted as usual,
using the items from the Edit menu.
– Lines can be saved automatically to a specified file by
selecting “Dribble” from the Arc menu.
• Graphical output from Arc is usually saved by cutting and
pasting.
• Instructions for saving Arc graphs in LaTeX format are
available on the website for the software.
Next topics
• Examining data
• Transformations