Loops, Simulation

Programming and Simulations
Frank Witmer
6 January 2011
Outline
• General programming tips
• Programming loops
• Simulation
– Distributions
– Sampling
– Bootstrapping
General Programming Tips
• Use meaningful variable names
• Include more comments than you think necessary
• Debugging your code
– Since R is interpreted, variables defined at the top
level (outside functions) remain available for
inspection after execution stops on an error
– Built-in debugging support: debug(), browser(), trace()
– But adding print statements inside functions is
generally sufficient
• Syntax highlighting!
– http://sourceforge.net/projects/npptor/
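The print-statement approach to debugging mentioned above can be sketched as follows. The function name running_mean and its input are hypothetical, chosen only for illustration; cat() prints the intermediate state at each step so a wrong value can be spotted as soon as it appears.

```r
# Hypothetical helper: average of a numeric vector, with debug output.
running_mean <- function(values) {
  total <- 0
  for (i in seq_along(values)) {
    total <- total + values[i]
    # Debug print: show the loop index and the running total
    cat("step", i, "total =", total, "\n")
  }
  total / length(values)
}

result <- running_mean(c(2, 4, 6))
```

Once the function behaves as expected, the cat() line can be removed or commented out.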
Loops
• Because R is an interpreted language, each loop
iteration is evaluated separately, which adds
overhead at every step
• So avoid explicit loops for computationally intensive
analysis; prefer vectorized functions where possible
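As a minimal sketch of the advice above, the same sum of squares can be computed with an explicit loop or with a single vectorized call. The variable names are illustrative only. Both give the same answer; the vectorized form does its work inside compiled code and is usually faster on large inputs.

```r
values <- 1:1000

# Explicit loop: one interpreted step per element
loop_total <- 0
for (v in values) {
  loop_total <- loop_total + v^2
}

# Vectorized: square every element at once, then sum
vec_total <- sum(values^2)
```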
For & While loop syntax
for (variable in sequence) {
expression
expression
}
while (condition) {
expression
expression
}
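A small concrete instance of each loop form may help. This is a sketch with made-up variable names, not taken from the slides: the for loop fills a preallocated vector, and the while loop repeats until its condition fails.

```r
# for loop: store the squares of 1 to 5
squares <- numeric(5)            # preallocate the result vector
for (i in 1:5) {
  squares[i] <- i^2
}

# while loop: count how many halvings bring 100 below 1
x <- 100
halvings <- 0
while (x >= 1) {
  x <- x / 2
  halvings <- halvings + 1
}
```

Preallocating squares avoids growing the vector inside the loop, which would copy it on each iteration.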
if/else control statements
if ( condition1 ) {
expression1
} else if ( condition2 ) {
expression2
} else {
expression3
}
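The template above can be filled in as follows. The function classify_sign is a hypothetical example; it returns the value of whichever branch runs.

```r
# Hypothetical helper: label a single number by its sign
classify_sign <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

labels <- c(classify_sign(3), classify_sign(-2), classify_sign(0))
```

Note that else must appear on the same line as the closing brace of the preceding block when the statement is typed at the top level; otherwise R treats the if as complete and reports a syntax error at else. Also, if () tests a single condition; for element-wise tests on a vector, ifelse() is the usual choice.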
Ways to avoid loops (sometimes)
• tapply: apply a function (FUN) to a variable,
split into groups by a grouping variable
• lapply: apply a function (FUN) to each element
of a given list; returns a list
– sapply: same as lapply, but simplifies the output to a
vector or matrix when possible (more user-friendly)
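A brief sketch of the three functions on made-up data (the vectors scores, group, and samples are assumptions for illustration):

```r
scores <- c(80, 90, 70, 60, 100, 85)
group  <- factor(c("a", "a", "b", "b", "c", "c"))

# tapply: mean score within each group
group_means <- tapply(scores, group, mean)

samples <- list(x = 1:10, y = c(2, 4, 6))

# lapply: returns a list with one mean per list element
means_list <- lapply(samples, mean)

# sapply: same computation, simplified to a named numeric vector
means_vec <- sapply(samples, mean)
```

Here group_means holds one value per level of group, means_list is a list, and means_vec is a plain named vector that is easier to print and index.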
Data simulation
• Can simulate data using standard distribution
functions, identified by core names, e.g. norm
(normal), pois (Poisson)
• Prefix the core name with ‘r’ to generate random
values from the distribution
– rnorm(numVals, mean, sd)
– rpois(numVals, mean)
• Use set.seed() if you want your simulated data
to be reproducible
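The points above can be sketched as follows. The parameter values are arbitrary; in base R the second argument of rpois() is named lambda, which is the mean of the Poisson distribution.

```r
set.seed(42)                               # fix the random number stream
heights <- rnorm(100, mean = 170, sd = 10) # 100 normal draws
counts  <- rpois(100, lambda = 3)          # 100 Poisson draws with mean 3

# Resetting the seed reproduces exactly the same draws
set.seed(42)
heights_again <- rnorm(100, mean = 170, sd = 10)
```

Without set.seed(), each run would produce different simulated values.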
Standard distribution functions
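As a sketch of the naming convention in base R, each core name combines with four prefixes: d (density or probability mass), p (cumulative probability), q (quantile), and r (random values). The example values below are arbitrary.

```r
dens <- dnorm(0)                 # density of N(0, 1) at 0, about 0.399
prob <- pnorm(1.96)              # P(Z <= 1.96), about 0.975
quan <- qnorm(0.975)             # inverse of pnorm, about 1.96

set.seed(1)
draws <- rnorm(3)                # three random values from N(0, 1)

pois_mass <- dpois(2, lambda = 3)  # P(X = 2) for a Poisson with mean 3
```

The same pattern applies to other core names such as binom, unif, and exp.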
Sampling
• Sample from a dataset using:
sample(x, size, replace = FALSE)
– x: values to sample from; size: number of items;
replace: whether to sample with replacement
• Can use to simulate survey results or
bootstrap statistical estimates
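A minimal sketch of both uses, on a made-up data frame named population (an assumption for illustration). To sample rows of a data frame, sample the row indices and then subset.

```r
set.seed(1)
population <- data.frame(
  id     = 1:100,
  income = rnorm(100, mean = 50, sd = 10)
)

# Simulated survey: 20 distinct respondents (without replacement)
survey_idx <- sample(nrow(population), size = 20)
survey <- population[survey_idx, ]

# Bootstrap-style draw: same size as the data, with replacement
boot_idx <- sample(nrow(population), replace = TRUE)
boot_sample <- population[boot_idx, ]
```

With replace = TRUE some rows appear more than once and others not at all, which is what the bootstrap relies on.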
Bootstrap overview
• Empirical method for measuring the accuracy (e.g.
standard error) of estimates computed from a sample
• For a sample of size n, draw many random
samples, also of size n, with replacement, and
recompute the estimate on each
• Two ways to bootstrap regression estimates
– residual resampling: add resampled regression
residuals to the fitted values to form a new
dependent variable & re-estimate
– data resampling: sample complete cases of
original data and estimate coefficients
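Both approaches can be sketched on simulated data. Everything below is an assumption for illustration: the data, the linear model y ~ x, and the choice of 500 replicates. It is a sketch of the idea rather than a definitive implementation.

```r
set.seed(123)
n   <- 100
dat <- data.frame(x = runif(n, 0, 10))
dat$y <- 2 + 0.5 * dat$x + rnorm(n)     # hypothetical true slope 0.5

fit    <- lm(y ~ x, data = dat)
n_boot <- 500                           # number of bootstrap replicates

# Data resampling: resample complete rows, refit, keep the slope
slope_cases <- replicate(n_boot, {
  idx <- sample(n, replace = TRUE)
  coef(lm(y ~ x, data = dat[idx, ]))[["x"]]
})

# Residual resampling: fitted values + resampled residuals, refit
res  <- resid(fit)
yhat <- fitted(fit)
slope_resid <- replicate(n_boot, {
  boot_dat <- data.frame(x = dat$x,
                         y = yhat + sample(res, replace = TRUE))
  coef(lm(y ~ x, data = boot_dat))[["x"]]
})

# Bootstrap standard errors of the slope
se_cases <- sd(slope_cases)
se_resid <- sd(slope_resid)
```

The two standard errors can be compared with the one reported by summary(fit). Residual resampling keeps the x values fixed and assumes the residuals are exchangeable; data resampling makes fewer assumptions about the error structure.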
Recall: Boston Metadata
Variable  Description
CRIM      per capita crime rate by town
ZN        proportion of residential land zoned for lots over 25,000 sq. ft
INDUS     proportion of non-retail business acres per town
CHAS      Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
NOX       nitrogen oxide concentration (parts per 10 million)
RM        average number of rooms per dwelling
AGE       proportion of owner-occupied units built prior to 1940
DIS       weighted distances to five Boston employment centres
RAD       index of accessibility to radial highways
TAX       full-value property-tax rate per $10,000
PTRATIO   pupil-teacher ratio by town
B         1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town
LSTAT     % lower status of the population
MEDV      median value of owner-occupied homes in $1000's