R Style

Descriptive
Paul E. Johnson 1, 2
1 Department of Political Science
2 Center for Research Methods and Data Analysis, University of Kansas
2015
Outline
1 Overview
2 Format Highlights
3 Try formatR
4 Function Names
5 Variable Names
6 Other Miscellaneous
Overview
This presentation summarizes the vignette R Style that is distributed with the rockchalk package
Primary emphasis: write code that sophisticated users will be able to read
Suggestions:
    Write code in the same style used by R Core Team members, as exemplified in the R source code
    Write code in the style that R uses to display itself
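One way to see "the style that R uses to display itself" is to ask R to regenerate a function's text. The following is a minimal sketch, assuming a made-up function named absDiff (it is not from the vignette). deparse() rebuilds the code from the parsed object instead of echoing what was typed.

```r
## A deliberately cramped definition (hypothetical example)
absDiff <- function(x,y){if(x>y){x-y}else{y-x}}

## deparse() rebuilds the source text from the parsed function,
## so the result shows R's own spacing and indentation
shown <- deparse(absDiff)
cat(shown, sep = "\n")
```

The regenerated text uses 4-space indentation and puts spaces around operators and after commas, which matches the formatting advice in the slides that follow.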
Format Highlights
1. Indentation
Use 4 spaces to indent sections
A good editor will convert a TAB you type into 4 spaces
If you use TAB symbols, set your editor to display them as 4
spaces
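To check whether a script still contains TAB characters, something like the following may help. It is only a sketch: findTabLines is a hypothetical helper name, and the sample lines are invented for illustration.

```r
## Hypothetical helper: report which lines contain a literal TAB
findTabLines <- function(lines) {
    which(grepl("\t", lines, fixed = TRUE))
}

## Invented sample: the second line is indented with a TAB
code <- c("for (i in 1:3) {",
          "\tprint(i)",
          "    print(i + 1)",
          "}")
findTabLines(code)

## Replace each TAB with 4 spaces
fixed <- gsub("\t", "    ", code, fixed = TRUE)
```

Most editors can do the same conversion on save, which is usually the more convenient route.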
2. The assignment symbol “<-”
Use “<-” for assignment, not the equal sign
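One practical reason to reserve “<-” for assignment: inside a function call, “=” names an argument, while “<-” really assigns. A small sketch, with a hypothetical wrapper function assignDemo used only to keep the experiment out of the workspace:

```r
## Hypothetical wrapper so the experiment stays in its own environment
assignDemo <- function() {
    ## "=" inside a call names the argument; no variable x is created
    m1 <- mean(x = c(1, 2, 3))
    before <- exists("x", inherits = FALSE)

    ## "<-" inside a call assigns x first, then passes the value along
    m2 <- mean(x <- c(4, 5, 6))
    after <- exists("x", inherits = FALSE)

    c(before = before, after = after)
}
assignDemo()
```

Keeping the two symbols for their separate jobs makes it easier for a reader to tell an assignment from an argument at a glance.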
3. Blank spaces
This is a GNU coding standard.
Around symbols! Spaces required on both sides of
    math operators: - + * /
    logical operators: = == | || & &&
    R symbols: <- %*% %o% %in%
One space before opening “(” or “{” in if and for statements
One space after
    comma
    closing “)” or “}”
Unnecessary blank spaces are considered harmful, such as
    ( x <= y )
Spaces between function name and parens
    lm (y ~ x)
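The spacing rules can be compared side by side in a short sketch. The data and variable names are made up for illustration; both versions compute the same value, and only the white space differs.

```r
x <- c(1, 5, 9)
y <- c(2, 4, 10)

## Recommended: spaces around operators, after commas, and
## before "(" in if/for; none between a function name and "("
total <- 0
for (i in seq_along(x)) {
    if (x[i] <= y[i] && y[i] %in% c(2, 10)) {
        total <- total + x[i] * y[i]
    }
}

## Discouraged: the same computation, cramped and inconsistent
total2<-0
for(i in seq_along( x )){if( x[i]<=y[i]&&y[i]%in%c(2,10) ){total2<-total2+x[i]*y[i]}}
```

Note that `total2<-0` is parsed as an assignment, not as the comparison `total2 < -0`. Writing the spaces removes that ambiguity for the reader.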
There is a great deal of variety about equal signs in function calls.
Once you get used to the GNU way of writing, this looks best
    m1 <- lm(y ~ x, data = red, subset = x < 77, model = FALSE)
but if you read much code, you find many authors “snug up” around the equal signs.
    m1 <- lm(y~x, data=red, subset=x < 77, model=FALSE)
The latter seems horrible to me now, but I can’t deny
    it is widely used by authors smarter than I, and
    publishers may insist on it to keep program code on the page.
4. Squiggly braces
For loops, if statements, and the like generally have the opening squiggly brace on the same line as the language construct
    if (some statement) {
        y <- 1
        ...
    }
Note the “{” and “}” are not vertically aligned, as they would be in much C, C++ or Java code.
Main difficulty: indentation eats up “empty space” on the left side of the page. Some text editors do this:
    myFn <- function(x = 31, y = 44, z = NULL) {
                j <- 1
                i <- 5
                ...
To avoid that, you’ll generally see functions in R Core code declared like so:
    myGiantLongBoringFunctionName <- function(x = 31, y = 44, z = NULL)
    {
        j <- 1
        ...
Or possibly
    myGiantLongBoringFunctionName <-
        function(x = 31, y = 44, z = NULL)
    {
        j <- 1
        ...
These were used to work around indentation styles of various editors
5. Evolving Indentation Standards
Emacs ESS now has a family of indentation styles, the default
is their interpretation of the R standard.
TAB key inserts 4 spaces
Consider plot.lm, which appears in R source code as:
    plot.lm <-
    function (x, which = c(1L:3L, 5L), ## was which = 1L:4L,
              caption = list("Residuals vs Fitted", "Normal Q-Q",
              "Scale-Location", "Cook's distance",
              "Residuals vs Leverage",
              expression("Cook's dist vs Leverage  " * h[ii] / (1 - h[ii]))),
              panel = if(add.smooth) panel.smooth else points,
              sub.caption = NULL, main = "",
              ask = prod(par("mfcol")) < length(which) && dev.interactive(), ...,
              id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75,
              qqline = TRUE, cook.levels = c(0.5, 1.0),
              add.smooth = getOption("add.smooth"),
              label.pos = c(4, 2), cex.caption = 1)
    {
Emacs 24.4 with ESS 15.09 turns that into this:
    plot.lm <-
        function (x, which = c(1L:3L, 5L), ## was which = 1L:4L,
                  caption = list("Residuals vs Fitted", "Normal Q-Q",
                                 "Scale-Location", "Cook's distance",
                                 "Residuals vs Leverage",
                                 expression("Cook's dist vs Leverage  " * h[ii] / (1 - h[ii]))),
                  panel = if(add.smooth) panel.smooth else points,
                  sub.caption = NULL, main = "",
                  ask = prod(par("mfcol")) < length(which) && dev.interactive(), ...,
                  id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75,
                  qqline = TRUE, cook.levels = c(0.5, 1.0),
                  add.smooth = getOption("add.smooth"),
                  label.pos = c(4, 2), cex.caption = 1)
        {
Note that Emacs-ESS will not let the opening squiggly brace go flush left; it is always indented 4 spaces.
Should coders insert lots of line breaks within function declarations? Some coders like a line break after each argument is defined:
    plot.lm <-
        function (x,
                  which = c(1L:3L, 5L), ## was which = 1L:4L,
                  caption = list("Residuals vs Fitted", "Normal Q-Q", "Scale-Location",
                                 "Cook's distance", "Residuals vs Leverage",
                                 expression("Cook's dist vs Leverage  " * h[ii] / (1 - h[ii]))),
                  panel = if(add.smooth) panel.smooth else points,
                  sub.caption = NULL,
                  main = "",
                  ask = prod(par("mfcol")) < length(which) && dev.interactive(),
                  ...,
                  id.n = 3,
                  labels.id = names(residuals(x)),
                  cex.id = 0.75,
                  qqline = TRUE,
                  cook.levels = c(0.5, 1.0),
                  add.smooth = getOption("add.smooth"),
                  label.pos = c(4, 2),
                  cex.caption = 1)
        {
You don’t generally see that in the R code prepared by R Core
Team.
6. I suggest “} else {”
It is possible to write code that runs if it is inside a closure (function) but does not run from the command line.
Example
    i <- 7
    if (i < 5) {
        j <- 1
    }
    else {
        j <- 12
    }
will fail at the command prompt
    > if (i < 5) {
    +     j <- 1
    + }
    > else {
    Error: unexpected 'else' in "else"
But inside a function it will succeed:
    myfn <- function(i) {
        if (i < 5) {
            j <- 1
        }
        else {
            j <- 12
        }
        j
    }
Try that:
    > myfn(99)
    [1] 12
    > myfn(1)
    [1] 1
Why does it fail in the command line, but succeed inside the function?
Why do you care?
    While preparing a function, you may want to run the commands “line by line” in a session, to find out what they do!
Alternatives, if you don’t like “} else {”
    Write your function then run it in the debugger.
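The difference comes from how R reads input. At the command line each statement is evaluated as soon as it is syntactically complete, and an if without an else is complete at its closing brace. Inside braces the parser keeps reading, so it can still attach the else. A sketch that demonstrates this with parse() on text (the strings are illustrative copies of the example above):

```r
## The same statements as text, with "else" on its own line
separate <- "if (i < 5) {\n    j <- 1\n}\nelse {\n    j <- 12\n}"

## At top level the parser treats the "if" as complete at "}",
## so the following "else" is a syntax error
topLevel <- try(parse(text = separate), silent = TRUE)
inherits(topLevel, "try-error")

## Inside braces (as in a function body) the parser keeps reading,
## so the same text is accepted
inBraces <- parse(text = paste0("{\n", separate, "\n}"))

## "} else {" on one line parses in both settings
joined <- "if (i < 5) {\n    j <- 1\n} else {\n    j <- 12\n}"
ok <- parse(text = joined)
length(ok)
```

Code written with “} else {” can therefore be pasted into a session line by line and also sourced from a file without changes.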
Try formatR
Install the formatR package
The “tidy.source” function
    > myfn <- function(x) { if (x < 7) {i = 77; print(paste("x is less than 7 but i is", i))} else {print("x is excessive")}}
    > library(formatR)
    > tidy.source(source = "clipboard", replace.assign = TRUE)
    function(x) {
        if (x < 7) {
            i <- 77
            print(paste("x is less than 7 but i is", i))
        } else {
            print("x is excessive")
        }
    }
Will fail with errors if you have comments inserted in middle of lines.
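Recent formatR releases spell the function tidy_source() and call the assignment option arrow. Treat both names as assumptions to check against the installed version (see ?tidy_source). The sketch below works on a character string instead of the clipboard, and it skips quietly if formatR is not installed.

```r
## Cramped code held as text (same idea as the example above)
messy <- 'myfn <- function(x){if(x<7){i=77;print(paste("x is less than 7 but i is",i))}else{print("x is excessive")}}'

tidied <- NULL
if (requireNamespace("formatR", quietly = TRUE)) {
    ## text = takes code as a character vector; output = FALSE
    ## returns the result instead of printing it
    res <- formatR::tidy_source(text = messy, arrow = TRUE, output = FALSE)
    tidied <- res$text.tidy
    cat(tidied, sep = "\n")
}
```

Working from a string makes the step repeatable in a script, where the clipboard contents are not under the script's control.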
Function Names
1. Names to avoid
Don’t create confusion by creating new functions with names like “seq()”, “rep()”, “lm()”, or such
    These would obscure access to functions from R base
Now R Core Namespace policy has “defended” many functions from that accidental abuse
    stats::lm() can find the lm function in the stats package, even if you have lm in your packages
It is still wise to avoid creating new functions with the same name, because you will
    Confuse/frustrate experts who might read your code and help you with it
    Confuse yourself during your session
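A short sketch of the masking problem and the namespace prefix that works around it. The replacement function is made up for illustration, and base::rep is used here in the same way the slide uses stats::lm.

```r
## A user-defined function that reuses a base name (don't do this)
rep <- function(...) "not the rep you expected"

## An unqualified call finds the user's version first
masked <- rep(1, 3)

## The namespace prefix still reaches the original
original <- base::rep(1, 3)

## Remove the user's version to restore normal behavior
rm(rep)
restored <- rep(1, 3)
```

The prefix rescues the code, but a reader who sees a plain rep() call has no way to know which version will run.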
2. Suggest Camel Case Function Names
If you are naming a new function, don’t use periods for punctuation
Better to write
    getParms <- function(x, y, z) {
than
    get.parms <- function(x, y, z) {
Why?
    The R object framework has “generic functions” like “plot” and “summary”
    which have customized “methods” (implementations) like “plot.lm”, “plot.glm”, etc.
    A class name follows the period
In the R runtime system, a call to a generic function is sent to a method whose name is the generic’s name, a period, and the class of the object
Your “get.parms” function makes a reader think there is an object of type “parms” and a generic function named “get”.
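A minimal sketch of that naming scheme, using a made-up generic called describe and a made-up class called parms (neither is part of R or rockchalk):

```r
## A generic function: it only dispatches
describe <- function(x, ...) UseMethod("describe")

## Methods are named generic.class
describe.default <- function(x, ...) "no special method"
describe.parms <- function(x, ...) "method for class parms"

## An object whose class attribute is "parms"
p <- structure(list(a = 1), class = "parms")

describe(p)     # dispatches to describe.parms
describe(42)    # falls back to describe.default
```

Because the period carries this meaning, a plain function named get.parms looks to a reader like the parms method of get.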
3. Think Carefully on Function Names
Short names for frequently used functions & arguments
Think of R’s common pieces. When you create your own
classes, name your functions similarly.
Variable Names
1. No funny symbols
Variable names
    begin with letters, generally SMALL letters
    include only letters, numbers, as well as “_”, “.”
    AND NO math symbols like “-” and “+?” or “%”, “^”, “&”!
See R base function “make.names” which can clean up name vectors.
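A quick sketch of what make.names does to a vector of awkward names. The input names are invented for illustration.

```r
## Invented names with a space, math symbols, and a leading digit
raw <- c("my var", "x-y", "2nd", "rate%", "ok_name")

## Invalid characters become ".", and a leading digit gets an "X" prefix
clean <- make.names(raw)
clean

## unique = TRUE also separates duplicated names
dedup <- make.names(c("a", "a"), unique = TRUE)
dedup
```

This is handy after importing a spreadsheet whose column headers were written for people rather than for R.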
2. Variable names to Avoid
“T” or “F”: these cause confusion with the abbreviations of TRUE and FALSE
Function names in R
    Previously, it was possible to obliterate R base functions by declaring variables like “seq” and “rep”
    It is still confusing to readers if you name a variable “c” or “rep”.
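The trouble with “T” and “F” is that they are ordinary variables, while TRUE and FALSE are reserved words. A small sketch (the value 0 is arbitrary):

```r
## T and F are ordinary variables that can be overwritten
T <- 0
flag <- if (T) "yes" else "no"
flag

## TRUE and FALSE are reserved; assigning to them is an error
attempt <- try(eval(parse(text = "TRUE <- 0")), silent = TRUE)
inherits(attempt, "try-error")

## Clean up so T means TRUE again
rm(T)
isTRUE(T)
```

Spelling out TRUE and FALSE in code protects it from a stray T or F left behind in someone's workspace.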
3. Long and Short: When to be terse?
Long name OK for something you use once or twice
If used often, create a 1-5 letter name.
4. Append variations on end of name
Given a variable
    uranium
don’t do this
    y <- uranium
or this
    logu <- log(uranium)
Please consider this:
    uraniumlog <- log(uranium)
any of these (which are all better than y or logu)
    uraniumln <- log(uranium)
    uranium.log <- log(uranium)
    uraniumlog <- log(uranium)
    ulog <- log(uranium)
Why? Related things stay together alphabetically! Run “ls()”
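The alphabetical grouping is easy to see in a small sketch. The variables and values are made up, and a separate environment is used so the workspace is left alone.

```r
## Work in a separate environment so the workspace is not touched
e <- new.env()
local({
    uranium <- c(1.2, 3.4, 5.6)   # made-up values
    uraniumlog <- log(uranium)    # suffix style
    height <- c(10, 20, 30)
    income <- c(100, 200, 300)
    logheight <- log(height)      # prefix style
}, envir = e)

## ls() sorts the names: the suffix keeps "uranium" and "uraniumlog"
## next to each other, while the prefix separates "height" from
## "logheight"
ls(e)
```

The same grouping appears in any sorted listing of names, such as an editor's completion menu.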
Other Miscellaneous
1. Work with a fixed width font
If you have a programmer’s file editor that uses a
proportionally spaced font, get a different font, or editor
2. Use the ###, ##, and # style for indentations
## means a comment indented to match the context
### means flush left comment
# means comment at far right
Advice:
    1. Don’t append comments at end of lines (no matter how tempting to “save space”)
    2. Develop a style to insert your comments either before or after the lines they address. Be consistent! I’m trying to remember to use the BEFORE strategy
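A short sketch of the comment levels with comments placed BEFORE the lines they describe. The function centerAndScale is a hypothetical example, not part of rockchalk.

```r
### Standardize a numeric vector (flush-left comment)
centerAndScale <- function(x) {
    ## Remove the mean first (comment indented with the code)
    centered <- x - mean(x)
    ## Then divide by the standard deviation
    centered / sd(x)
}

z <- centerAndScale(c(2, 4, 6))
z
```

No comment shares a line with code, so every line stays short and the comments survive automatic reformatting more easily.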
3. Keep Short line lengths
Suggestion: 80 characters or fewer per line
While writing code, I’ll often have very long lines that take advantage of the wide screen. Sometimes I forget, but I try to go back and cut lines into 80-100 character widths.
    Some evidence suggests humans read badly with long lines
    Long lines don’t translate well into documents, and they either
        go off the right edge of the page, or
        have “line breaks” at bad spots
This is required in R documentation, where packages with very long lines in Rd files are rejected.
This relates to the problem of “multi line strings”, which are discouraged in the wider programming arena, but tolerated in R.
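Finding the lines that exceed the limit is simple to automate. This is a sketch: longLines is a hypothetical helper name, and the sample lines are invented for illustration.

```r
## Hypothetical helper: line numbers whose width exceeds the limit
longLines <- function(lines, limit = 80) {
    which(nchar(lines, type = "chars") > limit)
}

## Invented sample: the second "line" is 95 characters wide
codeLines <- c("x <- 1",
               paste(rep("a", 95), collapse = ""),
               "y <- 2")
longLines(codeLines)
```

For a real script the input would come from readLines() on the file in question.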
An R user (me) might write all on one line:
    if (!c("rcreg") %in% class(object)) stop("predict.rcreg is intended for rcreg objects, which are created by residualCenter in the rockchalk package")
But it is certainly better to write 3 lines, using paste to connect them together:
    if (!c("rcreg") %in% class(object)) stop(paste("predict.rcreg",
        "is intended for rcreg objects, which are created",
        "by residualCenter in the rockchalk package"))