Descriptive

R Style
Paul E. Johnson (1, 2)
(1) Department of Political Science
(2) Center for Research Methods and Data Analysis, University of Kansas
2015

Outline
1 Overview
2 Format Highlights
3 Try formatR
4 Function Names
5 Variable Names
6 Other Miscellaneous

Overview

- This presentation summarizes the vignette R Style that is distributed with the rockchalk package.
- Primary emphasis: write code that sophisticated users will be able to read.
- Suggestions:
  - Write code in the same style used by R Core Team members, as exemplified in the R source code.
  - Write code in the style that R uses to display itself.

Format Highlights

1. Indentation

- Use 4 spaces to indent sections.
- A good editor will convert a TAB you type into 4 spaces.
- If you use TAB symbols, set your editor to display them as 4 spaces.

2. The assignment symbol "<-"

- Not the equal sign.

3. Blank spaces

- This is a GNU coding standard.
- Around symbols! Spaces are required on both sides of
  - math operators: - + * /
  - logical operators: = == | || & &&
  - R symbols: <- %*% %o% %in%
- One space before the opening "(" or "{" in if and for statements.
- One space after a comma and after a closing ")" or "}".
- Unnecessary blank spaces are considered harmful, such as ( x <= y )
- Spaces between function name and parens are also unwelcome: lm ( y ~ x )
- There is a great deal of variety about equal signs in function calls.
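Whatever spacing a writer prefers, R reads both forms as the same call and prints it back in its own display style. A minimal sketch of that point; the names fit, dat, y and x are illustrative only, and nothing is evaluated, so they do not need to exist:

```r
## Two spellings of one call: spaces around the symbols, and none at all.
spaced <- "fit <- lm(y ~ x, data = dat, model = FALSE)"
snug <- "fit<-lm(y~x,data=dat,model=FALSE)"

## parse() only reads the text; the call is never run.
callSpaced <- parse(text = spaced, keep.source = FALSE)[[1]]
callSnug <- parse(text = snug, keep.source = FALSE)[[1]]

## White space is discarded by the parser, so the two calls are identical.
sameCall <- identical(callSpaced, callSnug)
print(sameCall)

## deparse() shows how R itself would display the snug version.
shown <- paste(deparse(callSnug, width.cutoff = 500), collapse = " ")
print(shown)
```

The spacing is therefore purely for the human reader, which is why matching R's own display style is a reasonable default.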
- Once you get used to the GNU way of writing, this looks best:

    m1 <- lm(y ~ x, data = red, subset = x < 77, model = FALSE)

- But if you read much code, you find many authors "snug up" around the equal signs:

    m1 <- lm(y~x, data=red, subset=x < 77, model=FALSE)

- The latter seems horrible to me now, but I can't deny it is widely used by authors smarter than I, and publishers may insist on it to keep program code on the page.

4. Squiggly braces

- For loops, if statements, and the like generally have the opening squiggly brace on the same line as the language construct:

    if (some statement) {
        y <- 1
        ...
    }

- Note the "{" and "}" are not vertically aligned, as they would be in much C, C++ or Java code.
- Main difficulty: indentation eats up "empty space" on the left side of the page. Some text editors do this:

    myFn <- function(x = 31, y = 44, z = NULL)
        {
            j <- 1
            i <- 5
            ...

- To avoid that, you'll generally see functions in R Core code declared like so:

    myGiantLongBoringFunctionName <-
        function(x = 31, y = 44, z = NULL)
    {
        j <- 1
        ...

- Or possibly:

    myGiantLongBoringFunctionName <-
    function(x = 31, y = 44, z = NULL)
    {
        j <- 1
        ...

- These were used to work around indentation styles of various editors.

5. Evolving Indentation Standards

- Emacs ESS now has a family of indentation styles; the default is their interpretation of the R standard.
- The TAB key inserts 4 spaces.
- Consider plot.lm, which appears in R source code as:

    plot.lm <-
    function (x, which = c(1L:3L, 5L), ## was which = 1L:4L,
              caption = list("Residuals vs Fitted", "Normal Q-Q",
              "Scale-Location", "Cook's distance",
              "Residuals vs Leverage",
              expression("Cook's dist vs Leverage  " * h[ii] / (1 - h[ii]))),
              panel = if(add.smooth) panel.smooth else points,
              sub.caption = NULL, main = "",
              ask = prod(par("mfcol")) < length(which) && dev.interactive(),
              ...,
              id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75,
              qqline = TRUE, cook.levels = c(0.5, 1.0),
              add.smooth = getOption("add.smooth"),
              label.pos = c(4, 2), cex.caption = 1)
    {

- Emacs 24.4 with ESS 15.09 turns that into this:

    plot.lm <-
        function (x, which = c(1L:3L, 5L), ## was which = 1L:4L,
                  caption = list("Residuals vs Fitted", "Normal Q-Q",
                                 "Scale-Location", "Cook's distance",
                                 "Residuals vs Leverage",
                                 expression("Cook's dist vs Leverage  " * h[ii] / (1 - h[ii]))),
                  panel = if(add.smooth) panel.smooth else points,
                  sub.caption = NULL, main = "",
                  ask = prod(par("mfcol")) < length(which) && dev.interactive(),
                  ...,
                  id.n = 3, labels.id = names(residuals(x)), cex.id = 0.75,
                  qqline = TRUE, cook.levels = c(0.5, 1.0),
                  add.smooth = getOption("add.smooth"),
                  label.pos = c(4, 2), cex.caption = 1)
        {

- Note that Emacs-ESS will not let the opening squiggly brace go flush left; it is always indented 4 spaces.
- Should coders insert lots of line breaks within function declarations? Some coders like a line break after each argument is defined:

    plot.lm <-
        function (x,
                  which = c(1L:3L, 5L), ## was which = 1L:4L,
                  caption = list("Residuals vs Fitted", "Normal Q-Q",
                                 "Scale-Location", "Cook's distance",
                                 "Residuals vs Leverage",
                                 expression("Cook's dist vs Leverage  " * h[ii] / (1 - h[ii]))),
                  panel = if(add.smooth) panel.smooth else points,
                  sub.caption = NULL,
                  main = "",
                  ask = prod(par("mfcol")) < length(which) && dev.interactive(),
                  ...,
                  id.n = 3,
                  labels.id = names(residuals(x)),
                  cex.id = 0.75,
                  qqline = TRUE,
                  cook.levels = c(0.5, 1.0),
                  add.smooth = getOption("add.smooth"),
                  label.pos = c(4, 2),
                  cex.caption = 1)
        {

- You don't generally see that in the R code prepared by R Core Team.

6. I suggest "} else {"

- It is possible to write code that runs if it is inside a closure (function) but does not run from the command line.
- Example:

    i <- 7
    if (i < 5) {
        j <- 1
    }
    else {
        j <- 12
    }

  will fail at the command prompt:

    > if (i < 5) {
    +     j <- 1
    + }
    > else {
    Error: unexpected 'else' in "else"

- But inside a function it will succeed:

    myfn <- function(i) {
        if (i < 5) {
            j <- 1
        }
        else {
            j <- 12
        }
        j
    }

- Try that:

    > myfn(99)
    [1] 12
    > myfn(1)
    [1] 1

- Why does it fail in the command line, but succeed inside the function?
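One way to explore the answer without typing at the prompt is to hand the same text to parse(). At top level, R treats the if block as finished as soon as its closing brace ends a line, so an else on the next line starts a new, invalid expression; inside braces, the parser keeps reading until the enclosing block closes. A minimal sketch, in which the helper name parsesOK and the variable names are hypothetical:

```r
## Hypothetical helper: TRUE if 'code' is syntactically valid R.
parsesOK <- function(code) {
    tryCatch({
        parse(text = code, keep.source = FALSE)
        TRUE
    }, error = function(e) FALSE)
}

## "else" on its own line, as it would be typed at the command prompt.
elseOnNewLine <- "if (i < 5) {\n    j <- 1\n}\nelse {\n    j <- 12\n}"

## The same text wrapped in braces, as it would be inside a function body.
insideBraces <- paste0("{\n", elseOnNewLine, "\n}")

## The suggested style: "} else {" on one line.
sameLineElse <- "if (i < 5) {\n    j <- 1\n} else {\n    j <- 12\n}"

print(parsesOK(elseOnNewLine))
print(parsesOK(insideBraces))
print(parsesOK(sameLineElse))

## The suggested style also runs at top level.
i <- 7
eval(parse(text = sameLineElse))
print(j)
```

Only the first of the three texts is rejected, which matches the behavior seen at the prompt.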
- Why do you care? While preparing a function, you may want to run the commands "line by line" in a session, to find out what they do!
- Alternative, if you don't like "} else {": write your function, then run it in the debugger.

Try formatR

- Install the formatR package.
- The "tidy.source" function:

    > myfn <- function(x) { if (x < 7) {i = 77; print(paste("x is less than 7 but i is", i))} else {print("x is excessive")}}
    > library(formatR)
    > tidy.source(source = "clipboard", replace.assign = TRUE)
    function(x) {
        if (x < 7) {
            i <- 77
            print(paste("x is less than 7 but i is", i))
        } else {
            print("x is excessive")
        }
    }

- Recent formatR releases name the function tidy_source() and the replace.assign argument arrow.
- Will fail with errors if you have comments inserted in the middle of lines.

Function Names

1. Names to avoid

- Don't create confusion by creating new functions with names like "seq()", "rep()", "lm()", or such.
- Such names would obscure access to functions from R base.
- Now R Core namespace policy has "defended" many functions from that accidental abuse.
  - stats::lm() can find the lm function in the stats package, even if you have lm in your packages.
- Still wise to avoid creating new functions with the same name, because you will
  - confuse/frustrate experts who might read your code and help you with it
  - confuse yourself during your session

2. Suggest Camel Case Function Names

- If you are naming a new function, don't use periods for punctuation.
- Better to write

    getParms <- function(x, y, z){

  than

    get.parms <- function(x, y, z){

- Why?
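A minimal sketch of the mechanism behind this advice, assuming R's S3 system. Every name below (report, card, deck, reportCard) is hypothetical. A function whose name has the form generic.class is picked up as a method, whether or not that was the intent:

```r
## A hypothetical generic function and its default method.
report <- function(x, ...) UseMethod("report")
report.default <- function(x, ...) "default report"

## Written as a stand-alone helper, but the name has the form
## generic.class, so R uses it as the "report" method for any
## object of class "card".
report.card <- function(x, ...) "grades for the term"

## An object that happens to have class "card".
deck <- structure(list(suit = "hearts"), class = "card")

print(report(1:3))
print(report(deck))

## A camel case name cannot be mistaken for a method.
reportCard <- function(x, ...) "grades for the term"
print(reportCard(deck))
```

The second call lands in report.card even though that function was never meant to handle playing cards, which is the kind of surprise a period in the name invites.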
- The R object framework has "generic functions" like "plot" and "summary",
- which have customized "methods" (implementations) like "plot.lm", "plot.glm", etc. A class name follows the period.
- In the R runtime system, a call to a generic function is sent on to a method by matching the object's class against the last part of the method name.
- Your "get.parms" function makes a reader think there is an object of type "parms" and a generic function named "get".

3. Think Carefully on Function Names

- Short names for frequently used functions & arguments.
- Think of R's common pieces. When you create your own classes, name your functions similarly.

Variable Names

1. No funny symbols

- Variable names
  - begin with letters, generally SMALL letters
  - include only letters, numbers, as well as "_", "."
  - AND NO math symbols like "-" and "+", "?" or "%", "^", "&"!
- See R base function "make.names", which can clean up name vectors.

2. Variable names to avoid

- "T" or "F". Cause confusion with abbreviated TRUE and FALSE.
- Function names in R.
  - Previously, it was possible to obliterate R base functions by declaring variables like "seq" and "rep".
  - Now it is still confusing to readers if you name a variable "c" or "rep".

3. Long and Short: When to be terse?

- A long name is OK for something you use once or twice.
- If used often, create a 1-5 letter name.

4. Append variations on end of name

- Given a variable uranium, don't do this

    y <- uranium

  or this

    logu <- log(uranium)

- Please consider this:

    uraniumlog <- log(uranium)

  or any of these (which are all better than y or logu):

    uraniumln <- log(uranium)
    uranium.log <- log(uranium)
    uranium_log <- log(uranium)
    ulog <- log(uranium)

- Why? Related things stay together alphabetically! Run "ls()".

Other Miscellaneous

1. Work with a fixed width font

- If you have a programmer's file editor that uses a proportionally spaced font, get a different font, or editor.

2. Use the ###, ##, and # style for indentations

- ## means a comment indented to match the context
- ### means flush left comment
- # means comment at far right
- Advice:
  1. Don't append comments at the end of lines (no matter how tempting to 'save space').
  2. Develop a style to insert your comments either before or after the lines they address. Be consistent! I'm trying to remember to use the BEFORE strategy.

3. Keep short line lengths

- Suggestion: 80 characters or less per line.
- While writing code, I'll often have very long lines that take advantage of the wide screen. Sometimes I forget, but I try to go back and cut lines into 80-100 character widths.
- Some evidence suggests humans read badly with long lines.
- Long lines don't translate well into documents: they either go off the right edge of the page, or have "line breaks" at bad spots.
- This is required in R documentation, where packages with very long lines in Rd files are rejected.
- Relates to the problem of "multi line strings", which are discouraged in the wider programming arena, but tolerated in R.
- An R user (me) might write all on one line:

    if (!c("rcreg") %in% class(object)) stop("predict.rcreg is intended for rcreg objects, which are created by residualCenter in the rockchalk package")

- But it is certainly better to write 3 lines, using paste to connect them together:

    if (!c("rcreg") %in% class(object)) stop(paste("predict.rcreg",
        "is intended for rcreg objects, which are created",
        "by residualCenter in the rockchalk package"))
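A quick check that splitting a message this way costs nothing: paste() joins its arguments with a single space by default, so short pieces rebuild a long string exactly. A minimal sketch, with both strings written out here for comparison and illustrative variable names:

```r
## The message written as one long string.
oneLine <- "predict.rcreg is intended for rcreg objects, which are created by residualCenter in the rockchalk package"

## The same message assembled from three short pieces.
pasted <- paste("predict.rcreg",
                "is intended for rcreg objects, which are created",
                "by residualCenter in the rockchalk package")

## The default separator is a single space, so nothing is lost.
print(identical(oneLine, pasted))

## The single string alone is already wider than an 80 character line.
print(nchar(oneLine) > 80)
```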