Package ‘SciencesPo’ August 5, 2016 Type Package Title A Tool Set for Analyzing Political Behavior Data Date/Publication 2016-08-05 00:24:29 Version 1.4.1 Date 2016-08-02 Maintainer Daniel Marcelino <[email protected]> Description Provide functions for analyzing elections and political behavior data, including measures of political fragmentation, seat apportionment, and small data visualization graphs. URL http://CRAN.R-project.org/package=SciencesPo License GPL (>= 2) Encoding UTF-8 Depends R (>= 3.2.0), stats, utils, graphics, grDevices, methods, ggplot2 (>= 2.1.0) Imports Rcpp (>= 0.12.3), pander (>= 0.6.0), data.table (>= 1.9.4), grid (>= 3.0.0), rstudioapi, htmltools, lubridate, stringr, xtable, shiny Suggests testthat, devtools, scales, extrafont, gtable, knitr LinkingTo Rcpp VignetteBuilder knitr NeedsCompilation yes LazyData TRUE Repository CRAN BugReports http://github.com/danielmarcelino/SciencesPo/issues ByteCompile TRUE RoxygenNote 5.0.1 1 R topics documented: 2 Collate 'Anonymize.R' 'Atkinson.R' 'Bootstrap.R' 'CI.R' 'CMDcheckVars.R' 'CalibratePlot.R' 'Clear.R' 'Condorcet.R' 'CountMissing.R' 'CrossValidation.R' 'Crosstable.R' 'DATA.R' 'DEPRECATED.R' 'Describe.R' 'Dirichlet.R' 'Dummy.R' 'Frequency.R' 'Geoms.R' 'Gini.R' 'Herfindahl.R' 'HighestAverages.R' 'Jackknife.R' 'Label.R' 'LargestRemainders.R' 'Lorenz.R' 'MOMENTS.R' 'MeanFromRange.R' 'Normalize.R' 'Operators.R' 'Outlier.R' 'Palettes.R' 'Permutate.R' 'PoliticalDiversity.R' 'Proportionality.R' 'RcppExports.R' 'Recode.R' 'Rosenbluth.R' 'RunShinyApp.R' 'ScatterPlot.R' 'SciencesPo.R' 'Stratify.R' 'TESTS.R' 'Tableplot.R' 'Ternaryplot.R' 'Themes.R' 'Transform.R' 'UTILS.R' 'Untable.R' 'Voronoi.R' 'Waffleplot.R' 'Winsorize.R' 'flag.R' 'print.SciencesPo.R' 'zTree.R' 'zzz.R' Author Daniel Marcelino [aut, cre] R topics documented: alpha . . . . . . Anonymize . . Atkinson . . . . auto . . . . . . Bartels . . . . . baseball . . . . bhodrick93 . . big5 . . . . . . bollen . . . . . Bootstrap . . . bush . . . . . . Calibrateplot . cathedrals . . . cgreene76 . . . chismoke . . . CI . . . . . . . Clear . . . . . . Condorcet . . . CountMissing . Crosstable . . . CrossValidation ddirichlet . . . Describe . . . . dHondt . . . . Dummify . . . election2008 . . Flag . . . . . . Footnote . . . . Forge . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 5 6 7 8 10 10 11 12 13 14 14 15 16 17 17 18 19 19 20 21 22 23 24 25 26 27 27 28 R topics documented: Formatted . . . . . freq . . . . . . . . Frequency . . . . . galton . . . . . . . GeomSpotlight . . geom_dumbbell . . geom_lollipop . . . Gini . . . . . . . . GiniSimpson . . . griliches76 . . . . . Hamilton . . . . . Herfindahl . . . . . HighestAverages . Jackknife . . . . . JamesStein . . . . Kurtosis . . . . . . Label . . . . . . . Label<- . . . . . . LargestRemainders Lorenz . . . . . . . ltaylor96 . . . . . . MakeNumeric . . . MakeShare . . . . marriage . . . . . . MeanFromRange . mishkin92 . . . . . nerlove63 . . . . . Normalize . . . . . Outlier . . . . . . . paired . . . . . . . Palettes . . . . . . party_pal . . . . . Permutate . . . . . Plotting . . . . . . PoliticalDiversity . pollster2008 . . . . Proportionality . . pub_pal . . . . . . pub_shapes . . . . rdirichlet . . . . . . read.zTree . . . . . Recode . . . . . . religiosity . . . . . Render . . . . . . . Rosenbluth . . . . RunShinyApp . . . scale_color_party . scale_color_pub . . 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29 30 31 32 32 33 35 37 38 39 40 41 42 44 45 45 47 47 48 50 51 51 52 52 53 54 55 55 57 58 59 59 60 61 63 65 66 68 69 70 71 72 73 73 74 75 75 76 4 alpha scale_fill_party . . . . scale_fill_pub . . . . . scale_shape_pub . . . Scatterplot . . . . . . . SciencesPo . . . . . . SciencesPo-deprecated SciencesPoFont . . . . sheston91 . . . . . . . Skewness . . . . . . . stature . . . . . . . . . Stratify . . . . . . . . summary.Crosstable . . swatson93 . . . . . . . Tableplot . . . . . . . Ternaryplot . . . . . . theme_blank . . . . . . theme_darkside . . . . theme_fte . . . . . . . theme_pub . . . . . . . theme_scipo . . . . . . titanic . . . . . . . . . Today . . . . . . . . . turnout . . . . . . . . . twins . . . . . . . . . . TwoWay . . . . . . . . unions . . . . . . . . . Untable . . . . . . . . Voronoi . . . . . . . . vote . . . . . . . . . . Waffleplot . . . . . . . WeightedCorrelation . WeightedMean . . . . WeightedStdz . . . . . WeightedVariance . . . Winsorize . . . . . . . words . . . . . . . . . %nin% . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76 77 77 78 79 79 80 81 82 83 84 85 85 86 86 87 88 88 89 90 92 93 93 94 95 95 96 97 98 98 99 100 101 101 102 103 104 105 alpha A dataset that contains four test items Description A dataset with four test items used in SPSS to to compute Cronbach’s alpha. • q1:q4 Numeric items of a scale. Anonymize 5 Usage data(alpha) Format A data.frame object with 4 variables and 60 observations. Anonymize To Make Anonymous a Data Frame Description Replaces factor and character variables by a combination of random sampled letters and numbers. Numeric columns are also transformed, see details. Usage Anonymize(.data, col.names = "V", row.names = "") ## Default S3 method: Anonymize(.data, col.names = "V", row.names = "") Arguments .data A vector or a data frame. col.names A string for column names. row.names A string for rown names, default is empty. Details Strings will is replaced by an algorithm with 52/102 chance of choosing a letter and 50/102 chance of choosing a number; then it joins everything in a 5-digits long character string. Value An object of the same type as .data. Author(s) Daniel Marcelino, <[email protected]>. 6 Atkinson Examples dt <- data.frame( Z = sample(LETTERS,10), X = sample(1:10), Y = sample(c("yes", "no"), 10, replace = TRUE) ) dt; Anonymize(dt) Atkinson Atkinson Index of Inequality Description Calculates the Atkinson index A. This inequality measure is especially good at determining which end of the distribution is contributing most to the observed inequality. Usage Atkinson(x, n = rep(1, length(x)), epsilon = NULL, na.rm = FALSE, ...) ## Default S3 method: Atkinson(x, n = rep(1, length(x)), epsilon = NULL, na.rm = FALSE, ...) Arguments x a vector of data values of non-negative elements. n a vector of frequencies of the same length as x. epsilon a parameter of the inequality measure (if NULL, the default parameter (0.5) of the respective measure is used). na.rm logical. Should missing values be removed? Defaults is set to FALSE. ... additional arguements (currently ignored) Details epsilon = 0,5: little inequality aversion epsilon = 1,0: medium inequality aversion epsilon = 2,0: great inequality aversion References Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Handbook of Income Distribution. Amsterdam. Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall. auto 7 See Also Herfindahl, Rosenbluth, Gini. For more details see the “Indices” vignette. Examples if (interactive()) { # generate a vector (of incomes) # y <- c(80, 60, 10, 20, 30) # Entropy 1.392321 # Maximum Entropy 1.609438 # Normalized Entropy 0.865098 # Exponential Index 0.248498 # Herfindahl 0.285000 # Normalized Herfindahl 0.106250 # Gini Coefficient 0.360000 # Concentration Coefficient 0.450000 x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012) # compute Atkinson coefficient with epsilon=0.5 Atkinson(x, epsilon=0.5) w <- c(10, 15, 20, 25, 40, 20, 30, 35, 45, 90) } auto Statistics of 1979 automobile models Description The data give the following statistics for 74 automobiles in the 1979 model year as sold in the US. Usage data(auto) Format A data.frame object with 14 variables and 74 observations. Model Make and model of car. Origin a factor with levels A,E,J Price Price in dollars. MPG Miles per gallon. Rep78 Repair record for 1978 on 1 (worst) to 5 (best) scale. Rep77 Repair record for 1978 on 1 to 5 scale. 8 Bartels Hroom Headroom in inches. Rseat Rear seat clearance in inches. Trunk Trunk volume in cubic feet. Weight Weight in pounds. Length Length in inches. Turn Turning diameter in feet. Displa Engine displacement in cubic inches. Gratio Gear ratio for high gear. Source This data frame was created from http://euclid.psych.yorku.ca/ftp/sas/sssg/data/auto. sas. References Originally published in Chambers, Cleveland, Kleiner, and Tukey, Graphical Methods for Data Analysis, 1983, pages 352-355. Bartels Bartels Rank Test of Randomness Description Performs Bartels rank test of randomness. The default method for testing the null hypothesis of randomness is two.sided. By using the alternative left.sided, the null hypothesis is tested against a trend. By using the alternative right.sided, the null hypothesis of randomness is tested against a systematic oscillation in the observed data. Usage Bartels(x, alternative = "two.sided", pvalue = "normal") ## Default S3 method: Bartels(x, alternative = "two.sided", pvalue = "normal") Arguments x a numeric vector of data values. alternative a character string for hypothesis testing method; must be one of two.sided (default), left.sided or right.sided. pvalue A method for asymptotic aproximation used to compute the p-value. Bartels 9 Details Missing values are by default removed. The RVN test statistic is Pn−1 RV N = Pn i=1 i=1 (Ri − Ri+1 )2 (Ri − (n + 1)/2) 2 where Ri = rank(Xi ), i = 1, . . . , n. It is known that (RV N − 2)/σ is asymptotically standard 2 −2n−9) normal, where σ 2 = 4(n−2)(5n 5n(n+1)(n−1)2 . Value statistic The value of the RVN statistic test and the theoretical mean value and variance of the RVN statistic test. n the sample size, after the remotion of consecutive duplicate values. p.value the asymptotic p-value. method a character string indicating the test performed. data.name a character string giving the name of the data. alternative a character string describing the alternative. References Bartels, R. (1982). The Rank Version of von Neumann’s Ratio Test for Randomness, Journal of the American Statistical Association, 77(377), 40-46. Gibbons, J.D. and Chakraborti, S. (2003). Nonparametric Statistical Inference, 4th ed. (pp. 97-98). URL: http://books.google.pt/books?id=dPhtioXwI9cC&lpg=PA97&ots=ZGaQCmuEUq Examples # Example 5.1 in Gibbons and Chakraborti (2003), p.98. # Annual data on total number of tourists to the United States for 1970-1982. years <- 1970:1982 tourists <- c(12362, 12739, 13057, 13955, 14123, 15698, 17523, 18610, 19842, 20310, 22500, 23080, 21916) # See it graphically qplot(factor(years), tourists)+ geom_point() # Test the null against a trend Bartels(tourists, alternative="left.sided", pvalue="beta") 10 bhodrick93 baseball Bradley Efron and Carl Morris ("Stein’s Paradox in Statistics) Description A dataset used in Efron and Morris’s 1977 paper, "Stein’s Paradox in Statistics". • • • • • • • • • nameFirst and last name of selected player. atBatsnumber of times at bats. hitsnumber of hits after 45 at bats. avg45is the average after 45 at bats. remainingAtBatsremaining number at bats. remainingAvgis the remaining average to the end of the season. seasonAtBatsis the number of times at bats in the end of the season. seasonHitsis the number of hits in the end of the season. avgSeasonis the end of the season average. Usage data(baseball) Format A data.frame object with 9 variables and 18 observations. bhodrick93 Bekaert’s and Hodrick’s (1993) Data Description Data set used by Bekaert and Hodrick (1993) on biases in the measurement of foreign exchange risk premiums. This dataset contains the following columns: • • • • • • • • • • • date A character vector for date. jyspot Price of USD in JY, spot. jyfwd Price of USD in JY, 30-day forward. jys30 Price of USD in JY, spot market at 30-day forward deliver/date. dmspot Price of USD in DM, spot. dmfwd Price of USD in DM, 30-day forward dms30 Price of USD in DM, spot market at 30-day forward deliver/date. bpspot Price of USD in BP, spot. bpfwd Price of USD in BP, 30-day forward. bps30 Price of USD in BP, spot market at 30-day forward deliver/date. quote A numeric vector. big5 11 Usage data(bhodrick93) Format A data.frame object with 11 variables and 778 observations. Source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http: //fmwww.bc.edu/ec-p/data/Hayashi/ References Bekaert, G., and Hodrick, R. J. (1993) On biases in the measurement of foreign exchange risk premiums. Journal of International Money and Finance, 12(2), 115-138. big5 Big 5 dataset Description This is a dataset of the Big 5 personality traits. It is a measurement of the Dutch translation of the NEO-PI-R on 500 first year psychology students (Dolan, Oort, Stoel, Wicherts, 2009). The test consists of fifty items that you must rate on how true they are about you on a five point scale where 1=Disagree, 3=Neutral and 5=Agree. It takes most people 3-8 minutes to complete. • NeuroticismThe level of neuroticism • ExtraversionThe level of extraversion • OpennessThe level of openness • AgreeablenessThe level of agreeableness • ConscientiousnessThe level of conscientiousness Usage data(big5) Format A data.frame object with 5 variables and 500 observations. 12 bollen References Hoekstra, H. A., Ormel, J., & De Fruyt, F. (2003). NEO-PI-R/NEO-FFI: Big 5 persoonlijkheidsvragenlijst. Handleiding Manual of the Dutch version of the NEO-PI-R/NEO-FFI. Lisse, The Netherlands: Swets and Zeitlinger. Dolan, C. V., Oort, F. J., Stoel, R. D., & Wicherts, J. M. (2009). Testing measurement invariance in the target rotates multigroup exploratory factor model. Structural Equation Modeling, 16, 295<96>314. bollen Bollen Data on Industrialization and Political Democracy Description The data were reported in Bollen (1989, p. 428, Table 9.4) This set includes data from 75 developing countries each assessed on four measures of democracy measured twice (1960 and 1965), and three measures of industrialization measured once (1960). • y1Freedom of the press, 1960 • y2Freedom of political opposition, 1960 • y3Fairness of elections, 1960 • y4Effectiveness of elected legislature, 1960 • y5Freedom of the press, 1965 • y6Freedom of political opposition, 1965 • y7Fairness of elections, 1965 • y8Effectiveness of elected legislature, 1965 • x1GNP per capita, 1960 • x2Energy consumption per capita, 1960 • x3Percentage of labor force in industry, 1960 @references Bollen, K. A. (1979). Political democracy and the timing of development. American Sociological Review, 44, 572-587. Bollen, K. A. (1980). Issues in the comparative measurement of political democracy. American Sociological Review, 45, 370-390. Usage data(bollen) Format A data.frame object with 11 variables and 75 observations. Bootstrap 13 Bootstrap Method for Bootstrapping Description this method provides bootstrapping statistics. Usage Bootstrap(x, ...) ## Default S3 method: Bootstrap(x, nboots = 30, FUN, ...) ## S3 method for class 'model' Bootstrap(x, ...) Arguments x is a vector or a fitted model object whose parameters will be used to produce bootstrapped statistics. Model objects are assumed to be of class “glm” or “lm”. nboots an integer for the number of reiteration. FUN a statistic function name to bootstrap, i.e., ’mean’, ’var’, ’cov’, etc. ... further arguments passed to or used by other methods. Value A list with “alpha” and “beta” slots. The “alpha” corresponds to ancillary parameters and “beta” to systematic components of the model. Author(s) Daniel Marcelino, <[email protected]> Examples x = runif(10, 0, 1) Bootstrap(x, FUN=mean) 14 Calibrateplot bush Approval Ratings for President George W. Bush Description Approval ratings for George W. Bush. Usage data(bush) Format A data.frame object with 5 variables and 270 observations. • • • • • start.date. Start date of the survey. end.date. End date of the survey. approve. Percent which approve of the president. disapprove. Percent which disapprove of the president. undecided. Percent undecided about the president. Details A data set of approval ratings of George Bush over the time of his presidency, as reported by several agencies. Most polls were of size approximately 1,000 so the margin of error is about 3 percentage points. @source http://www.pollingreport.com/BushJob.htm Calibrateplot Calibrate Plot Description Plots a calibration plot for overlaping observed versus actual values. Usage Calibrateplot(.data = NULL, x, y, ci = 0.95, xlab = "Forecast Probability", ylab = "Observed Frequency") Arguments .data x, y ci xlab ylab the data frame. are the forecast and observed data vectors. is the size of the confidence interval a character name for horizontal axis. a character name for vertical axis. cathedrals 15 Author(s) Daniel Marcelino Examples data <- data.frame(x=c(10,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100), y=c(0,05,10,20,25,30,40,50,60,70,75,80,95,100,100,100,100,100)) Calibrateplot(data, x, y) cathedrals Cathedrals Description Heights and lengths of Gothic and Romanesque cathedrals. This dataset contains the following columns: • Type Romanesque or Gothic. • Height Total height, feet. • Length Total length, feet. @references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley. Usage data(cathedrals) Format A data.frame object with 3 variables and 25 observations. 16 cgreene76 cgreene76 Christensen’s and Greene’s (1976) Data Description Data set used by Christensen and Greene (1976) on economies of scale in US electric power generation. This dataset contains the following columns: • firmid Observation id. • costs Costs in 1970, MM USD. • output Output, million KwH. • plabor Price of labor. • pkap Price of capital. • pfuel Price of fuel. • labshr Labor’s cost share. • kapshr Capital’s cost share. Usage data(cgreene76) Format A data.frame object with 8 variables and 99 observations. Source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http: //fmwww.bc.edu/ec-p/data/Hayashi/ References Christensen, L. R., and Greene, W. H. (1976) Economies of scale in US electric power generation. The Journal of Political Economy, 655–676. chismoke 17 chismoke Smoking and Lung Cancer in China Description The Chinese smoking and lung cancer data (Agresti, Table 5.12, p. 60). • Citycity name. • Smokerif smoker or not. • Cancerif detected cancer or not. • Countthe frequencies. @references Agresti, A. An introduction to categorical Data Analysis, (2nd ed.). Wiley. Usage data(chismoke) Format A data.frame object with 4 variables and 32 observations. CI An S4 Class to Confidence Intervals Description Calculates the confidence intervals for a vector of data values. Usage CI(x, level = 0.95, alpha = 1 - level, na.rm = FALSE, ...) CI(x, level = 0.95, alpha = 1 - level, na.rm = FALSE, ...) Arguments x A vector of data values. level The confidence level. Default is 0.95. alpha The significance level. Default is 1-level. If alpha equals 0.05, then your confidence level is 0.95. na.rm A logical value, default is FALSE ... Additional arguements (currently ignored) 18 Clear Value CI lower Est. Mean Mean of data. CI upper Upper bound of interval. Std. Error Standard Error of the mean. Slots lower Lower bound of interval. mean Estimated mean. upper Upper bound of interval. stderr Standard Error of the mean. Author(s) Daniel Marcelino, <[email protected]>. Examples x <- c(1, 2.3, 2, 3, 4, 8, 12, 43, -1,-4) CI(x, level=.90) Clear Clear Memory of All Objects Description This function is a wrapper for the command rm(list=ls()). Usage Clear(what = NULL, keep = TRUE) Arguments what The object (as a string) that needs to be removed (or kept) keep Should obj be kept (i.e., everything but obj removed)? Or dropped? Author(s) Daniel Marcelino, <[email protected]>. Condorcet 19 Examples # create objects a=1; b=2; c=3; d=4; e=5 # remove d Clear("d", keep=FALSE) ls() # remove all but a and b Clear(c("a", "b"), keep=TRUE) ls() Condorcet Condorcet Voting Description Condorcet Voting. Usage Condorcet() CountMissing Count missing data. Description Function to count missing data. Usage CountMissing(.data, vars = NULL, prop = TRUE, exclude.complete = TRUE) Arguments .data the data frame. vars vector with name of variables. prop logical. If this is TRUE will show share of missing data by variable. exclude.complete Logical. Exclude complete variables from the output. Examples CountMissing(marriage, prop = FALSE) 20 Crosstable Crosstable Cross-tabulation Description Crosstable produces all possible two-way and three-way tabulations of variables. Usage Crosstable(..., digits = 2, chisq = FALSE, fisher = FALSE, mcnemar = FALSE, latex = FALSE, alt = "two.sided", deparse.level = 2) ## Default S3 method: Crosstable(..., digits = 2, chisq = FALSE, fisher = FALSE, mcnemar = FALSE, alt = "two.sided", latex = FALSE, deparse.level = 2) Arguments digits number of digits after the decimal point. chisq if TRUE, the results of a chi-square test will be shown below the table. fisher if TRUE, the results of a Fisher Exact test will be shown below the table. mcnemar if TRUE, the results of a McNemar test will be shown below the table. latex if the output is to be printed in latex format. alt the alternative hypothesis and must be one of "two.sided", "greater" or "less". Only used in the 2 by 2 case. deparse.level an integer controlling the construction of labels in the case of non-matrix-like arguments. If 0, middle 2 rownames, if 1, 3 rownames, if 2, 4 rownames (default). ... the variables for the cross tabulation. Value A cross-tabulated object. See Also xtabs, Frequency, table, prop.table Other Crosstables: summary.Crosstable CrossValidation 21 Examples with(titanic, Crosstable( SEX, AGE, SURVIVED)) #' # Agresti (2002), table 3.10, p. 106 # 1992 General Social Survey--Race and Party affiliation gss <- data.frame( expand.grid(Race=c("black", "white"), party=c("dem", "indep", "rep")), count=c(103,341,15,105,11,405)) df <- gss[rep(1:nrow(gss), gss[["count"]]), ] with(df, Crosstable(Race, party)) # Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinker tea <- data.frame( expand.grid(poured=c("Yes", "No"), guess=c("Yes", "No")), count=c(3,1,1,3)) # nicer way of recreating long tables data = Untable(tea, freq="count") with(data, Crosstable(poured, guess, fisher=TRUE, alt="greater")) # fisher=TRUE CrossValidation Cross Validation of SciencesPo Trees and Forests Description Cross Validation of SciencesPo Trees and Forests. Usage CrossValidation(object, ...) ## S3 method for class 'Tree' CrossValidation(object, FUN, ...) ## S3 method for class 'Forest' CrossValidation(object, FUN, ...) Arguments object a fitted Tree or Forest object in the formula. ... additional arguments. FUN either tree or forest 22 ddirichlet Value An object of class CV with elements: call the function call Author(s) Daniel Marcelino, <[email protected]>. ddirichlet Dirichlet Distribution Description Density function and random number generation for the Dirichlet distribution Usage ddirichlet(x, alpha, log = FALSE, sum = FALSE) Arguments x a matrix containing observations. alpha the Dirichlet distribution’s parameters. Can be a vector (one set of parameters for all observations) or a matrix (a different set of parameters for each observation), see “Details”. log if TRUE, logarithmic densities are returned. sum if TRUE, the (log-)likelihood is returned. Value the ddirichlet returns a vector of densities (if sum = FALSE) or the (log-)likelihood (if sum = TRUE) for the given data and alphas. Author(s) Daniel Marcelino, <[email protected]> Examples mat <- cbind(1:10, 5, 10:1); mat; x <- rdirichlet(10, mat); ddirichlet(x, mat); Describe Describe 23 Summary Description Table Description Produce summary description table of a dataframe with the following features: variable names, labels, factor levels, frequencies or summary statistics. Usage Describe(.data, style = "multiline", justify = "left", max.number = 10, trim.strings = FALSE, max.width = 15, digits = 2, split.cells = 35, display.labels = FALSE, display.attributes = FALSE, file = NA, append = FALSE, escape.pipe = FALSE, ...) Arguments .data a data frame. style the style to be used in pander table, one of “multiline”, “grid”, “simple”, and “rmarkdown”. justify pander argument; defaults is justified to “left”. max.number an integer for the maximum number of items to be displayed in the frequency cell. If variable has more distinct values, no frequency will be shown (only a message stating the number of distinct values). trim.strings remove white spaces at the beginning or end of the string. This may impact the frequencies, so interpret the frequencies cell accordingly. Defaults to FALSE. max.width Limits the number of characters to display in the frequency tables. Defaults to 15. digits the number of digits for rounding. split.cells pander argument. Number of characters allowed on a line before splitting the cell. Defaults to 35. display.labels If TRUE, variable labels will be displayed. Defaults to FALSE. display.attributes If TRUE, variable attibutes will be displayed. Defaults to FALSE. file the text file to be written to disk. Defaults to NA. append When “file” argument is supplied, this indicates whether to append output to existing file (TRUE) or to overwrite any existing file (FALSE, default). If TRUE and no file exists, a new file will be created. escape.pipe Only useful when style='grid' and file argument is not NA, in which case it will escape the pipe character (|) to allow Pandoc to correctly convert multiline cells. ... additional arguments passed to pander. 24 dHondt Details The IQR formula used is the R standard IQR(x) = quantile(x, 3/4) - quantile(x, 1/4). The (CV) stands for the coefficient of variation also known as relative standard deviation (RSD), defined as the ratio of the standard deviation to the mean. Author(s) Daniel Marcelino, <[email protected]> Examples data(religiosity) Describe(religiosity) dHondt The D’Hondt Method of Allocating Seats Proportionally Description The function calculates seats allotment in legislative house, given the total number of seats and the votes for each party based on the Victor D’Hondt’s method (1878). The D’Hondt’s method is mathematically equivalent to the method proposed by Thomas Jefferson few years before (1792). Usage dHondt(parties = NULL, votes = NULL, seats = NULL, ...) dHondt(parties = NULL, votes = NULL, seats = NULL, ...) Arguments parties a vector containig parties labels or candidates accordingly to the votes vector order. votes a vector containing the total number of formal votes received by the parties/candidates. seats an integer for the number of seats to be filled (the district magnitude). ... Additional arguements (currently ignored) Value A data.frame of length parties containing apportioned integers (seats) summing to seats. Note Adapted from Carlos Bellosta’s replies in the R-list. Dummify 25 Author(s) Daniel Marcelino, <[email protected]>. References Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democracies, 1945-1990. Oxford University Press. See Also HighestAverages, LargestRemainders, Hamilton, PoliticalDiversity. Examples votes <- sample(1:10000, 5) parties <- sample(LETTERS, 5) dHondt(parties, votes, seats = 4) # Example: 2014 Brazilian election for the lower house in # the state of Ceara. Coalitions were leading by the # following parties: results <- c(DEM=490205, PMDB=1151547, PRB=2449440, PSB=48274, PSTU=54403, PTC=173151) dHondt(parties=names(results), votes=results, seats=19) Dummify Generate Dummy Variables in a Data Frame Description Quickly create dummies in a data frame. Usage Dummify(x, .data = NULL, drop = TRUE) Arguments x the column position to generate dummies .data the a data.frame. drop A logical value. If TRUE, unused levels will be omitted. 26 election2008 Details A matrix object Author(s) Daniel Marcelino, <[email protected]> Examples df <- data.frame(y = rnorm(10), x = runif(10,0,1), sex = sample(1:2, 10, rep=TRUE)) Dummify(df$sex) ; cbind(df, Dummify(df$sex)); election2008 2008 U.S. presidential election Description State-by-state information from the 2008 U.S. presidential election. Usage data(election2008) Format A data.frame object with 7 variables and 51 observations. • State Name of the state. • Abr Abbreviation for the state. • Income Per capita income in the state as of 2007 (in dollars). • HS Percentage of adults with at least a high school education. • BA Percentage of adults with at least a college education. • Dem.Rep Difference in %Democrat-%Republican (according to 2008 Gallup survey). • ObamaWin 1= Obama (Democrat) wins state in 2008 or 0=McCain (Republican wins). @source State income data from: Census Bureau Table 659. Personal Income Per Capita (in 2007). High school data from: U.S. Census Bureau, 1990 Census of Population, http://www.nces.ed. gov/programs/digest/d08/tables/dt08_011.asp. College data from: Census Bureau Table 225. Educational Attainment by State (in 2007) Democrat and Republican data from: http:// www.gallup.com/poll/114016/state-states-political-party-affiliation.aspx Flag 27 Flag Add an "id" Variable to a Dataset Description Many functions will not work properly if there are duplicated ID variables in a dataset. This function is a convenience function for .N from the "data.table" package to create an .id. variable that when used in conjunction with the existing ID variables, should be unique. Usage Flag(.data, id.vars = NULL) Arguments .data The input data.frame or data.table. id.vars The variables that should be treated as ID variables. Defaults to NULL, at which point all variables are used to create the new ID variable. Value The input dataset (as a data.table) if ID variables are unique, or the input dataset with a new column named ".ID". Examples df <- data.frame(A = c("a", "a", "a", "b", "b"), B = c(1, 1, 1, 1, 1), values = 1:5); df df = Flag(df, c("A", "B")) df <- data.frame(A = c("a", "a", "a", "b", "b"), B = c(1, 2, 1, 1, 2), values = 1:5) df (df <- Flag(df, 1:2) ) Footnote Add Footnote to ggplot2 Objects Description Add footnote to ggplot2 objects. 28 Forge Usage Footnote(note, x = 0.85, y = 0.014, hjust = 0.5, vjust = 0.5, newlines = TRUE, fontfamily = "serif", fontface = "plain", color = "gray85", size = 9, angle = 0, lineheight = 0.9, alpha = 1) Arguments note x y hjust vjust newlines fontfamily fontface color size angle lineheight alpha Forge string or plotmath expression to be drawn. the x location of the label. the y location of the label. horizontal justification vertical justification should a new line be appended to the start and end of the string, for spacing? the font family the font face ("plain", "bold", etc.) text color point size of text angle at which text is drawn line height of text the alpha value of the text Veritical, left-aligned layout for plots Description Left-align the waffle plots by x-axis. Use the pad parameter in waffle to pad each plot to the max width (num of squares), otherwise the plots will be scaled. Usage Forge(...) Arguments ... one or more waffle plots Examples parts <- c(80, 30, 20, 10) w1 <- Waffleplot(parts, rows=8) w2 <- Waffleplot(parts, rows=8) w3 <- Waffleplot(parts, rows=8) chart <- Forge(w1, w2, w3) print(chart) Formatted 29 Formatted Some Formats for Nicer Display Description Some predefined formats for nicer display. Usage Formatted(x, style = c("USD", "BRL", "EUR", "Perc"), digits = 2, nsmall = 2, decimal.mark = getOption("OutDec"), flag = "") Arguments x a numeric vector. style a character name for style. One of "USD", "BRL", "EUR", "Perc". digits an integer for the number of significant digits to be used for numeric and complex x nsmall an integer for the minimum number of digits to the right of the decimal point. decimal.mark decimal mark style to be used with Percents (%), usually (",") or ("."). flag a character string giving a format modifier as "-", "+", "#". Examples x <- as.double(c(0.1, 1, 10, 100, 1000, 10000)) Formatted(x) Formatted(x, "BRL") Formatted(x, "EUR") p = c(0.25, 25, 50) Formatted(p, "Perc", flag="+") Formatted(p, "Perc", decimal.mark=",") 30 freq freq Simple Frequency Table Description Creates a frequency table or data frame. Usage freq(x, weighs = NULL, breaks = graphics::hist(x, plot = FALSE)$breaks, digits = 3, include.lowest = TRUE, order = c("desc", "asc", "level", "name"), perc = FALSE, useNA = c("no", "ifany", "always"), ...) ## Default S3 method: freq(x, weighs = NULL, breaks = graphics::hist(x, plot = FALSE)$breaks, digits = 3, include.lowest = TRUE, order = c("desc", "asc", "level", "name"), perc = FALSE, useNA = c("no", "ifany", "always"), ...) Arguments x A vector of values for which the frequency is desired. weighs A vector of weights. breaks one of: 1) a vector giving the breakpoints between histogram cells; 2) a function to compute the vector of breakpoints; 3) a single number giving the number of cells for the histogram; 4) a character string naming an algorithm to compute the number of cells (see ’Details’); 5) a function to compute the number of cells. digits The number of significant digits required. include.lowest Logical; if TRUE, an x[i] equal to the breaks value will be included in the first (or last) category or bin. order The order method. perc logical; if TRUE percents are returned. useNA Logical; if TRUE NA’s values are included. ... Additional arguements (currently ignored) Author(s) Daniel Marcelino, <[email protected]>. See Also Frequency, Crosstable. Frequency 31 Frequency Frequency Table Description Simulating the FREQ procedure of SPSS. Usage Frequency(.data, x = NULL, verbose = TRUE, ...) ## Default S3 method: Frequency(.data, x = NULL, verbose = TRUE, ...) Freq(.data, x = NULL, verbose = TRUE, ...) Arguments .data The data.frame. x A column for which a frequency of values is desired. verbose A logical value, if TRUE, extra statistics are also provided. ... Additional arguements (currently ignored) See Also freq, Crosstable. Examples data(cathedrals) Frequency(cathedrals, Type) # may work with operators like %>% cathedrals %>% Frequency(Height) 32 GeomSpotlight galton Galton’s Family Data on Human Stature. Description It is a reproduction of the data set used by Galton in his 1885’s paper on correlation between parent’s height and their children, although the concept of correlation would be only introduced few years later, in 1888. This data.frame contains the following columns: • parent The parents’ average height • child The child’s height Usage data(galton) Format A data.frame object with 2 variables and 928 observations. Details Regression analysis is the statistical method most often used in political science research. The reason is that most scholars are interested in identifying “causal” effects from non-experimental data and that the linear regression is the method for doing this. The term “regression” (1889) was first crafted by Sir Francis Galton upon investigating the relationship between body size of fathers and sons. Thereby he “invented” regression analysis by estimating: Ss = 85.7 + 0.56SF , meaning that the size of the son regresses towards the mean. References Francis Galton (1886) Regression Towards Mediocrity in Hereditary Stature. The Journal of the Anthropological Institute of Great Britain and Ireland, Vol. 15, pp. 246–263. GeomSpotlight Automatically enclose points in a polygon Description Automatically enclose points in a polygon Usage geom_spotlight(mapping = NULL, data = NULL, stat = "identity", position = "identity", na.rm = FALSE, show.legend = NA, inherit.aes = TRUE, ...) geom_dumbbell 33 Arguments mapping mapping data data stat stat position position na.rm na.rm show.legend show.legend inherit.aes inherit.aes ... dots Value adds a circle around the specified points Examples d <- data.frame(x=c(1,1,2),y=c(1,2,2)*100) gg gg gg gg <- ggplot(d,aes(x,y)) <- gg + scale_x_continuous(expand=c(0.5,1)) <- gg + scale_y_continuous(expand=c(0.5,1)) + geom_spotlight(s_shape=1, expand=0) + geom_point() gg <- ggplot(mpg, aes(displ, hwy)) ss <- subset(mpg,hwy>29 & displ<3) gg + geom_spotlight(data=ss, colour="blue", s_shape=.8, expand=0) + geom_point() + geom_point(data=ss, colour="blue") geom_dumbbell Dumbell charts Description The dumbbell geom is used to create dumbbell charts. Dumbbell dot plots–dot plots with two or more series of data–are an alternative to the clustered bar chart or slope graph. Usage geom_dumbbell(mapping = NULL, data = NULL, ..., point.colour.l = NULL, point.size.l = NULL, point.colour.r = NULL, point.size.r = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) 34 geom_dumbbell Arguments mapping Set of aesthetic mappings created by aes or aes_. If specified and inherit.aes = TRUE (the default), it is combined with the default mapping at the top level of the plot. You must supply mapping if there is no plot mapping. data The data to be displayed in this layer. There are three options: If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot. A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify for which variables will be created. A function will be called with a single argument, the plot data. The return value must be a data.frame., and will be used as the layer data. ... other arguments passed on to layer. These are often aesthetics, used to set an aesthetic to a fixed value, like color = "red" or size = 3. They may also be parameters to the paired geom/stat. point.colour.l the colour of the left point point.size.l the size of the left point point.colour.r the colour of the right point point.size.r the size of the right point na.rm If FALSE (the default), removes missing values with a warning. If TRUE silently removes missing values. show.legend logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes. inherit.aes If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn’t inherit behaviour from the default plot specification, e.g. borders. Aesthetics geom_segment understands the following aesthetics (required aesthetics are in bold): • x • xend • y • yend • alpha • colour • linetype • size geom_lollipop 35 Examples df <- data.frame(trt=LETTERS[1:5], l=c(20, 40, 10, 30, 50), r=c(70, 50, 30, 60, 80)) ggplot(df, aes(y=trt, x=l, xend=r)) + geom_dumbbell() ggplot(df, aes(y=trt, x=l, xend=r)) + geom_dumbbell(color="#a3c4dc", size=0.75, point.colour.l="#0e668b") geom_lollipop Lollipop charts Description The lollipop geom is used to create lollipop charts. Lollipop charts are the creation of Andy Cotgreave going back to 2011. They are a combination of a thin segment, starting at with a dot at the top and are a suitable alternative to or replacement for bar charts. Geom Proto Usage geom_lollipop(mapping = NULL, data = NULL, ..., horizontal = FALSE, point.colour = NULL, point.size = NULL, stem.colour = NULL, stem.size = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE) Arguments mapping data ... horizontal Set of aesthetic mappings created by aes or aes_. If specified and inherit.aes = TRUE (the default), it is combined with the default mapping at the top level of the plot. You must supply mapping if there is no plot mapping. The data to be displayed in this layer. There are three options: If NULL, the default, the data is inherited from the plot data as specified in the call to ggplot. A data.frame, or other object, will override the plot data. All objects will be fortified to produce a data frame. See fortify for which variables will be created. A function will be called with a single argument, the plot data. The return value must be a data.frame., and will be used as the layer data. other arguments passed on to layer. These are often aesthetics, used to set an aesthetic to a fixed value, like color = "red" or size = 3. They may also be parameters to the paired geom/stat. If is FALSE (the default), the function will draw the lollipops up from the X axis (i.e. it will set xend to x & yend to 0). If TRUE, it wiill set yend to y & xend to 0). Make sure you map the x & y aesthetics accordingly. This parameter helps avoid the need for coord_flip(). 36 geom_lollipop point.colour the colour of the point point.size the size of the point stem.colour the colour of the stem stem.size the size of the stem na.rm If FALSE (the default), removes missing values with a warning. If TRUE silently removes missing values. show.legend logical. Should this layer be included in the legends? NA, the default, includes if any aesthetics are mapped. FALSE never includes, and TRUE always includes. inherit.aes If FALSE, overrides the default aesthetics, rather than combining with them. This is most useful for helper functions that define both data and aesthetics and shouldn’t inherit behaviour from the default plot specification, e.g. borders. Details Use the horizontal parameter to abate the need for coord_flip() (see the Arguments section for details). Aesthetics geom_point understands the following aesthetics (required aesthetics are in bold): • x • y • alpha • colour • fill • shape • size • stroke Examples df <- data.frame(trt=LETTERS[1:10], value=seq(100, 10, by=-10)) ggplot(df, aes(trt, value)) + geom_lollipop() ggplot(df, aes(value, trt)) + geom_lollipop(horizontal=TRUE) Gini 37 Gini Gini Coefficient G Description Computes the unweighted and weighted Gini index of a distribution. Usage Gini(x, weights, ...) ## Default S3 method: Gini(x, weights = rep(1, length = length(x)), ...) Arguments x a data.frame, a matrix-like, or a vector. weights a vector containing weights for x. ... additional arguements (currently ignored) Details One might say the Gini is oversensitive to changes in the middle, while undersensitive at the extremes. The G coefficient doesn’t capture very explicitly changes in the top 10 inequality research in the past 10 years – or the bottom 40 most poverty lies. The alternative Palma ratio does. Author(s) Daniel Marcelino, <[email protected]>. See Also GiniSimpson, Lorenz, Herfindahl, Rosenbluth, Atkinson.. Examples # generate a vector (of incomes) x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012) # compute Gini coefficient Gini(x) # For Gini index: Gini(x)*100 Gini(c(100,0,0,0), c(1,33,33,33)) # Considers this A <- c(20000, 30000, 40000, 50000, 60000) B <- c(9000, 40000, 48000, 48000, 55000) 38 GiniSimpson Gini(A); Gini(B); GiniSimpson Gini-Simpson Index Description Computes the Gini/Simpson coefficient. NAs from the data are omitted. Usage GiniSimpson(x, na.rm = TRUE, ...) ## Default S3 method: GiniSimpson(x, na.rm = TRUE, ...) Arguments x na.rm ... a data.frame, a matrix-like, or a vector. a logical value to deal with NAs. additional arguements (currently ignored). Details The Gini-Simpson quadratic index is a classic measure of diversity, widely used by social scientists and ecologists. The Gini-Simpson is also known as Gibbs-Martin index in sociology, psychology and management studies, which PRin turn is also known as the Blau index. The Gini-Simpson index is computed as 1 − λ = 1 − i=1 p2i = 1 − 1/2 D. Author(s) Daniel Marcelino, <[email protected]>. See Also Gini. Examples # generate a vector (of incomes) x <- as.table(c(69,50,40,22)) # let's say AB coalesced rownames(x) <- c("AB","C","D","E") print(x) GiniSimpson(x) griliches76 griliches76 39 Griliches’s (1976) Data Description Data set used by Griliches (1976) on wages of very young men. This dataset contains the following columns: • • • • • • • • • • • • • • • • • • • • rns residency in South. rns80 a numeric vector. mrt marital status = 1 if married. mrt80 a numeric vector. smsa reside metro area = 1 if urban. smsa80 a numeric vector. med mother’s education, years. iq iq score. kww score on knowledge in world of work test. year Year. age a numeric vector. age80 a numeric vector. s completed years of schooling. s80 a numeric vector. expr experience, years. expr80 a numeric vector. tenure tenure, years. tenure80 a numeric vector. lw log wage. lw80 a numeric vector. Usage data(griliches76) Format A data.frame object with 20 variables and 758 observations. Source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http: //fmwww.bc.edu/ec-p/data/Hayashi/ References Griliches, Z. (1976) Wages of very young men. The Journal of Political Economy, 84(4), 69–85. 40 Hamilton Hamilton The Hamilton Method of Allocating Seats Proportionally Description Computes the Alexander Hamilton’s apportionment method (1792), also known as Hare-Niemeyer method or as Vinton’s method. The Hamilton method is a largest-remainder method which uses the Hare Quota. Usage Hamilton(parties = NULL, votes = NULL, seats = NULL, ...) Hamilton(parties = NULL, votes = NULL, seats = NULL, ...) Arguments parties A vector containig parties labels or candidates in the same order of votes. votes A vector with the formal votes received by the parties/candidates. seats An integer for the number of seats to be returned. ... Additional arguements (currently ignored) Details The Hamilton/Vinton Method sets the divisor as the proportion of the total population per house seat. After each state’s population is divided by the divisor, the whole number of the quotient is kept and the fraction dropped resulting in surplus house seats. Then, the first surplus seat is assigned to the state with the largest fraction after the original division. The next is assigned to the state with the second-largest fraction and so on. Value A data.frame of length parties containing apportioned integers (seats) summing to seats. Author(s) Daniel Marcelino, <[email protected]>. References Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democracies, 1945-1990. Oxford University Press. See Also dHondt, HighestAverages, LargestRemainders, PoliticalDiversity. Herfindahl 41 Examples votes <- sample(1:10000, 5) parties <- sample(LETTERS, 5) Hamilton(parties, votes, seats = 4) Herfindahl Herfindahl Index of Concentration Description Calculates the Herfindahl Index of concentration. Usage Herfindahl(x, n = rep(1, length(x)), base = 1, na.rm = FALSE, ...) ## Default S3 method: Herfindahl(x, n = rep(1, length(x)), base = NULL, na.rm = FALSE, ...) Arguments x a vector of data values of non-negative elements. n a vector of frequencies of the same length as x. base a parameter of the concentration measure (if NULL, the default parameter (1.0) is used instead). na.rm a logical. Should missing values be removed? The Default is set to na.rm=FALSE. ... additional arguements (currently ignored) Details This index is also known as the Simpson Index in ecology, the Herfindahl-Hirschman Index (HHI) in economics, and as the Effective Number of Parties (ENP) in political science. References Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Handbook of Income Distribution. Amsterdam. Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall. See Also Atkinson, Rosenbluth, PoliticalDiversity, Gini. For more details see the Indices vignette: vignette("Indices", package = "SciencesPo") 42 HighestAverages Examples # generate a vector (of incomes) x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012) # compute the Herfindahl coefficient with base=1 Herfindahl(x, base=1) HighestAverages Highest Averages Methods of Allocating Seats Proportionally Description Computes the highest averages method for a variety of formulas of allocating seats proportionally. Usage HighestAverages(parties = NULL, votes = NULL, seats = NULL, method = c("dh", "sl", "msl", "danish", "hsl", "hh", "imperiali", "wb", "jef", "ad", "hb"), threshold = 0, ...) ## Default S3 method: HighestAverages(parties = NULL, votes = NULL, seats = NULL, method = c("dh", "sl", "msl", "danish", "hsl", "hh", "imperiali", "wb", "jef", "ad", "hb"), threshold = 0, ...) Arguments parties A character vector for parties labels or candidates in the same order as votes. If NULL, alphabet will be assigned. votes A numeric vector for the number of formal votes received by each party or candidate. seats The number of seats to be filled (scalar or vector). method A character name for the method to be used. See details. threshold A numeric value between (0~1). Default is set to 0. ... Additional arguements (currently ignored) Details The following methods are available: • "dh"d’Hondt method • "sl"Sainte-Lague method • "msl"Modified Sainte-Lague method HighestAverages 43 • "danish"Danish modified Sainte-Lague method • "hsl"Hungarian modified Sainte-Lague method • "imperiali"The Italian Imperiali (not to be confused with the Imperiali Quota, which is a Largest remainder method) • "hh"Huntington-Hill method • "wb"Webster’s method • "jef"Jefferson’s method • "ad"Adams’s method • "hb"Hagenbach-Bischoff method Value A data.frame of length parties containing apportioned integers (seats) summing to seats. Author(s) Daniel Marcelino, <[email protected]>. References Gallagher, Michael (1992). "Comparing Proportional Representation Electoral Systems: Quotas, Thresholds, Paradoxes and Majorities". British Journal of Political Science, 22, 4, 469-496. Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democracies, 1945-1990. Oxford University Press. See Also LargestRemainders, Proportionality, PoliticalDiversity. For more details see the Indices vignette: vignette('Indices', package = 'SciencesPo'). Examples # Results for the state legislative house of Ceara (2014): votes <- c(187906, 326841, 132531, 981096, 2043217, 15061, 103679,109830, 213988, 67145, 278267) parties <- c("PCdoB", "PDT", "PEN", "PMDB", "PRB", "PSB", "PSC", "PSTU", "PTdoB", "PTC", "PTN") HighestAverages(parties, votes, seats = 42, method = "dh") # Let's create a data.frame with typical election results # with the following parties and votes to return 10 seats: my_election_data <- data.frame( party=c("Yellow", "White", "Red", "Green", "Blue", "Pink"), votes=c(47000, 16000,15900,12000,6000,3100)) HighestAverages(my_election_data$party, my_election_data$votes, 44 Jackknife seats = 10, method="dh") # How this compares to the Sainte-Lague Method (dat= HighestAverages(my_election_data$party, my_election_data$votes, seats = 10, method="sl")) # Plot it # Barplot(data=dat, "Party", "Seats") + # theme_fte() Jackknife Resamples Data Using the Jackknife Method Description This function is used for estimating standard errors when the distribution is not know. Usage Jackknife(x, p) Arguments x p A vector An function name for estimation of parameter as a string Value est orignial estimation of parameter jkest jackknife estimation of parameter jkvar jackknife estimation of variance jkbias jackknife estimate of biasness of parameter jkbiascorr bias corrected parameter estimate Author(s) Daniel Marcelino, <[email protected]> Examples x = runif(10, 0, 1) mean(x) Jackknife(x,'mean') JamesStein JamesStein 45 James-Stein Shrunken Estimates Description Computes James-Stein shrunken estimates of cell means given a response variable (which may be binary) and a grouping indicator. Usage JamesStein(x, tau2 = 1) JamesStein(x, tau2 = 1) Arguments x a vector, matrix, or array of numerics; the data. tau2 a positive numeric. The variance, assumed known and defaults to 1. # @param k the grouping factor. ... extra parameters typically ignored. Details Missing values are by default removed. References Efron, Bradley and Morris, Carl (1977) “Stein’s Paradox in Statistics.” Scientific American Vol. 236 (5): 119-127. James, W., & Stein, C. (1961, June). Estimation with quadratic loss. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 1, No. 1961, pp. 361-379). Kurtosis Compute the Kurtosis Description Provides three methods for performing kurtosis test. Usage Kurtosis(x, na.rm = FALSE, type = 3) 46 Kurtosis Arguments x a numeric vector na.rm a logical value for na.rm, default is na.rm=FALSE. type an integer between 1 and 3 selecting one of the algorithms for computing kurtosis detailed below Details Kurtosis measures the peakedness of a data distribution. A distribution with zero kurtosis has a shape as the normal curve (mesokurtic, or mesokurtotic). A positive kurtosis has a curve more peaked about the mean and the its shape is narrower than the normal curve (leptokurtic, or leptokurtotic). A distribution with negative kurtosis has a curve less peaked about the mean and the its shape is flatter than the normal curve (platykurtic, or platykurtotic). Value An object of the same type as x. Note To be consistent with classical use of kurtosis in political science analyses, the default type is the same equation used in SPSS and SAS, which is the bias-corrected formula: Type 2: G_2 = ((n + 1) g_2+6) * (n-1)/(n-2)(n-3). When you set type to 1, the following equation applies: Type 1: g_2 = m_4/m_2^2-3. When you set type to 3, the following equation applies: Type 3: b_2 = m_4/s^4-3 = (g_2+3)(1-1/n)^2-3. You must have at least 4 observations in your vector to apply this function. Skewness and Kurtosis are functions to measure the third and fourth central moment of a data distribution. Author(s) Daniel Marcelino, <[email protected]> References Balanda, K. P. and H. L. MacGillivray. (1988) Kurtosis: A Critical Review. The American Statistician, 42(2), pp. 111–119. Examples w<-sample(4,10, TRUE) x <- sample(10, 1000, replace=TRUE, prob=w) Kurtosis(x, type=1) Kurtosis(x, type=2) Kurtosis(x) Label 47 Label Get Variable Label Description This function returns character value previously stored in variable’s label attribute. If none found, and fallback argument is set to TRUE, the function returns object’s name (retrieved by deparse(substitute(x))), otherwise NA is returned with a warning notice. Usage Label(x, fallback = TRUE, simplify = TRUE) Arguments x an R object to extract labels from. fallback a logical indicating if labels should fallback to object name(s). Default is set to TRUE. simplify a logical indicating if coerce results to a vector, otherwise, a list is returned. Default is set to TRUE. Author(s) Daniel Marcelino, <[email protected]> Examples x <- rnorm(100) Label(x) # returns "x" Label(twins$IQb) <- "IQ biological scores" Label(twins, FALSE) Label<- # returns NA where no labels are found Set Variable Label Description This function sets a label to a variable, by storing a character string to its label attribute. Usage Label(x) <- value 48 LargestRemainders Arguments x value a valid variable i.e. a non-NULL atomic vector that has no dimension attribute (see dim for details). a character value that is to be set as variable label See Also label Examples ## Not run: Label(religiosity$Religiosity) <- "Religiosity index" vec <- rnorm(100) Label(vec) <- "random numbers" ## End(Not run) LargestRemainders Largest Remainders Methods of Allocating Seats Proportionally Description Computes the largest remainders method for a variety of formulas of allocating seats proportionally. Usage LargestRemainders(parties = NULL, votes = NULL, seats = NULL, method = c("hare", "droop", "hagb", "imperiali", "imperiali.adj"), threshold = 0, ...) ## Default S3 method: LargestRemainders(parties = NULL, votes = NULL, seats = NULL, method = c("hare", "droop", "hagb", "imperiali", "imperiali.adj"), threshold = 0, ...) Arguments parties votes seats method threshold ... A character vector for parties labels or candidates in the order as votes. If NULL, a random combination of letters will be assigned. A numeric vector for the number of formal votes received by each party or candidate. The number of seats to be filled (scalar or vector). A character name for the method to be used. See details. A numeric value between (0~1). Default is set to 0. Additional arguements (currently ignored) LargestRemainders 49 Details The following methods are available: • "droop"Droop quota method • "hare"Hare method • "hagb"Hagenbach-Bischoff • "imperiali"Imperiali quota (do not confuse with the Italian Imperiali, which is a highest averages method) • "imperiali.adj"Reinforced or adjusted Imperiali quota Value A data.frame of length parties containing apportioned integers (seats) summing to seats. Author(s) Daniel Marcelino, <[email protected]>. References Gallagher, Michael (1992). "Comparing Proportional Representation Electoral Systems: Quotas, Thresholds, Paradoxes and Majorities". British Journal of Political Science, 22, 4, 469-496. Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democracies, 1945-1990. Oxford University Press. See Also HighestAverages, Proportionality, PoliticalDiversity. For more details see the Indices vignette: vignette('Indices', package = 'SciencesPo'). Examples # Let's create a data.frame with typical election results # with the following parties and votes to return 10 seats: my_election_data <- data.frame( party=c("Yellow", "White", "Red", "Green", "Blue", "Pink"), votes=c(47000, 16000,15900,12000,6000,3100)) LargestRemainders(my_election_data$party, my_election_data$votes, seats = 10, method="droop") with(my_election_data, LargestRemainders(party, votes, seats = 10, method="hare")) 50 Lorenz Lorenz The Lorenz Curve Description Computes the (empirical) ordinary and generalized Lorenz curve of a vector. Usage Lorenz(x, n = rep(1, length(x)), plot = FALSE, ...) ## Default S3 method: Lorenz(x, n = rep(1, length(x)), plot = FALSE, ...) Arguments x A vector of non-negative values. n A vector of frequencies of the same length as x. plot A logical. If TRUE the Lorenz curve will be plotted. ... Additional arguements (currently ignored) Details The Gini coefficient ranges from a minimum value of zero, when all individuals are equal, to a theoretical maximum of one in an infinite population in which every individual except one has a size of zero. It has been shown that the sample Gini coefficients originally defined need to be multiplied by n/(n-1) in order to become unbiased estimators for the population coefficients. Author(s) Daniel Marcelino, <[email protected]>. See Also Gini, GiniSimpson. Examples # generate a vector (of incomes) x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012) # compute Lorenz values Lorenz(x) # generate some weights: wgt <- runif(n=length(x)) # compute the lorenz with especific weights Lorenz(x, wgt) ltaylor96 ltaylor96 51 Lothian’s and Taylor’s (1996) Data Set Description Data used by Lothian and Taylor (1996) on the real exchange rate behaviour. This dataset contains the following columns: • year Year • spot dollar/sterling exchange rate. • USwpi U.S. wholesale price index, 1914==100. • UKwpi U.K. wholesale price index, 1914==100. Usage data(ltaylor96) Format A data.frame object with 4 variables and 200 observations. Source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http: //fmwww.bc.edu/ec-p/data/Hayashi/ References Lothian, J. R., and Taylor, M. P. (1996) Real exchange rate behavior: the recent float from the perspective of the past two centuries. Journal of Political Economy, 488–509. MakeNumeric Clean up a character vector to make it numeric Description Remove commas and whitespace, primarily Usage MakeNumeric(x) Arguments x character vector to process 52 marriage Value numeric vector MakeShare Clean up a character vector to make it a percent Description Remove " Usage MakeShare(x) Arguments x character vector to process Value numeric vector marriage Same Sex Marriage Public Opinion Data Description Data set fielded by the PEW Research Center on same sex marriage support in US. It covers public opinion on the issue starting from 1996 up to date. This dataset contains the following columns: • Date The year of the measurement • Oppose Percent opposing same-sex marriage • Favor Percent favoring same-sex marriage • DK Percent of Don’t Know Usage data(marriage) Format A data.frame object with 4 variables and 18 observations. References PEW Research Center. Support for same-sex marriage. http://www.pewresearch.org MeanFromRange MeanFromRange 53 Estimates Mean and Standard Deviation from Median and Range Description When conductig a meta-analysis study, it is not always possible to recover from reports, the mean and standard deviation values, but rather the median and range of values. This function provides a method for computing the mean and variance estimates from median/range or IQR estimates. Usage MeanFromRange(low, med, high, n) Arguments low The min of the data. med The median of the data. high The max of the data n The size of the sample. Note One should use these calculations ONLY if there are strong hints, that the overall distribution does not relevantly deviate from normal distribution. The question is then, why some papers report median and IQR if data are not far from normal distribution. Author(s) Daniel Marcelino, <[email protected]> References Hozo, Stela P.; et al (2005) Estimating the mean and variance from the median, range, and the size of a sample. BMC Medical Research Methodology, 5:13. Examples # # # # # mean=median and SD = IQR/1.35. median= 3 Iqr= 2-3 median=mean=3 sd=iqr/1.35 2-3 means 2,5/1.35= sd = 1,85 MeanFromRange(low=5, med=8, high=12, n=10) 54 mishkin92 mishkin92 Mishkin’s (1992) Data Description Data from the Frederic S. Mishkin (1992) paper “Is the Fisher Effect for real?”. This dataset contains the following columns: • year Year • mon a numeric vector • inf1mo a numeric vector • inf3mo a numeric vector • tbill1mo a numeric vector • tbill3mo a numeric vector • cpiu a numeric vector • quote a numeric vector Usage data(mishkin92) Format A data.frame object with 8 variables and 491 observations. Source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http: //fmwww.bc.edu/ec-p/data/Hayashi/ References Mishkin, F. S. (1992) Is the Fisher effect for real?: A reexamination of the relationship between inflation and interest rates. Journal of Monetary Economics, 30(2), 195–215. nerlove63 nerlove63 55 Marc Nerlove’s (1963) data Description Data used by Marc Nerlove (1963) on returns of electricity supply. This dataset contains the following columns: • totcost costs in 1970, MM USD. • output output, billion KwH. • plabor price of labor. • pfuel price of fuel. • pkap price of capital. #’ @source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http://fmwww.bc.edu/ec-p/data/Hayashi/ Usage data(nerlove63) Format A data.frame object with 5 variables and 145 observations. References Nerlove, M. (1963) Returns to Scale in Electricity Supply. In Measurement in Economics-Studies in Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld, edited by Carl F. Christ. Stanford: Stanford. Normalize Unity-based normalization Description Normalizes as feature scaling min - max, or unity-based normalization typically used to bring the values into the range [0,1]. Other methods are also available, including scoring, centering, and Smithson and Verkuilen (2006) method of dependent variable transformation. Usage Normalize(x, method = "range", ...) 56 Normalize Arguments x is a vector to be normalized. method A string for the method used for normalization. Default is method = "range", which coerces values into [0,1] range. See details for other implemented methods. ... Additional arguements (currently ignored). Details The following methods are available: • "range"Ranging is done by coercing x values into [0,1] range. However, this may be generalized to follow the range of other arbitrary values using a and b: X0 = a + (x − xmin )(b − a) (xmax − xmin ) . • "scale"Scaling is done by dividing (centered) values of x by their standard deviations. • "center"Centering is done by subtracting the means (omitting NAs) of x from its observed value. • "z-score"Scoring is done by dividing the values of x from their root mean square. • "SV"Transform a dependent variable in [0,1] rather than (0, 1) to beta regression as suggested by Smithson and Verkuilen (2006). Value Normalized values in an object of the same class as x. Author(s) Daniel Marcelino, <[email protected]> References Smithson M, Verkuilen J (2006) A Better Lemon Squeezer? Maximum-Likelihood Regression with Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54-71. See Also scale. Examples x <- sample(10) (y = Normalize(x) ) # equivalently to Outlier 57 (x-min(x))/(max(x)-min(x)) # Smithson and Verkuilen approach (y = Normalize(x, method="SV") ) # look at what happens to the correlation of two independent variables and their "interaction" # With non-centered interactions: a = rnorm(10000,20,2) b = rnorm(10000,10,2) cor(a,b) cor(a,a*b) # With centered interactions: c = a - 20 d = b - 10 cor(c,c*d) Outlier Detect Outliers Description Perform exploratory test to detect outliers. Usage Outlier(x, index = NULL, ...) ## Default S3 method: Outlier(x, index = NULL, ...) Arguments x A numeric object, a vector. index A numeric value to be considered in the computations. ... Parameters which are typically ignored. Details The quantity in min reveals the minimum deviation from the mean, the integer value in the closest indicates the position of that element. The quantity in max is the maximum deviation from the mean, and the farthest integer value indicates the position of that value. Value Returns the minimum and maximum values, respectively preceded by their positions in the vector, matrix or data.frame. 58 paired Author(s) Daniel Marcelino, <[email protected]> References Dixon, W.J. (1950) Analysis of extreme values. Ann. Math. Stat. 21(4), 488–506. See Also Winsorize for reduce the impact of outliers. Examples Outlier(x <- rnorm(20)) #data frame: age <- sample(1:100, 1000, rep=TRUE); Outlier(age) paired Paired Data Description Artificial data for a paired experiment. This dataset contains the following columns: • patient the patient id. • before_X before treatment. • after_Y after treatment. Usage data(paired) Format A data.frame object with 3 variables and 9 observations. Palettes Palettes 59 Palette data for the themes used by package Description Data used by the palettes in the package. Usage Palettes Format A list. party_pal Political Parties Color Palette (Discrete) and Scales Description An N-color discrete palette for political parties. Usage party_pal(palette = "BRA", plot = FALSE, hex = FALSE) Arguments palette the palette name, a character string. plot logical, if TRUE a plot is returned. hex logical, if FALSE, the associated color name (label) is returned. See Also Other color party: scale_color_party Examples library(scales) # Brazil show_col(party_pal("BRA")(20)) # Argentine show_col(party_pal("ARG")(12)) 60 Permutate # US show_col(party_pal("USA")(6)) # Canada show_col(party_pal("CAN")(10)) party_pal("CAN", plot=TRUE, hex=FALSE) Permutate Create k random permutations of a vector Description Creates a k random permutation of a vector. Usage Permutate(x, k) Arguments x a vector to be permutated. k number of permutations to be conducted. Details should be used only for length(input)! » k Author(s) Daniel Marcelino, <[email protected]>. Examples # 5! = 5 x 4 x 3 x 2 x 1 = 120 # row wise permutations Permutate(x=1:5, k=5) Plotting Plotting 61 Plot Customizing Description Several wrapper functions to deal with ggplot2. Usage Previewplot() PreviewTheme() align_title_left() align_title_right() no_title() no_y_gridlines() no_x_gridlines() no_major_x_gridlines() no_major_y_gridlines() no_major_gridlines() no_minor_x_gridlines() no_minor_y_gridlines() no_minor_gridlines() no_gridlines() rotate_axes_text(angle) rotate_x_text(angle) rotate_y_text(angle) no_axes() no_x_axis() 62 Plotting no_y_axis() no_x_line() no_y_line() no_x_text() no_y_text() no_ticks() no_x_ticks() no_y_ticks() no_axes_titles() no_x_title() no_y_title() move_legend(position) legend_bottom() legend_top() legend_left() legend_right() no_legend() no_legend_title() Arguments angle the angle of rotation. position the location of the legend (’top’, ’bottom’, ’right’, ’left’ or ’none’). Author(s) NA Examples Previewplot() PoliticalDiversity 63 PoliticalDiversity Political Diversity Indices Description Computes political diversity indices or fragmentation/concetration measures like the effective number of parties for an electoral unity or across unities. The very intuition of these coefficients is to counting parties while weighting them by their relative political–or electoral strength. Usage PoliticalDiversity(x, index = "laakso/taagepera", margin = 1, base = exp(1)) ## Default S3 method: PoliticalDiversity(x, index = "laakso/taagepera", margin = 1, base = exp(1)) Arguments x A data.frame, a matrix, or a vector containing values for the number of votes or seats each party received. index The type of index desired, one of "laakso/taagepera", "golosov", "herfindahl", "gini", "shannon", "simpson", "invsimpson". margin The margin for which the index is computed. base The logarithm base used in some indices, such as the "shannon" index. Details Very often, political analysts say things like ‘two-party system’ and ‘multi-party system’ to refer to a particular kind of political party system. However, these terms alone does not tell exactly how fragmented–or concentrated a party system actually is. For instance, after the 2010 general election, 22 parties obtained representation in the Lower Chamber in Brazil. Nonetheless, among these 22 parties, nine parties together returned only 28 MPs. Thus, an index to assess the weigh or the Effective Number of Parties is important and helps to go beyond the simple count of parties in a legislative branch. A widely accepted algorithm was proposed by M. Laakso and R. Taagepera: 1 N=P 2 pi , where N denotes the effective number of parties and p_i denotes the ith party’s fraction of the seats. In fact, this formula may be used to compute the vote share for each party. This formula is the reciprocal of a well-known concentration index (the Herfindahl-Hirschman index) used in economics to study the degree to which ownership of firms in an industry is concentrated. Laakso 64 PoliticalDiversity and Taagepera correctly saw that the effective number of parties is simply an instance of the inverse measurement problem to that one. This index makes rough but fairly reliable international comparisons of party systems possible. The Inverse Simpson index, 1 1/λ = PR i=1 p2i = 2D Where λ equals the probability that two types taken at random from the dataset (with replacement) represent the same type. This simply equals true fragmentation of order 2, i.e. the effective number of parties that is obtained when the weighted arithmetic mean is used to quantify average proportional diversity of political parties in the election of interest. Another measure is the Least squares index (lsq), which measures the disproportionality produced by the election. Specifically, by the disparity between the distribution of votes and seats allocation. Recently, Grigorii Golosov proposed a new method for computing the effective number of parties in which both larger and smaller parties are not attributed unrealistic scores as those resulted by using the Laakso/Taagepera index.I will call this as (Golosov) and is given by the following formula: N= n X i=1 pi pi + p2i − p2i Author(s) Daniel Marcelino, <[email protected]>. References Gallagher, Michael and Paul Mitchell (2005) The Politics of Electoral Systems. Oxford University Press. Golosov, Grigorii (2010) The Effective Number of Parties: A New Approach, Party Politics, 16: 171-192. Laakso, Markku and Rein Taagepera (1979) Effective Number of Parties: A Measure with Application to West Europe, Comparative Political Studies, 12: 3-27. Nicolau, Jairo (2008) Sistemas Eleitorais. Rio de Janeiro, FGV. Taagepera, Rein and Matthew S. Shugart (1989) Seats and Votes: The Effects and Determinants of Electoral Systems. New Haven: Yale University Press. Examples # Here are some examples, help yourself: # The wikipedia examples A <- c(.75,.25); B <- c(.75,.10,rep(0.01,15)) C <- c(.55,.45); # The index by "laakso/taagepera" is the default PoliticalDiversity(A) PoliticalDiversity(B) pollster2008 65 # Using the method proposed by Golosov gives: PoliticalDiversity(B, index="golosov") PoliticalDiversity(C, index="golosov") # The 1980 presidential election in the US (vote share): US1980 <- c("Democratic"=0.410, "Republican"=0.507, "Independent"=0.066, "Libertarian"=0.011, "Citizens"=0.003, "Others"=0.003) PoliticalDiversity(US1980) # 2010 Brazilian legislative election votes_2010 = c("PT"=13813587, "PMDB"=11692384, "PSDB"=9421347, "DEM"=6932420, "PR"=7050274, "PP"=5987670, "PSB"=6553345, "PDT"=4478736, "PTB"=3808646, "PSC"=2981714, "PV"=2886633, "PC do B"=2545279, "PPS"=2376475, "PRB"=1659973, "PMN"=1026220, "PT do B"=605768, "PSOL"=968475, "PHS"=719611, "PRTB"=283047, "PRP"=232530, "PSL"=457490,"PTC"=563145) seats_2010 = c("PT"=88, "PMDB"=79, "PSDB"=53, "DEM"=43, "PR"=41, "PP"=41, "PSB"=34, "PDT"=28, "PTB"=21, "PSC"=17, "PV"=15, "PC do B"=15, "PPS"=12, "PRB"=8, "PMN"=4, "PT do B"=3, "PSOL"=3, "PHS"=2, "PRTB"=2, "PRP"=2, "PSL"=1,"PTC"=1) PoliticalDiversity(seats_2010) PoliticalDiversity(seats_2010, index= "golosov") pollster2008 Polls for 2008 U.S. presidential election Description Polls for the 2008 U.S. presidential election. The data includes all presidential polls reported on the internet site http://www.elections.huffingtonpost.com/pollster that were taken between August 29th, when John Mc-Cain announced that Sarah Palin would be his running mate as the Republican nominee for vice president, and the end of September. Usage data(pollster2008) Format A data.frame object with 11 variables and 102 observations. • PollTaker Polling organization. • PollDates Dates the poll data were collected. 66 Proportionality • MidDate Midpoint of the polling period. • Days Number of days after August 28th (end of Democratic convention). • n Sample size for the poll. • Pop A=all, LV=likely voters, RV=registered voters. • McCain Percent supporting John McCain. • Obama Percent supporting Barak Obama. • Margin Obama percent minus McCain percent. • Charlie Indicator for polls after Charlie Gibson interview with VP candidate Sarah Palin (9/11). • Meltdown Indicator for polls after Lehman Brothers bankruptcy (9/15). @source http://www.elections.huffingtonpost.com/pollster Proportionality Indexes of (Disproportionality) Proportionality Description Calculates several indexes of (dis)-proportionality, which are for the most part used to show the relationship of votes to seats. Usage Proportionality(v, s, index = "Gallagher", ...) ## Default S3 method: Proportionality(v, s, index = "Gallagher", margin = 1, ...) Arguments v a numeric vector with the percentage share of votes obtained by each party. s a numeric vector with the percentage share of seats obtained by each party. index the desired method or type of index, see details below for the correct name. margin The margin for which the index is computed. ... additional arguments (currently ignored) Proportionality 67 Details The following measures are available: • • • • • • • • • • • • • • • • • Loosemore-Hanby (Percent) Loosemore-Hanby Index of disproportionality Rae (Percent) Rae Index of disproportionality Cox-Shugart Cox-Shugart Index of proportionality Inv.Cox-Shugart The inverted Cox-Shugart index Farina Farina index of proportionality, aka cosine proportionality score "Gallagher" (Percent) Gallagher index of disproportionality "Inv.Gallagher" The inverse of Gallagher index "Grofman" Grofman index of proportionality "Inv.Grofman" Grofman index of proportionality "Lijphart" Lijphart index of proportionality "Inv.Rae" The inverse of Rae index "Rose" Rose index of disproportionality "Inv.Rose" The inverse of Rose index "Sainte-Lague" Sainte-Lague index of disproportionality "DHondt" D’Hondt index of proportionality "Gini" Gini index of disproportionality "Monroe" Monroe index of inequity Value Each iteration returns a single score. Author(s) Daniel Marcelino <[email protected]> References Duncan, O. and Duncan, B. (1955) A methodological analysis of segregation indexes. American Sociological Review 20:210-7. Gallagher, M. (1991) Proportionality, disproportionality and electoral systems. Electoral Studies 10(1):33-51. Loosemore, J. and Hanby, V. (1971) The theoretical limits of maximum distortion: Som analytical expressions for electoral systems. British Journal of Political Science 1:467-77. Koppel, M., and A. Diskin. (2009) Measuring disproportionality, volatility and malapportionment: axiomatization and solutions. Social Choice and Welfare 33, no. 2: 281-286. Rae, D. (1967) The Political Consequences of Electoral Laws. London: Yale University Press. Rose, Richard, Neil Munro and Tom Mackie (1998) Elections in Central and Eastern Europe Since 1990. Glasgow: Centre for the Study of Public Policy, University of Strathclyde. Taagepera, R., and B. Grofman. Mapping the indices of seats-votes disproportionality and interelection volatility. Party Politics 9, no. 6 (2003): 659-77. 68 pub_pal See Also PoliticalDiversity, LargestRemainders, HighestAverages. For more details, see the Indices vignette: vignette("Indices", package = "SciencesPo"). Examples #' # 2012 Queensland state elecion pvotes= c(49.65, 26.66, 11.5, 7.53, 3.16, 1.47) pseats = c(87.64, 7.87, 2.25, 0.00, 2.25, 0.00) Proportionality(pvotes, pseats) # default is Gallagher Proportionality(pvotes, pseats, index="Rae") # Proportionality(pvotes, pseats, index="Cox-Shugart") # 2012 Quebec provincial election: pvotes = c(PQ=31.95, Lib=31.20, CAQ=27.05, QS=6.03, Option=1.89, Other=1.88) pseats = c(PQ=54, Lib=50, CAQ=19, QS=2, Option=0, Other=0) Proportionality(pvotes, pseats, index="Rae") pub_pal Color Palettes for Publication (discrete) Description Color palettes for publication-quality graphs. See details. Usage pub_pal(palette = "pub12") Arguments palette the palette name, a character string. Details The following palettes are available: • "scipo"A 12-color colorblind discrete palette. • "gray9"5-tons of gray. • "carnival"A 5-color palette inspired in the Brazilian samba schools. • "tableau20"Based on software Tableau • "tableau10"Based on software Tableau pub_shapes 69 • "tableau10light"Based on software Tableau • "tableau10medium"Based on software Tableau • "fte"fivethirtyeight.com color scales Examples library(scales) library(ggplot2) show_col(pub_pal("scipo")(12)) show_col(pub_pal("gray9")(9), labels = FALSE) show_col(pub_pal("carnival")(4)) show_col(pub_pal("gdocs")(18)) show_col(pub_pal("manyeyes")(20)) show_col(pub_pal("tableau10")(10)) show_col(pub_pal("tableau10medium")(10)) show_col(pub_pal("tableau10light")(10)) show_col(pub_pal("cyclic")(20)) show_col(pub_pal("bivariate1")(9)) pub_shapes Shape scales for theme_pub (discrete) Description Discrete shape scales for theme_pub(). Usage pub_shapes(palette = "pub") Arguments palette the palette name, a character string. See Also Other shape pub: scale_shape_pub Examples library(scales) pub_shapes("default")(6) pub_shapes("proportions")(4) pub_shapes("gender")(2) 70 rdirichlet rdirichlet Dirichlet Distribution Description Density function and random number generation for the Dirichlet distribution Usage rdirichlet(n, alpha) Arguments n number of random observations to draw. alpha the Dirichlet distribution’s parameters. Can be a vector (one set of parameters for all observations) or a matrix (a different set of parameters for each observation), see “Details”. If alpha is a matrix, a complete set of α-parameters must be supplied for each observation. log returns the logarithm of the densities (therefore the log-likelihood) and sum.up returns the product or sum and thereby the likelihood or log-likelihood. Value The rdirichlet returns a matrix with n rows, each containing a single random number according to the supplied alpha vector or matrix. Author(s) Daniel Marcelino, <[email protected]> Examples # 1) General usage: rdirichlet(20, c(1,1,1) ); alphas <- cbind(1:10, 5, 10:1); alphas; rdirichlet(10, alphas ); alpha.0 <- sum( alphas ); test <- rdirichlet(10, alphas ); apply( test, 2, mean ); alphas / alpha.0; apply( test, 2, var ); alphas * ( alpha.0 - alphas ) / ( alpha.0^2 * ( alpha.0 + 1 ) ); # # # # 2) A practical example of usage: A Brazilian face-to-face poll by Datafolha conducted on Oct 03-04 with 18,116 interviews asking for their vote preferences among the presidential candidates. read.zTree 71 ## First, draw a sample from the posterior set.seed(1234); n <- 18116; poll <- c(40,24,22,5,5,4) / 100 * n; # The data mcmc <- 100000; sim <- rdirichlet(mcmc, alpha = poll + 1); ## Second, look at the margins of Aecio over Marina in the very last moment of the campaign: margin <- sim[,2] - sim[,3]; mn <- mean(margin); # Bayes estimate mn; s <- sd(margin); # posterior standard deviation qnts <- quantile(margin, probs = c(0.025, 0.975)); # 90% credible interval qnts; pr <- mean(margin > 0); # posterior probability of a positive margin pr; ## Third, plot the posterior density hist(margin, prob = TRUE, # posterior distribution breaks = "FD", xlab = expression(p[2] - p[3]), main = expression(paste(bold("Posterior distribution of "), p[2] - p[3]))); abline(v=mn, col='red', lwd=3, lty=3); read.zTree Reads zTree output files Description Extracts variables from a zTree output file. Usage read.zTree(object, tables = c("globals", "subjects")) Arguments object a zTree file or a list of files. tables the tables of intrest. Value A list of dataframes, one for each table 72 Recode Examples ## Not run: url <zTables <- read.zTree( "131126_0009.xls" , "contracts" ) zTables <- read.zTree( c("131126_0009.xls", "131126_0010.xls"), c("globals","subjects", "contracts" )) ## End(Not run) Recode Recode or Replace Values With New Values Description Recodes a value or a vector of values. Usage Recode(x, old, into, warn = TRUE) ## Default S3 method: Recode(x, old, into, warn = TRUE) Arguments x The vector whose values will be recoded. old the old value or a vector of old values (numeric/factor/string). into the new value or a vector of replacement values (1 to 1 recoding). warn A logical to print a message if any of the old values are not actually present in x. Examples x <- LETTERS[1:5] Recode(x, c("B", "D"), c("Beta", "Delta")) # On numeric vectors x <- c(1, 4, 5, 9) Recode(x, old = c(1, 4, 5, 9), into = c(10, 40, 50, 90)) religiosity 73 religiosity Data on religiosity of countries Description Data from the 2007 Spring Survey conducted through the Pew Global Attitudes Project. Usage data(religiosity) Format A data.frame object with 9 variables and 44 observations. • • • • • • • • • Country Name of country. Religiosity A measure of degree of religiosity for residents of the country. GDP Per capita Gross Domestic Product in the country. Africa Indicator for countries in Africa. EastEurope Indicator for countries in Eastern Europe. MiddleEast Indicator for countries in the Middle East. Asia Indicator for countries in Asia. WestEurope Indicator for countries in Western Europe. Americas Indicator for countries in North/South America. @source The Pew Research Center’s Global Attitudes Project surveyed people around the world and asked (among many other questions) whether they agreed that "belief in God is necessary for morality," whether religion is very important in their lives, and whether the at least once per day. The variable Religiosity is the sum of the percentage of positive responses on these three items, measured in each of 44 countries. The dataset also includes the per capita GDP for each country and indicator variables that record the part of the world the country is in. http://www.pewglobal.org Render Render layer on top of a ggplot Description Set up a rendering layer on top of a ggplot. Usage Render(plot = NULL) Arguments plot the plot to use as a starting point, either a ggplot2 or gtable. 74 Rosenbluth Rosenbluth Rosenbluth Index of Concentration Description Calculates the Rosenbluth index of concentration, also known as Hall or Tiedemann Indices. Usage Rosenbluth(x, n = rep(1, length(x)), na.rm = FALSE, ...) ## Default S3 method: Rosenbluth(x, n = rep(1, length(x)), na.rm = FALSE, ...) Arguments x A vector of data values of non-negative elements. n A vector of frequencies of the same length as x. na.rm A logical. Should missing values be removed? The Default is set to na.rm=FALSE. ... Additional arguements (currently ignored) References Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Handbook of Income Distribution. Amsterdam. Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall. See Also Atkinson, Herfindahl, Gini, Lorenz. For more details see the Indices vignette: vignette("Indices", package = "Scien Examples # generate a vector (of incomes) x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012) # compute Rosenbluth coefficient Rosenbluth(x) RunShinyApp 75 RunShinyApp Examples from the SciencesPo Package Description Launch a Shiny app that shows a demo. Usage RunShinyApp(app) Arguments app the name of the shiny application. Examples # A demo of what \code{\link[SciencesPo]{PoliticalDiversity}} does. if (interactive()) { RunShinyApp("PoliticalDiversity") } # Other examples if (interactive()) { RunShinyApp("ConjugateNormal") } if (interactive()) { RunShinyApp("Regression") } scale_color_party Political Parties Color Palette (Discrete) and Scales Description An N-color discrete palette for political parties. Usage scale_color_party(palette = "BRA", ...) Arguments palette the palette name, a character string. ... Other arguments passed on to discrete_scale to control name, limits, breaks, labels and so forth. 76 scale_fill_party See Also party_pal for details and references. Other color party: party_pal scale_color_pub Publication color scales. Description See pub_pal for details. Usage scale_color_pub(palette = "pub12", ...) Arguments palette ... the palette name, a character string. Other arguments passed on to discrete_scale to control name, limits, breaks, labels and so forth. See Also pub_pal for references. Other color publication: scale_fill_pub scale_fill_party Political Parties Color Palette (Discrete) and Scales Description An N-color discrete palette for political parties. Usage scale_fill_party(palette = "BRA", ...) Arguments palette ... the palette name, a character string. Other arguments passed on to discrete_scale to control name, limits, breaks, labels and so forth. See Also party_pal for details and references. scale_fill_pub 77 scale_fill_pub Publication color scales. Description See pub_pal for details. Usage scale_fill_pub(palette = "pub12", ...) Arguments palette the palette name, a character string. ... Other arguments passed on to discrete_scale to control name, limits, breaks, labels and so forth. See Also Other color publication: scale_color_pub scale_shape_pub Shape scales for theme_pub (discrete) Description See pub_shapes for details. Usage scale_shape_pub(palette = "pub", ...) Arguments palette the palette name, a character string. ... common discrete scale parameters: name, breaks, labels, na.value, limits and guide. See discrete_scale for more details See Also Other shape pub: pub_shapes 78 Scatterplot Examples library("ggplot2") library("scales") p <- ggplot(mtcars) + geom_point(aes(x = wt, y = mpg, shape = factor(gear))) + facet_wrap(~am) p + scale_shape_pub() Scatterplot Quartile-Frame Scatterplot Description Plots a scatterplot with a custom axis that acts as a quartile plot (quartile-frame). Inspired by The Visual Display of Quantitative Information by Edward R. Tufte. Usage Scatterplot(x, y, ...) Arguments x, y aesthetics passed into each layer variables # @param data data frame to use (optional). ... additional arguments. Examples with(mtcars, Scatterplot(x = wt, y = mpg, main = "Vehicle Weight-Gas Mileage Relationship", xlab = "Vehicle Weight", ylab = "Miles per Gallon", font.family = "serif") ) SciencesPo SciencesPo 79 A Tool Set for Analyzing Political Behavior Data Description Provide functions for analyzing elections and political behavior data, including measures of political fragmentation, seat apportionment, and small data visualization graphs. Author(s) NA References Marcelino, Daniel (2013). SciencesPo: A Tool Set for Analyzing Political Behaviour Data. Available at SSRN: http://dx.doi.org/10.2139/ssrn.2320547 SciencesPo-deprecated Deprecated functions in package ‘SciencesPo’ Description These functions are provided for compatibility with older versions of ‘SciencesPo’ only, and will be defunct at the next release. Usage svTransform(y) politicalDiversity(x, index = "laakso/taagepera", margin = 1, base = exp(1)) untable(x, ...) detail(x, ...) dummy(x, ...) voronoi(x, ...) normalize(x, ...) outliers(x, ...) destring(x, ...) winsorize(x, ...) 80 SciencesPoFont Arguments y x index margin base ... the dependent variable. A data.frame, a matrix-like, or a vector containing values for the number of votes or seats each party received. The type of index desired, one of "laakso/taagepera", "golosov", "herfindahl", "gini", "shannon", "simpson", "invsimpson". The margin for which the index is computed. The logarithm base used in some indices, such as the "shannon" index. Extra parameters. Details The following functions are deprecated and will be made defunct; use the replacement indicated below: • • • • • • • • • • • • • • • svTransform: Normalize untable: Untable politicalDiversity: PoliticalDiversity winsorize: Winsorize recode: Recode dummy: Dummify gini: Gini clear: Clear detail: Destring destring: Destring outliers: Outlier voronoi: Voronoi jackknife: Jackknife bootstrap: Bootstrap normalize: Normalize SciencesPoFont SciencesPo Base Fonts Description Used to ascertain required theme fonts. Usage SciencesPoFont() Author(s) NA sheston91 sheston91 81 The Penn World Table Description The Penn World Table used in Summers and Heston (1991). This dataset contains the following columns: • year Year • pop Population (thousands) • rgdppc Real per capita GDP • savrat a numeric vector • country Country • com Communist regime • opec OPEC country • name Country name Usage data(sheston91) Format A data.frame object with 8 variables and 3250 observations. Source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http: //fmwww.bc.edu/ec-p/data/Hayashi/ References Summers, R. and Heston, A. (1991) The Penn World Table (Mark 5): an expanded set of international comparisons, 1950–1988. The Quarterly Journal of Economics, 106(2), 327–368. 82 Skewness Skewness Compute the Skewness Description Provides three methods for performing skewness test. Usage Skewness(x, na.rm = TRUE, type = 3) Arguments x a numeric vector containing the values whose skewness is to be computed. na.rm a logical value for na.rm, default is na.rm=TRUE. type an integer between 1 and 3 for selecting the algorithms for computing the skewness, see details below. Details Skewness is a measure of symmetry distribution. Negative skewness (g_1 < 0) indicates that the mean of the data distribution is less than the median, and the data distribution is left-skewed. Positive skewness (g_1 > 0) indicates that the mean of the data values is larger than the median, and the data distribution is right-skewed. Values of g_1 near zero indicate a symmetric distribution. Value An object of the same type as x Note There are several methods to compute skewness, Joanes and Gill (1998) discuss three of the most traditional methods. According to them, type 3 performs better in non-normal population distribution, whereas in normal-like population distribution type 2 fits better the data. Such difference between the two formulae tend to disappear in large samples. Type 1: g_1 = m_3/m_2^(3/2). Type 2: G_1 = g_1*sqrt(n(n-1))/(n-2). Type 3: b_1 = m_3/s^3 = g_1 ((n-1)/n)^(3/2). Author(s) Daniel Marcelino, <[email protected]> References Joanes, D. N. and C. A. Gill. (1998) Comparing measures of sample skewness and kurtosis. The Statistician, 47, 183–189. stature 83 Examples w <-sample(4,10, TRUE) x <- sample(10, 1000, replace=TRUE, prob=w) Skewness(x, type = 1) Skewness(x, type = 2) Skewness(x) stature The Measure of US Presidents Description The US presidents and their main opponents’ heights (cm). This dataset contains the following columns: • election The election year. • winner The winner candidate. • winner.height The winner candidate’s height in centimeters (cm). • winner.vote The popular vote support for the winner. • winner.party The winner’s party. • opponent The main opponent candidate. • opponent.height The opponent candidate’s height in centimeters (cm). • opponent.vote Popular vote support for the main opponent candidate. • opponent.party The opponent’s party. • turnout The electorate turnout in percentages. • winner.bmi The winner Body Mass Index (BMI) estimate (BMI = weight in kg/(height in meter)**2) @references Murray, G. R. (2014) Evolutionary preferences for physical formidability in leaders. Politics and the Life Science, 33(1), 33-53. Usage data(stature) # t.test(stature$winner.height, stature$opponent.height, var.equal=TRUE) Format A data.frame object with 11 variables and 57 observations. Source US Presidents: http://www.lingerandlook.com/Names/Presidents.php Inside Gov. http: //www.us-presidents.insidegov.com. 84 Stratify Stratify Stratified Sampling Description Sample row values of a data frame conditional to some strata attributes. Usage Stratify(.data, group, size, select = NULL, replace = FALSE, both.sets = FALSE) Arguments .data the data frame. group the grouping factor, may be a list. size the sample size. select if sampling from a specific group or list of groups. replace should sampling be with replacement? both.sets if TRUE, both ‘sample‘ and ‘.data‘ are returned. Examples data(pollster2008) # Let's take a 10% sample from all -PollTaker- groups in pollster2008 Stratify(pollster2008, "PollTaker", 0.1) # Let's take a 10% sample from only 'LV' and 'RV' groups from -Pop- in pollster2008 Stratify(pollster2008, "Pop", 0.1, select = list(Pop = c("LV", "RV"))) # Let's take 3 samples from all -PollTaker- groups in pollster2008, # specified by column 1 Stratify(pollster2008, group = 1, size = 3) # Let's take a sample from all -Pop- groups in pollster2008, where we # specify the number wanted from each group Stratify(pollster2008, "Pop", size = c(3, 5, 4)) # Use a two-column strata (-Pop- and -PollTaker-) but only interested in # cases where -Pop- == 'LV' Stratify(pollster2008, c("Pop", "PollTaker"), 0.15, select = list(Pop = "LV")) summary.Crosstable 85 summary.Crosstable Print Crosstable style Description Print Crosstable style Usage ## S3 method for class 'Crosstable' summary(object, digits = 2, chisq = FALSE, fisher = FALSE, mcnemar = FALSE, alt = "two.sided", latex = FALSE, ...) Arguments object The table object. digits number of digits after the decimal point. chisq if TRUE, the results of a chi-square test will be shown below the table. fisher if TRUE, the results of a Fisher Exact test will be shown below the table. mcnemar if TRUE, the results of a McNemar test will be shown below the table. alt the alternative hypothesis and must be one of "two.sided", "greater" or "less". Only used in the 2 by 2 case. latex if the output is to be printed in latex format. ... extra parameters. See Also Other Crosstables: Crosstable swatson93 Stock’s and Watson’s (1993) Data. Description Data set used by Stock and Watson (1993) to estimate co-integration. This dataset contains the following columns: • lnm1 Log M1. • lnp Log NNP price deflator. • lnnnp Log NNP. • cprate A numeric vector. • year Commercial paper rate. 86 Ternaryplot Usage data(swatson93) Format A data.frame object with 5 variables and 90 observations. Source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http: //fmwww.bc.edu/ec-p/data/Hayashi/ References Stock, J. H., and Watson, M. W. (1993) A simple estimator of cointegrating vectors in higher order integrated systems. Econometrica: Journal of the Econometric Society, 783–820. Tableplot Tableplot: a Visualization of Datasets Description Visualization of Large Datasets. Usage Tableplot(.data = NULL) Arguments .data Ternaryplot the data frame. Ternary Plot Build ternary plots. Description Ternary Plot Build ternary plots. Usage Ternaryplot(data, ...) theme_blank 87 Arguments data Default dataset to use for plot. If not already a data.frame, will be converted to one by fortify. If not specified, must be suppled in each layer added to the plot. ... Other arguments passed on to methods. Not currently used. theme_blank Create a Completely Empty Theme Description The theme created by this function shows nothing but the plot panel. Usage theme_blank(base_size = 12, base_family = "serif", legend = "none") Arguments base_size overall font size. Default is 12. base_family a name for default font family. legend the legend position. Value The theme. Author(s) NA Examples # plot with small amount of remaining padding qplot(1:10, (1:10)^2) + theme_blank() # remaining padding removed qplot(1:10, (1:10)^2) + theme_blank() + labs(x = NULL, y = NULL) # Check that it is a complete theme attr(theme_blank(), "complete") 88 theme_fte theme_darkside The Dark Side Theme Description The dark side of the things. Usage theme_darkside(base_size = 12, base_family = "serif", legend = "none") Arguments base_size overall font size. Default is 12. base_family the name for default font family. legend the position of the legend if any. Value The theme. # plot with small amount of padding qplot(1:10, (1:10)^2, color="green") + theme_darkside() # Check that it is a complete theme attr(theme_darkside(), "complete") Author(s) NA theme_fte Themes for ggplot2 Graphs Description Theme for plotting with ggplot2. Usage theme_fte(legend = "none", legend_title = FALSE, base_size = 12, horizontal = TRUE, base_family = "", colors = c("#F0F0F0", "#D9D9D9", "#60636A", "#525252")) theme_pub 89 Arguments legend enables to set legend position, default is "none". legend_title will the legend have a title? Defaults is FALSE. base_size overall font size. Default is 13. horizontal logical. Horizontal axis lines? base_family a nmae for default font family. colors default colors used in the plot in the following order: background, lines, text, and title. Value The theme. Author(s) NA Examples qplot(1:10, (1:10)^3) + theme_fte() mycolors = c("wheat", "#C2AF8D", qplot(1:10, (1:10)^3) + theme_fte(colors=mycolors) "#8F6D2F", "darkred") # Check that it is a complete theme attr(theme_fte(), "complete") theme_pub The Default Theme Description After loading the SciencesPo package, this theme will be set to default for all subsequent graphs made with ggplot2. Usage theme_pub(legend = "bottom", base_size = 12, base_family = "", horizontal = FALSE, axis_line = FALSE) 90 theme_scipo Arguments legend enables to set legend position, default is "bottom". base_size overall font size. Default is 14. base_family a name for default font family. horizontal logical. Horizontal axis lines? axis_line enables to set x and y axes. Value The theme. Author(s) NA See Also theme, theme_538, theme_blank. Examples Previewplot() + theme_pub() # Anscombe data dat <- data.frame() for(i in 1:4) dat <- rbind(dat, data.frame(set=i, x=anscombe[,i], y=anscombe[,i+4])) gg <- ggplot(dat, aes(x, y)) gg <- gg + geom_point(size=5, color="red", fill="orange", shape=21) gg <- gg + geom_smooth(method="lm", fill=NA, fullrange=TRUE) gg <- gg + facet_wrap(~set, ncol=2) gg <- gg + theme_pub(base_family=SciencesPoFont()) gg <- gg + theme(plot.background=element_rect(fill="#f7f7f7")) gg <- gg + theme(panel.background=element_rect(fill="#f7f7f7")) theme_scipo The SciPo Theme Description The scpo theme for using with ggplot2 objects. theme_scipo 91 Usage theme_scipo(base_family = "", base_size = 11, strip_text_family = base_family, strip_text_size = 12, title_family = "", title_size = 18, title_margin = 10, margins = margin(10, 10, 10, 10), subtitle_family = "", subtitle_size = 12, subtitle_margin = 15, caption_family = "", caption_size = 9, caption_margin = 10, axis_title_family = subtitle_family, axis_title_size = 9, axis_title_just = "rt", grid = TRUE, axis = FALSE, ticks = FALSE) Arguments base_family base font family base_size base font size strip_text_family facet label font family strip_text_size facet label text size title_family plot tilte family title_size plot title font size title_margin plot title margin margins plot margins subtitle_family plot subtitle family subtitle_size plot subtitle size subtitle_margin plot subtitle margin caption_family plot caption family caption_size plot caption size caption_margin plot caption margin axis_title_family axis title font family axis_title_size axis title font size axis_title_just axis title font justification blmcrt grid panel grid (TRUE, FALSE, or a combination of X, x, Y, y) axis axis TRUE, FALSE, [xy] ticks ticks Value The theme. 92 titanic Examples # plot with small amount of padding # qplot(1:10, (1:10)^2, color="green") + theme_scipo() # Check that it is a complete theme # attr(theme_scipo(), "complete") titanic Titanic Description Population at Risk and Death Rates for an Unusual Episode. For each person on board the fatal maiden voyage of the ocean liner Titanic, this dataset records sex, age [adult/child], economic status [first/second/third class, or crew] and whether or not that person survived. This dataset contains the following columns: • • • • CLASS Class (0 = crew, 1 = first, 2 = second, 3 = third) AGE Age (1 = adult, 0 = child) SEX Sex (1 = male, 0 = female) SURVIVED Survived (1 = yes, 0 = no) Usage data(titanic) Format A data.frame object with 4 variables and 2201 observations. Note There is not complete agreement among primary sources as to the exact numbers on board, rescued, or lost. STORY BEHIND THE DATA: The sinking of the Titanic is a famous event, and new books are still being published about it. Many well-known facts–from the proportions of first-class passengers to the "women and children first" policy, and the fact that that policy was not entirely successful in saving the women and children in the third class–are reflected in the survival rates for various classes of passenger. These data were originally collected by the British Board of Trade in their investigation of the sinking. Source British Board of Trade Inquiry Report (1990). Report on the Loss of the ‘Titanic’ (S.S.)", Gloucester, UK: Allan Sutton Publishing. References Dawson (1995). "The ‘Unusual Episode’ Data Revisited" in the Journal of Statistics Education. Today 93 Today Print the current date in a pretty format Description Print the current date in a pretty format. Usage Today() Value string Examples Today() turnout Turnout Data Description Data on voter turnout in the 50 states and D.C. for the 1992 Presidential election and 1990 Congressional elections. Per capita income, populations in poverty and populations with no high school degree are also given. This dataset contains the following columns: • v1 state name (alphabetic, 20 characters) • v2 region of country: 1 Northeast 2 Midwest 3 South 4 West • v3 division within region: 1 Northeast-New England 2 Northeast-Middle Atlantic 3 MidwestEast North Central 4 Midwest-West North Central 5 South-South Atlantic 6 South-East South Central 7 South-West South Central 8 West-Mountain 9 West-Pacific. • v4 "Elazar’s state political culture assignments: 1 = moralistic 2 = individualistic 3 = traditionalistic • v5 percent of population below poverty level, 1992. • v6 per capita personal income, 1993. • v7 percent casting votes for presidential electors, 1992. • v8 percent casting votes for U.S. Representatives, 1990. • v9 population without a high school degree, of those 25 years or older, 1990. • v10 population 25 years or older, 1990. • v11 South = 1; all others = 0. 94 twins Usage data(turnout) Format A data.frame object with 11 variables and 50 observations. Source U.S. Bureau of the Census, Statistical Abstract of the United States, 1994. twins Burt’s Twin Data Description The given data are IQ scores from identical twins; one raised in a foster home, and the other raised by birth parents. This dataset contains the following columns: • C Social class, C1=high, C2=medium, C3=low, a factor. • IQb biological. • IQf foster. @references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley. Usage data(twins) Format A data.frame object with 3 variables and 27 observations. Source Burt, C. (1966). The genetic estimation of differences in intelligence: A study of monozygotic twins reared together and apart. Br. J. Psych., 57, 147-153. TwoWay TwoWay 95 Cross Tabulation Description TwoWay is a modified version of CrossTable (gmodels). TwoWay will print a summary table with cell counts and column proportions (similar to STATA’s tabulate. Usage TwoWay(x, y, digits = 3) Arguments x, y the variables for the cross tabulation. digits number of digits for rounding proportions. unions Poll attitudes towards British trade unions Description The British polling company Ipsos MORI conducted several opinion polls in the UK between 1975 and 1995 in which they asked whether people agree or disagree with the statement "Trade unions have too much power in Britain today". Usage data(unions) Format A data.frame object with 7 variables and 17 observations. • Date Month of the poll Aug-77 to Sep-79. • AgreePct Percent who agree (unions have too much power). • DisagreePct Percent who disagree. • NetSupport DisagreePct-AgreePct. • Months Months since August 1975. • Late 1=after 1986 or 0=before 1986. • Unemployment Unemployment rate. @source http://www.ipsos-mori.com/researchpublications/researcharchive/poll.aspx? oItemID=94 96 Untable Untable Untable an Aggregated Data Frame Description A method for recovering a data.frame out of summarized data, i.e. contingency table or aggregated data. Usage Untable(x, ...) ## S3 method for class 'data.frame' Untable(x, freq = "Freq", row.names = NULL, ...) ## Default S3 method: Untable(x, dimnames = NULL, type = NULL, row.names = NULL, col.names = NULL, ...) Arguments x the table object as a data.frame, table, or, matrix. freq the column name of count values. row.names row names to add to data.frame if any. dimnames set the dimnames of object if required. type the type of variable. If NULL, ordered factor is returned. col.names column names to add to the data.frame. ... Extra parameters ignored. See Also expand.grid, gl. Examples gss <- data.frame( expand.grid(sex=c("female", "male"), party=c("dem", "indep", "rep")), count=c(279,165,73,47,225,191)) print(gss) # aggregated data.frame # Then expand it: GSS <- Untable(gss, freq="count") head(GSS) Voronoi 97 # Expand from a table or xtable object: # Fisher's Tea-Tasting Experiment data tea <- table(poured=c("Yes", "Yes", "Yes", "No","Yes","No", "No", "No"), guess=c("Yes", "Yes", "Yes", "Yes", "No", "No","No","No")) Untable(tea) # Expand with a vector of weights Untable(c(3,3,3), dimnames=list(c("Brazil","Colombia","Argentina"))) Voronoi Voronoi tessellation diagram Description Computes voronoi tessellation diagrams, and Dirichlet tessellation (after Peter Gustav Lejeune Dirichlet). Usage Voronoi(n = 100, d, dim = 1000, method = NULL, seed = 51, plot = TRUE) Arguments n an integer for a finite set of points. d an integer for a finite set of dimensions. dim the image dimension. method the distance computation method. One of euclidean, manhattan, maximum, canberra (currently not implemented). seed an integer for random seed. plot logical. If TRUE, a plot is returned, else, a data.frame is returned. Details https://en.wikipedia.org/wiki/Voronoi_diagram Author(s) Daniel Marcelino, <[email protected]> Examples ## Not run: Voronoi(n=20, d=5, dim=1000) 98 Waffleplot vote Voting correlations Description A dataset with voting correlations of traits, where each trait is measured for 17 developed countries (Europe, US, Japan, Australia, New Zealand). Usage data(vote) Format A 12x12 matrix object with 12 variables and 12 observations. Source Torben Iversen and David Soskice (2006). Electoral institutions and the politics of coalitions: Why some democracies redistribute more than others. American Political Science Review, 100, 165-81. Table A2. References Using Graphs Instead of Tables. http://www.tables2graphs.com/doku.php?id=03_descriptive_ statistics Waffleplot Make waffle (square pie) charts Description Given a named vector, this function will return a ggplot object that represents a waffle chart of the values. The individual values will be summed up and each that will be the total number of squares in the grid. Usage Waffleplot(parts, rows = 10, xlab = NULL, title = NULL, colors = NA, size = 2, legend_pos = "right", flip = FALSE, reverse = FALSE, equal = TRUE, pad = 0, use_glyph = FALSE, glyph_size = 12) WeightedCorrelation 99 Arguments parts named vector of values to use for the chart rows number of rows of blocks xlab text for below the chart. Highly suggested this be used to give the "1 sq == xyz" relationship if it’s not obvious title chart title colors exactly the number of colors as values in parts. If omitted, Color Brewer "Set2" colors are used. size width of the separator between blocks (defaults to 2) legend_pos position of legend flip flips x & y axes reverse reverses the order of the data equal by default, waffle uses coord_equal; this can cause layout problems, so you an use this to disable it if you are using ggsave or knitr to control output sizes (or manually sizing the chart) pad how many blocks to right-pad the grid with use_glyph use specified FontAwesome glyph glyph_size size of the FontAwesome font Note If the vector is not named or only partially named, capital letters will be used instead. If you specify a string (vs FALSE) to use_glyph the function will map the input to a FontAwesome glyph name and use that glyph for the tile instead of a block (making it more like an isotype pictogram than a waffle chart). You’ll need to actually install FontAwesome and use the extrafont package (https://github.com/wch/extrafont) to be able to use the FontAwesome glyphs. Examples tiles <- c(One=80, Two=30, Three=20, Four=10) Waffleplot(tiles, rows=8) Senate <- c(`Male (44%)`=44, `Female (56%)`=56) Waffleplot(Senate, rows=10, size=0.5, colors=c("#af9139", "#544616")) WeightedCorrelation Compute Weighted Correlations Description Compute the weighted correlation. 100 WeightedMean Usage WeightedCorrelation(x, y = NULL, weights = NULL) Arguments x a matrix or vector to correlate with y. y a matrix or vector to correlate with x. If y is NULL, x will be used instead. weights an optional vector of weights to be used to determining the weighted mean and variance for calculation of the correlations. Examples x <- sample(10,10) y <- sample(10,10) w <- sample(5,10, replace=TRUE) WeightedCorrelation(x, y, w) WeightedMean Computes Weighted Mean Description Compute the weighted mean of data. Usage WeightedMean(x, weights = NULL, normwt = "ignored", na.rm = TRUE) Arguments x a numeric vector. weights a numeric vector of weights of x. normwt ignored at the moment. na.rm a logical, if TRUE, missing data will be dropped. If na.rm = FALSE, missing data will return an error. Examples x <- sample(10,10) w <- sample(5,10, replace=TRUE) WeightedMean(x, w) WeightedStdz 101 WeightedStdz Computes Weighted Standardized Values Description Computes weighted standardized values of data. Usage WeightedStdz(x, weights = NULL) Arguments x weights a vector for which a set of standardized values is desired. a vector of weights to be used to determining weighted values of x. Examples x <- sample(10,10) w <- sample(5,10, replace=TRUE) WeightedStdz(x, w) WeightedVariance Weighted Variance Description Compute the weighted variance. Usage WeightedVariance(x, weights = NULL, na.rm = FALSE) Arguments x weights na.rm a numeric vector. a numeric vector of weights for x. a logical if NA should be disregarded. Examples wt=c(1.23, 2.12, 1.23, 0.32, 1.53, 0.59, 0.94, 0.94, 0.84, 0.73) x = c(5, 5, 4, 4, 3, 4, 3, 2, 2, 1) WeightedVariance(x, wt) 102 Winsorize Winsorize Winsorized Mean Description Compute the winsorized mean, which consists of recoding the top k values of a vector. Usage Winsorize(x, k = 1, na.rm = TRUE) Arguments x The vector to be winsorized k An integer for the quantity of outlier elements that to be replaced in the calculation process na.rm a logical value for na.rm, default is na.rm=TRUE. Details Winsorizing a vector will produce different results than trimming it. While by trimming a vector causes extreme values to be discarded, by winsorizing it in the other hand, causes extreme values to be replaced by certain percentiles. Value An object of the same type as x References Dixon, W. J., and Yuen, K. K. (1999) Trimming and winsorization: A review. The American Statistician, 53(3), 267–269. Dixon, W. J., and Yuen, K. K. (1960) Simplified Estimation from Censored Normal Samples, The Annals of Mathematical Statistics, 31, 385–391. @references Wilcox, R. R. (2012) Introduction to robust estimation and hypothesis testing. Academic Press, 30-32. Statistics Canada (2010) Survey Methods and Practices. @note One may want to winsorize estimators, however, winsorization tends to be used for onevariable situations. @author Daniel Marcelino, <[email protected]> @examples set.seed(51) # for reproducibility x <- rnorm(50) ## introduce outlier x[1] <- x[1] * 10 # Compare to mean: mean(x) Winsorize(x) words 103 words Word frequencies from Mosteller and Wallace Description The data give the frequencies of words in works from four different sources: the political writings of eighteenth century American political figures Alexander Hamilton, James Madison, and John Jay, and the book Ulysses by twentieth century Irish writer James Joyce. This dataset contains the following columns: • Hamilton Hamilton frequency. • HamiltonRank Hamilton rank. • Madison Madison frequency. • MadisonRank Madison rank. • Jay Jay frequency. • JayRank Jay rank. • Ulysses Word frequency in Ulysses. • UlyssesRank Word rank in Ulysses. @references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley. Usage data(words) Format A data.frame object with 8 variables and 165 observations. Source Mosteller, F. and Wallace, D. (1964). Inference and Disputed Authorship: The Federalist. Reading, MA: Addison-Wesley. 104 %nin% %nin% Not in (Find Matching or Non-Matching Elements) Description " logical vector indicating if there is a match or not for its left operand. A true vector element indicates no match in left operand, false indicates a match. Usage x %nin% table Arguments x A vector of numeric, character, or factor values. table A vector (numeric, character, factor), matching the mode of x Examples c('a','b','c') %nin% c('a','b') Index ∗Topic Rescaling Normalize, 55 ∗Topic Sampling Permutate, 60 ∗Topic Tests Bartels, 8 JamesStein, 45 ∗Topic Transformation Normalize, 55 ∗Topic Viz Lorenz, 50 ∗Topic datasets alpha, 4 auto, 7 baseball, 10 bhodrick93, 10 big5, 11 bollen, 12 bush, 14 cathedrals, 15 cgreene76, 16 chismoke, 17 election2008, 26 galton, 32 geom_dumbbell, 33 geom_lollipop, 35 GeomSpotlight, 32 griliches76, 39 ltaylor96, 51 marriage, 52 mishkin92, 54 nerlove63, 55 paired, 58 Palettes, 59 pollster2008, 65 religiosity, 73 sheston91, 81 stature, 83 swatson93, 85 ∗Topic Clean-up MakeNumeric, 51 MakeShare, 52 ∗Topic Concentration, GiniSimpson, 38 Lorenz, 50 ∗Topic Concentration Gini, 37 ∗Topic Distributions ddirichlet, 22 rdirichlet, 70 ∗Topic Diversity, Gini, 37 GiniSimpson, 38 Lorenz, 50 ∗Topic Electoral dHondt, 24 HighestAverages, 42 LargestRemainders, 48 PoliticalDiversity, 63 ∗Topic Exploratory, PoliticalDiversity, 63 ∗Topic Exploratory CI, 17 Crosstable, 20 Describe, 23 Outlier, 57 WeightedVariance, 101 Winsorize, 102 ∗Topic Inequality GiniSimpson, 38 ∗Topic Manipulation Recode, 72 Stratify, 84 ∗Topic Modelling Bootstrap, 13 Normalize, 55 ∗Topic Models Dummify, 25 105 106 titanic, 92 turnout, 93 twins, 94 unions, 95 vote, 98 words, 103 ∗Topic ggplot2 geom_dumbbell, 33 geom_lollipop, 35 GeomSpotlight, 32 party_pal, 59 Plotting, 61 pub_pal, 68 pub_shapes, 69 scale_color_party, 75 scale_color_pub, 76 scale_fill_party, 76 scale_fill_pub, 77 scale_shape_pub, 77 SciencesPoFont, 80 theme_blank, 87 theme_darkside, 88 theme_fte, 88 theme_pub, 89 %nin%, 104 aes, 34, 35 aes_, 34, 35 align_title_left (Plotting), 61 align_title_right (Plotting), 61 alpha, 4 Anonymize, 5 Atkinson, 6, 37, 41, 74 auto, 7 Bartels, 8 baseball, 10 bhodrick93, 10 big5, 11 bollen, 12 Bootstrap, 13, 80 borders, 34, 36 bush, 14 Calibrateplot, 14 cathedrals, 15 cgreene76, 16 chismoke, 17 CI, 17 INDEX Clear, 18, 80 Condorcet, 19 CountMissing, 19 CrossTable, 95 Crosstable, 20, 30, 31, 85 CrossValidation, 21 ddirichlet, 22 Describe, 23 Destring, 80 destring (SciencesPo-deprecated), 79 Detail (Describe), 23 detail (SciencesPo-deprecated), 79 dHondt, 24, 40 Dirichlet (rdirichlet), 70 discrete_scale, 75–77 Disproportionality (Proportionality), 66 Dummify, 25, 80 dummy (SciencesPo-deprecated), 79 election2008, 26 expand.grid, 96 Flag, 27 Footnote, 27 Forge, 28 Formatted, 29 fortify, 34, 35, 87 Freq (Frequency), 31 freq, 30, 31 Frequency, 20, 30, 31 galton, 32 geom_dumbbell, 33 geom_lollipop, 35 geom_spotlight (GeomSpotlight), 32 GeomDumbbell (geom_dumbbell), 33 GeomLollipop (geom_lollipop), 35 GeomSpotlight, 32 ggplot, 34, 35 Gini, 7, 37, 38, 41, 50, 74, 80 GiniSimpson, 37, 38, 50 gl, 96 griliches76, 39 Hamilton, 25, 40 Herfindahl, 7, 37, 41, 74 HighestAverages, 25, 40, 42, 49, 68 Jackknife, 44, 80 INDEX JamesStein, 45 Kurtosis, 45 Label, 47 label, 48 Label<-, 47 LargestRemainders, 25, 40, 43, 48, 68 layer, 34, 35 legend_bottom (Plotting), 61 legend_left (Plotting), 61 legend_right (Plotting), 61 legend_top (Plotting), 61 Lorenz, 37, 50, 74 ltaylor96, 51 MakeNumeric, 51 MakeShare, 52 marriage, 52 MeanFromRange, 53 mishkin92, 54 move_legend (Plotting), 61 nerlove63, 55 no_axes (Plotting), 61 no_axes_titles (Plotting), 61 no_gridlines (Plotting), 61 no_legend (Plotting), 61 no_legend_title (Plotting), 61 no_major_gridlines (Plotting), 61 no_major_x_gridlines (Plotting), 61 no_major_y_gridlines (Plotting), 61 no_minor_gridlines (Plotting), 61 no_minor_x_gridlines (Plotting), 61 no_minor_y_gridlines (Plotting), 61 no_ticks (Plotting), 61 no_title (Plotting), 61 no_x_axis (Plotting), 61 no_x_gridlines (Plotting), 61 no_x_line (Plotting), 61 no_x_text (Plotting), 61 no_x_ticks (Plotting), 61 no_x_title (Plotting), 61 no_y_axis (Plotting), 61 no_y_gridlines (Plotting), 61 no_y_line (Plotting), 61 no_y_text (Plotting), 61 no_y_ticks (Plotting), 61 no_y_title (Plotting), 61 107 Normalize, 55, 80 normalize (SciencesPo-deprecated), 79 Outlier, 57, 80 outliers (SciencesPo-deprecated), 79 paired, 58 Palettes, 59 party_pal, 59, 76 Permutate, 60 Plotting, 61 PoliticalDiversity, 25, 40, 41, 43, 49, 63, 68, 80 politicalDiversity (SciencesPo-deprecated), 79 pollster2008, 65 Previewplot (Plotting), 61 PreviewTheme (Plotting), 61 prop.table, 20 Proportionality, 43, 49, 66 pub_pal, 68, 76, 77 pub_shapes, 69, 77 rdirichlet, 70 read.zTree, 71 Recode, 72, 80 religiosity, 73 Render, 73 Rosenbluth, 7, 37, 41, 74 rotate_axes_text (Plotting), 61 rotate_x_text (Plotting), 61 rotate_y_text (Plotting), 61 RunShinyApp, 75 scale, 56 scale_color_party, 59, 75 scale_color_pub, 76, 77 scale_fill_party, 76 scale_fill_pub, 76, 77 scale_shape_pub, 69, 77 Scatterplot, 78 SciencesPo, 79 SciencesPo-deprecated, 79 SciencesPo-package (SciencesPo), 79 SciencesPoFont, 80 sheston91, 81 Simpson (Herfindahl), 41 Skewness, 82 stature, 83 108 Stratify, 84 summary.Crosstable, 20, 85 svTransform (SciencesPo-deprecated), 79 swatson93, 85 table, 20 Tableplot, 86 Ternaryplot, 86 theme, 90 theme_538, 90 theme_538 (theme_fte), 88 theme_blank, 87, 90 theme_darkside, 88 theme_fte, 88 theme_pub, 89 theme_scipo, 90 titanic, 92 Today, 93 turnout, 93 twins, 94 TwoWay, 95 unions, 95 Untable, 80, 96 untable (SciencesPo-deprecated), 79 Voronoi, 80, 97 voronoi (SciencesPo-deprecated), 79 vote, 98 Waffleplot, 98 WeightedCorrelation, 99 WeightedMean, 100 WeightedStdz, 101 WeightedVariance, 101 Winsorize, 58, 80, 102 winsorize (SciencesPo-deprecated), 79 words, 103 xtabs, 20 INDEX
© Copyright 2026 Paperzz