Package `SciencesPo`

Package ‘SciencesPo’
August 5, 2016
Type Package
Title A Tool Set for Analyzing Political Behavior Data
Date/Publication 2016-08-05 00:24:29
Version 1.4.1
Date 2016-08-02
Maintainer Daniel Marcelino <[email protected]>
Description Provide functions for analyzing elections and political behavior
data, including measures of political fragmentation, seat apportionment, and
small data visualization graphs.
URL http://CRAN.R-project.org/package=SciencesPo
License GPL (>= 2)
Encoding UTF-8
Depends R (>= 3.2.0), stats, utils, graphics, grDevices, methods,
ggplot2 (>= 2.1.0)
Imports Rcpp (>= 0.12.3), pander (>= 0.6.0), data.table (>= 1.9.4),
grid (>= 3.0.0), rstudioapi, htmltools, lubridate, stringr,
xtable, shiny
Suggests testthat, devtools, scales, extrafont, gtable, knitr
LinkingTo Rcpp
VignetteBuilder knitr
NeedsCompilation yes
LazyData TRUE
Repository CRAN
BugReports http://github.com/danielmarcelino/SciencesPo/issues
ByteCompile TRUE
RoxygenNote 5.0.1
1
R topics documented:
2
Collate 'Anonymize.R' 'Atkinson.R' 'Bootstrap.R' 'CI.R'
'CMDcheckVars.R' 'CalibratePlot.R' 'Clear.R' 'Condorcet.R'
'CountMissing.R' 'CrossValidation.R' 'Crosstable.R' 'DATA.R'
'DEPRECATED.R' 'Describe.R' 'Dirichlet.R' 'Dummy.R'
'Frequency.R' 'Geoms.R' 'Gini.R' 'Herfindahl.R'
'HighestAverages.R' 'Jackknife.R' 'Label.R'
'LargestRemainders.R' 'Lorenz.R' 'MOMENTS.R' 'MeanFromRange.R'
'Normalize.R' 'Operators.R' 'Outlier.R' 'Palettes.R'
'Permutate.R' 'PoliticalDiversity.R' 'Proportionality.R'
'RcppExports.R' 'Recode.R' 'Rosenbluth.R' 'RunShinyApp.R'
'ScatterPlot.R' 'SciencesPo.R' 'Stratify.R' 'TESTS.R'
'Tableplot.R' 'Ternaryplot.R' 'Themes.R' 'Transform.R'
'UTILS.R' 'Untable.R' 'Voronoi.R' 'Waffleplot.R' 'Winsorize.R'
'flag.R' 'print.SciencesPo.R' 'zTree.R' 'zzz.R'
Author Daniel Marcelino [aut, cre]
R topics documented:
alpha . . . . . .
Anonymize . .
Atkinson . . . .
auto . . . . . .
Bartels . . . . .
baseball . . . .
bhodrick93 . .
big5 . . . . . .
bollen . . . . .
Bootstrap . . .
bush . . . . . .
Calibrateplot .
cathedrals . . .
cgreene76 . . .
chismoke . . .
CI . . . . . . .
Clear . . . . . .
Condorcet . . .
CountMissing .
Crosstable . . .
CrossValidation
ddirichlet . . .
Describe . . . .
dHondt . . . .
Dummify . . .
election2008 . .
Flag . . . . . .
Footnote . . . .
Forge . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
4
5
6
7
8
10
10
11
12
13
14
14
15
16
17
17
18
19
19
20
21
22
23
24
25
26
27
27
28
R topics documented:
Formatted . . . . .
freq . . . . . . . .
Frequency . . . . .
galton . . . . . . .
GeomSpotlight . .
geom_dumbbell . .
geom_lollipop . . .
Gini . . . . . . . .
GiniSimpson . . .
griliches76 . . . . .
Hamilton . . . . .
Herfindahl . . . . .
HighestAverages .
Jackknife . . . . .
JamesStein . . . .
Kurtosis . . . . . .
Label . . . . . . .
Label<- . . . . . .
LargestRemainders
Lorenz . . . . . . .
ltaylor96 . . . . . .
MakeNumeric . . .
MakeShare . . . .
marriage . . . . . .
MeanFromRange .
mishkin92 . . . . .
nerlove63 . . . . .
Normalize . . . . .
Outlier . . . . . . .
paired . . . . . . .
Palettes . . . . . .
party_pal . . . . .
Permutate . . . . .
Plotting . . . . . .
PoliticalDiversity .
pollster2008 . . . .
Proportionality . .
pub_pal . . . . . .
pub_shapes . . . .
rdirichlet . . . . . .
read.zTree . . . . .
Recode . . . . . .
religiosity . . . . .
Render . . . . . . .
Rosenbluth . . . .
RunShinyApp . . .
scale_color_party .
scale_color_pub . .
3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
29
30
31
32
32
33
35
37
38
39
40
41
42
44
45
45
47
47
48
50
51
51
52
52
53
54
55
55
57
58
59
59
60
61
63
65
66
68
69
70
71
72
73
73
74
75
75
76
4
alpha
scale_fill_party . . . .
scale_fill_pub . . . . .
scale_shape_pub . . .
Scatterplot . . . . . . .
SciencesPo . . . . . .
SciencesPo-deprecated
SciencesPoFont . . . .
sheston91 . . . . . . .
Skewness . . . . . . .
stature . . . . . . . . .
Stratify . . . . . . . .
summary.Crosstable . .
swatson93 . . . . . . .
Tableplot . . . . . . .
Ternaryplot . . . . . .
theme_blank . . . . . .
theme_darkside . . . .
theme_fte . . . . . . .
theme_pub . . . . . . .
theme_scipo . . . . . .
titanic . . . . . . . . .
Today . . . . . . . . .
turnout . . . . . . . . .
twins . . . . . . . . . .
TwoWay . . . . . . . .
unions . . . . . . . . .
Untable . . . . . . . .
Voronoi . . . . . . . .
vote . . . . . . . . . .
Waffleplot . . . . . . .
WeightedCorrelation .
WeightedMean . . . .
WeightedStdz . . . . .
WeightedVariance . . .
Winsorize . . . . . . .
words . . . . . . . . .
%nin% . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
Index
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
76
77
77
78
79
79
80
81
82
83
84
85
85
86
86
87
88
88
89
90
92
93
93
94
95
95
96
97
98
98
99
100
101
101
102
103
104
105
alpha
A dataset that contains four test items
Description
A dataset with four test items used in SPSS to to compute Cronbach’s alpha.
• q1:q4 Numeric items of a scale.
Anonymize
5
Usage
data(alpha)
Format
A data.frame object with 4 variables and 60 observations.
Anonymize
To Make Anonymous a Data Frame
Description
Replaces factor and character variables by a combination of random sampled letters and numbers.
Numeric columns are also transformed, see details.
Usage
Anonymize(.data, col.names = "V", row.names = "")
## Default S3 method:
Anonymize(.data, col.names = "V", row.names = "")
Arguments
.data
A vector or a data frame.
col.names
A string for column names.
row.names
A string for rown names, default is empty.
Details
Strings will is replaced by an algorithm with 52/102 chance of choosing a letter and 50/102 chance
of choosing a number; then it joins everything in a 5-digits long character string.
Value
An object of the same type as .data.
Author(s)
Daniel Marcelino, <[email protected]>.
6
Atkinson
Examples
dt <- data.frame(
Z = sample(LETTERS,10),
X = sample(1:10),
Y = sample(c("yes", "no"), 10, replace = TRUE)
)
dt;
Anonymize(dt)
Atkinson
Atkinson Index of Inequality
Description
Calculates the Atkinson index A. This inequality measure is especially good at determining which
end of the distribution is contributing most to the observed inequality.
Usage
Atkinson(x, n = rep(1, length(x)), epsilon = NULL, na.rm = FALSE, ...)
## Default S3 method:
Atkinson(x, n = rep(1, length(x)), epsilon = NULL,
na.rm = FALSE, ...)
Arguments
x
a vector of data values of non-negative elements.
n
a vector of frequencies of the same length as x.
epsilon
a parameter of the inequality measure (if NULL, the default parameter (0.5) of the
respective measure is used).
na.rm
logical. Should missing values be removed? Defaults is set to FALSE.
...
additional arguements (currently ignored)
Details
epsilon = 0,5: little inequality aversion epsilon = 1,0: medium inequality aversion epsilon = 2,0:
great inequality aversion
References
Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Handbook of Income Distribution. Amsterdam.
Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall.
auto
7
See Also
Herfindahl, Rosenbluth, Gini. For more details see the “Indices” vignette.
Examples
if (interactive()) {
# generate a vector (of incomes)
# y <- c(80, 60, 10, 20, 30)
# Entropy 1.392321
# Maximum Entropy 1.609438
# Normalized Entropy 0.865098
# Exponential Index 0.248498
# Herfindahl 0.285000
# Normalized Herfindahl 0.106250
# Gini Coefficient 0.360000
# Concentration Coefficient 0.450000
x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)
# compute Atkinson coefficient with epsilon=0.5
Atkinson(x, epsilon=0.5)
w <- c(10, 15, 20, 25, 40, 20, 30, 35, 45, 90)
}
auto
Statistics of 1979 automobile models
Description
The data give the following statistics for 74 automobiles in the 1979 model year as sold in the US.
Usage
data(auto)
Format
A data.frame object with 14 variables and 74 observations.
Model Make and model of car.
Origin a factor with levels A,E,J
Price Price in dollars.
MPG Miles per gallon.
Rep78 Repair record for 1978 on 1 (worst) to 5 (best) scale.
Rep77 Repair record for 1978 on 1 to 5 scale.
8
Bartels
Hroom Headroom in inches.
Rseat Rear seat clearance in inches.
Trunk Trunk volume in cubic feet.
Weight Weight in pounds.
Length Length in inches.
Turn Turning diameter in feet.
Displa Engine displacement in cubic inches.
Gratio Gear ratio for high gear.
Source
This data frame was created from http://euclid.psych.yorku.ca/ftp/sas/sssg/data/auto.
sas.
References
Originally published in Chambers, Cleveland, Kleiner, and Tukey, Graphical Methods for Data
Analysis, 1983, pages 352-355.
Bartels
Bartels Rank Test of Randomness
Description
Performs Bartels rank test of randomness. The default method for testing the null hypothesis of randomness is two.sided. By using the alternative left.sided, the null hypothesis is tested against
a trend. By using the alternative right.sided, the null hypothesis of randomness is tested against
a systematic oscillation in the observed data.
Usage
Bartels(x, alternative = "two.sided", pvalue = "normal")
## Default S3 method:
Bartels(x, alternative = "two.sided", pvalue = "normal")
Arguments
x
a numeric vector of data values.
alternative
a character string for hypothesis testing method; must be one of two.sided
(default), left.sided or right.sided.
pvalue
A method for asymptotic aproximation used to compute the p-value.
Bartels
9
Details
Missing values are by default removed.
The RVN test statistic is
Pn−1
RV N = Pn
i=1
i=1
(Ri − Ri+1 )2
(Ri − (n + 1)/2)
2
where Ri = rank(Xi ), i = 1, . . . , n. It is known that (RV N − 2)/σ is asymptotically standard
2
−2n−9)
normal, where σ 2 = 4(n−2)(5n
5n(n+1)(n−1)2 .
Value
statistic
The value of the RVN statistic test and the theoretical mean value and variance
of the RVN statistic test.
n
the sample size, after the remotion of consecutive duplicate values.
p.value
the asymptotic p-value.
method
a character string indicating the test performed.
data.name
a character string giving the name of the data.
alternative
a character string describing the alternative.
References
Bartels, R. (1982). The Rank Version of von Neumann’s Ratio Test for Randomness, Journal of the
American Statistical Association, 77(377), 40-46.
Gibbons, J.D. and Chakraborti, S. (2003). Nonparametric Statistical Inference, 4th ed. (pp. 97-98).
URL: http://books.google.pt/books?id=dPhtioXwI9cC&lpg=PA97&ots=ZGaQCmuEUq
Examples
# Example 5.1 in Gibbons and Chakraborti (2003), p.98.
# Annual data on total number of tourists to the United States for 1970-1982.
years <- 1970:1982
tourists <- c(12362, 12739, 13057, 13955, 14123, 15698, 17523,
18610, 19842, 20310, 22500, 23080, 21916)
# See it graphically
qplot(factor(years), tourists)+ geom_point()
# Test the null against a trend
Bartels(tourists, alternative="left.sided", pvalue="beta")
10
bhodrick93
baseball
Bradley Efron and Carl Morris ("Stein’s Paradox in Statistics)
Description
A dataset used in Efron and Morris’s 1977 paper, "Stein’s Paradox in Statistics".
•
•
•
•
•
•
•
•
•
nameFirst and last name of selected player.
atBatsnumber of times at bats.
hitsnumber of hits after 45 at bats.
avg45is the average after 45 at bats.
remainingAtBatsremaining number at bats.
remainingAvgis the remaining average to the end of the season.
seasonAtBatsis the number of times at bats in the end of the season.
seasonHitsis the number of hits in the end of the season.
avgSeasonis the end of the season average.
Usage
data(baseball)
Format
A data.frame object with 9 variables and 18 observations.
bhodrick93
Bekaert’s and Hodrick’s (1993) Data
Description
Data set used by Bekaert and Hodrick (1993) on biases in the measurement of foreign exchange
risk premiums. This dataset contains the following columns:
•
•
•
•
•
•
•
•
•
•
•
date A character vector for date.
jyspot Price of USD in JY, spot.
jyfwd Price of USD in JY, 30-day forward.
jys30 Price of USD in JY, spot market at 30-day forward deliver/date.
dmspot Price of USD in DM, spot.
dmfwd Price of USD in DM, 30-day forward
dms30 Price of USD in DM, spot market at 30-day forward deliver/date.
bpspot Price of USD in BP, spot.
bpfwd Price of USD in BP, 30-day forward.
bps30 Price of USD in BP, spot market at 30-day forward deliver/date.
quote A numeric vector.
big5
11
Usage
data(bhodrick93)
Format
A data.frame object with 11 variables and 778 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http:
//fmwww.bc.edu/ec-p/data/Hayashi/
References
Bekaert, G., and Hodrick, R. J. (1993) On biases in the measurement of foreign exchange risk
premiums. Journal of International Money and Finance, 12(2), 115-138.
big5
Big 5 dataset
Description
This is a dataset of the Big 5 personality traits. It is a measurement of the Dutch translation of the
NEO-PI-R on 500 first year psychology students (Dolan, Oort, Stoel, Wicherts, 2009). The test
consists of fifty items that you must rate on how true they are about you on a five point scale where
1=Disagree, 3=Neutral and 5=Agree. It takes most people 3-8 minutes to complete.
• NeuroticismThe level of neuroticism
• ExtraversionThe level of extraversion
• OpennessThe level of openness
• AgreeablenessThe level of agreeableness
• ConscientiousnessThe level of conscientiousness
Usage
data(big5)
Format
A data.frame object with 5 variables and 500 observations.
12
bollen
References
Hoekstra, H. A., Ormel, J., & De Fruyt, F. (2003). NEO-PI-R/NEO-FFI: Big 5 persoonlijkheidsvragenlijst. Handleiding Manual of the Dutch version of the NEO-PI-R/NEO-FFI. Lisse, The
Netherlands: Swets and Zeitlinger.
Dolan, C. V., Oort, F. J., Stoel, R. D., & Wicherts, J. M. (2009). Testing measurement invariance in the target rotates multigroup exploratory factor model. Structural Equation Modeling, 16,
295<96>314.
bollen
Bollen Data on Industrialization and Political Democracy
Description
The data were reported in Bollen (1989, p. 428, Table 9.4) This set includes data from 75 developing
countries each assessed on four measures of democracy measured twice (1960 and 1965), and three
measures of industrialization measured once (1960).
• y1Freedom of the press, 1960
• y2Freedom of political opposition, 1960
• y3Fairness of elections, 1960
• y4Effectiveness of elected legislature, 1960
• y5Freedom of the press, 1965
• y6Freedom of political opposition, 1965
• y7Fairness of elections, 1965
• y8Effectiveness of elected legislature, 1965
• x1GNP per capita, 1960
• x2Energy consumption per capita, 1960
• x3Percentage of labor force in industry, 1960
@references Bollen, K. A. (1979). Political democracy and the timing of development. American
Sociological Review, 44, 572-587.
Bollen, K. A. (1980). Issues in the comparative measurement of political democracy. American
Sociological Review, 45, 370-390.
Usage
data(bollen)
Format
A data.frame object with 11 variables and 75 observations.
Bootstrap
13
Bootstrap
Method for Bootstrapping
Description
this method provides bootstrapping statistics.
Usage
Bootstrap(x, ...)
## Default S3 method:
Bootstrap(x, nboots = 30, FUN, ...)
## S3 method for class 'model'
Bootstrap(x, ...)
Arguments
x
is a vector or a fitted model object whose parameters will be used to produce
bootstrapped statistics. Model objects are assumed to be of class “glm” or “lm”.
nboots
an integer for the number of reiteration.
FUN
a statistic function name to bootstrap, i.e., ’mean’, ’var’, ’cov’, etc.
...
further arguments passed to or used by other methods.
Value
A list with “alpha” and “beta” slots. The “alpha” corresponds to ancillary parameters and “beta” to
systematic components of the model.
Author(s)
Daniel Marcelino, <[email protected]>
Examples
x = runif(10, 0, 1)
Bootstrap(x, FUN=mean)
14
Calibrateplot
bush
Approval Ratings for President George W. Bush
Description
Approval ratings for George W. Bush.
Usage
data(bush)
Format
A data.frame object with 5 variables and 270 observations.
•
•
•
•
•
start.date. Start date of the survey.
end.date. End date of the survey.
approve. Percent which approve of the president.
disapprove. Percent which disapprove of the president.
undecided. Percent undecided about the president.
Details
A data set of approval ratings of George Bush over the time of his presidency, as reported by several
agencies. Most polls were of size approximately 1,000 so the margin of error is about 3 percentage
points. @source http://www.pollingreport.com/BushJob.htm
Calibrateplot
Calibrate Plot
Description
Plots a calibration plot for overlaping observed versus actual values.
Usage
Calibrateplot(.data = NULL, x, y, ci = 0.95,
xlab = "Forecast Probability", ylab = "Observed Frequency")
Arguments
.data
x, y
ci
xlab
ylab
the data frame.
are the forecast and observed data vectors.
is the size of the confidence interval
a character name for horizontal axis.
a character name for vertical axis.
cathedrals
15
Author(s)
Daniel Marcelino
Examples
data <- data.frame(x=c(10,20,25,30,35,40,45,50,55,60,65,70,75,80,85,90,95,100),
y=c(0,05,10,20,25,30,40,50,60,70,75,80,95,100,100,100,100,100))
Calibrateplot(data, x, y)
cathedrals
Cathedrals
Description
Heights and lengths of Gothic and Romanesque cathedrals. This dataset contains the following
columns:
• Type Romanesque or Gothic.
• Height Total height, feet.
• Length Total length, feet.
@references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
Usage
data(cathedrals)
Format
A data.frame object with 3 variables and 25 observations.
16
cgreene76
cgreene76
Christensen’s and Greene’s (1976) Data
Description
Data set used by Christensen and Greene (1976) on economies of scale in US electric power generation. This dataset contains the following columns:
• firmid Observation id.
• costs Costs in 1970, MM USD.
• output Output, million KwH.
• plabor Price of labor.
• pkap Price of capital.
• pfuel Price of fuel.
• labshr Labor’s cost share.
• kapshr Capital’s cost share.
Usage
data(cgreene76)
Format
A data.frame object with 8 variables and 99 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http:
//fmwww.bc.edu/ec-p/data/Hayashi/
References
Christensen, L. R., and Greene, W. H. (1976) Economies of scale in US electric power generation.
The Journal of Political Economy, 655–676.
chismoke
17
chismoke
Smoking and Lung Cancer in China
Description
The Chinese smoking and lung cancer data (Agresti, Table 5.12, p. 60).
• Citycity name.
• Smokerif smoker or not.
• Cancerif detected cancer or not.
• Countthe frequencies.
@references Agresti, A. An introduction to categorical Data Analysis, (2nd ed.). Wiley.
Usage
data(chismoke)
Format
A data.frame object with 4 variables and 32 observations.
CI
An S4 Class to Confidence Intervals
Description
Calculates the confidence intervals for a vector of data values.
Usage
CI(x, level = 0.95, alpha = 1 - level, na.rm = FALSE, ...)
CI(x, level = 0.95, alpha = 1 - level, na.rm = FALSE, ...)
Arguments
x
A vector of data values.
level
The confidence level. Default is 0.95.
alpha
The significance level. Default is 1-level. If alpha equals 0.05, then your
confidence level is 0.95.
na.rm
A logical value, default is FALSE
...
Additional arguements (currently ignored)
18
Clear
Value
CI lower
Est. Mean
Mean of data.
CI upper
Upper bound of interval.
Std. Error
Standard Error of the mean.
Slots
lower Lower bound of interval.
mean Estimated mean.
upper Upper bound of interval.
stderr Standard Error of the mean.
Author(s)
Daniel Marcelino, <[email protected]>.
Examples
x <- c(1, 2.3, 2, 3, 4, 8, 12, 43, -1,-4)
CI(x, level=.90)
Clear
Clear Memory of All Objects
Description
This function is a wrapper for the command rm(list=ls()).
Usage
Clear(what = NULL, keep = TRUE)
Arguments
what
The object (as a string) that needs to be removed (or kept)
keep
Should obj be kept (i.e., everything but obj removed)? Or dropped?
Author(s)
Daniel Marcelino, <[email protected]>.
Condorcet
19
Examples
# create objects
a=1; b=2; c=3; d=4; e=5
# remove d
Clear("d", keep=FALSE)
ls()
# remove all but a and b
Clear(c("a", "b"), keep=TRUE)
ls()
Condorcet
Condorcet Voting
Description
Condorcet Voting.
Usage
Condorcet()
CountMissing
Count missing data.
Description
Function to count missing data.
Usage
CountMissing(.data, vars = NULL, prop = TRUE, exclude.complete = TRUE)
Arguments
.data
the data frame.
vars
vector with name of variables.
prop
logical. If this is TRUE will show share of missing data by variable.
exclude.complete
Logical. Exclude complete variables from the output.
Examples
CountMissing(marriage, prop = FALSE)
20
Crosstable
Crosstable
Cross-tabulation
Description
Crosstable produces all possible two-way and three-way tabulations of variables.
Usage
Crosstable(..., digits = 2, chisq = FALSE, fisher = FALSE,
mcnemar = FALSE, latex = FALSE, alt = "two.sided", deparse.level = 2)
## Default S3 method:
Crosstable(..., digits = 2, chisq = FALSE,
fisher = FALSE, mcnemar = FALSE, alt = "two.sided", latex = FALSE,
deparse.level = 2)
Arguments
digits
number of digits after the decimal point.
chisq
if TRUE, the results of a chi-square test will be shown below the table.
fisher
if TRUE, the results of a Fisher Exact test will be shown below the table.
mcnemar
if TRUE, the results of a McNemar test will be shown below the table.
latex
if the output is to be printed in latex format.
alt
the alternative hypothesis and must be one of "two.sided", "greater" or "less".
Only used in the 2 by 2 case.
deparse.level
an integer controlling the construction of labels in the case of non-matrix-like
arguments. If 0, middle 2 rownames, if 1, 3 rownames, if 2, 4 rownames (default).
...
the variables for the cross tabulation.
Value
A cross-tabulated object.
See Also
xtabs, Frequency, table, prop.table
Other Crosstables: summary.Crosstable
CrossValidation
21
Examples
with(titanic, Crosstable( SEX, AGE, SURVIVED))
#' # Agresti (2002), table 3.10, p. 106
# 1992 General Social Survey--Race and Party affiliation
gss <- data.frame(
expand.grid(Race=c("black", "white"),
party=c("dem", "indep", "rep")),
count=c(103,341,15,105,11,405))
df <- gss[rep(1:nrow(gss), gss[["count"]]), ]
with(df, Crosstable(Race, party))
# Agresti (1990, p. 61f; 2002, p. 91) Fisher's Tea Drinker
tea <- data.frame(
expand.grid(poured=c("Yes", "No"),
guess=c("Yes", "No")),
count=c(3,1,1,3))
# nicer way of recreating long tables
data = Untable(tea, freq="count")
with(data, Crosstable(poured, guess, fisher=TRUE, alt="greater")) # fisher=TRUE
CrossValidation
Cross Validation of SciencesPo Trees and Forests
Description
Cross Validation of SciencesPo Trees and Forests.
Usage
CrossValidation(object, ...)
## S3 method for class 'Tree'
CrossValidation(object, FUN, ...)
## S3 method for class 'Forest'
CrossValidation(object, FUN, ...)
Arguments
object
a fitted Tree or Forest object in the formula.
...
additional arguments.
FUN
either tree or forest
22
ddirichlet
Value
An object of class CV with elements:
call the function call
Author(s)
Daniel Marcelino, <[email protected]>.
ddirichlet
Dirichlet Distribution
Description
Density function and random number generation for the Dirichlet distribution
Usage
ddirichlet(x, alpha, log = FALSE, sum = FALSE)
Arguments
x
a matrix containing observations.
alpha
the Dirichlet distribution’s parameters. Can be a vector (one set of parameters
for all observations) or a matrix (a different set of parameters for each observation), see “Details”.
log
if TRUE, logarithmic densities are returned.
sum
if TRUE, the (log-)likelihood is returned.
Value
the ddirichlet returns a vector of densities (if sum = FALSE) or the (log-)likelihood (if sum = TRUE)
for the given data and alphas.
Author(s)
Daniel Marcelino, <[email protected]>
Examples
mat <- cbind(1:10, 5, 10:1);
mat;
x <- rdirichlet(10, mat);
ddirichlet(x, mat);
Describe
Describe
23
Summary Description Table
Description
Produce summary description table of a dataframe with the following features: variable names,
labels, factor levels, frequencies or summary statistics.
Usage
Describe(.data, style = "multiline", justify = "left", max.number = 10,
trim.strings = FALSE, max.width = 15, digits = 2, split.cells = 35,
display.labels = FALSE, display.attributes = FALSE, file = NA,
append = FALSE, escape.pipe = FALSE, ...)
Arguments
.data
a data frame.
style
the style to be used in pander table, one of “multiline”, “grid”, “simple”, and
“rmarkdown”.
justify
pander argument; defaults is justified to “left”.
max.number
an integer for the maximum number of items to be displayed in the frequency
cell. If variable has more distinct values, no frequency will be shown (only a
message stating the number of distinct values).
trim.strings
remove white spaces at the beginning or end of the string. This may impact the
frequencies, so interpret the frequencies cell accordingly. Defaults to FALSE.
max.width
Limits the number of characters to display in the frequency tables. Defaults to
15.
digits
the number of digits for rounding.
split.cells
pander argument. Number of characters allowed on a line before splitting the
cell. Defaults to 35.
display.labels If TRUE, variable labels will be displayed. Defaults to FALSE.
display.attributes
If TRUE, variable attibutes will be displayed. Defaults to FALSE.
file
the text file to be written to disk. Defaults to NA.
append
When “file” argument is supplied, this indicates whether to append output to
existing file (TRUE) or to overwrite any existing file (FALSE, default). If TRUE
and no file exists, a new file will be created.
escape.pipe
Only useful when style='grid' and file argument is not NA, in which case it
will escape the pipe character (|) to allow Pandoc to correctly convert multiline
cells.
...
additional arguments passed to pander.
24
dHondt
Details
The IQR formula used is the R standard IQR(x) = quantile(x, 3/4) - quantile(x, 1/4).
The (CV) stands for the coefficient of variation also known as relative standard deviation (RSD),
defined as the ratio of the standard deviation to the mean.
Author(s)
Daniel Marcelino, <[email protected]>
Examples
data(religiosity)
Describe(religiosity)
dHondt
The D’Hondt Method of Allocating Seats Proportionally
Description
The function calculates seats allotment in legislative house, given the total number of seats and
the votes for each party based on the Victor D’Hondt’s method (1878). The D’Hondt’s method is
mathematically equivalent to the method proposed by Thomas Jefferson few years before (1792).
Usage
dHondt(parties = NULL, votes = NULL, seats = NULL, ...)
dHondt(parties = NULL, votes = NULL, seats = NULL, ...)
Arguments
parties
a vector containig parties labels or candidates accordingly to the votes vector
order.
votes
a vector containing the total number of formal votes received by the parties/candidates.
seats
an integer for the number of seats to be filled (the district magnitude).
...
Additional arguements (currently ignored)
Value
A data.frame of length parties containing apportioned integers (seats) summing to seats.
Note
Adapted from Carlos Bellosta’s replies in the R-list.
Dummify
25
Author(s)
Daniel Marcelino, <[email protected]>.
References
Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democracies, 1945-1990. Oxford University Press.
See Also
HighestAverages, LargestRemainders, Hamilton, PoliticalDiversity.
Examples
votes <- sample(1:10000, 5)
parties <- sample(LETTERS, 5)
dHondt(parties, votes, seats = 4)
# Example: 2014 Brazilian election for the lower house in
# the state of Ceara. Coalitions were leading by the
# following parties:
results <- c(DEM=490205, PMDB=1151547, PRB=2449440,
PSB=48274, PSTU=54403, PTC=173151)
dHondt(parties=names(results), votes=results, seats=19)
Dummify
Generate Dummy Variables in a Data Frame
Description
Quickly create dummies in a data frame.
Usage
Dummify(x, .data = NULL, drop = TRUE)
Arguments
x
the column position to generate dummies
.data
the a data.frame.
drop
A logical value. If TRUE, unused levels will be omitted.
26
election2008
Details
A matrix object
Author(s)
Daniel Marcelino, <[email protected]>
Examples
df <- data.frame(y = rnorm(10), x = runif(10,0,1), sex = sample(1:2, 10, rep=TRUE))
Dummify(df$sex) ;
cbind(df, Dummify(df$sex));
election2008
2008 U.S. presidential election
Description
State-by-state information from the 2008 U.S. presidential election.
Usage
data(election2008)
Format
A data.frame object with 7 variables and 51 observations.
• State Name of the state.
• Abr Abbreviation for the state.
• Income Per capita income in the state as of 2007 (in dollars).
• HS Percentage of adults with at least a high school education.
• BA Percentage of adults with at least a college education.
• Dem.Rep Difference in %Democrat-%Republican (according to 2008 Gallup survey).
• ObamaWin 1= Obama (Democrat) wins state in 2008 or 0=McCain (Republican wins).
@source State income data from: Census Bureau Table 659. Personal Income Per Capita (in 2007).
High school data from: U.S. Census Bureau, 1990 Census of Population, http://www.nces.ed.
gov/programs/digest/d08/tables/dt08_011.asp. College data from: Census Bureau Table
225. Educational Attainment by State (in 2007) Democrat and Republican data from: http://
www.gallup.com/poll/114016/state-states-political-party-affiliation.aspx
Flag
27
Flag
Add an "id" Variable to a Dataset
Description
Many functions will not work properly if there are duplicated ID variables in a dataset. This function
is a convenience function for .N from the "data.table" package to create an .id. variable that when
used in conjunction with the existing ID variables, should be unique.
Usage
Flag(.data, id.vars = NULL)
Arguments
.data
The input data.frame or data.table.
id.vars
The variables that should be treated as ID variables. Defaults to NULL, at which
point all variables are used to create the new ID variable.
Value
The input dataset (as a data.table) if ID variables are unique, or the input dataset with a new
column named ".ID".
Examples
df <- data.frame(A = c("a", "a", "a", "b", "b"),
B = c(1, 1, 1, 1, 1), values = 1:5);
df
df = Flag(df, c("A", "B"))
df <- data.frame(A = c("a", "a", "a", "b", "b"),
B = c(1, 2, 1, 1, 2), values = 1:5)
df
(df <- Flag(df, 1:2) )
Footnote
Add Footnote to ggplot2 Objects
Description
Add footnote to ggplot2 objects.
28
Forge
Usage
Footnote(note, x = 0.85, y = 0.014, hjust = 0.5, vjust = 0.5,
newlines = TRUE, fontfamily = "serif", fontface = "plain",
color = "gray85", size = 9, angle = 0, lineheight = 0.9, alpha = 1)
Arguments
note
x
y
hjust
vjust
newlines
fontfamily
fontface
color
size
angle
lineheight
alpha
Forge
string or plotmath expression to be drawn.
the x location of the label.
the y location of the label.
horizontal justification
vertical justification
should a new line be appended to the start and end of the string, for spacing?
the font family
the font face ("plain", "bold", etc.)
text color
point size of text
angle at which text is drawn
line height of text
the alpha value of the text
Veritical, left-aligned layout for plots
Description
Left-align the waffle plots by x-axis. Use the pad parameter in waffle to pad each plot to the max
width (num of squares), otherwise the plots will be scaled.
Usage
Forge(...)
Arguments
...
one or more waffle plots
Examples
parts <- c(80, 30, 20, 10)
w1 <- Waffleplot(parts, rows=8)
w2 <- Waffleplot(parts, rows=8)
w3 <- Waffleplot(parts, rows=8)
chart <- Forge(w1, w2, w3)
print(chart)
Formatted
29
Formatted
Some Formats for Nicer Display
Description
Some predefined formats for nicer display.
Usage
Formatted(x, style = c("USD", "BRL", "EUR", "Perc"), digits = 2,
nsmall = 2, decimal.mark = getOption("OutDec"), flag = "")
Arguments
x
a numeric vector.
style
a character name for style. One of "USD", "BRL", "EUR", "Perc".
digits
an integer for the number of significant digits to be used for numeric and complex x
nsmall
an integer for the minimum number of digits to the right of the decimal point.
decimal.mark
decimal mark style to be used with Percents (%), usually (",") or (".").
flag
a character string giving a format modifier as "-", "+", "#".
Examples
x <- as.double(c(0.1, 1, 10, 100, 1000, 10000))
Formatted(x)
Formatted(x, "BRL")
Formatted(x, "EUR")
p = c(0.25, 25, 50)
Formatted(p, "Perc", flag="+")
Formatted(p, "Perc", decimal.mark=",")
30
freq
freq
Simple Frequency Table
Description
Creates a frequency table or data frame.
Usage
freq(x, weighs = NULL, breaks = graphics::hist(x, plot = FALSE)$breaks,
digits = 3, include.lowest = TRUE, order = c("desc", "asc", "level",
"name"), perc = FALSE, useNA = c("no", "ifany", "always"), ...)
## Default S3 method:
freq(x, weighs = NULL, breaks = graphics::hist(x, plot =
FALSE)$breaks, digits = 3, include.lowest = TRUE, order = c("desc",
"asc", "level", "name"), perc = FALSE, useNA = c("no", "ifany", "always"),
...)
Arguments
x
A vector of values for which the frequency is desired.
weighs
A vector of weights.
breaks
one of: 1) a vector giving the breakpoints between histogram cells; 2) a function
to compute the vector of breakpoints; 3) a single number giving the number of
cells for the histogram; 4) a character string naming an algorithm to compute the
number of cells (see ’Details’); 5) a function to compute the number of cells.
digits
The number of significant digits required.
include.lowest Logical; if TRUE, an x[i] equal to the breaks value will be included in the first (or
last) category or bin.
order
The order method.
perc
logical; if TRUE percents are returned.
useNA
Logical; if TRUE NA’s values are included.
...
Additional arguements (currently ignored)
Author(s)
Daniel Marcelino, <[email protected]>.
See Also
Frequency, Crosstable.
Frequency
31
Frequency
Frequency Table
Description
Simulating the FREQ procedure of SPSS.
Usage
Frequency(.data, x = NULL, verbose = TRUE, ...)
## Default S3 method:
Frequency(.data, x = NULL, verbose = TRUE, ...)
Freq(.data, x = NULL, verbose = TRUE, ...)
Arguments
.data
The data.frame.
x
A column for which a frequency of values is desired.
verbose
A logical value, if TRUE, extra statistics are also provided.
...
Additional arguements (currently ignored)
See Also
freq, Crosstable.
Examples
data(cathedrals)
Frequency(cathedrals, Type)
# may work with operators like %>%
cathedrals %>% Frequency(Height)
32
GeomSpotlight
galton
Galton’s Family Data on Human Stature.
Description
It is a reproduction of the data set used by Galton in his 1885’s paper on correlation between parent’s
height and their children, although the concept of correlation would be only introduced few years
later, in 1888. This data.frame contains the following columns:
• parent The parents’ average height
• child The child’s height
Usage
data(galton)
Format
A data.frame object with 2 variables and 928 observations.
Details
Regression analysis is the statistical method most often used in political science research. The
reason is that most scholars are interested in identifying “causal” effects from non-experimental
data and that the linear regression is the method for doing this. The term “regression” (1889) was
first crafted by Sir Francis Galton upon investigating the relationship between body size of fathers
and sons. Thereby he “invented” regression analysis by estimating: Ss = 85.7 + 0.56SF , meaning
that the size of the son regresses towards the mean.
References
Francis Galton (1886) Regression Towards Mediocrity in Hereditary Stature. The Journal of the
Anthropological Institute of Great Britain and Ireland, Vol. 15, pp. 246–263.
GeomSpotlight
Automatically enclose points in a polygon
Description
Automatically enclose points in a polygon
Usage
geom_spotlight(mapping = NULL, data = NULL, stat = "identity",
position = "identity", na.rm = FALSE, show.legend = NA,
inherit.aes = TRUE, ...)
geom_dumbbell
33
Arguments
mapping
mapping
data
data
stat
stat
position
position
na.rm
na.rm
show.legend
show.legend
inherit.aes
inherit.aes
...
dots
Value
adds a circle around the specified points
Examples
d <- data.frame(x=c(1,1,2),y=c(1,2,2)*100)
gg
gg
gg
gg
<- ggplot(d,aes(x,y))
<- gg + scale_x_continuous(expand=c(0.5,1))
<- gg + scale_y_continuous(expand=c(0.5,1))
+ geom_spotlight(s_shape=1, expand=0) + geom_point()
gg <- ggplot(mpg, aes(displ, hwy))
ss <- subset(mpg,hwy>29 & displ<3)
gg + geom_spotlight(data=ss, colour="blue", s_shape=.8, expand=0) +
geom_point() + geom_point(data=ss, colour="blue")
geom_dumbbell
Dumbell charts
Description
The dumbbell geom is used to create dumbbell charts. Dumbbell dot plots–dot plots with two or
more series of data–are an alternative to the clustered bar chart or slope graph.
Usage
geom_dumbbell(mapping = NULL, data = NULL, ..., point.colour.l = NULL,
point.size.l = NULL, point.colour.r = NULL, point.size.r = NULL,
na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
34
geom_dumbbell
Arguments
mapping
Set of aesthetic mappings created by aes or aes_. If specified and inherit.aes = TRUE
(the default), it is combined with the default mapping at the top level of the plot.
You must supply mapping if there is no plot mapping.
data
The data to be displayed in this layer. There are three options:
If NULL, the default, the data is inherited from the plot data as specified in the
call to ggplot.
A data.frame, or other object, will override the plot data. All objects will
be fortified to produce a data frame. See fortify for which variables will be
created.
A function will be called with a single argument, the plot data. The return
value must be a data.frame., and will be used as the layer data.
...
other arguments passed on to layer. These are often aesthetics, used to set an
aesthetic to a fixed value, like color = "red" or size = 3. They may also be
parameters to the paired geom/stat.
point.colour.l the colour of the left point
point.size.l
the size of the left point
point.colour.r the colour of the right point
point.size.r
the size of the right point
na.rm
If FALSE (the default), removes missing values with a warning. If TRUE silently
removes missing values.
show.legend
logical. Should this layer be included in the legends? NA, the default, includes if
any aesthetics are mapped. FALSE never includes, and TRUE always includes.
inherit.aes
If FALSE, overrides the default aesthetics, rather than combining with them.
This is most useful for helper functions that define both data and aesthetics and
shouldn’t inherit behaviour from the default plot specification, e.g. borders.
Aesthetics
geom_segment understands the following aesthetics (required aesthetics are in bold):
• x
• xend
• y
• yend
• alpha
• colour
• linetype
• size
geom_lollipop
35
Examples
df <- data.frame(trt=LETTERS[1:5],
l=c(20, 40, 10, 30, 50),
r=c(70, 50, 30, 60, 80))
ggplot(df, aes(y=trt, x=l, xend=r)) + geom_dumbbell()
ggplot(df, aes(y=trt, x=l, xend=r)) +
geom_dumbbell(color="#a3c4dc", size=0.75, point.colour.l="#0e668b")
geom_lollipop
Lollipop charts
Description
The lollipop geom is used to create lollipop charts. Lollipop charts are the creation of Andy Cotgreave going back to 2011. They are a combination of a thin segment, starting at with a dot at the
top and are a suitable alternative to or replacement for bar charts.
Geom Proto
Usage
geom_lollipop(mapping = NULL, data = NULL, ..., horizontal = FALSE,
point.colour = NULL, point.size = NULL, stem.colour = NULL,
stem.size = NULL, na.rm = FALSE, show.legend = NA, inherit.aes = TRUE)
Arguments
mapping
data
...
horizontal
Set of aesthetic mappings created by aes or aes_. If specified and inherit.aes = TRUE
(the default), it is combined with the default mapping at the top level of the plot.
You must supply mapping if there is no plot mapping.
The data to be displayed in this layer. There are three options:
If NULL, the default, the data is inherited from the plot data as specified in the
call to ggplot.
A data.frame, or other object, will override the plot data. All objects will
be fortified to produce a data frame. See fortify for which variables will be
created.
A function will be called with a single argument, the plot data. The return
value must be a data.frame., and will be used as the layer data.
other arguments passed on to layer. These are often aesthetics, used to set an
aesthetic to a fixed value, like color = "red" or size = 3. They may also be
parameters to the paired geom/stat.
If is FALSE (the default), the function will draw the lollipops up from the X axis
(i.e. it will set xend to x & yend to 0). If TRUE, it wiill set yend to y & xend to
0). Make sure you map the x & y aesthetics accordingly. This parameter helps
avoid the need for coord_flip().
36
geom_lollipop
point.colour
the colour of the point
point.size
the size of the point
stem.colour
the colour of the stem
stem.size
the size of the stem
na.rm
If FALSE (the default), removes missing values with a warning. If TRUE silently
removes missing values.
show.legend
logical. Should this layer be included in the legends? NA, the default, includes if
any aesthetics are mapped. FALSE never includes, and TRUE always includes.
inherit.aes
If FALSE, overrides the default aesthetics, rather than combining with them.
This is most useful for helper functions that define both data and aesthetics and
shouldn’t inherit behaviour from the default plot specification, e.g. borders.
Details
Use the horizontal parameter to abate the need for coord_flip() (see the Arguments section for
details).
Aesthetics
geom_point understands the following aesthetics (required aesthetics are in bold):
• x
• y
• alpha
• colour
• fill
• shape
• size
• stroke
Examples
df <- data.frame(trt=LETTERS[1:10],
value=seq(100, 10, by=-10))
ggplot(df, aes(trt, value)) + geom_lollipop()
ggplot(df, aes(value, trt)) + geom_lollipop(horizontal=TRUE)
Gini
37
Gini
Gini Coefficient G
Description
Computes the unweighted and weighted Gini index of a distribution.
Usage
Gini(x, weights, ...)
## Default S3 method:
Gini(x, weights = rep(1, length = length(x)), ...)
Arguments
x
a data.frame, a matrix-like, or a vector.
weights
a vector containing weights for x.
...
additional arguements (currently ignored)
Details
One might say the Gini is oversensitive to changes in the middle, while undersensitive at the extremes. The G coefficient doesn’t capture very explicitly changes in the top 10 inequality research
in the past 10 years – or the bottom 40 most poverty lies. The alternative Palma ratio does.
Author(s)
Daniel Marcelino, <[email protected]>.
See Also
GiniSimpson, Lorenz, Herfindahl, Rosenbluth, Atkinson..
Examples
# generate a vector (of incomes)
x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)
# compute Gini coefficient
Gini(x)
# For Gini index: Gini(x)*100
Gini(c(100,0,0,0), c(1,33,33,33))
# Considers this
A <- c(20000, 30000, 40000, 50000, 60000)
B <- c(9000, 40000, 48000, 48000, 55000)
38
GiniSimpson
Gini(A); Gini(B);
GiniSimpson
Gini-Simpson Index
Description
Computes the Gini/Simpson coefficient. NAs from the data are omitted.
Usage
GiniSimpson(x, na.rm = TRUE, ...)
## Default S3 method:
GiniSimpson(x, na.rm = TRUE, ...)
Arguments
x
na.rm
...
a data.frame, a matrix-like, or a vector.
a logical value to deal with NAs.
additional arguements (currently ignored).
Details
The Gini-Simpson quadratic index is a classic measure of diversity, widely used by social scientists
and ecologists. The Gini-Simpson is also known as Gibbs-Martin index in sociology, psychology
and management studies, which
PRin turn is also known as the Blau index. The Gini-Simpson index
is computed as 1 − λ = 1 − i=1 p2i = 1 − 1/2 D.
Author(s)
Daniel Marcelino, <[email protected]>.
See Also
Gini.
Examples
# generate a vector (of incomes)
x <- as.table(c(69,50,40,22))
# let's say AB coalesced
rownames(x) <- c("AB","C","D","E")
print(x)
GiniSimpson(x)
griliches76
griliches76
39
Griliches’s (1976) Data
Description
Data set used by Griliches (1976) on wages of very young men. This dataset contains the following
columns:
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
rns residency in South.
rns80 a numeric vector.
mrt marital status = 1 if married.
mrt80 a numeric vector.
smsa reside metro area = 1 if urban.
smsa80 a numeric vector.
med mother’s education, years.
iq iq score.
kww score on knowledge in world of work test.
year Year.
age a numeric vector.
age80 a numeric vector.
s completed years of schooling.
s80 a numeric vector.
expr experience, years.
expr80 a numeric vector.
tenure tenure, years.
tenure80 a numeric vector.
lw log wage.
lw80 a numeric vector.
Usage
data(griliches76)
Format
A data.frame object with 20 variables and 758 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http:
//fmwww.bc.edu/ec-p/data/Hayashi/
References
Griliches, Z. (1976) Wages of very young men. The Journal of Political Economy, 84(4), 69–85.
40
Hamilton
Hamilton
The Hamilton Method of Allocating Seats Proportionally
Description
Computes the Alexander Hamilton’s apportionment method (1792), also known as Hare-Niemeyer
method or as Vinton’s method. The Hamilton method is a largest-remainder method which uses the
Hare Quota.
Usage
Hamilton(parties = NULL, votes = NULL, seats = NULL, ...)
Hamilton(parties = NULL, votes = NULL, seats = NULL, ...)
Arguments
parties
A vector containig parties labels or candidates in the same order of votes.
votes
A vector with the formal votes received by the parties/candidates.
seats
An integer for the number of seats to be returned.
...
Additional arguements (currently ignored)
Details
The Hamilton/Vinton Method sets the divisor as the proportion of the total population per house
seat. After each state’s population is divided by the divisor, the whole number of the quotient
is kept and the fraction dropped resulting in surplus house seats. Then, the first surplus seat is
assigned to the state with the largest fraction after the original division. The next is assigned to the
state with the second-largest fraction and so on.
Value
A data.frame of length parties containing apportioned integers (seats) summing to seats.
Author(s)
Daniel Marcelino, <[email protected]>.
References
Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democracies, 1945-1990. Oxford University Press.
See Also
dHondt, HighestAverages, LargestRemainders, PoliticalDiversity.
Herfindahl
41
Examples
votes <- sample(1:10000, 5)
parties <- sample(LETTERS, 5)
Hamilton(parties, votes, seats = 4)
Herfindahl
Herfindahl Index of Concentration
Description
Calculates the Herfindahl Index of concentration.
Usage
Herfindahl(x, n = rep(1, length(x)), base = 1, na.rm = FALSE, ...)
## Default S3 method:
Herfindahl(x, n = rep(1, length(x)), base = NULL,
na.rm = FALSE, ...)
Arguments
x
a vector of data values of non-negative elements.
n
a vector of frequencies of the same length as x.
base
a parameter of the concentration measure (if NULL, the default parameter (1.0) is
used instead).
na.rm
a logical. Should missing values be removed? The Default is set to na.rm=FALSE.
...
additional arguements (currently ignored)
Details
This index is also known as the Simpson Index in ecology, the Herfindahl-Hirschman Index (HHI)
in economics, and as the Effective Number of Parties (ENP) in political science.
References
Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Handbook of Income Distribution. Amsterdam.
Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall.
See Also
Atkinson, Rosenbluth, PoliticalDiversity, Gini. For more details see the Indices vignette:
vignette("Indices", package = "SciencesPo")
42
HighestAverages
Examples
# generate a vector (of incomes)
x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)
# compute the Herfindahl coefficient with base=1
Herfindahl(x, base=1)
HighestAverages
Highest Averages Methods of Allocating Seats Proportionally
Description
Computes the highest averages method for a variety of formulas of allocating seats proportionally.
Usage
HighestAverages(parties = NULL, votes = NULL, seats = NULL,
method = c("dh", "sl", "msl", "danish", "hsl", "hh", "imperiali", "wb",
"jef", "ad", "hb"), threshold = 0, ...)
## Default S3 method:
HighestAverages(parties = NULL, votes = NULL,
seats = NULL, method = c("dh", "sl", "msl", "danish", "hsl", "hh",
"imperiali", "wb", "jef", "ad", "hb"), threshold = 0, ...)
Arguments
parties
A character vector for parties labels or candidates in the same order as votes. If
NULL, alphabet will be assigned.
votes
A numeric vector for the number of formal votes received by each party or candidate.
seats
The number of seats to be filled (scalar or vector).
method
A character name for the method to be used. See details.
threshold
A numeric value between (0~1). Default is set to 0.
...
Additional arguements (currently ignored)
Details
The following methods are available:
• "dh"d’Hondt method
• "sl"Sainte-Lague method
• "msl"Modified Sainte-Lague method
HighestAverages
43
• "danish"Danish modified Sainte-Lague method
• "hsl"Hungarian modified Sainte-Lague method
• "imperiali"The Italian Imperiali (not to be confused with the Imperiali Quota, which is a
Largest remainder method)
• "hh"Huntington-Hill method
• "wb"Webster’s method
• "jef"Jefferson’s method
• "ad"Adams’s method
• "hb"Hagenbach-Bischoff method
Value
A data.frame of length parties containing apportioned integers (seats) summing to seats.
Author(s)
Daniel Marcelino, <[email protected]>.
References
Gallagher, Michael (1992). "Comparing Proportional Representation Electoral Systems: Quotas,
Thresholds, Paradoxes and Majorities". British Journal of Political Science, 22, 4, 469-496.
Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democracies, 1945-1990. Oxford University Press.
See Also
LargestRemainders, Proportionality, PoliticalDiversity. For more details see the Indices
vignette: vignette('Indices', package = 'SciencesPo').
Examples
# Results for the state legislative house of Ceara (2014):
votes <- c(187906, 326841, 132531, 981096, 2043217, 15061, 103679,109830, 213988, 67145, 278267)
parties <- c("PCdoB", "PDT", "PEN", "PMDB", "PRB", "PSB", "PSC", "PSTU", "PTdoB", "PTC", "PTN")
HighestAverages(parties, votes, seats = 42, method = "dh")
# Let's create a data.frame with typical election results
# with the following parties and votes to return 10 seats:
my_election_data <- data.frame(
party=c("Yellow", "White", "Red", "Green", "Blue", "Pink"),
votes=c(47000, 16000,15900,12000,6000,3100))
HighestAverages(my_election_data$party,
my_election_data$votes,
44
Jackknife
seats = 10,
method="dh")
# How this compares to the Sainte-Lague Method
(dat= HighestAverages(my_election_data$party,
my_election_data$votes,
seats = 10,
method="sl"))
# Plot it
# Barplot(data=dat, "Party", "Seats") +
# theme_fte()
Jackknife
Resamples Data Using the Jackknife Method
Description
This function is used for estimating standard errors when the distribution is not know.
Usage
Jackknife(x, p)
Arguments
x
p
A vector
An function name for estimation of parameter as a string
Value
est orignial estimation of parameter
jkest jackknife estimation of parameter
jkvar jackknife estimation of variance
jkbias jackknife estimate of biasness of parameter
jkbiascorr bias corrected parameter estimate
Author(s)
Daniel Marcelino, <[email protected]>
Examples
x = runif(10, 0, 1)
mean(x)
Jackknife(x,'mean')
JamesStein
JamesStein
45
James-Stein Shrunken Estimates
Description
Computes James-Stein shrunken estimates of cell means given a response variable (which may be
binary) and a grouping indicator.
Usage
JamesStein(x, tau2 = 1)
JamesStein(x, tau2 = 1)
Arguments
x
a vector, matrix, or array of numerics; the data.
tau2
a positive numeric. The variance, assumed known and defaults to 1. # @param
k the grouping factor.
...
extra parameters typically ignored.
Details
Missing values are by default removed.
References
Efron, Bradley and Morris, Carl (1977) “Stein’s Paradox in Statistics.” Scientific American Vol.
236 (5): 119-127.
James, W., & Stein, C. (1961, June). Estimation with quadratic loss. In Proceedings of the fourth
Berkeley symposium on mathematical statistics and probability (Vol. 1, No. 1961, pp. 361-379).
Kurtosis
Compute the Kurtosis
Description
Provides three methods for performing kurtosis test.
Usage
Kurtosis(x, na.rm = FALSE, type = 3)
46
Kurtosis
Arguments
x
a numeric vector
na.rm
a logical value for na.rm, default is na.rm=FALSE.
type
an integer between 1 and 3 selecting one of the algorithms for computing kurtosis detailed below
Details
Kurtosis measures the peakedness of a data distribution. A distribution with zero kurtosis has a
shape as the normal curve (mesokurtic, or mesokurtotic). A positive kurtosis has a curve more
peaked about the mean and the its shape is narrower than the normal curve (leptokurtic, or leptokurtotic). A distribution with negative kurtosis has a curve less peaked about the mean and the its shape
is flatter than the normal curve (platykurtic, or platykurtotic).
Value
An object of the same type as x.
Note
To be consistent with classical use of kurtosis in political science analyses, the default type is the
same equation used in SPSS and SAS, which is the bias-corrected formula: Type 2: G_2 = ((n + 1)
g_2+6) * (n-1)/(n-2)(n-3). When you set type to 1, the following equation applies: Type 1: g_2 =
m_4/m_2^2-3. When you set type to 3, the following equation applies: Type 3: b_2 = m_4/s^4-3
= (g_2+3)(1-1/n)^2-3. You must have at least 4 observations in your vector to apply this function.
Skewness and Kurtosis are functions to measure the third and fourth central moment of a data
distribution.
Author(s)
Daniel Marcelino, <[email protected]>
References
Balanda, K. P. and H. L. MacGillivray. (1988) Kurtosis: A Critical Review. The American Statistician, 42(2), pp. 111–119.
Examples
w<-sample(4,10, TRUE)
x <- sample(10, 1000, replace=TRUE, prob=w)
Kurtosis(x, type=1)
Kurtosis(x, type=2)
Kurtosis(x)
Label
47
Label
Get Variable Label
Description
This function returns character value previously stored in variable’s label attribute. If none found,
and fallback argument is set to TRUE, the function returns object’s name (retrieved by deparse(substitute(x))),
otherwise NA is returned with a warning notice.
Usage
Label(x, fallback = TRUE, simplify = TRUE)
Arguments
x
an R object to extract labels from.
fallback
a logical indicating if labels should fallback to object name(s). Default is set to
TRUE.
simplify
a logical indicating if coerce results to a vector, otherwise, a list is returned.
Default is set to TRUE.
Author(s)
Daniel Marcelino, <[email protected]>
Examples
x <- rnorm(100)
Label(x)
# returns "x"
Label(twins$IQb) <- "IQ biological scores"
Label(twins, FALSE)
Label<-
# returns NA where no labels are found
Set Variable Label
Description
This function sets a label to a variable, by storing a character string to its label attribute.
Usage
Label(x) <- value
48
LargestRemainders
Arguments
x
value
a valid variable i.e. a non-NULL atomic vector that has no dimension attribute
(see dim for details).
a character value that is to be set as variable label
See Also
label
Examples
## Not run:
Label(religiosity$Religiosity) <- "Religiosity index"
vec <- rnorm(100)
Label(vec) <- "random numbers"
## End(Not run)
LargestRemainders
Largest Remainders Methods of Allocating Seats Proportionally
Description
Computes the largest remainders method for a variety of formulas of allocating seats proportionally.
Usage
LargestRemainders(parties = NULL, votes = NULL, seats = NULL,
method = c("hare", "droop", "hagb", "imperiali", "imperiali.adj"),
threshold = 0, ...)
## Default S3 method:
LargestRemainders(parties = NULL, votes = NULL,
seats = NULL, method = c("hare", "droop", "hagb", "imperiali",
"imperiali.adj"), threshold = 0, ...)
Arguments
parties
votes
seats
method
threshold
...
A character vector for parties labels or candidates in the order as votes. If NULL,
a random combination of letters will be assigned.
A numeric vector for the number of formal votes received by each party or candidate.
The number of seats to be filled (scalar or vector).
A character name for the method to be used. See details.
A numeric value between (0~1). Default is set to 0.
Additional arguements (currently ignored)
LargestRemainders
49
Details
The following methods are available:
• "droop"Droop quota method
• "hare"Hare method
• "hagb"Hagenbach-Bischoff
• "imperiali"Imperiali quota (do not confuse with the Italian Imperiali, which is a highest averages method)
• "imperiali.adj"Reinforced or adjusted Imperiali quota
Value
A data.frame of length parties containing apportioned integers (seats) summing to seats.
Author(s)
Daniel Marcelino, <[email protected]>.
References
Gallagher, Michael (1992). "Comparing Proportional Representation Electoral Systems: Quotas,
Thresholds, Paradoxes and Majorities". British Journal of Political Science, 22, 4, 469-496.
Lijphart, Arend (1994). Electoral Systems and Party Systems: A Study of Twenty-Seven Democracies, 1945-1990. Oxford University Press.
See Also
HighestAverages, Proportionality, PoliticalDiversity. For more details see the Indices
vignette: vignette('Indices', package = 'SciencesPo').
Examples
# Let's create a data.frame with typical election results
# with the following parties and votes to return 10 seats:
my_election_data <- data.frame(
party=c("Yellow", "White", "Red", "Green", "Blue", "Pink"),
votes=c(47000, 16000,15900,12000,6000,3100))
LargestRemainders(my_election_data$party,
my_election_data$votes, seats = 10, method="droop")
with(my_election_data, LargestRemainders(party,
votes, seats = 10, method="hare"))
50
Lorenz
Lorenz
The Lorenz Curve
Description
Computes the (empirical) ordinary and generalized Lorenz curve of a vector.
Usage
Lorenz(x, n = rep(1, length(x)), plot = FALSE, ...)
## Default S3 method:
Lorenz(x, n = rep(1, length(x)), plot = FALSE, ...)
Arguments
x
A vector of non-negative values.
n
A vector of frequencies of the same length as x.
plot
A logical. If TRUE the Lorenz curve will be plotted.
...
Additional arguements (currently ignored)
Details
The Gini coefficient ranges from a minimum value of zero, when all individuals are equal, to a
theoretical maximum of one in an infinite population in which every individual except one has a
size of zero. It has been shown that the sample Gini coefficients originally defined need to be
multiplied by n/(n-1) in order to become unbiased estimators for the population coefficients.
Author(s)
Daniel Marcelino, <[email protected]>.
See Also
Gini, GiniSimpson.
Examples
# generate a vector (of incomes)
x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)
# compute Lorenz values
Lorenz(x)
# generate some weights:
wgt <- runif(n=length(x))
# compute the lorenz with especific weights
Lorenz(x, wgt)
ltaylor96
ltaylor96
51
Lothian’s and Taylor’s (1996) Data Set
Description
Data used by Lothian and Taylor (1996) on the real exchange rate behaviour. This dataset contains
the following columns:
• year Year
• spot dollar/sterling exchange rate.
• USwpi U.S. wholesale price index, 1914==100.
• UKwpi U.K. wholesale price index, 1914==100.
Usage
data(ltaylor96)
Format
A data.frame object with 4 variables and 200 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http:
//fmwww.bc.edu/ec-p/data/Hayashi/
References
Lothian, J. R., and Taylor, M. P. (1996) Real exchange rate behavior: the recent float from the
perspective of the past two centuries. Journal of Political Economy, 488–509.
MakeNumeric
Clean up a character vector to make it numeric
Description
Remove commas and whitespace, primarily
Usage
MakeNumeric(x)
Arguments
x
character vector to process
52
marriage
Value
numeric vector
MakeShare
Clean up a character vector to make it a percent
Description
Remove "
Usage
MakeShare(x)
Arguments
x
character vector to process
Value
numeric vector
marriage
Same Sex Marriage Public Opinion Data
Description
Data set fielded by the PEW Research Center on same sex marriage support in US. It covers public
opinion on the issue starting from 1996 up to date. This dataset contains the following columns:
• Date The year of the measurement
• Oppose Percent opposing same-sex marriage
• Favor Percent favoring same-sex marriage
• DK Percent of Don’t Know
Usage
data(marriage)
Format
A data.frame object with 4 variables and 18 observations.
References
PEW Research Center. Support for same-sex marriage. http://www.pewresearch.org
MeanFromRange
MeanFromRange
53
Estimates Mean and Standard Deviation from Median and Range
Description
When conductig a meta-analysis study, it is not always possible to recover from reports, the mean
and standard deviation values, but rather the median and range of values. This function provides a
method for computing the mean and variance estimates from median/range or IQR estimates.
Usage
MeanFromRange(low, med, high, n)
Arguments
low
The min of the data.
med
The median of the data.
high
The max of the data
n
The size of the sample.
Note
One should use these calculations ONLY if there are strong hints, that the overall distribution does
not relevantly deviate from normal distribution. The question is then, why some papers report
median and IQR if data are not far from normal distribution.
Author(s)
Daniel Marcelino, <[email protected]>
References
Hozo, Stela P.; et al (2005) Estimating the mean and variance from the median, range, and the size
of a sample. BMC Medical Research Methodology, 5:13.
Examples
#
#
#
#
#
mean=median and SD = IQR/1.35.
median= 3
Iqr= 2-3
median=mean=3
sd=iqr/1.35 2-3 means 2,5/1.35= sd = 1,85
MeanFromRange(low=5, med=8, high=12, n=10)
54
mishkin92
mishkin92
Mishkin’s (1992) Data
Description
Data from the Frederic S. Mishkin (1992) paper “Is the Fisher Effect for real?”. This dataset contains the following columns:
• year Year
• mon a numeric vector
• inf1mo a numeric vector
• inf3mo a numeric vector
• tbill1mo a numeric vector
• tbill3mo a numeric vector
• cpiu a numeric vector
• quote a numeric vector
Usage
data(mishkin92)
Format
A data.frame object with 8 variables and 491 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http:
//fmwww.bc.edu/ec-p/data/Hayashi/
References
Mishkin, F. S. (1992) Is the Fisher effect for real?: A reexamination of the relationship between
inflation and interest rates. Journal of Monetary Economics, 30(2), 195–215.
nerlove63
nerlove63
55
Marc Nerlove’s (1963) data
Description
Data used by Marc Nerlove (1963) on returns of electricity supply. This dataset contains the following columns:
• totcost costs in 1970, MM USD.
• output output, billion KwH.
• plabor price of labor.
• pfuel price of fuel.
• pkap price of capital.
#’ @source Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University.
http://fmwww.bc.edu/ec-p/data/Hayashi/
Usage
data(nerlove63)
Format
A data.frame object with 5 variables and 145 observations.
References
Nerlove, M. (1963) Returns to Scale in Electricity Supply. In Measurement in Economics-Studies
in Mathematical Economics and Econometrics in Memory of Yehuda Grunfeld, edited by Carl F.
Christ. Stanford: Stanford.
Normalize
Unity-based normalization
Description
Normalizes as feature scaling min - max, or unity-based normalization typically used to bring
the values into the range [0,1]. Other methods are also available, including scoring, centering, and
Smithson and Verkuilen (2006) method of dependent variable transformation.
Usage
Normalize(x, method = "range", ...)
56
Normalize
Arguments
x
is a vector to be normalized.
method
A string for the method used for normalization. Default is method = "range",
which coerces values into [0,1] range. See details for other implemented methods.
...
Additional arguements (currently ignored).
Details
The following methods are available:
• "range"Ranging is done by coercing x values into [0,1] range. However, this may be generalized to follow the range of other arbitrary values using a and b:
X0 = a +
(x − xmin )(b − a)
(xmax − xmin )
.
• "scale"Scaling is done by dividing (centered) values of x by their standard deviations.
• "center"Centering is done by subtracting the means (omitting NAs) of x from its observed
value.
• "z-score"Scoring is done by dividing the values of x from their root mean square.
• "SV"Transform a dependent variable in [0,1] rather than (0, 1) to beta regression as suggested
by Smithson and Verkuilen (2006).
Value
Normalized values in an object of the same class as x.
Author(s)
Daniel Marcelino, <[email protected]>
References
Smithson M, Verkuilen J (2006) A Better Lemon Squeezer? Maximum-Likelihood Regression with
Beta-Distributed Dependent Variables. Psychological Methods, 11(1), 54-71.
See Also
scale.
Examples
x <- sample(10)
(y = Normalize(x) )
# equivalently to
Outlier
57
(x-min(x))/(max(x)-min(x))
# Smithson and Verkuilen approach
(y = Normalize(x, method="SV") )
# look at what happens to the correlation of two independent variables and their "interaction"
# With non-centered interactions:
a = rnorm(10000,20,2)
b = rnorm(10000,10,2)
cor(a,b)
cor(a,a*b)
# With centered interactions:
c = a - 20
d = b - 10
cor(c,c*d)
Outlier
Detect Outliers
Description
Perform exploratory test to detect outliers.
Usage
Outlier(x, index = NULL, ...)
## Default S3 method:
Outlier(x, index = NULL, ...)
Arguments
x
A numeric object, a vector.
index
A numeric value to be considered in the computations.
...
Parameters which are typically ignored.
Details
The quantity in min reveals the minimum deviation from the mean, the integer value in the closest
indicates the position of that element. The quantity in max is the maximum deviation from the
mean, and the farthest integer value indicates the position of that value.
Value
Returns the minimum and maximum values, respectively preceded by their positions in the vector,
matrix or data.frame.
58
paired
Author(s)
Daniel Marcelino, <[email protected]>
References
Dixon, W.J. (1950) Analysis of extreme values. Ann. Math. Stat. 21(4), 488–506.
See Also
Winsorize for reduce the impact of outliers.
Examples
Outlier(x <- rnorm(20))
#data frame:
age <- sample(1:100, 1000, rep=TRUE);
Outlier(age)
paired
Paired Data
Description
Artificial data for a paired experiment. This dataset contains the following columns:
• patient the patient id.
• before_X before treatment.
• after_Y after treatment.
Usage
data(paired)
Format
A data.frame object with 3 variables and 9 observations.
Palettes
Palettes
59
Palette data for the themes used by package
Description
Data used by the palettes in the package.
Usage
Palettes
Format
A list.
party_pal
Political Parties Color Palette (Discrete) and Scales
Description
An N-color discrete palette for political parties.
Usage
party_pal(palette = "BRA", plot = FALSE, hex = FALSE)
Arguments
palette
the palette name, a character string.
plot
logical, if TRUE a plot is returned.
hex
logical, if FALSE, the associated color name (label) is returned.
See Also
Other color party: scale_color_party
Examples
library(scales)
# Brazil
show_col(party_pal("BRA")(20))
# Argentine
show_col(party_pal("ARG")(12))
60
Permutate
# US
show_col(party_pal("USA")(6))
# Canada
show_col(party_pal("CAN")(10))
party_pal("CAN", plot=TRUE, hex=FALSE)
Permutate
Create k random permutations of a vector
Description
Creates a k random permutation of a vector.
Usage
Permutate(x, k)
Arguments
x
a vector to be permutated.
k
number of permutations to be conducted.
Details
should be used only for length(input)! » k
Author(s)
Daniel Marcelino, <[email protected]>.
Examples
# 5! = 5 x 4 x 3 x 2 x 1 = 120
# row wise permutations
Permutate(x=1:5, k=5)
Plotting
Plotting
61
Plot Customizing
Description
Several wrapper functions to deal with ggplot2.
Usage
Previewplot()
PreviewTheme()
align_title_left()
align_title_right()
no_title()
no_y_gridlines()
no_x_gridlines()
no_major_x_gridlines()
no_major_y_gridlines()
no_major_gridlines()
no_minor_x_gridlines()
no_minor_y_gridlines()
no_minor_gridlines()
no_gridlines()
rotate_axes_text(angle)
rotate_x_text(angle)
rotate_y_text(angle)
no_axes()
no_x_axis()
62
Plotting
no_y_axis()
no_x_line()
no_y_line()
no_x_text()
no_y_text()
no_ticks()
no_x_ticks()
no_y_ticks()
no_axes_titles()
no_x_title()
no_y_title()
move_legend(position)
legend_bottom()
legend_top()
legend_left()
legend_right()
no_legend()
no_legend_title()
Arguments
angle
the angle of rotation.
position
the location of the legend (’top’, ’bottom’, ’right’, ’left’ or ’none’).
Author(s)
NA
Examples
Previewplot()
PoliticalDiversity
63
PoliticalDiversity
Political Diversity Indices
Description
Computes political diversity indices or fragmentation/concetration measures like the effective number of parties for an electoral unity or across unities. The very intuition of these coefficients is to
counting parties while weighting them by their relative political–or electoral strength.
Usage
PoliticalDiversity(x, index = "laakso/taagepera", margin = 1,
base = exp(1))
## Default S3 method:
PoliticalDiversity(x, index = "laakso/taagepera",
margin = 1, base = exp(1))
Arguments
x
A data.frame, a matrix, or a vector containing values for the number of votes or
seats each party received.
index
The type of index desired, one of "laakso/taagepera", "golosov", "herfindahl",
"gini", "shannon", "simpson", "invsimpson".
margin
The margin for which the index is computed.
base
The logarithm base used in some indices, such as the "shannon" index.
Details
Very often, political analysts say things like ‘two-party system’ and ‘multi-party system’ to refer
to a particular kind of political party system. However, these terms alone does not tell exactly
how fragmented–or concentrated a party system actually is. For instance, after the 2010 general
election, 22 parties obtained representation in the Lower Chamber in Brazil. Nonetheless, among
these 22 parties, nine parties together returned only 28 MPs. Thus, an index to assess the weigh or
the Effective Number of Parties is important and helps to go beyond the simple count of parties in
a legislative branch. A widely accepted algorithm was proposed by M. Laakso and R. Taagepera:
1
N=P 2
pi
, where N denotes the effective number of parties and p_i denotes the ith party’s fraction of the
seats.
In fact, this formula may be used to compute the vote share for each party. This formula is the
reciprocal of a well-known concentration index (the Herfindahl-Hirschman index) used in economics to study the degree to which ownership of firms in an industry is concentrated. Laakso
64
PoliticalDiversity
and Taagepera correctly saw that the effective number of parties is simply an instance of the inverse measurement problem to that one. This index makes rough but fairly reliable international
comparisons of party systems possible.
The Inverse Simpson index,
1
1/λ = PR
i=1
p2i
= 2D
Where λ equals the probability that two types taken at random from the dataset (with replacement)
represent the same type. This simply equals true fragmentation of order 2, i.e. the effective number
of parties that is obtained when the weighted arithmetic mean is used to quantify average proportional diversity of political parties in the election of interest. Another measure is the Least squares
index (lsq), which measures the disproportionality produced by the election. Specifically, by the
disparity between the distribution of votes and seats allocation.
Recently, Grigorii Golosov proposed a new method for computing the effective number of parties in
which both larger and smaller parties are not attributed unrealistic scores as those resulted by using
the Laakso/Taagepera index.I will call this as (Golosov) and is given by the following formula:
N=
n
X
i=1
pi
pi + p2i − p2i
Author(s)
Daniel Marcelino, <[email protected]>.
References
Gallagher, Michael and Paul Mitchell (2005) The Politics of Electoral Systems. Oxford University
Press.
Golosov, Grigorii (2010) The Effective Number of Parties: A New Approach, Party Politics, 16:
171-192.
Laakso, Markku and Rein Taagepera (1979) Effective Number of Parties: A Measure with Application to West Europe, Comparative Political Studies, 12: 3-27.
Nicolau, Jairo (2008) Sistemas Eleitorais. Rio de Janeiro, FGV.
Taagepera, Rein and Matthew S. Shugart (1989) Seats and Votes: The Effects and Determinants of
Electoral Systems. New Haven: Yale University Press.
Examples
# Here are some examples, help yourself:
# The wikipedia examples
A <- c(.75,.25);
B <- c(.75,.10,rep(0.01,15))
C <- c(.55,.45);
# The index by "laakso/taagepera" is the default
PoliticalDiversity(A)
PoliticalDiversity(B)
pollster2008
65
# Using the method proposed by Golosov gives:
PoliticalDiversity(B, index="golosov")
PoliticalDiversity(C, index="golosov")
# The 1980 presidential election in the US (vote share):
US1980 <- c("Democratic"=0.410, "Republican"=0.507,
"Independent"=0.066, "Libertarian"=0.011, "Citizens"=0.003,
"Others"=0.003)
PoliticalDiversity(US1980)
# 2010 Brazilian legislative election
votes_2010 = c("PT"=13813587, "PMDB"=11692384, "PSDB"=9421347,
"DEM"=6932420, "PR"=7050274, "PP"=5987670, "PSB"=6553345,
"PDT"=4478736, "PTB"=3808646, "PSC"=2981714, "PV"=2886633,
"PC do B"=2545279, "PPS"=2376475, "PRB"=1659973, "PMN"=1026220,
"PT do B"=605768, "PSOL"=968475, "PHS"=719611, "PRTB"=283047,
"PRP"=232530, "PSL"=457490,"PTC"=563145)
seats_2010 = c("PT"=88, "PMDB"=79, "PSDB"=53, "DEM"=43,
"PR"=41, "PP"=41, "PSB"=34, "PDT"=28, "PTB"=21, "PSC"=17,
"PV"=15, "PC do B"=15, "PPS"=12, "PRB"=8, "PMN"=4, "PT do B"=3,
"PSOL"=3, "PHS"=2, "PRTB"=2, "PRP"=2, "PSL"=1,"PTC"=1)
PoliticalDiversity(seats_2010)
PoliticalDiversity(seats_2010, index= "golosov")
pollster2008
Polls for 2008 U.S. presidential election
Description
Polls for the 2008 U.S. presidential election. The data includes all presidential polls reported on the
internet site http://www.elections.huffingtonpost.com/pollster that were taken between
August 29th, when John Mc-Cain announced that Sarah Palin would be his running mate as the
Republican nominee for vice president, and the end of September.
Usage
data(pollster2008)
Format
A data.frame object with 11 variables and 102 observations.
• PollTaker Polling organization.
• PollDates Dates the poll data were collected.
66
Proportionality
• MidDate Midpoint of the polling period.
• Days Number of days after August 28th (end of Democratic convention).
• n Sample size for the poll.
• Pop A=all, LV=likely voters, RV=registered voters.
• McCain Percent supporting John McCain.
• Obama Percent supporting Barak Obama.
• Margin Obama percent minus McCain percent.
• Charlie Indicator for polls after Charlie Gibson interview with VP candidate Sarah Palin
(9/11).
• Meltdown Indicator for polls after Lehman Brothers bankruptcy (9/15).
@source http://www.elections.huffingtonpost.com/pollster
Proportionality
Indexes of (Disproportionality) Proportionality
Description
Calculates several indexes of (dis)-proportionality, which are for the most part used to show the
relationship of votes to seats.
Usage
Proportionality(v, s, index = "Gallagher", ...)
## Default S3 method:
Proportionality(v, s, index = "Gallagher", margin = 1,
...)
Arguments
v
a numeric vector with the percentage share of votes obtained by each party.
s
a numeric vector with the percentage share of seats obtained by each party.
index
the desired method or type of index, see details below for the correct name.
margin
The margin for which the index is computed.
...
additional arguments (currently ignored)
Proportionality
67
Details
The following measures are available:
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
Loosemore-Hanby (Percent) Loosemore-Hanby Index of disproportionality
Rae (Percent) Rae Index of disproportionality
Cox-Shugart Cox-Shugart Index of proportionality
Inv.Cox-Shugart The inverted Cox-Shugart index
Farina Farina index of proportionality, aka cosine proportionality score
"Gallagher" (Percent) Gallagher index of disproportionality
"Inv.Gallagher" The inverse of Gallagher index
"Grofman" Grofman index of proportionality
"Inv.Grofman" Grofman index of proportionality
"Lijphart" Lijphart index of proportionality
"Inv.Rae" The inverse of Rae index
"Rose" Rose index of disproportionality
"Inv.Rose" The inverse of Rose index
"Sainte-Lague" Sainte-Lague index of disproportionality
"DHondt" D’Hondt index of proportionality
"Gini" Gini index of disproportionality
"Monroe" Monroe index of inequity
Value
Each iteration returns a single score.
Author(s)
Daniel Marcelino <[email protected]>
References
Duncan, O. and Duncan, B. (1955) A methodological analysis of segregation indexes. American
Sociological Review 20:210-7.
Gallagher, M. (1991) Proportionality, disproportionality and electoral systems. Electoral Studies
10(1):33-51.
Loosemore, J. and Hanby, V. (1971) The theoretical limits of maximum distortion: Som analytical
expressions for electoral systems. British Journal of Political Science 1:467-77.
Koppel, M., and A. Diskin. (2009) Measuring disproportionality, volatility and malapportionment:
axiomatization and solutions. Social Choice and Welfare 33, no. 2: 281-286.
Rae, D. (1967) The Political Consequences of Electoral Laws. London: Yale University Press.
Rose, Richard, Neil Munro and Tom Mackie (1998) Elections in Central and Eastern Europe Since
1990. Glasgow: Centre for the Study of Public Policy, University of Strathclyde.
Taagepera, R., and B. Grofman. Mapping the indices of seats-votes disproportionality and interelection volatility. Party Politics 9, no. 6 (2003): 659-77.
68
pub_pal
See Also
PoliticalDiversity, LargestRemainders, HighestAverages. For more details, see the Indices
vignette: vignette("Indices", package = "SciencesPo").
Examples
#' # 2012 Queensland state elecion
pvotes= c(49.65, 26.66, 11.5, 7.53, 3.16, 1.47)
pseats = c(87.64, 7.87, 2.25, 0.00, 2.25, 0.00)
Proportionality(pvotes, pseats) # default is Gallagher
Proportionality(pvotes, pseats, index="Rae")
# Proportionality(pvotes, pseats, index="Cox-Shugart")
# 2012 Quebec provincial election:
pvotes = c(PQ=31.95, Lib=31.20, CAQ=27.05, QS=6.03, Option=1.89, Other=1.88)
pseats = c(PQ=54, Lib=50, CAQ=19, QS=2, Option=0, Other=0)
Proportionality(pvotes, pseats, index="Rae")
pub_pal
Color Palettes for Publication (discrete)
Description
Color palettes for publication-quality graphs. See details.
Usage
pub_pal(palette = "pub12")
Arguments
palette
the palette name, a character string.
Details
The following palettes are available:
• "scipo"A 12-color colorblind discrete palette.
• "gray9"5-tons of gray.
• "carnival"A 5-color palette inspired in the Brazilian samba schools.
• "tableau20"Based on software Tableau
• "tableau10"Based on software Tableau
pub_shapes
69
• "tableau10light"Based on software Tableau
• "tableau10medium"Based on software Tableau
• "fte"fivethirtyeight.com color scales
Examples
library(scales)
library(ggplot2)
show_col(pub_pal("scipo")(12))
show_col(pub_pal("gray9")(9), labels = FALSE)
show_col(pub_pal("carnival")(4))
show_col(pub_pal("gdocs")(18))
show_col(pub_pal("manyeyes")(20))
show_col(pub_pal("tableau10")(10))
show_col(pub_pal("tableau10medium")(10))
show_col(pub_pal("tableau10light")(10))
show_col(pub_pal("cyclic")(20))
show_col(pub_pal("bivariate1")(9))
pub_shapes
Shape scales for theme_pub (discrete)
Description
Discrete shape scales for theme_pub().
Usage
pub_shapes(palette = "pub")
Arguments
palette
the palette name, a character string.
See Also
Other shape pub: scale_shape_pub
Examples
library(scales)
pub_shapes("default")(6)
pub_shapes("proportions")(4)
pub_shapes("gender")(2)
70
rdirichlet
rdirichlet
Dirichlet Distribution
Description
Density function and random number generation for the Dirichlet distribution
Usage
rdirichlet(n, alpha)
Arguments
n
number of random observations to draw.
alpha
the Dirichlet distribution’s parameters. Can be a vector (one set of parameters
for all observations) or a matrix (a different set of parameters for each observation), see “Details”.
If alpha is a matrix, a complete set of α-parameters must be supplied for each
observation. log returns the logarithm of the densities (therefore the log-likelihood)
and sum.up returns the product or sum and thereby the likelihood or log-likelihood.
Value
The rdirichlet returns a matrix with n rows, each containing a single random number according
to the supplied alpha vector or matrix.
Author(s)
Daniel Marcelino, <[email protected]>
Examples
# 1) General usage:
rdirichlet(20, c(1,1,1) );
alphas <- cbind(1:10, 5, 10:1);
alphas;
rdirichlet(10, alphas );
alpha.0 <- sum( alphas );
test <- rdirichlet(10, alphas );
apply( test, 2, mean );
alphas / alpha.0;
apply( test, 2, var );
alphas * ( alpha.0 - alphas ) / ( alpha.0^2 * ( alpha.0 + 1 ) );
#
#
#
#
2) A practical example of usage:
A Brazilian face-to-face poll by Datafolha conducted on Oct 03-04
with 18,116 interviews asking for their vote preferences among the
presidential candidates.
read.zTree
71
## First, draw a sample from the posterior
set.seed(1234);
n <- 18116;
poll <- c(40,24,22,5,5,4) / 100 * n; # The data
mcmc <- 100000;
sim <- rdirichlet(mcmc, alpha = poll + 1);
## Second, look at the margins of Aecio over Marina in the very last moment of the campaign:
margin <- sim[,2] - sim[,3];
mn <- mean(margin); # Bayes estimate
mn;
s <- sd(margin); # posterior standard deviation
qnts <- quantile(margin, probs = c(0.025, 0.975)); # 90% credible interval
qnts;
pr <- mean(margin > 0); # posterior probability of a positive margin
pr;
## Third, plot the posterior density
hist(margin, prob = TRUE, # posterior distribution
breaks = "FD", xlab = expression(p[2] - p[3]),
main = expression(paste(bold("Posterior distribution of "), p[2] - p[3])));
abline(v=mn, col='red', lwd=3, lty=3);
read.zTree
Reads zTree output files
Description
Extracts variables from a zTree output file.
Usage
read.zTree(object, tables = c("globals", "subjects"))
Arguments
object
a zTree file or a list of files.
tables
the tables of intrest.
Value
A list of dataframes, one for each table
72
Recode
Examples
## Not run: url <zTables <- read.zTree( "131126_0009.xls" , "contracts" )
zTables <- read.zTree( c("131126_0009.xls",
"131126_0010.xls"), c("globals","subjects", "contracts" ))
## End(Not run)
Recode
Recode or Replace Values With New Values
Description
Recodes a value or a vector of values.
Usage
Recode(x, old, into, warn = TRUE)
## Default S3 method:
Recode(x, old, into, warn = TRUE)
Arguments
x
The vector whose values will be recoded.
old
the old value or a vector of old values (numeric/factor/string).
into
the new value or a vector of replacement values (1 to 1 recoding).
warn
A logical to print a message if any of the old values are not actually present in x.
Examples
x <- LETTERS[1:5]
Recode(x, c("B", "D"), c("Beta", "Delta"))
# On numeric vectors
x <- c(1, 4, 5, 9)
Recode(x, old = c(1, 4, 5, 9), into = c(10, 40, 50, 90))
religiosity
73
religiosity
Data on religiosity of countries
Description
Data from the 2007 Spring Survey conducted through the Pew Global Attitudes Project.
Usage
data(religiosity)
Format
A data.frame object with 9 variables and 44 observations.
•
•
•
•
•
•
•
•
•
Country Name of country.
Religiosity A measure of degree of religiosity for residents of the country.
GDP Per capita Gross Domestic Product in the country.
Africa Indicator for countries in Africa.
EastEurope Indicator for countries in Eastern Europe.
MiddleEast Indicator for countries in the Middle East.
Asia Indicator for countries in Asia.
WestEurope Indicator for countries in Western Europe.
Americas Indicator for countries in North/South America.
@source The Pew Research Center’s Global Attitudes Project surveyed people around the world
and asked (among many other questions) whether they agreed that "belief in God is necessary for
morality," whether religion is very important in their lives, and whether the at least once per day.
The variable Religiosity is the sum of the percentage of positive responses on these three items,
measured in each of 44 countries. The dataset also includes the per capita GDP for each country and
indicator variables that record the part of the world the country is in. http://www.pewglobal.org
Render
Render layer on top of a ggplot
Description
Set up a rendering layer on top of a ggplot.
Usage
Render(plot = NULL)
Arguments
plot
the plot to use as a starting point, either a ggplot2 or gtable.
74
Rosenbluth
Rosenbluth
Rosenbluth Index of Concentration
Description
Calculates the Rosenbluth index of concentration, also known as Hall or Tiedemann Indices.
Usage
Rosenbluth(x, n = rep(1, length(x)), na.rm = FALSE, ...)
## Default S3 method:
Rosenbluth(x, n = rep(1, length(x)), na.rm = FALSE, ...)
Arguments
x
A vector of data values of non-negative elements.
n
A vector of frequencies of the same length as x.
na.rm
A logical. Should missing values be removed? The Default is set to na.rm=FALSE.
...
Additional arguements (currently ignored)
References
Cowell, F. A. (2000) Measurement of Inequality in Atkinson, A. B. / Bourguignon, F. (Eds): Handbook of Income Distribution. Amsterdam.
Cowell, F. A. (1995) Measuring Inequality Harvester Wheatshef: Prentice Hall.
See Also
Atkinson, Herfindahl, Gini, Lorenz. For more details see the Indices vignette: vignette("Indices", package = "Scien
Examples
# generate a vector (of incomes)
x <- c(778, 815, 857, 888, 925, 930, 965, 990, 1012)
# compute Rosenbluth coefficient
Rosenbluth(x)
RunShinyApp
75
RunShinyApp
Examples from the SciencesPo Package
Description
Launch a Shiny app that shows a demo.
Usage
RunShinyApp(app)
Arguments
app
the name of the shiny application.
Examples
# A demo of what \code{\link[SciencesPo]{PoliticalDiversity}} does.
if (interactive()) {
RunShinyApp("PoliticalDiversity")
}
# Other examples
if (interactive()) {
RunShinyApp("ConjugateNormal")
}
if (interactive()) {
RunShinyApp("Regression")
}
scale_color_party
Political Parties Color Palette (Discrete) and Scales
Description
An N-color discrete palette for political parties.
Usage
scale_color_party(palette = "BRA", ...)
Arguments
palette
the palette name, a character string.
...
Other arguments passed on to discrete_scale to control name, limits, breaks,
labels and so forth.
76
scale_fill_party
See Also
party_pal for details and references.
Other color party: party_pal
scale_color_pub
Publication color scales.
Description
See pub_pal for details.
Usage
scale_color_pub(palette = "pub12", ...)
Arguments
palette
...
the palette name, a character string.
Other arguments passed on to discrete_scale to control name, limits, breaks,
labels and so forth.
See Also
pub_pal for references.
Other color publication: scale_fill_pub
scale_fill_party
Political Parties Color Palette (Discrete) and Scales
Description
An N-color discrete palette for political parties.
Usage
scale_fill_party(palette = "BRA", ...)
Arguments
palette
...
the palette name, a character string.
Other arguments passed on to discrete_scale to control name, limits, breaks,
labels and so forth.
See Also
party_pal for details and references.
scale_fill_pub
77
scale_fill_pub
Publication color scales.
Description
See pub_pal for details.
Usage
scale_fill_pub(palette = "pub12", ...)
Arguments
palette
the palette name, a character string.
...
Other arguments passed on to discrete_scale to control name, limits, breaks,
labels and so forth.
See Also
Other color publication: scale_color_pub
scale_shape_pub
Shape scales for theme_pub (discrete)
Description
See pub_shapes for details.
Usage
scale_shape_pub(palette = "pub", ...)
Arguments
palette
the palette name, a character string.
...
common discrete scale parameters: name, breaks, labels, na.value, limits
and guide. See discrete_scale for more details
See Also
Other shape pub: pub_shapes
78
Scatterplot
Examples
library("ggplot2")
library("scales")
p <- ggplot(mtcars) +
geom_point(aes(x = wt, y = mpg, shape = factor(gear))) +
facet_wrap(~am)
p + scale_shape_pub()
Scatterplot
Quartile-Frame Scatterplot
Description
Plots a scatterplot with a custom axis that acts as a quartile plot (quartile-frame). Inspired by The
Visual Display of Quantitative Information by Edward R. Tufte.
Usage
Scatterplot(x, y, ...)
Arguments
x, y
aesthetics passed into each layer variables # @param data data frame to use
(optional).
...
additional arguments.
Examples
with(mtcars, Scatterplot(x = wt, y = mpg,
main = "Vehicle Weight-Gas Mileage Relationship",
xlab = "Vehicle Weight",
ylab = "Miles per Gallon",
font.family = "serif") )
SciencesPo
SciencesPo
79
A Tool Set for Analyzing Political Behavior Data
Description
Provide functions for analyzing elections and political behavior data, including measures of political
fragmentation, seat apportionment, and small data visualization graphs.
Author(s)
NA
References
Marcelino, Daniel (2013). SciencesPo: A Tool Set for Analyzing Political Behaviour Data. Available at SSRN: http://dx.doi.org/10.2139/ssrn.2320547
SciencesPo-deprecated Deprecated functions in package ‘SciencesPo’
Description
These functions are provided for compatibility with older versions of ‘SciencesPo’ only, and will
be defunct at the next release.
Usage
svTransform(y)
politicalDiversity(x, index = "laakso/taagepera", margin = 1,
base = exp(1))
untable(x, ...)
detail(x, ...)
dummy(x, ...)
voronoi(x, ...)
normalize(x, ...)
outliers(x, ...)
destring(x, ...)
winsorize(x, ...)
80
SciencesPoFont
Arguments
y
x
index
margin
base
...
the dependent variable.
A data.frame, a matrix-like, or a vector containing values for the number of
votes or seats each party received.
The type of index desired, one of "laakso/taagepera", "golosov", "herfindahl",
"gini", "shannon", "simpson", "invsimpson".
The margin for which the index is computed.
The logarithm base used in some indices, such as the "shannon" index.
Extra parameters.
Details
The following functions are deprecated and will be made defunct; use the replacement indicated
below:
•
•
•
•
•
•
•
•
•
•
•
•
•
•
•
svTransform: Normalize
untable: Untable
politicalDiversity: PoliticalDiversity
winsorize: Winsorize
recode: Recode
dummy: Dummify
gini: Gini
clear: Clear
detail: Destring
destring: Destring
outliers: Outlier
voronoi: Voronoi
jackknife: Jackknife
bootstrap: Bootstrap
normalize: Normalize
SciencesPoFont
SciencesPo Base Fonts
Description
Used to ascertain required theme fonts.
Usage
SciencesPoFont()
Author(s)
NA
sheston91
sheston91
81
The Penn World Table
Description
The Penn World Table used in Summers and Heston (1991). This dataset contains the following
columns:
• year Year
• pop Population (thousands)
• rgdppc Real per capita GDP
• savrat a numeric vector
• country Country
• com Communist regime
• opec OPEC country
• name Country name
Usage
data(sheston91)
Format
A data.frame object with 8 variables and 3250 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http:
//fmwww.bc.edu/ec-p/data/Hayashi/
References
Summers, R. and Heston, A. (1991) The Penn World Table (Mark 5): an expanded set of international comparisons, 1950–1988. The Quarterly Journal of Economics, 106(2), 327–368.
82
Skewness
Skewness
Compute the Skewness
Description
Provides three methods for performing skewness test.
Usage
Skewness(x, na.rm = TRUE, type = 3)
Arguments
x
a numeric vector containing the values whose skewness is to be computed.
na.rm
a logical value for na.rm, default is na.rm=TRUE.
type
an integer between 1 and 3 for selecting the algorithms for computing the skewness, see details below.
Details
Skewness is a measure of symmetry distribution. Negative skewness (g_1 < 0) indicates that the
mean of the data distribution is less than the median, and the data distribution is left-skewed. Positive skewness (g_1 > 0) indicates that the mean of the data values is larger than the median, and the
data distribution is right-skewed. Values of g_1 near zero indicate a symmetric distribution.
Value
An object of the same type as x
Note
There are several methods to compute skewness, Joanes and Gill (1998) discuss three of the most
traditional methods. According to them, type 3 performs better in non-normal population distribution, whereas in normal-like population distribution type 2 fits better the data. Such difference
between the two formulae tend to disappear in large samples. Type 1: g_1 = m_3/m_2^(3/2).
Type 2: G_1 = g_1*sqrt(n(n-1))/(n-2).
Type 3: b_1 = m_3/s^3 = g_1 ((n-1)/n)^(3/2).
Author(s)
Daniel Marcelino, <[email protected]>
References
Joanes, D. N. and C. A. Gill. (1998) Comparing measures of sample skewness and kurtosis. The
Statistician, 47, 183–189.
stature
83
Examples
w <-sample(4,10, TRUE)
x <- sample(10, 1000, replace=TRUE, prob=w)
Skewness(x, type = 1)
Skewness(x, type = 2)
Skewness(x)
stature
The Measure of US Presidents
Description
The US presidents and their main opponents’ heights (cm). This dataset contains the following
columns:
• election The election year.
• winner The winner candidate.
• winner.height The winner candidate’s height in centimeters (cm).
• winner.vote The popular vote support for the winner.
• winner.party The winner’s party.
• opponent The main opponent candidate.
• opponent.height The opponent candidate’s height in centimeters (cm).
• opponent.vote Popular vote support for the main opponent candidate.
• opponent.party The opponent’s party.
• turnout The electorate turnout in percentages.
• winner.bmi The winner Body Mass Index (BMI) estimate (BMI = weight in kg/(height in meter)**2)
@references Murray, G. R. (2014) Evolutionary preferences for physical formidability in leaders.
Politics and the Life Science, 33(1), 33-53.
Usage
data(stature)
# t.test(stature$winner.height, stature$opponent.height, var.equal=TRUE)
Format
A data.frame object with 11 variables and 57 observations.
Source
US Presidents: http://www.lingerandlook.com/Names/Presidents.php Inside Gov. http:
//www.us-presidents.insidegov.com.
84
Stratify
Stratify
Stratified Sampling
Description
Sample row values of a data frame conditional to some strata attributes.
Usage
Stratify(.data, group, size, select = NULL, replace = FALSE,
both.sets = FALSE)
Arguments
.data
the data frame.
group
the grouping factor, may be a list.
size
the sample size.
select
if sampling from a specific group or list of groups.
replace
should sampling be with replacement?
both.sets
if TRUE, both ‘sample‘ and ‘.data‘ are returned.
Examples
data(pollster2008)
# Let's take a 10% sample from all -PollTaker- groups in pollster2008
Stratify(pollster2008, "PollTaker", 0.1)
# Let's take a 10% sample from only 'LV' and 'RV' groups from -Pop- in pollster2008
Stratify(pollster2008, "Pop", 0.1, select = list(Pop = c("LV", "RV")))
# Let's take 3 samples from all -PollTaker- groups in pollster2008,
# specified by column 1
Stratify(pollster2008, group = 1, size = 3)
# Let's take a sample from all -Pop- groups in pollster2008, where we
# specify the number wanted from each group
Stratify(pollster2008, "Pop", size = c(3, 5, 4))
# Use a two-column strata (-Pop- and -PollTaker-) but only interested in
# cases where -Pop- == 'LV'
Stratify(pollster2008, c("Pop", "PollTaker"), 0.15, select = list(Pop = "LV"))
summary.Crosstable
85
summary.Crosstable
Print Crosstable style
Description
Print Crosstable style
Usage
## S3 method for class 'Crosstable'
summary(object, digits = 2, chisq = FALSE,
fisher = FALSE, mcnemar = FALSE, alt = "two.sided", latex = FALSE,
...)
Arguments
object
The table object.
digits
number of digits after the decimal point.
chisq
if TRUE, the results of a chi-square test will be shown below the table.
fisher
if TRUE, the results of a Fisher Exact test will be shown below the table.
mcnemar
if TRUE, the results of a McNemar test will be shown below the table.
alt
the alternative hypothesis and must be one of "two.sided", "greater" or "less".
Only used in the 2 by 2 case.
latex
if the output is to be printed in latex format.
...
extra parameters.
See Also
Other Crosstables: Crosstable
swatson93
Stock’s and Watson’s (1993) Data.
Description
Data set used by Stock and Watson (1993) to estimate co-integration. This dataset contains the
following columns:
• lnm1 Log M1.
• lnp Log NNP price deflator.
• lnnnp Log NNP.
• cprate A numeric vector.
• year Commercial paper rate.
86
Ternaryplot
Usage
data(swatson93)
Format
A data.frame object with 5 variables and 90 observations.
Source
Hayashi, F. (2000). Econometrics. Princeton. New Jersey, USA: Princeton University. http:
//fmwww.bc.edu/ec-p/data/Hayashi/
References
Stock, J. H., and Watson, M. W. (1993) A simple estimator of cointegrating vectors in higher order
integrated systems. Econometrica: Journal of the Econometric Society, 783–820.
Tableplot
Tableplot: a Visualization of Datasets
Description
Visualization of Large Datasets.
Usage
Tableplot(.data = NULL)
Arguments
.data
Ternaryplot
the data frame.
Ternary Plot Build ternary plots.
Description
Ternary Plot Build ternary plots.
Usage
Ternaryplot(data, ...)
theme_blank
87
Arguments
data
Default dataset to use for plot. If not already a data.frame, will be converted to
one by fortify. If not specified, must be suppled in each layer added to the
plot.
...
Other arguments passed on to methods. Not currently used.
theme_blank
Create a Completely Empty Theme
Description
The theme created by this function shows nothing but the plot panel.
Usage
theme_blank(base_size = 12, base_family = "serif", legend = "none")
Arguments
base_size
overall font size. Default is 12.
base_family
a name for default font family.
legend
the legend position.
Value
The theme.
Author(s)
NA
Examples
# plot with small amount of remaining padding
qplot(1:10, (1:10)^2) + theme_blank()
# remaining padding removed
qplot(1:10, (1:10)^2) + theme_blank() + labs(x = NULL, y = NULL)
# Check that it is a complete theme
attr(theme_blank(), "complete")
88
theme_fte
theme_darkside
The Dark Side Theme
Description
The dark side of the things.
Usage
theme_darkside(base_size = 12, base_family = "serif", legend = "none")
Arguments
base_size
overall font size. Default is 12.
base_family
the name for default font family.
legend
the position of the legend if any.
Value
The theme. # plot with small amount of padding qplot(1:10, (1:10)^2, color="green") + theme_darkside()
# Check that it is a complete theme attr(theme_darkside(), "complete")
Author(s)
NA
theme_fte
Themes for ggplot2 Graphs
Description
Theme for plotting with ggplot2.
Usage
theme_fte(legend = "none", legend_title = FALSE, base_size = 12,
horizontal = TRUE, base_family = "", colors = c("#F0F0F0", "#D9D9D9",
"#60636A", "#525252"))
theme_pub
89
Arguments
legend
enables to set legend position, default is "none".
legend_title
will the legend have a title? Defaults is FALSE.
base_size
overall font size. Default is 13.
horizontal
logical. Horizontal axis lines?
base_family
a nmae for default font family.
colors
default colors used in the plot in the following order: background, lines, text,
and title.
Value
The theme.
Author(s)
NA
Examples
qplot(1:10, (1:10)^3) + theme_fte()
mycolors = c("wheat", "#C2AF8D",
qplot(1:10, (1:10)^3) +
theme_fte(colors=mycolors)
"#8F6D2F", "darkred")
# Check that it is a complete theme
attr(theme_fte(), "complete")
theme_pub
The Default Theme
Description
After loading the SciencesPo package, this theme will be set to default for all subsequent graphs
made with ggplot2.
Usage
theme_pub(legend = "bottom", base_size = 12, base_family = "",
horizontal = FALSE, axis_line = FALSE)
90
theme_scipo
Arguments
legend
enables to set legend position, default is "bottom".
base_size
overall font size. Default is 14.
base_family
a name for default font family.
horizontal
logical. Horizontal axis lines?
axis_line
enables to set x and y axes.
Value
The theme.
Author(s)
NA
See Also
theme, theme_538, theme_blank.
Examples
Previewplot() + theme_pub()
# Anscombe data
dat <- data.frame()
for(i in 1:4)
dat <- rbind(dat, data.frame(set=i, x=anscombe[,i], y=anscombe[,i+4]))
gg <- ggplot(dat, aes(x, y))
gg <- gg + geom_point(size=5, color="red", fill="orange", shape=21)
gg <- gg + geom_smooth(method="lm", fill=NA, fullrange=TRUE)
gg <- gg + facet_wrap(~set, ncol=2)
gg <- gg + theme_pub(base_family=SciencesPoFont())
gg <- gg + theme(plot.background=element_rect(fill="#f7f7f7"))
gg <- gg + theme(panel.background=element_rect(fill="#f7f7f7"))
theme_scipo
The SciPo Theme
Description
The scpo theme for using with ggplot2 objects.
theme_scipo
91
Usage
theme_scipo(base_family = "", base_size = 11,
strip_text_family = base_family, strip_text_size = 12,
title_family = "", title_size = 18, title_margin = 10,
margins = margin(10, 10, 10, 10), subtitle_family = "",
subtitle_size = 12, subtitle_margin = 15, caption_family = "",
caption_size = 9, caption_margin = 10,
axis_title_family = subtitle_family, axis_title_size = 9,
axis_title_just = "rt", grid = TRUE, axis = FALSE, ticks = FALSE)
Arguments
base_family
base font family
base_size
base font size
strip_text_family
facet label font family
strip_text_size
facet label text size
title_family
plot tilte family
title_size
plot title font size
title_margin
plot title margin
margins
plot margins
subtitle_family
plot subtitle family
subtitle_size plot subtitle size
subtitle_margin
plot subtitle margin
caption_family plot caption family
caption_size
plot caption size
caption_margin plot caption margin
axis_title_family
axis title font family
axis_title_size
axis title font size
axis_title_just
axis title font justification blmcrt
grid
panel grid (TRUE, FALSE, or a combination of X, x, Y, y)
axis
axis TRUE, FALSE, [xy]
ticks
ticks
Value
The theme.
92
titanic
Examples
# plot with small amount of padding
# qplot(1:10, (1:10)^2, color="green") + theme_scipo()
# Check that it is a complete theme
# attr(theme_scipo(), "complete")
titanic
Titanic
Description
Population at Risk and Death Rates for an Unusual Episode. For each person on board the fatal
maiden voyage of the ocean liner Titanic, this dataset records sex, age [adult/child], economic status
[first/second/third class, or crew] and whether or not that person survived. This dataset contains the
following columns:
•
•
•
•
CLASS Class (0 = crew, 1 = first, 2 = second, 3 = third)
AGE Age (1 = adult, 0 = child)
SEX Sex (1 = male, 0 = female)
SURVIVED Survived (1 = yes, 0 = no)
Usage
data(titanic)
Format
A data.frame object with 4 variables and 2201 observations.
Note
There is not complete agreement among primary sources as to the exact numbers on board, rescued,
or lost. STORY BEHIND THE DATA: The sinking of the Titanic is a famous event, and new
books are still being published about it. Many well-known facts–from the proportions of first-class
passengers to the "women and children first" policy, and the fact that that policy was not entirely
successful in saving the women and children in the third class–are reflected in the survival rates for
various classes of passenger. These data were originally collected by the British Board of Trade in
their investigation of the sinking.
Source
British Board of Trade Inquiry Report (1990). Report on the Loss of the ‘Titanic’ (S.S.)", Gloucester,
UK: Allan Sutton Publishing.
References
Dawson (1995). "The ‘Unusual Episode’ Data Revisited" in the Journal of Statistics Education.
Today
93
Today
Print the current date in a pretty format
Description
Print the current date in a pretty format.
Usage
Today()
Value
string
Examples
Today()
turnout
Turnout Data
Description
Data on voter turnout in the 50 states and D.C. for the 1992 Presidential election and 1990 Congressional elections. Per capita income, populations in poverty and populations with no high school
degree are also given. This dataset contains the following columns:
• v1 state name (alphabetic, 20 characters)
• v2 region of country: 1 Northeast 2 Midwest 3 South 4 West
• v3 division within region: 1 Northeast-New England 2 Northeast-Middle Atlantic 3 MidwestEast North Central 4 Midwest-West North Central 5 South-South Atlantic 6 South-East South
Central 7 South-West South Central 8 West-Mountain 9 West-Pacific.
• v4 "Elazar’s state political culture assignments: 1 = moralistic 2 = individualistic 3 = traditionalistic
• v5 percent of population below poverty level, 1992.
• v6 per capita personal income, 1993.
• v7 percent casting votes for presidential electors, 1992.
• v8 percent casting votes for U.S. Representatives, 1990.
• v9 population without a high school degree, of those 25 years or older, 1990.
• v10 population 25 years or older, 1990.
• v11 South = 1; all others = 0.
94
twins
Usage
data(turnout)
Format
A data.frame object with 11 variables and 50 observations.
Source
U.S. Bureau of the Census, Statistical Abstract of the United States, 1994.
twins
Burt’s Twin Data
Description
The given data are IQ scores from identical twins; one raised in a foster home, and the other raised
by birth parents. This dataset contains the following columns:
• C Social class, C1=high, C2=medium, C3=low, a factor.
• IQb biological.
• IQf foster.
@references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
Usage
data(twins)
Format
A data.frame object with 3 variables and 27 observations.
Source
Burt, C. (1966). The genetic estimation of differences in intelligence: A study of monozygotic
twins reared together and apart. Br. J. Psych., 57, 147-153.
TwoWay
TwoWay
95
Cross Tabulation
Description
TwoWay is a modified version of CrossTable (gmodels). TwoWay will print a summary table with
cell counts and column proportions (similar to STATA’s tabulate.
Usage
TwoWay(x, y, digits = 3)
Arguments
x, y
the variables for the cross tabulation.
digits
number of digits for rounding proportions.
unions
Poll attitudes towards British trade unions
Description
The British polling company Ipsos MORI conducted several opinion polls in the UK between 1975
and 1995 in which they asked whether people agree or disagree with the statement "Trade unions
have too much power in Britain today".
Usage
data(unions)
Format
A data.frame object with 7 variables and 17 observations.
• Date Month of the poll Aug-77 to Sep-79.
• AgreePct Percent who agree (unions have too much power).
• DisagreePct Percent who disagree.
• NetSupport DisagreePct-AgreePct.
• Months Months since August 1975.
• Late 1=after 1986 or 0=before 1986.
• Unemployment Unemployment rate.
@source http://www.ipsos-mori.com/researchpublications/researcharchive/poll.aspx?
oItemID=94
96
Untable
Untable
Untable an Aggregated Data Frame
Description
A method for recovering a data.frame out of summarized data, i.e. contingency table or aggregated
data.
Usage
Untable(x, ...)
## S3 method for class 'data.frame'
Untable(x, freq = "Freq", row.names = NULL, ...)
## Default S3 method:
Untable(x, dimnames = NULL, type = NULL,
row.names = NULL, col.names = NULL, ...)
Arguments
x
the table object as a data.frame, table, or, matrix.
freq
the column name of count values.
row.names
row names to add to data.frame if any.
dimnames
set the dimnames of object if required.
type
the type of variable. If NULL, ordered factor is returned.
col.names
column names to add to the data.frame.
...
Extra parameters ignored.
See Also
expand.grid, gl.
Examples
gss <- data.frame(
expand.grid(sex=c("female", "male"),
party=c("dem", "indep", "rep")),
count=c(279,165,73,47,225,191))
print(gss) # aggregated data.frame
# Then expand it:
GSS <- Untable(gss, freq="count")
head(GSS)
Voronoi
97
# Expand from a table or xtable object:
# Fisher's Tea-Tasting Experiment data
tea <- table(poured=c("Yes", "Yes", "Yes", "No","Yes","No", "No", "No"),
guess=c("Yes", "Yes", "Yes", "Yes", "No", "No","No","No"))
Untable(tea)
# Expand with a vector of weights
Untable(c(3,3,3), dimnames=list(c("Brazil","Colombia","Argentina")))
Voronoi
Voronoi tessellation diagram
Description
Computes voronoi tessellation diagrams, and Dirichlet tessellation (after Peter Gustav Lejeune
Dirichlet).
Usage
Voronoi(n = 100, d, dim = 1000, method = NULL, seed = 51, plot = TRUE)
Arguments
n
an integer for a finite set of points.
d
an integer for a finite set of dimensions.
dim
the image dimension.
method
the distance computation method. One of euclidean, manhattan, maximum, canberra
(currently not implemented).
seed
an integer for random seed.
plot
logical. If TRUE, a plot is returned, else, a data.frame is returned.
Details
https://en.wikipedia.org/wiki/Voronoi_diagram
Author(s)
Daniel Marcelino, <[email protected]>
Examples
## Not run:
Voronoi(n=20, d=5, dim=1000)
98
Waffleplot
vote
Voting correlations
Description
A dataset with voting correlations of traits, where each trait is measured for 17 developed countries
(Europe, US, Japan, Australia, New Zealand).
Usage
data(vote)
Format
A 12x12 matrix object with 12 variables and 12 observations.
Source
Torben Iversen and David Soskice (2006). Electoral institutions and the politics of coalitions: Why
some democracies redistribute more than others. American Political Science Review, 100, 165-81.
Table A2.
References
Using Graphs Instead of Tables. http://www.tables2graphs.com/doku.php?id=03_descriptive_
statistics
Waffleplot
Make waffle (square pie) charts
Description
Given a named vector, this function will return a ggplot object that represents a waffle chart of the
values. The individual values will be summed up and each that will be the total number of squares
in the grid.
Usage
Waffleplot(parts, rows = 10, xlab = NULL, title = NULL, colors = NA,
size = 2, legend_pos = "right", flip = FALSE, reverse = FALSE,
equal = TRUE, pad = 0, use_glyph = FALSE, glyph_size = 12)
WeightedCorrelation
99
Arguments
parts
named vector of values to use for the chart
rows
number of rows of blocks
xlab
text for below the chart. Highly suggested this be used to give the "1 sq == xyz"
relationship if it’s not obvious
title
chart title
colors
exactly the number of colors as values in parts. If omitted, Color Brewer "Set2"
colors are used.
size
width of the separator between blocks (defaults to 2)
legend_pos
position of legend
flip
flips x & y axes
reverse
reverses the order of the data
equal
by default, waffle uses coord_equal; this can cause layout problems, so you an
use this to disable it if you are using ggsave or knitr to control output sizes (or
manually sizing the chart)
pad
how many blocks to right-pad the grid with
use_glyph
use specified FontAwesome glyph
glyph_size
size of the FontAwesome font
Note
If the vector is not named or only partially named, capital letters will be used instead. If you
specify a string (vs FALSE) to use_glyph the function will map the input to a FontAwesome glyph
name and use that glyph for the tile instead of a block (making it more like an isotype pictogram
than a waffle chart). You’ll need to actually install FontAwesome and use the extrafont package
(https://github.com/wch/extrafont) to be able to use the FontAwesome glyphs.
Examples
tiles <- c(One=80, Two=30, Three=20, Four=10)
Waffleplot(tiles, rows=8)
Senate <- c(`Male (44%)`=44, `Female (56%)`=56)
Waffleplot(Senate, rows=10, size=0.5, colors=c("#af9139", "#544616"))
WeightedCorrelation
Compute Weighted Correlations
Description
Compute the weighted correlation.
100
WeightedMean
Usage
WeightedCorrelation(x, y = NULL, weights = NULL)
Arguments
x
a matrix or vector to correlate with y.
y
a matrix or vector to correlate with x. If y is NULL, x will be used instead.
weights
an optional vector of weights to be used to determining the weighted mean and
variance for calculation of the correlations.
Examples
x <- sample(10,10)
y <- sample(10,10)
w <- sample(5,10, replace=TRUE)
WeightedCorrelation(x, y, w)
WeightedMean
Computes Weighted Mean
Description
Compute the weighted mean of data.
Usage
WeightedMean(x, weights = NULL, normwt = "ignored", na.rm = TRUE)
Arguments
x
a numeric vector.
weights
a numeric vector of weights of x.
normwt
ignored at the moment.
na.rm
a logical, if TRUE, missing data will be dropped. If na.rm = FALSE, missing
data will return an error.
Examples
x <- sample(10,10)
w <- sample(5,10, replace=TRUE)
WeightedMean(x, w)
WeightedStdz
101
WeightedStdz
Computes Weighted Standardized Values
Description
Computes weighted standardized values of data.
Usage
WeightedStdz(x, weights = NULL)
Arguments
x
weights
a vector for which a set of standardized values is desired.
a vector of weights to be used to determining weighted values of x.
Examples
x <- sample(10,10)
w <- sample(5,10, replace=TRUE)
WeightedStdz(x, w)
WeightedVariance
Weighted Variance
Description
Compute the weighted variance.
Usage
WeightedVariance(x, weights = NULL, na.rm = FALSE)
Arguments
x
weights
na.rm
a numeric vector.
a numeric vector of weights for x.
a logical if NA should be disregarded.
Examples
wt=c(1.23, 2.12, 1.23, 0.32, 1.53, 0.59, 0.94, 0.94, 0.84, 0.73)
x = c(5, 5, 4, 4, 3, 4, 3, 2, 2, 1)
WeightedVariance(x, wt)
102
Winsorize
Winsorize
Winsorized Mean
Description
Compute the winsorized mean, which consists of recoding the top k values of a vector.
Usage
Winsorize(x, k = 1, na.rm = TRUE)
Arguments
x
The vector to be winsorized
k
An integer for the quantity of outlier elements that to be replaced in the calculation process
na.rm
a logical value for na.rm, default is na.rm=TRUE.
Details
Winsorizing a vector will produce different results than trimming it. While by trimming a vector
causes extreme values to be discarded, by winsorizing it in the other hand, causes extreme values to
be replaced by certain percentiles.
Value
An object of the same type as x
References
Dixon, W. J., and Yuen, K. K. (1999) Trimming and winsorization: A review. The American
Statistician, 53(3), 267–269.
Dixon, W. J., and Yuen, K. K. (1960) Simplified Estimation from Censored Normal Samples, The
Annals of Mathematical Statistics, 31, 385–391. @references Wilcox, R. R. (2012) Introduction to
robust estimation and hypothesis testing. Academic Press, 30-32. Statistics Canada (2010) Survey
Methods and Practices.
@note One may want to winsorize estimators, however, winsorization tends to be used for onevariable situations.
@author Daniel Marcelino, <[email protected]>
@examples set.seed(51) # for reproducibility x <- rnorm(50) ## introduce outlier x[1] <- x[1] * 10
# Compare to mean: mean(x) Winsorize(x)
words
103
words
Word frequencies from Mosteller and Wallace
Description
The data give the frequencies of words in works from four different sources: the political writings
of eighteenth century American political figures Alexander Hamilton, James Madison, and John
Jay, and the book Ulysses by twentieth century Irish writer James Joyce. This dataset contains the
following columns:
• Hamilton Hamilton frequency.
• HamiltonRank Hamilton rank.
• Madison Madison frequency.
• MadisonRank Madison rank.
• Jay Jay frequency.
• JayRank Jay rank.
• Ulysses Word frequency in Ulysses.
• UlyssesRank Word rank in Ulysses.
@references Weisberg, S. (2014). Applied Linear Regression, 4th edition. Hoboken NJ: Wiley.
Usage
data(words)
Format
A data.frame object with 8 variables and 165 observations.
Source
Mosteller, F. and Wallace, D. (1964). Inference and Disputed Authorship: The Federalist. Reading,
MA: Addison-Wesley.
104
%nin%
%nin%
Not in (Find Matching or Non-Matching Elements)
Description
" logical vector indicating if there is a match or not for its left operand. A true vector element
indicates no match in left operand, false indicates a match.
Usage
x %nin% table
Arguments
x
A vector of numeric, character, or factor values.
table
A vector (numeric, character, factor), matching the mode of x
Examples
c('a','b','c') %nin% c('a','b')
Index
∗Topic Rescaling
Normalize, 55
∗Topic Sampling
Permutate, 60
∗Topic Tests
Bartels, 8
JamesStein, 45
∗Topic Transformation
Normalize, 55
∗Topic Viz
Lorenz, 50
∗Topic datasets
alpha, 4
auto, 7
baseball, 10
bhodrick93, 10
big5, 11
bollen, 12
bush, 14
cathedrals, 15
cgreene76, 16
chismoke, 17
election2008, 26
galton, 32
geom_dumbbell, 33
geom_lollipop, 35
GeomSpotlight, 32
griliches76, 39
ltaylor96, 51
marriage, 52
mishkin92, 54
nerlove63, 55
paired, 58
Palettes, 59
pollster2008, 65
religiosity, 73
sheston91, 81
stature, 83
swatson93, 85
∗Topic Clean-up
MakeNumeric, 51
MakeShare, 52
∗Topic Concentration,
GiniSimpson, 38
Lorenz, 50
∗Topic Concentration
Gini, 37
∗Topic Distributions
ddirichlet, 22
rdirichlet, 70
∗Topic Diversity,
Gini, 37
GiniSimpson, 38
Lorenz, 50
∗Topic Electoral
dHondt, 24
HighestAverages, 42
LargestRemainders, 48
PoliticalDiversity, 63
∗Topic Exploratory,
PoliticalDiversity, 63
∗Topic Exploratory
CI, 17
Crosstable, 20
Describe, 23
Outlier, 57
WeightedVariance, 101
Winsorize, 102
∗Topic Inequality
GiniSimpson, 38
∗Topic Manipulation
Recode, 72
Stratify, 84
∗Topic Modelling
Bootstrap, 13
Normalize, 55
∗Topic Models
Dummify, 25
105
106
titanic, 92
turnout, 93
twins, 94
unions, 95
vote, 98
words, 103
∗Topic ggplot2
geom_dumbbell, 33
geom_lollipop, 35
GeomSpotlight, 32
party_pal, 59
Plotting, 61
pub_pal, 68
pub_shapes, 69
scale_color_party, 75
scale_color_pub, 76
scale_fill_party, 76
scale_fill_pub, 77
scale_shape_pub, 77
SciencesPoFont, 80
theme_blank, 87
theme_darkside, 88
theme_fte, 88
theme_pub, 89
%nin%, 104
aes, 34, 35
aes_, 34, 35
align_title_left (Plotting), 61
align_title_right (Plotting), 61
alpha, 4
Anonymize, 5
Atkinson, 6, 37, 41, 74
auto, 7
Bartels, 8
baseball, 10
bhodrick93, 10
big5, 11
bollen, 12
Bootstrap, 13, 80
borders, 34, 36
bush, 14
Calibrateplot, 14
cathedrals, 15
cgreene76, 16
chismoke, 17
CI, 17
INDEX
Clear, 18, 80
Condorcet, 19
CountMissing, 19
CrossTable, 95
Crosstable, 20, 30, 31, 85
CrossValidation, 21
ddirichlet, 22
Describe, 23
Destring, 80
destring (SciencesPo-deprecated), 79
Detail (Describe), 23
detail (SciencesPo-deprecated), 79
dHondt, 24, 40
Dirichlet (rdirichlet), 70
discrete_scale, 75–77
Disproportionality (Proportionality), 66
Dummify, 25, 80
dummy (SciencesPo-deprecated), 79
election2008, 26
expand.grid, 96
Flag, 27
Footnote, 27
Forge, 28
Formatted, 29
fortify, 34, 35, 87
Freq (Frequency), 31
freq, 30, 31
Frequency, 20, 30, 31
galton, 32
geom_dumbbell, 33
geom_lollipop, 35
geom_spotlight (GeomSpotlight), 32
GeomDumbbell (geom_dumbbell), 33
GeomLollipop (geom_lollipop), 35
GeomSpotlight, 32
ggplot, 34, 35
Gini, 7, 37, 38, 41, 50, 74, 80
GiniSimpson, 37, 38, 50
gl, 96
griliches76, 39
Hamilton, 25, 40
Herfindahl, 7, 37, 41, 74
HighestAverages, 25, 40, 42, 49, 68
Jackknife, 44, 80
INDEX
JamesStein, 45
Kurtosis, 45
Label, 47
label, 48
Label<-, 47
LargestRemainders, 25, 40, 43, 48, 68
layer, 34, 35
legend_bottom (Plotting), 61
legend_left (Plotting), 61
legend_right (Plotting), 61
legend_top (Plotting), 61
Lorenz, 37, 50, 74
ltaylor96, 51
MakeNumeric, 51
MakeShare, 52
marriage, 52
MeanFromRange, 53
mishkin92, 54
move_legend (Plotting), 61
nerlove63, 55
no_axes (Plotting), 61
no_axes_titles (Plotting), 61
no_gridlines (Plotting), 61
no_legend (Plotting), 61
no_legend_title (Plotting), 61
no_major_gridlines (Plotting), 61
no_major_x_gridlines (Plotting), 61
no_major_y_gridlines (Plotting), 61
no_minor_gridlines (Plotting), 61
no_minor_x_gridlines (Plotting), 61
no_minor_y_gridlines (Plotting), 61
no_ticks (Plotting), 61
no_title (Plotting), 61
no_x_axis (Plotting), 61
no_x_gridlines (Plotting), 61
no_x_line (Plotting), 61
no_x_text (Plotting), 61
no_x_ticks (Plotting), 61
no_x_title (Plotting), 61
no_y_axis (Plotting), 61
no_y_gridlines (Plotting), 61
no_y_line (Plotting), 61
no_y_text (Plotting), 61
no_y_ticks (Plotting), 61
no_y_title (Plotting), 61
107
Normalize, 55, 80
normalize (SciencesPo-deprecated), 79
Outlier, 57, 80
outliers (SciencesPo-deprecated), 79
paired, 58
Palettes, 59
party_pal, 59, 76
Permutate, 60
Plotting, 61
PoliticalDiversity, 25, 40, 41, 43, 49, 63,
68, 80
politicalDiversity
(SciencesPo-deprecated), 79
pollster2008, 65
Previewplot (Plotting), 61
PreviewTheme (Plotting), 61
prop.table, 20
Proportionality, 43, 49, 66
pub_pal, 68, 76, 77
pub_shapes, 69, 77
rdirichlet, 70
read.zTree, 71
Recode, 72, 80
religiosity, 73
Render, 73
Rosenbluth, 7, 37, 41, 74
rotate_axes_text (Plotting), 61
rotate_x_text (Plotting), 61
rotate_y_text (Plotting), 61
RunShinyApp, 75
scale, 56
scale_color_party, 59, 75
scale_color_pub, 76, 77
scale_fill_party, 76
scale_fill_pub, 76, 77
scale_shape_pub, 69, 77
Scatterplot, 78
SciencesPo, 79
SciencesPo-deprecated, 79
SciencesPo-package (SciencesPo), 79
SciencesPoFont, 80
sheston91, 81
Simpson (Herfindahl), 41
Skewness, 82
stature, 83
108
Stratify, 84
summary.Crosstable, 20, 85
svTransform (SciencesPo-deprecated), 79
swatson93, 85
table, 20
Tableplot, 86
Ternaryplot, 86
theme, 90
theme_538, 90
theme_538 (theme_fte), 88
theme_blank, 87, 90
theme_darkside, 88
theme_fte, 88
theme_pub, 89
theme_scipo, 90
titanic, 92
Today, 93
turnout, 93
twins, 94
TwoWay, 95
unions, 95
Untable, 80, 96
untable (SciencesPo-deprecated), 79
Voronoi, 80, 97
voronoi (SciencesPo-deprecated), 79
vote, 98
Waffleplot, 98
WeightedCorrelation, 99
WeightedMean, 100
WeightedStdz, 101
WeightedVariance, 101
Winsorize, 58, 80, 102
winsorize (SciencesPo-deprecated), 79
words, 103
xtabs, 20
INDEX