Package `CDF.PSIdekick`

August 19, 2016
Type Package
Title Evaluate Differentially Private Algorithms for Publishing
Cumulative Distribution Functions
Version 1.2
Date 2016-08-05
Author Daniel Muise [aut,cre],
Kobbi Nissim [aut],
Georgios Kellaris [aut]
Maintainer Daniel Muise <[email protected]>
Description Designed by and for the community of differential privacy algorithm developers. It can be used to empirically evaluate and visualize Cumulative Distribution Functions incorporating noise that satisfies differential privacy, with numerous options made to streamline collection of utility measurements across variations of key parameters, such as epsilon, domain size, sample size, data shape, etc. Developed by researchers at Harvard PSI.
License GPL (>= 2)
Imports Rcpp (>= 0.12.6)
LinkingTo Rcpp
RoxygenNote 5.0.1.9000
NeedsCompilation yes
Repository CRAN
Date/Publication 2016-08-19 19:41:43
R topics documented:
dpCDFtesting-package
Abbrev
badCDF
CDFtest
CDFtestTrack
CDFtestTrackx
DerivDiff
diffat25
diffat75
diffatMedian
diffatQuantile
findMaxError
functionH
functionHmono
functionS2
functionSUB
getMaxError
getMean
horzdiffat25
horzdiffat75
horzdiffatMed
horzdiffatQuantile
KurtDiffpdf
L1empiric
L2empiric
MAE
MaxErrorAt_CDF
MaxErrorAt_PDF
MaxError_CDF
MaxError_PDF
MeanDiffpdf
Medians
ModeDiffpdf
MovetoRange
MSE
MSEanalytic
nodes
QuantileFromCDF
SDempiric
SkewDiffpdf
Smooth
smoothVector2
StdDiffpdf
TreeCDF
VarDiffpdf
Index
dpCDFtesting-package
Comprehensively evaluate and visualize the output of dpCDF-generating algorithm implementations (dp = differential privacy).
Description
This package’s primary contribution is the function CDFtest, which is used to visualize and collect
large empirical diagnostic data on the performance of user-defined dpCDF implementations. It also
includes 4 simple dpCDF implementations.
Details
See ?CDFtest for the most complete documentation. Other valuable functions are "functionH", "functionHmono",
"functionS2", and "functionSUB", which generate dpCDFs through different methods.
Author(s)
Daniel Muise, Harvard SEAS Privacy Tools group
Kobbi Nissim, Harvard CRCS Privacy Tools group
Georgios Kellaris, Harvard CRCS Privacy Tools group
Maintainer: Daniel Muise <[email protected]>
References
See http://privacytools.seas.harvard.edu/
Abbrev
Transforms long numbers into short strings.
Description
Abbreviates long numeric values into visually shorter strings
Usage
Abbrev(value)
Arguments
value
A single numeric value
Value
A string value such as 1k for 1000
Examples
Abbrev(1700000)
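The idea behind this kind of abbreviation can be sketched in base R. The helper below is hypothetical (`abbrev_sketch` is not part of the package), and the suffixes beyond "k" and the one-decimal rounding are assumptions; the package's own output format may differ.

```r
# Hypothetical helper, not the package's Abbrev(): repeatedly divide by 1000
# and attach a magnitude suffix. Suffixes and rounding are assumptions.
abbrev_sketch <- function(value) {
  suffixes <- c("", "k", "M", "B")
  tier <- 0
  while (abs(value) >= 1000 && tier < length(suffixes) - 1) {
    value <- value / 1000
    tier <- tier + 1
  }
  paste0(format(round(value, 1), trim = TRUE), suffixes[tier + 1])
}

abbrev_sketch(1000)
abbrev_sketch(1700000)
```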
badCDF
Make a straight-line faux CDF.
Description
Creates a placeholder CDF (a uniform straight line) for demonstration.
Usage
badCDF(range, gran, ...)
Arguments
range
gran
...
A vector length 2 containing user-specified min and max to truncate the universe
to
The smallest unit of measurement in the data (one [year] for a list of ages)
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A fake CDF for demonstration only.
Examples
badCDF(c(1,50), 1)
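As a rough illustration of what a straight-line placeholder CDF looks like over a domain defined by range and gran, here is a minimal base-R sketch. The name `uniform_cdf_sketch` is hypothetical and the construction is an assumption, not the package's source.

```r
# Hypothetical stand-in for a straight-line CDF: one value per domain bin,
# rising in equal steps and ending at 1.
uniform_cdf_sketch <- function(range, gran) {
  bins <- seq(range[1], range[2], by = gran)  # domain implied by range and gran
  seq_along(bins) / length(bins)
}

cdf <- uniform_cdf_sketch(c(1, 50), 1)
length(cdf)   # one entry per bin in the domain
```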
CDFtest
Comprehensively evaluate and visualize the utility of CDF-generating
implementations.
Description
The suite is a system for determining the utility of differentially private cumulative distribution
function (DP-CDF) algorithm implementations. The system can empirically evaluate and provide
visualizations for several DP-CDF algorithms simultaneously, under various parameters. It can also
be set to focus strictly on data collection, rather than spending time on visualization.
It comes with several pre-loaded adjustable synthetic datasets, and can also analyze functions on
user-defined datasets.
dpCDF implementations to test must take the following as arguments: data, epsilon, granularity, range,
and any number of other inputs. Use "?functionH" for an example of an implementation drawing
on C++ files through Rcpp.
USERS SHOULD NOTE: the following included diagnostic functions are under development:
SkewDiffpdf, KurtDiffpdf, and StdDiffpdf, corresponding to error measurements of skewness,
kurtosis, and standard deviation generated from dpCDFs. This shows up as an occasional
result of NA.
Usage
CDFtest(Visualization = TRUE, OutputDirectory = 0, functlist, Fnameslist,
epslist = c(0.05, 0.1, 1), datalist, Dnameslist, synthsets = NULL, range,
gran = 1, granlist = c(1), samplesize = 0, nlist = (10000),
cdfstep = 1, reps = 5, ExtraTests_CDF = list(),
ExtraTests_PDF = list(), setseed = -100, comments = "none",
SmoothAll = FALSE, EmpiricBounds = FALSE, AnalyticBounds = FALSE,
AnalyticProbSleeve = FALSE, SuppressRealCDF = FALSE,
SuppressDPCDF = FALSE, SuppressLegends = FALSE, ...)
Arguments
Visualization
Sets the testing suite to Visualization mode (default, Visualization = TRUE)
or Data Collection mode (Visualization = FALSE).
In Visualization mode, the output is: a .csv file containing the mean and median results (across reps iterations) of diagnostic functions on DP-CDF algorithms for each combination
of data, function, and epsilon; and a .pdf file containing one graphical example
DP-CDF for each combination of dataset, function, and epsilon, as well as a set
of boxplots showing the distribution of all diagnostic results for all combinations of parameters.
In Data Collection mode, the output is: a .csv file containing the entire (raw) results (across reps iterations) of diagnostic
functions on DP-CDF algorithms for each combination of dataset and function,
separately looped over all epsilons, then all granularities, then all sample sizes.
OutputDirectory
The location of the folder which will hold the output (.csv and .pdf files). This
defaults to the tempdir() directory.
functlist
A list of CDF-computing functions to be tested with CDFtestTrack (if Visualization = TRUE)
or CDFtestTrackx (if Visualization = FALSE)
Fnameslist
A vector of function names corresponding to the functions
epslist
A vector of epsilon values for differential privacy
datalist
A list containing vectors of data, each to be used in a test
Dnameslist
A list of dataset names corresponding to the data/variables being tested; used for
labelling the output
synthsets
This script generates pre-defined synthetic datasets upon request, and fully incorporates them into testing. To call them, users should specify the sets they desire. For example, synthsets = list(list(type,size,shape),lis
There is no limit on the number of datasets included. Sets available include:
type: "age" (which ranges from about 0 to 100, gran = 1) and "wage" (which
ranges from 0 to 500k); size: any positive integer, typed in exact numerical
representation (e.g., for ten thousand use 10000, not 10k and not 10^4); shape:
gaussian, sparse, uniform, bimodal. It is assumed that the data input is rounded
to the granularity.
range
The range of the domain as a vector c(min, max), defined based on user
intuition. To preserve differential privacy, the domain is constructed using this
range. Setting the min too high will bias output upward; setting the max too
low will bias it downward. However, setting min too low and max too high could reveal the true
limits of your data, compromising some privacy.
gran
FOR Visualization MODE ONLY; this argument is ignored in Data Collection mode
(see granlist for setting granularities, and thus domain sizes, in that mode).
The granularity of the domain between the min and max,
i.e., if age is measured per 1 year of age, gran = 1. The same granularity is
applied to all datasets, so using comparable (or scaled) data is necessary.
granlist
FOR Data Collection MODE ONLY; this argument is ignored in Visualization mode
(see gran for setting the granularity in that mode). A list
of granularities of the domain between the min and max, i.e., if age is measured
per 1 year of age, gran = 1.
samplesize
FOR Visualization MODE ONLY; this argument is ignored in Data Collection mode
(see nlist for selecting sample sizes in that mode).
When set to zero, the entire dataset is used. Otherwise, the specified sample size
is randomly selected from each dataset without replacement.
nlist
FOR Data Collection MODE ONLY; this argument is ignored in Visualization mode
(see samplesize for selecting sample sizes in that mode).
Sets the absolute sample sizes to draw from each dataset, with replacement. Any
vector of integer values is appropriate.
cdfstep
The step size used in outputting the approximate CDF.
reps
The number of times to repeat each diagnostic. Higher reps lends greater accuracy, but consumes time and power. The author recommends reps = 10 for quick
examples and reps = 100 for more robust examinations.
ExtraTests_CDF If a user wishes to add extra diagnostics, the proper ExtraTests_CDF = list(functionName1=function
Diagnostic Functions should have inputs such as Y for a public CDF, est for a
DP-representation of that CDF, range and gran, and the output should be just
one value.
ExtraTests_PDF See above
setseed
In the function, each combination of data, epsilon, and function is executed with
a separate seed, which by default is randomly generated and reported. Users interested in replicating specific results can locate the reported seed and parameter
combination to replicate tests.
comments
"Comments written here print to a log in excel"
SmoothAll
Applies L2 monotonicity post-processing to every DP-CDF
EmpiricBounds
FOR Visualization MODE ONLY. When TRUE, outputted graphs depict the
minimum and maximum values taken by each bin across reps
AnalyticBounds FOR Visualization MODE ONLY. This is a flag and should be set to TRUE if the
functions being tested are expected to output analytical variance bounds. The
proper output form for such a function is output = list(DPCDFvector, LowerBoundVector, UpperBou
AnalyticProbSleeve
FOR Visualization MODE ONLY. When TRUE, outputted DP-CDFs will have
a ’fuzzy’ analytic sleeve around them, approximating probability density for
each point given by DP. This also requires the function format specified above
in the description for AnalyticBounds.
SuppressRealCDF
FOR Visualization MODE ONLY. When TRUE, outputted graphs will not include
real (non-private) CDFs.
SuppressDPCDF
FOR Visualization MODE ONLY. When TRUE, outputted graphs will not include
DP-CDFs (but if SmoothAll = TRUE, monotonized DP CDFs still appear).
SuppressLegends
FOR Visualization MODE ONLY. When TRUE, outputted graphs will not include
legends
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
If Visualization = TRUE, a list containing:
...$means Contains mean diagnostic results for each diagnostic across reps iterations for each parameter combination;
...$medians Contains median diagnostic results for each diagnostic across reps iterations for each
parameter combination;
...$yourCDFoutput Containing a single dpCDF iteration for each parameter combination;
...$yourPDFoutput Containing a single dpPDF iteration for each parameter combination;
...$realCDFoutput Containing the real (non-DP) CDF output for each relevant parameter combination;
...$realPDFoutput Containing the real (non-DP) PDF output for each relevant parameter combination;
...$databins Containing the domain used to construct the CDFs;
...$TestPack_CDF Containing the definitions of diagnostic functions used on dpCDFs;
...$TestPack_PDF Containing the definitions of diagnostic functions used on dpPDFs;
...$allscores Containing all raw diagnostic output.
...$seed Containing the list of seeds used in the test
...$permetric holding a rearranged dataframe (ordered by parameter combinations) useful for plotting.
A .pdf file: with boxplots showing the distributions of diagnostic outputs, and categorized plots of
DP-CDF function output. Each such graph will show one arbitrary CDF iteration and empirical
boundaries. The empirical boundaries are the max and min values reached by that function (and
parameters) during the test.
A .csv file: containing the mean and median scores of each diagnostic on each combination of
data, eps, function, and the seedlist for reproduction.
Notes on Visualization mode: Both the .pdf and .csv components are named with a time stamp
index, in the form of YearMonthDayHourMinuteSecond. To locate particular tests, look at the
CDFtestindexchart.csv, which automatically records the parameters and index of each test.
These can be found in the file specified by OutputDirectory, which defaults to the R temp files
tempdir().
Alternatively in Data Collection mode (Visualization = FALSE), a list containing:
...$allscores holding the output of each combination of parameters: each eps in
epslist is varied while granularity and sample size are held at the first values specified in granlist and nlist. The same is true for varying
granularity and sample size. In that way, only one variable is varied at a time while the other two
are held fixed. All such combinations of parameters are executed on all combinations of data and
function (specified within datalist and functlist);
...$seed holding the list of seeds used in the test.
A .csv file containing the entire (raw) results (across reps iterations) of diagnostic functions on DP-CDF algorithms for each combination of dataset and function, looped over epsilon, granularity, and
sample size values as described directly above.
This mode was designed for collecting metric data
for subsequent supervised learning modelling.
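The one-variable-at-a-time scheme described for Data Collection mode can be illustrated with a small base-R sketch. The helper `one_at_a_time` is hypothetical, and it assumes that the fixed values are the first entries of each list and that repeated baseline rows are not removed; the package may order or deduplicate its runs differently.

```r
# Hypothetical illustration of one-at-a-time parameter variation:
# vary one list while holding the other two at their first values.
one_at_a_time <- function(epslist, granlist, nlist) {
  rbind(
    data.frame(varied = "eps",  eps = epslist,    gran = granlist[1], n = nlist[1]),
    data.frame(varied = "gran", eps = epslist[1], gran = granlist,    n = nlist[1]),
    data.frame(varied = "n",    eps = epslist[1], gran = granlist[1], n = nlist)
  )
}

grid <- one_at_a_time(epslist  = c(0.05, 0.1, 1),
                      granlist = c(2500, 1000),
                      nlist    = c(100, 1000, 10000))
grid
```

Under these assumptions the number of parameter settings grows with the sum of the list lengths rather than their product, which keeps long sweeps affordable.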
Examples
CDFtest(Visualization = TRUE, OutputDirectory = 0, functlist = c(functionH),
        Fnameslist = c("H"), epslist = c(.1, .01), datalist = list(),
        Dnameslist = c(),
        synthsets = list(list("wage", 100000, "uniform"),
                         list("wage", 100000, "sparse"),
                         list("wage", 100000, "bimodal")),
        range = c(1, 500000), gran = 1000, granlist = c(2500, 1250, 1000, 500),
        samplesize = 0, nlist = c(100, 1000, 10000, 100000, 1000000),
        cdfstep = 0, reps = 5, ExtraTests_CDF = list(), ExtraTests_PDF = list(),
        setseed = c(-100),
        comments = "x", SmoothAll = FALSE, EmpiricBounds = FALSE,
        AnalyticBounds = FALSE, AnalyticProbSleeve = FALSE,
        SuppressRealCDF = FALSE, SuppressDPCDF = FALSE, SuppressLegends = FALSE)
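For readers writing their own implementation to pass in functlist, the sketch below shows one possible shape for such a function, using the argument names that functionH documents (eps, cdfstep, data, range, gran, ...). It is an assumption-laden illustration: `my_dp_cdf` and `rlaplace` are hypothetical names, cdfstep is accepted but ignored, the sample size is treated as public, and the noise scale of 1/eps assumes neighbouring datasets differ by adding or removing one record. Whether CDFtest accepts this exact return shape has not been verified here.

```r
# Hypothetical helper (not from the package): Laplace(0, scale) noise,
# drawn as the difference of two independent exponential variables.
rlaplace <- function(n, scale) {
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

# Sketch of a user-defined dpCDF: noisy histogram, then cumulative sum.
my_dp_cdf <- function(eps, cdfstep, data, range, gran, ...) {
  bins <- seq(range[1], range[2], by = gran)
  # Clamp to the public range, then map each value to a 1-based bin index.
  clamped <- pmin(pmax(data, range[1]), range[2])
  idx <- pmin(floor((clamped - range[1]) / gran) + 1, length(bins))
  counts <- tabulate(idx, nbins = length(bins))
  noisy <- counts + rlaplace(length(bins), scale = 1 / eps)
  cumsum(noisy) / length(data)  # assumption: sample size is public
}

set.seed(1)
ages <- sample(1:10, 500, replace = TRUE)
out <- my_dp_cdf(eps = 0.5, cdfstep = 1, data = ages, range = c(1, 10), gran = 1)
```

Because the noise is added per bin and then accumulated, the error of this simple approach grows along the domain; the tree-based functions in this package are designed to limit that growth.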
CDFtestTrack
Test a single CDF implementation with one set of parameters.
Description
Generates mean/median empirical error measurements, complete results, single iterations of DP
CDFs at each combination of parameters, and diagnostic functions used.
Usage
CDFtestTrack(funct, eps, cdfstep = 1, data, range, gran, reps,
SmoothAll = FALSE, ABounds = FALSE, EmpiricBounds = FALSE,
ExtraTests_CDF = list(), ExtraTests_PDF = list(), ...)
Arguments
funct
The differentially-private CDF-generating function to be tested
eps
Epsilon value for Differential privacy control
cdfstep
The step size used in outputting the approximate CDF; the values output are
[min, min + cdfstep], [min, min + 2 * cdfstep], etc. Setting cdfstep equal to 0
(default) will set cdfstep = granularity
data
A vector of the data (single variable to compute CDFs from)
range
A vector length 2 containing user-specified min and max to truncate the universe
to.
gran
The smallest unit of measurement in the data (one [year] for a list of ages). The
Domain (ie gran and range) should be identical to those used to create the CDF!
reps
The number of times the combination of CDFfunction, dataset, and epsilon will
be tested
SmoothAll
Applies L2 monotonicity post-processing to every DP-CDF
ABounds
This is a flag and should be set to TRUE if the functions being tested are expected to output analytical variance bounds. The proper output form is output =
list(DPCDFvector, LowerBoundVector, UpperBoundVector)
EmpiricBounds
When TRUE, outputted graphs depict the minimum and maximum values taken
by each bin across reps
ExtraTests_CDF If a user wishes to add extra diagnostics, the proper syntax would be: ExtraTests_CDF = list( functionName1 = function1, functionName2 = function2)
ExtraTests_PDF See above
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A list in the form of:
...$meanscores Contains mean diagnostic results for each diagnostic across reps iterations;
...$medianscores Contains median diagnostic results for each diagnostic across reps iterations;
...$yourCDFoutput Containing a single dpCDF iteration;
...$yourPDFoutput Containing a single dpPDF iteration;
...$realCDFoutput Containing the real (non-DP) CDF output;
...$realPDFoutput Containing the real (non-DP) PDF output;
...$databins Containing the domain used to construct the CDFs;
...$TestPack_CDF Containing the definitions of diagnostic functions used on dpCDFs;
...$TestPack_PDF Containing the definitions of diagnostic functions used on dpPDFs;
...$allscores Containing all raw diagnostic output.
Examples
CDFtestTrack(badCDF, eps = .01, cdfstep = 1, data = rexp(10000,.4),
range= c(1,10), gran = .1, reps = 20)
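The diagnostics passed through ExtraTests_CDF take a true CDF and a DP estimate and return a single value. A minimal sketch of such a function follows; `tail_error` and its `tail` argument are hypothetical, chosen only to show the expected shape (inputs Y, est, range, gran, and ...; one numeric output).

```r
# Hypothetical extra diagnostic: the largest vertical error in the upper tail
# of the true CDF. Returns a single number, or NA if the tail is empty.
tail_error <- function(Y, est, range, gran, tail = 0.9, ...) {
  in_tail <- Y >= tail
  if (!any(in_tail)) return(NA_real_)
  max(abs(Y[in_tail] - est[in_tail]))
}

Y   <- c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1)
est <- c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1)
tail_error(Y, est, range = c(1, 10), gran = 1)
```

Following the syntax documented above, it would presumably be supplied as ExtraTests_CDF = list(TailError = tail_error).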
CDFtestTrackx
Test a single CDF implementation with one set of parameters.
Description
Applies diagnostic functions to a single dpCDF, and only releases a complete set of diagnostic
results (called within CDFtest in Data Collection mode, i.e., when Visualization = FALSE)
Usage
CDFtestTrackx(funct, eps, data, range = range, gran, reps, samplesize,
SmoothAll = FALSE, ExtraTests_CDF = list(), ExtraTests_PDF = list(),
...)
Arguments
funct
The differentially-private CDF-generating function to be tested
eps
Epsilon value for Differential privacy control
data
A vector of the data (single variable to compute CDFs from)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
reps
The number of times the combination of CDFfunction, dataset, and epsilon will
be tested
samplesize
The specified sample size is randomly selected from each dataset without replacement.
SmoothAll
Applies L2 monotonicity post-processing to every DP-CDF
ExtraTests_CDF If a user wishes to add extra diagnostics, the proper syntax would be: ExtraTests_CDF = list( functio
ExtraTests_PDF See above
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A complete set of diagnostic results in the form of ...$allscores, which holds one row of output
for each of the reps iterations.
Examples
CDFtestTrackx(badCDF, eps = .01, cdfstep = 0, data = rexp(10000,.4),
range= c(1,10), gran = .1, reps = 20, samplesize = 10000)
DerivDiff
Determine how well a single DP-CDF matches the shape of its data.
Description
Calculates a score for how much the DP-CDF’s slope varies from the true CDF’s slope at various
resolutions.
Usage
DerivDiff(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single so-called derivative score; lower scores suggest better performance
Examples
DerivDiff(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1),c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
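One way to picture a slope comparison "at various resolutions" is to difference both CDFs at several lags and average the gaps. The sketch below does that in base R; `slope_diff_sketch` and its choice of lags are assumptions made for illustration, and the package's actual scoring formula may be quite different.

```r
# Hypothetical slope-difference score: compare finite-difference slopes of the
# true and DP CDFs at several lags (resolutions) and average the absolute gaps.
slope_diff_sketch <- function(Y, est, lags = c(1, 2, 4)) {
  lags <- lags[lags < length(Y)]
  per_lag <- vapply(lags, function(k) {
    mean(abs(diff(Y, lag = k) - diff(est, lag = k)) / k)
  }, numeric(1))
  mean(per_lag)
}

Y   <- c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1)
est <- c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1)
slope_diff_sketch(Y, est)
```

As with the documented score, lower values indicate that the estimate follows the shape of the true CDF more closely, and identical inputs score zero.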
diffat25
Determine the distance between CDFs at the .25 quantile.
Description
Find the error (between 0 and 1) introduced by DP-Noise at the .25 quantile.
Usage
diffat25(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The error at the .25 quantile
Examples
diffat25(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
diffat75
Determine the distance between CDFs at the .75 quantile.
Description
Find the error (between 0 and 1) introduced by DP-Noise at the .75 quantile.
Usage
diffat75(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The error at the .75 quantile
Examples
diffat75(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
diffatMedian
Determine the distance between CDFs at the median.
Description
Find the error (between 0 and 1) introduced by DP-Noise at the median
Usage
diffatMedian(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The error at the .5 quantile
Examples
diffatMedian(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
diffatQuantile
Determine the distance between CDFs at key quantiles.
Description
Find the error (between 0 and 1) introduced by DP-Noise at a given quantile in the CDF
Usage
diffatQuantile(Y, est, quantile = 0.5, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins).
est
The vector output of a differentially private CDF computation (cumulative count
bins).
quantile
A quantile value between 0 and 1, defaults to 0.5 for the median.
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The error at the quantile specified by quantile
Examples
diffatQuantile(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1),
c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1), .05)
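The vertical error at a quantile can be sketched as follows: find the first bin where the true CDF reaches the requested quantile, then compare the two CDFs at that bin. `vertical_error_at` is a hypothetical name, and taking the first bin at or above the quantile is an assumption about how the location is chosen.

```r
# Hypothetical sketch: vertical gap between the true and DP CDFs at the first
# bin where the true CDF reaches the requested quantile.
vertical_error_at <- function(Y, est, quantile = 0.5) {
  i <- which(Y >= quantile)[1]
  abs(Y[i] - est[i])
}

Y   <- c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1)
est <- c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1)
vertical_error_at(Y, est, 0.25)  # both CDFs equal 0.3 at that bin
vertical_error_at(Y, est, 0.5)   # 0.5 versus 0.3
vertical_error_at(Y, est, 0.75)  # 0.8 versus 0.3
```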
findMaxError
Locate where the maximum error occurs between two CDFs
Description
Find the location of the maximum direct error between a non-private CDF and a DP approximation
of that CDF.
Usage
findMaxError(Y, est, range, gran, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single value, the value at which the largest absolute vertical difference between parallel observations in the private- and true-CDF vectors occurs.
Examples
findMaxError(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1), c(1,10),1)
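The relationship between the size of the maximum error and its location in the domain can be sketched in a few lines. `max_error_sketch` is a hypothetical helper; it assumes the CDF vectors have one entry per domain value in seq(min, max, by = gran) and reports the first location when several bins tie.

```r
# Hypothetical sketch: largest vertical gap between two CDF vectors and the
# domain value at which it occurs (first occurrence if there are ties).
max_error_sketch <- function(Y, est, range, gran) {
  bins <- seq(range[1], range[2], by = gran)
  gaps <- abs(Y - est)
  i <- which.max(gaps)
  list(size = gaps[i], at = bins[i])
}

res <- max_error_sketch(Y = c(0.25, 0.5, 0.75, 1),
                        est = c(0.25, 0.375, 0.75, 1),
                        range = c(1, 4), gran = 1)
res$size  # 0.125
res$at    # 2
```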
functionH
Create a DP-CDF by creating a K-degree noisy tree
Description
This function creates a storage tree of degree K using gran and range, adds independent noise to
each node proportional to epsilon, and then searches the tree to create a DP-CDF.
Usage
functionH(eps, cdfstep, data, range, gran, K = 2, ...)
Arguments
eps
Epsilon value for Differential privacy control
cdfstep
The step size used in outputting the approximate CDF; the values output are
[min, min + cdfstep], [min, min + 2 * cdfstep], etc. Setting cdfstep equal to 0
(default) will set cdfstep = granularity
data
A vector of the data (single variable to compute CDFs from)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
K
This sets the degree of the underlying tree.
...
Optionally add additional parameters.
Value
A list with 2 vectors: one is the y coordinates of the DP-CDF, the other is the absolute values of the
analytically expected bounds on it at 95 percent probability.
Examples
functionH(eps = .01, cdfstep = .1, data = rexp(10000,.4), range= c(1,10), gran = .1, K= 2)
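The noisy-tree idea can be sketched in base R. This is an illustrative reconstruction under stated assumptions, not the package's Rcpp implementation: `noisy_tree_cdf` and `rlaplace` are hypothetical names, the leaf layer is padded to a power of K, Laplace noise with scale (height + 1)/eps is added to every node (assuming neighbouring datasets differ by one added or removed record), the sample size is treated as public, and no analytic bounds are returned.

```r
# Hypothetical helper (not from the package): Laplace(0, scale) noise.
rlaplace <- function(n, scale) {
  rexp(n, rate = 1 / scale) - rexp(n, rate = 1 / scale)
}

noisy_tree_cdf <- function(eps, data, range, gran, K = 2) {
  nbins <- floor((range[2] - range[1]) / gran) + 1
  # Clamp to the public range, then map each value to a 1-based bin index.
  clamped <- pmin(pmax(data, range[1]), range[2])
  idx <- pmin(floor((clamped - range[1]) / gran) + 1, nbins)

  # Pad the leaf layer to the next power of K so every level is complete.
  height <- 0
  while (K^height < nbins) height <- height + 1
  leaves <- tabulate(idx, nbins = K^height)

  # Level l (0 = leaves, height = root) sums K^l consecutive leaves per node.
  # Each record touches one node per level, hence scale (height + 1) / eps.
  noisy <- lapply(0:height, function(l) {
    counts <- colSums(matrix(leaves, nrow = K^l))
    counts + rlaplace(length(counts), scale = (height + 1) / eps)
  })

  # Cover the first i leaves with whole nodes, largest blocks first.
  prefix <- vapply(seq_len(nbins), function(i) {
    pos <- 0
    total <- 0
    for (l in height:0) {
      size <- K^l
      d <- (i - pos) %/% size
      if (d > 0) {
        total <- total + sum(noisy[[l + 1]][pos / size + seq_len(d)])
        pos <- pos + d * size
      }
    }
    total
  }, numeric(1))

  prefix / length(data)  # assumption: the sample size is treated as public
}

set.seed(42)
x <- sample(1:10, 2000, replace = TRUE)
est <- noisy_tree_cdf(eps = 0.5, data = x, range = c(1, 10), gran = 1, K = 2)
```

Each CDF value in this sketch sums only a handful of noisy nodes per level, so the noise in any one value grows with the tree height rather than with the number of bins.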
functionHmono
Create a monotonically increasing DP-CDF by creating a K-degree
noisy tree
Description
This function creates a storage tree of degree K using gran and range, adds independent noise to
each node proportional to epsilon, and then searches the tree to create a DP-CDF. It then enforces
monotonicity on the resulting dpCDF.
Usage
functionHmono(eps, cdfstep, data, range, gran, K = 2, ...)
Arguments
eps
Epsilon value for Differential privacy control
cdfstep
The step size used in outputting the approximate CDF; the values output are
[min, min + cdfstep], [min, min + 2 * cdfstep], etc.
data
A vector of the data (single variable to compute CDFs from)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
K
This sets the degree of the underlying tree.
...
Optionally add additional parameters.
Value
A list with 2 vectors: one is the y coordinates of the DP-CDF, the other is the absolute values of the
analytically expected bounds for a similarly-constructed non-monotonized DP-CDF, at 95 percent
probability.
Examples
functionHmono(eps = .01, cdfstep = .1, data = rexp(10000,.4), range= c(1,10), gran = .1, K= 2)
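Enforcing monotonicity is post-processing, so it costs no additional privacy budget. One common way to do it is isotonic (least-squares) regression, available in base R as stats::isoreg. The sketch below uses that routine; `monotone_cdf_sketch` is a hypothetical name, and whether functionHmono uses this particular method is an assumption.

```r
# Hypothetical sketch: make a noisy CDF non-decreasing with isotonic
# regression, then clip the result to the valid [0, 1] range.
monotone_cdf_sketch <- function(est) {
  fitted <- stats::isoreg(est)$yf
  pmin(pmax(fitted, 0), 1)
}

noisy <- c(0.1, 0.3, 0.2, 0.6, 0.5, 1.0)
smooth <- monotone_cdf_sketch(noisy)
smooth  # out-of-order neighbours are pooled to their average
```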
functionS2
Build dpCDFs through Histogram smoothing and minimized expected
L2 per bin
Description
The function separates the epsilon value in two. The first component (epsilon1) is used to privately
discover the best way to merge contiguous histogram bins in order to reduce the L2 error due to the
noise addition. It then applies the discovered bin merging to the original histogram, and outputs the
merged histogram using the second component (epsilon2). Finally, it uses this output to compute and release the private CDF.
Usage
functionS2(eps, cdfstep, data, range, gran, K = 16, ...)
Arguments
eps
Epsilon value for Differential privacy control
cdfstep
The step size used in outputting the approximate CDF; the values output are
[min, min + cdfstep], [min, min + 2 * cdfstep], etc.
data
A vector of the data (single variable to compute CDFs from)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
K
This sets the degree of the underlying tree
...
Optionally add additional parameters
Value
A list with 2 vectors: one is the y coordinates of the DP-CDF, the other is the absolute values of the
analytically expected bounds for a similarly-constructed non-monotonized DP-CDF made without
merging of bins, at 95 percent probability.
Examples
functionS2(eps = .01, cdfstep = .1, data = rexp(10000,.4), range= c(1,10), gran = .1, K= 2)
functionSUB
Build dpCDFs through use of a noisy tree with bin merging.
Description
The function first creates a k-ary aggregate tree on the histogram bins. It then utilizes epsilon1 in
order to privately discover the best way to prune sub-trees in order to reduce the L2 error due to the
noise addition. It then prunes the sub-trees of the original tree, and outputs it by utilizing epsilon2.
Finally, it utilizes this output to compute and release the private CDF.
Usage
functionSUB(eps, cdfstep, data, range, gran, K = 2, ...)
Arguments
eps
Epsilon value for Differential privacy control
cdfstep
The step size used in outputting the approximate CDF; the values output are
[min, min + cdfstep], [min, min + 2 * cdfstep], etc.
data
A vector of the data (single variable to compute CDFs from)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
K
This sets the degree of the underlying tree.
...
Optionally add additional parameters.
Value
A list with 2 vectors: one is the y coordinates of the DP-CDF, the other is the absolute values of the
analytically expected bounds, at 95 percent probability, for a similarly-constructed DP-CDF made
without merging.
Examples
functionSUB(eps = .01, cdfstep = .1, data = rexp(10000,.4), range= c(1,10), gran = .1, K= 2)
getMaxError
Determine an approximate CDF’s maximum error.
Description
Find the maximum direct error between a non-private CDF and a DP approximation of that CDF.
Usage
getMaxError(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single value, the largest absolute vertical difference between parallel observations in the private- and true-CDF vectors.
Examples
getMaxError(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
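As a cross-check that does not need the package, the quantity described under Value can be computed in base R. This is a sketch of the definition as stated above, not the package source.

```r
# Toy vectors taken from the example above
Y   <- c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1)
est <- c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1)

# Largest absolute vertical gap between parallel observations
max_err <- max(abs(Y - est))
```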
getMean
Calculate the private mean from the DP-CDF
Description
Calculates the mean value from a CDF plot.
Usage
getMean(est, range, gran, ...)
Arguments
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max. Note that the gran and
range must be the same as used to make the DP-CDF!
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Examples
getMean(c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1), c(1,10), 1)
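One way to read a mean off a CDF vector is to difference it into per-bin probabilities and weight the bin locations. The sketch below assumes that est[i] is the cumulative probability at range[1] + (i - 1) * gran; the package's own bin convention may differ, and mean_from_cdf is a hypothetical name.

```r
# Hypothetical helper, not part of the package
mean_from_cdf <- function(est, range, gran) {
  grid <- seq(range[1], range[2], by = gran)  # assumed bin locations
  stopifnot(length(grid) == length(est))
  pmf <- diff(c(0, est))                      # probability mass per bin
  sum(grid * pmf)
}

m <- mean_from_cdf(c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1), c(1, 10), 1)
```

Under these assumptions the toy vector gives a mean of 7.5.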
horzdiffat25
Determine the distance between the .25 quantile values returned by
two CDFs.
Description
Find the distance between the .25 quantile value and that returned by the dpCDF.
Usage
horzdiffat25(Y, est, range, gran, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The horizontal error at the .25 quantile
Examples
horzdiffat25(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1),
c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1),c(1,10), 1)
horzdiffat75
Determine the distance between the .75 quantile values returned by
two CDFs.
Description
Find the distance between the .75 quantile value and that returned by the DP CDF.
Usage
horzdiffat75(Y, est, range, gran, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The horizontal error at the .75 quantile
Examples
horzdiffat75(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1),
c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1),c(1,10), 1)
horzdiffatMed
Determine the distance between the median values returned by two
CDFs.
Description
Find the distance between the median value and that returned by the DP CDF.
Usage
horzdiffatMed(Y, est, range, gran, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The horizontal error at the median
Examples
horzdiffatMed(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1),
c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1),c(1,10), 1)
horzdiffatQuantile
Determine the distance between the quantile values returned by two
CDFs.
Description
Find the distance between the quantile value and that returned by the dpCDF at a given quantile.
Usage
horzdiffatQuantile(Y, est, range, gran, quantile, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
quantile
A quantile value between 0 and 1, defaults to 0.5 for the median
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The horizontal error at the quantile specified by quantile
Examples
horzdiffatQuantile(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1),
c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1),c(1,10), 1, .05)
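The horizontal error can be pictured as the gap between the x-values at which each CDF first reaches the requested quantile. The base R sketch below makes two assumptions: the quantile is the smallest grid value whose CDF height is at least q, and the error is reported as an absolute value. quantile_from_cdf is a hypothetical helper, not a package function.

```r
# Hypothetical helper: smallest grid value whose CDF height reaches q
quantile_from_cdf <- function(cdf, range, gran, q) {
  grid <- seq(range[1], range[2], by = gran)
  grid[which(cdf >= q)[1]]
}

Y   <- c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1)
est <- c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1)

q_true <- quantile_from_cdf(Y,   c(1, 10), 1, .5)
q_priv <- quantile_from_cdf(est, c(1, 10), 1, .5)
horz_err <- abs(q_priv - q_true)   # horizontal gap at the median
```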
KurtDiffpdf
Error in Kurtosis from CDF (under development)
Description
Calculate difference between the private Kurtosis and the original Kurtosis (from CDFs)
Usage
KurtDiffpdf(Y, est, gran, range)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
range
A vector length 2 containing user-specified min and max. Note that the gran and
range must be the same as used to make the DP-CDF!
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single difference value
L1empiric
Calculate the area between two CDFs.
Description
Calculates the L1 (distance error) area between the non-private CDF and the dpCDF
Usage
L1empiric(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The empirical L1 norm
Examples
L1empiric(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
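For orientation, an L1 distance between two CDF vectors is commonly taken as the sum of absolute bin-wise differences. The sketch below uses that plain definition. Whether L1empiric additionally scales by the bin width is not stated here, so treat the number as illustrative.

```r
Y   <- c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1)
est <- c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1)

# Sum of absolute vertical gaps (unscaled; scaling by gran is not assumed)
l1 <- sum(abs(Y - est))
```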
L2empiric
Calculate the empirical L2 norm between two CDFs.
Description
Calculates the L2 (squared error) area between the non-private CDF and the dpCDF
Usage
L2empiric(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The empirical L2 norm
Examples
L2empiric(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
MAE
Calculate the MAE of a dpCDF relative to that of the non-private CDF.
Description
Calculates the Mean Absolute Error area between the non-private CDF and the dpCDF
Usage
MAE(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The MAE
Examples
MAE(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
MaxErrorAt_CDF
Locate where the maximum error occurs between two CDFs
Description
Find the location of the maximum direct error between a non-private CDF and a DP approximation
of that CDF.
Usage
MaxErrorAt_CDF(Y, est, range, gran, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single value, the value at which the largest absolute vertical difference between parallel observations in the private- and true-CDF vectors occurs.
Examples
MaxErrorAt_CDF(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1),
range= c(1,10), gran =1)
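The location of the maximum error can be sketched in base R by mapping the index of the largest gap back onto the domain grid. The grid construction below (one grid value per CDF entry, starting at the range minimum) is an assumption about the package's convention.

```r
Y   <- c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1)
est <- c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1)

grid <- seq(1, 10, by = 1)      # assumed domain grid for range = c(1, 10), gran = 1
gap  <- abs(Y - est)
loc  <- grid[which.max(gap)]    # ties resolve to the first grid value
```

In this toy case the largest gap is shared by two neighbouring bins, and which.max reports the first of them.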
MaxErrorAt_PDF
Locate where the maximum error occurs between two PDFs
Description
Find the location of the maximum direct error between a non-private PDF and a DP approximation
of that PDF.
Usage
MaxErrorAt_PDF(Y, est, range, gran, ...)
Arguments
Y
The vector output of a non-differentially private PDF computation (values within
bins)
est
The vector output of a differentially private PDF computation (values within
bins)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single value, the value at which the largest absolute vertical difference between parallel observations in the private- and true-PDF vectors occurs.
Examples
MaxErrorAt_PDF(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1),
range= c(1,10), gran =1)
MaxError_CDF
Determine an approximate CDF’s maximum error.
Description
Find the maximum direct error between a non-private CDF and a DP approximation of that CDF.
Usage
MaxError_CDF(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single value, the largest absolute vertical difference between parallel observations in the private- and true-CDF vectors.
Examples
MaxError_CDF(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
MaxError_PDF
Determine an approximate PDF’s maximum error.
Description
Find the maximum direct error between a non-private PDF and a DP approximation of that PDF.
Usage
MaxError_PDF(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private PDF computation (heights of
bins)
est
The vector output of a differentially private PDF computation (heights of bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single value, the largest absolute vertical difference between parallel observations in the private- and true-PDF vectors.
Examples
MaxError_PDF(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
MeanDiffpdf
Error in mean from CDF
Description
Calculate difference between the private mean and the original mean (from CDFs)
Usage
MeanDiffpdf(Y, est, range, gran)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max. Note that the gran and
range must be the same as used to make the DP-CDF!
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single difference value
Medians
Retrieve a median estimate from the dpCDF
Description
Determines a median value from a CDF vector.
Usage
Medians(est, range, gran, ...)
Arguments
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages), the
Domain (ie gran and range) should be identical to those used to create the CDF!
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A vector of medians obtained from a (differentially private) CDF vector, not using any extra privacy
budget. There may be more than one, because random noise can cause the DP-CDF to double back
over the .5 probability level.
Examples
Medians(c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1),c(1,10), 1)
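To see why a noisy, non-monotone CDF can yield several medians, the sketch below lists every grid value at which the curve switches sides of the .5 level. It is an illustration in base R under an assumed grid convention, not the package's algorithm, and medians_from_cdf is a hypothetical name.

```r
# Hypothetical helper: grid values where the CDF crosses the 0.5 level
medians_from_cdf <- function(est, range, gran) {
  grid  <- seq(range[1], range[2], by = gran)
  above <- c(0, est) >= 0.5          # prepend 0 so the curve starts below 0.5
  grid[which(diff(above) != 0)]      # positions where the side changes
}

one  <- medians_from_cdf(c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1), c(1, 10), 1)
many <- medians_from_cdf(c(.2, .6, .4, .7, 1), c(1, 5), 1)
```

The monotone vector crosses once, while the second vector dips back under .5 and so crosses three times.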
ModeDiffpdf
Error in Mode from CDF
Description
Calculate difference between the private Mode and the original Mode (from CDFs)
Usage
ModeDiffpdf(Y, est, range, gran, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max. Note that the gran and
range must be the same as used to make the DP-CDF!
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single difference value
MovetoRange
Clamp a value to a specified range.
Description
Returns a vector of elements clamped to the specified minimum and maximum
Usage
MovetoRange(val, range)
Arguments
val
A value to clamp.
range
A vector of length 2 in the form c(min, max)
Value
A single value that is either unchanged or clamped upward to minimum or clamped downward to
the maximum
Examples
MovetoRange(11, c(1,10))
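Clamping as described can be written in one line of base R with pmax and pmin. The helper name clamp is hypothetical and is shown only to make the behaviour concrete.

```r
# Hypothetical equivalent of the clamping behaviour described above
clamp <- function(val, range) pmin(pmax(val, range[1]), range[2])

high <- clamp(11, c(1, 10))            # 10: clamped downward to the maximum
vec  <- clamp(c(-3, 4, 11), c(1, 10))  # 1 4 10: works element-wise
```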
MSE
Calculate the MSE of a DP-CDF relative to the non-private CDF.
Description
Calculates the Mean Squared Error area between the non-private CDF and the DP-CDF
Usage
MSE(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The MSE
Examples
MSE(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
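The textbook definitions of mean squared error and mean absolute error over bins are easy to reproduce in base R, which is useful when checking results by hand. This sketch assumes plain averaging over all bins with no bin-width scaling.

```r
Y   <- c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1)
est <- c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1)

mse <- mean((Y - est)^2)   # mean squared error across bins
mae <- mean(abs(Y - est))  # mean absolute error across bins
```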
MSEanalytic
Determine the expected MSE of a simple DPCDF from its parameters.
Description
Generates the analytically expected Mean Squared Error of a dpCDF introduced by random noise,
supposing that the DP-CDF is built through the use of a noisy binary tree.
Usage
MSEanalytic(eps, range, gran, data, ...)
Arguments
eps
Epsilon value for differential privacy control
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
data
The vector of data from which the DP CDF was/is computed
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The MSE guaranteed by the given parameter combination assuming it’s built from the min and max
inward from a DP-Histogram, with 95
Examples
MSEanalytic(.01, c(1,10),1, rexp(10000,.4))
nodes
Node parser.
Description
Runs through tree nodes (assists MSE analytic)
Usage
nodes(height, k, l)
Arguments
height
The height of the tree
k
The tree degree
l
The leaf length
Value
A nodesum containing information for MSEanalytic
Examples
nodes(10,4,2)
QuantileFromCDF
Retrieve a private quantile estimate from the dpCDF
Description
Determines a quantile value from a CDF vector.
Usage
QuantileFromCDF(est, range, gran, quantile, ...)
Arguments
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max to truncate the universe
to
gran
The smallest unit of measurement in the data (one [year] for a list of ages), the
Domain (ie gran and range) should be identical to those used to create the CDF!
quantile
the quantile score in question (for testing the median, use quantile = 0.5)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A quantile value obtained from a (differentially private) CDF vector, not using any extra privacy
budget
Examples
QuantileFromCDF(c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1),c(1,10), 1, .05)
SDempiric
Calculate the std. dev. on a DPCDF.
Description
Calculates the standard deviation across bins between the non-private CDF and the DP-CDF
Usage
SDempiric(Y, est, ...)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
The standard deviation
Examples
SDempiric(c(.1,.2,.3,.4,.5,.6,.7,.8,.9,1), c(.1,.2,.3,.3,.3,.3,.3,.3,.4,1))
SkewDiffpdf
Error in Skewness from CDF (under development)
Description
Calculate difference between the private Skewness and the original Skewness (from CDFs)
Usage
SkewDiffpdf(Y, est, range, gran)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max. Note that the gran and
range must be the same as used to make the DP-CDF!
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single difference value
Smooth
Monotonicity enforcement
Description
When CDFs get out of line, we call the enforcer
Usage
Smooth(x)
Arguments
x
A numeric vector to be enforced
Value
A monotonized vector
smoothVector2
Enforce monotonicity on a vector.
Description
Forces DP-CDFs into the nearest monotonic vector (by Euclidean distance minimization).
Usage
smoothVector2(cdf)
Arguments
cdf
The vector output of a differentially private CDF computation (cumulative count
bins)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single monotonically increasing vector which is the post-processed DP-CDF’s Y coordinates
Examples
smoothVector2(c(.1,.2,.3,.2,.3,.3,.3,.3,1))
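Finding the nearest non-decreasing vector in Euclidean distance is the isotonic regression problem, which base R solves with stats::isoreg. The sketch below applies it to the example vector. Whether smoothVector2 uses the same algorithm internally is an assumption.

```r
noisy <- c(.1, .2, .3, .2, .3, .3, .3, .3, 1)

# isoreg() fits the least-squares non-decreasing sequence (pool adjacent violators)
fit  <- stats::isoreg(noisy)
mono <- fit$yf   # fitted, monotone values
```

The only violation, the drop from .3 to .2, is replaced by the average of the two entries, and every other value is left unchanged.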
StdDiffpdf
Error in Standard Deviation from CDF
Description
Calculate difference between the private Standard Deviation and the original Standard Deviation
(from CDFs)
Usage
StdDiffpdf(Y, est, range, gran)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max. Note that the gran and
range must be the same as used to make the DP-CDF!
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single difference value
TreeCDF
Creates a Tree then a CDF
Description
This thing sure does make a fine CDF
Usage
TreeCDF(eps, ds, Ks, methods, mins, maxs, grans, datas)
Arguments
eps
An epsilon value for Differential Privacy
ds
The data or something
Ks
the degree of the tree
methods
Either H or S2 or SUB
mins
the minimum of the domain’s range
maxs
the maximum of the domain’s range
grans
The granularity
datas
The data to be CDFd
Value
A dpCDF
VarDiffpdf
Error in Variance from CDF
Description
Calculate difference between the private Variance and the original Variance (from CDFs)
Usage
VarDiffpdf(Y, est, range, gran)
Arguments
Y
The vector output of a non-differentially private CDF computation (cumulative
count bins)
est
The vector output of a differentially private CDF computation (cumulative count
bins)
range
A vector length 2 containing user-specified min and max. Note that the gran and
range must be the same as used to make the DP-CDF!
gran
The smallest unit of measurement in the data (one [year] for a list of ages)
...
Optionally add additional parameters. This is primarily used to allow automated
execution of varied diagnostic functions.
Value
A single difference value
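A moment-based difference of this kind can be sketched by converting each CDF to per-bin probabilities and comparing the resulting variances. The grid convention, the sign of the difference (private minus original), and the helper name moments_from_cdf are all assumptions made for illustration.

```r
# Hypothetical helper: mean and variance implied by a CDF vector
moments_from_cdf <- function(cdf, range, gran) {
  grid <- seq(range[1], range[2], by = gran)  # assumed bin locations
  pmf  <- diff(c(0, cdf))                     # probability mass per bin
  m    <- sum(grid * pmf)
  c(mean = m, var = sum((grid - m)^2 * pmf))
}

Y   <- c(.1, .2, .3, .4, .5, .6, .7, .8, .9, 1)
est <- c(.1, .2, .3, .3, .3, .3, .3, .3, .4, 1)

true_mom <- moments_from_cdf(Y,   c(1, 10), 1)
priv_mom <- moments_from_cdf(est, c(1, 10), 1)
var_diff <- unname(priv_mom["var"] - true_mom["var"])
```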
Index
∗Topic Differential Privacy
dpCDFtesting-package
Abbrev
badCDF
CDFtest
CDFtestTrack
CDFtestTrackx
DerivDiff
diffat25
diffat75
diffatMedian
diffatQuantile
dpCDFtesting (dpCDFtesting-package)
dpCDFtesting-package
findMaxError
functionH
functionHmono
functionS2
functionSUB
getMaxError
getMean
horzdiffat25
horzdiffat75
horzdiffatMed
horzdiffatQuantile
KurtDiffpdf
L1empiric
L2empiric
MAE
MaxError_CDF
MaxError_PDF
MaxErrorAt_CDF
MaxErrorAt_PDF
MeanDiffpdf
Medians
ModeDiffpdf
MovetoRange
MSE
MSEanalytic
nodes
QuantileFromCDF
SDempiric
SkewDiffpdf
Smooth
smoothVector2
StdDiffpdf
TreeCDF
VarDiffpdf