Package `sdcTable`

April 4, 2011
Title statistical disclosure control for tabular data
Version 0.6.4
Date 2011-04-04
Author Bernhard Meindl
Maintainer Bernhard Meindl <[email protected]>
Description R-Package for statistical disclosure control for tabular data.
Depends Rcpp, Rglpk
LinkingTo Rcpp
Suggests snow, snowfall, lpSolve, lpSolveAPI, Rsymphony
License GPL-2
Repository CRAN
Date/Publication 2011-04-04 10:31:11
R topics documented:
aggregatedDat
calcDimInfos
calcFullTable
cellInfo
changeCellStatus
checkSuppressionPattern
levelObj
microDat
prepareInput
primarySuppression
protectedData
protectLinkedTables
protectTable
setBounds
summary.safeTable
testObj
Index

aggregatedDat
Aggregated data
Description
Aggregated data set, used for examples only
Usage
data(aggregatedDat)
Format
A data frame consisting of 4 grouping variables, the corresponding frequencies and the values of a
numerical variable.
calcDimInfos
calcDimInfos
Description
calcDimInfos() calculates all necessary information about a
dimensional variable, e.g. standardized codes derived from an input file
or an input object, the position of the dimensional variable within the
data set that needs to be protected, the level structure, and a complete
listing of all (sub)levels.
Usage
calcDimInfos(inputDat, file=NULL, dataframe=NULL, vName)
Arguments
inputDat
an input-data object (microdata or data-frame)
file
path to a dimension (hierarchy)-file.
dataframe
dataframe with 2 columns representing the levels and
the characteristics of a dimension.
vName
name of the dimensional variable in inputDat.
Details
This function generates an output object featuring all necessary
information about the dimensional variable under consideration. Note
that the hierarchy file or the input data frame must be given in the
correct order; see the example files.
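The ordering requirement follows from how a hierarchy is encoded: each row carries a run of `@` characters giving its depth (`@@` for the first level below the total, `@@@` for the level below that), so a row's parent can only be found as the closest preceding row one level up. The R sketch below is an illustration written for this manual, not code taken from sdcTable. Under that assumption it derives the parent of every code, and it also lists codes with exactly one sub-code; such a code always has the same value as its only child, which is the kind of duplicate reported as "dups" in the example below (the package's exact duplicate handling may differ).

```r
# Illustration only (not sdcTable code): derive depth and parent of every code
# from an '@'-encoded hierarchy that is given in the required top-down order.
hier_parents <- function(h, l, total = "Total") {
  depth <- nchar(h)                  # "@@" -> 2, "@@@" -> 3, ...
  parent <- rep(NA_character_, length(l))
  last_seen <- character(0)          # last_seen[d]: most recent code at depth d
  last_seen[1] <- total              # depth 1 is the overall total
  for (i in seq_along(l)) {
    d <- depth[i]
    if (d > 1) parent[i] <- last_seen[d - 1]
    last_seen[d] <- l[i]
  }
  data.frame(code = l, depth = depth, parent = parent, stringsAsFactors = FALSE)
}

# Codes with exactly one sub-code carry the same value as that sub-code
single_child <- function(tree) {
  counts <- table(tree$parent)       # NA parents (the total itself) are ignored
  names(counts)[counts == 1]
}

h1 <- c("@@", "@@@", "@@@", "@@@", "@@", "@@@", "@@@")
l1 <- c("010", "011", "012", "013", "020", "021", "022")
tree <- hier_parents(h1, l1)
tree$parent
single_child(hier_parents(c("@@", "@@@", "@@", "@@@", "@@@"),
                          c("A", "Aa", "B", "Ba", "Bb")))
```

If the rows were shuffled, a `@@@` code could be attached to the wrong `@@` parent, which is why the input has to be in hierarchical order.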
Value
manipulated data.
Author(s)
Bernhard Meindl
Examples
n <- 15
dat <- data.frame(
sex=sample(c("Male","Female"), n, replace=TRUE),
age=sample(paste("Age Group",1:12), n, replace=TRUE)
)
### calculate standardized code from example hierarchy-file
# directly from a hierarchy-file
file.Sex <- paste(searchpaths()[grep("sdcTable", searchpaths())], "/etc/exampleSex.hcr", se
dim.Sex <- calcDimInfos(dat, file=file.Sex, vName="sex")
print(dim.Sex)
# from a data-frame
dataAge <- read.table(paste(searchpaths()[grep("sdcTable", searchpaths())], "/etc/exampleAge
dim.Age <- calcDimInfos(dat,dataframe=dataAge, vName="age")
print(dim.Age)
# duplicate levels: "broad Age Group 1" is identical to "Total Age",
# thus it is listed in list-element "dups" of dim.Age2
dataAge$V1[2:nrow(dataAge)] <- "@@@"
dataAge <- rbind(dataAge[1,], c("@@", "broad Age Group 1"), dataAge[2:nrow(dataAge),])
dim.Age2 <- calcDimInfos(dat,dataframe=dataAge, vName="age")
print(dim.Age2)
calcFullTable
calcFullTable
Description
calcFullTable() takes input data (both microdata and already aggregated data are possible) together
with information about all dimensional variables (defined by ’levelObj’) and returns a list-object
that contains all the information that is necessary to protect the input data. calcFullTable() calculates the complete hierarchical structure of the input data and sets default bounds for the required
cell-protection levels (which may be changed with setBounds()). If the parameter ’numVar’ is
specified, the corresponding column is treated as a numerical variable (and not as the number of units
contributing to a cell).
Usage
calcFullTable(dataset, levelObj, freqVar=NULL, numVar=NULL, weightVar=NULL, sampWei
Arguments
dataset
a dataframe or matrix. The input data consists of columns defining the dimensions and (optionally) a column with a numeric variable (defined by ’numVar’).
Furthermore (if the input data are already aggregated data) the user may specify
a column showing the number of units contributing to a given cell.
levelObj
a list containing one object for each dimensional variable. Each list element has to be
created with function calcDimInfos().
freqVar
if not NULL, ’freqVar’ specifies the column name of the count variable in
’dataset’. If NULL, the input is assumed to be microdata and therefore each row in dataset is assigned frequency 1.
numVar
if not NULL, ’numVar’ specifies the column name of the numerical variable in
’dataset’.
weightVar
if given, the variable name of a variable containing weights used in the CSP
master-problem.
sampWeight
if given, the variable name of a variable used for weighting purposes.
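Whether the input is microdata (’freqVar’ is NULL, so every row counts as one unit) or already aggregated data with a count column, the resulting full table including all totals should be the same; the example below checks this for calcFullTable() with identical(). The following base-R sketch illustrates the idea for a small two-dimensional table without hierarchies, using xtabs() and addmargins(). It is an assumption-free toy built only on base R and does not reflect how calcFullTable() works internally.

```r
# Illustration only (base R, not sdcTable internals): a full table with all
# marginal totals, built once from microdata and once from aggregated data.
set.seed(1)
micro <- data.frame(
  sex    = sample(c("m", "w"), 20, replace = TRUE),
  region = sample(c("A", "B"), 20, replace = TRUE)
)

# Microdata: every row counts as one unit (the freqVar = NULL case)
full_micro <- addmargins(xtabs(~ sex + region, data = micro))

# Aggregated data: one row per inner cell plus a count column "Freq"
agg <- as.data.frame(table(sex = micro$sex, region = micro$region))
full_agg <- addmargins(xtabs(Freq ~ sex + region, data = agg))

# Same cell values, including the totals labelled "Sum"
all(as.vector(full_micro) == as.vector(full_agg))
full_micro["Sum", "Sum"]  # number of microdata rows
```

The grand total equals the number of microdata rows because each row contributes a frequency of 1.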
Value
manipulated data.
Author(s)
Bernhard Meindl
Examples
# micro data
microDat <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/microDat.R
print(head(microDat))
# the level object (information about dimensions)
levelObj <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/levelObj.R
# aggregated data
aggDat <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/aggregatedDa
print(head(aggDat))
out1 <- calcFullTable(microDat, levelObj, freqVar=NULL, numVar="numVal", weightVar=NULL)
out2 <- calcFullTable(aggDat, levelObj, freqVar="Freq", numVar="numVal", weightVar=NULL)
# compare
print(identical(out1$fullTabObj, out2$fullTabObj))
str(out1)
str(out2)
aggDat$sampWeight <- sample(rpois(nrow(aggDat), 10), replace=TRUE)
out3 <- calcFullTable(aggDat, levelObj, freqVar="Freq", numVar="numVal", sampWeight="sampWei
cellInfo
cellInfo
Description
cellInfo() calculates the anonymity state of a given cell.
Usage
cellInfo(outObj, characteristics, varNames, return=FALSE)
Arguments
outObj
a data-object derived from primarySuppression() (of class ’outObj’)
or protectTable() (of class ’safeTable’)
characteristics
for each dimensional variable, a given characteristic.
The characteristics must be in the same order as the variable names
in input object ’varNames’.
varNames
vector of variable names of the dimensional variables. It must have exactly
the same length as the number of dimensional variables and as
input object ’characteristics’.
return
if set to TRUE, an object containing the information calculated
is returned, else the information is only printed
Details
If return is set to TRUE, a list-object is returned which contains anonymity information
about the selected table cell.
If return is set to FALSE, this information is only printed to the screen.
Value
manipulated data.
Author(s)
Bernhard Meindl
Examples
protectedData <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/prote
# check state of some cells...
a <- cellInfo(protectedData, c("011","B","Kb","w"), c("V1", "V3", "V4", "V2"), return=TRUE)
b <- cellInfo(protectedData, c("011","w","B","Kb"), c("V1", "V2", "V3", "V4"), return=TRUE)
str(a$problematicPrimaryCells)
str(b$problematicPrimaryCells)
# publishable cell
cellInfo(protectedData, c("A","013","Ac","w"), c("V3", "V1", "V4", "V2"), return=FALSE)
# a primary suppressed cell
cellInfo(protectedData, c("Aa","m","A","011"), c("V4", "V2", "V3", "V1"), return=FALSE)
# a secondary suppressed cell
cellInfo(protectedData, c("Ac","m","A","021"), c("V4", "V2", "V3", "V1"), return=FALSE)
changeCellStatus
changeCellStatus
Description
changeCellStatus() changes the status of a single table cell as specified by ’rule’.
Usage
changeCellStatus(outObj, varNames, characteristics, rule, codesOrig=TRUE, suppZero=
Arguments
outObj
an object created by createFullTable() and possibly changed by primarySuppression().
varNames
a vector of variable names which need to exist in outObj$levelObj.
characteristics
for the corresponding element in varNames, the characteristic (either original or
standardized code) of the cell that needs to be suppressed.
rule
defines to which status a given cell is changed. Allowed choices are ’u’ (mark
cell as primary suppressed), ’z’ (force publication of cell), ’s’ (allowed candidate
for secondary suppression) and ’x’ (set cell as secondary suppressed).
codesOrig
defines whether original codes (codesOrig=TRUE) or standardized codes (codesOrig=FALSE) are
specified in parameter ’characteristics’.
suppZero
if set to TRUE, a cell is set primary protected even if it has frequency 0. If
FALSE, it will not be set protected in this case.
Details
For given characteristics of the dimensional variables spanning the input table, the index of the
corresponding cell is calculated.
The status of this cell is then changed as specified by ’rule’.
Value
manipulated data.
Author(s)
Bernhard Meindl
Examples
datObj <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/testObj.RDat
# force some cells to be published/not suppressed
varNames <- c("V3","V2","V1","V4")
characteristics <- c("B","m", "010","0011")
rule <- "z"
datObj <- changeCellStatus(datObj, varNames, characteristics, rule, codesOrig=TRUE, suppZero
# set cells to be candidates for secondary suppression
varNames <- c("V1","V2")
characteristics <- c("010","m") # for V3 and V4 the absolute top-level is used
rule <- "s"
datObj <- changeCellStatus(datObj, varNames, characteristics, rule, codesOrig=TRUE, suppZero
checkSuppressionPattern
checkSuppressionPattern
Description
checkSuppressionPattern() checks whether a given suppression pattern is valid, i.e. whether the checked suppressed cells cannot be recalculated from the published cells.
Usage
checkSuppressionPattern(outObj, pattern, debug=TRUE, all=FALSE, stop=TRUE)
Arguments
outObj
a data-object of class ’safeTable’ derived from protectTable()
pattern
a logical vector (TRUE/FALSE values only) whose length equals the total
number of table cells, in the same order as in outObj. For each table cell it
specifies whether the cell is part of the suppression scheme (TRUE)
or not (FALSE).
debug
if debug==TRUE, for each checked cell the calculated interval is printed.
If debug==FALSE, nothing is printed.
all
if FALSE, only primary suppressed cells are checked. If all==TRUE, secondary
suppressed cells are also checked to ensure that they cannot be recalculated.
stop
if TRUE, the checking procedure stops as soon as at least one cell is not safe.
Value
returns a list with three objects. ’validPattern’ is ’TRUE’ if ’pattern’ is
a valid suppression pattern for the given input data and ’FALSE’ otherwise. ’limits’ is a
list containing the calculated lower and upper bounds. The third object, ’indices’, contains the
indices of the cells that have been checked.
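The bounds in ’limits’ are the smallest and largest values an attacker can deduce for a suppressed cell from the published cells and the additivity of the table. In general this is a linear-programming problem, but for the special case of a 2x2 table of non-negative values whose four inner cells are all suppressed while all margins are published, the interval has a closed form (the Fréchet bounds). The R sketch below computes it; it is an illustration of the concept only and does not call sdcTable.

```r
# Illustration only: interval an attacker can derive for inner cell (1, 1) of a
# 2x2 table of non-negative values when all four inner cells are suppressed
# and the row totals, column totals and grand total are published.
frechet_bounds <- function(tab) {
  row1  <- sum(tab[1, ])
  col1  <- sum(tab[, 1])
  total <- sum(tab)
  c(lower = max(0, row1 + col1 - total), upper = min(row1, col1))
}

tab <- matrix(c(2, 5, 7, 1), nrow = 2)  # rows: (2, 7) and (5, 1)
frechet_bounds(tab)                     # lower 1, upper 7

# If only cell (1, 1) were suppressed, it could be recalculated exactly
# from its row total, so lower and upper bound would coincide.
sum(tab[1, ]) - tab[1, 2]               # 2, the true value
```

A pattern is rejected when such an interval is too narrow around the true value, which is what ’validPattern’ reports.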
Author(s)
Bernhard Meindl
Examples
## Not run:
protectedData <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/prote
suppVec <- protectedData$outObj$status
pattern <- rep(FALSE, length(suppVec))
pattern[!is.na(match(suppVec, c("u","x")))] <- TRUE
c1 <- checkSuppressionPattern(protectedData, pattern, debug=FALSE, all=FALSE)
# remove a cell from the suppression pattern
pattern[which(suppVec=="x")[1]] <- FALSE
c2 <- checkSuppressionPattern(protectedData, pattern, debug=FALSE, all=FALSE, stop=FALSE)
print(c1$validPattern)
print(c2$validPattern)
## End(Not run)
levelObj
levelObj, used for examples only
Description
levelObj data structure, used for examples only
Usage
data(levelObj)
Format
List containing single level objects.
microDat
example micro data
Description
Micro data set consisting of 4 grouping variables and an additional numeric variable.
Usage
data(microDat)
Format
a data frame
prepareInput
prepareInput
Description
prepareInput() creates an output object which can be used as
input object in protectTable().
Usage
prepareInput(dat, filenames=NULL, hierFrames=NULL, freqVar=NULL, numVar=NULL, weigh
Arguments
dat
data.frame or list containing dimensional variables and (optionally) a numeric
variable.
filenames
if given, a vector of filenames of the dimensional variables. The filename
(without extension) must equal the name of a variable that exists in dat.
hierFrames
if given, a named list of data frames (one per dimensional variable) defining
the hierarchies, as in the example below.
freqVar
if given, the variable name of the variable containing frequency counts for combinations in dat.
numVar
if given, the variable name of a numeric variable in dat.
weightVar
if given, the variable name of a variable used as weights for HITAS|OPT protecting procedures.
sampWeightVar
if given, the variable name of a variable used for weighting purposes.
suppRule_Freq
optionally require primary suppression using the freq-rule.
suppRule_P
optionally require primary suppression using the p%-rule.
suppRule_NK
optionally require primary suppression using the nk-rule.
Details
This function generates an output object featuring all necessary
information about the dimensional variables under consideration. Note
that the hierarchy files or the input data frames must be given in the
correct order; see the example files.
Value
manipulated data.
Author(s)
Bernhard Meindl
Examples
N <- 100
# generate micro-data
V1 <- sample(c("011", "012","013","021","022","023","024"), N, replace=TRUE)
V2 <- sample(c("01", "02"), N, replace=TRUE)
V3 <- sample(c("01", "02"), N, replace=TRUE)
microDat <- data.frame(V1=V1,V2=V2,V3=V3, numVal=abs(round(rnorm(N, 500, 200),2)))
# dimensional information for V1, V2 and V3
h1 <- c("@@", "@@@","@@@","@@@","@@", "@@@","@@@","@@@","@@@")
l1 <- c("010", "011", "012","013","020", "021","022","023","024")
df1 <- data.frame(h=h1, l=l1) #V1
h2 <- c("@@", "@@")
l2 <- c("m", "w")
df2 <- data.frame(h=h2, l=l2) #V2
h3 <- c("@@", "@@")
l3 <- c("A", "B")
df3 <- data.frame(h=h3, l=l3) #V3
suppRule_Freq <- c(3,0)
outObj <- prepareInput(microDat, filenames=NULL, hierFrames=list(V2=df2,V3=df3,V1=df1), freq
summary(protectTable(outObj, method="HYPERCUBE"))
primarySuppression primarySuppression
Description
primarySuppression() modifies an object created by createFullTable() to specify the cells
that need to be protected according to one of three popular primary suppression rules.
Usage
primarySuppression(outObj, suppRule_Freq=NULL, suppRule_P=NULL, suppRule_NK=NULL)
Arguments
outObj
an object created with createFullTable().
suppRule_Freq
a vector of length 2. The first element specifies the parameter ’n’ of the n-rule,
meaning that all cells with fewer than n contributing units are primary unsafe. If the second element of
suppRule_Freq equals 0, empty cells are not considered primary unsafe.
suppRule_P
a vector of length 1 specifying the parameter ’p’ of the p-percent
suppression rule.
suppRule_NK
a vector of length 2. The first element specifies parameter ’n’, the second element
specifies parameter ’k’ of the (n,k)-suppression rule.
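The three rules can be stated in terms of the individual contributions to a single cell. The R sketch below uses common textbook formulations of the frequency rule, the p%-rule and the (n,k)-dominance rule. The function names are hypothetical, and the exact conventions in sdcTable (for example strict versus non-strict inequalities, or whether ’k’ is given as a percentage) are assumptions here and may differ.

```r
# Illustration only: textbook versions of the three primary suppression rules,
# applied to the vector of individual contributions to a single cell.

# Frequency rule: unsafe if fewer than n units contribute.
# Empty cells are unsafe only if empty_unsafe = TRUE
# (cf. the second element of suppRule_Freq).
freq_unsafe <- function(contrib, n, empty_unsafe = FALSE) {
  if (length(contrib) == 0) return(empty_unsafe)
  length(contrib) < n
}

# p%-rule: unsafe if the second-largest contributor can estimate the largest
# contribution to within p percent, i.e. total - x1 - x2 < p/100 * x1.
p_unsafe <- function(contrib, p) {
  x <- sort(contrib, decreasing = TRUE)
  if (length(x) == 0) return(FALSE)
  x1 <- x[1]
  x2 <- if (length(x) >= 2) x[2] else 0
  (sum(x) - x1 - x2) < p / 100 * x1
}

# (n,k)-dominance rule: unsafe if the n largest contributions exceed
# k percent of the cell total (k assumed to be a percentage here).
nk_unsafe <- function(contrib, n, k) {
  total <- sum(contrib)
  top_n <- sum(head(sort(contrib, decreasing = TRUE), n))
  total > 0 && top_n > k / 100 * total
}

cell <- c(100, 5, 3, 2)            # one dominant contributor
freq_unsafe(cell, n = 3)           # FALSE: 4 units contribute
p_unsafe(cell, p = 10)             # TRUE: 110 - 100 - 5 = 5 < 10
nk_unsafe(cell, n = 1, k = 80)     # TRUE: 100 > 0.8 * 110
```

The example cell passes the frequency rule but fails both dominance-type rules, which is why the magnitude rules matter when a numerical variable is present.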
Value
manipulated data.
Author(s)
Bernhard Meindl
Examples
datObj <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/testObj.RDat
# all cells with less than 3 units should be primary unsafe
suppRule_Freq <- c(3, 0)
datObj1 <- primarySuppression(datObj, suppRule_Freq=suppRule_Freq)
# p-percent rule with p=80
suppRule_P <- 80
datObj1 <- primarySuppression(datObj, suppRule_P=suppRule_P)
protectedData
data objects used in examples
Description
a data object derived by protectTable(), used in examples only.
Usage
data(protectedData)
Format
a data frame
protectLinkedTables
protectLinkedTables
Description
protectLinkedTables() protects linked data objects. ’Linked’ means that the tables share
at least one common cell. When protecting such data, it is
necessary to take special care of the common cells, since these cells need to have the same status
(suppressed or not suppressed) after the protection procedure.
The common cells are specified using the input object ’commonCells’, which needs to be a list in a
specific format.
The algorithm iteratively protects the data sets and checks whether the stop criterion (all common cells
have the same suppression status) is fulfilled. If so, the procedure stops. If at least one common
cell has a different status, this cell is set to primary suppressed in the other data set and the protection procedure starts again. Note that this iterative algorithm may lead to significant
over-suppression.
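The iteration can be sketched in R for a toy setting. Each table is represented only by a logical vector of suppression flags, ’common’ pairs the index of a common cell in table 1 with its index in table 2, and the hypothetical function protect_pairs() stands in for a real secondary suppression step such as HITAS or HYPERCUBE. None of these names belong to sdcTable; the sketch only mirrors the loop described above.

```r
# Toy secondary-suppression step (hypothetical): cells form pairs (1,2), (3,4), ...
# and a lone suppressed cell in a pair is recalculable, so its partner is suppressed too.
protect_pairs <- function(supp) {
  pairs <- matrix(seq_along(supp), nrow = 2)   # column j holds one pair of indices
  for (j in seq_len(ncol(pairs))) {
    if (any(supp[pairs[, j]])) supp[pairs[, j]] <- TRUE
  }
  supp
}

# Iterate until all common cells share the same suppression status.
# 'common' is a two-column matrix: index in table 1, matching index in table 2.
protect_linked <- function(supp1, supp2, common, protect, max_iter = 50) {
  for (iter in seq_len(max_iter)) {
    supp1 <- protect(supp1)
    supp2 <- protect(supp2)
    s1 <- supp1[common[, 1]]
    s2 <- supp2[common[, 2]]
    if (all(s1 == s2)) {
      return(list(supp1 = supp1, supp2 = supp2, iterations = iter))
    }
    # a common cell suppressed in only one table becomes primary suppressed in the other
    supp2[common[s1 & !s2, 2]] <- TRUE
    supp1[common[s2 & !s1, 1]] <- TRUE
  }
  stop("no common suppression pattern found within max_iter iterations")
}

common <- cbind(c(1, 3), c(1, 2))
res <- protect_linked(
  supp1 = c(FALSE, TRUE, FALSE, FALSE),   # cell 2 of table 1 is primary unsafe
  supp2 = c(FALSE, FALSE, FALSE, FALSE),
  common = common, protect = protect_pairs
)
res$supp1  # all four cells of table 1 end up suppressed
```

A single primary unsafe cell in table 1 ends up forcing the suppression of every cell in that table after three rounds, a small instance of the over-suppression mentioned above.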
Usage
protectLinkedTables(inputObj1, inputObj2, commonCells, method="HITAS", weight=NULL)
Arguments
inputObj1
a data-object created by calcFullTable() and primarySuppression()
inputObj2
a data-object created by calcFullTable() and primarySuppression()
commonCells
a list object specifying ’common cells’ between inputObj1 and inputObj2.
Each list element of ’commonCells’ must itself be a list. For each of
these list elements there are two possible choices.
The first choice has to be used if a dimension exists in both input data objects. In
this case a list object of length 3 has to be specified. The first element specifies
the position of the variable under consideration in the first data set, the second
list element its position in the second (micro)data set (see example below). The
third element consists of the keyword ’ALL’, which tells protectLinkedTables()
that the variable under consideration is equal in both input data sets.
The second choice has to be used if different variables have common (sub)totals
or cells. In this case a list object of length 4 has to be specified. Elements 1 and
2 specify the position of the variables in data sets 1 and 2 (just as described
before). The third list element is a vector of the characteristics of the variable in
data set 1 for which identical entries exist in data set 2. The fourth list element
is also a vector, specifying the corresponding characteristics of the dimensional
variable from data set 2 (see the example provided below).
method
choice of suppression algorithm. Currently ’HITAS’ and ’HYPERCUBE’ are
valid choices.
weight
currently not used.
Value
manipulated data.
Author(s)
Bernhard Meindl
Examples
# generate some micro-data
# NOTE: we do this in a way that EcoOld and EcoNew have common cells when
# aggregating over the other dimensions.
N <- 100
Region <- sample(c("01","02"), N, replace=TRUE)
Sex <- sample(c("m","f"), N, replace=TRUE)
EcoOld <- sample(c("011","012","021","022"), N, replace=TRUE)
microDat <- data.frame(Region,Sex,EcoOld, EcoNew=NA)
spl <- split(microDat, apply(microDat[,1:2], 1, paste, collapse=""))
for ( i in 1:length(spl) ) {
ind1 <- which(substr(spl[[i]]$EcoOld,1,2)=="01")
ind2 <- setdiff(1:nrow(spl[[i]]), ind1)
if ( length(ind1) > 0 )
spl[[i]]$EcoNew[ind1] <- sample(c("011", "012","013"), length(ind1), replace=TRUE)
if ( length(ind2) > 0 )
spl[[i]]$EcoNew[ind2] <- sample(c("021","022","023"), length(ind2), replace=TRUE)
}
microDat <- do.call("rbind", spl)
rownames(microDat) <- 1:N
microDat$numVal <- abs(round(rnorm(N, 500, 200),2))
microDat1 <- microDat[,c(2,3,5)] # Sex, EcoOld and numVal
microDat2 <- microDat[,c(1,2,4,5)] # Region, Sex, EcoNew and numVal
# Region: exists only in microDat2
df1 <- data.frame(h=c("@@","@@"), l=c("R1","R2"))
dim1b <- calcDimInfos(microDat2, file=NULL, dataframe=df1, vName="Region")
# Sex: exists in microDat1 and microDat2
df2 <- data.frame(h=c("@@","@@"), l=c("m","f"))
dim2a <- calcDimInfos(microDat1, file=NULL, dataframe=df2, vName="Sex")
dim2b <- calcDimInfos(microDat2, file=NULL, dataframe=df2, vName="Sex")
# Economic classification: (old version, exists only in microDat1)
df31 <- data.frame(
h=c("@@","@@@","@@@","@@","@@@","@@@"),
l=c("A","Aa","Ab","B","Ba","Bb"))
dim31a <- calcDimInfos(microDat1, file=NULL, dataframe=df31, vName="EcoOld")
#Economic classification: (new version, exists only in microDat2)
df32 <- data.frame(
h=c("@@","@@@","@@@","@@@","@@","@@@","@@@","@@@"),
l=c("C","Ca","Cb","Cc","D","Da","Db","Dc"))
dim32b <- calcDimInfos(microDat2, file=NULL, dataframe=df32, vName="EcoNew")
# the complete levelObjects
levelObj1 <- list(dim2a, dim31a) # Sex, EcoOld
levelObj2 <- list(dim1b, dim2b, dim32b) # Region, Sex, EcoNew
numVar <- "numVal" # the variable name of the numeric variable
suppRule_Freq <- c(5, 0) # a simple rule for primary suppression
inputObj1 <- calcFullTable(microDat1, levelObj1, numVar=numVar)
inputObj1 <- primarySuppression(inputObj1, suppRule_Freq=suppRule_Freq)
inputObj2 <- calcFullTable(microDat2, levelObj2, numVar=numVar)
inputObj2 <- primarySuppression(inputObj2, suppRule_Freq=suppRule_Freq)
inputObj2 <- changeCellStatus(
inputObj2, c("Region","Sex","EcoNew"),
characteristics=c("TOT","m","D"),
rule="u", codesOrig=TRUE)
# specifiying common cells
commonCells <- list()
# variable "Sex"
commonCells[[1]] <- list()
commonCells[[1]][[1]] <- 1 # first column in microDat1
commonCells[[1]][[2]] <- 2 # second column in microDat2
commonCells[[1]][[3]] <- "ALL" # Sex has equal characteristics on both datasets
# Economic classification
commonCells[[2]] <- list()
commonCells[[2]][[1]] <- 2 # economic classification (old version) is second column in micro
commonCells[[2]][[2]] <- 3 # economic classification (new version) is third column in microD
commonCells[[2]][[3]] <- c("A","B") # vector of common characteristics: A and B in ecoOld
commonCells[[2]][[4]] <- c("C","D") # correspond to C and D in ecoNew!
out <- protectLinkedTables(inputObj1, inputObj2, commonCells, method="HYPERCUBE")
print(summary(out$outObj1))
print(summary(out$outObj2))
cellInfo(
out$outObj2, c("Region","Sex","EcoNew"),
characteristics=c("TOT","m","D"))
cellInfo(
out$outObj1, c("Sex","EcoOld"),
characteristics=c("m","B"))
protectTable
protectTable
Description
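protectTable() protects a table by computing a secondary suppression pattern for its primary unsafe cells, using the algorithm specified in ’method’.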
Usage
protectTable(outObj, method, ...)
Arguments
outObj
a list-object generated by calcFullTable() and possibly changed by primarySuppression().
method
the protection algorithm. Currently ’OPT’, ’HITAS’ and ’HYPERCUBE’ are
possible choices.
...
additional parameters depending on the choice of the suppression algorithm.
If ’HYPERCUBE’ is used, the parameter ’suppMethod’ can be set, with
possible choices ’minSupps’ (suppress cubes with a minimal number of secondary
suppressions), ’minSum’ (minimize the sum of units contributing to secondary
cell suppressions) and ’minSumLogs’ (minimize the log-sum of units contributing to secondary cell suppressions).
Furthermore, ’protectionLevel’, with a default value of 80%, can be set. Information on this parameter can be found in Repsilber, D. (1999).
Finally, ’allowZeros’ (TRUE or FALSE) specifies whether empty cells may
be part of a suppression scheme, and ’randomResult’ (TRUE or FALSE) specifies
whether, if several suppression cubes are possible, a random cube is chosen
or always the first one in the list.
If ’HITAS’ is used, the parameter ’solver’ (which can be ’glpk’, ’lpsolve’, ’symphony’ or ’cplex’, depending on your installed LP solver and the corresponding R package) can be used to specify your preferred solver for the (mixed-integer) linear programs that occur.
’OPT’ is the choice if you want to protect sensitive cells in an optimal way. The
entire table is anonymized in one step (unlike ’HITAS’, where the problem is
split into subtables).
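The hypercube idea behind ’HYPERCUBE’ is to protect each primary unsafe cell by also suppressing the other corners of a (hyper)cube that contains it, choosing among candidate cubes according to ’suppMethod’. The R sketch below is a simplified two-dimensional illustration written for this manual, not the package's implementation: it ignores hierarchies, ’protectionLevel’ and ’allowZeros’, and for a frequency table x it picks the square that adds the fewest new suppressions (’minSupps’) or the fewest units in newly suppressed cells (’minSum’).

```r
# Simplified illustration (not sdcTable's implementation) of the hypercube idea
# in two dimensions: to protect the primary unsafe inner cell (i, j) of the
# frequency table x, pick another row i2 and column j2 and suppress the three
# other corners of the square, so no row or column reveals the cell exactly.
protect_square <- function(x, supp, i, j, criterion = c("minSupps", "minSum")) {
  criterion <- match.arg(criterion)
  best <- NULL
  best_cost <- Inf
  for (i2 in setdiff(seq_len(nrow(x)), i)) {
    for (j2 in setdiff(seq_len(ncol(x)), j)) {
      corners <- rbind(c(i, j2), c(i2, j), c(i2, j2))
      new <- !supp[corners]                  # corners not suppressed yet
      cost <- if (criterion == "minSupps") {
        sum(new)                             # number of new suppressions
      } else {
        sum(x[corners][new])                 # units in newly suppressed cells
      }
      if (cost < best_cost) {
        best_cost <- cost
        best <- corners
      }
    }
  }
  supp[best] <- TRUE
  supp
}

x <- matrix(c(1, 20, 30, 40, 5, 60, 70, 80, 9), nrow = 3)
supp <- matrix(FALSE, 3, 3)
supp[1, 1] <- TRUE                           # primary unsafe cell
res <- protect_square(x, supp, 1, 1, criterion = "minSum")
which(res)                                   # 1 2 4 5 (column-major indices)
```

Under ’minSum’ the square through row 2 and column 2 wins because its new corners hold only 40 + 20 + 5 = 65 units.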
Value
an object of class ’safeTable’
Author(s)
Bernhard Meindl
References
Repsilber, D. (1999). Das Quaderverfahren. In: Forum der Bundesstatistik, Volume 31/1999.
de Wolf, P.P. (2002). HiTaS: A Heuristic Approach to Cell Suppression in Hierarchical Tables. In:
Domingo-Ferrer, J. (ed.): Inference Control in Statistical Databases. Vol. 2316.
Fischetti, M., Salazar, J.J. (2000). Models and Algorithms for Optimizing Cell Suppression in
Tabular Data with Linear Constraints. In: Journal of the American Statistical Association 95, 916-928.
Examples
# generate micro-data
N <- 2500
set.seed(123)
V1 <- sample(c("011","012","013","021","022"), N, replace=TRUE)
V2 <- sample(c("m","w"), N, replace=TRUE)
V3 <- sample(c("01","02"), N, replace=TRUE)
V4 <- sample(c("Aa","Ab", "Ac","Ba","Ca","Da","Db", "Ea","Fa","Fb",
"Ga","Gb","Ha","Ia","Ja","Jb","Ka","Kb"), N, replace=TRUE)
microDat <- data.frame(V1=V1,V2=V2,V3=V3,V4=V4)
microDat$numVal <- abs(round(rnorm(N, 500, 200),2))
sInd <- sample(floor(N/20))
microDat$numVal[sInd] <- abs(round(rnorm(sInd, 100000, 200),2))
# dimension 1
h1 <- c("@@", "@@@","@@@","@@@","@@", "@@@","@@@")
l1 <- c("010", "011", "012","013","020", "021","022")
df1 <- data.frame(h=h1, l=l1)
hier1 <- calcDimInfos(microDat, file=NULL, dataframe=df1, vName="V1")
# Level 2
h2 <- c("@@", "@@")
l2 <- c("m", "w")
df2 <- data.frame(h=h2, l=l2)
hier2 <- calcDimInfos(microDat, file=NULL, dataframe=df2, vName="V2")
# Level 3
h3 <- c("@@", "@@")
l3 <- c("A", "B")
df3 <- data.frame(h=h3, l=l3)
hier3 <- calcDimInfos(microDat, file=NULL, dataframe=df3, vName="V3")
# Level 4
h4 <- c("@@","@@@","@@@","@@@","@@","@@@","@@","@@@","@@","@@@","@@@",
"@@","@@@","@@","@@@","@@@","@@","@@@","@@@","@@","@@@",
"@@","@@@","@@","@@@","@@@","@@","@@@","@@@")
l4 <- c("A","Aa","Ab","Ac","B","Ba","C","Ca","D","Da","Db",
"E","Ea","F","Fa","Fb","G","Ga","Gb","H","Ha",
"I","Ia","J","Ja","Jb","K","Ka","Kb")
df4 <- data.frame(h=h4, l=l4)
hier4 <- calcDimInfos(microDat, file=NULL, dataframe=df4, vName="V4")
# the complete levelObject
levelObj <- list(hier1, hier2, hier3, hier4)
outObj <- calcFullTable(microDat, levelObj, numVar="numVal")
outObj <- primarySuppression(outObj, suppRule_Freq=c(3,0))
LPL <- rep(1, length(outObj$fullTabObj$strID)) # non negative
UPL <- rep(1, length(outObj$fullTabObj$strID)) # non negative
SPL <- rep(0, length(outObj$fullTabObj$strID)) # non negative
outObj <- setBounds(outObj, type="UPL", UPL)
outObj <- setBounds(outObj, type="LPL", LPL)
outObj <- setBounds(outObj, type="SPL", SPL)
outHITAS <- protectTable(outObj, solver="glpk", method="HITAS")
str(outHITAS)
setBounds
setBounds
Description
Sets the bounds needed for the protection of tables with the HITAS algorithm.
Usage
setBounds(outObj, type, v)
Arguments
outObj
an object derived from calcFullTable().
type
specifies the type of bounds to be set.
Possible values include:
’lb’ (lower bound known by attacker)
’ub’ (upper bound known by attacker)
’LPL’ (lower protection level)
’UPL’ (upper protection level)
’SPL’ (sliding protection level)
These bounds need to be set for each possible cell.
v
a vector with values for the parameter specified by variable ’type’.
Note
the bounds are only used if HITAS is selected as the protection algorithm.
By default, a sliding protection of 1 is set for each cell in calcFullTable()
which means that no primary suppressed cell may be recalculated exactly
after the protection procedure.
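A common formulation of these protection levels in the cell-suppression literature (stated here as an assumption about how they are used, not taken from the package code) is: a suppressed cell with true value a, for which an attacker can derive the interval [lo, hi], is protected if lo <= a - LPL, hi >= a + UPL and hi - lo >= SPL. The hypothetical helper below checks exactly that condition.

```r
# Illustration only: check whether a suppressed cell with true value 'value'
# is protected, given the attacker's interval [lo, hi] and the protection levels.
is_protected <- function(value, lo, hi, LPL = 0, UPL = 0, SPL = 0) {
  lo <= value - LPL && hi >= value + UPL && (hi - lo) >= SPL
}

is_protected(10, lo = 5, hi = 16, LPL = 3, UPL = 3)  # TRUE
# With a sliding protection level of 1, an exactly
# recalculable cell (lo == hi == value) is not protected:
is_protected(10, lo = 10, hi = 10, SPL = 1)          # FALSE
```

This matches the note above: a sliding protection level of 1 rules out intervals of width zero, i.e. exact recalculation.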
Author(s)
Bernhard Meindl
Examples
datObj <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/testObj.RDat
# attacker knows that cells are non-negative
datObj <- setBounds(datObj, type="lb", rep(0, length(datObj$fullTabObj$strID)))
summary.safeTable
summary.safeTable
Description
summary method for objects of class safeTable.
Usage
## S3 method for class 'safeTable'
summary(object, ...)
Arguments
object
object from class safeTable
...
additional parameters. Not used yet.
Details
object is an object of class safeTable. The summary function returns several statistics from the
anonymisation process.
Value
Manipulated data.
Author(s)
Bernhard Meindl
Examples
N <- 100
# generate micro-data
V1 <- sample(c("011", "012","013","021","022","023","024"), N, replace=TRUE)
V2 <- sample(c("01", "02"), N, replace=TRUE)
V3 <- sample(c("01", "02"), N, replace=TRUE)
microDat <- data.frame(V1=V1,V2=V2,V3=V3, numVal=abs(round(rnorm(N, 500, 200),2)))
# dimensional information for V1, V2 and V3
h1 <- c("@@", "@@@","@@@","@@@","@@", "@@@","@@@","@@@","@@@")
l1 <- c("010", "011", "012","013","020", "021","022","023","024")
df1 <- data.frame(h=h1, l=l1) #V1
h2 <- c("@@", "@@")
l2 <- c("m", "w")
df2 <- data.frame(h=h2, l=l2) #V2
h3 <- c("@@", "@@")
l3 <- c("A", "B")
df3 <- data.frame(h=h3, l=l3) #V3
suppRule_Freq <- c(3,0)
outObj <- prepareInput(microDat, filenames=NULL, hierFrames=list(V2=df2,V3=df3,V1=df1), numV
result <- protectTable(outObj, method="HYPERCUBE")
class(result)
summary(result)
testObj
data objects used in examples
Description
a data object containing objects used or generated by sdcTable. The object is used in the examples
only.
Usage
data(testObj)
Format
a data frame
Index
∗Topic datasets
aggregatedDat
levelObj
microDat
protectedData
testObj
∗Topic methods
calcDimInfos
calcFullTable
cellInfo
changeCellStatus
checkSuppressionPattern
prepareInput
primarySuppression
protectLinkedTables
protectTable
setBounds
∗Topic print
summary.safeTable

aggregatedDat
calcDimInfos
calcFullTable
cellInfo
changeCellStatus
checkSuppressionPattern
levelObj
microDat
prepareInput
primarySuppression
protectedData
protectLinkedTables
protectTable
setBounds
summary.safeTable
testObj