Package 'sdcTable'

April 4, 2011

Title statistical disclosure control for tabular data
Version 0.6.4
Date 2011-04-04
Author Bernhard Meindl
Maintainer Bernhard Meindl <[email protected]>
Description R-Package for statistical disclosure control for tabular data.
Depends Rcpp, Rglpk
LinkingTo Rcpp
Suggests snow, snowfall, lpSolve, lpSolveAPI, Rsymphony
License GPL-2
Repository CRAN
Date/Publication 2011-04-04 10:31:11

R topics documented:

    aggregatedDat
    calcDimInfos
    calcFullTable
    cellInfo
    changeCellStatus
    checkSuppressionPattern
    levelObj
    microDat
    prepareInput
    primarySuppression
    protectedData
    protectLinkedTables
    protectTable
    setBounds
    summary.safeTable
    testObj
    Index


aggregatedDat    Aggregated data

Description

    Aggregated data set, used for examples only

Usage

    data(aggregatedDat)

Format

    A data frame consisting of 4 grouping variables, the corresponding frequencies and the values of a numerical variable.


calcDimInfos    calcDimInfos

Description

    calcDimInfos() calculates all necessary information about a dimensional variable, e.g. standardized codes from an input file or an input object, the position of the dimensional variable within the data set that needs to be protected, the level structure, and a complete listing of all (sub)levels.

Usage

    calcDimInfos(inputDat, file=NULL, dataframe=NULL, vName)

Arguments

    inputDat
        an input-data object (microdata or data-frame)
    file
        path to a dimension (hierarchy)-file.
    dataframe
        dataframe with 2 columns representing the levels and the characteristics of a dimension.
    vName
        variable-name of the dimension that is dealt with in inputDat

Details

    This function generates an output object featuring all kinds of necessary information about the dimensional variable under consideration. It should be noted that the hierarchy-file or the input dataframe given as input needs to be in correct order. Please have a look at the example-files!

Value

    manipulated data.
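
The requirement that the hierarchy input be "in correct order" can be illustrated with a small base-R sketch. It assumes (this is not stated in the manual) that the number of "@" characters in the first column marks the depth of a code, and that each code belongs to the nearest preceding row with a smaller depth. The helper hier_parents() is hypothetical and not part of sdcTable.

```r
# Hypothetical helper, not part of sdcTable: derive depth and parent code
# for each row of a two-column hierarchy data frame
# (h = level marker such as "@@", l = code).
# Assumption: number of "@" characters = depth of the code.
hier_parents <- function(df) {
  depth <- nchar(as.character(df$h))   # "@@" -> 2, "@@@" -> 3
  codes <- as.character(df$l)
  parent <- rep(NA_character_, length(codes))
  for (i in seq_along(codes)) {
    # earlier rows that sit on a shallower level than row i
    prev <- which(depth[seq_len(i - 1)] < depth[i])
    # the nearest such row is taken as the parent
    if (length(prev) > 0) parent[i] <- codes[max(prev)]
  }
  data.frame(code = codes, depth = depth, parent = parent,
             stringsAsFactors = FALSE)
}

df1 <- data.frame(
  h = c("@@", "@@@", "@@@", "@@@", "@@", "@@@", "@@@"),
  l = c("010", "011", "012", "013", "020", "021", "022"),
  stringsAsFactors = FALSE
)
res <- hier_parents(df1)
print(res)
```

Under these assumptions, swapping two rows (for example listing "021" before "020") would attach "021" to "010", which is why the row order of the hierarchy input matters.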
Author(s)

    Bernhard Meindl

Examples

    n <- 15
    dat <- data.frame(
      sex=sample(c("Male","Female"), n, replace=TRUE),
      age=sample(paste("Age Group",1:12), n, replace=TRUE)
    )

    ### calculate standardized code from example hierarchy-file
    # directly from a hierarchy-file
    file.Sex <- paste(searchpaths()[grep("sdcTable", searchpaths())], "/etc/exampleSex.hcr", se
    dim.Sex <- calcDimInfos(dat, file=file.Sex, vName="sex")
    print(dim.Sex)

    # from a data-frame
    dataAge <- read.table(paste(searchpaths()[grep("sdcTable", searchpaths())], "/etc/exampleAge
    dim.Age <- calcDimInfos(dat,dataframe=dataAge, vName="age")
    print(dim.Age)

    # duplicate levels ("broad Age Group 1" is identical to "Total Age")
    # thus it is listed as list-element "dups" in dim.Age2
    dataAge$V1[2:nrow(dataAge)] <- "@@@"
    dataAge <- rbind(dataAge[1,], c("@@", "broad Age Group 1"), dataAge[2:nrow(dataAge),])
    dim.Age2 <- calcDimInfos(dat,dataframe=dataAge, vName="age")
    print(dim.Age2)


calcFullTable    calcFullTable

Description

    calcFullTable() takes input data (both microdata and already aggregated data are possible) together with information about all dimensional variables (defined by 'levelObj') and returns a list-object that contains all the information that is necessary to protect the input data. calcFullTable() calculates the complete hierarchical structure of the input data and sets default bounds for the required cell-protection levels (which might be changed with setBounds()). If the parameter 'numVar' is specified, the corresponding column is treated as numerical variable (and not as the number of units contributing to a cell).

Usage

    calcFullTable(dataset, levelObj, freqVar=NULL, numVar=NULL, weightVar=NULL, sampWei

Arguments

    dataset
        a dataframe or matrix. The input data consists of columns defining the dimensions and (optionally) a column with a numeric variable (defined by 'numVar').
        Furthermore (if the input data are already aggregated data) the user may specify a column showing the number of units contributing to a given cell.
    levelObj
        a list of objects (one for each dimensional variable). Each list-element has to be created with function calcDimInfos().
    freqVar
        if not NULL, 'freqVar' specifies the column name of the count variable in 'dataset'. If NULL, it is assumed that we are dealing with microdata and therefore each row in dataset is assigned frequency 1.
    numVar
        if not NULL, 'numVar' specifies the column name of the numerical variable in 'dataset'.
    weightVar
        if given, the variable name of a variable containing weights used in the CSP master-problem.
    sampWeight
        if given, the variable name of a variable used for weighting purposes.

Value

    manipulated data.

Author(s)

    Bernhard Meindl

Examples

    # micro data
    microDat <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/microDat.R
    print(head(microDat))

    # the level object (information about dimensions)
    levelObj <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/levelObj.R

    # aggregated data
    aggDat <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/aggregatedDa
    print(head(aggDat))

    out1 <- calcFullTable(microDat, levelObj, freqVar=NULL, numVar="numVal", weightVar=NULL)
    out2 <- calcFullTable(aggDat, levelObj, freqVar="Freq", numVar="numVal", weightVar=NULL)

    # compare
    print(identical(out1$fullTabObj, out2$fullTabObj))
    print(str(out1))
    print(str(out2))

    aggDat$sampWeight <- sample(rpois(nrow(aggDat), 10), replace=TRUE)
    out3 <- calcFullTable(aggDat, levelObj, freqVar="Freq", numVar="numVal", sampWeight="sampWei


cellInfo    cellInfo

Description

    cellInfo() calculates the anonymity state of a given cell.
Usage

    cellInfo(outObj, characteristics, varNames, return=FALSE)

Arguments

    outObj
        a data-object derived from primarySuppression (of class 'outObj') or protectTable (of class 'safeTable')
    characteristics
        for each dimensional variable a given characteristic. It is important that the characteristics are in the same order as specified in input object 'varNames'
    varNames
        vector of variable names of the dimensional variables, which needs to be of exactly the same length as the number of dimensional variables and the length of input object 'characteristics'.
    return
        if set to TRUE, an object containing the calculated information is returned, else the information is only printed

Details

    If return is set to TRUE, a list-object is returned which contains anonymity information about the selected table cell. If return is set to FALSE, this information is only printed to the screen.

Value

    manipulated data.

Author(s)

    Bernhard Meindl

Examples

    protectedData <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/prote

    # check state of some cells...
    a <- cellInfo(protectedData, c("011","B","Kb","w"), c("V1", "V3", "V4", "V2"), return=TRUE)
    b <- cellInfo(protectedData, c("011","w","B","Kb"), c("V1", "V2", "V3", "V4"), return=TRUE)
    str(a$problematicPrimaryCells)
    str(b$problematicPrimaryCells)

    # publishable cell
    cellInfo(protectedData, c("A","013","Ac","w"), c("V3", "V1", "V4", "V2"), return=FALSE)

    # a primary suppressed cell
    cellInfo(protectedData, c("Aa","m","A","011"), c("V4", "V2", "V3", "V1"), return=FALSE)

    # a secondary suppressed cell
    cellInfo(protectedData, c("Ac","m","A","021"), c("V4", "V2", "V3", "V1"), return=FALSE)


changeCellStatus    changeCellStatus

Description

    ....

Usage

    changeCellStatus(outObj, varNames, characteristics, rule, codesOrig=TRUE, suppZero=

Arguments

    outObj
        an object created by calcFullTable() and possibly changed by primarySuppression().
    varNames
        a vector of variable names which need to exist in outObj$levelObj.
    characteristics
        for the corresponding element in varNames, the characteristic (either original or standardized code) of the cell that needs to be suppressed.
    rule
        defines to which status a given cell is changed. Allowed choices are 'u' (mark cell as primary suppressed), 'z' (force publication of cell), 's' (allowed candidate for secondary suppression) and 'x' (set cell as secondary suppressed).
    codesOrig
        defines if original codes (codesOrig=TRUE) or standardized codes (codesOrig=FALSE) are specified in parameter 'characteristics'.
    suppZero
        if set to TRUE, a cell is set primary protected even if it has frequency 0. If FALSE, it will not be set protected in this case.

Details

    For given characteristics of the dimensional variables spanning the input table, the index of the cell that needs to be set to primary unsafe is calculated. The corresponding cell is finally marked as primary unsafe.

Value

    manipulated data.

Author(s)

    Bernhard Meindl

Examples

    datObj <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/testObj.RDat

    # force some cells to be published/not suppressed
    varNames <- c("V3","V2","V1","V4")
    characteristics <- c("B","m", "010","0011")
    rule <- "z"
    datObj <- changeCellStatus(datObj, varNames, characteristics, rule, codesOrig=TRUE, suppZero

    # set cells to be candidates for secondary suppression
    varNames <- c("V1","V2")
    characteristics <- c("010","m") # for V3 and V4 the absolute top-level is used
    rule <- "s"
    datObj <- changeCellStatus(datObj, varNames, characteristics, rule, codesOrig=TRUE, suppZero


checkSuppressionPattern    checkSuppressionPattern

Description

    checkSuppressionPattern() calculates the anonymity state of a given cell.
Usage

    checkSuppressionPattern(outObj, pattern, debug=TRUE, all=FALSE, stop=TRUE)

Arguments

    outObj
        a data-object of class 'safeTable' derived from protectTable()
    pattern
        a logical vector (TRUE/FALSE only) with the same length as the total number of possible table cells, in the same order as in outObj, specifying for each table cell whether the cell is part of the suppression scheme (TRUE) or not (FALSE).
    debug
        if debug==TRUE, for each checked cell the calculated interval is printed. If debug==FALSE, nothing is printed.
    all
        if FALSE, only primary suppressed cells are checked. If all==TRUE, secondary suppressed cells are also checked to verify that they cannot be recalculated.
    stop
        if TRUE, the checking procedure stops as soon as at least one cell is not safe.

Value

    returns a list with three objects. 'validPattern' is TRUE if 'pattern' is a valid suppression pattern for the given input data and FALSE otherwise. 'limits' is a list containing the calculated lower and upper bounds. The third object 'indices' returns the indices of the cells that have been checked.

Author(s)

    Bernhard Meindl

Examples

    ## Not run:
    protectedData <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/prote
    suppVec <- protectedData$outObj$status
    pattern <- rep(FALSE, length(suppVec))
    pattern[!is.na(match(suppVec, c("u","x")))] <- TRUE
    c1 <- checkSuppressionPattern(protectedData, pattern, debug=FALSE, all=FALSE)

    # remove a cell from the suppression pattern
    pattern[which(suppVec=="x")[1]] <- FALSE
    c2 <- checkSuppressionPattern(protectedData, pattern, debug=FALSE, all=FALSE, stop=FALSE)
    print(c1$validPattern)
    print(c2$validPattern)
    ## End(Not run)


levelObj    levelObj, used for examples only

Description

    levelObj data structure, used for examples only

Usage

    data(levelObj)

Format

    List containing single level objects.


microDat    example micro data

Description

    Micro data set consisting of 4 grouping variables and an additional numeric variable.
Usage

    data(microDat)

Format

    a data frame


prepareInput    prepareInput

Description

    prepareInput() creates an output object which can be used as input object in protectTable().

Usage

    prepareInput(dat, filenames=NULL, hierFrames=NULL, freqVar=NULL, numVar=NULL, weigh

Arguments

    dat
        data.frame or list containing dimensional variables and (optionally) a numeric variable.
    filenames
        if given, vector of filenames of dimensional variables. It is important that the filename (without extension) equals the variable name, which needs to exist in dat.
    hierFrames
        if given, a list of data-frames, one per dimensional variable, describing its hierarchy.
    freqVar
        if given, the variable name of the variable containing frequency counts for combinations in dat.
    numVar
        if given, the variable name of a numeric variable in dat.
    weightVar
        if given, the variable name of a variable used as weights for the HITAS and OPT protection procedures.
    sampWeightVar
        if given, the variable name of a variable used for weighting purposes.
    suppRule_Freq
        optionally require primary suppression using the freq-rule.
    suppRule_P
        optionally require primary suppression using the p%-rule.
    suppRule_NK
        optionally require primary suppression using the nk-rule.

Details

    This function generates an output object featuring all kinds of necessary information about the dimensional variable under consideration. It should be noted that the hierarchy-file or the input dataframe given as input needs to be in correct order. Please have a look at the example-files!

Value

    manipulated data.
Author(s)

    Bernhard Meindl

Examples

    N <- 100

    # generate micro-data
    V1 <- sample(c("011", "012","013","021","022","023","024"), N, replace=TRUE)
    V2 <- sample(c("01", "02"), N, replace=TRUE)
    V3 <- sample(c("01", "02"), N, replace=TRUE)
    microDat <- data.frame(V1=V1,V2=V2,V3=V3, numVal=abs(round(rnorm(N, 500, 200),2)))

    # dimensional information (level1-level4)
    h1 <- c("@@", "@@@","@@@","@@@","@@", "@@@","@@@","@@@","@@@")
    l1 <- c("010", "011", "012","013","020", "021","022","023","024")
    df1 <- data.frame(h=h1, l=l1) #V1
    h2 <- c("@@", "@@")
    l2 <- c("m", "w")
    df2 <- data.frame(h=h2, l=l2) #V2
    h3 <- c("@@", "@@")
    l3 <- c("A", "B")
    df3 <- data.frame(h=h3, l=l3) #V3

    suppRule_Freq <- c(3,0)
    outObj <- prepareInput(microDat, filenames=NULL, hierFrames=list(V2=df2,V3=df3,V1=df1), freq
    summary(protectTable(outObj, method="HYPERCUBE"))


primarySuppression    primarySuppression

Description

    primarySuppression() modifies an object created by calcFullTable() to specify the cells that need to be protected according to one of three popular primary suppression rules.

Usage

    primarySuppression(outObj, suppRule_Freq=NULL, suppRule_P=NULL, suppRule_NK=NULL)

Arguments

    outObj
        an object created with calcFullTable().
    suppRule_Freq
        a vector of length 2. The first element specifies the parameter 'n' of the n-rule, meaning that all cells with fewer than n contributing units are considered primary unsafe. If the second element of suppRule_Freq equals 0, then empty cells will not be considered primary unsafe.
    suppRule_P
        vector of length 1. The first element specifies parameter 'p' of the (p)-percent suppression rule.
    suppRule_NK
        vector of length 2. The first element specifies parameter 'n', the second element specifies parameter 'k' of the (n,k)-suppression rule.

Value

    manipulated data.
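
The three rules can be sketched for a single cell in base R. The sketch below assumes the common textbook definitions of the frequency, p%- and (n,k)-rules; the manual does not spell them out, so the package's own implementation may differ in detail. All function names are hypothetical and not part of sdcTable.

```r
# Hypothetical helpers, not part of sdcTable; textbook definitions assumed.
# 'contrib' holds the contributions of the individual units to one cell.

# Frequency rule: unsafe if fewer than n units contribute. Empty cells are
# only flagged when zeroUnsafe is TRUE (cf. second element of suppRule_Freq).
freq_rule <- function(contrib, n, zeroUnsafe = FALSE) {
  cnt <- length(contrib)
  if (cnt == 0) return(zeroUnsafe)
  cnt < n
}

# p%-rule: unsafe if the cell total minus the two largest contributions
# is smaller than p% of the largest contribution.
p_rule <- function(contrib, p) {
  x <- sort(contrib, decreasing = TRUE)
  if (length(x) == 0) return(FALSE)
  rest <- sum(x) - sum(head(x, 2))
  rest < p / 100 * x[1]
}

# (n,k)-rule: unsafe if the n largest contributions together exceed
# k% of the cell total.
nk_rule <- function(contrib, n, k) {
  x <- sort(contrib, decreasing = TRUE)
  if (length(x) == 0) return(FALSE)
  sum(head(x, n)) > k / 100 * sum(x)
}

cell <- c(900, 50, 30, 20)            # four units contribute to this cell
r_freq <- freq_rule(cell, n = 3)      # four contributors, so not flagged
r_p    <- p_rule(cell, p = 10)        # remainder 50 is below 10% of 900
r_nk   <- nk_rule(cell, n = 2, k = 90) # top two units hold 950 of 1000
print(c(freq = r_freq, p = r_p, nk = r_nk))
```

The example cell passes the frequency rule but is flagged by both dominance-type rules, which shows why a count threshold alone may not be enough for magnitude tables.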
Author(s)

    Bernhard Meindl

Examples

    datObj <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/testObj.RDat

    # all cells with less than 3 units should be primary unsafe
    suppRule_Freq <- c(3, 0)
    datObj1 <- primarySuppression(datObj, suppRule_Freq=suppRule_Freq)

    # p-percent rule with p=80
    suppRule_P <- 80
    datObj1 <- primarySuppression(datObj, suppRule_P=suppRule_P)


protectedData    data objects used in examples

Description

    a data object derived by protectTable(), used in examples only.

Usage

    data(protectedData)

Format

    a data frame


protectLinkedTables    protectLinkedTables

Description

    protectLinkedTables() protects linked data objects. 'Linked' means that the tables share at least one common cell. When protecting such data, special care must be taken of the common cells, since these cells need to have the same status (suppressed or not suppressed) in both tables after the protection procedure. The common cells are specified using the input object 'commonCells', which needs to be a list in a specific format. The algorithm iteratively protects the data sets and checks whether the stop criterion (all common cells have the same suppression status) is fulfilled. If so, the procedure stops. If at least one common cell has a different status in the two data sets, this cell is set to primary suppressed in the other data set and the protection procedure starts again. Please note that this iterative algorithm may lead to significant over-suppression.

Usage

    protectLinkedTables(inputObj1, inputObj2, commonCells, method="HITAS", weight=NULL)

Arguments

    inputObj1
        a data-object created by calcFullTable() and primarySuppression()
    inputObj2
        a data-object created by calcFullTable() and primarySuppression()
    commonCells
        a list object specifying 'common cells' between inputObj1 and inputObj2. Each list element of 'commonCells' has to be a list itself. For each of these list-elements there are two possible choices.
        The first choice has to be used if a dimension exists in both input-data objects. In this case a list-object of length 3 has to be specified. The first element specifies the position of the variable under consideration in the first dataset, the second list element its position in the second (micro)dataset (see example below). The third element consists of the keyword 'ALL', which tells protectLinkedTables that the variable under consideration is equal in both input datasets.
        The second possibility has to be used if different variables have common (sub)totals or cells. In this case a list-object of length 4 has to be specified. Elements 1 and 2 specify the position of the variables in datasets 1 and 2 (just as described before). The third list-element is a vector of the characteristics of the variable in dataset 1 for which identical entries exist in dataset 2. The fourth list-element is a vector too, specifying the corresponding characteristics for the dimensional variable from dataset 2 (please have a look at the example provided below!).
    method
        choice of suppression algorithm. Currently 'HITAS' and 'HYPERCUBE' are valid choices.
    weight
        currently not used.

Value

    manipulated data.

Author(s)

    Bernhard Meindl

Examples

    # generate some micro-data
    # NOTE: we do this in a way that EcoOld and EcoNew have common cells when
    # aggregating over the other dimensions.
    N <- 100
    Region <- sample(c("01","02"), N, replace=TRUE)
    Sex <- sample(c("m","f"), N, replace=TRUE)
    EcoOld <- sample(c("011","012","021","022"), N, replace=TRUE)
    microDat <- data.frame(Region,Sex,EcoOld, EcoNew=NA)
    spl <- split(microDat, apply(microDat[,1:2], 1, paste, collapse=""))
    for ( i in 1:length(spl) ) {
      ind1 <- which(substr(spl[[i]]$EcoOld,1,2)=="01")
      ind2 <- setdiff(1:nrow(spl[[i]]), ind1)
      if ( length(ind1) > 0 )
        spl[[i]]$EcoNew[ind1] <- sample(c("011", "012","013"), length(ind1), replace=TRUE)
      if ( length(ind2) > 0 )
        spl[[i]]$EcoNew[ind2] <- sample(c("021","022","023"), length(ind2), replace=TRUE)
    }
    microDat <- do.call("rbind", spl)
    rownames(microDat) <- 1:N
    microDat$numVal <- abs(round(rnorm(N, 500, 200),2))
    microDat1 <- microDat[,c(2,3,5)] # Sex, EcoOld and numVal
    microDat2 <- microDat[,c(1,2,4,5)] # Region, Sex, EcoNew and numVal

    # Region: exists only in microDat2
    df1 <- data.frame(h=c("@@","@@"), l=c("R1","R2"))
    dim1b <- calcDimInfos(microDat2, file=NULL, dataframe=df1, vName="Region")

    # Sex: exists in microDat1 and microDat2
    df2 <- data.frame(h=c("@@","@@"), l=c("m","f"))
    dim2a <- calcDimInfos(microDat1, file=NULL, dataframe=df2, vName="Sex")
    dim2b <- calcDimInfos(microDat2, file=NULL, dataframe=df2, vName="Sex")

    # Economic classification: (old version, exists only in microDat1)
    df31 <- data.frame(
      h=c("@@","@@@","@@@","@@","@@@","@@@"),
      l=c("A","Aa","Ab","B","Ba","Bb"))
    dim31a <- calcDimInfos(microDat1, file=NULL, dataframe=df31, vName="EcoOld")

    # Economic classification: (new version, exists only in microDat2)
    df32 <- data.frame(
      h=c("@@","@@@","@@@","@@@","@@","@@@","@@@","@@@"),
      l=c("C","Ca","Cb","Cc","D","Da","Db","Dc"))
    dim32b <- calcDimInfos(microDat2, file=NULL, dataframe=df32, vName="EcoNew")

    # the complete levelObjects
    levelObj1 <- list(dim2a, dim31a) # Sex, EcoOld
    levelObj2 <- list(dim1b, dim2b, dim32b) # Region, Sex, EcoNew

    numVar <- "numVal" # the variable name of the numeric variable
    suppRule_Freq <- c(5, 0) # a simple rule for primary suppression

    # numVar is passed by name; the third positional argument is freqVar
    inputObj1 <- calcFullTable(microDat1, levelObj1, numVar=numVar)
    inputObj1 <- primarySuppression(inputObj1, suppRule_Freq=suppRule_Freq)
    inputObj2 <- calcFullTable(microDat2, levelObj2, numVar=numVar)
    inputObj2 <- primarySuppression(inputObj2, suppRule_Freq=suppRule_Freq)
    inputObj2 <- changeCellStatus(
      inputObj2,
      c("Region","Sex","EcoNew"),
      characteristics=c("TOT","m","D"),
      rule="u",
      codesOrig=TRUE)

    # specifying common cells
    commonCells <- list()

    # variable "Sex"
    commonCells[[1]] <- list()
    commonCells[[1]][[1]] <- 1 # first column in microDat1
    commonCells[[1]][[2]] <- 2 # second column in microDat2
    commonCells[[1]][[3]] <- "ALL" # Sex has equal characteristics on both datasets

    # Economic classification
    commonCells[[2]] <- list()
    commonCells[[2]][[1]] <- 2 # economic classification (old version) is second column in micro
    commonCells[[2]][[2]] <- 3 # economic classification (new version) is third column in microD
    commonCells[[2]][[3]] <- c("A","B") # vector of common characteristics: A and B in ecoOld
    commonCells[[2]][[4]] <- c("C","D") # correspond to C and D in ecoNew!

    out <- protectLinkedTables(inputObj1, inputObj2, commonCells, method="HYPERCUBE")
    print(summary(out$outObj1))
    print(summary(out$outObj2))
    cellInfo(
      out$outObj2,
      c("Region","Sex","EcoNew"),
      characteristics=c("TOT","m","D"))
    cellInfo(
      out$outObj1,
      c("Sex","EcoOld"),
      characteristics=c("m","B"))


protectTable    protectTable

Description

    ....

Usage

    protectTable(outObj, method, ...)

Arguments

    outObj
        a list-object generated by calcFullTable() and possibly changed by primarySuppression().
    method
        the protection algorithm. Currently 'OPT', 'HITAS' and 'HYPERCUBE' are possible choices.
    ...
        additional parameters depending on the choice of the suppression algorithm.
        If 'HYPERCUBE' is used, it is possible to use parameter 'suppMethod' with possible choices 'minSupps' (suppress cubes with minimal number of secondary suppressions), 'minSum' (minimize the sum of units contributing to secondary cell suppressions) and 'minSumLogs' (minimize the log-sum of units contributing to secondary cell suppressions). Furthermore, 'protectionLevel' with a default value of 80% can be set. Information on this parameter can be found in Repsilber, D. (1999). Finally, 'allowZeros' (TRUE or FALSE) specifies whether empty cells may be part of a suppression scheme, and 'randomResult' (TRUE or FALSE) specifies whether, if several possible suppression cubes are available, a random cube should be chosen or the first one in the list should always be selected.
        If 'HITAS' is used, the parameter 'solver' (which can be 'glpk', 'lpsolve', 'symphony' or 'cplex', depending on your installed lp-solver and its corresponding R package) can be used to specify your preferred solver for the occurring (mi)lp programs.
        'OPT' is the choice if you want to protect sensitive cells in an optimal way. The entire table is anonymized in one step (unlike in 'HITAS', where the problem is split into subtables).

Value

    an object of class 'safeTable'

Author(s)

    Bernhard Meindl

References

    Repsilber, D. (1999). Das Quaderverfahren. In: Forum der Bundesstatistik, Band 31/1999.

    de Wolf, P.P. (2002). HiTaS: A Heuristic Approach to Cell Suppression in Hierarchical Tables. In: Domingo-Ferrer, J. (Ed.): Inference Control in Statistical Databases. Vol. 2316.

    Fischetti, M., Salazar, J.J. (2000). Models and Algorithms for Optimizing Cell Suppression in Tabular Data with Linear Constraints. In: Journal of the American Statistical Association 95, 916-928.
Examples

    # generate micro-data
    N <- 2500
    set.seed(123)
    V1 <- sample(c("011","012","013","021","022"), N, replace=TRUE)
    V2 <- sample(c("m","w"), N, replace=TRUE)
    V3 <- sample(c("01","02"), N, replace=TRUE)
    V4 <- sample(c("Aa","Ab", "Ac","Ba","Ca","Da","Db", "Ea","Fa","Fb",
      "Ga","Gb","Ha","Ia","Ja","Jb","Ka","Kb"), N, replace=TRUE)
    microDat <- data.frame(V1=V1,V2=V2,V3=V3,V4=V4)
    microDat$numVal <- abs(round(rnorm(N, 500, 200),2))
    sInd <- sample(floor(N/20))
    microDat$numVal[sInd] <- abs(round(rnorm(sInd, 100000, 200),2))

    # dimension 1
    h1 <- c("@@", "@@@","@@@","@@@","@@", "@@@","@@@")
    l1 <- c("010", "011", "012","013","020", "021","022")
    df1 <- data.frame(h=h1, l=l1)
    hier1 <- calcDimInfos(microDat, file=NULL, dataframe=df1, vName="V1")

    # Level 2
    h2 <- c("@@", "@@")
    l2 <- c("m", "w")
    df2 <- data.frame(h=h2, l=l2)
    hier2 <- calcDimInfos(microDat, file=NULL, dataframe=df2, vName="V2")

    # Level 3
    h3 <- c("@@", "@@")
    l3 <- c("A", "B")
    df3 <- data.frame(h=h3, l=l3)
    hier3 <- calcDimInfos(microDat, file=NULL, dataframe=df3, vName="V3")

    # Level 4
    h4 <- c("@@","@@@","@@@","@@@","@@","@@@","@@","@@@","@@","@@@","@@@",
      "@@","@@@","@@","@@@","@@@","@@","@@@","@@@","@@","@@@",
      "@@","@@@","@@","@@@","@@@","@@","@@@","@@@")
    l4 <- c("A","Aa","Ab","Ac","B","Ba","C","Ca","D","Da","Db",
      "E","Ea","F","Fa","Fb","G","Ga","Gb","H","Ha",
      "I","Ia","J","Ja","Jb","K","Ka","Kb")
    df4 <- data.frame(h=h4, l=l4)
    hier4 <- calcDimInfos(microDat, file=NULL, dataframe=df4, vName="V4")

    # the complete levelObject
    levelObj <- list(hier1, hier2, hier3, hier4)

    outObj <- calcFullTable(microDat, levelObj, numVar="numVal")
    outObj <- primarySuppression(outObj, suppRule_Freq=c(3,0))

    LPL <- rep(1, length(outObj$fullTabObj$strID)) # non negative
    UPL <- rep(1, length(outObj$fullTabObj$strID)) # non negative
    SPL <- rep(0, length(outObj$fullTabObj$strID)) # non negative
    outObj <- setBounds(outObj, type="UPL", UPL)
    outObj <- setBounds(outObj, type="LPL", LPL)
    outObj <- setBounds(outObj, type="SPL", SPL)
    outHITAS <- protectTable(outObj, solver="glpk", method="HITAS")
    print(str(outHITAS))


setBounds    setBounds

Description

    sets bounds needed for the protection of tables with the HITAS algorithm.

Usage

    setBounds(outObj, type, v)

Arguments

    outObj
        an object derived from calcFullTable().
    type
        specifies the type of bounds to be set. Possible values include:
            'lb' (lower bound known by attacker)
            'ub' (upper bound known by attacker)
            'LPL' (lower protection level)
            'UPL' (upper protection level)
            'SPL' (sliding protection level)
        These bounds need to be set for each possible cell.
    v
        a vector with values for the parameter specified by variable 'type'.

Note

    The bounds are only used if HITAS is selected as the protection algorithm. By default, a sliding protection level of 1 is set for each cell in calcFullTable(), which means that no primary suppressed cell may be recalculated exactly after the protection procedure.

Author(s)

    Bernhard Meindl

Examples

    datObj <- get(load(paste(searchpaths()[grep("sdcTable", searchpaths())], "/data/testObj.RDat

    # attacker knows that cells are non-negative
    datObj <- setBounds(datObj, type="lb", rep(0, length(datObj$fullTabObj$strID)))


summary.safeTable    summary.safeTable

Description

    summary method for objects of class safeTable.

Usage

    ## S3 method for class 'safeTable'
    summary(object, ...)

Arguments

    object
        object of class safeTable
    ...
        additional parameters. Not used yet.

Details

    object is an object of class safeTable. The summary function returns several statistics from the anonymisation process.

Value

    Manipulated data.
Author(s)

    Bernhard Meindl

Examples

    N <- 100

    # generate micro-data
    V1 <- sample(c("011", "012","013","021","022","023","024"), N, replace=TRUE)
    V2 <- sample(c("01", "02"), N, replace=TRUE)
    V3 <- sample(c("01", "02"), N, replace=TRUE)
    microDat <- data.frame(V1=V1,V2=V2,V3=V3, numVal=abs(round(rnorm(N, 500, 200),2)))

    # dimensional information (level1-level4)
    h1 <- c("@@", "@@@","@@@","@@@","@@", "@@@","@@@","@@@","@@@")
    l1 <- c("010", "011", "012","013","020", "021","022","023","024")
    df1 <- data.frame(h=h1, l=l1) #V1
    h2 <- c("@@", "@@")
    l2 <- c("m", "w")
    df2 <- data.frame(h=h2, l=l2) #V2
    h3 <- c("@@", "@@")
    l3 <- c("A", "B")
    df3 <- data.frame(h=h3, l=l3) #V3

    suppRule_Freq <- c(3,0)
    outObj <- prepareInput(microDat, filenames=NULL, hierFrames=list(V2=df2,V3=df3,V1=df1), numV
    result <- protectTable(outObj, method="HYPERCUBE")
    class(result)
    summary(result)


testObj    data objects used in examples

Description

    a data object containing objects used or generated by sdcTable. The object is used in the examples only.

Usage

    data(testObj)

Format

    a data frame


Index

    Topic datasets
        aggregatedDat, levelObj, microDat, protectedData, testObj
    Topic methods
        calcDimInfos, calcFullTable, cellInfo, changeCellStatus, checkSuppressionPattern, prepareInput, primarySuppression, protectLinkedTables, protectTable, setBounds
    Topic print
        summary.safeTable