allocation strategies

01-Feb-12
Data Leakage Detection
1
CONTENTS
ABSTRACT
INTRODUCTION
OBJECTIVES
STUDY AND ANALYSIS
FLOW CHART
FUTURE SCOPE
LIMITATIONS
APPLICATIONS
CONCLUSION
ABSTRACT
• A data distributor has given sensitive data to a set of supposedly trusted agents. Some of the data are leaked and found in an unauthorized place.
• The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means.
• We propose data allocation strategies that improve the probability of identifying leakages.
• These methods do not rely on alterations of the released data (e.g., watermarks).
INTRODUCTION
• DISTRIBUTOR: The owner of the data, who distributes it to third parties.
• THIRD PARTIES: Trusted recipients of the distributor’s data, also called agents.
• PERTURBATION: A technique in which the data are modified and made less sensitive before being handed to agents.
• ALLOCATION STRATEGIES: Tactics the distributor uses to allocate sensitive data so as to increase the probability of detecting a leak.
OBJECTIVES
• Avoid perturbing the original data before handing it to the agents.
• Detect whether the distributor’s sensitive data has been leaked by the agents.
• Assess the likelihood that a given agent is responsible for a leak.
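The guilt assessment in the last objective can be sketched in code, but only under a model the slides do not spell out, so treat the following as an assumption: each leaked object t was either obtained independently by the unauthorized party with probability p, or leaked by exactly one of the agents that received it (the set V_t), each equally likely, and objects leak independently of one another. Under that model an agent U_i holding R_i is guilty with probability 1 - prod over t in S ∩ R_i of (1 - (1 - p) / |V_t|), where S is the leaked set. The function name guilt_probability and the dictionary layout are hypothetical.

```python
from math import prod
from typing import Dict, Hashable, Set


def guilt_probability(
    agent: str,
    allocations: Dict[str, Set[Hashable]],
    leaked: Set[Hashable],
    p: float,
) -> float:
    """Estimate Pr[agent is guilty | leaked set] under an assumed simple model.

    allocations maps each agent name to the set of object ids it received,
    leaked is the set of object ids found in the unauthorized place, and
    p is the assumed probability that an object was obtained independently.
    """
    factors = []
    for t in leaked & allocations[agent]:
        # |V_t|: number of agents that were given object t
        holders = sum(1 for objs in allocations.values() if t in objs)
        # If t was not obtained independently (probability 1 - p), each of
        # its holders is assumed equally likely to be the one that leaked it
        factors.append(1 - (1 - p) / holders)
    # The agent is innocent only if it leaked none of the leaked objects it holds
    return 1.0 - prod(factors)


if __name__ == "__main__":
    allocations = {"U1": {1, 2, 3}, "U2": {1}}
    print(guilt_probability("U1", allocations, {1, 2, 3}, p=0.5))  # 0.8125
    print(guilt_probability("U2", allocations, {1, 2, 3}, p=0.5))  # 0.25
```

U1 scores higher because two of the leaked objects were given only to U1, while object 1 is shared and so only half-implicates each holder.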
STUDY AND ANALYSIS
EXISTING SYSTEM
• Traditionally, leakage detection is handled by watermarking: for example, a unique code is embedded in each distributed copy.
• If that copy is later discovered in the hands of an unauthorized party, the leaker can be identified.
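A toy sketch of the watermarking idea, assuming records are plain strings and that each agent’s unique code is an HMAC of the agent name under a hypothetical distributor key (SECRET_KEY, agent_code, watermark and identify_leaker are all illustrative names). Practical relational watermarking schemes usually hide the mark inside the data values rather than appending a visible tag; the tag here just keeps the mechanism easy to see.

```python
import hashlib
import hmac
from typing import List, Optional

SECRET_KEY = b"distributor-secret"  # hypothetical key known only to the distributor


def agent_code(agent: str) -> str:
    # Per-agent code derived with HMAC-SHA256; without the key an agent
    # cannot compute another agent's code
    return hmac.new(SECRET_KEY, agent.encode(), hashlib.sha256).hexdigest()[:16]


def watermark(records: List[str], agent: str) -> List[str]:
    # Embed the agent's code in every record of that agent's copy;
    # note that this alters the released data
    code = agent_code(agent)
    return [f"{r}|wm={code}" for r in records]


def identify_leaker(found_record: str, agents: List[str]) -> Optional[str]:
    # Recover the embedded code from a record found in an unauthorized place
    _, sep, code = found_record.rpartition("|wm=")
    if not sep:
        return None  # no mark left (e.g. stripped by the recipient): leaker unknown
    for agent in agents:
        if agent_code(agent) == code:
            return agent
    return None


if __name__ == "__main__":
    copy_u1 = watermark(["alice,CA", "bob,NY"], "U1")
    print(identify_leaker(copy_u1[0], ["U1", "U2"]))  # U1
    print(identify_leaker("alice,CA", ["U1", "U2"]))  # None
```

The second call shows why a stripped or destroyed mark defeats this approach.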
DRAWBACKS OF EXISTING SYSTEM
• Watermarking involves some modification of the original data.
• Watermarks can sometimes be destroyed by a malicious recipient.
PROPOSED SYSTEM
ALLOCATION STRATEGIES:
The proposed system allocates data to the agents according to two types of request, where T is the distributor’s set of data objects:
• Sample request R_i = SAMPLE(T, m_i): any subset of m_i records from T can be given to agent U_i.
• Explicit request R_i = EXPLICIT(T, cond_i): agent U_i receives all objects of T that satisfy cond_i.
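The two request types can be sketched as plain functions over a list of records. This is a minimal illustration, assuming records are dictionaries and that a sample request is served by a uniform random draw; the function names sample_request and explicit_request are hypothetical.

```python
import random
from typing import Callable, Dict, List

Record = Dict[str, str]


def sample_request(T: List[Record], m: int, rng: random.Random) -> List[Record]:
    # SAMPLE(T, m): any m records of T satisfy the agent; this sketch simply
    # draws them uniformly at random without replacement
    return rng.sample(T, m)


def explicit_request(T: List[Record], condition: Callable[[Record], bool]) -> List[Record]:
    # EXPLICIT(T, condition): the agent receives every record of T that satisfies condition
    return [t for t in T if condition(t)]


if __name__ == "__main__":
    # Hypothetical six-record table
    T = [{"id": str(i), "state": "CA" if i % 2 == 0 else "NY"} for i in range(6)]
    print(len(sample_request(T, 3, random.Random(42))))  # 3
    print([t["id"] for t in explicit_request(T, lambda t: t["state"] == "CA")])  # ['0', '2', '4']
```

A uniform draw is only the simplest choice: because a sample request accepts any m_i records, the distributor is free to pick them deliberately, which is where an allocation strategy comes in.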
FLOW CHART:
• Start.
• The user submits an explicit request.
• Check the condition: if it is satisfied, select the agent; otherwise, exit.
• Fake-object creation is invoked.
• The loop iterates.
• The user receives the output.
• End.
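One possible reading of the flow chart as code, offered as a sketch rather than the authors’ procedure. It assumes that "check the condition" means finding the records that satisfy the explicit request, that a fake object is a copy of a matching record with a fresh, recognisable id, and that a per-request budget (the hypothetical parameter fakes_per_agent) caps how many fakes are created, in line with the limit on fake objects noted under limitations. The registry remembers which agent received each fake, so a fake found in a leak points straight at its recipient.

```python
import itertools
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]
_fake_ids = itertools.count(1)


def create_fake_object(template: Record) -> Record:
    # Hypothetical generator: copies a real record's fields (so the fake still
    # satisfies the request's condition) but gives it an id never used in T
    fake = dict(template)
    fake["id"] = f"fake-{next(_fake_ids)}"
    return fake


def handle_explicit_request(
    T: List[Record],
    agent: str,
    condition: Callable[[Record], bool],
    fakes_per_agent: int,
    registry: Dict[str, str],
) -> Optional[List[Record]]:
    # Check the condition: if no record satisfies it, exit without output
    matching = [t for t in T if condition(t)]
    if not matching:
        return None
    # Select the agent and start its allocation with the matching records
    output = list(matching)
    # Loop: invoke fake-object creation, recording which agent got each fake
    for i in range(fakes_per_agent):
        fake = create_fake_object(matching[i % len(matching)])
        registry[fake["id"]] = agent
        output.append(fake)
    # The user (agent) receives the output
    return output


if __name__ == "__main__":
    T = [{"id": "1", "state": "CA"}, {"id": "2", "state": "NY"}]
    registry: Dict[str, str] = {}
    out = handle_explicit_request(T, "U2", lambda t: t["state"] == "CA", 2, registry)
    print([t["id"] for t in out])  # ['1', 'fake-1', 'fake-2']
    print(registry)  # {'fake-1': 'U2', 'fake-2': 'U2'}
```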
Example:
• Say that T contains customer records for a given company A. Company A hires a marketing agency U1 to do an online survey of customers.
• Since any customers will do for the survey, U1 requests a sample of 1,000 customer records (a sample request).
• At the same time, company A subcontracts with agent U2 to handle billing for all California customers.
• Thus, U2 receives all T records that satisfy the condition “state is California” (an explicit request).
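The example shows why the choice of sample matters: if U1’s 1,000 records avoid California customers, then a leaked California record was held only by U2 and points to U2, instead of being ambiguous between the two agents. The sketch below serves both requests on a hypothetical customer table and, as an assumed overlap-minimising heuristic (not necessarily the authors’ algorithm), draws U1’s sample from records U2 does not hold, falling back to shared records only when there are too few others. The function allocate_example and the generated table are illustrative.

```python
import random
from typing import Dict, List, Tuple

Record = Dict[str, str]


def allocate_example(
    T: List[Record], sample_size: int, rng: random.Random
) -> Tuple[List[Record], List[Record]]:
    # U2: explicit request, i.e. every California customer
    u2 = [t for t in T if t["state"] == "CA"]
    u2_ids = {t["id"] for t in u2}
    # U1: sample request; any customers will do, so draw first from records
    # that U2 does not hold (assumed overlap-minimising heuristic)
    others = [t for t in T if t["id"] not in u2_ids]
    rng.shuffle(others)
    u1 = others[:sample_size]
    if len(u1) < sample_size:
        # Too few non-California customers: top up with shared records
        shared = list(u2)
        rng.shuffle(shared)
        u1 += shared[: sample_size - len(u1)]
    return u1, u2


if __name__ == "__main__":
    # Hypothetical company-A table: 4,000 customers spread over four states
    states = ["CA", "NY", "TX", "WA"]
    T = [{"id": str(i), "state": states[i % 4]} for i in range(4000)]
    u1, u2 = allocate_example(T, 1000, random.Random(7))
    overlap = {t["id"] for t in u1} & {t["id"] for t in u2}
    print(len(u1), len(u2), len(overlap))  # 1000 1000 0
```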
FUTURE SCOPE
• Investigating agent guilt models that capture additional leakage scenarios.
• Extending the data allocation strategies so that they can handle agent requests in an online fashion, as requests arrive rather than all known in advance.
LIMITATIONS
• The presented strategies assume a fixed set of agents whose requests are known in advance.
• The distributor may have a limit on the number of fake objects it can create.
APPLICATIONS
• Detecting whether the distributor’s sensitive data has been leaked by supposedly trustworthy or authorized agents.
• Identifying the agents who leaked the data.
• Helping to reduce cybercrime by making leakers easier to trace.
CONCLUSION
• Watermarking is the traditional way to identify leakers, but certain data cannot admit watermarks.
• Despite these difficulties, we have shown that it is possible to assess the likelihood that an agent is responsible for a leak.
• We have shown that distributing data judiciously, using the different data allocation strategies, can make a significant difference in identifying guilty agents.