SSDBM 2011

PROPUB: Towards a Declarative Approach for
Publishing Customized, Policy-Aware Provenance
Saumen Dey
Daniel Zinn
Bertram Ludäscher
20th July 2011
Background: Scientific Workflow
Represents executable specifications
Automates processing steps
Enables sharing, and re-use
Captures processing histories
[Figure: Scientific Workflow Design in COMAD, with callouts marking an actor and the dataflow]
COMAD is a special kind of Kepler model of computation.
Background: Workflow Execution Details
[Figure: Workflow Execution Details. Invocations alignWarp:1-4, reslice:1-4, softmean:1, slicer:1-3, and convert:1-3, connected through data items AI1-AI4, AH1-AH4, WP1-WP4, RI1-RI4, RH1-RH4, RI, RH, AI, AH, AXS, AYS, AZS, AXG, AYG, AZG]
Background: Workflow Execution Details
[Figure: Workflow Execution Details (Data Flow Graph) for the alignWarp, reslice, softmean, slicer, and convert invocations, with callouts for input data, intermediate data, output data, and invocation]
Background: Workflow Execution Details
[Figure: Workflow Execution Details (Dependency Graph) for the alignWarp, reslice, softmean, slicer, and convert invocations, with callouts for input data, intermediate data, output data, and invocation]
Background: Workflow Execution Details
[Figure: Workflow Execution Details (Dependency Graph) for the alignWarp, reslice, softmean, slicer, and convert invocations and their data items]
Legend: the Dataflow Graph (forward) links invocations and data through read and write edges; the Dependency Graph (backward) links them through used and gen_by edges (further edge label shown: ref).
Background: Workflow Execution Details
[Figure: Workflow Execution Details (Dependency Graph) for the alignWarp, reslice, softmean, slicer, and convert invocations and their data items]
Legend: the Dataflow Graph (forward) links invocations and data through read and write edges; the Dependency Graph (backward) links them through used and gen_by edges (further edge label shown: ref).
Marked on the graph: Type Errors, Cyclic Dependency Error
Background: Use of Provenance
[Figure: Workflow Execution Details (Dependency Graph) for the alignWarp, reslice, softmean, slicer, and convert invocations and their data items]
Motivation
Use of Provenance Data:
to explain the output data values
to debug the source code to find the root cause of errors
to validate the code and verify the results
to repeat the experiment in the same environment
to reproduce the experiment in a different environment
Privacy Issues with Provenance Data:
Sensitive information
Proprietary information
Irrelevant detail (“TMI”)
Motivation – The Balancing Act
[Figure: Provenance Publishing balanced against Privacy & Relevancy Concerns]
We introduce PROPUB (Provenance Publisher)
helps the data publisher
to specify publication and privacy requirements
to customize provenance data
shows consequences of all these requests
Structure of the presentation
Background (Scientific Workflow and Provenance)
Provenance Model
Motivation
Example Use Case
User Requests
Provenance Policies
PROPUB (Provenance Publisher)
Conclusion
Example Use Case
[Figure: Provenance Graph with data nodes d9-d20 and invocation nodes s1-s3, m1, c1-c3; part of the graph is marked "publish"]
Example Use Case
[Figure: Provenance Graph with data nodes d9-d20 and invocation nodes s1-s3, m1, c1-c3, with regions marked publish, proprietary, sensitive, and non-relevant]
[Figure: Provenance Graph after sanitization (nodes d9-d13, d16, d18, d19, s2), showing a Cyclic Dependency, a Type Error, and a False Independency]
PROPUB: A Systematic Approach
User Requests
Provenance Policies
Fix Policy Violations
PROPUB – User Requests
[Figure: Provenance Graph (data nodes d9-d20, invocation nodes s1-s3, m1, c1-c3) annotated with user requests: publish (lineage), non-relevant (abstract), proprietary (hide), sensitive (anonymize)]
lineage(d18).
lineage(d19).
anonymize(d11).
anonymize(d12).
abstract(d14, g1).
abstract(s1, g1).
abstract(m1, g1).
hide(d11).
hide(c1).
hide(c2).
PROPUB – Provenance Policy
[Figure: fragments of the Provenance Graph and of the Customized Provenance Graph (with group node g1) illustrating a Cyclic Dependency, a Type Error, and a False Independency that arise from the abstract requests]
Provenance Policy         Witness
No-Write Conflict         wc(X,Y)
No-Cyclic Dependency      cycle(X,Y)
No-Type Error             fs(X,Y)
No-False Dependence       fd(X,Y)
No-False Independence     fi(X,Y)
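Two of the witness relations can be illustrated with a small checker. This is only a sketch in Python, not PROPUB's Datalog: it assumes edges are (x, y) pairs meaning "x depends on y", that cycle(X,Y) holds when X and Y reach each other, and that a type error (fs) is an edge joining two nodes of the same kind. The function names and the tiny example graph are hypothetical.

```python
from itertools import product

def reachable(edges):
    """Transitive closure of a set of (x, y) dependency edges."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def cycle_witnesses(edges):
    """Pairs of distinct nodes that depend on each other (assumed reading of cycle(X,Y))."""
    closure = reachable(edges)
    return {(x, y) for (x, y) in closure if x != y and (y, x) in closure}

def type_error_witnesses(edges, node_type):
    """Edges joining two nodes of the same kind (assumed reading of fs(X,Y))."""
    return {(x, y) for (x, y) in edges if node_type[x] == node_type[y]}

# Hypothetical fragment: a cycle between d13 and g1, and d19 depending directly on d16.
node_type = {"d13": "data", "g1": "invocation", "d16": "data", "d19": "data"}
edges = {("d13", "g1"), ("g1", "d13"), ("d19", "d16")}
cycles = cycle_witnesses(edges)
type_errors = type_error_witnesses(edges, node_type)
```

A policy counts as satisfied in this sketch when its witness set is empty, which mirrors the policy/witness pairing on the slide.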
PROPUB – Fix Provenance Policy Violations
Swallow the violators (i.e. “hide” more)
[Figure: customized provenance graph (nodes d9-d13, d15, d16, g1, s2) with gen_by and used edges, shown before and after the repair]
Before: a cycle between d13 and g1
After: d13 is swallowed
PROPUB: Provenance Publisher
[Figure: PROPUB architecture]
Inputs: User Requests, Provenance Graph, Provenance Policy
PROPUB modules:
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
Outputs: Customized Provenance Graph, Honored Requests, Ignored Requests, Guaranteed Policies, Violated Policies
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph with data nodes d9-d20 and invocation nodes s1-s3, m1, c1-c3]
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph with data nodes d9-d20 and invocation nodes s1-s3, m1, c1-c3; a ur:abstract user request is marked]
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph with data nodes d9-d20 and invocation nodes s1-s3, m1, c1-c3; user requests ur:abstract and ur:retain are marked]
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Two conflicting user requests
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph with data nodes d9-d20 and invocation nodes s1-s3, m1, c1-c3; the conflicting user requests ur:abstract and ur:retain are marked]
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Two conflicting user requests
Interact with user until all user requests are conflict-free
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph with data nodes d9-d20 and invocation nodes s1-s3, m1, c1-c3; the conflicting user requests ur:abstract and ur:retain are marked]
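Detecting a direct conflict such as ur:abstract versus ur:retain can be sketched as follows. This is a minimal Python illustration, not PROPUB's implementation: it assumes a request is a (kind, node) pair and that a direct conflict is a node that is asked to be retained and, at the same time, removed by hide or abstract. The node carrying the conflicting requests (s3) and the helper name are hypothetical.

```python
from collections import defaultdict

# Assumption: these request kinds remove a node from the published graph.
REMOVING = {"hide", "abstract"}

def direct_conflicts(requests):
    """Map each node with contradictory requests to the sorted request kinds on it."""
    kinds_by_node = defaultdict(set)
    for kind, node in requests:
        kinds_by_node[node].add(kind)
    return {
        node: sorted(kinds)
        for node, kinds in kinds_by_node.items()
        if "retain" in kinds and kinds & REMOVING
    }

# Hypothetical request set: s3 is both abstracted and retained.
requests = [
    ("abstract", "s3"),
    ("retain", "s3"),
    ("hide", "c1"),
    ("anonymize", "d11"),
]
conflicts = direct_conflicts(requests)
```

In an interactive setting, the publisher would be asked to withdraw one request for every node reported here, and the check would be repeated until the result is empty.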
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph with data nodes d9-d20 and invocation nodes s1-s3, m1, c1-c3, annotated with ur:publish, ur:abstract, ur:anonymize, and ur:hide; callout: Select lineage]
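Lineage selection keeps only what the published items depend on. The sketch below is a hedged Python illustration: it assumes edges are (x, y) pairs meaning "x depends on y" and walks backward from the published nodes. The edge list is a hypothetical fragment, since the actual edges of the example graph are not given in the text, and `select_lineage` is an illustrative name.

```python
def select_lineage(edges, published):
    """Return the nodes and edges reachable backward from the published nodes."""
    deps = {}
    for x, y in edges:
        deps.setdefault(x, set()).add(y)
    keep, stack = set(published), list(published)
    while stack:
        node = stack.pop()
        for parent in deps.get(node, ()):
            if parent not in keep:
                keep.add(parent)
                stack.append(parent)
    kept_edges = {(x, y) for x, y in edges if x in keep and y in keep}
    return keep, kept_edges

# Hypothetical edges; lineage(d18) and lineage(d19) are the publish requests.
edges = {
    ("d18", "c1"), ("c1", "d15"),
    ("d19", "c2"), ("c2", "d16"),
    ("d20", "c3"), ("c3", "d17"),
}
nodes, kept = select_lineage(edges, {"d18", "d19"})
```

With these assumed edges, d20, c3, and d17 fall outside the lineage of d18 and d19 and are dropped, which matches their absence from the later graphs.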
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph restricted to the selected lineage (nodes d9-d16, d18, d19, s1, s2, m1, c1, c2), annotated with ur:abstract, ur:anonymize, and ur:hide; callout: Apply ur:abstract]
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph after applying ur:abstract: s1, m1, and d14 are replaced by the group node g1; remaining nodes d9-d13, d15, d16, d18, d19, c1, c2, s2; ur:anonymize and ur:hide are still pending]
del_node(N) :- abstract(N,_).
ins_actor(I,A) :- abstract(_,I), A=abstracted.
del_dep(X,Y) :- abstract(X,_), dep'(X,Y).
del_dep(X,Y) :- abstract(Y,_), dep'(X,Y).
int_dep(X,Y) :- abstract(X,G), abstract(Y,G), dep'(X,Y).
ins_dep(G,Y) :- abstract(X,G), dep'(X,Y), not int_dep(X,Y).
ins_dep(X,G) :- abstract(Y,G), dep'(X,Y), not int_dep(X,Y).
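The effect of these update rules can be sketched in Python. This is an illustration under assumptions, not the PROPUB code: edges are (x, y) pairs standing for dep'(X,Y), the int_dep condition in the two ins_dep rules is read as negated (internal dependencies are dropped, not re-inserted), and both endpoints of an edge are redirected at once, which coincides with the rules when a single group is abstracted. The example edges are hypothetical.

```python
def apply_abstract(nodes, edges, abstract):
    """nodes: dict node -> type; edges: set of (x, y); abstract: dict node -> group."""
    # del_node: drop every abstracted node.
    new_nodes = {n: t for n, t in nodes.items() if n not in abstract}
    # ins_actor: add one node of type "abstracted" per group.
    for group in set(abstract.values()):
        new_nodes[group] = "abstracted"
    new_edges = set()
    for x, y in edges:
        gx, gy = abstract.get(x, x), abstract.get(y, y)
        # int_dep: both endpoints lie in the same group, so the edge disappears.
        if x in abstract and y in abstract and gx == gy:
            continue
        # del_dep + ins_dep: redirect the edge to the group node(s).
        new_edges.add((gx, gy))
    return new_nodes, new_edges

# The abstract requests from the slides; node types and edges are assumed.
abstract = {"d14": "g1", "s1": "g1", "m1": "g1"}
nodes = {"d11": "data", "d13": "data", "d14": "data", "d15": "data",
         "s1": "invocation", "m1": "invocation"}
edges = {("d15", "s1"), ("s1", "d13"), ("s1", "d14"), ("d14", "m1"), ("m1", "d11")}
new_nodes, new_edges = apply_abstract(nodes, edges, abstract)
```

The two internal edges (s1 to d14, d14 to m1) vanish, and the three boundary edges are rerouted through g1.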
PROPUB: Provenance Publisher
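Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph after applying ur:abstract: s1, m1, and d14 are replaced by the group node g1; remaining nodes d9-d13, d15, d16, d18, d19, c1, c2, s2; ur:anonymize and ur:hide are still pending; callout: Apply ur:hide]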
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph after applying ur:abstract and ur:hide (nodes d9-d13, d16, d18, d19, g1, s2; g1 stands for s1, m1, and d14); ur:anonymize is still pending; callout: Apply ur:anonymize]
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph after applying the user requests (nodes d9-d13, d16, d18, d19, g1, s2; g1 stands for s1, m1, and d14)]
PROPUB: Provenance Publisher
d13 is swallowed to resolve “Cyclic-Dependency”
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph after applying all user requests (nodes d9-d13, d16, d18, d19, s2), with Policy Violations marked: Cyclic Dependency, Type Error, False Independency]
same_group(X,Y) :- cycle(X,Y).
same_group(X,X) :- same_group(_,X).
same_group(X,X) :- same_group(_,X).
same_group(X,Y) :- same_group(Y,X).
same_group(X,Y) :- same_group(X,Z), same_group(Z,Y).
smaller(X,Y) :- same_group(X,Y), X < Y.
minimum(X) :- node(X), not smaller(_,X).
abstract(X,G) :- same_group(X,G), minimum(G), same_group(X,Y), X!=Y.
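The grouping behind the swallow repair can be sketched with a union-find structure. This Python illustration is an assumption-laden reading of the rules, not PROPUB's code: cycle witnesses are (x, y) pairs, same_group is their equivalence closure, minimum is read as "no smaller node in the same group", and nodes are ordered by plain string comparison. Under that ordering the group below is named d13, whereas the slides label the merged node g1; the naming is only an artifact of the assumed ordering.

```python
from collections import Counter

def swallow_groups(cycle_witnesses):
    """Map every node on a cycle to the smallest node of its group (abstract(X,G))."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for x, y in cycle_witnesses:
        rx, ry = find(x), find(y)
        if rx != ry:
            # Keep the smaller name as root so it ends up naming the group.
            if ry < rx:
                rx, ry = ry, rx
            parent[ry] = rx

    groups = {x: find(x) for x in list(parent)}
    sizes = Counter(groups.values())
    # The last rule requires a second, distinct member (X != Y) in the group.
    return {x: g for x, g in groups.items() if sizes[g] > 1}

# The cycle between d13 and g1 reported on the slides.
merged = swallow_groups({("d13", "g1"), ("g1", "d13")})
```

The resulting mapping can be fed to an abstract step as a new set of abstract(X,G) requests, which is how swallowing "hides more" to remove the cycle.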
PROPUB: Provenance Publisher
s2 is swallowed to resolve “Type-Error”
d13 is swallowed to resolve “Cyclic-Dependency”
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph after applying all user requests (nodes d9-d13, d16, d18, d19, s2), with Policy Violations marked: Type Error, False Independency]
PROPUB: Provenance Publisher
s2 is swallowed to resolve “Type-Error”
d13 is swallowed to resolve “Cyclic-Dependency”
ur:hide user requests are ignored
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Provenance Graph after applying all user requests (nodes d9-d12, d15, d16, d18, d19, c1, c2, and the merged node that swallowed d13 and s2), with the remaining False Independency marked]
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
Checks for Policy Violations
[Figure: Provenance Graph after applying all user requests (nodes d9-d12, d15, d16, d18, d19, c1, c2, g1)]
PROPUB: Provenance Publisher
Resolve Direct Conflicts
Select Lineage
Apply Customization User Requests
Verify Policy and Repair Violations
[Figure: Customized Provenance Graph (nodes d9-d12, d15, d16, d18, d19, c1, c2, g1)]
Reported alongside the graph: Honored user requests, Ignored user requests, Granted Policies, Violated Policies
Conclusion
Provenance information (data lineage) has many applications, e.g. in scientific
workflows, but there is a need to balance:
… the desire to publish provenance
… and privacy (relevancy, intellectual property, …) concerns
PROPUB is a system to publish customized provenance:
Allows users to specify, analyze, and reconcile different repair strategies
PROPUB uses a logic-based approach
… to infer the consequences of complex requests and actions
Future work:
other strategies to solve this problem (e.g. repair violations by inventing new nodes)
define metrics to compare different strategies (swallow, invent,…)
mix-and-match strategies?
higher order abstractions of user requests
Related Work
Chebotko, A., Chang, S., Lu, S., Fotouhi, F., Yang, P.: Scientific workflow provenance querying with security views. In: Web-Age Information Management, 2008. WAIM’08. The Ninth International Conference on, IEEE (2008) 349–356
Davidson, S., Khanna, S., Roy, S., Boulakia, S.: Privacy issues in scientific workflow provenance. In: Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science, ACM (2010) 1–6
Missier, P., Ludäscher, B., Bowers, S., Dey, S., Sarkar, A., Shrestha, B., Altintas, I., Anand, M., Goble, C.: Linking multiple workflow provenance traces for interoperable collaborative science. In: Workflows in Support of Large-Scale Science (WORKS), 2010, IEEE 1–8
Biton, O., Cohen-Boulakia, S., Davidson, S.: Zoom* userviews: Querying relevant provenance in workflow systems. In: Proceedings of the 33rd international conference on Very large data bases, VLDB Endowment (2007) 1366–1369
Dey, S., Zinn, D., Ludäscher, B.: Reconciling Provenance Policy Conflicts by Inventing Anonymous Nodes. In: Resource Discovery Workshop; Extended Semantic Web Conference (2011)
Thank You.