PROPUB: Towards a Declarative Approach for Publishing Customized, Policy-Aware Provenance
Saumen Dey, Daniel Zinn, Bertram Ludäscher
20th July 2011

Background: Scientific Workflow
- Represents executable specifications
- Automates processing steps
- Enables sharing and re-use
- Captures processing histories
[Figure: Scientific workflow design in COMAD, with actors and dataflow marked]
Note: COMAD is a special kind of Kepler model of computation.

Background: Workflow Execution Details
[Figure: Workflow execution details. Invocations: alignWarp:1 to alignWarp:4, reslice:1 to reslice:4, softmean:1, slicer:1 to slicer:3, convert:1 to convert:3. Data items: AI1 to AI4, AH1 to AH4, RI, RH, WP1 to WP4, RI1 to RI4, RH1 to RH4, AI, AH, AXS, AYS, AZS, AXG, AYG, AZG.]
The same execution is shown in several views:
- Data Flow Graph: nodes marked as input data, intermediate data, output data, and invocation
- Dependency Graph: same node markings
- Legend: the dataflow graph (forward) has read and write edges; the dependency graph (backward) has gen_by and used edges; both connect data and invocation nodes (edge annotation: ref)
- Error labels on the dependency graph: Type Errors, Cyclic Dependency Error

Background: Use of Provenance
[Figure: Workflow execution details (dependency graph), same nodes as above]

Motivation
Use of Provenance Data:
- to explain the output data values
- to debug the source code
- to find the root cause of errors
- to validate the code and verify the results
- to repeat the experiment in the same environment
- to reproduce the experiment in a different environment
Privacy Issues with Provenance Data:
- Sensitive information
- Proprietary information
- Irrelevant detail ("TMI")

Motivation – The Balancing Act
[Figure: Privacy & Relevancy Concerns balanced against Provenance Publishing]
We introduce PROPUB (Provenance Publisher), which
- helps the data publisher
  - to specify publication and privacy requirements
  - to customize provenance data
- shows the consequences of all these requests

Structure of the presentation
- Background (Scientific Workflow and Provenance)
- Provenance Model
- Motivation
- Example Use Case
- User Requests
- Provenance Policies
- PROPUB (Provenance Publisher)
- Conclusion

Example Use Case
[Figure: Provenance graph over data nodes d9 to d20 and invocation nodes s1, s2, s3, c1, c2, c3, m1. Regions of the graph are marked publish, proprietary, non-relevant, and sensitive.]
[Figure: Provenance graph after sanitization (nodes d9, d10, d11, d12, d13, d16, d18, d19, s2), annotated with Cyclic Dependency, Type Error, and False Independency]
ProPub: A Systematic Approach
- User Requests
- Provenance Policies
- Fix Policy Violations

PROPUB – User Requests
[Figure: Provenance graph with the request kinds marked: non-relevant (abstract), publish (lineage), proprietary (hide), sensitive (anonymize)]
User requests as facts:
  lineage(d18).
  lineage(d19).
  anonymize(d11).
  anonymize(d12).
  abstract(d14, g1).
  abstract(s1, g1).
  abstract(m1, g1).
  hide(d11).
  hide(c1).
  hide(c2).
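The request facts above can be held as plain data, and the lineage requests can drive a backward traversal of the dependency graph. The following is a minimal sketch in Python, not PROPUB's own implementation (the slides describe a logic-based system). The edge list `DEP` and the helper names `group_requests` and `select_lineage` are assumptions made for illustration; the slides only draw the graph and do not list its edges.

```python
from collections import defaultdict

# User requests copied from the slide, as (predicate, arguments) pairs.
REQUESTS = [
    ("lineage", ("d18",)), ("lineage", ("d19",)),
    ("anonymize", ("d11",)), ("anonymize", ("d12",)),
    ("abstract", ("d14", "g1")), ("abstract", ("s1", "g1")),
    ("abstract", ("m1", "g1")),
    ("hide", ("d11",)), ("hide", ("c1",)), ("hide", ("c2",)),
]

# ASSUMED dependency edges (x depends on y). This edge list is a guess
# made for illustration; it is not given in the slides.
DEP = [
    ("d13", "s1"), ("s1", "d9"), ("s1", "d10"),
    ("m1", "d13"), ("m1", "d11"), ("m1", "d12"), ("d14", "m1"),
    ("s2", "d14"), ("d15", "s2"), ("d16", "s2"),
    ("c1", "d15"), ("d18", "c1"), ("c2", "d16"), ("d19", "c2"),
    ("s3", "d14"), ("d17", "s3"), ("c3", "d17"), ("d20", "c3"),
]


def group_requests(requests):
    """Index request arguments by predicate name."""
    by_kind = defaultdict(list)
    for kind, args in requests:
        by_kind[kind].append(args)
    return by_kind


def select_lineage(targets, dep):
    """Return the targets plus every node they transitively depend on."""
    depends_on = defaultdict(set)
    for x, y in dep:
        depends_on[x].add(y)
    seen, stack = set(targets), list(targets)
    while stack:
        node = stack.pop()
        for parent in depends_on[node]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


by_kind = group_requests(REQUESTS)
targets = [args[0] for args in by_kind["lineage"]]
selected = select_lineage(targets, DEP)
print(sorted(selected))
```

Under the assumed edges, the nodes s3, d17, c3, and d20 fall outside the lineage of d18 and d19, so they would not be published.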
PROPUB – Provenance Policy
[Figure: Two abstraction examples, each showing a provenance graph and the customized provenance graph that results from grouping nodes into g1. The customized graphs exhibit a Cyclic Dependency, a Type Error, and a False Independency.]
Provenance policies: No-Write Conflict, No-Cyclic Dependency, No-Type Error, No-False Dependence, No-False Independence
Witness predicates (as listed on the slide): wc(X,Y), cycle(X,Y), fs(X,Y), fi(X,Y), fd(X,Y)

PROPUB – Fix Provenance Policy Violations
- Swallow the violators (i.e. "hide" more)
[Figure: Before the repair, there is a cycle between d13 and g1 (gen_by and used edges). After the repair, d13 is swallowed.]

PROPUB: Provenance Publisher
Inputs:
- User Requests
- Provenance Graph
- Provenance Policy
Steps:
- Resolve Direct Conflicts
- Select Lineage
- Apply Customization User Requests
- Verify Policy and Repair Violations
Outputs:
- Customized Provenance Graph
- Honored Requests and Ignored Requests
- Guaranteed Policies and Violated Policies

PROPUB: Provenance Publisher (Resolve Direct Conflicts)
[Figure: Provenance graph over d9 to d20, s1, s2, s3, c1, c2, c3, m1, with one region marked ur:abstract and another marked ur:retain]
- Two conflicting user requests
- Interact with the user until all user requests are conflict-free

PROPUB: Provenance Publisher (Select Lineage)
[Figure: Provenance graph with regions marked ur:abstract, ur:publish, ur:anonymize, and ur:hide]
- Select lineage

PROPUB: Provenance Publisher (Apply Customization User Requests)
- Apply ur:abstract
[Figure: Provenance graph before abstraction, with nodes d9 to d16, d18, d19, s1, s2, c1, c2, m1 and regions marked ur:abstract, ur:anonymize, ur:hide]
[Figure: Provenance graph after abstraction, in which s1, m1, and d14 are replaced by g1]
Rules for applying abstract requests:
  del_node(N)     :- abstract(N,_).
  ins_actor(I,A)  :- abstract(_,I), A=abstracted.
  del_dep(X,Y)    :- abstract(X,_), dep'(X,Y).
  del_dep(X,Y)    :- abstract(Y,_), dep'(X,Y).
  int_dep(X,Y)    :- abstract(X,G), abstract(Y,G), dep'(X,Y).
  ins_dep(G,Y)    :- abstract(X,G), dep'(X,Y), int_dep(X,Y).
  ins_dep(X,G)    :- abstract(Y,G), dep'(X,Y), int_dep(X,Y).
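The effect of the abstract rules can be traced on a small graph. The sketch below is a Python approximation, not the slide's rule set executed as written. It assumes that edges internal to a group are dropped and that every other edge touching a grouped node is redirected to the group node; this reads the two ins_dep rules as applying only to dependencies that are not internal, which is an interpretation (a negation may have been lost when the slide was extracted). The edge list `DEP` and the function name `apply_abstract` are hypothetical.

```python
# ASSUMED dependency edges (x depends on y) for the lineage of d18 and d19.
# The slides draw this graph but do not list its edges.
DEP = {
    ("d13", "s1"), ("s1", "d9"), ("s1", "d10"),
    ("m1", "d13"), ("m1", "d11"), ("m1", "d12"), ("d14", "m1"),
    ("s2", "d14"), ("d15", "s2"), ("d16", "s2"),
    ("c1", "d15"), ("d18", "c1"), ("c2", "d16"), ("d19", "c2"),
}
NODES = {n for edge in DEP for n in edge}

# abstract(d14, g1). abstract(s1, g1). abstract(m1, g1).
ABSTRACT = {"d14": "g1", "s1": "g1", "m1": "g1"}


def apply_abstract(nodes, dep, abstract):
    """Replace abstracted nodes by their group node and rewire edges."""
    # del_node: drop every abstracted node.
    kept = {n for n in nodes if n not in abstract}
    # ins_actor: add one node per group.
    groups = set(abstract.values())
    # int_dep: both endpoints are abstracted into the same group.
    int_dep = {
        (x, y) for x, y in dep
        if x in abstract and abstract.get(x) == abstract.get(y)
    }
    new_dep = set()
    for x, y in dep:
        if (x, y) in int_dep:
            continue  # internal edges disappear inside the group
        # Redirect abstracted endpoints to their group node (assumed
        # reading of ins_dep); untouched edges pass through unchanged.
        new_dep.add((abstract.get(x, x), abstract.get(y, y)))
    return kept | groups, new_dep


new_nodes, new_dep = apply_abstract(NODES, DEP, ABSTRACT)
print(sorted(new_dep))
```

With these assumed edges, the result contains both d13 -> g1 and g1 -> d13, which is the kind of cycle the policy check is meant to catch, and s2 -> g1, an edge between two invocation-like nodes.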
PROPUB: Provenance Publisher (Apply Customization User Requests, continued)
- Apply ur:hide
[Figure: Provenance graph with g1 in place of s1, m1, and d14; nodes d9, d10, d11, d12, d13, d15, d16, d18, d19, c1, c2, s2, g1; regions marked ur:anonymize and ur:hide]
- Apply ur:anonymize
[Figure: Provenance graph after hiding; nodes d9, d10, d11, d12, d13, d16, d18, d19, s2, g1; region marked ur:anonymize]

PROPUB: Provenance Publisher (Verify Policy and Repair Violations)
[Figure: Provenance graph after applying all user requests; nodes d9, d10, d11, d12, d13, d16, d18, d19, s2; policy violations marked: Cyclic Dependency, Type Error, False Independency]
- d13 is swallowed to resolve "Cyclic-Dependency"
Rules for grouping the nodes involved in a cycle:
  same_group(X,Y) :- cycle(X,Y).
  same_group(X,X) :- same_group(_,X).
  same_group(X,X) :- same_group(_,X).
  same_group(X,Y) :- same_group(Y,X).
  same_group(X,Y) :- same_group(X,Z), same_group(Z,Y).
  smaller(X,Y)    :- same_group(X,Y), X < Y.
  minimum(X)      :- node(X), smaller(_,X).
  abstract(X,G)   :- same_group(X,G), minimum(G), same_group(X,Y), X!=Y.
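The swallow repair for cycles can be sketched as follows. This is an illustrative Python version, not the slide's rules executed as written. It treats cycle(X,Y) as mutual reachability between two distinct nodes, groups mutually reachable nodes, and merges each group into its smallest member name, which is how the smaller and minimum rules are read here (that reading, the edge list `DEP_AFTER`, and the function names are assumptions).

```python
from collections import defaultdict
from itertools import product

# ASSUMED dependency edges (x depends on y) after abstracting s1, m1, d14
# into g1. The slides draw this graph but do not list its edges.
DEP_AFTER = {
    ("d13", "g1"), ("g1", "d13"), ("g1", "d9"), ("g1", "d10"),
    ("g1", "d11"), ("g1", "d12"), ("s2", "g1"),
    ("d15", "s2"), ("d16", "s2"), ("c1", "d15"), ("d18", "c1"),
    ("c2", "d16"), ("d19", "c2"),
}


def transitive_closure(edges):
    """Return all (x, y) pairs such that y is reachable from x."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def cycle_pairs(dep):
    """Pairs of distinct nodes that depend on each other (witnesses)."""
    tc = transitive_closure(dep)
    return {(x, y) for (x, y) in tc if x != y and (y, x) in tc}


def swallow(dep):
    """Merge each group of mutually dependent nodes into one node."""
    groups = defaultdict(set)
    for x, y in cycle_pairs(dep):
        groups[x].update((x, y))  # mutual reachability is transitive
    # Representative: smallest member name (assumed reading of 'minimum').
    rep = {x: min(members) for x, members in groups.items()}
    merged = {(rep.get(x, x), rep.get(y, y)) for x, y in dep}
    repaired = {(x, y) for x, y in merged if x != y}  # drop self-loops
    return rep, repaired


rep, repaired = swallow(DEP_AFTER)
print(rep)
print(sorted(repaired))
```

Under the assumed edges, d13 and g1 form the only group, and the repaired graph has no remaining cycle. Which name the merged node carries is a presentation choice; the slides draw it as g1.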
PROPUB: Provenance Publisher (Verify Policy and Repair Violations, continued)
- d13 is swallowed to resolve "Cyclic-Dependency"
- s2 is swallowed to resolve "Type-Error"
[Figure: Provenance graph after applying all user requests; nodes d9, d10, d11, d12, d13, d16, d18, d19, s2; remaining policy violations marked: Type Error, False Independency]
- ur:hide user requests are ignored
[Figure: Provenance graph with d15, c1, and c2 restored and d13 and s2 merged into one node; remaining policy violation marked: False Independency]
- Checks for policy violations
[Figure: Provenance graph with nodes d9, d10, d11, d12, d15, d16, d18, d19, c1, c2, g1]

PROPUB: Provenance Publisher (Result)
[Figure: Customized provenance graph with nodes d9, d10, d11, d12, d15, d16, d18, d19, c1, c2, g1]
Reported together with the customized provenance graph:
- Honored user requests
- Ignored user requests
- Granted Policies
- Violated Policies

Conclusion
- Provenance information (data lineage) has many applications, e.g. in scientific workflows.
- But we need to balance
  - the desire to publish provenance
  - and privacy (relevancy, intellectual property, ...) concerns
- PROPUB is a system to publish customized provenance:
  - it allows one to specify, analyze, and reconcile different repair strategies
- PROPUB uses a logic-based approach
  - to infer the consequences of complex requests and actions
- Future work:
  - other strategies to solve this problem (e.g. repair violations by inventing new nodes)
  - define metrics to compare different strategies (swallow, invent, ...)
  - mix-and-match strategies?
  - higher-order abstractions of user requests

Related Work
- Chebotko, A., Chang, S., Lu, S., Fotouhi, F., Yang, P.: Scientific workflow provenance querying with security views. In: Web-Age Information Management, 2008. WAIM'08. The Ninth International Conference on, IEEE (2008) 349–356
- Davidson, S., Khanna, S., Roy, S., Boulakia, S.: Privacy issues in scientific workflow provenance. In: Proceedings of the 1st International Workshop on Workflow Approaches to New Data-centric Science, ACM (2010) 1–6
- Missier, P., Ludäscher, B., Bowers, S., Dey, S., Sarkar, A., Shrestha, B., Altintas, I., Anand, M., Goble, C.: Linking multiple workflow provenance traces for interoperable collaborative science. In: Workflows in Support of Large-Scale Science (WORKS), 2010, IEEE 1–8
- Biton, O., Cohen-Boulakia, S., Davidson, S.: Zoom* userviews: Querying relevant provenance in workflow systems. In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB Endowment (2007) 1366–1369
- Dey, S., Zinn, D., Ludäscher, B.: Reconciling Provenance Policy Conflicts by Inventing Anonymous Nodes. In: Resource Discovery Workshop; Extended Semantic Web Conference (2011)

Thank You.