A Formal Model of Provenance in
Distributed Systems
Issam Souilah2
Adrian Francalanza1
1
Vladimiro Sassone2
Department of Computer Science
ICT, University of Malta
2
DSSE, ECS
Southampton University
San Francisco, USA. February 2009
Outline
Motivation
Proposed solution
Provenance Correctness
Conclusion
Motivation
Trust In a Distributed System
Motivation
Trust In a Distributed System
I
Distribution ⇒ inherent parallelism.
Motivation
Trust In a Distributed System
I
Distribution ⇒ inherent parallelism.
I
Distribution ⇒ no shared memory i.e., message passing.
Motivation
Trust In a Distributed System
I
Distribution ⇒ inherent parallelism.
I
Distribution ⇒ no shared memory i.e., message passing.
I
Distribution ⇒ lack of centralised coordination i.e.,
non-determinism.
Motivation
Trust In a Distributed System
I
Distribution ⇒ inherent parallelism.
I
Distribution ⇒ no shared memory i.e., message passing.
I
Distribution ⇒ lack of centralised coordination i.e.,
non-determinism.
Appropriate Calculus?
The piCalculus :
Motivation
Trust In a Distributed System
I
Distribution ⇒ inherent parallelism.
I
Distribution ⇒ no shared memory i.e., message passing.
I
Distribution ⇒ lack of centralised coordination i.e.,
non-determinism.
Appropriate Calculus?
The piCalculus :
I
Captures the characteristic features of our domain of study.
Motivation
Trust In a Distributed System
I
Distribution ⇒ inherent parallelism.
I
Distribution ⇒ no shared memory i.e., message passing.
I
Distribution ⇒ lack of centralised coordination i.e.,
non-determinism.
Appropriate Calculus?
The piCalculus :
I
Captures the characteristic features of our domain of study.
I
Well-studied.
Motivation
Trust In a Distributed System
I
Distribution ⇒ inherent parallelism.
I
Distribution ⇒ no shared memory i.e., message passing.
I
Distribution ⇒ lack of centralised coordination i.e.,
non-determinism.
Appropriate Calculus?
The piCalculus :
I
Captures the characteristic features of our domain of study.
I
Well-studied.
I
Close connections to linear logic and resources
I
...
piCalculus Primer
Names
a , b , . . . ∈ N
piCalculus Primer
Names
a , b , . . . ∈ N
I
denote points of interaction (rendez-vous channels). . .
piCalculus Primer
Names
a , b , . . . ∈ N
I
denote points of interaction (rendez-vous channels). . .
I
. . . and values which are transmitted during communication.
piCalculus Primer
Processes
P
k
Q
k
R
piCalculus Primer
Processes
a !b
k
Q
k
R
piCalculus Primer
Processes
a !b
k
a ? x .c ! x
k
R
piCalculus Primer
Processes
a !b
k
a ?x .c !x
k
R
piCalculus Primer
Processes
a !b
k
a ?x .x !c
k
R
piCalculus Primer
Processes
a !b
|
k
a ? x .x ! c
{z
react on a
}
k
R
piCalculus Primer
Processes
b !c
k
R
piCalculus Primer
Processes
b !c
k
b ? y .R 0
piCalculus Extension: Distribution!
From Processes to Systems
P
k
Q
k
R
piCalculus Extension: Distribution!
From Processes to Systems
p , q, r , . . . ∈ TP
p [P ]
k
q[Q ]
k
p [R ]
piCalculus Extension: Distribution!
From Processes to Systems
p [a ?x .P ]
k
q[a !v1 ]
k
r [R ]
piCalculus Extension: Distribution!
From Processes to Systems
p [a ?x .P ]
|
k
q[a !v1 ]
{z
}
across units of trust
k
r [R ]
piCalculus Extension: Distribution!
From Processes to Systems
p [a ?x .P ]
|
k
q[a !v1 ]
{z
Market of values!
k
r [a !v2 ]
}
piCalculus Extension: Distribution!
From Processes to Systems
communication
z
}|
p [a ?x .P ] k
|
{
q[a !v1 ]
{z
Market of values!
k
r [a !v2 ]
}
piCalculus Extension: Distribution!
From Processes to Systems
communication
z
p [a ?x .P ]
|
k
}|
q[a !v1 ]
{z
Market of values!
k
{
r [a !v2 ]
}
Main Problem
What are the control mechanism need to assist consumers
of data?
How do we automate decisions based on trust?
Main Problem
What are the control mechanism need to assist consumers
of data?
How do we automate decisions based on trust?
I
Static Analysis (not scalable)
Main Problem
What are the control mechanism need to assist consumers
of data?
How do we automate decisions based on trust?
I
I
Static Analysis (not scalable)
Dynamic Analysis:
I
decisions need to be computationally lightweight.
I
decision criteria produced in lightweight fashion.
Main Problem
What are the control mechanism need to assist consumers
of data?
How do we automate decisions based on trust?
I
I
Static Analysis (not scalable)
Dynamic Analysis:
I
I
decisions need to be computationally lightweight. (Full-blown
verification methods do not cut it!)
decision criteria produced in lightweight fashion.
Main Problem
What are the control mechanism need to assist consumers
of data?
How do we automate decisions based on trust?
I
I
Static Analysis (not scalable)
Dynamic Analysis:
I
I
decisions need to be computationally lightweight. (Full-blown
verification methods do not cut it!)
decision criteria produced in lightweight fashion.
(Proof-Carrying Code does not cut it!)
Outline
Motivation
Proposed solution
Provenance Correctness
Conclusion
Provenance
Annotated Values
v:
Provenance
Annotated Values
v:κ
Provenance
Annotated Values
v:κ
κ ::= | α; κ
α ::= rcv(p , κ)
| snd(p , κ)
empty provenance
sequenced provenace
recieve action
send action
Provenance
Annotated Values
p [a !v ]
Provenance
Annotated Values
p [a : κa ! v : κv ]
Provenance
Annotated Values
p [a : κa ! v : κv ]
k
q[a : κa0 ! v 0 : κv 0 .]
k
provenance is linear!
p [a : κa00 ! v 00 : κv 00 .]
Provenance Tracking
Automated:
Provenance Tracking
Automated:
1. orthogonal to programming (can be abstracted away)
Provenance Tracking
Automated:
1. orthogonal to programming (can be abstracted away)
2. ensures provenance annotation standardization.
Provenance Tracking
Automated:
1. orthogonal to programming (can be abstracted away)
2. ensures provenance annotation standardization.
3. avoids circular reasoning with respect to trust.
Provenance Tracking
Automated:
1. orthogonal to programming (can be abstracted away)
2. ensures provenance annotation standardization.
3. avoids circular reasoning with respect to trust.
Two tiered architecture:
Provenance Tracking
Automated:
1. orthogonal to programming (can be abstracted away)
2. ensures provenance annotation standardization.
3. avoids circular reasoning with respect to trust.
Two tiered architecture:
I
Computation Layer: describes computation of values and
processes.
Provenance Tracking
Automated:
1. orthogonal to programming (can be abstracted away)
2. ensures provenance annotation standardization.
3. avoids circular reasoning with respect to trust.
Two tiered architecture:
I
Computation Layer: describes computation of values and
processes.
I
Provenance Tracking Layer: describes the aggregation of
provenance information attached to data
(typically assigned to a trusted middleware)
Provenance Tracking Semantics
Operational Semantics
p [a !v ] k Q
−→
a hv i k Q
Provenance Tracking Semantics
Operational Semantics
p [a !v ] k Q
−→
a hv i k Q
|{z}
loose immediate provenance information!
Provenance Tracking Semantics
(Provenance Tracking) Operational Semantics
p [a : κa !v : κv ] k Q
−→
a hv : snd(p , κa ); κv i k Q
|
{z
}
provenance aggregation
Provenance Usage
Not-Automated!
Provenance Usage
Not-Automated!
I
program with it to control non-derminism. . .
Provenance Usage
Not-Automated!
I
program with it to control non-derminism. . .
I
. . . using intuitive programming idioms/constructs
Provenance Usage
Not-Automated!
I
program with it to control non-derminism. . .
I
. . . using intuitive programming idioms/constructs
Operational Semantics
a hv i k q[a ?(x ).Q ] −→ q[Q {v/x }]
Provenance Usage
Not-Automated!
I
program with it to control non-derminism. . .
I
. . . using intuitive programming idioms/constructs
Operational Semantics
a hv : κv i k q[a : κa ?(x from π).Q ] −→ q[Q {v/x }] if κv |= π
|{z}
provenance pattern matching
Provenance Usage
Not-Automated!
I
program with it to control non-derminism. . .
I
. . . using intuitive programming idioms/constructs
Operational Semantics
a hv : κv i k q[a : κa ?(x from π).Q ] −→ q[Q {v : rcv(q, κa ); κv/x }] if κv |= π
|
{z
}
provenance aggregation
Provenance Usage Example
Client/Server
srv Name of server
ret Name of return channel on which server returns answer
Client = p [ srv!hreti]
k
p [ ret?(x from π).P ]
Server = q[ srv?(y from ∗).y !hv i]
Provenance Usage Example
Client/Server
srv Name of server
ret Name of return channel on which server returns answer
Client = p [ srv!hreti]
k
p [ ret?(x from π).P ]
Server = q[ srv?(y from ∗).y !hv i]
Provenance Usage Example
Client/Server
srv Name of server
ret Name of return channel on which server returns answer
Client = p [ srv!hreti]
k
p [ ret?(x from π).P ]
Server = q[ srv?(y from ∗).y !hv i]
Provenance Usage Example
Client/Server
srv Name of server
ret Name of return channel on which server returns answer
Client = p [ srv!hreti]
k
p [ ret?(x from π).P ]
Server = q[ srv?(y from ∗).y !hv i]
Provenance Usage Example
Client/Server
srv Name of server
ret Name of return channel on which server returns answer
1
Client = p [ srv : κsrv
!h ret : i]
Server =
2
q[ srv : κsrv
k
p [ ret : ?(x from π).P ]
?(y from ∗).y !hv : κv i]
Provenance Usage Example
Client/Server
srv Name of server
ret Name of return channel on which server returns answer
1
Client = p [ srv : κsrv
!h ret : i]
Server =
2
q[ srv : κsrv
k
p [ ret : ?(x from π).P ]
?(y from ∗).y !hv : κv i]
Provenance Usage Example
Client/Server
srv Name of server
ret Name of return channel on which server returns answer
1
Client = p [ srv : κsrv
!h ret : i]
Server =
2
q[ srv : κsrv
k
p [ ret : ?(x from π).P ]
?(y from ∗).y !hv : κv i]
π = snd(q, rcv(q, snd(p , ))); ∗
Outline
Motivation
Proposed solution
Provenance Correctness
Conclusion
Provenance Correctness
Correctness Intuition
I
provenance attached to values records history related to that
value.
I
provenance of a value is correct if it describes a partial
history which corresponds to the total history of events.
Provenance Correctness
History represented by Logs
φ ::= ∅ | ρ; φ
logs
ρ ::= rcv(p ) | snd(p )
log actions
Provenance Correctness
History represented by Logs
φ ::= ∅ | ρ; φ
logs
ρ ::= rcv(p ) | snd(p )
log actions
Sub-Log Comparison
φ ≤ φ0
cmp1
∅≤φ
cmp2
ρ; φ ≤ ρ; φ0
φ ≤ φ0
cmp3
φ ≤ ρ; φ0
Provenance Correctness
Monitored Systems
φ . p [a : κa !v : κv ] k Q
−→mon
snd(p ); φ . a hv : snd(p , κa ); κv i k Q
Provenance Correctness
Monitored Systems
φ . p [a : κa !v : κv ] k Q
−→mon
snd(p ); φ . a hv : snd(p , κa ); κv i k Q
Erasure Function:
| − | : MS → S
Provenance Correctness
Monitored Systems
φ . p [a : κa !v : κv ] k Q
−→mon
snd(p ); φ . a hv : snd(p , κa ); κv i k Q
Erasure Function:
| − | : MS → S
Lemma
M −→mon M 0
implies |M | −→ |M 0 |
Provenance Correctness
Partial Log Extraction
pLog : κ → P(φ)
pLog() = ∅
pLog(rcv(p , κ); κ0 ) = rcv(p ); pLogV(κ0 ) ∪ pLog(κ)
pLog(snd(p , κ); κ0 ) = snd(p ); pLogV(κ0 ) ∪ pLog(κ)
pLogV() = ∅
pLogV(rcv(p , κ); κ0 ) = rcv(p ); pLogV(κ0 )
pLogV(snd(p , κ); κ0 ) = snd(p ); pLogV(κ0 )
Provenance Correctness
Definition
M has correct provenance iff ∀φ ∈ pLog(prov(M )) we have
φ ≤ log(M ).
Theorem
M has correct provenance and M −→mon M 0 implies M 0 has
correct provenance.
Outline
Motivation
Proposed solution
Provenance Correctness
Conclusion
Conclusions
I
Designed a provenance based calculus for distributed
computing.
I
Proposed a two-tier system for provenance tracking and
usage.
I
Defined provenance correctness
I
Proved provenance correctness for our provenance tracking
semantics.
Conclusions
Thank You... Questions?
© Copyright 2026 Paperzz