
Exercise 9
Semantics of probabilistic programs
Program Analysis and Synthesis 2017
ETH Zürich
May 18, 2017
1 Kernels
Before diving into our main topic, we will briefly review some important concepts from
the theory of probability and probabilistic transition systems. We begin with the notions
of measure and integration. They are our main tool in discussing probability rigorously.
Then, we move to the concept of a kernel, which allows us to model systems that
transition from one state to another with a given probability.
Measure. A measure is basically a non-negative quantity µ(A), like volume or mass,
associated with the different regions A of a space S. In order to be well-behaved, every
such assignment µ has to satisfy a couple of properties:
Definition 1. A measure µ is a map that assigns a real value, or possibly infinity, to
every region in space, such that the following conditions hold:
Non-negativity. µ(A) ≥ 0 for every A.
Null empty set. µ(∅) = 0.
Countable additivity. For any countable set {A_n}_{n=1}^∞ of disjoint regions:

µ(⋃_{n=1}^∞ A_n) = ∑_{n=1}^∞ µ(A_n).
Example 1. An almost trivial example is the measure concentrated on one point x,
also known as the Dirac-delta δx :
δx(A) = 1 if x ∈ A, and 0 otherwise.
If we interpret δx as the probability of a random event, then it corresponds to the
situation where there is no uncertainty in the outcome: it will always be x.
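As a small sanity check, the Dirac measure can be sketched in a few lines of Python on a finite space, where every subset is a region; the helper names below are our own illustration, not part of the exercise:

```python
# A toy sketch of Definition 1 on a finite space: the Dirac measure
# delta_x assigns mass 1 to any region containing x, and mass 0 otherwise.

def dirac(x):
    """delta_x(A) = 1 if x is in A, and 0 otherwise."""
    def measure(A):
        return 1.0 if x in A else 0.0
    return measure

d = dirac(3)
print(d({1, 2, 3}))   # 1.0: the region contains x = 3
print(d(set()))       # 0.0: null empty set
# Additivity for disjoint regions: delta_x(A ∪ B) = delta_x(A) + delta_x(B)
print(d({3} | {5}) == d({3}) + d({5}))  # True
```

Non-negativity holds by construction, and additivity follows because a point x can lie in at most one of a family of disjoint regions.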
It turns out that there is a conceptual difficulty in assigning a measure to every possible
subset of an uncountable space, e.g., R. Thus, we need to be more precise about what we
consider to be a region of space: sets that are not regions may fail to be measurable. The
notion of a σ-algebra captures the properties that a collection of measurable sets needs to satisfy:
Definition 2. A family F of subsets of a given space S forms a σ-algebra if
1. S ∈ F,
2. S \ A ∈ F for every A ∈ F,
3. ⋃_{n=1}^∞ A_n ∈ F if all A_n ∈ F.
The pair (S, F) is called a measurable space.
Example 2. Consider the collection of all intervals on the real line R. This collection is
not a σ-algebra, but (as with any collection of subsets) there exists a smallest σ-algebra
that includes it. This σ-algebra BR is called the Borel σ-algebra on R. It is a standard
way to turn the real line into a measurable space, and we will often simply write R
whenever we technically mean the measurable space (R, BR ).
Any measure µ on a space (S, F) has to be defined for at least all the elements of the
σ-algebra F. Thus, we can view µ as a map F → [0, +∞]. Of special interest will be
measures for which µ(S) ≤ 1. These are called subprobability measures, and µ(A) can
be interpreted as the probability that an event A ∈ F happens. If the total probability
is strictly less than one, then S represents only a subset of all the possible outcomes.
Example 3. A slightly less trivial example is the Borel measure on R. If J is any
interval with ends a ≤ b, then let λ(J) = b − a. By Carathéodory’s extension theorem, it
extends to a unique measure λ defined on the Borel σ-algebra BR . This is the standard
Borel measure on R, and it captures our intuition about length. Evidently, λ(R) = +∞,
and so it is not a subprobability measure. We can use it, however, to obtain the uniform
probability measure µ(A) = λ(A ∩ J)/λ(J) on any interval J with λ(J) ∈ (0, +∞).
Integration. The integral of a real-valued function f : S → R defined over a measurable
space (S, F) generalizes the concept of a weighted average, where the role of weights is
played by a chosen measure µ. In the special case when S consists of countably many
points, and every subset is measurable, i.e., F = P(S), the integral of f is simply:
∑_{s∈S} µ{s} · f(s).
In more general spaces, however, singletons {s} might be massless (i.e., µ{s} = 0)
even when the infinitesimal regions ds around them are not (e.g., consider the uniform
measure on [0, 1]). The simple summation above does not account for this mass, and
could lead to wrong results. Thus, we replace the singleton {s} by the infinitesimal ds:
∫_{s∈S} µ(ds) · f(s).
Of course, the above description is far removed from the rigorous theory of integration,
but it should be enough for our purposes. In order to have a well-defined integral,
the function f must possess a special property: every Borel set B ∈ BR must have a
measurable pre-image f −1 (B) ∈ F. Naturally, such functions are called measurable.
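The two pictures of the integral can be sketched in Python (the helper names are ours): a literal weighted sum for a countable space, and an approximation for the uniform measure on [0, 1] where each cell of mass 1/n plays the role of the infinitesimal ds:

```python
# Integral as a weighted sum. For a countable space, sum mu{s} * f(s);
# for the uniform measure on [0, 1], approximate each infinitesimal region
# ds by a small cell of mass 1/n (a midpoint Riemann sum).

def integrate_discrete(mu, f):
    """sum over s of mu{s} * f(s), for a measure given point by point."""
    return sum(w * f(s) for s, w in mu.items())

def integrate_uniform(f, n=100_000):
    """Approximates the integral of f against the uniform measure on [0, 1]."""
    return sum(f((i + 0.5) / n) / n for i in range(n))

mu = {0: 0.5, 1: 0.3, 2: 0.2}               # a probability measure on {0, 1, 2}
print(integrate_discrete(mu, lambda s: s))  # the expected value, 0.7
print(integrate_uniform(lambda s: s * s))   # close to 1/3
```

Note that the uniform measure assigns mass zero to every singleton, so the discrete sum would wrongly return 0; only the cell-based version recovers the mass around each point.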
Kernels. A kernel is a mathematical object that represents a parametrized collection
of measures over a given measurable space (T, G), where the parameter varies in another
measurable space (S, F). The concrete interpretation of a kernel P varies from application
to application, and in our particular case it will model the probability of a system
undergoing a transition s → A from a state s ∈ S to some state t ∈ A ⊆ T .
Definition 3. A kernel P : (S, F) → (T, G) from one measurable space to another is a
map P : S × G → [0, +∞] with the following two properties:
1. Fixing the first argument to any s ∈ S, we obtain a measure:
P (s, ·) : G → [0, +∞].
2. Fixing the second argument to any B ∈ G, we obtain a measurable function:
P (·, B) : S → [0, +∞].
Kernels that model transition probabilities have the type P : S × G → [0, 1], and we call
them subprobability kernels. To emphasize that P (s, A) gives a transition weight, we will
often write this number as s −P→ A.
Example 4. We can interpret any measurable function ϕ : S → T as the subprobability
kernel that deterministically (i.e., with probability one) makes the transition s → ϕ(s):

ϕ(s, A) = 1 if ϕ(s) ∈ A, and 0 otherwise.
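As a minimal sketch (the wrapper name is ours, not from the text), such a deterministic kernel is easy to realize in Python:

```python
# Wrapping a function phi : S -> T as the deterministic kernel that
# jumps from s to phi(s) with probability one.

def as_kernel(phi):
    def P(s, A):
        # P(s, A) = 1 if phi(s) lands in A, and 0 otherwise
        return 1.0 if phi(s) in A else 0.0
    return P

succ = as_kernel(lambda x: x + 1)   # the successor map on integers
print(succ(2, {3}))     # 1.0: the transition 2 -> 3 happens with probability one
print(succ(2, {0, 1}))  # 0.0: any set missing phi(2) gets no mass
```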
Kernels possess a rich mathematical structure, which we will utilize in our discussion
of probabilistic semantics later. Basically, a kernel generalizes the familiar concept of a
matrix, and matrix operations transfer almost directly, giving a nice algebra of kernels:
Linear combination. For any P, Q : (S, F) → (T, G), and measurable f and g, let:

(f P + gQ)(s, A) = f (s) · (s −P→ A) + g(s) · (s −Q→ A).
Multiplication. For any P : (S, F) → (Ω, Z), and Q : (Ω, Z) → (T, G), let:

(P Q)(s, B) = ∫_{ω∈Ω} (s −P→ dω) · (ω −Q→ B).
In particular, kernel multiplication can be interpreted as the probability of making a
transition s → B by making two independent transitions s −P→ dω and ω −Q→ B via any
of the possible intermediate states ω ∈ Ω.
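On a finite state space this algebra can be sketched directly in Python (a toy illustration with our own names, representing a kernel as a dict of rows):

```python
# On a finite space a subprobability kernel is just a matrix: P[s][t] is the
# weight of the transition s -> t, and P(s, A) is the row sum over t in A.
# Kernel multiplication then becomes matrix multiplication.

def kernel_apply(P, s, A):
    """P(s, A): total probability of jumping from s into the set A."""
    return sum(P[s][t] for t in A)

def kernel_mult(P, Q):
    """(PQ)(s, {t}) = sum over intermediate states w of P(s, {w}) * Q(w, {t}),
    i.e., the integral over dω collapsed to a finite sum."""
    mid = list(Q)
    return {s: {t: sum(P[s][w] * Q[w][t] for w in mid) for t in mid}
            for s in P}

# Two states; from each, stay or move with equal probability.
P = {0: {0: 0.5, 1: 0.5},
     1: {0: 0.5, 1: 0.5}}
PP = kernel_mult(P, P)
print(PP[0][1])                     # still 0.5 after two steps
print(kernel_apply(PP, 0, {0, 1}))  # total mass is preserved: 1.0
```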
The analogy with matrices goes further with two additional ways that we can look at
a kernel P : (S, F) → (T, G). If we have a measure µ on the domain of P , then we can
push it forward to a measure on the codomain:
(µP )(A) = ∫_{s∈S} µ(ds) · (s −P→ A).
If the measure µ is interpreted as the probability of a state before a transition with P ,
then µP gives the probability of a state after the transition. In the dual picture, we have
a measurable function f on the codomain, and we can pull it back to the domain:
(P f )(s) = ∫_{t∈T} (s −P→ dt) · f (t).
In terms of dynamics, f is a function of the state after a transition with P . If we want
to predict its value without making a transition, then we may approximate it with P f ,
which gives the expected value of f as a function of the state before the transition.
For example, if we want to know whether a given predicate f : T → {0, 1} holds for the
state after transition, then the pull back P f : S → [0, 1] will give us the probability for
that. In a certain sense, P f is the best that we can know about f before the transition.
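Both dual actions reduce to familiar linear algebra on a finite space. The following sketch (numbers and names are ours) pushes a measure forward and pulls a predicate back through the same kernel:

```python
# A kernel as a matrix: a measure mu pushes forward along P (mu P),
# and a function f on the codomain pulls back along P (P f).

P = {0: {0: 0.9, 1: 0.1},   # from state 0: stay with 0.9, move with 0.1
     1: {0: 0.0, 1: 1.0}}   # state 1 is absorbing

def push_forward(mu, P):
    """(mu P)(t) = sum_s mu(s) * P(s, {t}): state distribution after one step."""
    return {t: sum(mu[s] * P[s][t] for s in mu) for t in P[0]}

def pull_back(P, f):
    """(P f)(s) = sum_t P(s, {t}) * f(t): expected value of f after one step."""
    return {s: sum(P[s][t] * f(t) for t in P[s]) for s in P}

mu = {0: 1.0, 1: 0.0}        # start surely in state 0
nu = push_forward(mu, P)
print(nu)                    # {0: 0.9, 1: 0.1}
g = pull_back(P, lambda t: 1.0 if t == 1 else 0.0)
print(g[0])                  # probability of landing in state 1: 0.1
```

Here g is exactly the pull-back of the predicate "the state is 1": it turns a yes/no question about the state after the transition into a probability as seen before it.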
2 Probabilistic semantics
The main idea of probabilistic semantics is to interpret a program p as a subprobability
kernel over the program state space S. This kernel gives the probability s −p→ A of p
terminating in a state in A ⊆ S when run in a state s ∈ S. Because the program might
fail to terminate, the total probability p(s, S) need not equal one.
We will give a compositional semantics, in which complex programs are built from simpler ones. We
assume that the state space is equipped with a suitable σ-algebra, and two classes of
basic expressions: measurable maps ϕ : S → S, and measurable functions f : S → [0, 1].
From them we build program expressions as follows:
Program      Informal meaning
ϕ            state change
f p + gq     choice
p; q         composition
{f}          test
p ∗ q        iteration
State change ϕ. Modifies the current state by applying a deterministic map ϕ : S → S.
An example would be variable assignment “x = e”.
Choice f p + gq. Chooses between p and q with probabilities f and g respectively. For
example, “½(x = 0) + ½(x = 1)” sets x to either 0 or 1, each with probability ½.
Composition p; q. Executes p, and then executes q. For example, “½(x = 0); ½(x = 1)”
first sets x to 0, and then sets it to 1, each of them with probability ½.
Test {f}. Continues with probability f , or fails with probability 1 − f . Interpreting
boolean expressions as returning either 0 or 1, “{x = 0}” continues iff x equals 0.
Iteration p ∗ q. Chooses among the iterates p^0; q, p^1; q, p^2; q, . . . , p^n; q, . . . . E.g.,
“({x > 0}; x = x − 1) ∗ {x ≤ 0}” is the same as “while x > 0 do x = x − 1”.
More precisely, we interpret each program p as a subprobability kernel JpK, where the
compound combinations correspond to kernel algebra operations. For any two functions
f, g : S → R we will write f ≤ g iff f (s) ≤ g(s) for all s ∈ S.
JϕK(s, A) = 1 if ϕ(s) ∈ A, and 0 otherwise.

Jf p + gqK = f JpK + gJqK, given f + g ≤ 1.
Jp; qK = JpK JqK

J{f}K(s, A) = f (s) if s ∈ A, and 0 otherwise.

Jp ∗ qK = ∑_{n=0}^∞ JpK^n JqK.
Of course, the semantics of iteration is well defined only if the infinite sum converges to
a subprobability kernel. We can now reinterpret standard programming constructs as:
skip ≡ {1}
fail ≡ {0}
if f then p else q ≡ f p + (1 − f )q
while f do p ≡ ({f }; p) ∗ {1 − f }
Problem 1. Show that if a deterministic program p is written with the constructs
above, and it makes a transition s −p→ s′, then under its probabilistic interpretation this
transition happens with probability one.
Problem 2. What is the kernel corresponding to the following program:

n = 0
while ½
    n = n + 1
Problem 3. Recall how kernels can pull back measurable functions, as discussed in the
kernels section. What is the meaning of JpK1?
Definition 4. The weakest precondition ⟨p⟩B and weakest liberal precondition [p]B for
a given deterministic program p and postcondition B ⊆ S are defined as:

⟨p⟩B = {s | ∃s′. s −p→ s′ ∧ s′ ∈ B}

[p]B = {s | ∀s′. ¬(s −p→ s′) ∨ s′ ∈ B}.
Problem 4. Define the analogues of the weakest precondition and the weakest liberal
precondition for probabilistic programs. How can you express the relation A ⊆ ⟨p⟩B?
Problem 5 (McIver and Morgan [3]). Compute the weakest precondition of x ≥ 0 with
respect to the program:

⅓(x = +y) + ⅔(x = −y).
Problem 6. If P is a subprobability kernel, prove that f ≤ g implies P f ≤ P g. (Hint:
observe that 0 ≤ f implies 0 ≤ P f ).
Problem 7. Find the analogue of a program invariant for the iteration p ∗ q.
Problem 8. Prove that “n ≥ 0” is an invariant for the program from problem 2.
Problem 9 (McIver and Morgan [3]). The following program computes a faulty N !.
Find an invariant establishing that r = N ! with probability at least p^N.

n, r = N, 1
while n ≠ 0
    r = r · n
    p(n = n − 1) + (1 − p)(n = n + 1)
3 Further reading
See the classic papers Kozen [2] and Kozen [1].
References

[1] Dexter Kozen. “A Probabilistic PDL”. In: Proceedings of the Fifteenth Annual ACM Symposium on Theory of Computing (STOC ’83). New York, NY, USA: ACM, 1983, pp. 291–297. ISBN: 0-89791-099-0. DOI: 10.1145/800061.808758.

[2] Dexter Kozen. “Semantics of probabilistic programs”. In: Journal of Computer and System Sciences 22.3 (1981), pp. 328–350. ISSN: 0022-0000. DOI: 10.1016/0022-0000(81)90036-2.

[3] Annabelle McIver and Carroll Morgan. Abstraction, Refinement and Proof for Probabilistic Systems. Monographs in Computer Science. Springer-Verlag, 2004. ISBN: 0387401156.