Combining the box structure development method and CSP

Combining the box structure development method and CSP for industrial
software development
Confidential and under review for publication.
All contents disclosed herein are protected by Patent Application Number GB 0410047.5.
Philippa J. Hopcroft∗ and Guy H. Broadfoot†
Abstract
In this paper, we combine the Box Structure Development
Method (BSDM) [7, 9] and CSP [4, 11], integrating them
into industrial software development processes. BSDM was
developed with practical software projects in mind and provides a framework for developing formal design specifications that are fully traceable to the informal requirements.
It integrates well into an industrial setting and forms an
ideal bridge between the actual system being developed and
the abstract models used for formal analysis. CSP complements BSDM by providing the mathematical framework for
formal verification, together with its model checker FDR.
In this paper, we present generic algorithms for translating specifications from BSDM into CSP, illustrate how they
can be formally verified using FDR and summarise an industrial case-study.
1. Introduction
In this paper, we combine two existing and complementary formal methods, integrating them into industrial software development processes. The first of these is the Box
Structure Development Method (BSDM), originally developed by Mills [7], and later extended by others (for example, see [9]). The second approach is the process algebra
CSP [4, 11], together with its model checker FDR [10, 3].
The problem domain motivating this work is the development of practical software systems where the complexity and concurrency make them increasingly unreliable with
conventional testing methods alone; a more formal, yet at
the same time practical, approach is required to improve
quality and predictability. Many formal methods have been
developed over the years, for example [4, 5, 1]. However,
∗
†
Oxford University Computing Laboratory, United Kingdom.
Verum Consultants BV, The Netherlands.
integrating them into industrial software development environments has proved to be difficult. There are many reasons cited ranging from scalability to educational issues, depending on the formal approach in question. For example,
industrial software development starts on the basis of an informal specification running to hundreds or thousands of
pages, written in domain specific and business terms by domain experts and business analysts.
Formal specifications, written in a mathematical notation, are typically inaccessible to domain experts and business analysts who have the necessary product knowledge.
Training them in the formal language at the start of every
project is infeasible. This leads to the very people who are
essential to the validation process being excluded from it. In
our experience of introducing formal methods into industry,
we have found this to be one of the most severe practical
barriers.
BSDM was developed with practical software projects
in mind. It therefore provides an ideal vehicle for bridging
the gap between the abstract models necessary for the formal analysis and the actual system being developed in practice. It uses a requirements analysis technique that leads to a
formal specification faithful to the original informal specification. This specification is then transformed into an implementation via a number of refinement steps. From an industrial perspective, this approach provides a number of crucial advantages over other formal approaches. Firstly, it produces a formal specification that is accessible to the critical
stakeholders (such as the customers and domain experts)
and they can actively participate in validating it. Secondly,
it maintains complete traceability between the formal specifications and the informal requirements (enabling contractual agreements and forcing rigorous requirements analysis from the start). Thirdly, it was designed to integrate into
an industrial development process; references to practical
case-studies can be found in [9].
CSP complements BSDM by providing the essential
mathematical framework for modelling systems that are
composed of parallel components communicating with one
another. CSP has a rich language, together with semantic
models for reasoning about the correctness of a system.
Important from an industrial perspective, there is a model
checker, namely FDR, that provides fully automated verification for refinement, deadlocks, livelocks and determinism. Its extensive debugging facilities provide the necessary
feedback during analysis. By combining these complementary methods, the necessary link between the abstract models and the actual system in practice, together with the traceability to the informal requirements, is achievable.
In this paper, we show how by combining the two, we
reap the benefits of both. We use BSDM as a bridge between the industrial development, including all the restrictions this brings, and the formal analysis made possible by
the use of CSP and FDR by generating the CSP models
automatically from the BSDM specifications. Because the
models are generated automatically, there is no need to validate them against the informal specifications and the inability of critical project stakeholders and software developers
to understand the CSP models is no longer such a problem.
The paper is organised as follows. In Section 2, we give
an overview of the two formal frameworks in question. In
Section 3, we show how the most abstract functional view in
BSDM, known as a black box, can be modelled in CSP, and
present a generic algorithm for building them. The next step
towards implementation is to derive a state box specification
from the black box. We show in Section 4 how this view can
be modelled in CSP and present a second generic algorithm
for achieving this. The scope for automated analysis using
the FDR model checker is discussed in Section 5. A state
box must be behaviourally equivalent to the black box from
which it was derived. We show how this can be formulated
in CSP and verified automatically using FDR. In Section 6,
we give a brief overview of how our approach can be extended to model some of the abstraction techniques used in
BSDM. Finally, we present our conclusions, industrial experience gained with this approach and future work in Section 7.
In this paper, we are interested in the black box and
state box specifications only, since we apply this framework to construct verifiably correct designs of software systems. This work can be extended to incorporate the transition from the state box specification of the design to a corresponding clear box implementation. However, this is part of
ongoing research (as discussed in the conclusions) and beyond the scope of this paper.
Black boxes A black box specification is the most abstract
view and defines the external behaviour of a system or component in terms of input stimuli from and output responses
observable to its environment; S and R denote the set of
stimuli and responses respectively. A black box is characterised by a black box function BB : S ∗ → R that maps
stimulus history to responses, where S ∗ is the set of all finite sequences over S.
A black box function must be total: every stimulus sequence is mapped to a response. In practice, there may be
a stimulus that does not invoke a response; for completeness, a special response null is added to the set R to handle
this case. For example, in the initial state of a system prior
to any stimulus input, no response can (spontaneously) occur and therefore BB() = null. Sequences of stimuli that
are illegal must be included and are mapped to the special
value ω ∈ R. Illegal sequences are extension-closed: all extensions of illegal sequences are also illegal and result in response ω.
The black box is developed from the informal requirements and is required to be complete, consistent and traceably correct with respect to them. In practice, this is difficult
to achieve, since requirements are often incomplete, inconsistent and can run into the hundreds of pages. In [8], Prowell and Poore present the sequence-based software specification method for systematically defining a black box specification that ensures completeness and consistency, as well
as retaining complete traceability with the original informal
requirements.
The sequence-based specification method involves the
ordered enumeration of all possible stimuli sequences in S ∗
in order of length, starting with the empty sequence (i.e.,
of length 0), and the assignment of a response from R to
each one. For each length, the ordering must be complete.
During this enumeration, equivalences between sequences
may arise: Two sequences u, v ∈ S ∗ are Mealy equivalent, written u ≡ρMe v, precisely when all nonempty extensions of them result in the same response; i.e., for all
z ∈ S∗ − {}, BB(uz) = BB(vz).
When an equivalence arises between a sequence u and a
previously enumerated one v, u is said to reduce to v. In this
case, the response associated with u is noted, together with
its equivalence to v. If no such equivalences exist, then u is
said to be irreducible. The set S ∗ is thus partitioned into a
finite set of equivalence classes as defined by ρ Me where, for
2. Background
2.1. Box structure development method
The box structure development method [6, 7] provides a
framework for the application of function theory to software development. It defines three functional views of a
software system, forming an abstraction hierarchy that allows for stepwise refinement and verification: A black box
is a state-free description of the external view of the system;
a state box is derived from the black box and introduces internal state; and the final view is the clear box which is an
implementation of the state box in a chosen language. Each
view must be complete, consistent and traceably correct.
2
every sequence u ∈ S ∗ , there is a unique irreducible normal
form v such that u ∈ [v] ρMe . These unique normal forms for
the reduction system defined by ρ Me are known as canonical
sequences.
Using the format in [9], the result of this sequence enumeration is a set of enumeration tables: one for each canonical sequence. For example, an enumeration table for some
canonical sequence c i takes the form:
Stimulus
s1
s2
Response
r1
r2
Equivalence
cj
ci s2 box (without needing to make reference to the stimulus
history). It is characterised by a total state box function
SB : (Q × S) → (Q × R), where Q, S and R are the sets
of states, stimuli and responses respectively.
The states in Q represent characteristic predicates, one
distinct state for each equivalence class, over the partition
of S∗ . Defining state variables and their unique instantiations for each equivalence class is achieved through a process called the canonical sequence analysis and is described
in [9].
A black box and state box with functions BB : S ∗ → R
and SB : (Q × S) → (Q × R) respectively are behaviourally
equivalent precisely when, for every sequence in S ∗ , they
invoke the same response:
Trace
R1
R2
where {s1 , s2 } = S (since every extension must be
complete), r1 , r2 ∈ R and cj is an existing canonical sequence (identified earlier in the ordering). Only unreduced
sequences are used in the equivalences. The first row of the
table reads as follows:
∀ u ∈ [ci ]ρMe
•
∀ u s ∈ S∗ ; r ∈ R •
BB(u s) = r ⇔ SB(m(u), s) = (m(u s), r)
where the function m : S ∗ → Q is a mapping from stimulus
sequences to states and defined as:
BB(u s1 ) = r1 ∧ u s1 ∈ [cj ]ρMe
∀ u, v ∈ S∗
This step can be traced back to requirement R 1 in the informal requirements specification. Due to the equivalence established, no further extensions of this sequence need to be
considered. In practice, it frequently occurs that choices are
required concerning issues about which the informal specification is silent, ambiguous or inconsistent. In these cases,
new requirements must be formulated and validated.
In the table above, extending sequences in [c i ]ρMe with s2
invokes response r 2 and are not reducible to a previously
enumerated sequence. Thus, c i s2 is a new canonical sequence, for which an enumeration table must be defined.
The enumeration process is complete once all sequences
have been assigned to an equivalence class. As proved in [8,
Theorem 3], a complete enumeration specifies exactly one
value for every sequence z ∈ S ∗ , and therefore defines
a complete, consistent black box specification. Enumeration specifications that are strongly bisimilar are regarded
as equivalent (since they encode the same black box); as we
will see later on, this is nicely reflected in the CSP framework, since the corresponding CSP models in such cases are
also semantically equivalent. In this paper, we assume that
the enumeration specifications are optimal (i.e., no strongly
bisimilar states), which is what one aims to achieve in practice. A simple example illustrating a complete enumeration
is given in Section 3 (Example 3.1).
For further details on this work and proofs, see [8], in
which the authors also present effective abstraction techniques to enhance scalability in practice. In Section 6, we
give a brief summary of how we handle them.
•
m(u) = m(v) ⇒ u ≡ρMe v
The state box specification is represented as a set of tables,
one for each stimulus s ∈ S of the form:
Current state
q1
q2
Response
r1
r2
New state
q2
q1
Trace
R1
R2
where q1 , q2 ∈ Q and r1 , r2 ∈ R. In a complete, consistent state box specification, from each table for stimulus
s ∈ S, one must be able to determine the unique response in
any given state of the system. Thus, the Current State column must cover all states. Example 4.1 in Section 4 presents
such a state box.
There is more than one possible state box for a given
black box; therefore, it is for the software designer to determine the appropriate one and show that this is behaviourally
equivalent to the black box specification.
2.2. CSP and the FDR model checker
CSP [4] is a process algebra for describing concurrent
processes that interact with one another or their environment through some form of communication. CSP has a
rich expressive language and a collection of semantic models for reasoning about process behaviour. We give a brief
overview of the notation and semantic models we use;
see [11] for further details.
A process is defined in terms of synchronous, atomic
communications, known as events, that it can perform with
its environment. For example, process a → P can initially perform event a and then act like process P. We write
?x : A → P(x) (prefix choice construct) to denote the process that is willing to perform any event in A and then behave like process P(x).
State boxes A state box specification is the next step towards implementation and is derived from a black box specification. It introduces state variables to capture distinctive
characteristics of the stimulus history, such that the external behaviour is equivalent to that specified by the black
3
Channels carry sets of events; for example, c.5 is an
event of channel c. The process c?x → P x inputs a value x
from channel c and then acts like P x . The process d . x → Q
performs the event d . v, where v is the value assigned to x,
and then acts like process Q.
In CSP, we have external and internal choice operators.
P 2 Q denotes an external choice between P and Q; the
initial events of both processes are offered to the environment; when an event is performed, the choice is resolved.
P Q denotes an internal (or nondeterministic) choice between P and Q; the process can act like either P or Q, with
the choice being made according to some criteria that we do
not model.
Processes can be placed in parallel, where they synchronise upon (all or specified) common events or interleaved,
where they run independently of one another.
For the purposes of this paper, it suffices to introduce
the simplest semantic model in CSP, known as the traces
model. A trace is a sequence of events (visible to the environment). For any process P, traces(P) is the set of all
finite traces of P; furthermore, traces(P) is non-empty (always contains empty trace) and prefix closed. The refinement ordering is defined as:
the automatic verification using FDR, while still benefiting
from the practical advantages of the box structure development method.
The sequence-based specification of the black box describes a finite-state automaton (in this case, a Mealy machine), where the states represent the equivalence classes
and the arcs capture the corresponding stimulus-response
pairs between them. Such systems are intuitively modelled
in CSP as a set of mutually recursive processes, one representing each equivalence class, and defined with stimulusresponse pairs of events leading to the process that represents the corresponding equivalence state.
We start by defining what it means for a CSP process to
model a black box specification. For a black box function
BB : S∗ → R, we will define a CSP process P with stimulus
and response events from S and R respectively; thus the alphabet of P will be S ∪ R. For some trace tr, we write tr S
to denote the trace whose members are those of tr that are
also in set S; for example, s 1 , r1 {s1 } = s1 . Thus, for
any given trace tr, we know which stimulus sequence it relates to (namely, tr S). Furthermore, if tr = tr r for
some trace tr and response r, then this trace represents the
stimulus sequence tr S generating response r.
P T Q ⇔ traces(Q) ⊆ traces(P)
Definition 3.1 A CSP process P is said to model a black
box specification characterised by the total black box function BB : S ∗ → R, precisely when the following conditions
are satisfied:
A process Q trace-refines a process P precisely when every
trace of Q is also a trace of P. Processes P and Q are traceequivalent, written P ≡ T Q, precisely when P T Q and
Q T P.
The traces model allows us to verify safety properties
only. Richer semantic models exist in CSP, for example the
failures model, to capture a broader scope of properties such
as nondeterminism and other liveness conditions. (See [11]
for further details.)
FDR is a mature model checker for CSP and a commercial product of Formal Systems Ltd. [3]. It provides fully
automated verification of refinement in all semantic models, determinism, deadlock freedom and livelock freedom.
If such a check fails, then counter-examples are generated
with extensive support for analysing them (from top level
down to the individual component behaviour). There are
optimisation techniques available to enhance scalability in
practice.
1. P must not introduce new behaviour (not in BB):
∀ tr, tr ∈ traces(P) • tr = tr r ⇒ BB(tr S) = r.
2. P must model the whole domain of BB and map each
one to the corresponding response:
∀ z ∈ S∗ ; r ∈ R •
BB(z) = r ⇒ (∃ tr, tr ∈ traces(P) •
tr = tr r ∧ tr S = z).
3. P reflects the stimulus-response pairing pattern. Since
traces(P) is prefix-closed, we specify this as:
∀ tr : traces(P) •
#(tr R) ≤ #(tr S) ≤ #(tr R) + 1.
where #(tr X) returns the number of times an event
in the set X occurs in trace tr.
3. Modelling black boxes in CSP
4. P is a deterministic process, reflecting the fact that BB
is a function.
The starting point for specifying a component using
BSDM is to construct a total black box function (by sequence enumeration) that describes the required external
behaviour of the component. In this section we show how
black boxes can be modelled in CSP and present a generic
algorithm for their construction. As discussed in the introduction, the motivation for these translations is to enable
Condition 3 is a characteristic of the black box specification that must be observed by the CSP model. In terms
of the corresponding finite state automaton, from any given
state, actions only of the form of a stimulus followed by
a response can be performed. Without this property in our
4
c0 : Stimulus
c1 : coin
Response
Equiv
Stimulus
c2 : coin, tea
Response
Equiv
Stimulus
c3 : coin, coffee
Equiv
Response
Stimulus
Response
Equiv
coin
menu
c1
coin
return_coin
c1
coin
c2
coin
tea
pay_coin
c0
tea
confirm
c2
tea
null
c2
tea
null
c3
coffee
pay_coin
c0
coffee
confirm
c3
coffee
null
c2
coffee
null
c3
cancel
null
c0
cancel
return_coin
c0
cancel
menu
c1
cancel
menu
accept
null
c0
accept
null
c1
accept
dispense_tea
c0
accept
return_coin
return_coin
dispense_coffee
c3
c1
c0
Figure 1. Complete enumeration tables for Example 3.1.
every canonical sequence c i in the enumeration, a CSP
process Pci is defined as follows:
definition, one could construct a CSP process P with consecutive repetitions of a single response, such that the first
two conditions were satisfied. For example, one could define a process P such that s, r, r ∈ traces(P). Given that
BB(s) = r, this trace would satisfy the first two conditions
above. However, this does not accurately model the pairing
assumption made by the black box specification of the system and would equate to introducing a new state in the automaton from which a response action could occur without
being preceded by a stimulus.
In the case where a single stimulus does invoke a sequence of consecutive responses, one would still capture
this as a single abstract response to model the fact that they
are treated atomically and are a result of a single stimulus
input.
Pci =
2{Q(s, r, cj )
| (s, r, cj ) ∈ RowsBB (ci )}
where
Q(s, r, cj )
=
s → r → Pcj
This produces a collection of mutually recursive CSP
processes Pc0 , . . . , Pcn , one process P ci for each canonical sequence ci in the enumeration.
2. The CSP process PBB is then defined as follows:
PBB
=
P c0
where c0 represents the shortest canonical sequence
which, by definition, is .
Definition 3.2 A CSP process satisfying Definition 3.1 and
a sequence-based specification are said to be congruent precisely when they encode the same black box.
Due to space restrictions, we use a very simple example
below to illustrate this algorithm. However, we discuss our
industrial experiences and how well this approach scales in
Section 7.
The sequence-based specification method is a practical
one that has shown to integrate well into an industrial setting (see [9]). The CSP models are more abstract, but enables the necessary automated analysis of components and
their parallel composition using FDR. We therefore present
a generic algorithm for converting sequence-based specifications into CSP.
For the purposes of clarity, we capture an enumeration
table for some canonical sequence c i as the set RowsBB (ci )
comprising one element per row of the table and taking the
form (s, r, cj ), where s, r and cj represent the stimulus, response and equivalence columns respectively. For the purposes of the CSP model, we ignore the fourth column capturing the traceability to informal requirements, as it is not
relevant in the CSP analysis of the system.
Example 3.1 Consider a highly simplified vending machine with the following sets of stimuli and responses:
S
R
=
=
{coin, tea, coffee, cancel, accept }
{return coin, dispense tea, dispense coffee,
menu, pay coin, confirm, null }
Assume there are 4 canonical sequences and the enumeration tables are defined as in Figure 1 (ignoring the 4 th
column). This enumeration encodes a complete black box
specification: for any sequence z ∈ S ∗ , one can derive the
corresponding response directly from the tables. For example, consider the sequence coin, cancel, coin, coffee accept. The left side of the concatenation reduces to the
canonical sequence coin, coffee. From its table, we see
that any sequence in this equivalence class extended by
stimulus accept invokes the response dispense coffee.
The null response is used to represent the lack of response from the system, as introduced in Section 2.
Algorithm 3.1 For a given sequence-based specification
that encodes a black box specification with function BB :
S∗ → R, a corresponding CSP process P BB is defined as follows:
1. The sequence-based specification comprises a set of
enumeration tables, one per canonical sequence. For
5
the set of responses. Each state q ∈ Q represents a characteristic predicate χq for an equivalence class defined by ρ Me
over S ∗ ; we write [q]ρMe to denote the set of sequences in
S∗ for which χq holds true. In this section, we demonstrate
how to model state boxes in CSP and present a generic algorithm for constructing them.
We start by defining the traces properties to be satisfied
by a CSP model for a state box function. The difference between the black box specification and this one is the introduction of state data to encapsulate the stimulus history. As
we will illustrate in Algorithm 4.1, such state data is intuitively modelled as processes with corresponding state variables in CSP.
Using Algorithm 3.1, the CSP process generated for this
specification is P = Pc0 where:
Pc0
=
coin → menu → Pc1
2 tea → pay coin → Pc0
2 coffee → pay coin → Pc0
2 cancel → null → Pc0
2 accept → null → Pc0
Processes Pc1 , Pc2 and Pc3 are defined accordingly from
the tables. One can step through the traces of P and see how
each one directly corresponds to the stimulus sequenceresponse mapping in the enumeration.
Proposition 3.1 For a given sequence-based specification
that encodes a black box function BB, if a CSP process P is
constructed according to Algorithm 3.1, then P models the
black box function BB according to Definition 3.1 and they
are therefore congruent.
Definition 4.1 A CSP process P is said to model a state box
specification with function SB : (Q × S) → (Q × R), precisely when the following conditions are satisfied:
1. P must not introduce new behaviour (not in SB):
∀ tr, tr ∈ traces(P) •
tr = tr s, r ⇒
(∃ q, q ∈ Q • SB(q, s) = (q , r)
∧ tr S ∈ [q]ρMe ∧ tr S ∈ [q ]ρMe ).
Proof straightforward and omitted due to lack of space.
The resulting CSP processes are deterministic, reflecting
the fact that the black box specification is indeed a function.
For specifying the design of a component, such deterministic models suffice, since one does not choose to implement
nondeterministic software. The purpose of these formal design specifications is to ensure that they are complete and
unambiguous when passed on for implementation.
However, the ability to abstract away design details is an
important requirement in practice. For example, when specifying interfaces of components and verifying their interactions, it is essential to be able to hide internal actions. This
naturally leads to nondeterminism. The CSP framework
captures nondeterminism very effectively and the sequencebased specification method provides a practical method for
under-specifying components for these purposes, without
conflicting the functional foundations of their approach. In
Section 6, we discuss how our approach can be extended to
handle nondeterminism.
In this section, we have shown how black boxes can be
modelled in CSP, thereby paving the way for automated
analysis within FDR. In Section 7, we discuss the practical benefits of this approach.
2. Every mapping in SB is modelled by P:
∀ q, q ∈ Q; s ∈ S; r ∈ R •
SB(q, s) = (q , r) ⇒
(∀ tr ∈ traces(P) • tr S ∈ [q]ρMe ⇒
tr s, r ∈ traces(P)
∧ (tr S) s ∈ [q ]ρMe ).
By induction, this implies that all mappings in SB are
modelled by P (since traces(P) is always non-empty,
by definition of the traces-model).
3. P satisfies the stimulus-response pairing pattern:
∀ tr : traces(P) •
#(tr R) ≤ #(tr S) ≤ #(tr R) + 1.
This property is a characteristic of the state box specification that must similarly be obeyed by P.
4. P is a deterministic process, reflecting the fact that SB
is a function.
The traces of the CSP process are composed of stimuli
and responses in S and R respectively and clearly make no
reference to the state data. This allows for a direct comparison between the traces of the process modelling the black
box and that of the state box. Any errors introduced by design decisions made from the sequence-based enumeration
of the black box to the state box in terms of the system’s behaviour are easily found in FDR by verifying whether the
corresponding CSP processes are traces-equivalent. Other
properties such as accidental nondeterminism are also automatically checked using FDR (discussed in Section 5).
4. Modelling state boxes in CSP
As described in Section 2.1, the state box specification is
the next step towards implementation in BSDM and is derived from the black box. It captures the stimulus history as
state data, such that the external behaviour remains the same
as defined in the black box specification. A state box of a
system defines a state box function SB : (Q×S) → (Q×R),
where Q is the set of states, S is the set of stimuli and R is
6
Definition 4.2 A CSP process satisfying Definition 4.1 and
a state box specification are said to be congruent precisely
when they model the same state box function.
The state box tables, one for each stimulus, are derived from
the enumeration tables in Example 3.1 and the characteristic predicates defined above. For example, the state boxes
for stimuli coin, tea and cancel are defined as:
We present a generic algorithm below for translating
state box specifications into equivalent CSP models. For the
purposes of clarity (and as done for Algorithm 3.1), we capture the information presented in a state box for a stimulus
s ∈ S as the set RowsSB (s) comprising one element per row
of the state box table and taking the form (q, r, q ), where q
is the current state, r is the response invoked by s in state q
and q is the updated state. (We once again ignore the fourth
column in the state box tables.)
Current state
Coin = false
Coin = true
Stimulus: cancel
Current state
Response
Coin = false
null
Coin = true ∧ Drink = none return coin
Coin = true ∧ Drink = none
menu
1. Let x1 , . . . , xk denote the state variables as defined by
the state box specification.
2. The process P SB is defined as follows:
=
where init1 , . . . , initk are the initial values of the state
variables (defined in the state box).
| s ∈ S}
There is one Q(s) per state box for stimulus s.
4. For each stimulus s ∈ S, Q(s) is defined as follows:
Q(s)
=
2{Q (q, s, r, q )
| (q, r, q ) ∈ RowsSB (s)}
where
Q (q, s, r, q )
=
q & s → r → PSB (q )
Each guarded choice is directly derived from the state box
tables. For example, choices 3, 4 and 5 are derived from the
tables for tea and coffee respectively.
If we consider this process in its initial state, then the following events are offered:
For a stimulus s, there is one process Q (q, s, r, q ) for
each row in its state box. The value of q reflects the precondition for which s invokes response r and state update q , and is modelled as a boolean guard in CSP.
P(false, none) =
coin → menu → P(true, d)
2 ?x : {tea, coffee} → pay coin → P(c, d)
2 cancel → null → P(c, d)
2 accept → null → P(c, d)
Example 4.1 Recall the simple vending machine presented
in Example 3.1. Consider the following state box specification derived from that sequence-based specification,
where the characteristic predicates for each equivalence
class (each denoted by a canonical sequence) is defined by
two state variables Coin and Drink as follows:
Equivalence class
[]ρMe
[coin]ρMe
[coin, tea]ρMe
[coin, coffee]ρMe
New state
No update
Coin = false
Drink = none
P(c, d) =
¬c & coin → menu → P(true, d)
2 c & coin → return coin → P(c, d)
2 ¬c & ?x : {tea, coffee} → pay coin → P(c, d)
2 c ∧ d = none & ?x : {tea, coffee} → confirm → P(c, x)
2 c ∧ d = none & ?x : {tea, coffee} → null → P(c, d)
2 ¬c & cancel → null → P(c, d)
2 c ∧ d = none & cancel → return coin → P(false, d)
2 c ∧ d = none & cancel → menu → P(c, none)
2 ¬c ∨ d = none & accept → null → P(c, d)
2 c ∧ d = none & accept → dispense . d → P(false, none)
3. The process P SB is defined as follows:
2{Q(s)
New state
No update
Drink = tea
No update
From a table for stimulus s, one must be able to determine the response invoked by s in every system state.
Assuming the initial values for Coin and Drink are false
and none respectively, the CSP process is defined as P SB =
P(false, none) where:
PSB (init1 , . . . , initk )
PSB (x1 , . . . , xk ) =
New state
Coin = true
No update
Stimulus: tea
Current state
Response
Coin = false
pay coin
Coin = true ∧ Drink = none
confirm
Coin = true ∧ Drink = none
null
Algorithm 4.1 Given a state box specification for some
state box function SB : (Q × S) → (Q × R), a corresponding CSP process PSB is defined as follows:
PSB
Stimulus: coin
Response
menu
return coin
As expected, the events offered in this process are the
same as those offered by process P c0 constructed from
the sequence-based enumeration table of the canonical sequence c0 in Example 3.1.
Characteristic predicate
Coin = false
Coin = true ∧ Drink = none
Coin = true ∧ Drink = tea
Coin = true ∧ Drink = coffee
A state box specification must be behaviourally equivalent to the black box from which it was derived.
7
The characteristic predicates that partition S ∗ must be
non-overlapping and complete; accidental nondeterminism is introduced when the former of these is not satisfied.
Verifying these properties and equivalences by hand is infeasible for large practical systems in an industrial setting
(where time and resources are limited) and therefore is frequently not carried out as rigorously as is required. However, by translating these specifications into CSP, all such
checks are easily formulated and automatically verified using the FDR model checker. We discuss this in the next
section.
inserted into the model. Nondeterminism is clearly introduced when x = true. Without the extra events, overlapping predicates do not necessarily lead to nondeterminism;
in the example above, if r 1 = r2 and f (x) = g(x), then no
nondeterminism is introduced. Generating the CSP model
with these extra events inserted is straightforward to automate and is done solely for checking this property. Once
checked, the original model is used for further analysis.
Verifying behavioural equivalence Every step in the box
structure development must be verified. Regarding black
boxes and state boxes, one must verify that they are behaviourally equivalent (as defined in Section 2.1). Typically,
this verification is done by deriving the black box behaviour
from the state box and checking whether it is equivalent to
the original black box specification. This is time consuming
and infeasible in practice. The algorithms presented here
allow both specifications to be translated into corresponding CSP models respectively and the necessary verification
checks to be done automatically in FDR.
Using Definitions 3.1 and 4.1, the behavioural equivalence between a sequence-based specification and a state
box specification as defined above can be formulated in
terms of the traces of the corresponding CSP processes: A
deterministic sequence-based specification Spec BB and a deterministic state box specification Spec SB are behaviourally
equivalent precisely when their respective CSP models P BB
and PSB are traces-equivalent, written as P BB ≡T PSB . This
is automatically checked using FDR. If the check fails, then
FDR produces counter-examples. Since these processes are
deterministic, it suffices to consider the traces-model. In the
next section, we discuss how this extends to nondeterministic processes.
Proposition 4.1 For a given state box specification that
captures a state box function SB, if a CSP process P is constructed according to Algorithm 4.1, then P models the state
box function SB according to Definition 4.1 and they are
therefore congruent.
Proof straightforward and omitted due to lack of space.
5. Automated analysis using FDR
In this section, we give a brief summary of the analysis
we do in FDR using this framework in industry.
Verifying black boxes and state boxes At each stage of
the box structure development for a component’s design,
once the specification is modelled in CSP, properties such
as completeness, determinism and control laws specified
in the system’s requirements (for example, verifying that
the vending machine never dispenses drinks without payment) can be formulated in CSP and automatically verified
in FDR.
In the case of state boxes, when defining the characteristic predicates as state data, one must ensure that they correctly partition S ∗ : they must be non-overlapping and complete. In practice, errors during this phase are frequently introduced. Both properties are easily checked in FDR. The
first of these is captured by checking for nondeterminism;
if there are overlapping predicates, then there will be a state
where, for a given stimulus s, at least two guards evaluate
to true, leading to a nondeterministic choice as to which of
these is performed. To ensure that overlapping state predicates always result in nondeterminism, the CSP model is extended by introducing unique events immediately after every stimulus event. For example,
P(x)
=
=
Verifying parallel composition The box structure method
is component-centric in the development of designs. Once
translated into CSP, we analyse their parallel composition
and check that they satisfy the system’s specification. Properties such as deadlock and control laws specified by the
requirements are formulated in CSP and automatically verified in FDR. The compositionality property of CSP provides the scalability for such development in practice. As
discussed in Section 7, this is ongoing research.
6. Handling abstractions
x & s → e1 → r1 → P(f (x))
2 x & s → e2 → r2 → P(g(x))
x & s → (e1 → r1 → P(f (x))
e2 → r2 → P(g(x)))
In [8], a number of effective abstraction techniques are
presented for avoiding over-specification in the sequencebased specification method. We briefly summarise how our
approach can be extended to handle two of these we use in
industry.
where x is the (overlapping) state predicate derived in the
state box, s is a stimulus, r1 and r2 are responses, g and f
model the state updates, and e 1 and e2 are the unique events
Predicate functions To avoid over-specification, details
can be abstracted away in numerous ways, for example, by
8
introducing abstract stimuli (rather than specifying the complete enumeration with actual concrete stimuli of the system). One must then show that the two representations are
behaviourally equivalent.
Such abstraction typically involves mapping sequences
from distinct equivalence classes into a single sequence
and defining predicates over them to capture their distinctive characteristics. These predicates can then be defined
when that level of detail is required. For example, consider a
vending machine where dispensing a drink involves checking a number of resources for availability. Enumerating this
and identifying distinct states (equivalence classes) for every combination would be inefficient.
An alternative is to abstract this level of detail and introduce a predicate p that determines whether or not there
are sufficient resources to enable a given drink d to be
dispensed. Such predicates are typically defined in terms
of prefix recursive functions (known as specification functions) over stimuli sequences. In this case, p may be defined
as (Resource(h) = ∅), where Resource(h) is defined over
the stimuli sequence h and returns the set of available resources. A stimulus s whose response depends on this predicate (for example, tea), is then modelled by the special abstract stimuli s : p and s : ¬p. For example,
Stimulus
s : Resource(h) = ∅
s : Resource(h) = ∅
Response
dispense(s)
unavailable(s)
Equiv
ci
cj
It is straightforward to weaken Definition 3.1 and extend
Algorithm 3.1 to construct corresponding CSP models and
model the potential nondeterminism. For abstract stimuli of
the form:
Stimulus
s:p
s : ¬p
Response
r1
r2
Equiv
ci
cj
...
...
...
where p is an undefined predicate, the CSP process is defined as follows:
P =
(s → r1 → Pci ) 2 (s → r2 → Pcj )
thereby potentially leading to nondeterminism after s is performed (depending on whether subsequent responses and
behaviours are distinct).
In practice, we specify component interfaces of a system
using this approach and verify whether a proposed design
(translated into CSP from BSDM) satisfies the intended interface automatically with FDR by using CSP refinement.
Due to transitivity of refinement and monotonicity of the
CSP operators with respect to refinement, CSP supports
compositional development, enabling this approach to scale
in practice.
...
...
...
7. Conclusion
The problem domain motivating this work is the development of practical software systems where the complexity and concurrency make them increasingly unreliable with
conventional testing methods alone. Formal languages such
as CSP provide the necessary theory for reasoning about
such systems, but are often very difficult to integrate into an
industrial environment.
In this paper, we demonstrated how these complementary approaches can be combined and presented generic algorithms for converting black box and state box specifications into CSP. We also gave a brief overview of how
we deal with under-specification, used for defining interfaces, and the resulting nondeterminism. As discussed below, we use this approach to enhance the designs of software
systems and verify them before the implementation phase
starts; in practice, design errors are frequently the most expensive ones to resolve (in terms of resources and time)
when only found during the testing phase. With these algorithms, the individual component specifications, as well as
their parallel composition, can be formally verified in CSP
using its model checker FDR.
There are a number of advantages of our proposed combined approach over some of the traditional formal methods, including CSP used on its own. The first one is its
scope for integration into an organised industrial environment, achieved by BSDM (and the sequence-based specification method). It is practical in the sense that it provides
Completeness and consistency is achieved by ensuring that
the predicate and its negation are defined over the same sequence. See [8] for further details.
It is straightforward to extend Algorithm 3.1 to handle
predicates that are defined over stimulus history, by introducing state variables to capture the essential information
(in the same way as is done for the state box specifications).
Instead of referring to the sequence history, the predicates
are computed in terms of the corresponding state data within
the CSP models. Each stimulus s in an enumeration table of
the form s : p is modelled as a choice with the boolean
guard p. For example, for the table above, we would have:
P(res) = res = ∅ & s → dispense.s → Pci (f (res))
2 res = ∅ & s → unavailable.s → Pcj (g(res))
where res is the state variable and the functions f and g reflect state updates as defined by Resource.
Nondeterminism A direct benefit of predicate functions
in sequence-based specifications is its scope for underspecification [8] (without breaking the function theory upon
which this approach is built). This is an essential form of abstraction in practice, especially for designing component interfaces where the internal actions are hidden from its environment. Under-specification is achieved by introducing
undefined predicates to capture the fact that for a given stimulus s, there are a number of possible responses and subsequent behaviour, the decision of which is as yet undefined.
9
traceability between the formal specifications and the informal requirements, is accessible to software engineers and
domain experts without the need for extensive training, and
from our experience of using this approach in industry over
the last two years, it has shown to scale well (references to
the experiences of others can be found in [9]). Secondly,
the specifications developed using BSDM can be translated
into other formal languages, in our case CSP, and it therefore acts as an effective bridge for introducing formal verification methods. Finally, the compositional nature of CSP,
together with the maturity of its model checker FDR, have
enabled the formal analysis to scale and provide the necessary automated tool support required in practice.
other machine components was deadlock free and behaved
correctly; (iv) the Machine Controller satisfied the control
laws under all conditions.
The finished Kernel was approximately 20,000 executable lines of C++. The specification, design and implementation effort was approximately 16 man weeks. The final code was integrated on the target machine in one
day. In the first 4 weeks of intensive use, 8 minor coding errors were found, none of which were specification or design errors; in more than 12 months since, no
other errors have been detected. Rework to date is below 1%.
Future work We are currently formalising the work summarised in Section 6 and extending our approach with other
practical abstraction techniques. Ongoing research is also
being done regarding efficient modelling of constructs such
as queues; we are seeking to classify common design patterns and process relationships, according to the modelling
techniques appropriate to each. Other avenues of interest
are: enabling the automated verification of different levels
of abstraction in the enumeration specifications using FDR
(for example, the abstract versus concrete stimuli); extending our approach for the refinement step from the state box
to clear box in BSDM; and modelling real-time systems in
this framework.
Industrial experience We have used this combined approach for developing complex, event driven embedded
control software controlling complex manufacturing machines. It has proven to be successful and scalable for the
development of new components and the re-engineering
of existing software components. For example, one of our
case-studies [2] involved the design and implementation of
a process control Kernel for a machine based on 22 headless PC platforms running in parallel and communicating
with each other via an internal Ethernet.
The Kernel was designed as two concurrent processes,
namely the Machine Controller and the State Controller
processes, communicating via queues. The Machine Controller implemented the basic operational behaviour of the
machine and controlled the other major subsystems. It was
responsible for machine consistency and enforcing the control laws governing safe operation. It used interfaces provided by the subsystems it controlled and it implemented an
interface used solely by the State Controller. The State Controller implemented the overall machine behaviour as seen
by the GUI component and thus the machine operator.
The black box function of the Machine Controller design
had 2,835 mappings in 47 equivalence classes. The length
of the longest canonical sequence was 11 stimuli. The black
box function of the State Controller design had 837 mappings and the length of the longest canonical sequence was
6 stimuli.
In addition to defining the design of the Kernel components as black boxes, the interfaces they implemented and
those they used were defined as underspecified black boxes.
Translating these specifications into CSP using the algorithms presented here enabled the following verifications to
be performed automatically by FDR: (i) the designs of both
Kernel processes were deterministic and implemented their
respective interfaces correctly; (ii) the parallel composition
of the State Controller and the process representing the interface of the Machine Controller was deadlock free and behaved correctly according to the Machine Controller’s interface; (iii) the parallel composition of the Machine Controller and the processes representing the interfaces of the
Acknowledgements
We thank Bill Roscoe for useful discussions and comments on this work.
References
[1] J. R. Abrial. The B Book: Assigning Programs to Meaning.
Cambridge University Press, 1996.
[2] G. H. Broadfoot and P. J. Broadfoot. Academia and industry meet: Some experiences of formal methods in practice.
In Proceedings of the 10th Asia-Pacific Software Engineering Conference, pages 49–59, 2003.
[3] Formal Systems (Europe) Ltd. Failures-Divergence Refinement: FDR2 User Manual, 2003.
[4] C. A. R. Hoare. Communicating Sequential Processes. Prentice Hall, 1985.
[5] G. J. Holzmann. The model checker spin. IEEE Transactions on Software Engineering, 23(5):279–295, 1997.
[6] H. D. Mills. Stepwise refinement and verification in box
structured systems. Computer, 21(6):23–26, 1988.
[7] H. D. Mills, R. C. Linger, and A. R. Hevner. Principles of
Information Systems Analysis and Design. Academic Press,
1986.
[8] S. J. Prowell and J. H. Poore. Foundations of sequence-based
software specification. IEEE Trans. of Soft. Eng., 29(5):417–
429, 2003.
10
[9] S. J. Prowell, C. J. Trammell, R. C. Linger, and J. H. Poore.
Cleanroom Software Engineering - Technology and Process.
Addison-Wesley, 1998.
[10] A. W. Roscoe. Model-checking CSP. In A Classical Mind:
Essays in Honour of C.A.R. Hoare, pages 353–378. PrenticeHall, 1994.
[11] A. W. Roscoe. The Theory and Practice of Concurrency.
Prentice Hall, 1998.
[12] J. M. Spivey. The Z Notation: a Reference Manual. PrenticeHall International, 1992.
11