Supporting material for the paper:- FBMTP: An automated fault and behavioural anomaly detection and isolation tool for PLC controlled manufacturing systems
FBMTP: An automated fault and behavioural
anomaly detection and isolation tool for PLC
controlled manufacturing systems
Arup Ghosh, Shiming Qin, Jooyeoun Lee, and Gi-Nam Wang
Fig. 1. A BIW automotive manufacturing system model.

In this supplementary file, we present the experimental results validating the
proposed approach on a wide range of manufacturing system scenarios. These
experimental results further illustrate the importance and the effectiveness of
the proposed approach.
***In this file, the term “PAPER” refers to the original article, i.e., “FBMTP:
An automated fault and behavioural anomaly detection and isolation tool for
PLC controlled manufacturing systems”.***
For any query, please contact the author directly: Arup Ghosh (email:
[email protected])

I. EXPERIMENTS WITH A SMALL VIRTUAL MANUFACTURING
SYSTEM (EXPERIMENTAL STUDY, RESULTS, AND DISCUSSION)
The FBMTP tool is implemented in the C# programming language.
We have tested FBMTP on various real-world as well as
simulated (or virtual) manufacturing systems (obtained from
UDMTEK Co., Ltd. [1]) to assess its practical effectiveness.
Here, we report our experimental results based on a small-sized
virtual manufacturing system, because it would be
impossible in such a short space to discuss a real-world
manufacturing system comprehensively. Moreover, a real
system cannot be used freely (for example, faults or anomalies
cannot be inserted into the system arbitrarily, partitioning the
system into subsystems cannot be achieved easily etc.). The
architecture of the virtual manufacturing system in which our
experiments are performed, is presented in Fig. 1. This virtual
system is designed and implemented very carefully (by using
PLC Studio Software [2]) so that it behaves in the same manner
as the real system. Just to remind the readers, in the context of
this PAPER, the term ‘large system’ refers to a system where
the number of signals that can be accessed in each PLC scan
cycle, is substantially lower than the total number of signals.
So, even if the given system is small in size, we can still convert
it into a large system by limiting the access of the data logger to
a very few signals in each PLC scan cycle. The architecture of
this virtual system (or the underlying PLC program) is actually
taken from a real-world Body-In-White (BIW) automotive
manufacturing subsystem. It works as follows (also see Fig. 1):
I. If the part storage location is not empty (detected by using
the sensor Str_Part_CHK), then the green light of the
signal lamp Lmp will be ON; otherwise, the red light of the
signal lamp Lmp will be ON. If the green light is ON, then
the system works as follows:
o a part (i.e., a door panel) is taken out from the storage
location and is loaded onto the part loader L (this is a
manual operation).
o the part loader L moves along the rail track (towards the
robot RB1).
o after the part loader L reaches the end position of the rail
track (its advanced position), the robot RB1 starts its
operation. The complete operation of the robot RB1
comprises two sub-tasks: i) first task: move the robotic
arm close to the part loader L and then grasp the part on
it; and ii) second task: pick the part from the part loader
L and then load it onto the daecha D.
o after the robot RB1 finishes its operation, the daecha
clamp DCLP grasps the part on the daecha D (the clamp
closing operation of the daecha clamp DCLP).
o then, the robot RB1 starts to return to its home
position.
o after that, the welding robot RB2 moves its arm close to
the daecha D and starts to perform the sealing operation
(this whole operation is referred to as the welding task of
the robot RB2).
o after the robot RB1 reaches its home position, the part
loader L starts to return to its home position.
o the robot RB2 finishes its welding task and then starts to
return to its home position.
o after the part loader L and the robot RB2 reach their
corresponding home positions, the daecha D starts to
move towards its advanced position.
o when the daecha D is in its advanced position, the
daecha clamp DCLP opens and the part is immediately
removed (manually) from the system.
o then, the daecha D returns to its home position.
[this cycle starts again when another part is set on the
part loader L]
II. If the red light is ON, then the operator has to wait until the
storage is partially or completely filled (the storage can be
filled only after the completion of a system cycle).
Fig. 2. List of the PLC I/O signals and their respective functions.
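The working cycle described in item I can be read as a set of precedence constraints between device operations. The following Python sketch is only an illustration of that reading and is not part of FBMTP. The step names are hypothetical labels, each step is treated as atomic, and the welding task of RB2 is assumed to run in parallel with the return of RB1 once the clamp DCLP has closed.

```python
from graphlib import TopologicalSorter

# Hypothetical step labels; each maps to the steps that must finish first.
# ASSUMPTION: steps are atomic and RB2_weld only waits for DCLP_close.
PRECEDENCE = {
    "load_part_on_L": set(),
    "L_advance": {"load_part_on_L"},
    "RB1_grasp_and_load": {"L_advance"},
    "DCLP_close": {"RB1_grasp_and_load"},
    "RB1_return": {"DCLP_close"},
    "RB2_weld": {"DCLP_close"},
    "L_return": {"RB1_return"},
    "RB2_return": {"RB2_weld"},
    "D_advance": {"L_return", "RB2_return"},
    "DCLP_open": {"D_advance"},
    "remove_part": {"DCLP_open"},
    "D_return": {"remove_part"},
}


def one_valid_order(precedence):
    """Return one execution order that respects every precedence constraint."""
    return list(TopologicalSorter(precedence).static_order())


def can_run_in_parallel(precedence, a, b):
    """True if neither step is a direct or indirect prerequisite of the other."""
    def prerequisites(step):
        seen, stack = set(), list(precedence[step])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(precedence[current])
        return seen

    return a not in prerequisites(b) and b not in prerequisites(a)


if __name__ == "__main__":
    print(one_valid_order(PRECEDENCE))
    print(can_run_in_parallel(PRECEDENCE, "L_return", "RB2_weld"))
```

Under these assumptions, the return of the part loader L and the welding task of RB2 are not ordered relative to each other, which is why the daecha D has to wait for both L and RB2 to reach their home positions.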
We should mention that the daecha clamp DCLP is actually
composed of two clamps i.e., the left clamp and the right clamp.
A simulation video of this virtual manufacturing system can be
found in: https://www.youtube.com/watch?v=gB0Q5C3qGWo
(PLC Studio Software [2] is used for the simulation purpose).
The whole system is controlled by 24 sensor and 15 actuator
signals (in total 39 PLC I/O signals). A complete list of those
signals and their corresponding objectives are given in Fig. 2.
The signal names are arranged in that list according to the names
of the devices that they operate.
The DSVTF model of the above described manufacturing
system is given in Fig. 3 (Graphviz Software [3] is used for the
graphical visualization purpose). FBMTP generated this
DSVTF model based on the log data records of twenty-five
consecutive runs of the manufacturing system of Fig. 1. The
DSVTF model states are represented by the state numbers (for
example: X1, X2, and so on) instead of the actual boolean
vectors (or the hash codes) in order to make the model
simpler and easier to understand. The states are numbered
according to their appearance in the log data records. A part is
loaded to the part loader or the storage is filled after an arbitrary
time interval (recall that those operations are the manual
operations). Please note that the transition between state X2 and
X27 is executed only when the last remaining part is taken out
from the part storage location. In Fig. 3, the brown coloured
arrows represent the transitions with high-variance transition
times; the blue coloured arrows represent the low-frequency
transitions; and the black coloured arrows represent the rest of
the transitions (recall that this differentiation is required for the
behavioural anomaly detection). The TTO times associated
with the transitions (see Fig. 3) are given in milliseconds (MS),
where TTO Time = (corresponding maximum transition time ×
1.02) [this is a virtual system and hence, the state transitions
occur very quickly]. There is no Type II system state in the
DSVTF model of Fig. 3 because no signal changes its status
value too frequently. The parameter values of the time
inaccuracy bound TIB are given as follows (the symbols have
the same meanings as in Definition 5 – see Subsection 4.2.1 of
the PAPER): i) N = 39; ii) n = 10 (meaning that four PLC scan cycles
are required to obtain all the PLC I/O signal data); iii) δ = 10
MS; and iv) ∂ = 10 MS (so, TIB = 40 MS).
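The timing quantities in the paragraph above can be reproduced with a few lines of Python. This is only a sketch: Definition 5 of the PAPER is not reproduced in this file, so the TIB formula used here (number of scan cycles needed multiplied by a 10 MS per-cycle time) is an assumption, chosen because it matches the value TIB = 40 MS quoted above. The function and parameter names are hypothetical.

```python
import math


def scan_cycles_needed(total_signals, signals_per_cycle):
    """PLC scan cycles needed to read every I/O signal once (ceil of N / n)."""
    return math.ceil(total_signals / signals_per_cycle)


def time_inaccuracy_bound_ms(total_signals, signals_per_cycle, cycle_time_ms):
    # ASSUMPTION: TIB = (scan cycles needed) x (time per scan cycle).
    # This is not taken from Definition 5; it merely matches the quoted value.
    return scan_cycles_needed(total_signals, signals_per_cycle) * cycle_time_ms


def tto_time_ms(max_transition_time_ms, margin=1.02):
    # From the text: TTO Time = corresponding maximum transition time x 1.02.
    return max_transition_time_ms * margin


if __name__ == "__main__":
    N, n = 39, 10
    print("scan cycles:", scan_cycles_needed(N, n))
    print("TIB (MS):", time_inaccuracy_bound_ms(N, n, cycle_time_ms=10))
    # 500 MS is an arbitrary illustrative maximum transition time.
    print("TTO (MS):", tto_time_ms(500))
```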
In our original system, three PLC scan cycles are required to
obtain all the PLC I/O signal data. Here, we have increased that
number to four in order to assess the effectiveness of FBMTP
for an even larger system (given that the state transitions
occur very rapidly in this virtual system, this is quite a high
factor). As can be seen from Fig. 3, FBMTP is able to define the
complete process behaviour by using only 51 states and 53
transitions (that means almost linear state and space
complexity). It would be unfair to compare FBMTP with the
other approaches (for details see Subsection 2 of the PAPER)
because, as stated earlier, those approaches are not intended to
handle large manufacturing processes or the data
inaccuracy issues. However, for the sake of comparison, if we
apply the NDAAO approach [4] on the same set of log data
records, then it generates 103 states and 141 transitions [we set
the preceding sequence length Lps = 0 (see Subsection 4.3 of the
PAPER); otherwise, it generates an extremely large and
complex NDAAO model]. Please note that, to express the same
system behaviour, the number of states is approximately doubled
(103 versus 51) and the number of transitions is more than
doubled (141 versus 53), because the NDAAO approach
does not eliminate the redundant transition paths and the
unstable system states from the control process model. This
ratio generally increases without bound with the growing
number of signals and TIB time; and hence, may lead to a state
or space explosion problem for large-sized systems [here, we
have used the NDAAO approach [4] as an explanatory example
– the other existing approaches cited in the PAPER also induce
the same problems].
The existing literature on this subject has not provided
much experimental detail about the accuracy of the FDI
approaches or the procedure for evaluating them (most of them do not
solve the BADI problem either). In FBMTP, the information
related to the system alarm, transition time, transition execution
pattern, undetected fault propagation, infinite looping at a state
etc., has been taken into consideration; and sufficient measures
have been implemented to handle the data inaccuracy issues.
These measures theoretically make FBMTP much more
accurate and effective than the other existing approaches (see
Subsection 2 of the PAPER). In order to evaluate it in practice,
we selected seven participants and asked each of them to insert
two faults into the virtual system of Fig. 1 (the categories of the
faults were specified beforehand in order to avoid redundant
faults). The faults are generated by inserting the incorrect
signal status values into the PLC memory using the
KEPServerEX Software [5]; and by activating or deactivating
the device components using the PLC Studio Software. The
results of this experiment are summarized in Fig. 4 [in Fig. 3
and Fig. 4, the SSC element sets of the DSVTF model
transitions and the (stable) faulty transitions are sorted
according to a predefined signal numbering scheme]. The
state expiry time associated with each DSVTF model state is set
equal to the maximum transition path time of all possible
transition paths of length six from that state (see Subsection 5.2
of the PAPER). In Fig. 4, fault numbers 3, 8 and 12 are
examples of a fault without a faulty transition. Among the
others, fault numbers 1, 4, 7, 9, 13 and 14 are examples of the
Cause I fault case scenarios and fault numbers 2, 5, 10 and 11 are
examples of the Cause II fault case scenarios (these are
examples of a fault with a faulty transition – for details see
Subsection 5.3 of the PAPER).
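The state expiry time rule mentioned above (the maximum transition path time over all transition paths of length six from a state) can be sketched as follows. This is an illustrative Python sketch and not the FBMTP implementation. It assumes that a path time is the sum of the per-transition times along the path, that paths may revisit states, and that a path which reaches a state without outgoing transitions simply stops early. The toy model and its times are made up.

```python
from functools import lru_cache


def state_expiry_times(transitions, path_length=6):
    """Compute, per state, the maximum total time over paths of `path_length`.

    transitions: {state: [(next_state, time_ms), ...]} (hypothetical layout).
    ASSUMPTION: path time = sum of the transition times along the path.
    """
    @lru_cache(maxsize=None)
    def longest(state, steps_left):
        if steps_left == 0:
            return 0
        best = 0  # a dead-end state contributes no further time
        for next_state, time_ms in transitions.get(state, ()):
            best = max(best, time_ms + longest(next_state, steps_left - 1))
        return best

    states = set(transitions) | {
        nxt for outs in transitions.values() for nxt, _ in outs
    }
    return {state: longest(state, path_length) for state in states}


if __name__ == "__main__":
    # Made-up three-state model, times in MS.
    toy_model = {
        "X1": [("X2", 100)],
        "X2": [("X3", 200), ("X1", 50)],
        "X3": [("X1", 300)],
    }
    print(state_expiry_times(toy_model))
```

The memoised recursion keeps the cost proportional to the number of transitions times the path length, rather than enumerating every path explicitly.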
Fig. 3. DSVTF model of the manufacturing system of Fig. 1.
Fig. 4. The output results of the FDI method of FBMTP.
As can be seen in Fig. 4, FBMTP is able to detect 13 out of
14 faults correctly (accuracy rate: 93%). Moreover, it does not
produce any false positives. Fault number 6 remains
undetected because the actuator associated with the signal
Lmp_Red does not have any impact on the rest of the system
operations. Specifically, fault number 6 causes both the red and
the green light of the signal lamp Lmp to glow together. We
assume that the operator continues to supply parts to the
system, ignoring such display signs. This makes the actuator
associated with the signal Lmp_Red irrelevant and hence, that
soft fault (i.e., fault number 6) remains undetected. Fault
number 11 of Fig. 4 is yet another example of a soft fault
[here also, we assume that the operator overlooks the display
signs (i.e., the red light of the signal lamp Lmp)]. However, as
can be seen, FBMTP is able to detect this soft fault accurately.
From our practical experience, we have seen that, most of the
time, soft faults remain undetected because of the
following two reasons: i) similar types of soft faults (or
anomalies) have already been assimilated into the nominal
DSVTF model; and ii) the system has some structural or
functional deficiencies (for example, consider the objectives of
the sensor and actuators associated with the signal lamp Lmp).
So, those issues need to be addressed first by the system
engineers in order to significantly reduce the probability of
occurrence of an undetectable soft fault (as there is no other
way to deal with such issues). If all the structural and
functional deficiencies associated with the system of Fig. 1
(particularly, the signal lamp Lmp) are resolved, then FBMTP
can accurately detect all the faults of Fig. 4.
Fig. 5. The output results of the BADI method of FBMTP.
Recall from Subsection 5.3.3 of the PAPER that FBMTP can
produce an inaccurate initial fault candidate set only in the
following two cases: i) in cases where the fault is not detected at
the source state (the exact state where the fault has actually
taken place); and ii) in cases where the data logger fails to
detect all the faulty SSC events. As can be seen in Fig. 4, the
SSC event/s of the faulty signal/s is correctly included in the
initial fault candidate set in 12 out of 13 fault cases (accuracy
rate: 92%). In case of fault number 9, FBMTP fails to include
any SSC event of the faulty sensor signal D_RET in the initial
fault candidate set because of the faulty SSC event miss
incidents (for details, see Subsection 5.3.3 of the PAPER).
However, even in that case, FBMTP is able to correctly identify
that the sensor signal D_RET is changing its status value
irregularly. So, the actual faulty signal is ultimately isolated.
Recall that in case of determining the exact fault candidate set,
FBMTP may produce an incorrect exact fault candidate set (in
other words, may misclassify the cause of the fault) especially
if the user-defined time threshold value is not set appropriately
(see Rule 5 in the PAPER). In the above experiment, we have
set the path length parameter value to 5 and the time threshold
parameter value to approximately 650 MS (actually, a little
extra value is dynamically added to that time depending on the
transition path time). As can be seen in Fig. 4, FBMTP is able to
capture the SSC event/s of the faulty signal/s in the exact fault
candidate set in 8 out of 9 Cause I and Cause II fault cases
(accuracy rate: 89%) [the fault number 9 is excluded because of
the faulty SSC event miss incidents]. If the time threshold value
is not set too low (approximately, < 240 MS) or too high
(approximately, > 1970 MS), then FBMTP provides the same
accurate results as in Fig. 4 (given the fact that it is a virtual
system, the time range is quite wide). Only in the case of fault
number 11 is FBMTP unable to correctly identify the real
cause of the fault. This is because its corresponding transition
times fluctuate immensely (as they are dependent on the users’
inputs) and hence, it becomes very difficult to set the
appropriate time threshold value without any prior knowledge.
For real-world systems, we recommend that users set the time
threshold parameter to a few seconds, depending on the time
fluctuations of the state transitions.
If we limit the number of signals that can be accessed by
FBMTP in each PLC scan cycle, i.e., n, to 4 (that means ten PLC
scan cycles are required to collect all the PLC I/O signal data),
then the TIB time becomes 100 MS. As can be seen from
Fig. 3, two DSVTF model states, i.e., states X26 and X51,
become unstable as a consequence and hence are
discarded from the model. However, this does not alter the
FDI results of Fig. 4 much. Only in the case of fault number 5 are
two extra SSC elements, i.e., RB2_RUNNING_OFF and
RB2_READY_ON, added to the initial and exact fault
candidate sets. So, basically, it does not affect the accuracy of
the FDI results (it only marginally increases the number of SSC
elements in the fault candidate sets). Theoretically, following
the same procedure, we can arbitrarily increase the TIB time
until the complete DSVTF model is scaled down to a very few
states, so that FBMTP produces incorrect FDI results in most
of the fault cases. However, doing so does not make
any practical sense because we have already set the N/n ratio of the TIB
time (see Definition 5 in the PAPER) to 10, which is quite high
compared to the N/n ratio of a real-world PLC controlled
manufacturing system (also keep in mind that the state
transitions occur very rapidly in this virtual manufacturing
system). Similarly, if we set the number of accessible
signals n to 39 (which implies that all the I/O signals can be captured
in a single PLC scan cycle – the property of a small
manufacturing system), then the TIB time becomes 10 MS. This
measure introduces 18 additional states and transitions in
the DSVTF model of Fig. 3 and hence, the FDI results of Fig. 4
are modified. The most significant change it brings
is that FBMTP produces the correct initial and exact fault
candidate sets for fault number 9. This becomes possible
because the data logger is now capable of detecting all the SSC
events of all the PLC I/O signals accurately. So, in this setting
(TIB = 10 MS), FBMTP is able to correctly identify the real
cause of the fault in 9 out of 10 Cause I and Cause II fault cases
(accuracy rate: 90%). Moreover, the number of (irrelevant)
SSC elements in the fault candidate sets is also reduced by 1 to
2 elements for 5 of the 13 detected faults (fault
numbers 2, 5, 9, 12 and 13).
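For convenience, the accuracy rates quoted so far can be recomputed from the underlying counts. The short Python sketch below only restates the counts given in the text; the labels are descriptive strings chosen here, and rounding to the nearest whole percent is assumed.

```python
def accuracy_percent(correct, total):
    """Accuracy rate rounded to the nearest whole percent."""
    return round(100 * correct / total)


# (correct, total) pairs as quoted in the text above.
REPORTED_COUNTS = {
    "fault detection (TIB = 40 MS)": (13, 14),
    "initial fault candidate set (TIB = 40 MS)": (12, 13),
    "exact fault candidate set (TIB = 40 MS)": (8, 9),
    "exact fault candidate set (TIB = 10 MS)": (9, 10),
}

if __name__ == "__main__":
    for label, (correct, total) in REPORTED_COUNTS.items():
        print(f"{label}: {correct}/{total} = {accuracy_percent(correct, total)}%")
```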
The above discussed experimental results provide enough
evidence to support the claims made throughout the PAPER. If
we apply the NDAAO approach to detect the same set of faults
(as in Fig. 4), then two faults, i.e., fault numbers 6 and 11, remain
undetected (accuracy rate: 86%). Fault number 11 remains
undetected because, in the NDAAO approach, the transition
execution pattern (or transition time) information is not
incorporated into the control process model. Moreover, for the
settings n = 10 and n = 4 (that is, TIB = 40 and 100 MS), the
NDAAO approach generates an excessive number of false
positives during the fault detection phase, which makes it
practically infeasible for large manufacturing systems.
However, if we set the n value to 39, then the NDAAO
approach does not produce any false positives and provides
almost similar fault isolation results as FBMTP (both of them
are able to incorporate the SSC events of the faulty signals into
the fault candidate sets for all the detected faults, except fault
number 9). Please note that even in that setting, FBMTP
provides more restricted fault candidate sets than the NDAAO
approach (by applying the exact fault candidate set finding
method). From the discussions throughout the PAPER, it is
easy to see that FBMTP will always greatly outperform
the NDAAO approach for large manufacturing systems;
and for small manufacturing systems, FBMTP will generally
provide more accurate FDI results (because, as stated earlier,
the information related to the system alarm, transition time,
transition execution pattern, undetected fault propagation etc.,
has also been taken into account in FBMTP). These claims are
also empirically validated through several other experiments
(in addition to the above experiment – see below):
• experiments with virtual systems: these experiments are
performed by inserting dozens of software-generated
faults into seven different virtual manufacturing
systems (the N/n ratio is set to 1, 4 or 5 and 10).
• experiments with real-world systems: these experiments
are performed on six log databases taken from four
different automotive manufacturing systems (they have
in total eight faulty device components). We have also
inserted twenty-two software-generated faults into those
databases (the simulated faults are carefully generated
with help from the engineers of UDMTEK Co.,
Ltd.).
In all the above experiments, it is found that FBMTP
provides more accurate FDI results than the NDAAO approach
(for all the settings of the N/n ratio). Moreover, FBMTP
achieved an accuracy rate of more than 83% (which is quite
high) in all the above cases (by the term ‘accurate FDI
result’, we mean that the fault is detected correctly and
the faulty signal/s is isolated accurately).
Similar experiments are also carried out for evaluating the
accuracy of the BADI approach of FBMTP. We have found that
FBMTP can accurately detect and isolate most of the behavioural
anomalies present in the manufacturing system (recall that the
BADI approach of FBMTP always provides an accurate
isolation result). As an example, the participants were asked to
insert several behavioural anomalies into the virtual system of
Fig. 1. The corresponding BADI results (for a few examples)
are shown in Fig. 5 (given for exemplification purposes only).
Actually, the accuracy rate of identifying behavioural
anomalies (such as, transition time error, transition and
transition time probability error etc.) is hard to determine as it
varies depending on the system user’s perspective (in other
words, varies based on the definition of the behavioural
anomalies). A system user often assigns different values to the
time and probability threshold parameters in order to find out
how compactly the system is working (mostly performed
off-line). Anyhow, if all the parameter values are set
appropriately, then a behavioural anomaly that causes a
significant deviation in the system behaviour, is detected and
isolated precisely (for details see Subsection 5.1 of the PAPER
– also see Fig. 5). At this point, we must clarify the fact that it is
not always possible to automatically identify (correctly) the
device components that carry out the physical/mechanical
operations associated with a particular state transition (required
for determining the DDC candidate set – see Fig. 5 and also see
Subsection 5.1 of the PAPER). A system engineer can easily
determine the device components connected with a
state-transition operation by inspecting the corresponding
DSVTF model and/or the PLC program. However, we strongly
recommend that users create a separate file explaining the
linkages between the transition operations and the device
components. This file can also be used by FBMTP to
produce the needed DDC candidate set automatically. As
argued previously, in FBMTP, some faults and behavioural
anomalies that do not have much impact on the system
operation can remain unidentified. However, that is not really a
matter of concern because, from the point of view of the system
engineers, those faults or anomalies are unimportant or
irrelevant.
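As an illustration of the separate linkage file recommended above, the following Python sketch shows one possible way to store the links between state transitions and device components and to look up a DDC candidate set from them. The JSON layout, the transition keys and the device assignments are all hypothetical; the file format that FBMTP actually reads is not described in this document.

```python
import json

# Hypothetical linkage file content: "source->target" maps to device components.
# The transitions and device assignments below are made up for illustration.
LINKAGE_JSON = """
{
  "X3->X4": ["L"],
  "X7->X8": ["RB1"],
  "X9->X10": ["DCLP", "D"]
}
"""


def load_linkages(text):
    """Parse the linkage text into {(source_state, target_state): [devices]}."""
    raw = json.loads(text)
    return {tuple(key.split("->")): devices for key, devices in raw.items()}


def ddc_candidates(linkages, source_state, target_state):
    """Device components linked to a transition; empty list if none is recorded."""
    return linkages.get((source_state, target_state), [])


if __name__ == "__main__":
    linkages = load_linkages(LINKAGE_JSON)
    print(ddc_candidates(linkages, "X9", "X10"))
    print(ddc_candidates(linkages, "X1", "X2"))
```

Keeping the linkages in a plain data file means a system engineer can maintain them without touching the DSVTF model itself.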
ACKNOWLEDGMENT
The authors would like to thank UDMTEK Co., Ltd. for the
use of its research facilities during this study.
REFERENCES
[1] UDMTEK Co., Ltd., Website: http://www.udmtek.com, last retrieved
on November 20th, 2016.
[2] PLC Studio Software, Website: http://www.udmtek.com/esub04_04_01,
last retrieved on November 20th, 2016.
[3] Graphviz Software, Website: http://www.graphviz.org, last retrieved on
November 20th, 2016.
[4] M. Roth, S. Schneider, J. J. Lesage, and L. Litz, “Fault detection and
isolation in manufacturing systems with an identified discrete event
model,” Int. J. of Syst. Sci., vol. 43, no. 10, pp. 1826–1841, Oct. 2012.
[5] KEPServerEX Software, Website:
https://www.kepware.com/products/kepserverex/, last retrieved on
November 20th, 2016.