Work report for Summer Student David Huw Jones
Reliability assessment of the beam energy tracking system
Author: David Huw Jones
Supervisor: Roberto Filippini
The premise
The BT/TL group of the AB (Accelerators and Beams) department is responsible for the beam transfer lines, from the SPS to the LHC and from the LHC to the beam dump blocks. Under the summer student scheme, the author was assigned a supervisor who was a member of this department, Roberto Filippini. Roberto had been tasked to assess the reliability of the safety-critical LHC beam dump system (LBDS), and, as his supervisee, the author was asked to complete an assessment of the reliability of a component of this system, the beam energy tracking (BET) system. Part of this work was presented to an open audience on the 11th of August, under the title 'Assessing the reliability of the beam energy tracking system'. After completion of this reliability assessment, the author attempted to apply novel research to the design of the BET system as proof of its potential. This work is still pending.
Abstract:
The LBDS has been designed to achieve a set of demanding safety and reliability standards. This report details the
author's assessment of just how successful part of this system, the beam energy tracking system, is at meeting these
standards. To do this the author employed a number of industry-standard reliability assessment techniques that include
failure-mode effects analysis, fault-tree analysis and Markov analysis. This report concludes that there will be 0.87 fail-safe false-alarm beam dump requests and approximately zero probability of failing unsafe due to a failure of the BET system over the period of a year.
1. A functional overview of the BET system
The role of the LHC beam dumping system (LBDS) is to safely extract both beams from the LHC into two graphite
blocks. The LBDS consists of 15 horizontally deflecting extraction kicker magnets (MKD), one superconducting
quadrupole Q4, 15 vertically deflecting septum magnets (MSD), 10 dilution kicker magnets (MKB) and several
hundred meters of transfer line up to a dump absorber block for each ring [LHC].
The role of the BET system is: 1) to bind the strength of the kicker magnets to the beam energy by calculating the
settings of the MKD and MKB systems, 2) to verify the tracking of the MKD, MKB and MSD systems interlocking
their settings with the present beam energy. Formula 1 shows the relationship between the beam energy E, and the
strength of the magnetic field ß, needed to deflect it around a circle of radius ρ.
(1)
Assuming that E >> E0, the magnetic field strength required to safely extract the beam from the storage ring is
proportional to the energy of the beam. Also, the energy of the beam is proportional to the magnetic field strength used
to keep it within the confines of the storage ring [BET].
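The proportionality can be checked numerically with a small sketch of the rigidity relation (1): Bρ = p/e, with pc = √(E² − E0²). The bending radius below is an approximate LHC value used for illustration only.

```python
import math

E0 = 0.938e9        # proton rest energy [eV]
rho = 2804.0        # approximate LHC dipole bending radius [m]
c = 299_792_458.0   # speed of light [m/s]

def dipole_field(E_eV):
    """Field [T] needed to bend a proton of total energy E around radius rho."""
    p_eV = math.sqrt(E_eV**2 - E0**2)   # momentum in eV/c
    return p_eV / (c * rho)             # B = p/(e*rho); with p in eV/c the e cancels

# At 7 TeV, E >> E0, so B is very nearly proportional to E:
print(dipole_field(7e12))   # roughly 8.3 T
```

Doubling the energy in this regime doubles the required field, which is what allows the BET system to track the beam energy through the dipole currents.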
The BET part devoted to the generation of the MKD and MKB settings consists of the following modules:
1. A beam energy acquisition system (BEA), for acquiring a value for the current of 2 bending dipole magnets.
2. A beam energy monitoring system (BEM), for calculating the beam energy from the values acquired by 2 BEA cards. The value is sent to various programmable logic controllers (PLC) and translated into voltage settings for the power converters of the MKD and MKB systems.
The BET part devoted to the interlocking of the present beam energy and the settings of the magnets consists of:
1. A BEA system, for acquiring a value for the current from 2 different bending magnets and the power converters of the MKD, MKB, MSD and Q4 of the beam dumping system.
2. A BEM system, for calculating the energy reference from the values acquired by 2 BEA cards.
3. A BEI system, for comparing the energy reference from the BEM with the MKD, MKB, MSD and Q4 settings.
The BEI issues a dump request to the beam interlock controller (BIC) if the difference between the settings of at least
one of the 27 (15 MKD, 10 MKB, 1 MSD, 1 Q4) monitored power converters and the present beam energy reference
exceeds a given threshold of 0.5%. In addition, the BEM system covers a number of internal and external (e.g. from the BEA) failures, which also lead to a safe dump request. The BIC is responsible for the management of all dump requests from the BET and for issuing the dump triggers to the MKD and MKB systems.
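The comparison performed by the BEI can be sketched as follows; the function name and the example values are hypothetical, but the 0.5% threshold and the 27 monitored converters are those described above.

```python
def dump_request(reference, settings, threshold=0.005):
    """Return True if any monitored converter setting deviates from the
    beam energy reference by more than the threshold (0.5%)."""
    return any(abs(s - reference) / reference > threshold for s in settings)

# 26 converters track perfectly, one is 0.6% off: a dump request is raised.
settings = [1.000] * 26 + [1.006]
print(dump_request(1.0, settings))   # True
```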
After a dump request, the BET system is diagnosed, potential hidden failures are discovered and the system is returned
to an “as good as new” state.
Figure 1. System-level functional block-diagram
2. Assessing the reliability and safety of the BET system
The goal of this study is a reliability and safety assessment of the BET for the data acquisition and energy reference
generation (including the interlocking part) over one year of LHC operation. All systems outside the functional
boundary of the BET system, magnets included, should not be considered in this analysis.
This analysis consists of 4 stages:
1. Creating a functional block-diagram of the system components.
2. Performing a failure-mode effects analysis (FMEA) on the system.
3. Calculating the failure rates of components from the MIL-HDBK-217 handbook.
4. Calculating the rate of false-alarm dump requests and the safety of the system.
2.1 Creating a functional block-diagram of the system components
The first stage of any reliability analysis is to define what the system is supposed to achieve. The BET system is supposed to provide power configuration settings to the LBDS magnets to an accuracy of 2⁻¹⁶. It must also monitor the settings of the LBDS magnets to ensure they have been correctly configured.
To achieve this specification the BET system consists of 3 different cards: the BEA, the BEM and the BEI cards. The function of these cards has already been described in section 1; however, before reliability analysis can be performed this specification must be broken down to the functional specification of individual components. To achieve this the hardware documentation was consulted.
2.2 Performing a failure-mode effects analysis (FMEA) on the system
Having delved into the documentation of the system hardware and created an extensive functional block-diagram, the next step is to consider the effects of a failure of each function in turn, in combination with other function failures, and as part of a common-mode failure with other functions. This is a failure-mode effects analysis: a time-consuming stage that requires a deep understanding of the system's operation, and also a good deal of imagination in describing the failure mechanisms and their propagation.
2.3 Calculating the failure rates of components from the MIL-HDBK-217 handbook
The qualitative assessment so far achieved must now be converted into a quantitative one. This is a two-stage process. First, connections must be drawn between the failure modes of the previous section and the hardware responsible for performing the functions to which the failures refer. This stage requires consulting the relevant schematics and parts-lists of the hardware in question, then determining for each component whether a failure of the component corresponds to a failure mode listed in the FMEA, and to which failure modes the component contributes.
Second, the failure rates of the responsible hardware must be calculated. The industry standard MIL-HDBK-217 [MIL] provides the documentation to collect the necessary failure rates for the components. The failure modes distribution [FMD] handbook provides the means to account for how the component failure rate is apportioned across the different ways the component can fail. For instance, the failure rate of a 0.6 W metal-film resistor is 44 FIT (number of failures in 10⁹ hours) under static, room-temperature conditions; 49.7% of these failures will be open-circuit failures.
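The apportionment can be sketched numerically with the resistor figures quoted above; the 400 missions of 10 hours each are the operating profile assumed in the conclusions. The variable names are illustrative.

```python
FIT = 1e-9   # one FIT = one failure per 10**9 hours

resistor_rate = 44 * FIT    # 0.6 W metal-film resistor, static conditions [failures/h]
open_fraction = 0.497       # FMD share of failures that are open-circuit

open_rate = resistor_rate * open_fraction   # open-circuit failures per hour
hours_per_year = 400 * 10                   # 400 missions of 10 h each

# Expected open-circuit failures of one such resistor over a year of operation:
print(open_rate * hours_per_year)
```

The resulting number (on the order of 10⁻⁴ per resistor-year) shows why many components must be summed before the system-level rates become significant.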
2.4 Calculating the rate of false-alarm dump requests and the safety of the system
The final stage of the failure analysis is to arrange the collected data into an appropriate model describing the interactions between the failures and the relationship between each failure and the system failure modes - the fail-safe dump request and the unsafe failure.
There exist a number of modelling techniques suitable for reliability analysis, each requiring a different set of
assumptions. The author used 2 methods particularly popular in this field, fault-tree analysis and Markov modelling.
2.4.1 Fault-tree analysis
A fault tree is a Boolean expression whose elements assume two values {0,1}. The root of the fault tree is the system failure. Fault trees are able to model static failure processes in which each failure occurs independently of the others. In the analysed case a failure process is "enabled" only if the respective surveillance has failed silent. In addition, the system cannot fail safe more than once in each mission, and the fail-safe event causes the stop of operation, thus shortening the mission time. Some modifications to the original semantics of fault trees are necessary to model these features, such as the priority AND for the surveyed processes. A priority AND behaves like an AND only if the events occur in a given time ordering.
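The priority-AND behaviour can be illustrated with a small Monte Carlo sketch (not the ISOGRAPH model used for the actual analysis): the top event fires only if the surveillance fails silent before the surveyed process fails, both within the mission. The failure rates and mission length below are hypothetical.

```python
import random

def priority_and(rate_surveillance, rate_process, mission_h, trials=200_000):
    """Estimate P(surveillance fails before the process, both within the mission),
    assuming exponentially distributed times to failure."""
    hits = 0
    for _ in range(trials):
        t_surv = random.expovariate(rate_surveillance)
        t_proc = random.expovariate(rate_process)
        if t_surv < t_proc < mission_h:   # the ordering enforced by the priority AND
            hits += 1
    return hits / trials

# With equal rates, a plain AND would count both orderings; the priority AND
# counts only the "surveillance first" ordering, roughly halving the probability.
print(priority_and(1e-4, 1e-4, 4000))
```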
Two fault trees have been built, one for the 'Failed Safe' state, the other for the 'Failed Silent' state [JON]. The rates for the failure modes are calculated in Appendix A. The probability of failing to these states has been calculated over one year. The number of expected false dumps per year is 0.87, while the probability of failing unsafe is 2.89 × 10⁻³¹³ per year. The result says that the system is absolutely safe. This is due to the fact that in order to fail unsafe all 54 BEI must have failed, leaving a BEM failure uncovered, which is unrealistic. Note also that the VME crate power supply that supplies the BEI and BEM cards is fail-safe, i.e. its failure results in a dump request being issued to the BIC. The failure rate of the power supply is insignificant compared to the failure rate of the BET system.
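The expected 0.87 false dumps per year can also be read as a yearly probability. Under the (hedged) assumption that false dumps arrive as a Poisson process, the chance of seeing at least one in a given year is:

```python
import math

expected_dumps = 0.87                          # expected false dumps per year (from the fault tree)
p_at_least_one = 1 - math.exp(-expected_dumps)  # Poisson: P(N >= 1) = 1 - e^(-lambda)
print(round(p_at_least_one, 3))                 # 0.581
```

So in most years of operation at least one false-alarm dump would be expected.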
2.4.2 Markov modelling
A Markov model is a stochastic model that describes a discrete-event system. It can be used to predict the probability distribution of a system governed by a stochastic process provided the Markov assumption holds, namely: the future evolution of the state is independent of the past and depends only on the present [HOY]. Note that, with respect to the fault-tree analysis, the Markov model provides the probability distribution at time t over the space of states of the system, while the fault tree gives the probability of just one state. The model is shown in Figure 2.
Figure 2. A Markov model of BETS
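A toy three-state sketch (Ok, FailedSafe, FailedSilent) conveys the idea; the real model of Figure 2 has many more states, and the transition rates below are hypothetical. The state probabilities are evolved by forward-Euler integration of the Markov balance equations.

```python
rate_safe = 2e-4     # Ok -> FailedSafe transition rate [1/h] (illustrative)
rate_silent = 1e-7   # Ok -> FailedSilent transition rate [1/h] (illustrative)

def evolve(hours, dt=1.0):
    """Integrate the state probabilities over the mission time."""
    p_ok, p_safe, p_silent = 1.0, 0.0, 0.0
    for _ in range(int(hours / dt)):
        flow_safe = rate_safe * p_ok * dt      # probability mass moving Ok -> FailedSafe
        flow_silent = rate_silent * p_ok * dt  # probability mass moving Ok -> FailedSilent
        p_safe += flow_safe
        p_silent += flow_silent
        p_ok -= flow_safe + flow_silent
    return p_ok, p_safe, p_silent

# Distribution over the states after one year of operation (400 missions x 10 h):
print(evolve(4000))
```

Unlike the fault tree, a single run of the model yields the probabilities of all states at once.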
The results from this analysis are shown in figures 3 and 4. They approximate those of section 2.4.1 within a reasonable boundary of error. There will be an expected 0.81 false beam dump requests and 1.35 × 10⁻³¹³ unsafe failures per year due to failures in the BEA, BEM or BEI systems. That the predicted failure rates differ from those of the fault-tree model is an expected consequence of the different assumptions each relies upon. Note that the Markov model of Figure 2 assumes the BET coverage is limited to the voting of the BEI, i.e. that the other interlock coverages do not fail. This assumption, although not strictly necessary, has been included to significantly simplify an otherwise complete and more complex model.
Fig 3. P(false-alarm beam dumps)    Fig 4. P(unsafe failures)

The static model (fault-tree) is more pessimistic than the dynamic model (Markov); the pessimistic assumption (F1 failure rate of 0.87/year) will be the value concluded.
3. Conclusions
The beam energy tracking system is a critical component of the beam dumping system, and it has been analysed for one year of operations, 400 missions in total, each with an assumed average length of 10 hours. The failure modes and rates of the system have been deduced at component level and then arranged into a fault-tree model and a Markov chain. The results of the analysis are the average number of beam dump requests generated internally by the BET and the probability that the system delivers an incorrect beam energy reference.
- The average number of fail-safe beam dumps requested per year is 0.87 for both rings.
- The most likely cause of the fail-safe beam dump requests is a failure of one of the analogue-to-digital converters on the BEA cards that supply the BEM cards.
- The probability of failing unsafe, delivering the wrong beam energy reference to the magnets, is approximately zero.
These results do not take into account the magnet failures and their coverage. Namely, if at least two BEI have failed, then the respective MKD magnets (14-out-of-15 redundancy) remain without coverage, which represents a hazard. Analogous reasoning holds for the MSD.
4. References
[BET] R. Gjelsvik, Development of the Beam Energy Meter System, CERN, 2005.
[HOY] A. Hoyland and M. Rausand, System Reliability Theory: Models and Statistical Methods, Wiley, New York, 1994.
[FMD] Failure Mode/Mechanism Distributions, FMD-97, Reliability Analysis Center (RAC), Rome (NY, USA), 1997.
[JON] D. Jones, Fault tree analysis of the BET using ISOGRAPH, C:\Summer student\Fault tree models Djones\*.psa, August 2005.
[LHC] The LHC Design Report, Vol. I: The LHC Main Ring, CERN-2004-003, Geneva, 2004.
[MIL] MIL-HDBK-217F, Reliability Prediction of Electronic Equipment, Department of Defense, Washington, D.C., USA, 1993.