Boeing Fusion Energy Strategic Plan

Modern RAMI Method Capabilities
Tom Weaver
Counter CBRNE DEW, Platform Systems Technology
Boeing Research and Technology

I-Li Lu, Ph.D.
Applied Statistics, Platform Performance Technology
Boeing Research and Technology
ARIES 2011 Quarter #2 Review
27-28 July, 2011; Gaithersburg, Maryland
Introduction
• The proposed ARIES Reliability Estimation Tool is the most immediately needed capability
• Building the tool will draw on a range of techniques and processes that either fit within the tool directly or enable future additional capabilities
• Some examples are:
– Root Causes Identification
– Failure Mode and Effects/Criticality Analyses
– Rogue Unit Identification and Tracing
– Trend Monitoring and Evaluation
Root Causes Identification
Objectives
• Locate the shortest troubleshooting path
• Determine the most operationally efficient path
• Find the path that minimizes overall maintenance cost
[Figure: Probability tree for root-cause identification. Each recorded event is classified by its status field ("A", "NA", or missing) and branches into Confirmed Failure, Unconfirmed Failure, Induced Failure, Non-Induced Failure, and No Fault Found (NFF) outcomes. Branch probabilities (pF, cF, etc.) are estimated from empirical data, supplemented by subjective input where records are incomplete. Empirical distributions of the time between failures and the time between maintenance events provide the "remaining lifetimes".]
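To make the "time between failures" and "remaining lifetimes" inputs above concrete, here is a minimal computational sketch (not part of the original briefing; the function names and sample timestamps are illustrative assumptions): it builds the empirical distribution of interarrival times from recorded failure times and estimates the mean remaining lifetime at a given age.

```python
# Minimal sketch: empirical time-between-failures distribution and the
# mean remaining lifetime at a given age. Names and data are illustrative.
import numpy as np

def interarrival_times(failure_times):
    """Times between successive failures, from sorted event timestamps."""
    t = np.sort(np.asarray(failure_times, dtype=float))
    return np.diff(t)

def empirical_survival(tbf, age):
    """Empirical probability that a time-between-failures exceeds 'age'."""
    return float(np.mean(np.asarray(tbf, dtype=float) > age))

def mean_remaining_lifetime(tbf, age):
    """Average additional life among intervals that exceeded 'age'."""
    tbf = np.asarray(tbf, dtype=float)
    survivors = tbf[tbf > age]
    if survivors.size == 0:
        return float("nan")   # no empirical information beyond this age
    return float(np.mean(survivors - age))

failures = [120.0, 250.0, 390.0, 700.0, 1100.0]   # illustrative timestamps (hours)
tbf = interarrival_times(failures)
print(empirical_survival(tbf, 150.0), mean_remaining_lifetime(tbf, 150.0))
```

Quantities like these are what the empirical-data branches of the tree would draw on; where records are missing, the tree falls back on subjective input.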
FMEA/FMECA
Risk Assessment
• Risk Priority Number: RPN = S × O × D
  – S = Severity Index of Failure Effects
  – O = Occurrence Index of Failure Modes
  – D = Detectability Index of Failure Modes

Failure Modes Definition
• Criticality Index: Cr = Σ βi αi λ t, summed over failure modes i = 1, …, n
  – λ = failure rate
  – αi = failure mode ratio
  – βi = probability of loss
  – t = flight/mission phase duration
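As a worked illustration of the two metrics above (a sketch only; names and numbers are hypothetical), the following computes the risk priority number for a single failure mode and the item criticality index as a sum over its failure modes.

```python
# Minimal sketch of the two metrics above; names and values are hypothetical.

def risk_priority_number(severity, occurrence, detectability):
    """RPN = S x O x D, each index on a 1-10 scale."""
    return severity * occurrence * detectability

def item_criticality(failure_modes, failure_rate, phase_duration):
    """Cr = sum over modes i of beta_i * alpha_i * lambda * t."""
    return sum(beta * alpha * failure_rate * phase_duration
               for alpha, beta in failure_modes)

# Hypothetical item: (failure mode ratio alpha, probability of loss beta) per mode.
modes = [(0.6, 0.5), (0.4, 0.1)]
print(risk_priority_number(7, 3, 5))                                 # -> 105
print(item_criticality(modes, failure_rate=2e-5, phase_duration=10.0))
```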
FMEA Risk Elements - Severity
Severity:
Determine all failure modes based on the functional requirements and their
effects. Examples of failure modes are electrical short-circuiting, corrosion,
or deformation. A failure mode in one component can lead to a failure mode
in another component; therefore, each failure mode should be listed in
technical terms and by function. The ultimate effect of each failure mode
then needs to be considered. A failure effect is defined as the result of a
failure mode on the function of the system as perceived by the user, so it
is convenient to write these effects down in terms of what the user might
see or experience. Examples of failure effects are degraded performance,
noise, or even injury to a user. Each effect is given a severity number (S)
from 1 (no danger) to 10 (critical). These numbers help an engineer
prioritize the failure modes and their effects. If the severity of an effect
is rated 9 or 10, actions are considered to change the design by eliminating
the failure mode, if possible, or protecting the user from the effect. A
severity rating of 9 or 10 is generally reserved for effects that would cause
injury to a user or otherwise result in litigation.
FMEA Risk Elements - Occurrence
Occurrence:
Determine the cause of each failure mode and how often it occurs. This can
be done by looking at similar products or processes and the failure modes
that have been documented for them. A failure cause is regarded as a design
weakness. All the potential causes for a failure mode should be identified
and documented; examples are erroneous algorithms, excessive voltage, or
improper operating conditions. A failure mode is given an occurrence
ranking (O), again from 1 to 10. Actions need to be determined if the
occurrence is high (meaning > 4 for non-safety failure modes and > 1 when
the severity number from step 1 is 9 or 10). This step is called the detailed
development section of the FMEA process. Occurrence can also be defined as a
percentage; for example, a non-safety issue that occurs in less than 1% of
cases may be given a rating of 1, depending on the product and customer
specifications.
FMEA Risk Elements - Detectability
Detectability:
Determine the appropriate actions and their efficiency. In addition, design
verification is needed and the proper inspection methods must be chosen.
Start from the current controls of the system that prevent failure modes
from occurring or that detect a failure before it reaches a critical stage.
Then identify testing, analysis, monitoring, and other techniques that can
be or have been used on similar systems to detect failures. From these
controls one can learn how likely it is for a failure to be identified or
detected. Each combination from the previous two steps receives a detection
number (D). This ranks the ability of the planned tests and inspections to
remove defects or detect failure modes in time. The assigned detection
number measures the risk that the failure will escape detection: a high
detection number indicates that the chances are high that the failure will
escape detection, in other words that the chances of detection are low.
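Pulling the three rating scales together, the sketch below is one illustrative way to record S, O, and D for a failure mode, compute its RPN, and flag it for action; the thresholds are the ones quoted on the Severity and Occurrence slides, while the class and field names are assumptions.

```python
# Minimal sketch combining the S, O, D ratings described in the last three
# slides; thresholds follow those slides, names and data are illustrative.
from dataclasses import dataclass

@dataclass
class FailureMode:
    name: str
    severity: int     # S: 1 (no danger) .. 10 (critical)
    occurrence: int   # O: 1 .. 10
    detection: int    # D: high value = failure likely to escape detection

    @property
    def rpn(self) -> int:
        return self.severity * self.occurrence * self.detection

    def needs_action(self) -> bool:
        # Severity 9-10 warrants design action on its own; otherwise an
        # occurrence ranking above 4 triggers action for non-safety modes.
        if self.severity >= 9:
            return True
        return self.occurrence > 4

fm = FailureMode("connector corrosion", severity=6, occurrence=5, detection=4)
print(fm.rpn, fm.needs_action())   # -> 120 True
```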
FMECA Process
FMECA extends FMEA by including a criticality analysis, which is used to chart the probability of
failure modes against the severity of their consequences. The analysis proceeds through the
following logical steps (a worksheet sketch follows the list):
– Define the system
– Define ground rules and assumptions in order to help drive the design
– Construct system block diagrams
– Identify failure modes (piece part level or functional)
– Analyze failure effects/causes
– Feed results back into design process
– Classify the failure effects by severity
– Perform criticality calculations
– Rank failure mode criticality
– Determine critical items
– Feed results back into design process
– Identify the means of failure detection, isolation and compensation
– Perform maintainability analysis
– Document the analysis, summarize uncorrectable design areas, and identify special controls necessary to reduce failure risk
– Make recommendations
– Follow up on corrective action implementation/effectiveness
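As one illustrative way to carry a worksheet through the criticality steps above (field names and example data are assumptions, not part of the ARIES tool), the sketch below records failure modes and ranks them by their criticality numbers.

```python
# Minimal FMECA worksheet sketch: record failure modes, compute each mode's
# criticality number (beta * alpha * lambda * t), and rank them.
# Field names and example data are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class ModeRecord:
    item: str
    failure_mode: str
    severity_class: str    # e.g. "I" (catastrophic) .. "IV" (minor)
    alpha: float           # failure mode ratio
    beta: float            # conditional probability of loss
    failure_rate: float    # lambda, failures per hour
    phase_duration: float  # t, mission phase duration in hours

    @property
    def criticality(self) -> float:
        return self.beta * self.alpha * self.failure_rate * self.phase_duration

worksheet = [
    ModeRecord("pump", "seal leak", "II", 0.5, 0.3, 3e-5, 100.0),
    ModeRecord("pump", "bearing seizure", "I", 0.2, 0.9, 3e-5, 100.0),
]
# Rank failure modes by criticality, highest first.
for rec in sorted(worksheet, key=lambda r: r.criticality, reverse=True):
    print(f"{rec.item} / {rec.failure_mode}: Cm = {rec.criticality:.2e}")
```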
Rogue Unit ID and Tracing
[Figure: Rogue unit identification and tracing workflow. Failure data by part serial number feed trend tests (RAT, MHT, LTT; the Reverse Arrangement, Military Handbook, and Laplace Trend tests described on the following slides) and reliability analysis on TTUR, NFF, and TTF. An improving trend is traced to a failure category: non-technical causes lead to an analytical process focused on logistics and repair quality, while technical causes lead to design evaluation and modification. A constant trend feeds continuous trend monitoring and alerting and impact analysis of preventive maintenance. The hazard-rate analysis distinguishes decreasing, constant, and increasing hazard rates, feeding a renewal process driven by external factors, policy evaluation and cost optimization, design modification with an optimal inspection interval, and rogue units identification.]
Basic Trend Monitoring
• The Reverse Arrangement Test (a simple and useful test
that has the advantage of making no assumptions about
a model for the possible trend)
• The Military Handbook Test (optimal for distinguishing between "no trend" and a trend following the NHPP Power Law or Duane model)
• The Laplace Trend Test (optimal for distinguishing between "no trend" and a trend following the NHPP Exponential Law model)
Reverse Arrangement Test
Given r repairs at times T1, T2, ..., Tr, the interarrival times I1 = T1, I2 = T2 - T1, ..., Ir = Tr - Tr-1,
and the censoring time Tend > Tr, we calculate how many instances we have
of a later interarrival time being strictly greater than an earlier interarrival
time. These are called reversals. Too many reversals indicates a significant
improving trend and too few reversals indicates a significant degradation
trend. More formally,
• Count a reversal every time Ij < Ik for some j and k with j < k.
• Compute the total number of reversals, R.
• For r repair times, the maximum possible number of reversals is r(r-1)/2.
• If there are no trends, the expected number of reversals is r(r-1)/4.
• For r > 12, the following approximation can be used to determine if the number of
reversals is statistically significant.
    z = [R - r(r-1)/4 + 0.5] / √[(2r+5)(r-1)r / 72]
• The advantage of this test is that it is simple and makes no assumptions about a model for the possible trend; a computational sketch follows below.
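A minimal computational sketch of the test as described above (the function name and sample repair times are illustrative; the continuity correction follows the approximation quoted on this slide):

```python
# Minimal sketch of the Reverse Arrangement Test described above.
import math

def reverse_arrangement_z(repair_times):
    """Count reversals among interarrival times and return (R, z) using the
    normal approximation quoted for r > 12 repairs."""
    t = sorted(repair_times)          # system ages at repair, measured from 0
    r = len(t)
    inter = [t[0]] + [b - a for a, b in zip(t[:-1], t[1:])]   # I1 .. Ir
    # A reversal is any pair (j, k) with j < k and I_j < I_k.
    reversals = sum(1 for j in range(r) for k in range(j + 1, r)
                    if inter[j] < inter[k])
    expected = r * (r - 1) / 4.0
    variance = (2 * r + 5) * (r - 1) * r / 72.0
    z = (reversals - expected + 0.5) / math.sqrt(variance)
    return reversals, z

# Illustrative repair ages (hours); r = 13 > 12, so the z approximation applies.
repairs = [55, 151, 274, 350, 480, 610, 745, 900, 1030, 1190, 1330, 1490, 1610]
print(reverse_arrangement_z(repairs))
```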
Military Handbook Test
– Given r repairs, T1, T2, ...., Tr and the censoring time Tend > Tr,
we calculate the test statistic
    T = 2 Σ ln(Tend / Ti), summed over i = 1, …, r
– This test statistic follows a chi-square distribution with 2*r
degrees of freedom.
– This test is recommended for the case when the choice is
between no trend and a non-homogeneous Poisson
process (NHPP) power law (Duane) model.
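A minimal computational sketch of this test (illustrative names and data; it assumes the failure ages Ti and censoring time Tend defined above, and the resulting statistic would be compared against chi-square critical values with 2r degrees of freedom):

```python
# Minimal sketch of the Military Handbook trend test described above.
import math

def mil_hdbk_test(failure_ages, t_end):
    """T = 2 * sum_i ln(Tend / Ti); chi-square with 2r d.f. under 'no trend'."""
    r = len(failure_ages)
    stat = 2.0 * sum(math.log(t_end / ti) for ti in failure_ages)
    return stat, 2 * r

# Illustrative system ages at failure (hours) and censoring time.
ages = [55, 151, 274, 350, 480]
stat, dof = mil_hdbk_test(ages, t_end=500.0)
print(stat, dof)   # compare stat against chi-square critical values with dof d.f.
```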
Trend Monitoring Metrics
The Laplace Trend Test tests the hypothesis that a trend does
not exist within the data. The Laplace Trend Test can serve as a
preliminary metric to determine whether the system is
deteriorating, improving, or if there is no trend at all. Calculate
the test statistic, using the following equation:
    z = [(1/N) Σ xi - T/2] / [T / √(12N)], with the sum over i = 1, …, N
where:
T = total operating time (termination time)
xi = age of the system at the i-th successive failure
N = total number of failures
The test statistic is approximately a standard normal random variable
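A minimal computational sketch of the Laplace statistic (illustrative names and data):

```python
# Minimal sketch of the Laplace Trend Test described above.
import math

def laplace_trend_statistic(failure_ages, total_time):
    """z = (mean(x_i) - T/2) / (T / sqrt(12 N));
    approximately standard normal when no trend is present."""
    n = len(failure_ages)
    mean_age = sum(failure_ages) / n
    return (mean_age - total_time / 2.0) / (total_time / math.sqrt(12.0 * n))

# Illustrative failure ages (hours) with termination time T = 500 hours.
ages = [55, 151, 274, 350, 480]
print(laplace_trend_statistic(ages, total_time=500.0))
```

A large positive value suggests the system is deteriorating (failures concentrated late in the observation period), while a large negative value suggests improvement.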