Shuttle Risk Progression – Focus on Historical Risk Increases

International Journal of Performability Engineering, Vol. 9, No. 6, November 2013, pp. 633-640.
© RAMS Consultants
Printed in India
T. L. HAMLIN*
Safety and Mission Assurance, National Aeronautics and Space Administration,
Houston, Texas, 77058
(Received on March 22, 2013, revised on March 27, and June 11, 2013)
Abstract: It is important for future human spaceflight programs to understand
the early mission risk and the impact of design, process, and operational
changes on risk. The Shuttle risk progression assessment used the knowledge
gained from 30 years of operational flights and the Shuttle Probabilistic Risk
Assessment (PRA) to calculate the risk of Shuttle Loss of Crew at significant
milestones beginning with the first flight. The results indicated that the Shuttle
risk tends to follow a step function as opposed to following a traditional
reliability growth pattern. In addition, the results showed that risk can increase
due to trading safety margin for increased performance, due to external events
or due to intended (disabling ejection seats) or unintended (Space Shuttle Main
Engine Block II upgrade) consequences of design changes. This paper will
focus on examining those cases where risk increased and explore the lessons
that can be learned by new programs.
Keywords: NASA, Space Shuttle, Probabilistic Risk Assessment
1.0 Introduction
It is important for future human spaceflight programs to estimate the early
mission risk and the impact of design, process, and operational changes on risk.
The Shuttle risk progression assessment [1] used the knowledge gained from 30
years of operational flights and the Shuttle Probabilistic Risk Assessment (PRA)
to retrospectively estimate the risk of Shuttle Loss of Crew at significant
milestones beginning with the first flight. The term “progression” was chosen to
signify the movement through time from the Shuttle first flight to the final flight
and was not meant to imply progress with respect to risk reduction. The results
of the Shuttle risk progression indicated that the Shuttle risk tends to follow a
step function in which the risk is constant until changes in the design, process, or
operation occur. This is in contrast to the traditional, continuous reliability
growth pattern obtained from a purely theoretical prediction. In addition, the
results showed that risk can increase due to trading safety margin for increased
performance, due to external events or due to intended (disabling ejection seats)
or unintended (Space Shuttle Main Engine Block II upgrade) consequences of
design changes. This paper focuses on examining those cases where risk
increased and explores the lessons that new programs can learn from
these events.
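The contrast between a step function and a continuous reliability growth curve can be sketched as follows. This is only an illustration: the intermediate change point and risk level, the flight numbers at which levels start, and the power-law form with its exponent are assumptions chosen for the example. Only the first-flight (1:12) and final (1:90) levels come from the assessment.

```python
from bisect import bisect_right

# (first flight at which the level applies, probability of loss of crew).
# Only 1:12 and 1:90 come from the paper; the flight numbers and the
# middle level are hypothetical placeholders.
STEPS = [(1, 1 / 12), (30, 1 / 40), (100, 1 / 90)]


def step_risk(flight: int) -> float:
    """Risk is constant until the next design, process, or operations change."""
    starts = [start for start, _ in STEPS]
    index = bisect_right(starts, flight) - 1
    if index < 0:
        raise ValueError("flight precedes the first milestone")
    return STEPS[index][1]


def smooth_growth_risk(flight: int, first: float = 1 / 12, alpha: float = 0.4) -> float:
    """Continuous power-law decline (assumed form), shown only for contrast."""
    return first * flight ** (-alpha)
```

In the step model the risk at flight 29 equals the risk at flight 1, whereas the smooth curve declines with every flight regardless of whether anything changed on the vehicle.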
_________________________________
*Corresponding author’s email: [email protected]
1.1 Background
The Space Shuttle Program (SSP) initiated the development of a Shuttle
Probabilistic Risk Assessment (SPRA) in March 2001. The purpose of the
SPRA was to provide a useful risk management tool for the SSP to identify
strengths and possible weaknesses in the Shuttle design and operation.
The SPRA was designed to help answer critical risk-related questions such as:
• What is the integrated Space Shuttle risk?
• What are the most significant risk drivers?
• How significant is uncertainty on critical risk factors?
• How robust are current risk controls to mitigate critical risks?
• How can the program best target scarce risk mitigation resources?
• What is the impact of proposed changes in Shuttle design and
operations?
The general scope of the SPRA included hazard causes that may result in an in-flight Loss of Crew and Vehicle (LOCV). In-flight was defined as the time from
launch (T-0) to wheel stop. Rendezvous and docking and extravehicular activity
occurred within this time frame. However, these activities were mission-specific
at the time and considered outside the scope. The vehicle configuration was
assumed to be equivalent to that of a generic Orbiter (i.e., all four Orbiters were
assumed the same). The hazards assessed in the SPRA generally consist of:
• Equipment functional failures
• Flammable/explosive fluid leaks
• Structural failures, including hits from ascent debris and
micrometeoroid/orbital debris
• Human errors
Each Shuttle mission was unique, defined by its payload, flight dynamics,
duration, etc. However, most of these characteristics were beyond the resolution
of the SPRA and could be treated on a nominal basis. Some, however, such as
mission duration, have a direct impact on the SPRA and needed to be specified.
The nominal mission duration was assumed to be 306 hours, based on the use of
STS-119 as a reference mission. In addition, for the nominal SPRA mission, an ISS mission was
assumed. The ISS mission was chosen because the vast majority of the flights
fell into this category. Only in-flight end states are considered in the model;
therefore, only undetectable failures that could lead to the end states of interest
were considered prior to T-0. Conversely, failures occurring in flight that have
the potential to cause LOCV after wheel stop were not included.
The SPRA was modeled using the Systems Analysis Programs for Hands-on
Integrated Reliability Evaluations (SAPHIRE) software. The SPRA model was
developed using linked event trees to represent the integration of the vehicle
elements, the three mission phases (Ascent, Orbit and Entry), as well as four
performance abort phases. A single entry point of “Launch” was used with fault
trees linked into the event tree top events.
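As a rough illustration of how linked event trees and fault trees are typically quantified, the sketch below computes a top-level probability from minimal cut sets using the min cut upper bound, a common PRA approximation. The event names and probabilities are hypothetical, and the source does not state which quantification method the SPRA used.

```python
from math import prod

# Hypothetical basic-event probabilities; these are not SPRA values.
BASIC_EVENTS = {
    "ENGINE_FAIL": 2.0e-3,
    "SHUTDOWN_FAIL": 5.0e-2,
    "DEBRIS_CRITICAL": 1.0e-3,
}

# Each minimal cut set is a combination of basic events that is
# sufficient, on its own, to reach the LOCV end state.
CUT_SETS = [
    ("ENGINE_FAIL", "SHUTDOWN_FAIL"),
    ("DEBRIS_CRITICAL",),
]


def cut_set_probability(cut_set) -> float:
    """Product of the (assumed independent) basic-event probabilities."""
    return prod(BASIC_EVENTS[name] for name in cut_set)


def min_cut_upper_bound(cut_sets) -> float:
    """Approximate P(top event) as 1 - prod(1 - P(cut set))."""
    return 1.0 - prod(1.0 - cut_set_probability(cs) for cs in cut_sets)
```

For small probabilities the bound is close to the simple sum of the cut set probabilities, which is why dominant cut sets are often read directly as the main risk drivers.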
In 2010, a study of how Shuttle risk changed over time, known as the Shuttle
Risk Progression Study, was initiated using the Shuttle PRA. The results were
published in 2011 [1].
2.0 Methodology
In the Shuttle PRA, failure contributors which have been mitigated through
redesign or process improvements are discounted and appropriately reduced in
probability. The retrospective analysis approach is to remove these discounts in
order to estimate the risk prior to the improvements. For risk contributions
which are more complex and are quantified via Bayesian analysis, such as the
contributions from ascent debris or the Reusable Solid Rocket Motor (RSRM),
early flight risk estimates are specifically modeled and documented in the Shuttle
Risk Progression Report [3].
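The retrospective idea of removing discounts can be sketched as below. The contributor names, probabilities, discount factors and flight numbers are all hypothetical, and the simple sum is a rare-event approximation chosen for brevity rather than the method used in the assessment.

```python
# Hypothetical contributors: (name, undiscounted probability, discount factor,
# first flight at which the mitigation applies). None are SPRA values.
CONTRIBUTORS = [
    ("contributor_a", 4.0e-3, 0.25, 26),
    ("contributor_b", 1.0e-3, 0.50, 90),
    ("contributor_c", 2.0e-3, 1.00, 1),  # never mitigated
]


def risk_at_flight(flight: int, contributors=CONTRIBUTORS) -> float:
    """Sum contributor probabilities, discounting only mitigated ones."""
    total = 0.0
    for _name, base, discount, mitigated_from in contributors:
        # For flights before the redesign or process change, the discount
        # is removed and the full probability is used.
        factor = discount if flight >= mitigated_from else 1.0
        total += base * factor
    return total
```

Evaluating `risk_at_flight` at an early flight and at a late flight shows how the same model yields a higher estimate once the discounts for later improvements are taken out.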
In order to ensure that the risk differences are not about a particular mission
objective, the analysis models the current mission with the vintage vehicle.
Since the model is based upon Iteration 3.3 of the SPRA [4], the mission duration
and Micro-meteoroid Orbital Debris (MMOD) risk are based upon STS-119.
Earlier missions, although short in duration, were dominated by risks which were
independent of mission length (e.g., RSRM, Ascent Debris). No model logic
changes were made to the Iteration 3.3 model. Inspection, repair, and crew
rescue improvements made after the Columbia accident were not included for the
flights prior to the Columbia accident.
As previously mentioned, this analysis uses the SPRA and is therefore
subject to its limitations. A description of the SPRA limitations can be found
in the Iteration 3.0 integration notebook [2]. In addition to the general SPRA
limitations, the following limitations are specific to the analysis and results
presented in this paper:
• The analysis is based upon the current understanding of Shuttle risk, looking
back after 30 years of operating history, and covers only those risk
contributors that were considered. It therefore does not address any still
unknown risks.
• The analysis can be used to inform a new program of the general trend of
reliability growth for a complex high risk vehicle, but specific values should not
be used since a new program will have its own lessons to learn, may be starting
at a different point, and may be operating under different conditions.
3.0 Shuttle Risk Progression Results
Figure 1 provides the estimate of the overall Shuttle risk progression. Risk is
defined here as the probability of loss of crew. The uncertainties on the estimates
are roughly a range factor of 2 with the relative changes in risk having smaller
uncertainties because of the positive correlation due to the common baseline risk.
As observed, the overall trend in the failure probability is a significant decrease
from approximately 1:12 for the first flight to 1:90 at the latest flight. The failure
probability also remains approximately constant between changes made to the
Shuttle design and/or operation that could impact the failure probability. It is
important to note that the failure probability does not monotonically decrease
with time/missions, but instead increases at some points. These increases will be
focused on since they provide important information and lessons.
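The "1:N" labels used throughout the figures convert directly to probabilities. The short sketch below does the arithmetic for the first-flight and final estimates quoted above. The helper names are illustrative, and reading a "range factor of 2" as bounds at half and twice the estimate is an assumption made for this example.

```python
def odds_to_probability(odds: str) -> float:
    """Convert a '1:N' label to a probability of 1/N."""
    numerator, denominator = odds.split(":")
    return float(numerator) / float(denominator)


def range_bounds(probability: float, factor: float = 2.0):
    """Assumed reading of a range factor: estimate/factor to estimate*factor."""
    return probability / factor, min(1.0, probability * factor)


first_flight = odds_to_probability("1:12")   # about 0.083
final_flight = odds_to_probability("1:90")   # about 0.011
improvement = first_flight / final_flight    # 90 / 12 = 7.5
```

A ratio of 7.5 between the first and final estimates is consistent with describing the overall improvement as roughly an order of magnitude.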
[Figure 1 plot: Probability (0 to 0.12) versus Flight Sequence # (1 to 120), with risk levels labeled at the milestone flights STS-1, STS-5, STS-41B, STS-51L, STS-26, STS-29, STS-49, STS-77, STS-86, STS-89, STS-103, STS-110 and STS-114.]
Figure 1: Shuttle Risk Progression Summary Highlighting Risk Increases
The first notable risk increase, encircled at the far left of Figure 1, occurred when the
Shuttle went from a test vehicle to an operational vehicle and the crew size
increased from 2 to 4 on STS-5. Because of the increase in crew size, the ejection
seats, which were provided to the commander and pilot of the test flights, were
disabled since no ejection seats were available to the remaining crew. The
ejection seats were completely removed from the vehicle prior to its next flight.
Although ejection seats are a relatively effective crew escape system early in
ascent (up to ~80K feet) and late in entry, their impact on risk was limited because
of the other high risk contributions associated with events that could occur
outside that window, such as vehicle breakup on entry. At the time, the risk
associated with a Shuttle mission was not fully appreciated, which is often the
case for new vehicles due to the difficulty in quantifying unknown risks. The
earliest Shuttle risk estimates were ~1:1000, as shown in Figure 2, and because of
the unknown risks the impact of the decision to disable the ejection seats was not
fully understood at the time. Following Challenger, crew escape systems were
evaluated as a potential Shuttle upgrade but were abandoned because of the
implementation difficulty as well as the significant cost and schedule impacts.
[Figure 2 plot: Probability (0 to 0.12) versus Flight Sequence # (1 to 130), showing the Figure 1 risk progression extended to STS-133 (1:90) together with historical risk estimates: Wiggins Analysis (1:1000 to 1:10000), Weatherwax Analysis (1:35), Galileo Study (1:55), Updated Galileo Study (1:73), 1995 PRA (1:131), 1998 PRA (1:234), and Shuttle PRA (1:61 to 1:90).]
Figure 2: Shuttle Risk Progression Summary with Historical Risk Estimates
Since the SPRA model was not set up to include ejection seats, the analysis
was completed by reviewing the top 99% of the cut sets and using engineering
judgment to determine whether or not and to what extent ejection seats would be
able to mitigate each scenario. Preliminary results were calculated using Excel
and then recovery rules were used to post process the cut sets in SAPHIRE in
order to calculate a mean with uncertainty. Given a scenario that is assumed to
be recoverable, ejection seats are given a 90% success rate (i.e., there is a 10%
chance that either crewmember will not survive).
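The cut set post-processing described above can be sketched as follows. The 10% failure factor comes from the text; the cut set descriptions and probabilities are hypothetical, and treating recoverability as a yes/no flag is a simplification, since the analysis also judged to what extent each scenario could be mitigated.

```python
EJECTION_FAILURE = 0.10  # from the text: 90% success if the scenario is recoverable

# Hypothetical cut sets: (description, probability, recoverable by ejection seats?)
CUT_SETS = [
    ("early-ascent failure", 2.0e-2, True),
    ("breakup on entry", 3.0e-2, False),
]


def apply_recovery(cut_sets, failure: float = EJECTION_FAILURE):
    """Scale recoverable cut sets by the ejection seat failure probability."""
    return [
        (name, probability * failure if recoverable else probability, recoverable)
        for name, probability, recoverable in cut_sets
    ]


def total_risk(cut_sets) -> float:
    """Rare-event approximation: sum of the cut set probabilities."""
    return sum(probability for _name, probability, _flag in cut_sets)
```

Comparing `total_risk(CUT_SETS)` with `total_risk(apply_recovery(CUT_SETS))` shows why the benefit is limited when large contributors fall outside the window in which ejection is possible.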
The next notable risk increase, which is encircled in Figure 1, occurs on STS-86. Changes in the External Tank (ET) foam and application process led to a
significant increase in Orbiter damage, which was estimated to increase the risk
contribution from critical ascent debris damage. Figure 3 shows the
number of Orbiter lower surface damage occurrences in order of the ET start
date. From ET-88 (STS-86) to ET-100 (STS-96), the average number of
damages greater than 1 inch increased from approximately 13 to
approximately 45.
[Figure 3 plot: Debris Hits by ET Start Date; Debris Hits (0 to 100) versus ET Number (ET-62 to ET-121). Black indicates LWT; red indicates SLWT. Annotations: Mission 87, STS-86, ET-88 was the first mission with new foam on the tank's acreage; Mission 88, STS-87, ET-89 was the first mission with new foam on the intertank; Mission 96, STS-103, ET-101 was the first mission with venting holes on the ET TPS; the position of LWT ET-93, used on STS-107 (Columbia accident), is marked.]
Figure 3: Orbiter Lower Surface Damages Arranged by ET Start Date
Tile risk from ascent debris is modeled in a completely different way than
functional and phenomenological risk since it does not accommodate the
traditional failure rate calculation methodology. The modeling of tile risk is
based upon historical occurrences of Orbiter lower surface damages greater than
1 inch and uses the JSC S&MA developed Ascent Debris Analysis Model
(ADAM) [5]. ADAM uses input distributions derived from historical damages
(length, width, depth, quantity, location) and simulates damages in a mission.
The simulated mission damage is compared against the Orbiter damage criteria to
estimate the probability that the damage is critical, meaning that it would cause LOCV
on re-entry if not mitigated through repair or crew rescue. The separate estimate
of the risk contribution from the Reinforced Carbon Carbon (RCC) material used on the
Orbiter’s wing leading edges and nose cone is based upon flight history, using
engineering judgment to adjust the damage with the changing environment.
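The general shape of such a simulation can be sketched with a small Monte Carlo loop. This is not ADAM: the damage counts, the lognormal depth distribution and the single depth threshold are hypothetical stand-ins for the historical input distributions and the Orbiter damage criteria, and length, width and location are omitted for brevity.

```python
import random

# Hypothetical inputs standing in for distributions derived from flight history.
HISTORICAL_COUNTS = [8, 11, 13, 15, 18]  # damages > 1 inch per mission
DEPTH_MU, DEPTH_SIGMA = -2.0, 0.8        # parameters of a lognormal depth, inches
CRITICAL_DEPTH = 1.5                     # stand-in for the damage criteria, inches


def mission_is_critical(rng: random.Random) -> bool:
    """Simulate one mission and check every damage against the criterion."""
    count = rng.choice(HISTORICAL_COUNTS)
    depths = (rng.lognormvariate(DEPTH_MU, DEPTH_SIGMA) for _ in range(count))
    return any(depth > CRITICAL_DEPTH for depth in depths)


def critical_damage_probability(trials: int, seed: int = 0) -> float:
    """Fraction of simulated missions with at least one critical damage."""
    rng = random.Random(seed)
    critical = sum(mission_is_critical(rng) for _ in range(trials))
    return critical / trials
```

Raising the typical number of damages per mission, as happened between ET-88 and ET-100, raises the estimated probability in a model of this kind because each additional damage is another chance to exceed the criterion.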
A report documenting lessons learned from the development of the External
Tank (ET) [6] provides insight into what occurred from ET-88 (STS-86) to ET-100 (STS-96), both why risk increased and why it decreased again on STS-103.
Preceding these flights, the Environmental Protection Agency (EPA) banned the
use of CFC-11 Freon, which was used extensively in the ET foam. The new foam,
along with a new blowing agent, was introduced over three tanks starting with
STS-85. The first flight had no noticeable problems, but the use of the new foam
was limited. STS-86 was the first flight with the new foam on the tank’s acreage
and had above average damage on the Orbiter, as seen in Figure 3. However, the next
flight, which had the new foam on the ET intertank area, had significantly higher
damage. Following this flight, a team was established to investigate this
problem. As part of the investigation, a thermal/vacuum testing program found
that the high vapor pressure of the new blowing agent, combined with the lower
yield strength of the new foam and the propensity for this foam to
fail on slip planes parallel to the ET intertank ribs, caused small chunks of foam
to come off [6].
This risk contribution from the foam was eventually mitigated by punching
vent holes into the foam in areas where there was considered to be a transport
mechanism to the Orbiter. The effect can be seen in the decrease in Orbiter
damage starting at STS-103. Although it was recognized at the time that STS-87 was a
significant anomaly, and the overall risk to the crew was not well
understood, the Shuttle continued to fly while the event was being investigated.
In hindsight, the certification of the new foam did not involve testing of the
material to the complete flight environment, which could have identified the issue
prior to its first flight; the focus was on the critical properties identified in the
previous development and test program.
The additional risk increases that occurred in the evolution of the Shuttle risk
are not visible in Figure 1 since they are offset by decreasing risk contributions in
other areas, including decreases in the software risk contribution. These risk
increases are associated with operational and design changes to the Space Shuttle
Main Engines (SSMEs). Figure 4 shows the estimated impact of these
changes on risk.
[Figure 4 plot: Probability (0.000 to 0.006) versus Flight Sequence # (1 to 130), with SSME uncontained failure risk levels between 1:170 and 1:680 labeled at milestone flights from STS-1 through STS-133.]
Figure 4: SSME (Uncontained) Risk Contribution Evolution Highlighting Risk Increases
On STS-6 the SSME operational power was increased in order to increase
performance, thus reducing the safety margin and increasing the risk of
uncontained engine failure. The increased performance was needed to
accommodate additional weight. Since STS-6 was not an analyzed mission, this
increase shows up on STS-41B in Figure 4. This increase in risk was estimated
by reviewing engine tests and failures that occurred and identifying those failures
which could be attributed to operating at greater than 100% power level and
extrapolating to the failure probability on flight. Although the engine was
certified to operate at the higher power level, it decreased the safety margin and
caused an increased risk contribution. At the time the risk increase was not
quantified but in hindsight it resulted in a 20% increase in the probability of
having an SSME uncontained engine failure. It is unlikely that this increase in
risk would have impacted the decision to increase the operational power level but
it would have been beneficial for making a risk-informed decision.
When the SSME was upgraded by the introduction of the High Pressure Fuel
Turbopump-Alternate Turbopump, there were three early failures during testing.
These test failures were mitigated prior to the engines being flown and were
therefore discounted in the analysis of the flight engines, but risk still increased
because of the remaining residual risk. There was a slight increase in risk (~12%)
due to the addition of a new failure mode that did not previously exist. This
demonstrates that vehicle upgrades that are intended to reduce risk associated
with known failure modes can in fact introduce new failure modes and increase
the overall risk. Eventually, with the addition of the Advanced Health Management
System (AHMS), the SSME uncontained engine risk decreased but still remained
higher than before the changes.
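To see what relative increases of this size mean in the "1:N" form used in Figure 4, the sketch below applies a fractional increase to a baseline. The 1:300 baseline is hypothetical and is not a value from the assessment; only the 20% and ~12% increases come from the text, and they are applied separately rather than in sequence.

```python
def apply_relative_increase(one_in_n: float, increase: float) -> float:
    """Return the new N in '1:N' after a fractional increase in probability."""
    probability = 1.0 / one_in_n
    return 1.0 / (probability * (1.0 + increase))


BASELINE = 300.0  # hypothetical 1:300 baseline, for illustration only

# Effect of the 20% increase attributed to the higher operating power level.
after_power_increase = apply_relative_increase(BASELINE, 0.20)   # about 1:250

# Effect of the ~12% increase attributed to the new failure mode.
after_new_failure_mode = apply_relative_increase(BASELINE, 0.12)  # about 1:268
```

Because N is the reciprocal of the probability, a 20% increase in probability shortens the odds by a factor of 1.2 rather than subtracting 20% from N.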
4.0 Conclusions
Overall, the Shuttle mission risk improved by approximately an order of
magnitude over the life of the program. Risk reductions are the result of
redesigns or operational changes, the most significant of which followed major
events (e.g., Challenger, Columbia, STS-27’s TPS damage). The focus of this
paper is on the risk increases that also occurred due to changes to the Shuttle. This
analysis thus differs from theoretical reliability growth models, which predict
steady risk reduction and reliability improvement as the system operates and
evolves. The analysis that was carried out showed that risk increased due to
trading safety margin for increased performance or due to changes that caused
external event risk to increase. An example of trading safety margin is the
increase in SSME risk with the increase in operating power level. An example of
an external event which increased the risk was the change in the foam insulation
used for the ET.
An important message to new programs should be that external influences,
operational changes and design upgrades can cause risk increases as well as risk
decreases in the risk evolution. It may be necessary to reassess whether or not
previous testing and analysis is appropriate for the new configuration. In the
case of the ET foam, the previous testing and analysis was inadequate to detect
the impact of the new foam and blowing agent on debris liberation in flight.
Furthermore, it may be necessary to reassess the benefit of a design upgrade if
preliminary testing indicates there are new failure modes.
References
[1]. Hamlin, T., E. Thigpen, J. Kahn, Y. Lo. Shuttle Risk Progression: Use of the
Shuttle Probabilistic Risk Assessment (PRA) to Show Reliability Growth.
American Institute of Aeronautics and Astronautics Space 2011 Conference
held at Long Beach, California on September 27-29, 2011.
[2]. Thigpen, Eric. Model Integration Report, Vol. II, Rev. 3.0, NASA, Johnson
Space Center, Safety and Mission Assurance Directorate, Shuttle and
Exploration Division, Analysis Branch, Houston, Texas, November 2008.
[3]. Hamlin, T., J. Kahn, and Y. Lo. Shuttle Risk Progression by Flight, NASA
SSMA-11-001, Rev.1 March 2013.
[4]. Thigpen, Eric. Shuttle PRA Iteration 3.3 Changes Notebook, NASA, Johnson
Space Center, Safety and Mission Assurance Directorate, Shuttle and
Exploration Division, Analysis Branch, Houston, Texas, November 2010.
[5]. Vera, J. Ascent Debris Analysis Model (ADAM) User’s Reference Guide,
Version 1, NASA, Johnson Space Center, Safety and Mission Assurance
Directorate, Shuttle Exploration Division, Analysis Branch, Houston, Texas,
June 16, 2010.
[6]. Pessin, M. Lessons Learned From Space Shuttle External Tank Development –
A Technical History of the External Tank, October 30, 2002.
Teri Hamlin has a B.S. in Nuclear/Mechanical Engineering from Worcester
Polytechnic Institute. Teri worked at Northeast Utilities for eight years
performing PRA activities for their three Millstone Nuclear Power Plants. In
2002, Teri entered the aerospace industry as a PRA analyst for SAIC. At SAIC,
she served as the lead for the Shuttle Human Reliability Analysis (HRA), which
represents the most comprehensive Shuttle HRA to date. In 2006, Teri joined the
JSC S&MA Analysis Branch as the Shuttle PRA Lead. She remained Shuttle
PRA Lead until the Shuttle retirement following STS-135 in July 2011. She is
currently the Commercial Crew Probabilistic Safety Analysis (PSA) lead,
assisting in the development of commercial crew requirements and PSA
methodology development. She is also responsible for providing insight into the
commercial providers’ Loss of Crew and Loss of Mission assessments.