How Many Spares Does One Really Need?

Tony Frisch, Xtera Communications
More spares = more confidence that they will be available when needed, but also = more cost
Possible criteria for selecting quantity of spares
1. To ensure 99.999% availability (links to a network requirement)
2. To be X% confident of having spares available when needed (what value for X?)
3. Some other criterion, e.g. cost-effectiveness (balance the costs of SLA payments against the cost of buying spares)
Analyse by number of spares being repaired
[Figure: state diagram with states 0 to 4 (number of spares being repaired), linked by Failure and Repair transitions. Labels: Spares available (Short / No outage); Out of spares (Long outage)]
Calculate using
1. Simple formula e.g. Poisson. Easy to use, but lacks flexibility
2. Monte-Carlo simulation. Rigorous and flexible, but hard to check
3. Steady-state Markov. Flexible and not difficult to check
On a site basis, or looking at the whole network
- looking at the whole network is often a good way to reduce spares needed
Main weakness is the assumption of a constant, known failure rate and a well-defined return time
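As an illustration of the steady-state Markov option, here is a minimal sketch of a birth-death chain whose state is the number of spares being repaired. The model details are assumptions, not taken from the poster: failures arrive at a constant rate, each unit in repair returns independently, and a failure that finds no spare is not queued. The function name and the example inputs are hypothetical.

```python
def steady_state_spares(failure_rate, repair_rate, spares):
    """Steady-state probabilities of having k spares in repair, k = 0..spares.

    Assumed model: constant failure rate, independent returns at repair_rate
    per unit in repair, and no queueing once all spares are in repair.
    """
    weights = [1.0]
    for k in range(1, spares + 1):
        # Balance equation of the chain: pi[k] * k * mu = pi[k-1] * lam
        weights.append(weights[-1] * failure_rate / (k * repair_rate))
    total = sum(weights)
    return [w / total for w in weights]


# Illustrative inputs: 16 units at 5,000 FIT with a 90-day return time.
lam = 16 * 5000e-9        # failures per hour for the site
mu = 1.0 / (90 * 24)      # returns per hour for each unit in repair
probs = steady_state_spares(lam, mu, spares=2)
print("P(out of spares) =", probs[-1])
```

The last entry of the returned list is the fraction of time spent out of spares, which is the state where a further failure would cause a long outage. Because the chain has a closed-form solution, each probability can be checked by hand.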
Uncertainty of FIT derivation
Rapid development makes it difficult to test large numbers of units for long periods of time
Only a few failures during testing means uncertainty of failure rate
One approach is to use 95% Upper Confidence Limit (UCL) values; typically these are 2-3x larger than the true values and will generally result in requiring more spares
[Figure: Relative Probability against Normalised FIT Value, with curves for 1, 2, 5 and 10 Failures and the Expected and 95% UCL values marked]
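To show how a 95% UCL relates to the number of failures seen in testing, the sketch below computes an exact one-sided Poisson upper limit by bisection. The poster does not say how its UCL values were derived, so the method is an assumption, and the function names are hypothetical.

```python
import math


def poisson_cdf(k, mean):
    """P(N <= k) for a Poisson variable with the given mean."""
    term = math.exp(-mean)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return total


def poisson_ucl(failures, confidence=0.95):
    """One-sided upper limit on the expected number of failures.

    Solves P(N <= failures | mean) = 1 - confidence by bisection.
    """
    lo, hi = 0.0, failures + 10.0 * math.sqrt(failures + 1.0) + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(failures, mid) > 1.0 - confidence:
            lo = mid   # mean still too small to reach the confidence level
        else:
            hi = mid
    return 0.5 * (lo + hi)


for n in (1, 2, 5, 10):
    print(n, "failures: UCL / observed =", round(poisson_ucl(n) / n, 2))
```

The ratio shrinks as more failures are observed, which is why short tests with only a few failures give UCL values well above the expected FIT.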
Example network (4 sites)
[Figure: network diagram of the 4 sites, built from SLTE blocks each labelled "8 WL"; other labels: MUX, WL 10,000]
Some units are more significant than others
Common units affect multiple wavelengths
Failure probability vs. FIT value
Cumulative Probability

Spares   FIT 10,000 (95%), x 16   FIT 5,000, x 16   FIT 5,000, x 64
1        97.720%                  99.385%           92.144%
2        99.828%                  99.977%           98.841%
3        99.990%                  99.999%           99.870%
4        100.000%                 100.000%          99.988%
5        100.000%                 100.000%          99.999%

Actual FIT value = 5,000; MTTR = 90 days
95% FIT value ≈ 10,000
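A minimal sketch of the simple Poisson formula applied to this kind of table follows. It assumes the cumulative probability is the chance that the number of failures during one return time does not exceed the number of spares; the function name and the `return_days` parameter are hypothetical. In this sketch a 60-day return time reproduces the tabulated percentages, while 90 days gives lower values, so the return time is left as an explicit input.

```python
import math


def prob_enough_spares(fit, units, spares, return_days):
    """P(failures during one return time <= spares), Poisson assumption."""
    mean = fit * 1e-9 * units * return_days * 24.0  # expected failures
    term = math.exp(-mean)
    total = term
    for k in range(1, spares + 1):
        term *= mean / k
        total += term
    return total


# Columns: (FIT, number of units), in the same order as the table.
columns = ((10_000, 16), (5_000, 16), (5_000, 64))
for spares in range(1, 6):
    row = [prob_enough_spares(fit, units, spares, return_days=60)
           for fit, units in columns]
    print(spares, "  ".join(f"{100 * p:.3f}%" for p in row))
```

Changing `units` from 16 to 64 corresponds to moving from a per-site calculation to a whole-network one.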
More spares per site using 95% UCL than using
"expected" FIT value
Network topology and protection are also important
Different protection schemes tolerate more failures before outage occurs
[Figure: protection examples with labels SLTE, N+1, NPE (Ring) 300 FIT and SLTE 100]
Wavelength units are highest FIT and cost
- Most significant for spares calculation
Important not to base on point-to-point SLTE computation, as this also increases the spare requirement for no real benefit
More savings by sharing spares between sites
- Still need at least one per site
- Depends on being able to move spares easily; may not be practical in some cases, e.g. due to Customs
Economic Analysis
Cost of outage = SLA payment
Details of Service Level Agreements vary, so for simplicity assume cost of outage is:
- proportional to the length of outage
- proportional to the number of circuits affected
Aim to find lowest total cost, analysed over the warranty period
[Figure: Overall cost during warranty period. Relative cost (0 to 1400) against Number of spares (0 to 10), with curves for Spares, Outage and Total]
Graph shows the example of a network of 4 sites, each with 16 units with a true FIT value of 5,000 and a repair time of 90 days
The calculation is not difficult and the minimum is not very sensitive to precise input values
May be somewhat simplistic, but could be improved by use of a "utility" function which includes additional factors such as:
- Reputation loss
- Risk of losing existing customers
Worth considering as an additional analytic tool?
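The search for the lowest total cost can be sketched as below. Everything beyond the poster's 64 units, 5,000 FIT and 90-day repair time is an assumption: the cost figures are made up, the warranty period is taken as 5 years, and each failure that finds no spare is assumed to cause an outage lasting one return time. The function names are hypothetical.

```python
def out_of_spares_prob(load, spares):
    """Steady-state probability that all spares are in repair (assumed model)."""
    weight, total = 1.0, 1.0
    for k in range(1, spares + 1):
        weight *= load / k
        total += weight
    return weight / total


def total_cost(spares, failure_rate, return_hours, warranty_hours,
               spare_cost, outage_cost_per_hour):
    """Cost of buying spares plus expected outage cost over the warranty."""
    load = failure_rate * return_hours   # mean number of units in repair
    uncovered = (failure_rate * warranty_hours
                 * out_of_spares_prob(load, spares))
    # Outage cost is proportional to outage length, as the poster assumes.
    outage = uncovered * return_hours * outage_cost_per_hour
    return spares * spare_cost + outage


# Hypothetical inputs: 4 sites x 16 units at 5,000 FIT, 90-day returns.
params = dict(failure_rate=64 * 5000e-9, return_hours=90 * 24,
              warranty_hours=5 * 8760, spare_cost=100.0,
              outage_cost_per_hour=1.0)
costs = {n: total_cost(n, **params) for n in range(0, 11)}
best = min(costs, key=costs.get)
print("lowest total cost at", best, "spares")
```

Scaling `outage_cost_per_hour` by the number of circuits affected would cover the second proportionality assumption. Rerunning with perturbed inputs is a quick way to see how far the minimum moves.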
Conclusions / ways to reduce costs
1. Calculations should be done with "expected" FITs
2. If practical, share spares between sites
3. Seek faster return times
4. Consider overall costs of operation