! Reliability of Complex Networks ! Louis JM Aslett ([email protected]) and Simon P Wilson ([email protected]) Trinity College Dublin !""#$%&'#"()*+,,'"-(&.+(/0%1& !45/67(/308#9'*0:(;1'<%3(=>&.(/+8&+0 ! "#! $#%! &'()! )#! *+,-+! ).+! #//#0)%(,),+*! #11+0+2! 3$! 4 9.'77+(8+*:! !! ;+'77$! %(2+0*)'(2! ).+! /#)+(),'7! 1#0! $#%0! 3%*,(+**! 49#(#6$:!! ! =+'0!7+'2,(8!,()+0('),#('7!*/+'>+0*!#(!).+*+!)#/,9*!*%9. ! • "0@!;#3+0)!A@!",B#(C!D+'2+0!#1!).+!E7,6')+!E.' ).+!G7#3'7!4(5,0#(6+()!H'9,7,)$!IG4HJ?!! • K()+7!*%*)',('3,7,)$!8%0%C!<)+/.+(!='0/+0?!! • L,90#*#1)M*!E7#%2!E#6/%),(8!+B/+0)!"+30'!E.0'/ 1. The Problem 2. Motivating Examples 3. Research Questions Telecommunications networks are being utilised ever more by governments, businesses and individuals to conduct mission critical activities. This increasing reliance on telecoms infrastructure creates a strong demand for highly reliable networks which suffer minimal outages: any outage can have potentially serious economic impact on both network providers and users. Hardware/software interaction: 15th January 1990, telephones across the United States were inoperable for most of the day due to a software bug in AT&T’s network causing a cascading switch failure. The ultimate focus is on modelling the reliability of a complex physical network where failure occurs through the dependence or interaction between its components – hardware, software and human – that can cause a small failure in the network to escalate to a serious outage. Bayesian networks (BN): enable efficient modelling of dependence in complex systems. Components Fans Radio Installation Cable Components Antenna Temperature Supply ! I. improving models for hardware K;E<4FC! ,(! 9#77'3#0'),#(! &,).! <%*)',('37+! 4(+08$! K0 L,90#*#1)C! KOF4D! '(2! ).+! P<! 463'**$! ,(! K0+7'(2! &,77! QRRS! <$6/#*,%6! )#! *.#&9'*+! ).+! /#)+(),'7! #1! G0++(! *+9)#0C! '*! &+77! '*! ).+! )'7+()! '(2! /#)+(),'7! #1! ).+! . *%//#0)+2!3$!K;E<4F!'(2!<4K@!! II. modelling hardware/software interactions III. accounting for human error • 1,174 flights cancelled and 85,000 passengers disrupted Figure 1: AT&T Global Network Telecom Importance: In 2004 the FCC ruled that carrier outage reports be kept secret in the interests of “national defence and public safety”, due to the government’s view of the critical importance of reliable telecom networks. IV. unified model forF.,*!,*!'(!,2+'7!#//#0)%(,)$!1#0!'($#(+!&.#!&,*.+*!)#!8 hardware / software / hu#1! ).+! )0+(2*! '(2! ).+! )'7+()! ).')! &,77! 20,5+! ).+! man interactions #//#0)%(,)$! ,(! K0+7'(2@! ! F#! 0+8,*)+0! 1#0! ).,*! +B9,),(8! L9H++7$!')!7691++7$V,09*+)@,+! ! Work has been underway less than a year and is currently focused on I/II. !"#$%&' "#$%&!'(%()#*&!+,-.*$/!0,#!1*$(.*(!2.3$.((#$.3!).4!5(*&.,/,36! ()*+,-'./-0 5. Proposed Solution A BN is composed of a qualitative part (a directed acyclic graph, see diagram above) and quantitative part (conditional probability distributions). Parts of the BN model may be specified, say through observed data, and the model updated. Markov processes: provide a natural way to model redundant systems which can be repaired without affecting service. For example: Full Operation Duplex Detected Failure one unit failed System Down We have decided on the following milestones: • 80% of Federal Aviation Authority communication between New York airports blocked Power Amp Service Uncovered Failure primary failed System Down N(2! 6++)! ).+! (+&! 8+(+0'),#(! #1! 0+*+'09.+0*! &.#! &,7 80#&).!'(2!).+!0+)%0(!)#!/0#*/+0,)$:!! • 5 million blocked calls • Kennedy, La Guardia and Newark airports closed for 4 hours 4. Current Techniques Quality Human error: 17th September 1991, all of New York lost service for 8 hours. Power was lost, primary backup systems failed and engineers failed to respond to emergency alarms in time before secondary power was exhausted. The effect was immense: Covered Operation one unit failed Simplex Failover failed one unit failed System Down Uncovered Operation backup failed Simplex No Operation duplex failure System Down A redundant system may start fully operational, become ‘degraded’ when there is a covered failure and return to full operation after repair without service actually being affected. The arcs in a Bayesian network represent ‘causality’, so there is no natural graph structure to explicitly represent a repairable redundant system because the arcs of a Markov process do not carry the same interpretation. On the other hand, Markov processes are distributionally inflexible with all failures assumed to be exponential, so extending this approach to a unified hardware/software model is not possible because software reliability is highly non-exponential. We propose the use of Phase-type (PH) distributions to overcome these current limitations evident in the literature. A PH-distribution models the time until entering an absorbing state of a continuous-time Markov process. For example, given the matrix of transition rates S s0 0 0 ! = λ11 λ21 λ31 0 λ12 λ22 λ32 0 λ13 λ23 λ33 0 λ14 λ24 λ34 0 the time to entering state 4, which represents failure of a repairable redundant sub-system, has density fT (t) = αT exp(St)s0 where α is the vector of initial state probabilities and exp is the matrix exponential operator. This novel approach of using PH-distributions as the conditional probability distributions in a Bayesian network enables embedding an entire Markov process as a single node within the BN, improving the modelling accuracy of redundant repairable sub-systems and moreover maintaining the inferential and distributional flexibility afforded by BNs. In addition, PH-distributions present interesting inferential challenges in their own right, since there is no closed form expression for the likelihood n ∞ j Y X (St ) i T s0 L(λ | t1 , . . . , tn ) = α j! i=1 j=0 Thus at the time of writing, the task being addressed is application of the EM-algorithm within BN inference, or use of alternative techniques to incorporate PH-distributions into the framework. Funding The authors would like to acknowledge the kind support of IRCSET, who awarded Louis Aslett a postgraduate scholarship to pursue his PhD.
© Copyright 2026 Paperzz