What are the Chances of That? A Primer on Probability and Poisson Processes for Earthing Designers

I. Griffiths, D. J. Woodhouse and B. Pawlik
Safearth Consulting
Warners Bay, NSW, Australia
Email: igriffi[email protected]

Abstract—Earthing design methodologies based around risk are growing in importance as they provide a better way to allocate limited resources and funding. Appropriate application of these processes requires the earthing engineer to have a sound comprehension of probability theory, upon which the processes are built. This paper reviews the basics of Poisson processes and the statistical methods and probability distributions used in their analysis. Examples and explanations are provided of how Poisson processes are used in the context of earthing, including lightning protection, to model the occurrence of power system faults and interactions with the general public. Particular attention is given to some counter-intuitive aspects that may lead a designer, ignorant of the subtleties of probability theory, to draw incorrect conclusions, potentially resulting in over-engineering or even irresponsible designs.

Index Terms—Risk, earthing, grounding, safety, coincidence, Bayesian statistics, frequentist statistics.

I. INTRODUCTION

The last decade has seen an ongoing revolution in power system earthing centred around the application of risk management principles to the earthing design process. This has increased the complexity for practitioners, as many of these risk management principles are rooted in statistics and probability, fields of mathematics where our intuition is often misleading or incorrect. This paper aims to assist practitioners in understanding some of the mathematical basis for the risk management principles, and also the assumptions and limitations that must be kept in mind when applying them to earthing design.

In Section II we recap some of the fundamentals of statistics and probability, focussing on the aspects that are most relevant to the risk-based earthing design process. In particular, we examine the different approaches to statistics which are now in use, specifically the classical frequentist approach and the more recent Bayesian approach. In Section III we look at how Bayesian statistics can be applied to legal opinions, and specifically how a rigorous mathematical treatment can lead to results that appear to be at odds with our expectations, despite being explicitly based on those expectations and assumptions. While this paper does not aim to be a complete discussion of applying quantified risk management techniques to earthing design, one aspect of quantified earthing risk, the probability of coincidence (i.e. the probability that a person contacts an item and receives an electric shock as a result of an earth fault related voltage hazard), is examined in Section IV to illustrate how the general statistical principles are applied to a specific earthing design problem.

II. BASICS OF PROBABILITY AND STATISTICS FOR RISK QUANTIFICATION

Any discussion of risk management is necessarily a discussion about probability, which can be problematic because in general our 'common sense' or 'intuition' is flawed, or even misleading, when it comes to assessing and understanding probability. The so-called 'Monty Hall' problem [4] is a classic example where the rigorous mathematical analysis clearly points to a conclusion that is entirely contrary to our intuition, and it can take significant effort to overcome our preconceptions.
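Simulation offers a quick sanity check where intuition fails. The following minimal Python sketch, not part of the analysis in [4], estimates the win rates of the 'stay' and 'switch' strategies in the Monty Hall game; the function name and trial count are ours, chosen purely for illustration.

```python
import random

def monty_hall_trial(switch: bool) -> bool:
    """Play one round of the Monty Hall game; return True if the player wins the car."""
    doors = [0, 1, 2]
    car = random.choice(doors)
    pick = random.choice(doors)
    # The host opens a door that hides a goat and is not the player's pick.
    host_opens = random.choice([d for d in doors if d not in (pick, car)])
    if switch:
        # Switch to the single remaining closed door.
        pick = next(d for d in doors if d not in (pick, host_opens))
    return pick == car

trials = 100_000
for switch in (False, True):
    wins = sum(monty_hall_trial(switch) for _ in range(trials))
    print(f"switch={switch}: win rate ~= {wins / trials:.3f}")
```

Running this shows switching wins roughly two thirds of the time, matching the analytical result and contradicting the common intuition of a 50/50 split.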
Perhaps it is not surprising that the common understanding of probability is somewhat lacking, since even mathematicians have not yet agreed on what a probability actually is. There are two main views1 on the mathematical understanding of probability: the frequentist and Bayesian interpretations. Broadly speaking, the frequentist (or objective) view is that a probability is a measure of the relative frequency of a particular outcome in a large number of trials, whereas the Bayesian (or subjective) view is that a probability is a measure of the certainty of that outcome occurring in a given trial. This distinction may seem trivial; however, there are many subtle but significant differences that arise. Fortunately many of the subtle differences are not critical for application to earthing, and in this paper we take a probability to be a number between 0 and 1 which indicates our belief that a particular event will occur. A probability of 1 indicates the event will almost surely2 occur, and a probability of 0 indicates the converse.

1 These are by no means the only interpretations of probability!
2 The precise difference between an event that almost surely occurs and one that surely occurs is another fascinating aspect of probability theory that is largely irrelevant for our application.

Typically the events we assign a probability to involve a random variable taking on a particular value of interest. A random variable is much like any other algebraic variable, with the important distinction that its value is subject to some element of randomness and is therefore not as well defined or behaved. Building on this we can define a stochastic process (or random process) as any process (or sequence of operations) that depends on at least one random variable. The counterpart to this is a deterministic process, where no random variables are involved. Whereas a deterministic process will always produce the same output for a given set of inputs, a stochastic process may not, because the random variables may take on different values in different trials.

In engineering and quantified risk management applications it is common for a deterministic process to form the core of an assessment, and then to use random variables as inputs, with various statistical techniques to assess the likely outcome or risk. For instance, the stresses on various members of a bridge can be calculated using equations from basic physics, and failure could be said to occur if the calculated stress on a member exceeds its strength. Both of these processes are deterministic. However, by treating the strength of the members as a random variable, statistical analysis may be performed to estimate the probability of a failure under various conditions, or for different materials.
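A minimal sketch of this pattern follows, assuming a hypothetical linear stress model and a normally distributed member strength; all figures and function names are invented for illustration, not taken from any real structural analysis.

```python
import random

def member_stress(load_kn: float) -> float:
    # Deterministic core: identical inputs always give identical stress.
    # A hypothetical linear model standing in for real structural equations.
    return 0.8 * load_kn

def member_fails(load_kn: float, mean_strength: float, sd_strength: float) -> bool:
    # Stochastic input: strength sampled from an assumed normal distribution.
    strength = random.gauss(mean_strength, sd_strength)
    return member_stress(load_kn) > strength

trials = 100_000
failures = sum(member_fails(100.0, 100.0, 10.0) for _ in range(trials))
print(f"estimated failure probability ~= {failures / trials:.4f}")  # ~0.023
```

The deterministic core always returns the same stress for a given load; only the sampled strength varies between trials, and the failure probability emerges from repeated sampling.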
A. Random Variables and Probability Distributions

Random variables would not be particularly useful if all that could be said of them was that their values were inaccurately or poorly defined and that they involved some level of randomness. As might be expected, there are a range of mathematical techniques for analysing problems involving random variables. Chief among them is the probability distribution, which describes the nature of a random variable's range of behaviour.

On first impression it may seem strange to discuss different types of randomness, since our common intuition when some value is described as being random is to assume that this means all values are equally likely, but this is just another example of our intuition about probability being incorrect. Consider the example of a person's height: all possible values (every positive real number) are obviously not equally likely, some values are more likely to occur than others, and a probability distribution describes how likely each value is to occur. There are a great many probability distributions used in quantified risk management applications, and while a full discussion of the applicability of various distributions is beyond this paper, some of the most common ones are illustrated in Table I.

Table I. COMMON PROBABILITY DISTRIBUTIONS

Distribution       | Illustration      | Notes
Normal (Gaussian)  | (plot not shown)  | range: −∞ ≤ X ≤ ∞
Log-normal         | (plot not shown)  | range: 0 ≤ X ≤ ∞; example: adult weight, a CDF describing a human physical characteristic

Probability distributions can be characterised in a number of ways, but the probability distribution3 function (PDF) and cumulative distribution function (CDF) are two of the most important. The PDF is a function which assigns a probability to every possible value of a random variable, and the CDF is a function which returns the probability that the random variable takes on a value less than or equal to the function argument. It follows from these definitions that the area under the PDF must be 1, and the CDF is simply the integral of the PDF.

3 More correctly called a probability density function for continuous random variables.

B. Conditional Probability

When there are multiple random variables, more interesting questions may be asked, such as: what is the probability of events A and B both occurring, or what is the probability of A occurring given that B has occurred, and many more. The first of these is called a joint probability, denoted P(A, B), and the second is a conditional probability, denoted P(A|B). The vertical line in P(A|B) may be interpreted as 'given'. These probabilities are related through equation (1).

P(A|B) = P(A, B) / P(B)    (1)

Conditional probability provides yet another example of how our intuition leads us astray: in general P(A|B) ≠ P(B|A), and in fact they may be vastly different in magnitude. Bayes' Theorem [7, 9], see (2), describes the actual relationship.

P(A|B) = P(B|A) P(A) / P(B)    (2)

Here P(A) and P(B) are respectively the probabilities of A and B occurring with no consideration of the other, so P(A|B) can only be equal to P(B|A) if P(A) = P(B), and of course there is generally no reason for this to be true. Mistaking one conditional probability for the other is known as the conditional probability fallacy, or 'confusion of the inverse', and our mistaken intuition is the cause of the so-called false-positive paradox.

Bayesian statistics is a relatively new addition to the field of statistics. While Bayes's original essay [1] was published in 1763, his work was largely ignored until 1937, when De Finetti [3] reconsidered the treatment of the problem. It was then not until the 1950s that the Bayesian movement was spearheaded by Good [5], Savage [10] and Lindley [6]. It is also at this time that the term 'Bayesian' came into common use by statisticians, and in 1992 the International Society for Bayesian Analysis was founded.
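The false-positive paradox mentioned above is easy to reproduce numerically. The following sketch applies equation (2) to a hypothetical screening test; all of the rates are assumed values chosen for illustration only.

```python
# Assumed rates for a hypothetical screening test (illustrative only).
p_a = 0.01                # P(A): base rate of the condition
p_b_given_a = 0.95        # P(B|A): probability of a positive test given the condition
p_b_given_not_a = 0.05    # P(B|not A): false-positive rate

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A).
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Bayes' Theorem, equation (2): P(A|B) = P(B|A)P(A) / P(B).
p_a_given_b = p_b_given_a * p_a / p_b

print(f"P(B|A) = {p_b_given_a:.3f}")   # 0.950
print(f"P(A|B) = {p_a_given_b:.3f}")   # ~0.161
```

Even with a 95% true-positive rate, the low base rate means a positive result corresponds to only about a 16% chance of the condition being present: P(A|B) and P(B|A) differ here by nearly a factor of six.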
C. Poisson Process

A Poisson process is a random process that describes the distribution of points in a mathematical space: in this application, the positive number line representing time. The underlying assumption is that the points are distributed with some average rate that is independent of other points and of location.4 Poisson processes are widely applied in many fields, from modelling lightning [8], to phone calls, to radioactive decay: anywhere the assumption of an average event rate and independent events is reasonable. There are a number of well known results relating to Poisson processes, for example the number of points in a region of time/space is a random variable with a Poisson distribution, and the 'time' between points is an exponentially distributed random variable. The remainder of this paper will investigate some of these results and examine some aspects where our intuition again leads us to draw incorrect conclusions.

4 This more correctly describes the special case of a homogeneous Poisson process.

D. Poisson Distribution

Despite the similarity in names, the Poisson distribution is a very different thing to the Poisson process: whereas the process is responsible for generating events, the distribution is more aligned with counting events, since it describes the probability of observing some number of events in a given region on the timeline. The PDF for a Poisson distribution is

P(N = k) = λ^k e^{−λ} / k!    (3)

where N is the random variable representing the number of observed events in some interval, and λ is the average number of events over the interval. The parameter λ is often described as the average rate, and one interesting aspect of the Poisson distribution is that its expected value is also λ. Sometimes the general form shown in (3) may be modified to allow for explicitly specifying the rate and observation period, for example taking λ to be the 'average rate per unit time' and T to be the number of time units under consideration, then substituting λT for λ.

One interesting question that may be answered using the Poisson distribution is: what is the probability of there being no events in the interval under consideration? This can be calculated by substituting k = 0 into (3) to arrive at P(N = 0) = e^{−λ}. By extension we can also calculate the probability of there being any number (i.e. one or more) of events using the law of total probability: P(N ≠ 0) = 1 − P(N = 0) = 1 − e^{−λ}.

III. EXAMPLE: LEGAL SCENARIO

Let us consider the following scenario, based on an example given by Crilly [2]. It rests on the presumed ability of a jury to judge guilt or innocence based on the balance of probabilities, and plays out as follows:
• A juror has just heard a case in court and decided that the probability of the accused being guilty is about 1 in 100, i.e. 1%.
• During deliberations the jury is called back to the court to hear further evidence from the prosecutor: a weapon has been found at the defendant's house.
• The prosecutor claims that the probability of finding the weapon at the defendant's house is as high as 95% if the defendant is guilty, but if he is innocent then the probability of finding the weapon would be only as high as 10%. An interesting use of embedded secondary probabilities by the prosecutor!

The question for the juror now is to decide how their opinion of the defendant should change in light of this new evidence.
To assist the juror in making this re-evaluation of their assessment, we represent the event that the defendant is guilty by G and the event describing the receipt of the new evidence by E. The re-evaluation could be undertaken as follows:
• The juror has made an initial assessment that P(G) = 0.01. This probability is referred to as the prior probability or assessment.
• The re-assessed probability, P(G|E), is the revised probability of guilt given the new evidence E. This is called the posterior probability.
• We can calculate the posterior probability P(G|E) from the prior probability P(G) using Bayes' formula, as per (4).

P(G|E) = P(E|G) P(G) / P(E)    (4)

• We can calculate the probability of having the evidence, P(E), as the sum of the probability of having the evidence and being guilty and the probability of having the evidence and not being guilty, based on the total probability law (see eq. 2-41 of [7]):

P(E) = 0.95 × 0.01 + 0.1 × 0.99    (5)

• The reassessed value for the probability of being guilty is then

P(G|E) = (0.95 × 0.01) / (0.95 × 0.01 + 0.1 × 0.99) ≈ 0.088    (6)

This will present a quandary for the juror, as their initial assessment has now risen almost tenfold, but even so, the probability of the defendant truly being guilty is still less than 10%. If the prosecution had made a greater claim, that the probability of finding the incriminating evidence was as high as 0.99 if the defendant is guilty and only 0.01 if the defendant is innocent, then the juror would have to revise their opinion to a 50% probability of guilt.

Using Bayes' Theorem in such a manner has been criticised. The leading criticism is how one arrives at the prior probability. In its favour, Bayesian analysis presents a means of dealing with subjective probabilities and updating expectations based on evidence.

IV. EXAMPLE APPLICATION: COINCIDENCE PROBABILITY

To illustrate how the earthing practitioner might avoid the traps of mistaken intuition and successfully apply sound statistical principles to quantified risk assessment, we will consider the example of coincidence probability. As the name suggests, multiple random variables are involved: the times during which a person contacts some item, and the times during which that item poses a voltage hazard as a result of an earth fault; the events of interest are the times during which both contact and hazard are concurrent. Estimating the coincidence probability requires models for both; EG-0 suggests that Poisson processes are suitable.

By considering a practical example we can expose another counter-intuitive result. Assume a particular power system has a historical average fault rate of 1 fault per year: what is the probability that a fault will occur in the next year? Intuition suggests the probability should be close to 1, since there is 'a fault every year, on average'. Using the Poisson distribution we can see that the probability of exactly one fault occurring is actually

P(N = 1) = (1^1 e^{−1}) / 1! ≈ 0.368

But that is not the whole story, as there is some chance there might be 2, or 3, or more faults in the next year. If we calculate the probability of one or more faults occurring we get

P(N ≠ 0) = 1 − e^{−1} ≈ 0.632

So in fact, with an average rate of 1 fault per year, the chance of a fault occurring in the next year is about 63%, and there is about a 37% chance that there will be no faults at all!
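These figures are easy to verify with a few lines of code. The following sketch simply evaluates equation (3); the function name is ours, chosen for illustration.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    # Equation (3): P(N = k) = lam^k * e^(-lam) / k!
    return lam ** k * exp(-lam) / factorial(k)

lam = 1.0  # one fault per year on average
print(f"P(N = 0)  = {poisson_pmf(0, lam):.3f}")      # e^-1 ~= 0.368
print(f"P(N = 1)  = {poisson_pmf(1, lam):.3f}")      # also ~= 0.368
print(f"P(N >= 1) = {1 - poisson_pmf(0, lam):.3f}")  # ~= 0.632
```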
Furthermore, if a Poisson process really is an adequate model for the occurrence of faults, then the distribution of faults is independent of time and of the occurrence of previous faults, so even if there were no faults last year there is still a 37% chance there will be no faults in the next year.

If we introduce the second Poisson process (representing the times during which a person contacts some item) to our example, we can illustrate another fallacy. Let us take the average fault rate as λf and the average contact rate as λc. It is tempting to assume that the coincidence probability is (at least close to) λf × λc, but as we have just seen, the probability of a fault occurring in a given year is different from the average rate.5 Even if we were to correctly calculate both P(Nf ≠ 0) and P(Nc ≠ 0), that is, respectively, the probability of a fault or a contact occurring in a given year, multiplying those probabilities together would not be representative of the desired coincidence probability. As the fault and contact Poisson processes are independent, multiplying these probabilities together is equivalent to calculating a joint probability, P(Nf ≠ 0) × P(Nc ≠ 0) = P(Nf ≠ 0, Nc ≠ 0). In words, this is the probability of any number of faults occurring in the same year as any number of contacts have also occurred, which tells us nothing about the probability of a specific fault being coincident in time with a specific contact.

5 Even for λf < 1; for example, if λf = 0.1 then P(Nf ≠ 0) = 1 − e^{−0.1} ≈ 0.095.

A further complication in the calculation of coincidence probability is the fact that Poisson processes are defined as generating 'points', which are equivalent to instantaneous events on a timeline. As such, it makes no sense to discuss the duration of an event in a pure Poisson process, since it is infinitesimally short; however, in our application we know that both faults and contacts have a finite duration. To apply Poisson processes to modelling faults and contacts we typically assume that events from the Poisson process correspond to the start of the fault/contact. Calculating the coincidence probability may be thought of as calculating the probability of a fault occurring within the duration of a contact, and vice versa. As we have seen, calculating the probability of an event occurring in some time-frame is actually done using the law of total probability and the probability of there being no events in that time-frame. So, to calculate the coincidence probability we first consider the case where a fault has occurred, then calculate the probability of no contact in order to calculate the probability of a contact occurring during the fault; the whole process is then repeated for the converse case of a contact occurring first.

In order to make applying this complicated statistical analysis easier, EG-0 provides a simple approximation for the coincidence probability:

Pcoinc = fn × pn × (fd + pd) × T / (365 × 24 × 60 × 60)    (7)

where fn and pn are the average rates of faults and contacts respectively, fd and pd are the durations of faults and contacts respectively, and T is the number of years under consideration. For many practitioners and applications this approximation might be adequate, but by understanding the full analysis we can see which aspects have been simplified, so we can understand the boundary conditions and make informed decisions about when that approximation is appropriate.
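As a sketch of how (7) might be implemented, the following code evaluates the EG-0 approximation for assumed inputs (the function name, rates and durations are ours, for illustration only) and also quantifies the exponential linearisation examined next.

```python
from math import exp

SECONDS_PER_YEAR = 365 * 24 * 60 * 60

def pcoinc_eg0(fn: float, pn: float, fd: float, pd: float, t_years: float = 1.0) -> float:
    # Equation (7): fn, pn in events per year; fd, pd in seconds; t_years in years.
    return fn * pn * (fd + pd) * t_years / SECONDS_PER_YEAR

# Hypothetical inputs: 1 fault/year lasting 0.5 s, 100 contacts/year lasting 4 s.
print(f"Pcoinc ~= {pcoinc_eg0(fn=1.0, pn=100.0, fd=0.5, pd=4.0):.2e}")

# The linearisation 1 - e^(-lam) ~= lam behind (7), checked at lam = 0.01:
lam = 0.01
print(f"relative error in P(N >= 1): {lam / (1 - exp(-lam)) - 1:+.3%}")  # ~ +0.50%
print(f"relative error in e^(-lam):  {(1 - lam) / exp(-lam) - 1:+.4%}")  # ~ -0.005%
```

The last two lines reproduce the error figures discussed in the footnote below: the same absolute error is negligible relative to e^{−λ} but substantial relative to the much smaller probability 1 − e^{−λ}.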
For instance, one of the simplifications made to reach (7) is to use the following relationship:

e^{−x} ≈ 1 − x

which is reasonably accurate so long as x is small. When applied to the calculation of the probability of one or more events, the simplification results in

P(N ≠ 0) = 1 − e^{−λ} ≈ λ.

The condition on the validity of this approximation is that λ is small,6 meaning the expected number of events in the time period under consideration is small. While this simplification might make it easier to perform the coincidence probability calculations 'by hand', there is very little difference between the difficulty of implementing the 'full solution' and the approximation in computer code, or even a spreadsheet.

6 EG-0 states the error for λ = 0.01 is 0.005%, but this only considers the approximation of e^{−x} in isolation. When using this approximation to calculate the probability of one or more events, the error in the estimated probability is actually 0.5%. This may seem surprising, but while the absolute error is the same in both cases, the relative error is bigger because P(N ≠ 0) is smaller than e^{−λ} in this case.

While a full analysis of the calculation of coincidence probability is beyond the scope of this paper, work by the authors suggests that when either the rates associated with contacts and faults are equivalent, or the contact and fault durations are equivalent, the probability of coincidence published in EG-0 is a reasonable approximation. However, where these conditions fail, the probabilities of coincidence given by the published approximation do not match the results of the alternative expression derived from first principles. The authors hope to publish this alternative expression for the probability of coincidence in the near future, and to further explore the discrepancies from EG-0 in another paper.

V. CONCLUSION

Probability and statistical methods can often be complex and counter-intuitive, and correctly applying these mathematical tools to earthing design takes a thorough understanding of the fundamentals. There are many appealing 'simplifications' that might tempt inexperienced practitioners, but often these simplifications are fundamentally flawed and in the worst cases may be meaningless; for instance, it is not appropriate to simply multiply rates together to estimate coincidence probability. Significant effort is invested by experts into developing standards and guides such as EG-0, and while practitioners will always strive for continual improvement and updated processes as new work is performed, caution is needed when moving away from industry best practice and published procedures. In the spirit of progressing industry best practice, the authors are presently preparing a more in-depth paper on the probability of coincidence that includes a derivation of Pcoinc from first principles, which will hopefully lead to a refinement of the expressions provided in EG-0 that more accurately characterises the probability of coincidence, improving the outcomes of earthing related risk assessments.

REFERENCES

[1] Thomas Bayes. A letter from the late Reverend Mr. Thomas Bayes, FRS to John Canton, MA and FRS. Philosophical Transactions (1683-1775), 53:269–271, 1763.
[2] Tony Crilly. 50 Maths Ideas You Really Need to Know. Hachette UK, 2008.
[3] Bruno De Finetti. La prévision: ses lois logiques, ses sources subjectives. In Annales de l'institut Henri Poincaré, volume 7, pages 1–68, 1937.
[4] Richard Gill. Monty Hall problem.
Mathematical Institute, University of Leiden, Netherlands, pages 10–13, 2011.
[5] I.J. Good. Probability and the Weighing of Evidence. Charles Griffin and Company, 1950.
[6] D.V. Lindley. Introduction to Good (1952) Rational Decisions. In Breakthroughs in Statistics, pages 359–364. Springer, 1992.
[7] Athanasios Papoulis and S. Unnikrishna Pillai. Probability, Random Variables, and Stochastic Processes. Tata McGraw-Hill Education, 2002.
[8] N.I. Petrov and F. D'Alessandro. Verification of lightning strike incidence as a Poisson process. Journal of Atmospheric and Solar-Terrestrial Physics, 64(15):1645–1650, 2002.
[9] Sheldon M. Ross. Introduction to Probability Models. Academic Press, 2014.
[10] Leonard J. Savage. The Foundations of Statistics. John Wiley and Sons, 1954.

Dr Ian Griffiths is the Lead Engineer of Safearth's Products Team, and oversees the development of both software and hardware. He received his BE(Comp.) from the University of Newcastle (Australia) in 2005, and his PhD from the same institution in 2014 for work investigating applications and implementations of Markov Chain Monte Carlo methods in multi-antenna wireless communications systems. Since joining Safearth in 2011, Ian has gained experience in power system grounding across a range of applications including power utilities, mining, rail, industrial sites, and pipelines. Particular areas of interest include applying statistical methods to the assessment of grounding related risk, along with the assessment and development of safety criteria and standards for the effective management of such 'High Impact, Low Likelihood' events. This has included contributing to the review of a number of grounding related Australian Standards, most recently AS2067.

Dr Darren Woodhouse is a Principal Engineer with Safearth Consulting. He received his BE(Elec.)(Hons I) (1993), BMaths (1994) and PhD (2004) from the University of Newcastle, Australia. In 1992 what was then Shortland Electricity instigated a business unit known as Safearth Engineered Solutions, or Safearth, of which Darren was one of the original members. In the ensuing years that group grew to over 30 staff, and in 2008 it changed its name to Network Earthing. During that time Darren was involved in consulting, training and R&D. Before joining Safearth Consulting in 2011 as Principal Engineer, Darren's roles included Development Manager and two years as Acting Principal Consultant. For over 20 years Darren has investigated and managed the risks associated with earthing, lightning protection and interference. The more notable projects include investigations of the Snowy Mountains Hydro Scheme, Pacrim West OFC interference, CLP interference in Hong Kong and a number of forensic investigations. Darren has delivered formal earthing training, including over 20 earthing short courses for the ESAA/ENA across Australia, New Zealand and Asia, and has presented at numerous conferences including CIGRE and the IEEE. Darren recently co-delivered an appendix for AS2067 on earthing system testing and is the editor of the ENA Working Group tasked to publish on risk to telecommunications assets from power system earthing hazards.

Brent Pawlik is an Engineer with Safearth Consulting, specialising in earthing. He has expertise in earthing system design, auditing and testing of power system assets and conductive third party infrastructure. He received his BE(Elec.)(Hons 1) (2005) from the University of Newcastle, Australia.
Brent was introduced to the profession of earthing during his time on the EnergyAustralia University of Newcastle Industrial Scholarship Scheme (EA UNISS) in 2001, where he first worked for Safearth. Since then Brent has spent time at EnergyAustralia in the sections of Overhead Mains Design, Customer Service, and Network Engineering. Between 2006 and 2011 Brent took a break from the engineering profession to pursue other interests in Okinawa, Japan. Brent returned to Safearth in 2012 and is presently undertaking postgraduate studies in conjunction with Safearth to further the field of earthing system engineering.