MA/CS 109 The Art and Science of Quantitative Reasoning

Probability

Our primary topic for the three-week statistics module in this class will be inferential statistics. Inferential statistics is concerned with making precise quantitative statements about a population, given data on that population, while acknowledging the inherent uncertainty in the data. This uncertainty may be an unavoidable aspect of our data collection or, alternatively, may be purposely incorporated into the design of a study. In either case, in order to succeed in making precise quantitative statements we require a language and associated mathematical machinery for quantifying the uncertainty we face. For this purpose we employ probability.

We will spend the first week of our module laying a modest foundation in elementary probability. Although the topic is both broad and deep, even at a foundational level, our treatment will necessarily be minimal. Our goal here will be simply to gain some basic understanding of what probabilities are, how they behave, and how to interpret them. An understanding of these issues will, in turn, greatly facilitate our discussion of select topics in inferential statistics in the second and third weeks of our module. Probability will also be seen, in a quick detour back to computer science, to be central to certain types of cryptography, where its usage is purposeful and – to any of us relying on password encryption – beneficial.

1 Background

1.1 Randomness

Recall the fundamental diagram that has guided much of our inquiry so far in this class.

Figure 1: Fundamental quantitative reasoning diagram, linking Real-World, Modeling/Abstraction, and Analysis/Solution.
What most distinguishes the material we will cover in the statistics module of the course from that covered in the previous two modules, on mathematics and on computer science, is (i) the introduction of the concept of randomness as an abstraction for our uncertainty in problems, and (ii) the development of tools from probability and statistics for analysis/solution in problems involving uncertainty.

In this class, when we refer to a phenomenon as being random we will mean that it is (seemingly) unsystematic – that is, without order, direction, or regularity, and therefore unpredictable. The qualification 'seemingly' is included in this working definition because the nature of the information we have on a phenomenon often directly impacts the extent to which we can make effective predictions concerning its behavior. For example, mathematical models of growth and change are used to predict the weather with great accuracy in the short term (e.g., on the order of hours to days) but still have little predictive power longer term (e.g., on the order of weeks and beyond). Essentially, there are too many factors, either unknown to us or whose effects are known with too little accuracy, that impact the weather relatively little over smaller timescales but quite substantially over larger timescales.

Alternatively, consider the type of encryption that is employed in our everyday usage of the Internet. For a third party, perhaps seeking to act maliciously towards you or your computer (e.g., steal credit card information, hijack your laptop, etc.), lack of knowledge of the underlying encryption key is sufficient (hopefully!) to render the bits encoding your content essentially random from his/her perspective. But, obviously, with knowledge of this information, such security would be sacrificed.

Thus the notion of randomness is a conceptualization that we will use for capturing the non-systematic part of what we are facing in a problem.
Probability is an attempt to formalize the language associated with, and in particular the mathematical treatment of, randomness.

1.2 Some Basic Terminology

Given a random phenomenon, an actual realization of that phenomenon is called an outcome. The set of all possible outcomes is called the sample space.

The canonical example of a random phenomenon, and the one we will focus on in lecture, is that of a random coin flip. Since a coin, when flipped, can come up either heads (H) or tails (T), those are the only two possible outcomes, and the sample space is simply the set {H, T}. Alternatively, suppose that the phenomenon of interest to us is the number of times your favorite celebrity 'twitters' tomorrow. Then the outcome is the actual number of twitters (whatever that ends up being), and the sample space is the set {0, 1, 2, 3, . . .}.

As a final example, and one which we will revisit throughout these notes, consider the action of rolling a pair of dice.

Example 1. (Roll the Dice!) The rolling of a die as a mechanism for inducing uncertainty goes back thousands of years. The mathematical study of the chances of outcomes associated with die rolls goes back centuries, and in fact is the context within which some of the earliest work on what is now called 'probability' was done. Although the rolling itself, and hence its outcome, clearly must be governed by the laws of physics, the end result of a die well-rolled on a flat surface remains beyond our ability to predict with any real accuracy.

Suppose that you have a pair of dice and that you plan to roll them and to report the value that comes up on each. Then the outcome is the pair of numbers that you report. The sample space is the set of all possible pairs of numbers. It will be useful for our thinking and calculations to organize the sample space in the form of a table, as shown in Figure 2.
Here the first number (shown in blue), which matches the row number, corresponds to the number reported for the first die, while the second number (shown in red), which matches the column number, corresponds to the number reported for the second die.

Figure 2: Table representing the sample space associated with the throwing of two dice.

An outcome is the actual realization of a random phenomenon. But in practice our interest often is not necessarily in a specific outcome itself, but rather in a more non-trivial characterization of that outcome. For example, in a sequence of 10 coin flips we may be interested in whether or not there were more heads than tails. Alternatively, in monitoring the number of twitters by your favorite celebrity tomorrow, we may only be interested in whether there were more than, say, 25. Formally, these are examples where we are interested not in whether a single outcome occurred or not, but instead in whether or not the outcome observed was in a certain subset of the sample space. Such subsets are called events. To say that an event occurred is to say that at least one of the outcomes defining that event occurred.

Example 2. (Roll the Dice! (cont)) Consider again our example of rolling a pair of dice, introduced in Example 1. Outcomes were the values that come up on the two dice after they are rolled. Events are therefore any statement regarding this pair of values. Examples of such statements include

• The first die is less than the second die;
• Both dice are odd;
• The sum of the two dice is equal to 6;
• The first die is equal to three.

The subsets of the sample space corresponding to these four events are illustrated in Figure 3.

Figure 3: Illustration of four possible events when rolling a pair of dice. Top left: the first die is less than the second die. Top right: both dice are odd. Bottom left: the sum of the two dice is equal to 6. Bottom right: the first die is equal to three.
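Sample spaces and events of this kind are easy to enumerate by machine. The short Python sketch below (our own illustration, not part of the original handout; the variable names are ours) builds the 36-outcome sample space of Figure 2 and the four events of Example 2 as lists of pairs, and counts the outcomes in each:

```python
from itertools import product

# Sample space for two dice: all ordered pairs (first die, second die).
sample_space = list(product(range(1, 7), range(1, 7)))

# The four events from Example 2, as subsets of the sample space.
first_less_than_second = [(a, b) for a, b in sample_space if a < b]
both_odd = [(a, b) for a, b in sample_space if a % 2 == 1 and b % 2 == 1]
sum_equals_six = [(a, b) for a, b in sample_space if a + b == 6]
first_equals_three = [(a, b) for a, b in sample_space if a == 3]

print(len(sample_space))           # 36
print(len(first_less_than_second))  # 15
print(len(both_odd))                # 9
print(len(sum_equals_six))          # 5
print(len(first_equals_three))      # 6
```

Note that each event is literally a subset of the sample space, exactly as in the shaded cells of Figure 3.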
Subsets of outcomes defining these events are shaded in light blue.

In general, discussion of sample spaces, outcomes, and events is simplified by the use of some minimal mathematical notation. The symbol S is typically used to denote the sample space, while capital letters like A, B, etc. are used to denote events. Note that outcomes themselves can be thought of as events, in that outcomes are subsets of the sample space consisting of just one element.

2 Probabilities

Two of the main points regarding probability that we need to understand for our discussion of inferential statistics in the second and third weeks of our statistics module are (i) what probabilities are and (ii) how they behave.

2.1 What Are Probabilities?

From a mathematical perspective, a probability is a function that assigns numerical values to events. An event that has a numerical value greater (less) than another is then said to be more (less) likely than the other. The assignment of numerical values, however, is not allowed to be completely arbitrary, but rather is subject to certain rules. These rules derive from a combination of (i) certain basic definitions and assumptions, and (ii) additional results that follow logically from those.[1] In terms of notation, we denote the probability of an event A by P(A).

A natural question to ask about probabilities is, "From whence do they come?" The answer is that they typically are obtained in one of three manners (or some combination thereof): (i) by mathematical calculations, (ii) by empirical observations, and (iii) through subjective assessment. We will see examples this week of how probabilities may be derived from mathematical calculations, while later we will focus on the inference of probabilities from data. Note that in both cases the modeling inherent in our fundamental diagram, in Figure 1, plays a key role.

Also important is the question of how to interpret probabilities. There are two standard interpretations of probability.
The first treats probabilities as relative frequencies, while the second treats them as degrees of subjective belief. Both treatments are used in practice, but the former arguably is used substantially more than the latter.

The interpretation of probabilities as relative frequencies is known as the long-range frequency interpretation of probability. Given a random phenomenon we intend to observe, and an event A of interest to us, this interpretation says that the probability P(A) is the frequency with which A would be observed to occur over the long run, were the action of our random phenomenon generating an outcome to be replicated a large number of times. That is, P(A) is the frequency with which we expect to observe A over the 'long run'. See the appendix for a more detailed discussion of this interpretation.

2.2 A Simple Model: Equally Likely Outcomes

The simplest type of probability model, which is nevertheless used with some frequency in practice, is one which assumes that all outcomes in a sample space are equally likely. That is, the model assumes that the probability of any outcome is the same as that of any other. This model has the advantage that it greatly facilitates many of the types of calculations we'd like to do in probability, with many expressions being derivable in closed form. And in some situations, the model is in fact quite reasonable. For example, in modeling the tossing of a coin, if we specify that this be a fair coin, then we are implicitly saying that the outcomes H and T are equally likely. Similarly, when surveys are conducted, the common assumption that the survey is based on a so-called simple random sample of, say, n people from a population of size N (an assumption that will underlie most of our development when studying statistical inference and opinion polls) is equivalent to assuming that any subset of n people (i.e., the outcome of the survey sampling) is equally likely.

[1] Formally, a probability is a mathematical function that assigns a numerical value to each subset of a sample space. The assumptions defining the rules to which probabilities are subject are called axioms. These axioms are generally attributed to the great Russian mathematician A.N. Kolmogorov, and are stated in one of a handful of equivalent forms, involving either 3 or 4 axioms.

More often, however, this type of model is too simple, and outcomes are not equally likely. Then either more complex models must be specified upon which to base our probability calculations or, alternatively, we must turn to statistical inference to estimate these probabilities. Nevertheless, the model of equally likely outcomes is extremely valuable for building intuition and, often, as at least a rough approximation to reality. We will use this model below, and in lecture, for illustration. And we will focus on examples where the sample space S is a finite set. This latter constraint is convenient in that it helps reduce most of our probability calculations to counting arguments. In particular, in this context we then have that,[2] for any event A, the probability of A is simply

    P(A) = (# outcomes in A) / (# possible outcomes in S).    (1)

That is, for an equally likely probability model on a finite sample space, the probability of any event is simply the fraction of outcomes in the sample space that define the event.

Example 3. (Rolling Dice Under an Equally Likely Model) Let us return to the context of Example 1, in which an outcome is defined as the pair of numbers facing up after rolling two dice. The sample space S in this setting contains a total of 36 outcomes, as shown in Figure 2. If we assume that all 36 outcomes are equally likely, then the probability of a given event A is obtained, by formula (1), as the number of outcomes in A divided by 36. So, for example, consulting Figure 3, we find that

    P(The first die is less than the second die) = 15/36
    P(Both dice are odd) = 9/36
    P(The sum of the two dice is equal to 6) = 5/36
    P(The first die is equal to three) = 6/36 .

With the ability to make such fundamental calculations, we are then able to make precise quantitative statements not only regarding the individual events themselves but also regarding, say, comparisons of different events. For example, we see that it is one and a half times as likely that both dice are odd as that the first die is equal to three, while it is three times as likely that the first die is less than the second die as that the sum of the two dice is equal to 6.

It can be shown, using certain rules for probabilities in the spirit of those below (but slightly beyond the scope of this class), that the 'equally likely' model assumed here holds if (i) the outcome of each die does not influence that of the other in any way, and (ii) each die is a fair die.

[2] This result can be shown to follow from the axioms of probability. Prior to the introduction of the axioms, however, this expression for P(A) was traditionally taken as the definition of probability. As a result it is sometimes referred to as the 'classical definition of probability'.

2.3 How Do Probabilities Behave?

We list below some of the key rules that probabilities P must follow. The first three rules are essentially the assumptions that underlie the system of modern probability, while the last two rules may be shown to follow from those assumptions.

Rule 1. All probabilities must be between zero and one. That is, for any event A, we must have 0 ≤ P(A) ≤ 1.

This rule simply sets limits on the range of values that may be assigned as probabilities of events. These limits are analogous to those implicit in saying, for example, that you can give no less than 0% and no more than 100% in athletics (despite athletes consistently saying in interviews that they are prepared to give, say, 110% in tomorrow's game!).
One simple use of this rule is as a sanity check that a probability calculation is correct – if your calculation shows that an event has probability greater than 1, for example, then your calculation is wrong!

Rule 2. The probability of something happening is certain. That is, the probability of the entire sample space is one, i.e., P(S) = 1.

Recall that for an event to occur means that at least one of the outcomes defining that event must occur. Therefore, to say that S occurred simply means that the realization of a given random process was one of its possible outcomes! For example, if we flip a coin, then the probability that it comes up one of heads or tails is 1. This rule, which would seem to be nearly a tautology, is perhaps most useful as a constraint when proving certain of the other rules.

Rule 3. If two events do not share any outcomes, then the probability of an outcome being in either event is the sum of the probabilities of the individual events. More formally, if we have two events A and B that share no outcomes (i.e., A and B are disjoint sets), then we write

    P(A or B) = P(A) + P(B).

Here 'or' denotes a logical OR operation. In writing A or B we're defining a new, larger event from the original events A and B. This event contains all outcomes that are either in A, in B, or in both A and B. A standard application of this rule, when joined with the previous rule, is to provide a formal proof of formula (1). Rule 3 also is fundamental to showing that the following two rules must hold for probabilities.

Rule 4. The probability of an event happening is equal to one minus the probability of the event not happening. For a given event A, the event 'not A' is formally called the complement of A, and usually written A^c. This rule says that

    P(A) = 1 − P(A^c).

Rule 5. If one event occurring implies that another event occurred as well, then the probability of the second event is at least as large as the probability of the first event.
Denoting the first and second events by A and B, respectively, the stated condition is the same as saying that A is a subset of B, and the rule itself says that it must then follow that P(A) ≤ P(B).

Example 4. (Rolling Dice Under an Equally Likely Model (cont)) Under a finite sample space S and the equally likely model introduced in Section 2.2, it is straightforward to illustrate these various rules. Again we focus on the rolling of a pair of dice.

Consider Rules 1 and 2. Since an event A is, by definition, a subset of the sample space S, and therefore can contain no more outcomes than S itself (and, certainly, it can contain no fewer than zero outcomes), by formula (1) we see that P(A) must be between 0 and 1. Similarly, by formula (1), we see that P(S) = 36/36 = 1.

In order to illustrate the usage of Rule 3, consider the event that the sum of the two dice is equal to 6. Call this event A and recall that P(A) = 5/36. Now let B be the event that the sum of the two dice is equal to 4. This event consists of the outcomes {3, 1}, {2, 2}, and {1, 3}, and therefore has probability 3/36. That A and B are disjoint can both be seen by inspection of Figure 2 and argued from first principles (since the sum of the two dice cannot simultaneously be both 6 and 4). Therefore, it follows by Rule 3 that the probability of the sum of the two dice equaling either 4 or 6 is equal to P(A or B) = 5/36 + 3/36 = 8/36, which can of course be confirmed through direct counting.

Using Rule 4 we can immediately conclude, for example, that the probability of the first die being greater than or equal to the second die is equal to 1 − 15/36 = 21/36. The conclusion follows because this event is the complement of the event that the first die is less than the second die, which we saw has probability 15/36.
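The counting arguments behind formula (1) and Rules 3 through 5 can be checked mechanically. The following Python sketch (our own illustration, not part of the original handout; the names are ours) enumerates the 36 equally likely outcomes and verifies the numbers from Examples 3 and 4 by direct counting:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for a pair of dice.
S = [(a, b) for a, b in product(range(1, 7), repeat=2)]

def prob(event):
    # Formula (1): number of outcomes in the event over the number in S.
    return Fraction(len(event), len(S))

sum_is_6 = {o for o in S if o[0] + o[1] == 6}
sum_is_4 = {o for o in S if o[0] + o[1] == 4}
first_lt_second = {o for o in S if o[0] < o[1]}
second_two_more = {o for o in S if o[1] >= o[0] + 2}

# Rule 3: the events 'sum is 6' and 'sum is 4' are disjoint, so they add.
assert sum_is_6 & sum_is_4 == set()
assert prob(sum_is_6 | sum_is_4) == prob(sum_is_6) + prob(sum_is_4) == Fraction(8, 36)

# Rule 4: the complement of 'first < second' is 'first >= second'.
assert 1 - prob(first_lt_second) == Fraction(21, 36)

# Rule 5: 'second at least two greater than first' is a subset of 'first < second'.
assert second_two_more <= first_lt_second
assert prob(second_two_more) <= prob(first_lt_second)
```

Using exact fractions rather than floating-point values keeps comparisons like 8/36 exact, which is convenient for this kind of sanity check.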
Similarly, application of Rule 5 tells us that the probability that the second die is at least two greater than the first die must be no more than (in fact, it is strictly less than) 15/36. We are allowed to conclude this fact because every outcome defining this event is an instance of the event that the first die is less than the second die, and hence the former event is a subset of the latter event.

2.4 Independence

A very important notion in probability is that of independence. What does the word 'independent' mean to you when you hear it in everyday speech? Presumably it evokes a sense of freedom from external control, a lack of influence, etc. Probabilists have a formal definition for the word 'independent.' The use of independence as a modeling assumption is fundamental to much of the probability and statistics used in practice.

Let A and B be two events. We say that A and B are independent if

    P(A and B) = P(A)P(B).

If two events are not independent, we say that they are dependent.

Here 'and' denotes a logical AND operation. In writing 'A and B' we are defining a new, smaller event from the original events A and B. This event constitutes those outcomes that satisfy the definitions of both A and B. Independence as defined here says that in order to quantify the chance with which both A and B will occur, all that we need is knowledge of the chance with which A and B will occur individually.[3] We illustrate with some examples.

[3] Note that the notion of independence is a definition. In particular, it is not a rule that follows from the axioms of probability. There are, in fact, other ways to define independence. We will see another – equivalent – definition of independent events in the third week of our statistics unit when we introduce the idea of conditional frequencies and conditional probability.

Example 5.
(Rolling Dice Under an Equally Likely Model (cont)) Return again to the rolling of two dice, as introduced in Example 1, and recall the equally likely model described in Example 3. Suppose we define our events A and B as

    A = { The first die is equal to 3 }  and  B = { The second die is equal to 6 }.

Under the equally likely model, P(A and B) = 1/36. In addition, P(A) = P(B) = 1/6. This can be shown using Rule 3. For example,

    P(A) = P(Observe a pair (3, 1) or (3, 2) or (3, 3) or (3, 4) or (3, 5) or (3, 6))
         = 1/36 + 1/36 + 1/36 + 1/36 + 1/36 + 1/36 = 1/6 .

As a result, since P(A and B) = 1/36 and P(A)P(B) = (1/6)^2 = 1/36, we have that P(A and B) = P(A)P(B). Therefore, it follows from the equally likely model that the events that the first die is 3 and the second die is 6 are independent. In fact, we can show similarly that any combination of values for the first and second dice are independent. Hence, the equally likely model actually implies a relationship between the two dice analogous to what we would picture in reality, i.e., two dice being rolled in such a way that neither influences the other.

Note, however, that the fact that the dice are independent under this model does not mean that all events are independent. For example, let A be the event that the first die is equal to three, and B, the event that both dice are odd. Then we saw in Example 3 that P(A) = 6/36 = 1/6 and P(B) = 9/36 = 1/4. So we see that P(A)P(B) = 1/6 × 1/4 = 1/24. On the other hand, we can see that the event that the first die is three and both dice are odd consists of three outcomes, i.e., (3,1), (3,3), and (3,5). As a result, P(A and B) = 3/36 = 1/12. Since 1/24 ≠ 1/12, we have that P(A and B) ≠ P(A)P(B), and so the events A and B are dependent. This conclusion, of course, matches our intuition, since it makes sense that whether or not both dice are odd is influenced by whether the first die is three, which is an odd number.
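Both conclusions of Example 5 can be checked by brute-force counting. A minimal Python sketch (our own illustration, with our own variable names, not part of the handout) tests the defining identity P(A and B) = P(A)P(B) for each pair of events:

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes for a pair of dice.
S = [(a, b) for a, b in product(range(1, 7), repeat=2)]

def prob(event):
    return Fraction(len(event), len(S))

first_is_3 = {o for o in S if o[0] == 3}
second_is_6 = {o for o in S if o[1] == 6}
both_odd = {o for o in S if o[0] % 2 == 1 and o[1] % 2 == 1}

# Independent pair: the product rule holds exactly (1/36 = 1/6 * 1/6).
assert prob(first_is_3 & second_is_6) == prob(first_is_3) * prob(second_is_6)

# Dependent pair: the product rule fails (1/12 vs. 1/24).
assert prob(first_is_3 & both_odd) != prob(first_is_3) * prob(both_odd)
```

The set intersection `&` plays the role of the logical AND on events, since 'A and B' is exactly the set of outcomes belonging to both.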
In Example 5 we found that independence of what comes up on the face of each of the dice followed from the original equally-likely model assumption. Alternatively, it is not uncommon to begin with an assumption of independent die rolls. If we assume that the outcome of each die roll is equally likely to be any of the numbers 1, 2, 3, 4, 5, or 6, and we assume that the two outcomes are independent, then we find that the probability of any pair of outcomes is 1/6 × 1/6 = 1/36, just as before. So the two models are equivalent. But the latter model is arguably more convenient when, for example, we want to model the rolling of many dice, and not just two. The notion of independence extends accordingly and we obtain, for example, that any N-tuple of outcomes from rolling N independent dice, each equally likely to come up 1, 2, 3, 4, 5, or 6, has probability (1/6)^N.

3 Probabilistic Modeling in Action: Genetics

Gregor Mendel (1822-1884) is considered the 'father of genetics', having done a variety of experiments studying inheritance in pea plants, and formulating certain 'laws' to summarize the behavior that he observed. Experience has not supported all of his laws, but it has supported what is known as Mendel's First Law. This law is now incorporated into basic probability models for how characteristics are passed on from parents to children. Here we look at two examples.

Example 6. (Modeling Flower Color in Mendel's Peas) The color of flowers in the pea plant Pisum sativum is governed by a single gene, which can take either of two forms, called alleles. Represent these alleles with the variables A and a. Each child pea plant gets two copies of this gene – one from the 'father' plant and one from the 'mother' plant. The father and mother in turn each have two alleles. A result of Mendel's First Law (his so-called 'law of segregation') is that any combination of alleles in which one comes from the father and one comes from the mother is equally likely.[4]
To model genetic inheritance from a probabilistic standpoint, think of the assignment of alleles to a child pea plant as a random phenomenon. Suppose that the father and mother both have one A allele and one a allele. Then the corresponding sample space, consisting of all possible ways for a child to receive a pair of alleles from its parents, is {AA, Aa, aA, aa}. Here, for concreteness, we arbitrarily let the first in a pair correspond to the allele received from the mother, and the second, to that received from the father. The sample space is just the set of all possible allele pairs. That all four such pairs are equally likely means that

    P(Receive AA) = P(Receive Aa) = P(Receive aA) = P(Receive aa) = 1/4 .

Now, the color of the flowers in Mendel's pea plants can be either purple or white. It is purple if the plant has at least one A allele. What is the probability of the event that the child of our two parents has purple flowers? Since this event occurs for each of the three allele pairs AA, Aa, and aA, we have that P(Child Has Purple Flowers) = 3/4. Put another way, recalling the long-range frequency interpretation of probability, if we bred many child plants from our two parents, in the long run we expect to see children with purple flowers three times as frequently as children with white flowers.

That the allele A controls the color if at least one copy is present leads to it being called dominant, and the allele a, recessive. The so-called genotype is a reference to allele pairs without concern for order. So, for example, a child will have the 'Aa' genotype if it received either the Aa or aA allele pair. The genotype 'Aa' is called heterozygous, while the genotypes 'AA' and 'aa' are called homozygous. The probability of the event that the child genotype is heterozygous is 1/2, as is the probability of the event that it is homozygous.
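The probabilities in Example 6 follow from the same counting recipe as the dice examples. A small Python sketch (our own, not from the handout) enumerates the four equally likely allele pairs for an Aa × Aa cross:

```python
from fractions import Fraction
from itertools import product

# Aa x Aa cross: each parent contributes one allele; by Mendel's First Law
# all four (mother allele, father allele) pairs are equally likely.
mother_alleles = ["A", "a"]
father_alleles = ["A", "a"]
offspring = list(product(mother_alleles, father_alleles))  # AA, Aa, aA, aa

purple = [pair for pair in offspring if "A" in pair]          # at least one dominant A
heterozygous = [pair for pair in offspring if set(pair) == {"A", "a"}]

p_purple = Fraction(len(purple), len(offspring))
p_hetero = Fraction(len(heterozygous), len(offspring))
print(p_purple, p_hetero)  # 3/4 1/2
```

The 'at least one A' test mirrors the dominance rule: AA, Aa, and aA all give purple flowers, leaving only aa white.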
These various genetic states and their associated probabilities are typically represented in a diagram like that shown in Figure 4.

[4] Equivalently, we can model reality here by assuming that (i) each parent gives an allele to the child independently of the other parent, and (ii) a parent is equally likely to give either of her/his two alleles.

Figure 4: Schematic illustration of the inheritance of flower petal color in Mendel's experiments.

The basic principles here can be extended to the modeling of two characteristics simultaneously.

Example 7. (Modeling Pea Color and Texture Simultaneously) The inheritance of the characteristics of pea color and texture also follows Mendel's First Law. Furthermore, a result of Mendel's first and second laws together is that it is equally likely that a child pea plant have any combination of the parent alleles for pea color and texture resulting from taking one allele for each of color and texture from each parent.[5] More specifically, represent the alleles for pea color, occurring as either yellow or green, as Y and y, respectively. Similarly, represent the alleles for pea texture, occurring as either round or wrinkled, as R and r, respectively. Then Figure 5 shows all possible quadruples of alleles, as they can be received by a child from its parents, for these two characteristics.

Considering the joint inheritance of pea color and texture as a random phenomenon, the figure shows that the corresponding sample space has 16 outcomes. The outcome RRYy, for example, in the first row and second column, means that the child received the R allele from both parents, the Y allele from the mother, and the y allele from the father. Yellow color is dominant over green color, and round texture is dominant over wrinkled. We see then, for example, that the probability of the event that the child produces peas that are round and yellow is 9/16.
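The classical 9/16 figure can likewise be recovered by enumeration. The sketch below (our own illustration, not part of the handout) builds the 16 equally likely quadruples of Figure 5 by treating the texture pair and the color pair as independent, per Mendel's Second Law:

```python
from fractions import Fraction
from itertools import product

# RrYy x RrYy cross: by Mendel's Second Law the texture (R/r) and color (Y/y)
# alleles are passed on independently, giving 16 equally likely quadruples.
texture_pairs = list(product("Rr", "Rr"))  # (mother allele, father allele)
color_pairs = list(product("Yy", "Yy"))
offspring = list(product(texture_pairs, color_pairs))  # 16 outcomes

# Round and yellow requires at least one dominant R and at least one dominant Y.
round_and_yellow = [o for o in offspring if "R" in o[0] and "Y" in o[1]]
p = Fraction(len(round_and_yellow), len(offspring))
print(p)  # 9/16
```

Note how independence shows up structurally: the 16-outcome sample space is simply the product of the two 4-outcome single-trait sample spaces.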
[5] Mendel's Second Law states that which allele a parent gives for one trait will be independent of which allele the same parent gives for another trait. Empirical evidence has shown that this 'law' in fact does not always hold. In some cases, depending on their locations upon the chromosomes, genes can be linked and the transfer of their respective alleles becomes dependent. The modeling of genes under so-called linkage is a substantially more complicated task than that which Mendel faced, and a great deal of work in modern statistical genetics has focused on this problem.

Figure 5: Schematic representation of inheritance of color and texture in Mendel's peas. (Note: Entries of the table are written in standard genotypic (i.e., unordered) notation, but the ordering is implicit from the corresponding marginal row/column entries.)

4 What to Take Away with You

The key elements of this handout that you should take away with you are as follows.

1. Concepts

• 'Randomness' – in the sense of (seemingly) unsystematic – as a conceptualization of our uncertainty in quantitative reasoning problems.
• The notion of probabilities as functions that assign numbers to outcomes, quantifying their relative uncertainty, subject to certain rules.
• The concept of an 'equally likely outcomes' model and the implications that this model has on the calculation of probabilities of events.
• The concept of 'independent' events.

2. Skills

• Identify and distinguish between outcomes, events, and sample spaces when faced with a given random phenomenon.
• Calculate probabilities of events for simple examples, under the 'equally likely outcomes' model.
• Utilize Rules 1 through 5 to make basic probabilistic statements and comparisons.
• Determine in simple models whether or not two events are dependent or, conversely, calculate the probability of two independent events both occurring.
5 Appendix: Interpreting Probabilities

Ultimately, our goal here is not to become experts in probability calculations. Rather, our goal is to become more savvy consumers of probabilistic information, particularly as it relates to statistical inferences. We have discussed what probabilities are and how they behave. The last main point regarding probability that we need to understand for our discussion of inferential statistics, in the second and third weeks of our statistics module, is how to interpret probabilities.

There are two standard interpretations of probability. The first treats probabilities as relative frequencies, while the second treats them as degrees of subjective belief. Both treatments are used in practice, but the former arguably is used substantially more than the latter. That is not to say, however, that the latter is not used! That portion of statistics devoted to inference using the subjective interpretation of probability is called Bayesian statistics, and its uses are many and varied. For example, Microsoft Word's automatic help tool (the 'Paperclip') rests on certain principles and tools from this area of statistics. However, to delve further in this direction is beyond the scope of our class. Here and throughout the rest of the statistics module we will focus on the relative frequency interpretation of probability.

Let us return to the general context within which we began our discussion of probability. We have a random phenomenon we intend to observe, and we have an event of interest to us. Call this event A. The long-range frequency interpretation of probability says that the probability P(A) is the frequency with which A would be observed to occur over the long run, were the action of our random phenomenon generating an outcome to be replicated a large number of times. For example, suppose our random phenomenon is the flipping of a fair coin, and our event A is the event that the coin comes up H.
Then P (A) = P ( Coin comes up H ) = 0.5 is interpreted as the frequency with which we would see heads over the course of a great many flips. A natural question is whether this 'interpretation' is simply a convenient concept, or whether it has some grounding in reality. The following example helps shed some light on the answer.

Example 8. (A Computer Simulation) While we do not have the luxury of waiting until an (effectively) infinite number of realizations of a random phenomenon are observed, we can easily simulate many types of random phenomena using a computer. Consider the action of rolling a pair of dice, as introduced in Example 1. Recall that the event A that both dice come up odd has probability 9/36. That is, P (A) = 0.25. Using a standard laptop computer, we can simulate the separate rolling of two fair dice. In fact, we can do so repeatedly – up to thousands of times in just seconds. So the computer can be used as a type of laboratory in which to generate empirical evidence with which to investigate the validity of the long-run frequency interpretation of probability. Let the number of times we roll the two dice (using the computer) be called n. For a given value of n, we can take the outcomes of our n rolls, and we can compute the proportion of times that our event occurred. We can then vary the value of n, from small to large. If the frequency interpretation of probability is accurate in this setting, we should find that the proportions get closer and closer to P (A) = 0.25 as n gets larger. Figure 6 shows the results of having done this experiment, for values of n ranging from 1 to 1000. The proportion of times A occurs, for each choice of n, is plotted as a function of n. We see that these proportions start off rather wildly. But after a while, as n grows, moving from left to right in the plot, the proportions begin to settle down, fluctuating less and less, until they appear to remain quite firmly in the neighborhood of 0.25.
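The simulation described in Example 8 is easy to reproduce. The following sketch (not part of the original handout; the function name and choices of n are ours) simulates n rolls of a pair of fair dice and reports the observed proportion of rolls in which both dice come up odd:

```python
import random

def proportion_both_odd(n, rng=random):
    """Simulate n rolls of a pair of fair dice and return the
    fraction of rolls in which both dice come up odd
    (true probability 9/36 = 0.25)."""
    count = 0
    for _ in range(n):
        # A die is odd with probability 3/6; both dice must be odd.
        if rng.randint(1, 6) % 2 == 1 and rng.randint(1, 6) % 2 == 1:
            count += 1
    return count / n

# As n grows, the observed proportion should settle near 0.25.
for n in (10, 100, 1000, 100_000):
    print(n, proportion_both_odd(n))
```

Running this for increasing n mimics moving left to right in Figure 6: the early proportions fluctuate considerably, while the later ones hover near 0.25.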
Our example provides some empirical support for the long-range frequency interpretation of probability.

Figure 6: Relative frequency with which, when rolling a pair of dice, both dice come up odd, as a function of the number of times the dice were rolled.

Clearly, however, we cannot generate such support for all events and all random phenomena. That this type of empirical behavior will occur quite generally was proven mathematically by Jacob Bernoulli, in 1689. His result, which states that empirical frequencies of events settle down to the event probabilities in the limit of a large number of repetitions of a random phenomenon, is known as the law of large numbers. Interestingly, we can see in Example 8 that even before n becomes particularly large the observed proportions begin to get relatively close to the underlying probability to which they are tending. Importantly, from a statistical point of view, this observation suggests that we might have some reasonable hope of estimating the values of such probabilities using observed frequencies from finite amounts of data. However, there is the uncertainty to be dealt with that derives from not having yet reached the 'long run'. Figure 7 shows three traces like that in Figure 6. Here we have simply repeated the computer experiment described in Example 8 three times. Note that in all three traces, as expected, the proportions settle down to 0.25. But each trace is nevertheless different – increasingly so for smaller values of n. We require a formal manner for quantifying the uncertainty that derives from the limitations of finite data and incorporating it into our reporting of any estimates of this type. That is the realm of inferential statistics, the next topic in our statistics module.
Figure 7: Results from three different runs of the computer simulation experiment described in Example 8.