Probability Numbers For many statisticians the concept of “the probability that an event occurs” is ultimately rooted in the interpretation of an event as an outcome of an experiment, others would interpret the concept as a subjective degree of belief that the event occurs. This difference in interpretation has been the source of great debate between Frequentists (the former group) and Bayesians (the latter group) within the statistics profession. This philosophical debate ultimately influences the way that empirical results are obtained and interpreted, but does not affect the use of probability ideas employed in describing the distribution of happiness in a society. Hence we shall proceed under the Frequentist interpretation whilst acknowledging the existing of an alternative. At the heart of the frequentists view of probability is the notion of an experiment. A sufficiently general definition of the term “Experiment” permits an almost universal application of the probability concept. Essentially a procedure must possess two properties to be eligible as an experiment. 1) It should be notionally repeatable an infinite number of times with a well defined common set of possible outcomes each time the procedure takes place. 2) There should be uncertainty as to which outcome will occur before the procedure takes place. Probability Theory is best understood by using ideas from set theory in its description. The appendix provides an outline of the basic set theory ideas that are used. The set of mutually exclusive (having nothing in common) and exhaustive (a complete list of) possible Basic Outcomes (denoted oi) of an experiment is usually called The Sample Space (denoted S). An Event is defined as any subset of this sample space, including the empty set and the sample space itself, generally an event is denoted by an upper case letter say A, Ac, the complement of A, then corresponds to A not happening. An event is said to have occurred when any one of the basic outcomes in its defining subset is realized. Thus, on executing the procedure, the empty or null set is the event “no outcome occurs” which is certain not to happen, similarly the sample space set (the universal set in set theoretic terms) is the event “an outcome occurs” which is certain to happen. Given outcome uncertainty there is obviously event uncertainty ranging in degree between the empty set and the sample space. Numbers attached to events reflecting the degree of certainty with which they occur, and which in turn obey certain coherency axioms, are called “Probabilities.” The coherency axioms are simple yet remarkably powerful, they provide the basis for all probability theory regardless of its objective or subjective foundations. Denoting the occurrence of the i’th basic outcomes as oi and the probability of it happening as P(oi) and denoting events by upper case letters (with S and ∅ respectively reserved for the sample and empty sets) the coherency axioms are as follows: 1) Probability numbers are non-negative. Using our notation this may be written as: P (oi ) ≥ 0 for all i 2) The probability of an event is the sum of the probabilities of the mutually exclusive basic outcomes that the event comprises. Using out notation this may be written as: P ( A) = ∑ P(o ) i oi∈A 3) The probability that something happens is 1. using out notation this may be written as: P(S) =1 From these axioms many implications follow, specifically they are that: i. The probability of an event occurring is one minus the probability of it not occurring (making the probability of nothing occurring zero). Set Theory tells us that for any event A, A∩Ac = ∅ and A∪Ac =S. Axiom 2 tells us that P(A∪Ac) = P(S) = P(A) + P(Ac), axiom 3 tells us that P(S) = 1 and so it follows that P(A) = 1 – P(Ac). If we let A = S it follows that P(∅) = 0. ii. If event A includes event B (which may be written as B ⊂ A) then P(A) ≥ P(B). Note that A = B∪(A∩Bc) and B∩(A∩Bc) = ∅. Axiom 2 tells us that P(A) = P(B) + P(A∩Bc) and Axiom 1 tells us that P(A∩Bc) ≥ 0 so P(A) ≥ P(B). iii. For any event B, P(B) ≤ 1. This can be seen by simply letting A in ii) be the sample space S and nothing that Axiom 3 tells us that P(S) = 1. iv. The probability of any one of a collection of mutually exclusive events occurring is the sum of their individual probabilities. This is just a repeated application of Axiom 2. v. For any two events A and B, the probability of either one or both of them occurring is the sum of their probabilities minus the probability that they both occur. This is possibly the most complicated idea to justify. First note that A = (A∩B)∪(A∩Bc) and (A∩B)∩(Ac∩B) = ∅ so that P(A) = P(A∩B) + P(A∩Bc) from Axiom 2, in a similar fashion P(B) = P(A∩B) + P(Ac∩B) can be established. Secondly note that A∪B = (A∩B)∪(A∩Bc)∪(B∩Ac) where (A∩B), (A∩Bc) and (B∩Ac) are all mutually exclusive sets so that P(A∪B) = P(A∩B) + P(A∩Bc) + P(B∩Ac) also from Axiom 2. Comparing the equations for P(A), P(B) and P(A∪B) it can be seen that P(A∪B) = P(A) + P(B) – P(A∩B). Conditional and Marginal Probability and the notion of Independence Within the above set of implications the notion of jointly occurring events has already been entertained. Here it is extended so that the concepts of conditional and marginal probability and independence can be understood. We suppose the sample space is covered by a collection of mutually exclusive and exhaustive events Ai, i=1, …, n and another collection of mutually exclusive and exhaustive events Bj, j=1, …, m. for example the sample space may be a collection of people and the events Ai refer to income categories and Bj to occupational categories or the sample space may be a regular deck of cards (excluding jokers) and the events Ai refer to the card values and the events Bj refer to the suits. A notional experiment may e the random selection of an individual or a card, the element selected will have both characteristics prompting questions as to how will they jointly relate to the probability of the element being selected. The basic building block is the joint probability of selecting an element which jointly exhibits characteristics Ai and Bj (written P(Ai∩Bj)). To obey the above axioms the following must be true: B P(Ai∩Bj) ≥ 0 for all i, j. B n m i =1 j =1 ∑ ∑ P( A ∩ B ) = 1 i j In the context of our examples the value P(Ai∩Bj) is the answer to questions like “what is the probability of selecting someone in the poverty group who is a lawyer?” or “what is the probability of selecting the four of clubs?” B If characteristics of only one type are of concern the probability of that characteristic, known as the marginal probability, may be calculated as follows: P( Ai ) = ∑ P(A ∩ B ); P(B ) = ∑ P(A ∩ B ) m n i j j i j =1 j i =1 Intuitively, in the case of the marginal probability of “A” events, the probability of event (Ai and “all possible B events”) is calculated and in the case of the two examples answers the questions “what is the chance of selecting someone in the poverty group regardless of their occupation?” or “what is the chance of drawing an ace regardless of its suit?”. Note that marginal probabilities also obey all of the probability axioms. Slightly more complex questions of the form “given the person selected is a lawyer, what is the chance she is in the poverty group?” or “having been reliably informed that the card drawn is an ace, what is the chance it is a diamond?” require the concept of conditional probability. What is really happening is that the sample space is being modified or constrained by the introduction of information upon which the event is now predicated or “conditioned”. As long as the conditioning event has a non-zero probability (is real information in some sense) a cogent set of conditional probability numbers can be calculated by dividing the joint probability of the two events by the marginal probability of the conditioning event. Writing the conditional probability of Ai given Bj as P(Ai∩Bj) the appropriate calculation is: B ( ) P Ai | B j = ( P Ai ∩ B j P Bj ( ) ) Independence between two events is readily defined and interpreted in the context of conditional probability. Events Ai and Bj are independent if P(Ai∩Bj) = P(Ai)P(Bj) B Note that this only refers to the specific characteristics Ai being independent of specific characteristic Bj. For all characteristics A to be independent of characteristics B the independence condition must hold for all possible pairs i, j. From the definition of conditional probability this implies: P(Ai|Bj) = P(Ai), P(Bj|Ai) = P(Bj) Which may readily be interpreted as the occurrence (or otherwise) of event B does not influence the probability of A occurring. Most importantly the independence concept can be extended to a collection or sequence of many events so that their mutual independence implies that the probability of their joint occurrence will be the product of their individual probabilities, formally: P(A∩B∩C∩…) = (PA)P(B)P(C)… Bayes Rule This rule follows directly from the relationships between conditional, marginal and joint probabilities and may be stated as: The Bayes Rule: P( A | B ) = P (BP| A(B)P)( A ) (1) Probability Functions The mechanism for attaching probability numbers to events is the probability function. Generically there are two types of probability functions, which one is employed depends upon the nature of the sample space it relates to. Some sample spaces contain a finite or countably infinite number of basic outcomes and are covered by “Discrete Probability functions”, others contain an uncountably infinite number of basic outcomes and are covered by “Continuous Probability Functions”. In the discrete case, where o is any basic outcome in the sample space S, any P(o) that satisfies P(o) ≥ 0 for all such o and P(S) = 1 is a Discrete Probability Distribution. In the continuous case the situation is slightly more complex and is greatly facilitated by assuming that the sample space is an interval on the real line containing a typical point “x”. in this case the continuous probability function is defined implicitly as any function f(x) that, for any event A defined on the sample space S, satisfies: f (x ) ≥ 0 ∫ s f (x )dx = 1 P( A) = ∫ A f (x )dx So that Discrete Probability Distributions attach probabilities to points, Continuous Probability Distributions attach probabilities to intervals. Appendix. Basic Set Theory Concepts A set is simply a collection or list of objects or things called elements, the elements are referred to with lower case letters and the sets are referred to with upper case letters. a∈ A May be read as saying “the element a belongs to the set A” similarly: a∉ A May be read as “the element a does not belong to A”. The set of all elements of interest is referred to as the Universal Set, here it will be denoted by the letter S since in the realm of probability theory the Universal Set is the set of all the possible outcomes of an experiment or the sample Space. The empty set or Null Set (which has no elements in it) is referred to with the character ∅. The complementary set of A which is denoted Ac is the set of all elements in S that are not in A. Two basic operations are employed with sets, Set Union and Set Intersection. Set Union The Union of two sets A and B is written as A∪B and is the set of all elements in either A or B (Or both). Set Intersection The Intersection of two sets A and B is written as A∩B and is the set of all elements that are in both A and B. Note that both union and intersection operations can be employed successively on more than two sets so that A∪B∪C is the set of elements that are in any one of A, B or C and A∩B∩C is the set of elements that are common to A, B, C. Some basic Set Theory Ideas Mutual Exclusivity Two sets A and B are said to be mutually exclusive when they have no elements in common so that A∩B=, and similarly A∩Ac=. Subsets When all the elements in set A are in set B, A is said to be a subset of B, which is written as A ⊂ B. It follows that all sets are subsets of the sample space so that A ⊂ S for any A. Some Rules for Operations Associative Rule A∪(B∪C) = (A∪B) ∪C A∩(B∩C) = (A∩B) ∩C Distributive Rule A∩(B∪C) = (A∩C) ∪ (A∩C) A∪ (B∩C) = (A∪B) ∩(A∪C) Complementation Rule (A∩B)c = A c∪B c (A∪B)c = A c∩B c
© Copyright 2026 Paperzz