ON BAYES SEQUENTIAL DESIGN OF EXPERIMENTS

by R. E. Bohrer

Research Triangle Institute and University of North Carolina at Chapel Hill

Institute of Statistics Mimeo Series No. 442, August 1965

This research was supported by the Air Force Office of Scientific Research under Contract AF 49(638)-1544 and Grant AF-AFOSR-760-65, at different times. Much of this work appeared previously as an uncirculated report under the above contract.

DEPARTMENT OF STATISTICS, UNIVERSITY OF NORTH CAROLINA, Chapel Hill, North Carolina

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
I. INTRODUCTION AND DISCUSSION OF RESULTS
II. FORMULATION OF THE PROBLEM AND SOME GENERAL RESULTS
   2.1 Elements of the problem
   2.2 Extension of some sequential, non-design results to sequential design problems
III. AN EXAMPLE: PAULSON'S PROBLEM WITH UNIFORM ALTERNATIVES
   3.1 Framework
   3.2 The non-sequential, non-design (ns-nd) case
   3.3 The non-sequential, design (ns-d) case
   3.4 The sequential, non-design (s-nd) case
   3.5 The sequential, design (s-d) case
   3.6 Relation of the Bayes d r in Δ∞ and Chernoff's d r
   3.7 Numerical comparisons
IV. THE EXPERIMENTATION RULE X*
   4.1 Introduction
   4.2 X* experimentation rules
   4.3 A result for general convex stopping sets
   4.4 A partial characterization of the X* rule for small c
   4.5 Asymptotic optimality of δ*_c
V. AN APPLICATION: THE CASE OF LOST LABELS AND THE NO-OVERSHOOT X* RULE
   5.1 Introduction
   5.2 Specification of the no-overshoot X* rule
   5.3 Some Fourier analysis
   5.4 Some characteristics of the Bayes stopping rule
   5.5 An example
BIBLIOGRAPHY
INDEX OF NOTATION

ACKNOWLEDGEMENTS

I wish to express my thanks to Professors W. Hoeffding, W. J. Hall, and M. R. Leadbetter for their guidance and encouragement throughout; to Professors J. S. MacNerney and W. M. Whyburn for their attempts to teach me mathematics; to the Research Triangle Institute, the Air Force Office of Scientific Research, the University of North Carolina, and the Department of Health, Education, and Welfare for financing my graduate study and thesis research; and to Joyce and Carrie for coexisting and helping with it.

CHAPTER I
INTRODUCTION AND DISCUSSION OF RESULTS

The study of sequential design of experiments, as discussed, for example, by Chernoff [3], is related to the more fully developed study of sequential analysis, which I will call sequential, non-design (s-nd) analysis to emphasize the relationship. In both cases, observations on a process of interest are taken in a sequence of trials; after each trial, the experimenter decides, on the basis of the observations at hand, whether to take additional observations or to stop and take some action, i.e., terminal decision, concerning the state of nature governing the process. The difference is that in s-nd analysis, if one decides to continue observation, he repeats the same experiment at each trial. In the sequential design (s-d) situation, he has a class E of possible experiments from which to choose the next experiment; this choice may be based on previous observations. The experimenter thus has somewhat greater freedom in the s-d case; his goal is to formulate an experimentation rule (e r) which capitalizes on this.

Thus the s-d formulation generalizes the s-nd formulation of problems in the same way as s-nd analysis generalizes fixed sample-size analysis. That is, a decision rule (d r) with fixed sample size n is a s-nd d r, viz., one in which the experimenter decides to stop making trials when, and only when, n trials have been made. Similarly, a s-nd d r is a s-d d r in which the experimenter, if he decides to make another trial, decides to use the same experiment as at all previous trials.
Thus, the class of fixed sample-size d r is a subclass of the class of s-nd d r, which is, in turn, a subclass of the class of s-d d r. It therefore follows that, for any given problem, the best s-d d r is at least as good as the best s-nd d r, which is itself at least as good as the best fixed sample-size d r.

In Chapter 2, extensions of several results of s-nd theory to the s-d case are cited. Among these is the fact that the Bayes risk ρ(ξ) of a s-d procedure with prior distribution ξ satisfies, subject to boundary conditions,

   ρ(ξ) = inf_{e ∈ E} [c + ∫_X ρ(ξ_{e,x}) f_{ξ,e}(x) dν(x)]   (1)

where c is the cost per trial, ξ_{e,x} is the posterior distribution of θ given that the trial with experiment e has outcome x, and f_{ξ,e}(x) is the density of x, using experiment e, averaged with respect to ξ.

The solution to (1) in a very special decision problem is obtained in Chapter 3. Comparisons are made there of the risks of the best fixed sample-size, s-nd, and s-d d r, so that, in this case, the average saving gained by using s-d methods is assessed.

In cases where solution of (1) is possible, the Bayes e r is specified as follows: if the posterior distribution is ξ, then do that experiment for which the infimum in (1) is attained. (If it cannot be attained for some ξ, then Bayes rules may not exist and one may have to be satisfied with ε-Bayes rules.) In most cases, however, solution of (1) is quite difficult and not yet accomplished. For these cases, it is of interest to find a d r for which the risk is relatively small, e.g., as compared with the risk of the best s-nd procedure. Two results mentioned in Chapter 2 simplify this problem by showing that if a Bayes rule exists, attention can be restricted to a greatly simplified class of d r.
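When Θ, E, and the outcome spaces are all finite, (1) can at least be solved numerically. The following Python sketch is illustrative only and is not from the original work: the densities, the cost c, and the 0-1 loss are all assumptions, and the recursion is applied by value iteration on a grid of priors.

```python
# Hypothetical illustration: value iteration on the Bayes-risk recursion (1)
# for two states of nature, two experiments, binary outcomes, 0-1 loss.
import numpy as np

c = 0.05                                  # assumed cost per trial
GRID = np.linspace(0.0, 1.0, 201)         # xi = prior probability of state 1

# f[theta, e, x]: assumed outcome probabilities for each state/experiment pair
f = np.array([[[0.7, 0.3], [0.5, 0.5]],   # state 1
              [[0.4, 0.6], [0.1, 0.9]]])  # state 2

def rho0(xi):
    """Stopping risk under 0-1 loss: take the action favored by the posterior."""
    return np.minimum(xi, 1.0 - xi)

def iterate(rho):
    """One application of (1): rho_new(xi) = min[rho0(xi), c + inf_e E rho(xi_{e,x})]."""
    out = np.empty_like(GRID)
    for i, xi in enumerate(GRID):
        best = np.inf
        for e in range(f.shape[1]):
            f_mix = xi * f[0, e] + (1.0 - xi) * f[1, e]   # f_{xi,e}(x)
            post = xi * f[0, e] / f_mix                   # posterior xi_{e,x}
            best = min(best, c + np.sum(f_mix * np.interp(post, GRID, rho)))
        out[i] = min(rho0(xi), best)
    return out

rho = rho0(GRID)          # rho_0: the risk when no sampling is allowed
for _ in range(100):      # the truncated risks decrease toward the fixed point
    rho = iterate(rho)
```

The iterates correspond to the truncated Bayes risks ρ_n of Chapter 2: each pass allows one more trial, and the limit solves (1) on the grid.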
First, it can be deduced from the form of (1) that a Bayes rule, if one exists, can be defined in such a way that it depends on observations only through the posterior distribution which results from those observations. With this form of the Bayes rule, the rules concerning when to stop observation and which action to take define "action sets" in the space of possible posterior distributions, i.e., sets of posterior distributions for which the Bayes rule says to stop and take a given action. These Bayes action sets are convex point sets.

In Chapter 4, an e r for problems where E is finite, two states of nature are possible, and two actions are available is investigated. In classical inference terminology, this is the case of testing a simple hypothesis against a simple alternative. This e r, called the X* e r, is proposed for use with given, convex action sets. It is defined as follows. Let r(ξ;e) be the risk for prior distribution ξ of the s-nd d r which uses the given action sets and does experiment e ∈ E at each trial. When the posterior distribution is ξ, X* uses an experiment e(ξ) ∈ E which satisfies

   r(ξ, e(ξ)) = inf_{e ∈ E} r(ξ, e).

That is, X* acts at each trial as if a single experiment from E were going to be used at all succeeding trials; it selects the experiment which would be preferred in this circumstance. This rule has been investigated in a special case by Haggstrom [5]. Two general results for the two-action, two-state-of-nature case are proved in Chapter 4 here.

First, let ρ* be the risk function of the d r δ* using the given action sets and the proposed e r X*. Let ρ*_n be the risk function for the d r δ*_n which uses the same e r through the first n trials and the same action sets as δ*, but which uses the same experiment at each trial t for t > n. Then Theorem 4.3 states that ρ*_{n+1} ≤ ρ*_n and that ρ* = lim_{n→∞} ρ*_n, both under quite general conditions.

Second, an asymptotic result is established, for the case where E is finite, by Theorem 4.5.2. If the cost per trial is c, let δ*_c be the d r which stops as soon as the posterior probability of either state of nature is ≤ c/(1+c) and which uses the e r X*. If ρ*(ξ,c) is the risk of δ*_c at ξ and ρ(ξ,c) the Bayes risk for cost c, then

   lim_{c→0} ρ*(ξ,c) / (−c log c) = lim_{c→0} ρ(ξ,c) / (−c log c).

Thus, δ*_c achieves the minimum limiting risk as c approaches 0; i.e., δ*_c is asymptotically optimal in the sense of Chernoff [3].

The d r δ*_c is certainly not the only asymptotically optimal one in this sense. There are, among others, such rules proposed by Chernoff [3], Abramson [1], and Kiefer and Sacks [9]. The reason for consideration of the X* e r is its seemingly reasonable definition for any size of cost c per trial, and not just asymptotic optimality. This is made precise by Theorem 4.3, which says that for any size of cost the risk of the best d r using e r X* is less than that of the best s-nd d r, since this latter risk is the minimum, over all choices of convex action sets, of ρ*_0.

In Chapter 5, the case of "lost labels" is considered. This problem has the same general structure as the "two-armed bandit" problem [4] except in its goal. The goal here is to decide what the true state of nature is, whereas in the two-armed bandit one's goal is to use the "better" experiment most often. For this problem, an approximate X* rule is specified very simply. For a binomial case with special values of the cost, a X* d r is shown to be a Bayes rule and its risk is compared with that of Chernoff's rule. It is of interest to note that, although the goals in the two-armed bandit and lost labels problems may seem related, in this binomial case the experimentation rules are seen (by comparing the results of [4] with those of Chapter 5) to be exactly opposite.
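Returning to the X* e r defined above, its selection step can be sketched numerically. The Python sketch below is illustrative only: the densities and cost are assumptions, and, as a simplification, the s-nd risk r(ξ;e) is computed with the Bayes stopping sets for each fixed e rather than with arbitrary given convex action sets.

```python
# Hypothetical sketch of the X* e r: at posterior xi, choose the experiment
# that would minimize the s-nd risk if it alone were used at all later trials.
import numpy as np

c = 0.05                                  # assumed cost per trial
GRID = np.linspace(0.0, 1.0, 201)         # xi = posterior probability of state 1
f = np.array([[[0.7, 0.3], [0.5, 0.5]],   # f[theta, e, x], assumed densities
              [[0.4, 0.6], [0.1, 0.9]]])

def snd_risk(e, n_iter=100):
    """r(xi; e): risk of the s-nd d r that uses experiment e at every trial."""
    rho = np.minimum(GRID, 1.0 - GRID)
    for _ in range(n_iter):
        new = np.empty_like(rho)
        for i, xi in enumerate(GRID):
            f_mix = xi * f[0, e] + (1.0 - xi) * f[1, e]
            post = xi * f[0, e] / f_mix
            new[i] = min(min(xi, 1.0 - xi),
                         c + np.sum(f_mix * np.interp(post, GRID, rho)))
        rho = new
    return rho

risks = np.stack([snd_risk(e) for e in range(f.shape[1])])  # r(xi; e), e = 0, 1

def x_star(xi):
    """X* e r: an experiment attaining inf_e r(xi; e) at the current posterior."""
    return int(np.argmin(risks[:, np.argmin(np.abs(GRID - xi))]))
```

The design choice matches the definition in the text: X* evaluates each experiment as if it were committed to forever, then commits to it only for one trial.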
CHAPTER II
FORMULATION OF THE PROBLEM AND SOME GENERAL RESULTS

2.1 Elements of the Problem.

Let E be the class of available experiments e and X the class of possible observations x using experiments in E.

Assumption. There is a σ-field 𝒳 containing each single-point subset of X and a σ-field ℰ containing each single-point subset of E.

Notation. z̄ denotes the infinite real vector with ith component z_i, and z_n denotes the (n × 1) real vector with ith component z_i. z_m(z̄) is the (m × 1) real vector whose ith component is the ith component of z̄. The "vector" z_0 will appear as a subscript in defining d r; it should be read as the subscript 0 in these places.

The following five elements are basic in formulating a problem in decision theory.

(1) Sample space Z_0. It is convenient to work with Z_0 written as the countable Cartesian product ∏_{p=1}^∞ Z_p, where Z_p is E for p odd and X for p even. For interpretation and notation, (z_{2p-1}, z_{2p}) = (e_p, x_p) represents a trial at the p-th stage of experimentation, where e_p is the experiment and x_p the outcome. Z_(n) denotes the product ∏_{p=1}^n Z_p and Z^(n) the product ∏_{p=n+1}^∞ Z_p.

(2) A class Θ of possible states of nature θ.

(3) A class A of available actions or terminal decisions (t d) a.

(4) A non-negative loss function L defined on Θ × A. L(θ,a) is the loss incurred when θ is the true state of nature and action a is taken.

(5) A cost-of-sampling or cost function. Throughout, this work assumes that each trial is performed at cost c > 0, i.e., that sampling cost is a multiple c of sample size.

Definition. An experimentation rule X is a class of probability measures {X_{z_{2n}} : e_p ∈ E & x_p ∈ X for 1 ≤ p ≤ n} together with a probability measure X_0, each defined on the measurable space (E, ℰ).
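The definition just given can be read computationally: an e r maps each trial history z_{2n} to a probability measure on (E, ℰ). A minimal Python sketch, in which the two-experiment class and the alternating rule are assumptions for illustration:

```python
# Hypothetical sketch of an experimentation rule X: given the trial history
# (e_1, x_1, ..., e_n, x_n), return a probability distribution over E.
import random

E = [1, 2]   # assumed two-experiment class

def er_alternate(history):
    """A deterministic e r: alternate experiments, starting with experiment 1."""
    n = len(history)              # history: list of (e_p, x_p) pairs
    return {E[n % 2]: 1.0}        # point mass on the next experiment

def choose(rule, history):
    """Sample the next experiment from the distribution the e r returns."""
    dist = rule(history)
    r, acc = random.random(), 0.0
    for e, p in dist.items():
        acc += p
        if r <= acc:
            return e
    return e                      # guard against floating-point round-off

history = []
e1 = choose(er_alternate, history)    # first trial uses experiment 1
history.append((e1, 0.42))            # record an (assumed) outcome
e2 = choose(er_alternate, history)    # second trial uses experiment 2
```

A randomized e r would simply return a distribution with mass on several experiments; the sampling helper handles both cases.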
If S ∈ ℰ, then X_{z_{2n}}(S) gives the probability, using e r X, that an experiment in S is used in trial n+1, given the previous trials z_{2n}; X_0(S) is the X-probability that an experiment in S is used at the first trial.

Assumption. The outcome of experiment e when θ is the state of nature is determined by the probability measure μ_{θ,e}, independently of previous trials. There is a measure ν which dominates each measure in {μ_{θ,e} : θ ∈ Θ & e ∈ E}.

Remark and notation. Thus, the density functions f_{θ,e} = dμ_{θ,e}/dν are defined and will be used where convenient. 𝒮_0 denotes the σ-algebra generated by the measurable rectangles [6] of Z_0.

With the probability structure given, a probability function on the measurable rectangles in Z_0 is defined. Specifically, if θ is the state of nature, e r X is used, and S = ∏_{p=1}^{2m} S_p × Z^(2m) is a measurable rectangle, then

   μ_θ(S) = ∫_S ∏_{p=1}^m f_{θ,e_p}(x_p) dX_{z_{2p-2}}(e_p) dν(x_p),

with X_{z_0} = X_0. Except when confusion will result, I suppress the dependence of μ_θ on X and write μ_θ for μ_{θ,X}, for brevity. It can be shown [10] that there is a unique measure on (Z_0, 𝒮_0) which agrees with μ_θ on the measurable rectangles. This measure will also be denoted by μ_θ = μ_{θ,X}.

Definition. A decision rule (d r) δ for a sequential design problem consists of three classes of probability measures, viz., an experimentation rule (e r) X, a stopping rule (s r) ψ, and a terminal decision rule (t d r) φ. Let (A, 𝒜) be a measurable space. Then a t d r φ is a class of probability measures {φ_{z_{2n}} : n ≥ 0; e_p ∈ E & x_p ∈ X for 1 ≤ p ≤ n}. For S ∈ 𝒜, φ_{z_{2n}}(S) is the δ-probability that a t d in S is made, given the n trials z_{2n}. A s r ψ is a class of binomial probability measures {ψ_{z_{2n}} : n ≥ 0; e_p ∈ E & x_p ∈ X for 1 ≤ p ≤ n}. If n trials z_{2n} have been performed, then ψ_{z_{2n}} is the δ-probability that sampling is discontinued after the nth trial. The e r was defined previously.

Remark and notation. It is convenient to introduce an alternative specification of the s r. For z̄ ∈ Z_0, define {π_n(z̄) : n ≥ 0} by π_0(z̄) = ψ_0 and

   π_n(z̄) = ψ_{z_{2n}} ∏_{p=0}^{n-1} (1 − ψ_{z_{2p}}).

Thus π_n(z̄) is the δ-probability that, with sample point z̄, δ terminates after exactly n trials. Any s r specifies a unique set of such π_n(z̄) for each z̄ ∈ Z_0. Also, for z̄ ∈ Z_0 and p ≥ n,

   π_p(z̄) / [1 − Σ_{s<n} π_s(z̄)]

is the probability that, with sample point z̄, δ terminates after exactly p trials, given that δ does not terminate before n trials.

Definition. A d r δ is truncated at n if, for z̄ ∈ Z_0, Σ_{p=0}^n π_p(z̄) = 1.

Definitions. The risk of the d r δ when θ is the state of nature is

   r(θ,δ) = Σ_n ∫ π_n(z̄) [nc + ∫_A L(θ,a) dφ_{z_{2n}}(a)] dμ_θ(z̄)   (1)

when defined. Here c is the cost per trial of sampling, so that the risk is the average cost plus the average loss when δ is used and θ is the state of nature.

Let 𝒰 be a σ-algebra of Θ subsets which contains each θ ∈ Θ and let ξ be a probability measure on (Θ, 𝒰). Then the average risk of δ at ξ is, when defined,

   r(ξ,δ) = ∫_Θ r(θ,δ) dξ(θ).   (2)

Remark. The integrals in (1) and (2) are defined if L is a measurable function with respect to (Θ × A, 𝒰 × 𝒜) and if the functions involving ψ and φ are measurable. All subsequent work considers only rules for which the integrals are defined.

Bayes criterion. A d r δ is Bayes at ξ in the class Δ of d r if δ ∈ Δ and

   r(ξ,δ) = inf_{δ' ∈ Δ} r(ξ,δ').

The Bayes risk at ξ in the class Δ of d r is

   ρ(ξ,Δ) = inf_{δ ∈ Δ} r(ξ,δ).

Notation. The class of d r of most interest in this work is Δ_∞, the class of d r for which (1) and (2) are defined and which terminate with probability one, i.e., for which Σ_n π_n(z̄) = 1 for μ_θ-almost every z̄, for almost all (ξ) θ ∈ Θ. The classes Δ_n, n ≥ 0, of d r truncated at n for which (1) and (2) are defined will also be considered. For brevity, write ρ_n(ξ) = ρ(ξ,Δ_n) and ρ(ξ) = ρ(ξ,Δ_∞).

2.2 Extension of some sequential, non-design results to sequential design problems.

The results of this section can, once the foundation work of Section 2.1 has been done, be proved by methods quite similar to those for the analogous s-nd results. Each of the results will be used in the investigations of the succeeding three chapters.

The first theorem establishes an integral equation for the risk of s-d d r in Δ_∞. As a corollary to this, integral equations for the Bayes risks in Δ_n and Δ_∞ are derived. These Bayes risk equations are s-d analogs of standard s-nd results; see, e.g., Theorem 9.3.2 of [2]. The extension to the s-d case has been used in [1], [5], and [13]. It is proved here, in the framework of Section 2.1, for completeness, since the equation and its consequences are important throughout the present work.

Before proceeding to the theorem, I belabor a point in integration theory which is perhaps "obvious" and certainly necessary. Let 𝒮_(n) be the σ-algebra of sets in Z_(n) = ∏_{p=1}^n Z_p generated by the measurable rectangles in Z_(n).

Lemma 2.2.1.
A. A necessary and sufficient condition that a subset S of Z_(n) be in 𝒮_(n) is that S × Z^(n) be in 𝒮_0.
B. Suppose f_(n) is a measurable function on (Z_(n), 𝒮_(n)) and f on (Z_0, 𝒮_0) satisfies f(z̄) = f_(n)(z_(n)(z̄)). If S_n ∈ 𝒮_(n) and S = S_n × Z^(n), then

   ∫_{S_n} f_(n) dμ_{θ,n} = ∫_S f dμ_θ   (1)

in the sense that if one side exists, then so does the other, and the two are equal. Here, μ_{θ,n} is a measure on (Z_(n), 𝒮_(n)) defined (by virtue of part A) on 𝒮_(n) by μ_{θ,n}(S) = μ_θ(S × Z^(n)).

Proof. (Part A) Let R_n be the class of measurable rectangles of Z_(n), and show that 𝒮_(n) × Z^(n) is the σ-algebra R (say) generated by R_n × Z^(n), as follows. 𝒮_(n) × Z^(n) is a σ-algebra, since 𝒮_(n) is; moreover, 𝒮_(n) × Z^(n) contains R_n × Z^(n) and hence also the σ-algebra R. From this it follows that R can be written as R* × Z^(n) for some class R* of Z_(n) subsets. Since R is a σ-algebra, R* is; and since R* contains R_n, R* contains 𝒮_(n); i.e., R contains 𝒮_(n) × Z^(n). Since 𝒮_0 contains R, if S ∈ 𝒮_(n), then S × Z^(n) ∈ 𝒮_0.

Let 𝒮* be the class of Z_0 subsets E such that each n-dimensional Z_(n)-section of E is in 𝒮_(n). 𝒮* is a σ-algebra, since 𝒮_(n) is, and 𝒮* contains the class of measurable rectangles in Z_0. Hence 𝒮_0 is a subset of 𝒮*. If S × Z^(n) ∈ 𝒮_0, then for z̄ ∈ Z^(n), the z̄-section of S × Z^(n), viz., S, is in 𝒮_(n), completing the proof of A.

(Part B) Assume for definiteness that the left-hand integral in (1) exists; the other case is similar. By part A, if f_(n) is measurable, then so is f. By the definitions of μ_θ and μ_{θ,n}, the result follows if f_(n) is an indicator function, hence also if f_(n) is simple. If f_(n) is any μ_{θ,n}-integrable function, then there are increasing, mean fundamental sequences {f⁺_(n),p} and {f⁻_(n),p} of integrable simple functions which converge in measure to f⁺_(n) and f⁻_(n), respectively. Define {f⁺_p} and {f⁻_p} on Z_0 by f^s_p(z̄) = f^s_(n),p(z_(n)(z̄)) for s = +, −. These functions are measurable (by part A) and simple, and form mean fundamental sequences which converge in μ_θ-measure monotonically to f. Hence the right-hand integral in (1) exists, and the monotone convergence theorem proves part B.

Definition. If δ is a d r and z_{2n} is in Z_(2n), define δ_0 = δ_0(z_{2n}) by

   φ⁰_{z*_{2p}} = φ_{(z_{2n}, z*_{2p})},  ψ⁰_{z*_{2p}} = ψ_{(z_{2n}, z*_{2p})},  X⁰_{z*_{2p}} = X_{(z_{2n}, z*_{2p})},  p ≥ 0,

where the superscript 0 denotes the components of δ_0 and z*_{2p} the continuation trials. Thus δ_0 is the d r which "follows δ from the n-th trial onward".

Definition. The posterior distribution ξ_{z_{2n}} of θ, when the prior distribution is ξ and trials z_{2n} have been observed, is defined by

   dξ_{z_{2n}}(θ) = ∏_{p=1}^n f_{θ,e_p}(x_p) dξ(θ) / ∫_Θ ∏_{p=1}^n f_{θ,e_p}(x_p) dξ(θ).

Also, f_{ξ,e}(x) = ∫_Θ f_{θ,e}(x) dξ(θ).

Theorem 2.2.1. If δ ∈ Δ_∞, then

   r(ξ,δ) = π_0 ∫_Θ ∫_A L(θ,a) dφ_0(a) dξ(θ) + (1 − π_0)[c + ∫_E ∫_X r(ξ_{z_2}, δ_0(z_2)) f_{ξ,e}(x) dν(x) dX_0(e)].

Proof. By definition, for δ ∈ Δ_∞ and ξ,

   r(ξ,δ) = π_0 ∫_Θ ∫_A L(θ,a) dφ_0(a) dξ(θ) + h(ξ,δ)   (2)

where

   h(ξ,δ) = ∫_Θ ∫_{Z_0} Σ_{n≥1} π_n(z̄) [nc + ∫_A L(θ,a) dφ_{z_{2n}}(a)] dμ_θ(z̄) dξ(θ).

Let π⁰_n = π⁰_n(z̄) be the π_n-type s r of δ_0 = δ_0(z_2); then, for n ≥ 0, π_{n+1}(z̄) = (1 − π_0) π⁰_n(z̄). Substituting this relation and applying Lemma 2.2.1 to integrate over the coordinates beyond the first trial, each term of h(ξ,δ) splits into the cost c of the first trial plus the remaining cost and terminal loss of δ_0(z_2), the latter averaged with the posterior ξ_{z_2} as prior; that is,

   h(ξ,δ) = (1 − π_0)[c + ∫_E ∫_X r(ξ_{z_2}, δ_0(z_2)) f_{ξ,e}(x) dν(x) dX_0(e)],

to prove Theorem 2.2.1.

Corollary 2.2.1.

   ρ_n(ξ) = min[ρ_0(ξ), c + inf_{e ∈ E} ∫_X ρ_{n-1}(ξ_{e,x}) f_{ξ,e}(x) dν(x)]  for n ≥ 1;

and

   ρ(ξ) = min[ρ_0(ξ), c + inf_{e ∈ E} ∫_X ρ(ξ_{e,x}) f_{ξ,e}(x) dν(x)].

Proof. From the theorem, r(ξ,δ) = π_0 A(ξ,δ) + (1 − π_0) B(ξ,δ), where A is a function of φ_0 only and B is independent of π_0 and φ_0. If A < B, then the d r δ* with π*_0 = 1 and φ* = φ satisfies r(ξ,δ*) ≤ r(ξ,δ); if B ≤ A, then the d r δ* with the same t d r and e r as δ, but with π*_0 = 0 and π*_n = π_n/(1 − π_0) for n ≥ 1, satisfies r(ξ,δ*) ≤ r(ξ,δ). Thus, to consider risk minimization, one need consider only d r with π_0 = 1, whose minimum risk is ρ_0(ξ), or d r in Δ*, the subset of the class in question on which π_0 = 0. If δ ∈ Δ_n, then δ_0(z_2) ∈ Δ_{n-1}, so that the minimum over Δ* can be obtained by minimizing the average, with respect to f_{ξ,e}(x) dν(x) dX_0(e), of ρ_{n-1}(ξ_{z_2}). This is done using a d r (or a sequence of d r) which minimizes (or whose limit minimizes) the average of ρ_{n-1}(ξ_{z_2}). The average (or limiting average) risk of this d r (or sequence of d r) is just as given in the statement of the corollary. The derivation of the integral equation for ρ follows in exactly the same way, by noting that {δ_0(z_2) : δ ∈ Δ_∞} = Δ_∞.

Theorem 2.2.2. If L(·,·) is bounded on Θ × A, then there is at most one solution to the functional equation

   ρ(ξ) = min[ρ_0(ξ), c + inf_{e ∈ E} ∫_X ρ(ξ_{e,x}) f_{ξ,e}(x) dν(x)]   (3)

where ρ_0(ξ) = inf_{a ∈ A} ∫_Θ L(θ,a) dξ(θ).

Proof. Consider ρ_n(ξ) and ρ'_n(ξ) defined, for n ≥ 1, by

   ρ_n(ξ) = min[ρ_0(ξ), c + inf_{e ∈ E} ∫_X ρ_{n-1}(ξ_{e,x}) f_{ξ,e}(x) dν(x)],
   ρ'_0(ξ) = 0,  ρ'_n(ξ) = min[ρ_0(ξ), c + inf_{e ∈ E} ∫_X ρ'_{n-1}(ξ_{e,x}) f_{ξ,e}(x) dν(x)].

Let ρ(ξ) be any solution of (3). Then ρ(ξ) ≤ ρ_0(ξ) for all ξ. Suppose, for 0 ≤ p ≤ n−1, that ρ(ξ) ≤ ρ_p(ξ) for all ξ; then

   ρ(ξ) ≤ min[ρ_0(ξ), c + inf_{e ∈ E} ∫_X ρ_{n-1}(ξ_{e,x}) f_{ξ,e}(x) dν(x)] = ρ_n(ξ).

Hence ρ_n(ξ) ≥ ρ(ξ) for each n and ξ. Also, ρ'_0(ξ) = 0 ≤ ρ(ξ) for all ξ. Suppose, for 0 ≤ p ≤ n−1 and each ξ, that ρ'_p(ξ) ≤ ρ(ξ). Then

   ρ'_n(ξ) ≤ min[ρ_0(ξ), c + inf_{e ∈ E} ∫_X ρ(ξ_{e,x}) f_{ξ,e}(x) dν(x)] = ρ(ξ);

i.e., ρ'_n(ξ) ≤ ρ(ξ) for each n and ξ.

Note that ρ_n is the Bayes risk function for the class of procedures truncated at n observations, and ρ'_n is the Bayes risk for procedures truncated at n observations in the modified structure wherein a decision d(n) is available at stage n only, with L(θ, d(n)) ≡ 0. Let δ'_n be a d r in the modified structure with risk ρ'_n(ξ). With r denoting the risk function in the usual structure, it follows that

   ρ_n(ξ) ≤ r(ξ, δ'_n) ≤ ρ'_n(ξ) + L̂ P_ξ(n)   (4)

where L̂ = sup_{Θ×A} L(θ,a) and P_ξ(n) = ∫_Θ Pr_θ{δ'_n takes n observations} dξ(θ). Since ρ'_n(ξ) ≤ ρ_0(ξ) ≤ L̂ and nc P_ξ(n) ≤ ρ'_n(ξ), we have P_ξ(n) ≤ L̂/(nc), and from (4)

   ρ'_n(ξ) ≤ ρ_n(ξ) ≤ ρ'_n(ξ) + L̂²/(nc).

Hence ρ_n and ρ'_n have the same limit; since ρ'_n(ξ) ≤ ρ(ξ) ≤ ρ_n(ξ) for any solution ρ of (3), the solution is unique, to complete the proof.

Definition. A d r δ is a non-randomized function of the posterior distribution if

(a) ξ_{z_{2n}} = ξ_{z'_{2m}} implies X_{z_{2n}} = X_{z'_{2m}}, ψ_{z_{2n}} = ψ_{z'_{2m}}, and φ_{z_{2n}} = φ_{z'_{2m}}; and
(b) at any possible posterior distribution ξ, there is δ-probability 1 that sampling continues with a given experiment e(ξ) or that sampling terminates with a given action a(ξ).

For such a d r, N = N(δ, z̄) is defined as the number of trials before termination with sample point z̄.

Theorems 2.2.3 and 2.2.4 establish conditions under which the existence of a Bayes d r guarantees the existence of a Bayes d r which is a non-randomized function of the posterior distribution.

Theorem 2.2.3. If δ is a Bayes rule, then the subset Z_a of Z_0 on which (a) fails and the subset Z_b of Z_0 on which (b) fails are μ_θ-null for almost all (ξ) θ:

(a) for each n such that Σ_{p=0}^{n-1} π_p(z̄) < 1, there is e(z_{2n}) ∈ E such that

   ∫_X ρ([ξ_{z_{2n}}]_{e(z_{2n}),x}) f_{ξ_{z_{2n}}, e(z_{2n})}(x) dν(x) = inf_{e' ∈ E} ∫_X ρ([ξ_{z_{2n}}]_{e',x}) f_{ξ_{z_{2n}}, e'}(x) dν(x);

(b) for each n such that π_n(z̄) > 0, there is a(z_{2n}) ∈ A such that

   ∫_Θ L(θ, a(z_{2n})) dξ_{z_{2n}}(θ) = inf_{a' ∈ A} ∫_Θ L(θ, a') dξ_{z_{2n}}(θ).

Moreover, if z̄ ∈ Z_a, z̄' ∈ Z_a, and ξ_{z_{2n}} = ξ_{z'_{2m}} = ξ (say), then one can take a(z_{2n}) = a(z'_{2m}) = a(ξ) and e(z_{2n}) = e(z'_{2m}) = e(ξ).

Proof. Suppose Z_a is not null. Let M(z̄), defined on Z_a, be the least integer for which (a) fails with sample point z̄, and let S_m = {z̄ ∈ Z_a : M(z̄) = m}. Then, by Theorem 2.2.1, the excess of r(ξ,δ) over ρ(ξ) includes, for each m, a term whose integrand on S_m compares the continuation risk actually attained with its infimum over E. Since, if (a) fails, this integrand is everywhere positive on a set supposed to have positive measure, r(ξ,δ) − ρ(ξ) > 0, so that δ cannot be Bayes. This contradiction proves the assertion that Z_a is null. A similar argument proves that Z_b is null, and the final assertion follows from Corollary 2.2.1.

Theorem 2.2.4. If L(·,·) is bounded on Θ × A and there is a Bayes rule at ξ, then there is a Bayes rule at ξ which is a non-randomized function of the posterior distribution.

Proof. By virtue of, and with the notation of, Theorem 2.2.3, a d r δ is defined almost everywhere by requiring that, at posterior ξ_{z_{2n}}, it stop with φ_{z_{2n}}(a(ξ_{z_{2n}})) = 1 whenever stopping attains the minimum in the risk equation, and otherwise continue with X_{z_{2n}}(e(ξ_{z_{2n}})) = 1. If the definition of δ is completed by defining it as a non-randomized function of the posterior distribution on Z_a ∪ Z_b, then δ is such a rule for each ξ. The theorem is proved by proving that δ is Bayes at ξ. A solution ρ of the integral equation for the Bayes risk can be shown by induction to satisfy, for m ≥ 1,

   ρ(ξ) = h_m(ξ) + R_m(ξ),

where h_m(ξ) is the average, over the first m trials of δ, of the sampling costs pc and the stopping risks ρ_0(ξ_{z_{2p}}) collected at termination, and R_m(ξ) is the average of mc + ρ(ξ_{z_{2m}}) over the sample points at which δ has not yet terminated. The sequence {h_m(ξ) : m ≥ 1} is bounded and non-decreasing and thus has a limit h(ξ). Hence h(ξ) = r(ξ,δ) and r(ξ,δ) = ρ(ξ), to complete the proof.

Notation. If a d r δ is a non-randomized function of the posterior distribution, then the space of possible posterior distributions can be written as

   Σ_{e ∈ E} Ξ_e + Σ_{a ∈ A} Ξ*_a,

where Ξ_e consists of those posterior distributions at which sampling is continued using experiment e at the next trial, and Ξ*_a consists of those posterior distributions at which sampling is discontinued and action a is taken.

Theorem 2.2.5. If δ is a Bayes d r which is a non-randomized function of the posterior distribution, then the sets Ξ*_a are convex.

Proof. The proof is the same as in the s-nd case; see, for example, Theorem 9.4.3 of [2].

The remaining results of this section are proved in generality sufficient for their applications in the following chapters. Note, however, that some of them can be extended easily to more general situations. Theorem 2.2.6 adapts Stein's s-nd theorem [14] on the probability of termination to the s-d case.

Theorem 2.2.6. Suppose:
1. Θ = {1, 2}.
2. δ is a s-d d r which is a non-randomized function of the posterior distribution, specified by numbers m and M, 0 < m ≤ M < 1, and the rule: stop as soon as the posterior probability that θ = 1 is outside (m, M).
3. There are positive numbers δ_1 and δ_2 such that either
   (i) if e ∈ E, then Pr_θ{f_{2,e}(x)/f_{1,e}(x) > e^{δ_1}} > δ_2, or
   (ii) if e ∈ E, then Pr_θ{f_{2,e}(x)/f_{1,e}(x) < e^{−δ_1}} > δ_2.

Then:
A. Pr_θ{N < ∞} = 1; in fact, there is a positive number b such that Pr_θ{N > m} ≤ e^{−bm} for m ≥ 0.
B. E_θ e^{tN} < ∞ for some t > 0.
C. E_θ N^k < ∞ for k ≥ 0.

Proof. Let D = 1 + (M − m)/δ_1 and suppose assumption 3(i) holds. Let L(z_{2n}) denote the random logarithm of the likelihood ratio after n trials, using δ, i.e.,

   L(z_{2n}) = Σ_{p=1}^n log(f_{2,e_p}(x_p)/f_{1,e_p}(x_p)),

where e_p = e(ξ_{z_{2p-2}}) using δ. If n_1 ≥ 0 and n_2 ≥ D, then, by 3(i),

   Pr_θ{L(z_{2n_1+2n_2}) − L(z_{2n_1}) > M − m} ≥ δ_2^{n_2} > 0,

so that within every block of n_2 consecutive trials the posterior probability leaves (m, M) with probability bounded away from zero, whatever the starting point; the resulting geometric bound proves A. B and C follow exactly as in [14]. The proof where 3(ii), but not 3(i), holds is nearly identical.

Definition. The Kullback-Leibler information numbers are defined on Θ × Θ × E by

   I(θ, θ', e) = E_θ log(f_{θ,e}/f_{θ',e}).

A s-d corollary of Wald's equation is Theorem 2.2.7, the proof of which follows Johnson's proof [8] for the s-nd case.

Theorem 2.2.7. If assumptions 1 and 2 of Theorem 2.2.6 hold and if I(θ, 3−θ, e) is bounded on Θ × E, then

   E_θ Σ_{p=1}^N log(f_{θ,e_p}(x_p)/f_{3−θ,e_p}(x_p)) ≥ (inf_{e ∈ E} I(θ, 3−θ, e)) E_θ N,   (5)

where again e_p = e(ξ_{z_{2p-2}}) using δ.

Proof. Define Y_n as the indicator of the event {N ≥ n}. Then the left-hand side of (5) is

   Σ_{n=1}^∞ ∫_E E_θ{Y_n log(f_{θ,e_n}(x_n)/f_{3−θ,e_n}(x_n)) | e_n = e} d Pr{e_n = e}
   = Σ_{n=1}^∞ ∫_E E_θ Y_n · E_θ{log(f_{θ,e_n}(x_n)/f_{3−θ,e_n}(x_n)) | e_n = e} d Pr{e_n = e},   (6)

since Y_n and the log-ratio are conditionally independent given e_n. But for any e,

   E_θ log(f_{θ,e}(x)/f_{3−θ,e}(x)) ≥ inf_{e' ∈ E} I(θ, 3−θ, e').   (7)

The theorem follows from (6) and (7), since

   Σ_{n=1}^∞ E_θ Y_n = Σ_{n=1}^∞ Pr{N ≥ n} = Σ_{n=1}^∞ n Pr{N = n} = E_θ N.
It is not known whether the target is located in the north or in the south, or whether there is a target present at all. More- over, because of interference, one cannot expect to be sure of the presence ~r absense of target based on a single radar reading. However, he must decide, based on radar readings from either direction, either to fire at a supposed target in one of the two directions or not to fire at all. The larger the number of observations he makes, the more sure he is that his action is correct so that he will incur no loss. On the other hand, each observation is made at a cost c, so that he wants to take some action as soon as possible. In the notation of Section 2.1, the possible states of nature are 8=0 if no target is present, B-1 or 2 according as a target is present in the north or in the south. E. {l,2}, with experiment 1 being to take a radar reading f~om the I 22 north and experiment 2 being a radar reading from the south. action 1 or 2 being to fire north or south, respectively. A = (0,1,2} with Suppose the observa- tions, i.e., radar readings, are represented in some way as being uniformly distributed on [O,lJ if no target is present and as uniformly distributed on [b, l+b] if a target is present; here 0 < b < 1 for cases of interest. Hence one is sure that no target is present in a given direction if a radar reading is < b; i f a reading is> 1, then a target is surely present. The probability structure of section 2.1 has U real Lebesgue measure and f 9,e given in Table I, where g0 is the uniform density on [0,1] and gl is the uniform density on rb, l+b]. Table I. 9 a 1 The most general probability measure where sp 2 S on (S,u) can be denoted The problem is to decide how to take now many radar readings when the cost per reading is c and losses for incorrect decisions are given by a if 9 1 if 9"a a In the following sections, this problem of optimal allocation is solved for several different cases. 
These cases differ in the amount of flexibility allowed in deciding how to take readings. Numerical comparisons are given in the final section.

Admittedly, this loss structure is quite specialized. The most general loss structure for the problem is specified by nine constants, and it seems that analogous results for this case should be little more difficult to derive than those for the "0-1" structure used here. In particular, note that since the units of loss are unspecified, a 0-1 loss is no less general than a 0-L loss, for any L > 0.

This problem seems to be another one wherein a sequential modification is "obviously" preferred (cf. the "classical" example of binomial acceptance sampling [7]). For example, a = 0 can be taken as soon as one sample with each experiment is obtained which is less than b. Or a = θ, for θ = 1,2, can be taken as soon as a trial with e = θ yields an outcome which exceeds 1. These actions can be taken with no probability of error even though some pre-assigned sample size has not been attained. This reasoning extends to comparison of the sequential non-design and sequential design formulations of the problem. Since, intuitively, no additional information can result from observations using e = θ after having obtained one such observation outside [b,1], a design procedure which discontinues observations with e = θ in such an event would seem preferable to a procedure without this flexibility.

3.2 The non-sequential, non-design (ns-nd) case. This is the fixed sample size case, where, for n fixed, decisions are based on n observations with each experiment. In the notation of Section 2.1,

 X_{z_0}(1) = X_{z_{2p}}(1) = 1 and X_{z_{2p−1}}(2) = 1 for 1 ≤ p ≤ n, and ψ_{z_0} = ψ_{z_p} = 0 for 1 ≤ p < 2n,  (1)

with the e r and s r otherwise arbitrary.

Remark. Class designations, such as ns-nd, are given to clarify, but not to define, the experimental flexibility of procedures in the class.
For example, the ns-nd class provides some design flexibility, in the sense that fixing of the sample size, n, is up to the experimenter. Non-design is meant to indicate that the experimenter cannot decide how many of each experiment to perform, given n.

Let Δ(n) be the class of ns-nd d r using 2n trials. A Bayes d r in Δ(n) is a d r in Δ(n) which minimizes the average risk as a function of φ, where z_{2n} is, as usual, the vector with pth component x_p in the notation of Section 2.1. It is easily checked that this minimization is accomplished with the d r and risk of Result 1.

Result 1. A Bayes procedure in Δ(n) is given by (1) together with the t d r defined by φ_{z_{2n}}(a) = 1 if
 a = θ, where x_p > 1 and e_p = θ for some p;
 a = 0, and x_p < b for some p with e_p = 1 and some p with e_p = 2;
 ξ_a = max(ξ_0, ξ_1), and x_p < b for some p with e_p = 2 while b ≤ x_p ≤ 1 for all p with e_p = 1;
 ξ_a = max(ξ_0, ξ_2), and x_p < b for some p with e_p = 1 while b ≤ x_p ≤ 1 for all p with e_p = 2;
 ξ_a = max(ξ_0, ξ_1, ξ_2), and b ≤ x_p ≤ 1 for 1 ≤ p ≤ 2n;
which is defined with probability one if the φ's of the conditions are interpreted as the φ corresponding to the least a for which the condition holds. The average risk of this procedure is

 ρ_1(ξ,n) = 2nc + (1−b)^{2n}[1 − max_e ξ_e] + (1−b)^n[1 − (1−b)^n][min_{e=0,2} ξ_e + min_{e=0,1} ξ_e].

Definition. A Bayes ns-nd procedure at ξ is the Bayes procedure with respect to ξ in ∪_{n≥0} Δ(n), with average risk ρ_1(ξ) = inf_{n≥0} ρ_1(ξ,n). This infimum can be attained, since ρ_1(ξ,0) = 1 − max_e ξ_e < 1 ≤ 2n'c, where n' is any integer > 1/(2c). Thus, the Bayes fixed sample size N satisfies 0 ≤ N < n'. Help in determining the Bayes ns-nd sample size is provided by Result 2.

Result 2. ρ_1(ξ,n) is a convex function of n.

Proof. ρ_1(ξ,n) is a linear combination with non-negative coefficients of the three convex functions n, (1−b)^n, and (1−b)^{2n}.

Since ρ_1 is a convex function of n, it is minimized for real n by n_0 such that (d/dn) ρ_1(ξ,n)|_{n=n_0} = 0, and for integral n by [n_0] or [n_0] + 1, where [x] denotes the greatest integer not greater than the number x.

3.3 The non-sequential, design (ns-d) case.
This is the fixed sample size case wherein the experimenter, before taking any observations, allocates each member of his total sample to one of the two experiments. For a sample of n, the s r is

 ψ_{z_0} = ψ_{z_p} = 0 for 1 ≤ p < n, ψ_{z_n} = 1.  (1)

The e r of interest are X_{m,n} for 0 ≤ m ≤ n and n ≥ 0, where X_{m,n} is the n-sample rule which assigns the first m samples to e = 1 and the remaining n−m samples to e = 2. Thus X_{m,n} is defined by

 X_{z_p}(1) = 1 if p < m (or if m = 0 and p = 0), and X_{z_p}(2) = 1 otherwise,  (2)

where the subscript m,n has been omitted. Note that the ns-nd e r is equivalent to X_{n,2n}.

The minimum-risk t d r for use with the s r of (1) and e r of (2) is that which yields risk

 r(ξ,n,m) = nc + h(ξ,n,m), where h(ξ,n,m) = Σ_{θ=0}^{2} ξ_θ ∫ [∫_A L(θ,a) dφ_{z_n}(a)] Π_{p=1}^{n} f_{θ,e_p}(x_p) du(x_p).

Result 3. The average risk in the class of procedures using (1) and (2) is minimized by the d r δ(ξ,m,n) with t d r φ_{z_n} defined, with probability 1, by φ_{z_n}(a) = 1 if a is the least integer in A with
 a = θ, where x_p > 1 for some p ≤ m (θ = 1) or some p > m (θ = 2);
 a = 0, and x_p < b for some p ≤ m and some p > m;
 ξ_a = max(ξ_0, ξ_1), and x_p < b for some p > m while b ≤ x_p ≤ 1 for p ≤ m;
 ξ_a = max(ξ_0, ξ_2), and x_p < b for some p ≤ m while b ≤ x_p ≤ 1 for p > m;
 ξ_a = max_e ξ_e, and b ≤ x_p ≤ 1 for 1 ≤ p ≤ n.
The average risk of this procedure is

 ρ_2(ξ,n,m) = nc + min_{e=0,1} ξ_e (1−b)^m[1 − (1−b)^{n−m}] + min_{e=0,2} ξ_e (1−b)^{n−m}[1 − (1−b)^m] + [1 − max_e ξ_e](1−b)^n.

Definition. A Bayes ns-d procedure at ξ is the δ(ξ,m_1,n_1) with risk ρ_2(ξ) = inf_{m,n} ρ_2(ξ,n,m). As in the ns-nd case, and by an analogous argument, this infimum can be attained. To determine the (m,n) value which characterizes the best Bayes ns-d procedure, the following result is helpful.

Result 4. (a) 1 − max_e ξ_e − min_{e=0,1} ξ_e − min_{e=0,2} ξ_e ≥ 0.
(b) Suppose 0 < β < 1 and f(x,y) = c(x+y) + k_0 β^{x+y} + k_1 β^x(1 − β^y) + k_2 β^y(1 − β^x). Then
 (i) f_x = 0 ⇔ β^{x+y}(k_0 − k_1 − k_2) + k_1 β^x = c/(−log β);
 (ii) f_y = 0 ⇔ β^{x+y}(k_0 − k_1 − k_2) + k_2 β^y =
c/(−log β);
 (iii) F = (f_xx, f_xy; f_yx, f_yy) = (log β)²[(k_0 − k_1 − k_2) β^{x+y} (1, 1; 1, 1) + (k_1 β^x, 0; 0, k_2 β^y)];
 (iv) F is positive definite.

Remark on proof and use. Part (a) follows from considering each of the six possible orderings of ξ_0, ξ_1, ξ_2. The first three parts of (b) are simple analysis results, and (iv) follows from definition. The identifications x = m, y = n−m, β = 1−b, k_0 = 1 − max_e ξ_e, k_1 = min_{e=0,1} ξ_e, and k_2 = min_{e=0,2} ξ_e show that the best (m, n−m) values are within unity of the corresponding solutions of (i) and (ii), since, by (a) and (iv), the matrix of second partial derivatives is positive definite.

3.4 The sequential, non-design (s-nd) case. In this case, each stage of sampling consists of one trial with each experiment; after observing the outcome of a stage, the experimenter decides whether to stop and make a decision or to observe another stage. In symbols, X_{z_0}(1) = X_{z_{4n}}(1) = 1 and X_{z_{4n−2}}(2) = 1 for n ≥ 1.

Bayes d r with respect to ξ in the class of s-nd procedures satisfy a risk equation similar to that of Corollary 2.2.1 of Chapter 2, viz., that of Theorem 9.3.2 of [2]. This theorem is specialized to the present case in Result 5; specification of the Bayes d r is neglected, since evaluation of the risk suffices for comparing the s-nd with other procedures.

Result 5. The Bayes risk ρ_3(ξ) in the class of s-nd d r satisfies

 ρ_3(ξ) = min[1 − max_e ξ_e, 2c + h],  (1).

Evaluation of the Bayes risk using (1) can be accomplished straightforwardly in this case because of the small number of possible ξ_{z_2} values. These are listed, along with the z_2 values to which each corresponds, in Table II.
Table II.

 Set                                                ξ_{z_2} for z_2 in this set
 S_1 = {z_2 : x_1 > 1}                              (0, 1, 0)
 S_2 = {z_2 : x_2 > 1}                              (0, 0, 1)
 S_0 = {z_2 : x_1 < b, x_2 < b}                     (1, 0, 0)
 S_{01} = {z_2 : b ≤ x_1 ≤ 1, x_2 < b}              (ξ_0/[ξ_0+ξ_1], ξ_1/[ξ_0+ξ_1], 0)
 S_{02} = {z_2 : x_1 < b, b ≤ x_2 ≤ 1}              (ξ_0/[ξ_0+ξ_2], 0, ξ_2/[ξ_0+ξ_2])
 S = {z_2 : b ≤ x_1 ≤ 1, b ≤ x_2 ≤ 1}               ξ = (ξ_0, ξ_1, ξ_2)

Note that ρ_3(ξ_p) = 0 from (1) for p = 0, 1, 2, where ξ_p denotes the distribution degenerate at θ = p, so that the integral over S_p is 0 for 0 ≤ p ≤ 2. Again, h can be written as the sum of contributions from the sets of Table II, (3), where h_{0p} denotes the corresponding contribution at the posterior ξ_{0p}, (4). From (3) and (4), for p = 1,2,

 ρ_3(ξ_{0p}) = min[ξ_0/(ξ_0+ξ_p), ξ_p/(ξ_0+ξ_p), 2c/b],  (5).

Using (5) in (2), obtain (6); use of (6) in (1) gives Result 6.

3.5 The sequential design (s-d) case. This is the general case of Chapter 2, in which a rather direct application of Corollary 2.2.1 is used to evaluate Bayes risks in Δ_∞.

Result 7. The Bayes risk relative to ξ in Δ_∞, ρ(ξ), is

Proof. From Corollary 2.2.1,

 ρ(ξ) = min[1 − max_e ξ_e, c + h],  (1),

where, with the notation for prior distributions introduced in Table II,

 h = min_{e=1,2} {ρ(ξ_{0,3−e})(ξ_0 + ξ_{3−e}) b + ρ(ξ)(1−b)},  (2).

Next, reapplying Corollary 2.2.1,

 ρ(ξ_{0p}) = min[ξ_0/(ξ_0+ξ_p), ξ_p/(ξ_0+ξ_p), c + h_{0p}],  (3),

where

 h_{0p} = min_{e_1=1,2} ∫_X ρ(ξ_{0p,z_2}) f_{e_1}(x_1) dx_1,  (4).

The integral in (4) is minimized with respect to e_1 by taking e_1 = p, since then the integral is (1−b) ρ(ξ_{0p}), whereas with e_1 = 3−p the integral is ρ(ξ_{0p}). By (3), if c/b < min[ξ_0/(ξ_0+ξ_p), ξ_p/(ξ_0+ξ_p)], then ρ(ξ_{0p}) = c + (1−b) ρ(ξ_{0p}), i.e., in general

 ρ(ξ_{0p}) = min[ξ_0/(ξ_0+ξ_p), ξ_p/(ξ_0+ξ_p), c/b],  (5).

Use of (5) in (2) and the result in (1) completes the proof.

A d r δ is now defined, and Result 8 proves that r(ξ,δ) = ρ(ξ).

Notation. For brevity, set ρ(ξ) = min[T_0, T_1, T_2], where T_0 corresponds to immediate stopping and T_e to continuation with experiment e, e = 1,2.

Definition. Define the d r δ = δ_{Bξ} as follows:
(a) If ρ(ξ) = T_0, then ψ_{z_0} = 1 and φ_{z_0}(a) = 1 if a is the least integer in A with ξ_a = max_e ξ_e. Otherwise, δ is defined arbitrarily, in the sense that other rules in δ are restricted only by the requirement that δ be in Δ_∞.
(b) If ρ(ξ) = T_e for e = 1 or 2, then let N denote the (random) least integer n such that x_n is not in [b,1]. The e r X is defined, with probability 1, on [0,N] by X_{z_{2p}}(e) = 1 for 0 ≤ p ≤ N, and the s r by ψ_{z_0} = ψ_{z_{2p}} = 0 for 1 ≤ p < N.
 (i) If x_N > 1, then ψ_{z_{2N}} = 1 and φ_{z_{2N}}(e) = 1; φ is otherwise defined arbitrarily.
 (ii) If x_N < b and min(ξ_0, ξ_{3−e})/(ξ_0 + ξ_{3−e}) ≤ c/b, then ψ_{z_{2N}} = 1, and φ_{z_{2N}}(a) = 1 only in case a is the member of {0, 3−e} with ξ_a = max(ξ_0, ξ_{3−e}); φ is otherwise defined arbitrarily.
 (iii) If x_N < b and min(ξ_0, ξ_{3−e})/(ξ_0 + ξ_{3−e}) > c/b, then let N' denote the (random) least integer greater than N such that x_{N'} is not in [b,1]. The e r is defined for N < p ≤ N' by X_{z_{2p}}(3−e) = 1. The s r is defined, with probability 1, by ψ_{z_{2p}} = 0 for N ≤ p < N' and ψ_{z_{2N'}} = 1. The t d r is defined at N' by φ_{z_{2N'}}(3−e) = 1 only in case x_{N'} > 1 and φ_{z_{2N'}}(0) = 1 only in case x_{N'} < b. Otherwise δ is defined arbitrarily.

Result 8. ρ(ξ, δ_{Bξ}) = ρ(ξ).

Proof. If case (a) obtains, then the proof is straightforward. For case (b), evaluate risk as the sum of average loss and average cost, and note that Pr_θ{N = n} = (1−b)^{n−1} b, so that E_θ N = Σ_{n=1}^∞ n b (1−b)^{n−1} = 1/b. Thus in case (b-i), ρ(ξ,δ) = c/b, since with probability 1 this case can obtain only if θ = e. Similarly, for (b-ii), ρ(ξ,δ) = c/b + min(ξ_0, ξ_{3−e}). Note that, with probability 1, (b-iii) does occur if θ = e, and that if θ = 0 or 3−e, then N' is distributed identically as and independently of N, so that r(ξ,δ) = (c/b)(2 − ξ_e). Thus for θ = e, r(θ,δ) = c/b, and otherwise r(θ,δ) = 2c/b. Comparison of risks derived for the several possible cases with those of Result 7 proves Result 8.

3.6 Relation of the Bayes d r in Δ_∞ and Chernoff's d r. Chernoff [3] defines an e r for the two-action case. An extension to the present three-action framework which seems to preserve the substance of Chernoff's rule is as follows.
Let θ̂_n denote the maximum likelihood estimator of θ after n trials, and let I(θ,θ',e) on Θ × Θ × E be the Kullback-Leibler information numbers as used in [4], i.e., I(θ,θ',e) = E_θ log(f_{θ,e}/f_{θ',e}). A three-action Chernoff-type e r is: X_{z_0} chooses by some random rule between e = 1 and e = 2, and X_{z_{2n}}(1) = 1 − X_{z_{2n}}(2) = 1 if min_{θ≠θ̂_n} I(θ̂_n, θ, 1) > min_{θ≠θ̂_n} I(θ̂_n, θ, 2), with some random selection made if the minima are equal. If more than one θ maximizes the likelihood at stage n, then select the next experiment by any random method.

That the Bayes rule δ is a special case of Chernoff's rule, i.e., Chernoff's rule with special randomization rules, is seen as follows. Maximum likelihood estimates for cases of concern are given in Table III.

Table III.

 z_n                                                  θ̂_n
 b ≤ x_p ≤ 1 for 1 ≤ p ≤ n                            0, 1, 2
 b ≤ x_p ≤ 1 for 1 ≤ p < n; e_n = θ, x_n > 1          θ, for θ = 1, 2
 b ≤ x_p ≤ 1 for 1 ≤ p < n; e_n = θ, x_n < b          0, 3−θ

Information numbers relevant to definition of Chernoff's rule are in Table IV.

Table IV.

 (θ̂, θ)     I(θ̂, θ, 1)   I(θ̂, θ, 2)
 (0,1)       ∞             0
 (0,2)       0             ∞
 (1,0)       ∞             0
 (1,2)       ∞             ∞
 (2,0)       0             ∞
 (2,1)       ∞             ∞

 θ̂     e*: Chernoff experiment     min_{θ≠θ̂} I(θ̂, θ, e*)
 0      1 or 2                      0
 1      1                           ∞
 2      2                           ∞

Thus if ξ_e > ξ_{3−e}, the Bayes rule is that form of the Chernoff rule which always chooses e when randomization is necessary. This would seem satisfying to the Bayesian in the following sense. His procedure is not worse than a procedure assuming no prior information, of which it is a special case. On the other hand, it is better in the sense that he performs first the experiment which, in his prior belief, is more likely to result in the decision without loss which requires the fewest possible observations, viz., N instead of N' in the notation of the previous section.

3.7 Numerical comparisons. For given prior distributions, the results of the previous sections can be used to evaluate and compare risks of Bayes procedures with different degrees of experimental freedom.
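Of the four risks to be compared, the ns-nd risk of Result 1 has the simplest closed form, ρ_1(ξ,n) = 2nc + (1−b)^{2n}[1 − max_e ξ_e] + (1−b)^n[1 − (1−b)^n][min_{e=0,2} ξ_e + min_{e=0,1} ξ_e], and, since ρ_1 is convex in n (Result 2), the Bayes fixed sample size can be found by increasing n until the risk first stops decreasing. A sketch of that computation (the value b = 0.25 below is a hypothetical choice for illustration, not a parameter value taken from the thesis's figures):

```python
def rho1(xi, n, b, c):
    """ns-nd average risk of Result 1; xi = (xi_0, xi_1, xi_2)."""
    q = (1.0 - b) ** n   # probability that n trials of one experiment stay in [b, 1]
    return (2 * n * c
            + q * q * (1.0 - max(xi))
            + q * (1.0 - q) * (min(xi[0], xi[2]) + min(xi[0], xi[1])))

def bayes_ns_nd_size(xi, b, c):
    """Minimize rho1 over integers n >= 0; convexity in n (Result 2)
    justifies stopping at the first increase."""
    n, best = 0, rho1(xi, 0, b, c)
    while True:
        nxt = rho1(xi, n + 1, b, c)
        if nxt >= best:
            return n, best
        n, best = n + 1, nxt
```

With the case-1 prior ξ = (1/3, 1/3, 1/3), a large cost forces n = 0 with risk 1 − max_e ξ_e = 2/3, while a small cost buys a positive sample size and a smaller risk, in line with the behavior described for Figures 1 and 2.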
As noted in Chapter 1 and made explicit in Sections 3.2-3.4, the class of s-nd d r, the class of ns-nd d r, and the class of ns-d d r are each a subset of the class of s-d d r. Hence the best, i.e., Bayes, s-d d r is no worse than the best d r in any of the other three classes. The extent to which it is better can be judged in the two cases considered by the calculations of this section.

For case 1, ξ = (1/3, 1/3, 1/3). In the terminology of the hypothetical example of Section 3.1, this is the case when it is not known whether a target is present to the north or to the south, or whether one is present at all, and when each of these possibilities is considered equally likely. For case 2, ξ = (.5, .1, .4). By comparison with case 1, this might occur when bad weather decreases the probability that a target is present at all but makes it more likely that the target is in the south if it is present.

Risks ρ_1, ρ_2, ρ_3, and ρ are calculated as a function of the cost c per observation for these cases and are presented in Figures 1 and 2. In both cases, the risks of all four Bayes d r are the same for very large cost, i.e., cost so large that none of the d r can afford even one trial. Risks are small for very small costs, the case when much sampling can be done. For "moderate" cost, substantial saving can be made by adopting the s-d formulation. For example, in case 2, Figure 2 shows that use of the Bayes s-d d r leads to savings of as much as .15 over the s-nd d r and .18 over the ns-nd and ns-d d r. If the unit of cost is $1 million, i.e., the loss for not firing when a target is present is of this size, then the average saving of the s-d d r over the s-nd d r is as much as $150,000.

[Figure 1: risk as a function of cost c, case 1, ξ = (1/3, 1/3, 1/3)]

[Figure 2: risk as a function of cost c, case 2, ξ = (.5, .1, .4)]

CHAPTER IV

THE EXPERIMENTATION RULE X*

4.1 Introduction.
In the remaining two chapters, attention is confined to the "simple hypothesis versus simple alternative" case where each of a finite class of experiments gives information concerning the state of nature. In terms of the elements of Section 2.1, the situation is described by: Θ = {1,2} = A; L = {L(θ,a): θ ∈ Θ, a ∈ A}; cost c per trial. Any probability measure on the subsets of Θ can be denoted by ξ = ξ(1) = 1 − ξ(2), for 0 ≤ ξ(1) ≤ 1. Without essential loss of generality, the assumption L(1,1) = L(2,2) = 0 can be and is made.

This chapter concerns the X* e r described in Chapter 1. These e r are defined in Section 4.2. In Section 4.3, it is shown that there is, generally, a d r using the X* e r which is better than any one-experiment, s-nd rule. The asymptotic optimality of the d r δ*_c, defined in Chapter 1, is proved in Section 4.5. It will be noted that Theorem 4.3 is valid for general classes E, whereas additional assumptions are necessary to extend Theorem 4.5.2 to infinite E.

4.2 X* experimentation rules. The d r to be considered are all in the class of d r which are non-randomized functions of the posterior distribution with convex action sets that include ξ = 0 and ξ = 1. Let Δ* denote the class of all such procedures. Two facts are noteworthy. First, Δ* is a tractable subset of Δ_∞, i.e., the search for a Bayes d r in Δ* is much simpler than the search in Δ_∞. Second, Theorems 2.2.2 and 2.2.5 prove that a Bayes rule in Δ* is a Bayes rule in Δ_∞. Therefore, nothing is lost by restricting the search for a Bayes d r to the class Δ*, whereas simplicity is gained.

Definitions. Δ(ξ_1, ξ_2) is the subset of Δ* consisting of d r which continue sampling if and only if the posterior probability (that θ = 1) is in (ξ_1, ξ_2) and which use a Bayes t d r. The t d r φ is Bayes in Δ* if Σ_θ L(θ,a) ξ(θ) ≤ Σ_θ L(θ,a') ξ(θ) for every a' ∈ A when a is the decision of φ at posterior distribution ξ.
X* • X* (~I,E2) is defined for given s r wand t d r $, viz., rules such that (W,$,X*) is in 6(sl'~2)' Hence, the best d r using X* is the one which, by choice of (sl,E ),minimizes the risk among such d r. 2 Definition. Let r(s',e) • r(s'. e,sl,s2) be the Bayes risk at of s-nd d r in 6(SI,s2) which use experiment e € E at each stage. e(s') • e(I',sl,s2) be an experiment in E satisfying r(s',e(s'» Define X* = X* (sl' s2) X* (e(s'» !.2n = I if and only if o* 0* (s,sl,s2) is the d r for prior probability e r X* (El's2)' Sz = -2n in the Let • min r(s',e). s'. S in 6(sl,E 2 ) which uses 4.3 A result for general convex stopping sets. provides some indication of S' E by The theorem of this section "goodness" of the X* e r in the class 6(sl's2) for any (El,sZ) with 0 < E < E < 1. In particular, the corollary proves that, in l Z 0* has smaller average risk than any single experiment s-nd 6(sl,s2)' the d r d r. Further, the best s-nd d r in 6 (X) is known [16] to be in 6 * ; suppose it is in 6(E * ,E *). l 2 * * has risk no larger The corollary proves the X* rule X* (El,sZ) As usual, dependence on (sl,E Z) is suppressed than that of the best s-nd d r. for brevity. Definition. as 0* and with e r For n ~ 0, let 8(n) be the d r with the same stopping sets X(n). x(n)(s) defined by • *z X for -2p and for I (~(l), p ~ n. I 38 Let p* be the risk using o(n) and p* the risk using 0*. n Thus 0 (n) is the d. r. which "follows" 0* for n trials and uses the same experiment for all trials after the (n-1)st. Theorem 4.3 * * If 0 ~ ~ ~ 1, then Pn+1(~) ~ Pn(~)' A. If condition 3 of Theorem 2.2.6 holds, then p* (~) • lim n-oo B. (E1'~2) [Note that dependence on stopping values * Pn(~) for 0 ~ ~ ~ 1. is again implicit but suppressed. * Proof (Part A) Let Zn be the set of Zo points for which ep+1 is the experiment of X(n) for each z2 and p < n. -p . ** {!. Z n € * Z : n Then where ~ * denotes complementation with respect to Zn' integer p with = 1, i.e., the decisive sample ~ !.2p size for!. 
using d r d~ e,x () (!.) n o(n). d~ = e,x If z = (z z(2n» then -2n' , (n)(!. (2n) I!.2n) n IT f (x) dU(x ) a.e. ~ ()' p p=l e ,e p p e,x n for B=1,2, so that a.e. n IT f (x) p=l e ,e p p rule for selecting e 1 ~(e) at prior distribution f dU(x ), p Therefore, !.2n nJ [f{nc xz a + r(~ ,a(e »)dE (e)]£ (x )dun ~ n ~ ~e-n ' ~n -n P=l2p where o(e) is the s-nd d.r. using experiment e (with the given stopping sets) and f t (x )dun = s~ -n Z*n ~ e ; f (x) dU(x )~(e). p=l e e p p p is the same using either o(n) or 0(n+1). By definition of o(n) and 0(0+1), On z**, n Theorem 2.2.1 gives the J .-I I I I I I I _I I I I I I I I I II I I I I I I 39 l'fsk'contribution ~s f (n+1)c + f r(~ * X ~2n+2 n n , o(e »f t (x )du(x )} f t (x )du using o(n), (1), n sen n s~ -n ~2n n and f (n+1)c + r(~ f X* ~2n+2 , 0(en+1» ft s ~2 n n (x )du(x )}f t (x )dun using 0(n+1),(2), n n se-n en+1 -n where X* is the set of outcomes possible at the first n trials of a point in n ** Zn * * But by definition of Xn , i.e., of X , use of en+l at trial n+l minimizes the integral over X in the expressions (1) and (2) for each ~2n' In summary, risk contributions on z** are the same for o(n) and 0(n+1), while on Z** the n n risk contribution using o(n+l) is uniformly no greater than that using o(n), to prove part A. (Part B). Suppose € > O. Theorem 2.2.6 insures that there is a number M eeN < Musing such that, for &=1,2, 0* or o(n) for n ~ O. Also, there is an ! Pre(N(o,z) > Tj}E(e) < €!M for 0=0* or o=o(n). On 8=1 - Let z* denote those Z Z** ,risk contribution using 0('1]) is given by (1). integer 'I] such that p = 'I] o 'I] I points for which ep+l is the experiment of X* for each p and I On "'** Z ,the contributions using either o(Tj) or 0* are, by definition, the Tj I I I I I .- I "'** Tj contribution on Z* -Z ~2p' The risk using 0* is given by (2) with o(en+l) replaced by 0* same; these are the first integral in P* n to complete the proof. 
Corollary 4.3.1 If condition 2 of The~em 2.2.6 holds, then in ~(El,E2)' P* is not greater than the risk of the best (i.e., minimum risk) s-nd d r. ~. The risk of the best s-nd d r establishes the corollary. 4.4 * so part B of the Theorem is Po' * A partial characterization of the X c 1 (I+C' I+C) rule for small c. The lemma of this section will be used in proving the asymptotic optimality of the d r 8* which uses a X* e r. c Definitions. o(e) = o(e,c) is the d r in c c +c +c ~ =~-l--'-l--) c which uses experiment I 4lJ e in each trial. 5* is the d r in 6 which uses e r c c * c 1 X (Hc' 'i+c')' X* c n L(z2 ) = ~ log(f 2 /f ). - n p=l ,e p 1 ,e p Lemma 4.4 1. I I I I I I L(e,a) is finite for 8,a=1,2. e x E, m ~ 1(8,3-e,e) ~ 2. There are numbers (m,M) such that for (e,e) 3. There is only one experiment, ee (say), in E which maximizes l(e,3-e,.) on E. 4. There is a number B such that, with d r € 5(e) in 6 2N c M. c ee(L(~ ) - log 1 -L ~ 1-e B and ~ uniformly in 8,e, and c. B (This requires that the average "overshoot" of the action set boundaries be uniformly bounded; see Wa1d [15, Appendix A3.2].) > There are numbers (A1,A ) such that if 2 X* c'~2n (e ) 1 1 (e 2) 1 if ez Al < E r > 0, then c 1 > 0 exists with < -2n [ + 1-£ 1 c 2 --I (1), and X* c'~2n if c 1-£ 2/ [ + '_£] 1 c 2 < e.[2n < A , 2 I (2), whenever 0 < c < c . 1 Proof. Proof of (1) is given; (2) is proved analogously. that there is a Al such that, for c sufficiently small and A < 1 the best s-nd d r in 6 c uses experiment e . Writing N for N(5(e),.[) and 1 e' At e', e' < [ l+c l-~]-l This can be proved as follows. to represent any possible prior distribution, de' ,5(e» the s r of 5(e) is to stop as soon as I The lemma asserts (3) . , I I I I I -. I I I. I I I I 41 Thus by Chernoff's Lemma 3 [4], if the prior distribution is f', then where L= max L( e,a). 
exA By Wald's equation, if the prior distribution is -log c -log ~ + ZC(l;;') (log c-B) I < - ( 1,Z,c ) elN $ $ (Z "1 e ) From (3), for each e E and 5(e) € € eZN $ 6 ' c ~ s'c -log c +log ~ 1-5' +B 1(Z,l,e) r(~' ,5(e)) ~ 1(l,Z,e) [-log c - log l-~' + + and with e I •• I -log c -log ~ 1-5' +B 1(l,Z,e) and I I and 5(e) is used, then -log c +log ~ l-f' + M l-S (log c-B) I I I I I ~' = (1-5')C Zc(l-E') ~' ~ (log c-B)] ~ 1(Z,l,e) [-log c + log 1-~' + l_~I(log c-B)], + (4), e , the experiment which maximizes 1(l,Z,.), l :;; -log c -log ~ 1-5' + B ~ ~ C~' [ 1(1,Z,e ) + L s' ] + 1 ~ + c(l-~') -log c + log 1-5' + B + ~ ~ [ L l-s' ] , 1(Z,1,e ) 1 (5). Let 1 1 1(l,Z,e) 1 1(Z,l,e) 1 and D 1 = Then, using (4) and (5), the difference in the risk by assumption 3, D1 > O. using e and that using e 1 is I 42 r(~',o(e» r(~',o(el» ~ - [~'Dl [lOg c + log 1:;'] -c 2 (l-~')/m] A + 2c [log c - B]/m - cB/m -cL = Since B B is a concave function of c where lim c=o Rl(c) = O. 2 then B ( 2+mD c For E > » c ~'. say. [-Dl+ -lJ -c (x) Bc(~'), + x(l-x) I-x mx Also, Hence, there is a c > ll 0 such that if 0 < c < c ll ' -2n l 0, let ~ I I Then E,C _£ D clog c [1 + h (c)], 2 4 1 if [ where lim R (c) = O. 2 c=o B c (~ E,C I-£' 2 ] D l 2 - ~ / Hence c ) > 0, i.e., X* c'~2n 12 (e l ) (Hc E 1-2 ~2 ' (6), > 0 exists such that if 0 < c < c =1 _I 1 ) if S 12 ' then SE,C ~2n Since B is concave and positive at both c S' = (1) 2/ (2+mD ) and S' = SE,C' for each c < c l ' l between 2/(2+mD ) and l proves (1) with Al 4.5 = ~ E,C, i.e. , X* ~2 uses e l B is positive on the interval c is in this interval. i f ~z This -2 2/(2+mD )· l * Theorem 4.5.1 specializes a result of Asymptotic optimality of 0c; Chernoff [3] to the situation of this chapter. Theorem 4.5.1 Let 0c' ad 1 trial c, be defined for 0 < c < 1. lim inf c=O r(Lo c ) -c log c ~ for a given dec~"i0r. Then if 0 < sup Iil,2,e) E s< proh1pl"l with cost per 1, 1-5 + sup I(2,1,e) E I I I I I 1 i f ~z 0, i.e., _. I I I I I I I -. I I II I I I I I I 43 Proof. 
This is Theorem 2 of [3]. By virtue of this theorem, asymptotic optimality of a class of d r is defined as follows.

Definition. Let δ_c, a d r for a given decision problem with cost per trial c, be defined for 0 < c < 1. Then the class {δ_c : 0 < c < 1} is asymptotically optimum at ξ if

 lim sup_{c→0} r(ξ,δ_c)/(−c log c) ≤ ξ/sup_E I(1,2,e) + (1−ξ)/sup_E I(2,1,e).

Remark. Asymptotic optimality of {δ*_c : 0 < c < 1} can thus be proved by showing that if α > 0, then c_1 > 0 exists such that if 0 < c < c_1 and θ = 1,2, then

 r(θ,δ*_c)/(−c log c) ≤ (1+α)/I(θ,3−θ,e_θ),  (1),

where e_θ is as in Lemma 4.4. For this is equivalent to lim sup_{c→0} r(θ,δ*_c)/(−c log c) ≤ 1/I(θ,3−θ,e_θ) for θ = 1,2.

Theorem 4.5.2. Under the conditions of Lemma 4.4, {δ*_c : 0 < c < 1} is asymptotically optimum at ξ for 0 < ξ < 1.

Proof. (1) is proved for θ = 1, the proof for θ = 2 being completely analogous. The theorem then follows by the preceding remark. The numbers m, c_1, A_1, B are as in Lemma 4.4. Suppose α > 0 and ξ ∈ (0,1) are given; for notational convenience, let ε ∈ (0,1) be determined by α and m, and let S = log(ξ/(1−ξ)). Consider c ∈ (0, c_2], where c_2 satisfies c_2 ∈ (0, c_1] and ¼ε log c_2 ≤ log((1−A_1)/A_1).

Partition the sample space Z_0 into components which differ in the way termination is achieved. Z_0 points which do not lead to termination are neglected, since Theorem 2.2.6 proves that this set is null. Reference to the "schematic" diagram of Figure 3 may be of help in following the argument.

[Figure 3: schematic diagram of the log-likelihood ratio paths, stopping boundaries, and the sets used in the partition of the sample space]

Z_Ac = {z: for some n, L(z_{2n}) ≥
-log c + t; &, for p<n, L(~2p) > i log c + n and ZBc = (~: for some n, L(~2n) ::; ~ log c + t; &, for p<11, Let NAc(~)' Nlc-(~)' defined on ZAc' be the least n such that defined on ZBc' be the least n Z B1C and ZB c 2 {~ f ~lIch that i log c + t; < L(~2p)<10g e+O. L(~2n) ? log c, and c L(~2n)::;"2 log c. E Consider for some n > Nlc(~)' L(~2n) ? i; log c + ( &, for p<n, L(z2 - p ) > (l-~E) log c + for some n, L(~2n) ::; (l-\E) log c &, for NB(~) ::; p < n, L(~2p) < 4 log c Let Nc (z) - be the least n such that "- (log c + t; , -log c + and, on ZBc' N2c = NBc - Nlc and N3c = Nc - NBc' L(~2n) "A' 1;), The average sample number (ASN) of 5 e when 8=1 is _I t} t} I + t; E I I I I + Let NBc' defined on ZBc' be the lea3t n > Nlc such that L(~2n) ? ~ log c + or L(~2n) ::; (l-~€) log c. I I I I Define ZAc --I t; I I I I -. I I I. I I I I I I I 1_ 45 e l Nc = el{Nc(~)I~ e ZAc} Prl{~ € ZAc} 3 + eli E N (z) Iz e ZB } Pr { Z } + 1 ~ e Blc p=l pc - 1c 3 e l { EN (z)lz e ZB } Prl{~ e ZB c}, p=l pc - 2° 2 + By Theorem 2.2.6, there is a b (2). 2 > 0 such that the first term is S e -b /(l-e -b ). Next, is not greater than the average of the number of trials, n, such that which by Theorem 2.2.4 is s -,e log c + (B + ~2. m ' the term B/m here is a bound on the average "overshoot" of \e log c +~. Now note that, from Lemma 4.4, B* uses e at trial n+l whenever l c l-A l (l-\e) log c + ~ S L(~2n) Slog -A--- + I I I I I I I I· I + S, 1 and, in particular, whenever Let k(~) k = k(~), and ZB C on ZBc be the "overshoot" of ~ log c + S by L(~2N ). For any Bc the sum of the integrals, i.e., contributions to elN ' over ZB c lC of N is just the ASN of a s-nd d r using experiment e and terminating l 2c 2 as soon as the log likelihood ratio SO satisfies SO i «l-e) log c -k, ~4 log c -k). 
n n By Theorem 2.2.7, this is, for any k, S -(l-e) log c + B I(1,2,e ) l { } Pr l ~ e ZBc The expectation e{N3c(~)I~ e Blc } is, by Theorem 2.2.7, not greater than an upper bound on the ASN of a s-d d r which stops after the first trial at which L(~2n) € i (2 log c, -(l+t) log c). Such a bound is I 46 -2 log c +B m Similarly, -, € log c +B m $ Required bounds on probabilities are Prl(~ € ZB C} $ 1, 2 Pr l ( ~ € ZB1C } $ Pr ( 8 0 0 terminates SN ~ 4€ log c } $ c \€ by Chernoffls Lemma 3 [4]. Also from this lemma, i t follows that I I (3) . * To bound r(1,8 c )' use together with (2), (3), and preceding facts to obtain, for some numbers b i > 0 and b 2 > 0 and for 0 < c < c 2 ' * r(l,8 ) c (average-cost bound on ZAc) (average-cost bound for Nlc trials on ABC) I -b + + l ce (1_e- bl )2 -\€ log c + cB m c l~ 4 -2 log c + cB m (average-cost bound for Nc-N lc trials on ZB + -(l-€) clog c + cB + -\€ clog c + cB I(1,2,e ) l m - clog c $ - clog c 1 I(1,2,e ) l +~+ R ( ) 2 _I I I (loss bound) $ ct .1 I I I I I 3 c lC ) (average-cost bound for Nc-N lc trials on ZB1C) I I I I ·1 I I 47 I. I I I I I I I 1_ I I I I I I I I· I where lim R (C) • 0, so that there is a c l E (O,c 2) such that if 0 < c < c 1 ' 3 c-o then * 1 r(l,B c ) $-C log c [~I~(~1~,2~,-e-l~) + ex], to complete proof of (1) for 9-1. beginning of this proof. The theorem follows as described at the I ••I CHAPTER V AN APPLICATION: THE CASE OF LOST LABELS AND THE NO-OVERSHOOT X* RULE 5.1 Introduction. The elements of the decision problem in this chapter specialize those of Chapter 4. Here e ~ A - E • {1,2} and L(e,a) > 0 if and only if e ~ a. The probability structure involves two probability densities, gl and g2' with respect to a given measure u on a given measurable space (X,S). fe,e - gl if e~ e and fe,e • g2 if per trial is a constant, c. e~ Specifically, e, as illustrated in Table V. The probability measure Cost I I I I I I S on the subsets of e can be denoted by s - s(l) - 1-s(2). 
Table V.

 θ      e = 1    e = 2
 1      g_1      g_2
 2      g_2      g_1

This, for example, is the structure in the problem of target detection in Chapter 3 when (a) radar readings are random variables with density g_1 if a target is present in the scanned direction and g_2 if not, and (b) a target is known to be present, but it is not known whether it is in the north (θ = 1) or in the south (θ = 2).

In another practical application, this situation can be called the "case of lost labels", because of the following interpretation. A physician is treating a disease which can be cured quickly using drug 1 and less quickly using drug 2, the performance of drug d on any particular individual being measured by a random variable with density g_d. Suppose there is a supply of each drug, but that the labels on the two supplies have been lost. The physician must determine, by experimentation with patients, which supply contains drug 1. If the supplies are called s_1 and s_2, then he is working in the present framework if the cost of administering either drug is c and where, for p = 1,2, θ = p if drug 1 is in supply s_p, and e = p if the drug from supply s_p is administered.

Note also that the Θ, E, and probability structure here are the same as in the generalized two-armed bandit problem, as solved by Feldman [5]. However, as indicated in Chapter 1, the loss-cost structures differ, so that in the numerical example of this chapter, the e r for the Bayes two-armed bandit is just the opposite of the Bayes solution here.

For this lost-labels case, a "no-overshoot" X* e r is specified explicitly, i.e., each of the probability measures X*_{z_{2n}} of Section 2.1 is defined explicitly. By a no-overshoot X* e r is meant one in which the quantities r(ξ, δ(e)) are evaluated only approximately, the approximation introduced by "neglecting the excess over the stopping boundaries", i.e., assuming the posterior distribution when sampling is stopped is either exactly c_1 or exactly c_2.
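The no-overshoot assumption pins the stopped posterior to a boundary, so the accumulated log-likelihood ratio at the stopping time must equal the change in log-odds between the prior and that boundary. A minimal sketch of that bookkeeping (the sign convention, with positive values of L favoring θ = 2, is an assumption of this sketch):

```python
import math

def logit(p):
    """Log-odds of p."""
    return math.log(p / (1.0 - p))

def no_overshoot_stops(xi, c1, c2):
    """No-overshoot approximation: sampling stops with the posterior
    probability of theta = 1 exactly c1 or exactly c2, so L(z_2N) takes
    exactly one of the two values below (sign convention assumed here)."""
    return logit(xi) - logit(c1), logit(xi) - logit(c2)
```

With a symmetric prior ξ = 1/2 and boundaries c_1 = .9, c_2 = .1, the two stopping values are ∓log 9, the familiar symmetric thresholds of Wald's approximation.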
A method of evaluating the risk of s-d d r in Δ(c₂,c₁), adapted from the work of Whittle [17], is sketched and applied in an example with certain binomial distributions to obtain no-overshoot approximations to the risks of d r using the X* e r and the Chernoff e r. It is shown that, for some special parameter values, a Bayes solution in this binomial case is an X*-type d r.

5.2 Specification of the no-overshoot X* rule in Δ(c₂,c₁).

Notation. For convenience, let

    I = I(θ,3−θ,θ) = ∫_X (log g₁/g₂) g₁ dμ   and   Ī = I(3−θ,θ,θ) = ∫_X (log g₂/g₁) g₂ dμ.

The no-overshoot approximation to be used, as indicated in the previous section, can be written as

    L(Z₂N) = t_{c₁} = log[((1−c₁)/c₁)(ξ/(1−ξ))]   if the posterior probability at termination is c₁,
    L(Z₂N) = t_{c₂} = log[((1−c₂)/c₂)(ξ/(1−ξ))]   if it is c₂,

where ξ is the prior probability that θ = 1. The no-overshoot approximations of Wald [15] to the error probabilities are

    α̂₁ = c₂(c₁−ξ)/[ξ(c₁−c₂)]   and   α̂₂ = (1−c₁)(ξ−c₂)/[(1−ξ)(c₁−c₂)];

these do not depend on which d r δ(e) is used. Approximations to the average L(Z₂N) values are

    Ê₁ L(Z₂N) = (1−α̂₁) t_{c₁} + α̂₁ t_{c₂}   and   Ê₂ L(Z₂N) = α̂₂ t_{c₁} + (1−α̂₂) t_{c₂}.

To approximate the risk, the approximations above are combined, the ASN under θ being approximated by Ê_θ L(Z₂N) divided by the corresponding information number; under θ = 2 this yields the contribution

    (c/[(1−ξ)(c₁−c₂) I(2,1,e)]) [(c₁−ξ)(1−c₂) log(((1−c₂)/c₂)(ξ/(1−ξ))) + (1−c₁)(ξ−c₂) log(((1−c₁)/c₁)(ξ/(1−ξ)))],

with a corresponding contribution under θ = 1. Alternatively, this result is obtained by the methods of [17].

Definition. d(ξ') = r̂(ξ',δ(2)) − r̂(ξ',δ(1)). The no-overshoot X* e r in Δ(c₂,c₁) is

    X*_{Z₂ₙ}(1) = 1 if d(ξ'_{Z₂ₙ}) ≥ 0, and X*_{Z₂ₙ}(1) = 0 otherwise.   (1)

By (1), after some simplification, the sign of d(ξ') is determined by the position of g(ξ') relative to the chord of g over (c₂,c₁), where g(x) = log((1−x)/x) for 0 < x < 1. Hence X* is specified as in Theorem 5.2.

Theorem 5.2

A. If ξ < ½, then X* uses e=2 for those, and only those, posterior ξ' satisfying

    g(ξ') > g(c₂) + [(ξ'−c₂)/(c₁−c₂)][g(c₁) − g(c₂)].

B. If ξ > ½, then X* uses e=2 for only those posterior ξ' satisfying

    g(ξ') < g(c₂) + [(ξ'−c₂)/(c₁−c₂)][g(c₁) − g(c₂)].

C. If ξ = ½, then X* uses e=1 for each trial.

Two simplifications are worth note.
First, X* in Δ(c₂,c₁) can be specified in "cook book" form, with no calculations required of its user, by use of Figure 4 as follows.

(a) Connect the ordinates g(c₁) and g(c₂) with a line l(c₁,c₂), say.

(bA) If ξ < ½ and the posterior probability ξ' is in (c₂,c₁), then take another observation, using e=2 if and only if the curve g is above the line l(c₁,c₂) at ξ'.

(bB) If ξ > ½ and the posterior probability ξ' is in (c₂,c₁), then take another observation, using e=2 if and only if the curve g is below the line l(c₁,c₂) at ξ'.

[Figure 4: the curve g and the line l(c₁,c₂).]

Second, the facts of Lemma 5.2 concerning g aid in specifying X*.

Lemma 5.2

A. g' < 0 on (0,1).
B. g'' > 0 on (0,½) and g'' < 0 on (½,1).
C. If 0 ≤ x < ½, then g(½+x) = −g(½−x).
D. For (c₁,c₂) with 0 < c₂ < c₁ < 1, there is at most one ξ* in (c₂,c₁) satisfying h(ξ*) = 0, where

    h(x) = g(x) − g(c₂) − [(x−c₂)/(c₁−c₂)][g(c₁) − g(c₂)].

Proof. A, B, and C are easily proved. Proof of D, which is only slightly more involved, is given. Suppose one such ξ* exists and suppose ξ* < ½; the case ξ* > ½ is treated similarly. Since h is convex upward on (0,½], h has at most two zeros on (0,½), viz., c₂ and ξ*, and h(½) > 0. Define f(x) = [(x−½)/(c₁−½)] g(c₁) and h* = g − f. On (½,c₁), h* > 0, since h*(½) = 0 = h*(c₁) and since h* is convex downward on [½,1). But h − h* = f − l(c₁,c₂) > 0 on [½,c₁), since this difference is linear, positive at ½, and 0 at c₁; i.e., h > h* > 0 on [½,c₁), so that h has no zero there.

Therefore, there is at most one value ξ* of the posterior probability such that one experiment is preferred for all ξ' > ξ* while the other is preferred if ξ' < ξ*. If no such ξ* exists, then the proof of Lemma 5.2 shows that X* uses one experiment at all trials. Note that if c₁ + c₂ = 1, then ξ* exists and equals ½.

5.3 Some Fourier analysis.

Consider the lost labels problem with

    L(θ,a) = 1 if θ ≠ a,  L(θ,a) = 0 if θ = a.   (1)

This type of symmetry, i.e., errors of either kind being of equal seriousness, is also part of the structure of the two-armed bandit problem in [4].

As usual, ρ(ξ) is the Bayes risk at prior distribution ξ. From Corollary 2.2.1, experiment e is preferred for the first trial at prior distribution ξ₀ (hence also when ξ_{Z₂ₙ} = ξ₀) if and only if R_e(ξ₀) < R_{3−e}(ξ₀), where

    R_e(ξ) = ∫_X ρ(ξ_{e,x}) f_{ξ,e}(x) dν(x).

From the symmetry of the problem, some useful facts can be deduced. First, ρ(ξ) = ρ(1−ξ). For let δ be a Bayes rule which is a non-randomized function of the posterior distribution for the given problem. Consider the same problem formulated in terms of θ* = 3−θ, e* = 3−e, and a* = 3−a. The probability densities are again as in Table V and the loss function as in (1), except with θ replaced by θ*, e replaced by e*, and a replaced by a*. This revised problem thus has the same formal structure as the original one. Hence δ is a Bayes d r in this case also, and

    ρ(ξ(θ)) = ρ(ξ*(θ*)) = ρ(1−ξ(θ)).   (2)

Suppose δ uses experiment e when the posterior probability that θ=1 is ξ. This means that δ uses experiment e* = 3−e when the posterior probability that θ*=1 is ξ, i.e., using δ, e(1−ξ) = 3−e(ξ). Moreover, by (2), it follows from Theorems 2.2.5 and 2.2.9 that there is a Bayes rule in Δ(1−c₁,c₁) for some c₁ in [½,1).

Let Δ' denote the class of d r which are in Δ(c₁,1−c₁) for some c₁ in [½,1) and for which, if ½ < ξ < c₁ and if e(ξ) = e, then e(1−ξ) = 3−e. Any such d r can be denoted δ_E, where E = (Ξ₁,Ξ₂), as defined immediately preceding Theorem 2.2.5. Symmetry arguments like those above show that if δ_E ∈ Δ', then r(ξ,δ_E) = r(1−ξ,δ_E).

Suppose δ_E ∈ Δ', and let t = t(ξ) = log((1−ξ)/ξ) and y = y(x) = log[g₁(x)/g₂(x)].
Therefore, n1eorem 2.2.1 implies (3) , eS l+e if ~ £2: 2 , llere, G is the density, 1ilth respect to U* (say), of y = y{x) when x has density ~; g(y) = g{y). eY; H,.(tl • and HzeCtl • l l -e (l+etl i [ minc:,etl-min(l,et+Yl] aCyl dUOCy) + eS if:=! l+e ,...., £ '::'1 otherwise ; -e (l+et, + e~ i [min(l:etl-minCet,eYl] gCy) dUOCy) if ~ l+e £ 32 otherwise • Forming Fourier-Stieltjes transforms in (3) and (4) gives two linear O equations in the transforms of FJE and F • let f denote the Fourier transform 2E of the function f. Then F~ and F; are given by ,p (e) [l-go(-e)] + uO (e) gO(-e) F~ ( e) = '':I.E , ''2]: ( ) 5 , 1 - gO(-e) - go(e) F~(e) mlere Also, hO(a) = f = ~(e) g'(e) + ~(e) 1 _gO ( -a) _ eie.Yh(y)dU*(y) if h F~(e) = [l-g<'(a)] (6). SO( a) = g,g . F~(a) + F~(a) , (7). Equations (5) - (7) provide solutions, at least in concept, to the problems of evaluating risks of a given d r in 6.' and of searching for a Bayes rule. If the Fourier inversion FE of F~ in (7) can be found, then Whether or not ~ is Bayes is easily verified, since ~ is Bayes if and only if I 56 satisfies the Bayes risk equation of Corollary 2.2.1 written in terms of~. The prospective usefulness of this criterion, however, seems limited. One of the simplest discrete distribu~1on ~ cases uses binomial distributions, where is the density of a binomial random variable w:I. th success probability TIp' i. e. , ~(l) = Tlp 1 - ~(O). In this case, ~(8) (1+e i8) is analytic, and the denomi- = nator in (5) and (6) is If the residue theorem is to be used to evaluate the inversion theorem integral, then the zeros of (8) must be found. This itself is a difficult problem. For special binomial random val"iables, however, the roodified Fourier analysis of [J.7] leads to discovery of a Bayes rule, as proved in Section 5.5. The same diffi- culty of inversion arises in one of the simplest cases ,dth continuous random variables, viz., when ~ (x) fez) = 1 + l (l-r) 2 e-\x. 
In this case, one must find the complex zeros of a transcendental function involving sin(z log r) and cos(z log r), where r = λ₁/λ₂ is the ratio of the exponential rates and sin and cos are the complex trigonometric functions.

Equations (5)-(7) provide a second necessary and sufficient condition for δ_E to be Bayes. This follows from the fact that r(ξ,δ_E) < r(ξ,δ_{E'}) if and only if F_E(t(ξ)) > F_{E'}(t(ξ)). Therefore, r(ξ,δ_E) is the Bayes risk at ξ if and only if, for every E' such that δ_{E'} ∈ Δ', F°_E − F°_{E'} is a non-negative multiple of a characteristic function of a probability distribution. Although this criterion does not seem practically applicable in proving some δ_E ∈ Δ' to be a Bayes d r, it may be of use in proving that a d r δ_{E'} ∈ Δ' is inadmissible, by finding another such d r δ_E for which F°_E − F°_{E'} is a positive multiple of a characteristic function.

Use of (5) and (6) in (7), with the reciprocal of (8) expanded as Σ_{n=0}^∞ [g°(θ) + g°(−θ)]ⁿ, gives, formally,

    F_E = Σ_{n=0}^∞ [(H₁E + H₂E) + h₁ * (H₁E − H₂E)] * h₂^{*n},

where h₁ and h₂ are the inversions of the appropriate combinations of g°, * denotes convolution, and f^{*n} is the n-fold convolution of f with itself. Since each summand is non-negative, a Bayes d r in Δ' is one which maximizes, by choice of E, this function for all ξ simultaneously, if possible.

5.4 Some characteristics of the Bayes stopping rule.

The results of this section were indicated to me by W. Hoeffding. They are included here since they provide an interesting characterization of the Bayes s r for the problem of this chapter. Theorem 5.4.1 holds for general s-d problems. The other results are for the lost labels problem with the symmetric loss structure of equation (5.3.1).

Theorem 5.4.1

    ρ(ξ) ≥ min[ρ₀(ξ), c/A],  where  A = 1 − inf_{e∈E} ∫_X inf_{θ∈Θ} f_{θ,e}(x) dμ(x).

Note that in the lost labels case

    A = ½ ∫_X |g₁(x) − g₂(x)| dμ(x).

Proof. The proof rests on the representation ρ(ξ) = inf_{δ∈Δ} r(ξ,δ) and on Corollary 2.2.1.
Since, for any e,

    ∫_X ρ(ξ_{e,x}) f_{ξ,e}(x) dμ(x) ≥ ∫_X ρ(ξ_{e,x}) inf_{θ∈Θ} f_{θ,e}(x) dμ(x) ≥ (1−A) inf_{ξ'} ρ(ξ'),

a prior ξ at which a Bayes rule continues satisfies ρ(ξ) ≥ c + (1−A) inf_{ξ'} ρ(ξ'); taking the infimum over all such ξ gives ρ ≥ c/A on the continuation set. It follows, from Corollary 2.2.1, that if ρ(ξ) < ρ₀(ξ), then ρ(ξ) ≥ c/A.

Theorem 5.4.2

A necessary and sufficient condition for the existence of ξ such that a Bayes d r at ξ can take at least one trial is that c < A/2.

Proof. Suppose that c ≥ A/2. Then c/A ≥ ½ ≥ ρ₀(ξ) for every ξ, so that, by Theorem 5.4.1 and since Δ includes the d r which takes no trials,

    ρ(ξ) = min[ρ₀(ξ), c/A] = ρ₀(ξ),

and there is a Bayes d r which takes no trials. Next, suppose that c < A/2. Then

    ρ(½) ≤ r₁(½) = c + (1−A)/2 < ½ = ρ₀(½),

where r₁(½) is the risk at ξ = ½ of the d r which takes exactly one trial, so that a Bayes d r at ξ = ½ must take at least one trial.

Definition. If δ is a non-randomized function of the posterior distribution, define the continuation set C̃ of δ as the set of posterior distributions at which δ takes another trial.

Theorem 5.4.3

If δ is a Bayes d r which is a non-randomized function of the posterior distribution, then

    ( ½[1 − (A−2c)/(2−A)], ½[1 + (A−2c)/(2−A)] ) ⊆ C̃ ⊆ (c/A, 1 − c/A).

Proof. Since, in the symmetric-loss lost labels problem, ρ is symmetric and the relevant risk functions are convex, it follows that if C̃ is not empty, then ½ ∈ C̃. If |ε| < (A−2c)/(2−A), then

    ρ₁((1+ε)/2) ≤ c + (1−A)(1+|ε|)/2 < (1−|ε|)/2 = ρ₀((1+ε)/2),

to establish the first inclusion. If ξ ∈ C̃, then ρ₀(ξ) > ρ(ξ), so that, using Theorem 5.4.1, c/A ≤ ρ(ξ) < ρ₀(ξ) = min(ξ,1−ξ). Hence, if min(ξ,1−ξ) ≤ c/A, then ξ ∉ C̃, to complete the proof.

5.5 An example.

In this section, an example where g₁ and g₂ are binomial densities with a special property is considered. In this case, for certain c, an X*-type d r is Bayes. For this special case, an extension to s-d d r of Whittle's method [17] of evaluating the risk function is sketched and used
to compare no-overshoot approximations to the Bayes risk of the X*-type d r and the risks of two Chernoff-type d r. Conditions under which this method of evaluating risks is justified in more general problems are of interest, although no such justification is given here. Solutions by this same method in more general cases of interest are presently being sought.

In the special case considered, η₁ = g₁(1) = 1 − g₁(0) = .7457 and η₂ = g₂(1) = 1 − g₂(0) = .5808. The special feature of these densities is that

    log[(1−η₂)/(1−η₁)] = 2 log(η₁/η₂);   (1)

in particular, with λ = log(η₁/η₂),

    λ = ¼.   (2)

The methods applicable for the special g_p values here extend without modification to any binomial distributions satisfying (1), but probably no further. The locus of "success probability" pairs (η₁,η₂) which satisfy (1) is shown in Figure 5.

[Figure 5: the locus of pairs (η₁,η₂) satisfying (1).]

The simplification which results from (1) and (2) is because of the fact that if any ξ ∈ Ξ is the prior distribution, i.e., the prior probability that θ=1, then for any n,

    ξ_{Z₂ₙ} = [1 + ((1−ξ)/ξ) e^{λk}]⁻¹,

where k is an integer. This is true since, if before a trial the prior distribution is ξ, then the possible posterior distributions after the trial are as given in Table VI.

Table VI. Posterior distribution after trial (e,x)

               e = 1                              e = 2
    x = 0   [1 + ((1−ξ)/ξ) e^{2λ}]⁻¹      [1 + ((1−ξ)/ξ) e^{−2λ}]⁻¹
    x = 1   [1 + ((1−ξ)/ξ) e^{−λ}]⁻¹      [1 + ((1−ξ)/ξ) e^{λ}]⁻¹

The loss function is L(θ,a) = 0 or 1 according as θ = a or θ ≠ a. In the remainder of this section, attention will be confined to cases where the prior distribution is of the form [1 + e^{λk}]⁻¹ for k an integer; the analysis for any other prior ξ can be carried out by the same method. Let ξ_k = [1 + e^{λk}]⁻¹. With the (η₁,η₂) values of the example, the X* and Chernoff e r in Δ(c₂,c₁) are specified as follows. If c₂ = 1 − c₁, then X*_{Z₂ₙ}(1) = 1 if and only if ξ_{Z₂ₙ} = ξ_k and k ≥ 0. Let X^ξ denote the Chernoff e r [4] when the prior distribution is ξ.
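The special property (1) can be checked numerically; the short script below (an illustration, not from the thesis) verifies that with these (η₁,η₂) values one log-odds increment is twice the other, so that every reachable posterior lies on the lattice ξ_k:

```python
import math

eta1, eta2 = 0.7457, 0.5808
lam = math.log(eta1 / eta2)                 # log-odds step for x = 1 under e = 1
big = math.log((1 - eta2) / (1 - eta1))     # log-odds step for x = 0 under e = 1

def xi(k):
    """Lattice of reachable posteriors, xi_k = 1/(1 + exp(lam*k))."""
    return 1.0 / (1.0 + math.exp(lam * k))
```

Here lam is about .2499 and big about .4998, so property (1) holds to the accuracy of the tabulated η values, and λ equals ¼ to that same accuracy.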
If the prior distribution is ξ_{k'}, then X^{ξ_{k'}}_{Z₂ₙ}(1) = 1 if and only if ξ_{Z₂ₙ} ≤ ξ_{k'}, i.e., if and only if ξ_{Z₂ₙ} = ξ_k and k ≥ k'.

From Theorem 2.2.1, the risk r(ξ_k, δ^k) of the d r δ^k in Δ(c₂,c₁) which uses X^{ξ_k} is the solution of the following functional equation at s = k:

    r(ξ_s) = min(ξ_s, 1−ξ_s)                                        if ξ_s ∉ (c₂,c₁),   (3)
    r(ξ_s) = c + r(ξ_{s+2}) f_{ξ_s,1}(0) + r(ξ_{s−1}) f_{ξ_s,1}(1)   if ξ_s ∈ (c₂,c₁) and s ≥ k,   (4)
    r(ξ_s) = c + r(ξ_{s−2}) f_{ξ_s,2}(0) + r(ξ_{s+1}) f_{ξ_s,2}(1)   if ξ_s ∈ (c₂,c₁) and s < k.   (5)

Similarly, the risk r(ξ_k, δ*) of the d r δ* is the solution at s = k of the functional equations (3)-(5) with 1−c₁ in place of c₂ and with the condition "s ≥ k" replaced by "s ≥ 0" in (4) and "s < k" replaced by "s < 0" in (5).

Let N₂ be the smallest integer such that ξ_{N₂} ≤ c₂ and N₁ the largest integer such that ξ_{N₁} ≥ c₁. Then, following Whittle, it can be established that a solution to (3)-(5) is

    r(ξ) = min(ξ, 1−ξ)   if ξ = ξ_s and s ∉ (N₁,N₂),
    r(ξ) = r₁(ξ)          if ξ = ξ_s and k ≤ s < N₂,
    r(ξ) = r₂(ξ)          if ξ = ξ_s and N₁ < s < k;

the functions r₁ and r₂ are each the sum of a multiple of c log((1−ξ)/ξ) and a term

    ∫_{N_e} ξ^v (1−ξ)^{1−v} df_e(v),   e = 1,2,   (6.1), (6.2)

where

    N_e = {v : Σ_{x=0}^{1} (f_{1,e}(x))^v (f_{2,e}(x))^{1−v} = 1}

and f_e(v) is determined by the requirement that it be a measure on a class of subsets of N_e such that (3) is satisfied, for e = 1,2, and such that

    r₁(ξ_k) = c + r₁(ξ_{k+2}) f_{ξ_k,1}(0) + r₂(ξ_{k−1}) f_{ξ_k,1}(1)   (7.1)

and

    r₂(ξ_{k−1}) = c + r₁(ξ_k) f_{ξ_{k−1},2}(1) + r₂(ξ_{k−3}) f_{ξ_{k−1},2}(0).   (7.2)

The solution to the corresponding equation for r(ξ_k, δ*) in Δ(ξ_N, ξ_{−N}) is the solution of the preceding paragraph with N₁ = −N, N₂ = N, k = 0, and ξ = ξ_k.

For the present, this "solution" is useful, since it suggests a possible form of the Bayes solution and since it will provide an approximate evaluation, albeit inadequately justified, of the risks of the Chernoff-type d r and δ*. Rigorous justification of the method is of interest but is not given here.
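Although the text solves the functional equations in closed form by Whittle's method, they can also be solved numerically by value iteration on the lattice ξ_k. The sketch below is such an illustration (the grid size, sweep count, and truncation at the ends of the grid are my own choices, not the thesis's):

```python
import math

eta1, eta2 = 0.7457, 0.5808
lam = math.log(eta1 / eta2)

def xi(k):
    return 1.0 / (1.0 + math.exp(lam * k))

def bayes_risk(c, K=40, sweeps=2000):
    """Approximate the Bayes risk rho(xi_k) for |k| <= K.  Under e = 1
    an observation x = 1 moves k to k - 1 and x = 0 moves k to k + 2;
    e = 2 is the mirror image.  Outside |k| <= K the risk is clamped
    to the stopping loss min(xi, 1 - xi)."""
    rho = {k: min(xi(k), 1 - xi(k)) for k in range(-K - 2, K + 3)}
    for _ in range(sweeps):
        for k in range(-K, K + 1):
            x = xi(k)
            p1 = x * eta1 + (1 - x) * eta2          # P(x = 1 | e = 1)
            p2 = x * eta2 + (1 - x) * eta1          # P(x = 1 | e = 2)
            cont1 = c + p1 * rho[k - 1] + (1 - p1) * rho[k + 2]
            cont2 = c + p2 * rho[k + 1] + (1 - p2) * rho[k - 2]
            rho[k] = min(x, 1 - x, cont1, cont2)
    return rho
```

With c = .0125, one of the cost values used in the comparisons later in this section, the computed risk at ξ₀ = ½ lies well below the no-data loss ρ₀(½) = ½, and the computed function is symmetric in k, as the theory requires.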
There is, in fact, a δ* rule which is Bayes for various values of c, as proved by the following result.

Theorem 5.5. If c is the cost per trial and there is an integer N satisfying the explicit form of equation (7.1) for this example (an equation relating c, λ, N, η₁, and η₂), then the d r δ* in Δ(ξ_N, ξ_{−N}) is a Bayes d r.

Proof. The first step is to verify that this δ* has the risk given by a formula (8) of the form (6.1)-(6.2), with each f_e concentrated on the real numbers v = 0 and v = 1. This is accomplished by showing that the function in (8) satisfies the functional equation relevant to δ* in Theorem 2.2.1. Specifically, the following can be verified straightforwardly:

(a) r₁(ξ_k) = ξ_k for k = N, N+1;
(b) r₂(ξ_k) = 1 − ξ_k for k = −N, −N−1;
(c) equation (6.1) is satisfied for k = 0;
(d) r₁(ξ_k) = c + r₁(ξ_{k+2}) f_{ξ_k,1}(0) + r₁(ξ_{k−1}) f_{ξ_k,1}(1), for 1 ≤ k < N;
(e) r₂(ξ_k) = c + r₂(ξ_{k−2}) f_{ξ_k,2}(0) + r₂(ξ_{k+1}) f_{ξ_k,2}(1), for −N < k ≤ −1.

In fact, because of the symmetry of the framework of the problem and the fact that r₁(ξ_k) = r₂(ξ_{−k}), only (a), (c), and (d) need be verified.

The proof is completed by verifying that the function in (8) satisfies the functional equation for Bayes risk of Corollary 2.2.1. This is verified for ξ ≤ ½; the proof for ξ > ½ is again established by symmetry.

The "optimum boundary" condition, by (d), is that ρ₀(ξ_k) ≥ r₁(ξ_k) for 1 ≤ k < N and ρ₀(ξ_k) ≤ r₁(ξ_k) for k ≥ N. This is satisfied, since r₁ is a concave function of ξ and since, by (a), r₁ agrees with ρ₀ at k = N, N+1.

It remains to show that r satisfies the "optimum experiment" condition, viz.,

(f) if 2 ≤ k < N and ξ = ξ_k, then r₁(ξ) ≤ c + Σ_{x=0}^{1} r₁(ξ_{2,x}) f_{ξ,2}(x), and
(g) r₁(ξ₁) ≤ c + r₁(ξ₂) f_{ξ₁,2}(1) + r₂(ξ_{−1}) f_{ξ₁,2}(0).

Condition (f) is satisfied, since if 2 ≤ k < N and ξ = ξ_k, then an inequality (9), which compares the two continuations, holds for the (η₁,η₂) pair of the example. In fact, (9) holds for k = 1 as well. To verify (g), note that if k > 0, then r₁(ξ_k) > r₁(ξ_{−k}); i.e., r₂(ξ_{−1}) = r₁(ξ₁) > r₁(ξ_{−1}). Hence, by (9) with ξ = ξ₁,

    r₁(ξ₁) ≤ c + r₁(ξ₂) f_{ξ₁,2}(1) + r₁(ξ_{−1}) f_{ξ₁,2}(0),

which is less than the right-hand side in (g), to complete the proof.

Remark.
The key to the preceding result is the assumed form of the Bayes risk in equation (8). It may therefore be of interest to indicate the motivation for assuming this form. The key is a comparison with the form in (6). In fact, I arrived at (8) by beginning with (6), putting f_e(v) = 0 on N_e except at the real numbers v = 0 and 1, and fitting the two end conditions r_e(ξ_k) = ρ₀(ξ_k) for |k| = N, N+1. From this form, the value of N for which equation (7.1) holds was determined, as in the statement of the theorem.

An approximate comparison of the average risk of a Bayes rule δ* with a Chernoff rule δ_C is also available through equations (6). Again in this case the approximation is of the no-overshoot type, in the sense that, for a d r in Δ(c₂,c₁), the posterior distribution at termination is assumed to be exactly c₁ or c₂. With this approximation, attention can be confined to the real subset {0,1} of N_e, as defined following equations (6), to obtain a formula (10), of the same form as (6.1)-(6.2) with each f_e concentrated on {0,1}, as the no-overshoot approximation r̂ to the risk of the d r δ^k in Δ(c₂,c₁); outside [c₂,c₁], r̂(ξ) = ρ₀(ξ). The function f₁ is determined by (7.1) and the no-overshoot boundary condition r̂(c₂) = ρ₀(c₂); similarly, f₂ is evaluated using (7.2) and r̂(c₁) = ρ₀(c₁). The corresponding no-overshoot approximation to r(ξ,δ*) in Δ(1−c₁,c₁) is given by (10) with k = 0, c₂ = 1−c₁, and the same method of determining f₁ and f₂. For the special values of c defined in Theorem 5.5, the exact risk of this procedure is available through (8).

Two particular s r are of interest with the X^ξ e r. First is the type which Chernoff proposed and which, since it does not depend on the prior distribution, does not require that prior knowledge be specified.
This rule is to stop after n observations if n is the least integer m for which L(Z₂ₘ) falls outside some interval (−k,k). I refer to this type of s r as L-symmetric. The second boundary considered is ξ-symmetric, in the sense that it stops after n observations if n is the least integer m for which t_{Z₂ₘ} falls outside some interval (−k,k).

Formulae for the no-overshoot risk approximations are given in Table VII.

Table VII: Computing formulae for no-overshoot risks when the prior distribution is ξ_k ∈ [c₂,c₁].

[Table VII gives, for each of the three d r, the Chernoff d r with L-symmetric boundaries, the Chernoff d r with ξ-symmetric boundaries (ξ_N, ξ_{−N}), and δ*, the no-overshoot risk as a function of c, k, N, λ, Δr, η₁, and η₂.]

A remark concerning these formulae is of interest. The (approximately) best choice of the boundary-determining value N is that which minimizes the expressions in Table VII as functions of N. For small c and large N, the equation (d/dN) r̂(ξ) = 0 is approximately equivalent to c being a fixed multiple of e^{−λN}. (Note that as c decreases, it is reasonable that the ASN of the best d r, which increases as N increases, should
_ ceo -c log c so that, asymptotically at least, the approximate solutions are "well-behaved". For c ... 0125 and c ... 000915, the risk of the Bayes d r is computed using Theorem 5.3. For each case, the no-overshoot approximations to the risk of the "best" L-symmetric and s-symmetric Chernoff-type d r, are shown in Figures 6 and 7. The Chernoff d r risks were computed for several values of N in the neighborhood of -log c. The "best" Chernoff rule is that which gives a relative minimum risk in the neighborhood of -log c. In each case, these best boundaries are the same as the boundaries for the corresponding Bayes rule. case, for Also in each S • \, change from L-symmetric s r to s-symmetric s r reduces risk c appreciably, whereas the improvement of the Bayes /) * d rover the /)s d r, both with E-symmetric boundaries, is at best slight. imate _. I I I I I I I Note, however, with this approx- evaluation as with an exact evaluation (since /) * is a Bayes d r), I I I I I I I -. I I I. I I I I I I I Ie I I I I I I I •• I 67 RISK .4 BEST L- SYMMETRIC CHERNOFF RULE (.) e- BEST SYMMETRIC CHERNOFF RULE (CD) I 68 BEST L- SYMMETRIC CHERNOFF RULE (.) RISK '''~-BEST !-SYMMETRIC CHERNOFF RULE (CD) BAYES· X" RULE (T) .oe _. I I I I I I I _I , I I I I I I I -. I I 69 II I I I I I I Ie I I I I I I I II BIBLIOGRAPHY [1] Abramson, Lee R. (196~). "Sequential design of experiments with two random variables." Thesis at Columbia University, New York. [2] Blackwell, D. and M. A. Girshick (1954). Decisions. New York: Wiley. [3] Chernoff, H. (1959). "Sequential design of experiments." Statistics. 30, 755-70. [4 ] Feldman, D. (1962). Theory of ~ and Statistical Annals Mathematical "Contributions to the 'two-armed bandit' problem." Annals Mathematical Statistics., 11, 847-56. [5 ] Haggstrom, G. (1964). "Optimal stopping and experimental design." at the University of Illinois, Urbana. [6 ] Halmos, P. R. (1950). Measure Theory. [7 ] Hotelling, H. (1963). 
"Different meanings of experimental design." d'Experiences. pages 39-49. [8 ] Paris: Johnson, N. L. (1959). Princeton: Thesis van Nostrand. Le lli!! Centre International de la Recherche Scientifique, "A proof of Wald's theorem on cumulative sums." Annals Mathematical Statistics, 30, 1245-7. [9 ] Kiefer, J. and J. Sacks (1963). and design." Annals Mathematical Statistics, ,1!1, 705-50. [10] Kolmogorov, A. N. (1933). Berlin: "Asymptotically optimum sequential inference Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer. [11] Paulson, E. (1952). "On the comparison of several experimental categories with a control." Annals Mathematical Statistics, 11, 239-46. [12] Paulson, E. (1962). "A sequential procedure for comparing several experimental categories with a standard or control." Annals Mathematical Statistics, 11, 438-43 • [13] Siryaev, A. N. (1964). "On the theory of decision functions and control of an observational process with incomplete data." Transactions Third Prague Conference Information Theory, Statistical Decision Functions, Random Processes. New York: Academic Press, pages 657-81. (Russian) I 70 [14] Stein, C. (1946). Statistics, [15] Wald, A. (1948). "A note on clUllulative SlUllS." !l, 498-9. Sequential Analysis. [16] Wald, A. and J. Wolfowitz (1948). probability ratio test." [17] Whittle, P. (1964). Biometrika, 11, Annals Mathematical New York: Wiley. "Optimum character of the sequential Annals Mathematical Statistics, ~, "Some general results in sequential analysis." 123-41. 326-39. --I I I I I I I _I I I I I I I I -. I I 71 I. I I I I I I I INDEX OF NOTATION This is a list of notations and abbreviations each of which appears in several places in the report. A few of these are defined or clarified here. For each of the others, the page where it is defined is cited. Defini tions. ~ - generic notation for prior or posterior distribution. 
Where it is important to distinguish between prior and posterior distributions, in Chapters 4 and 5, the former are denoted ξ and the latter ξ'.

E_θ - expectation when θ is the state of nature.
Pr_θ - probability when θ is the state of nature.
P̃ - probability defined by the d r δ, i.e., by the measures χ, ψ, φ.
S - generic notation for a set.

Notations. (The number following each symbol in the original is the page where the notation is defined.)

[The remainder of the index is a multi-column table of symbols with page references, among them a, A; r(θ,δ); d r; e r; s r; s-d; ns-d; ns-nd; t d r; L(θ,a); L(Z₂ₙ); N = N(δ,Z); Δ(ξ₁,ξ₂); I(θ,θ',e); η_p; Δr; ρ₀, ρ₁, ρ₂, ρ₃; δ*; ξ_k; and X*(ξ₁,ξ₂).]