A simple analysis of the TV game “WHO WANTS TO BE A MILLIONAIRE?”®

Justo Puerto*
Federico Perea

MaMaEuSch†
Management Mathematics for European Schools
94342 - CP - 1 - 2001 - DE - COMENIUS - C21

* University of Seville
† This project has been carried out with the partial support of the European Union in the framework of the Sokrates programme. The content does not necessarily reflect the position of the European Union, nor does it involve any responsibility on the part of the European Union.
1 Introduction
This article is about the popular TV game show “Who Wants To Be A Millionaire?”®. When this paper was written there were 45 versions of “Who Wants To Be A Millionaire?”®, presented in 71 countries. In more than 100 countries the licence has been bought by TV stations, and the show will be broadcast there sooner or later.
“Who Wants To Be A Millionaire?”® debuted in the United Kingdom in September of 1998 and was very successful there. Afterwards, it spread all over the world, coming to Spain in the summer of 2000, where it was broadcast by the TV station Telecinco under the title “¿Quiere ser millonario?”®.
The rules of the game are similar in all countries. In this article, we consider only the Spanish version.
One candidate, chosen out of a pool of ten, has the chance of winning the top prize of 300000 Euros. In order to achieve this, she must answer 15 multiple-choice questions correctly in a row. The contestant may quit at any time, keeping her earnings. At each step, she is shown the question and four possible answers before deciding whether to play on or not. Once she has decided to stay in the game, her next answer has to be correct for her to continue playing.
Each question has a certain monetary value, given here in Euros. The money that a candidate can win if she answers each question correctly is given in Table 1.
There are three stages (“guarantee points”) where the money is banked and cannot be lost even if the candidate gives an incorrect answer to one of the following questions: 1800, 9000 and 300000 Euros.
There is no time limit for answering a question. If time runs out on a particular day, the next programme continues with that player’s game.
At any point, the contestant may use one or more of her three “lifelines”. These are:
• 50:50 option: the computer eliminates two possible answers leaving one wrong answer
and the correct one.
• Phone a friend: the contestant may discuss the question with a friend or relative on the
phone for 30 seconds.
• Ask the audience: the audience has the option of choosing the answer they consider
correct by pressing the corresponding button on their keypads. The result of this poll
will be listed in percentages.
In the sequel we will refer to these lifelines as

lifeline 1 for the “50:50 option”,
lifeline 2 for “Phone a friend”,
lifeline 3 for “Ask the audience”.

Each lifeline may be used only once during a contestant’s entire game.

question index    monetary value
      1                  150
      2                  300
      3                  450
      4                  900
      5                 1800
      6                 2100
      7                 2700
      8                 3600
      9                 4500
     10                 9000
     11                18000
     12                36000
     13                72000
     14               144000
     15               300000

Table 1: Immediate rewards
The primary aim of this paper is to show how a “difficult” real decision-making problem can be easily modelled and solved by basic Operations Research tools, in our case by discrete-event dynamic programming. In this regard, the paper is quite simple from the mathematical point of view. Nevertheless, the modelling phase is not so straightforward and, moreover, this approach can be used as a motivating case study when presenting dynamic programming in the classroom. This aim is achieved in three phases:
1. modelling,
2. mathematical formulation,
3. simulation of the actual process.
In the modelling phase we identify the essential building blocks that describe the problem and link them to elements of mathematical models. In the formulation phase we describe the game as a discrete-time Markov decision process that is solved by discrete-event dynamic programming. Two models are presented that guide the players to optimal strategies: one maximizing the expected reward, which will be called the maximum expected reward strategy, and one maximizing the probability of reaching a given question, which will be called the maximum probability strategy. In doing so, we establish two mathematical models of the game and find optimal strategies for a candidate participating in the TV game.
The rest of the paper is organized as follows. The second section is devoted to the general mathematical model (states, feasible actions, rewards, transition function, probabilities of answering correctly and their estimation). In the third section we present the first model, in which we maximize the expected reward, as well as the second model, in which we maximize the probability of reaching and answering correctly a given question, starting from any departing state. After that, we present some concluding remarks based on a simulation of how to play in a dynamic way.
2 The general model
The actual game requires the contestant to make a decision each time a question is answered correctly. The planning horizon is finite: we have N = 16 stages, where the 16th stage stands for the situation after answering question number 15 correctly. To make a decision, the candidate has to know the index of the question she faces and the lifelines she has already used; the history of the game is summarized in this information. We define S as the set of state vectors s = (k, l1, l2, l3), where k is the index of the current question and

$$l_i = \begin{cases} 1 & \text{if lifeline } i \text{ may be used,}\\ 0 & \text{if lifeline } i \text{ was already used in an earlier question.}\end{cases}$$
At any state s ∈ S let A(s) denote the set of feasible actions in this state. If we suppose
we are in state s = (k, l1 , l2 , l3 ), A(s) depends on the question index and the lifelines left.
If k = 16, the game is over and there are no feasible actions.
If k ≤ 15, the candidate has several possibilities:
• Answer the question without using lifelines.
 k      r_k      r*_k
 0          0        0
 1        150        0
 2        300        0
 3        450        0
 4        900        0
 5       1800     1800
 6       2100     1800
 7       2700     1800
 8       3600     1800
 9       4500     1800
10       9000     9000
11      18000     9000
12      36000     9000
13      72000     9000
14     144000     9000
15     300000   300000

Table 2: Immediate versus ensured rewards
• Answer the question employing one or more lifelines, if any is left. In this case, the
candidate must also specify the lifelines she is going to use.
• Stop and quit the game.
If the player decides not to answer, the immediate reward is the monetary value of the last question answered. If the candidate decides to answer, the immediate reward is a random variable that depends on the probability of answering correctly. If the candidate fails, the immediate reward is the last guarantee point reached before failing. If the candidate decides to answer and chooses the correct alternative, there is no immediate reward: the candidate goes on to the next question, and the reward is the expected (final) reward from there on.
Denote by r_k the immediate reward if the candidate decides to quit the game after answering question k correctly, i.e., if she stops in a state s = (k + 1, l1, l2, l3), and by r*_k the immediate reward if she fails in a state s = (k + 1, l1, l2, l3); see Table 2.
After a decision is made, the process evolves to a new state.
• If the candidate decides to stop or if she fails in a question, the game is over.
• If she decides to play and chooses the correct answer, there is a transition to another state t(s, a) = (k′, l′1, l′2, l′3) ∈ S, where the question index k′ is equal to k + 1 and the lifeline indicators are

$$l_i' = \begin{cases} l_i - 1 & \text{if the candidate uses lifeline } i \text{ in this question,}\\ l_i & \text{otherwise.}\end{cases}$$
Answering correctly depends on certain probabilities for each question, which are the same for all candidates. We further assume that the probabilities can be influenced by using lifelines, which we suppose to be helpful (i.e. to increase the probability of answering correctly). Denote by p_s^a the probability of answering correctly if action a ∈ A(s) is chosen in state s ∈ S.
Our analysis takes into account the possible skill of the participants. For that, we divide the participants into four groups, namely A, B, C and D. Belonging to one of these groups means that the “a priori” probability p_s^a is modified according to a skill factor associated with each group. Mathematically speaking, this is reflected in a multiplicative factor h_G, G ∈ {A, B, C, D}, that modifies the probability in the following way:

$$h_G \, p_s^a, \qquad G \in \{A, B, C, D\},$$

where h_A = 1, h_B = 0.9, h_C = 0.8, h_D = 0.7. This means that the lower the skill of the participant, the smaller her probabilities of answering correctly.
One of the cornerstones in the resolution of the actual problem is to get a good estimation
of the probabilities in the decision process.
For a realistic estimation one would need detailed data: for each question and for each possible combination of lifelines, there would have to be a certain number of candidates who answered correctly and who failed, and this number would have to be high enough to estimate the probabilities. The actual data are only available for approximately 40 games broadcast on Spanish TV and, of course, for most combinations of lifelines there are no observations, making it impossible to estimate those probabilities directly. Nevertheless, we had enough information to estimate the probabilities of answering correctly without using any lifeline and using one single lifeline. Therefore, in order to solve the problem we make further assumptions.
Let p*_k denote the probability of answering question k correctly without using any lifeline. We assume that there exists a multiplicative relationship between the probability of failing a given question using lifeline i and the probability of failing it without lifelines: the probability of failing decreases by a fixed factor c^i_k, 0 < c^i_k < 1, i = 1, 2, 3. In other words,

$$p_k^i = 1 - (1 - p_k^*)\, c_k^i, \qquad (1)$$

where p^i_k is the probability of answering question number k correctly using the i-th lifeline (both p*_k and p^i_k are known, for all k, i). We assume further that a combination of several lifelines modifies the original failure probability (1 − p*_k) in a multiplicative way, by multiplying the different ‘c’ constants.
This simplification allows us to give a heuristic expression for the probabilities, which can be justified because we did not have enough data to give a valid estimation for each combination of lifelines. Under this assumption, we can use the information that we have about the candidates to estimate the probabilities of answering correctly with any feasible combination of lifelines.
In the sequel we estimate, from the available data, the probabilities of answering correctly without using any lifeline and the constants c^i_k.
For every question index k, we consider the candidates who did not use any lifeline and those who used only one lifeline. Then, for each of these groups of candidates, we count the number of candidates who answered this question correctly and the number who failed. The probabilities are estimated by the relative frequencies observed in the data, and they are shown in Table 3, where p*_k denotes the probability of answering the k-th question correctly without using lifelines, p^1_k the probability of answering correctly using lifeline 1 (50:50 option), p^2_k the probability using lifeline 2 (phone a friend), and p^3_k the probability using lifeline 3 (ask the audience). All probabilities are given in %; the values marked † are original values of 100% that were replaced by 99%.
We use equation (1) to estimate the values of the ‘c’ constants. Thus, for each question index k, the factor c^i_k that modifies the probability when lifeline i is used is given by

$$c_k^i = \frac{1 - p_k^i}{1 - p_k^*}.$$

The resulting values are listed in Table 4.
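To make this assumption concrete, here is a minimal Python sketch (our actual computations were done in MAPLE; the function below merely encodes equation (1) and its multiplicative extension to several lifelines, with illustrative values taken from Tables 3 and 4 for question 9):

```python
# Probability of answering a question correctly when a subset of lifelines
# is used, under the multiplicative assumption of equation (1).

def prob_correct(p_star, c_factors, used):
    """p_star: probability without lifelines; c_factors: dict i -> c^i_k;
    used: iterable of the lifeline indices used on this question."""
    fail = 1.0 - p_star
    for i in used:
        fail *= c_factors[i]        # each lifeline shrinks the failure probability
    return 1.0 - fail

# Question 9: p*_9 = 0.51 (Table 3), c-constants from Table 4.
c9 = {1: 0.6734, 2: 0.6122, 3: 0.7142}
print(prob_correct(0.51, c9, []))       # 0.51, no lifelines
print(prob_correct(0.51, c9, [1]))      # ~0.67, matches p^1_9 in Table 3
print(prob_correct(0.51, c9, [1, 2]))   # ~0.80, combined 50:50 + phone
```

The last call illustrates the heuristic: no combination of two lifelines was observed in the data, but the multiplicative assumption still produces an estimate for it.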
question index k    p*_k    p^1_k    p^2_k    p^3_k
       1             97      99†      99†      99†
       2             95      99†      99†      99†
       3             92      99†      99†      99†
       4             86      93       99†      95
       5             80      91       98       93
       6             79      99†      99†      99†
       7             76      87       90       88
       8             63      70       78       69
       9             51      67       70       65
      10             43      58       66       52
      11             39      57       68       50
      12             38      54       64       49
      13             40      54       60       47
      14             37      50       62       48
      15             36      52       60       45

Table 3: Estimated probabilities of correct answers (in %)
3 Mathematical formulation
In this section we present two different models. The first model is designed to maximize the expected reward, and the second to maximize the probability of reaching a fixed goal. Both of them, apart from giving us the maximum expected reward and the maximum probability, also give us optimal strategies to achieve their respective goals.
3.1 Model 1: expected reward
Let p_s^a denote the probability of answering correctly if action a ∈ A(s) is chosen in state s ∈ S. Suppose that the probabilities p_s^a depend only on the question index and on the lifelines used.
Let f(s) be the maximum expected reward that can be obtained starting at state s = (k, l1, l2, l3). We can evaluate f(s) in the following way. The maximum expected reward from s is the maximum among the expected rewards that can be obtained under the different courses of action a ∈ A(s). At that point, we can either quit the game, thus ensuring r_{k−1}, or go for the next question (indexed by k). In the latter case, if we choose an action a ∈ A(s), then we answer correctly with probability p_s^a and fail with probability (1 − p_s^a).
 k     c^1_k     c^2_k     c^3_k
 1    0.3333    0.3333    0.3333
 2    0.2       0.2       0.2
 3    0.125     0.125     0.125
 4    0.5       0.0714    0.3571
 5    0.45      0.1       0.35
 6    0.0476    0.0476    0.0476
 7    0.5416    0.4166    0.5
 8    0.8108    0.5945    0.8378
 9    0.6734    0.6122    0.7142
10    0.7368    0.5964    0.8421
11    0.7049    0.5245    0.8196
12    0.7419    0.5806    0.8225
13    0.7666    0.6666    0.8833
14    0.7936    0.6031    0.8253
15    0.75      0.625     0.8593

Table 4: Correction factors
The reward when failing is given by the reward guaranteed prior to question k, i.e. r*_{k−1}. On the other hand, answering question k correctly produces a transition to the next question with the remaining lifelines. Denote by t(s, a) the transition function that gives the new state when action a is chosen in state s; from that point on the expected reward is f(t(s, a)). In summary, the expected reward under action a is

$$p_s^a\, f(t(s,a)) + (1 - p_s^a)\, r_{k-1}^*.$$

Hence,

$$f(s) = \max_{a \in A(s)} \big\{\, r_{k-1},\; p_s^a\, f(t(s,a)) + (1 - p_s^a)\, r_{k-1}^* \,\big\}.$$
In order to get the maximum expected reward we have to evaluate f at the departing state. If the candidate starts at question number 1 with the three lifelines, we have to compute f(1, 1, 1, 1). The values of f can be computed recursively by backward induction once we know the value of f at every feasible state of the terminal stage. These values are easily computed and are shown in Table 5.
State         f(State)
(15,1,1,1)    224976.5
(15,0,0,1)    144000
(15,0,1,0)    183600
(15,0,1,1)    199968.75
(15,1,1,0)    212700
(15,1,0,1)    179962.5
(15,1,0,0)    160320
(15,0,0,0)    144000

Table 5: Expected rewards f at the states of the terminal stage
Therefore, using backward induction starting from the data in the table above, we obtain f(1, 1, 1, 1) and find optimal strategies. In this process we use the probabilities and constants estimated in Section 2. All computations were performed with a MAPLE computer program.
The solution obtained by the program is f(1, 1, 1, 1) = 2490.89, and an optimal strategy is shown in Table 6.

Question index     Strategy
 1                 No lifelines
 2                 No lifelines
 3                 No lifelines
 4                 No lifelines
 5                 Audience
 6                 No lifelines
 7                 No lifelines
 8                 No lifelines
 9                 50:50 Option
10                 Phone
11                 No lifelines
12                 No lifelines
13                 Stop
Expected reward    2490.89

Table 6: Solution of Model 1
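For readers who want to reproduce the computation, here is a self-contained backward-induction sketch in Python (the original program was written in MAPLE; the data below are those of Tables 2, 3 and 4, so small rounding differences with the published figures are possible):

```python
from itertools import combinations
from functools import lru_cache

R  = [0, 150, 300, 450, 900, 1800, 2100, 2700, 3600, 4500,
      9000, 18000, 36000, 72000, 144000, 300000]          # r_k, k = 0..15
RS = [0, 0, 0, 0, 0, 1800, 1800, 1800, 1800, 1800,
      9000, 9000, 9000, 9000, 9000, 300000]               # r*_k, k = 0..15
P  = [.97, .95, .92, .86, .80, .79, .76, .63, .51, .43,
      .39, .38, .40, .37, .36]                            # p*_k, k = 1..15
C  = {1: [.3333, .2, .125, .5, .45, .0476, .5416, .8108, .6734,
          .7368, .7049, .7419, .7666, .7936, .75],
      2: [.3333, .2, .125, .0714, .1, .0476, .4166, .5945, .6122,
          .5964, .5245, .5806, .6666, .6031, .625],
      3: [.3333, .2, .125, .3571, .35, .0476, .5, .8378, .7142,
          .8421, .8196, .8225, .8833, .8253, .8593]}

def prob(k, used):
    """Probability of answering question k correctly with the given lifelines,
    following equation (1) and the multiplicative assumption."""
    fail = 1.0 - P[k - 1]
    for i in used:
        fail *= C[i][k - 1]
    return 1.0 - fail

@lru_cache(maxsize=None)
def f(k, l1, l2, l3):
    """Maximum expected reward from state (k, l1, l2, l3)."""
    if k == 16:                        # question 15 answered: the game is won
        return R[15]
    avail = [i for i, l in zip((1, 2, 3), (l1, l2, l3)) if l]
    best = R[k - 1]                    # action "stop": keep r_{k-1}
    for n in range(len(avail) + 1):
        for used in combinations(avail, n):
            nxt = [l1, l2, l3]
            for i in used:             # used lifelines are no longer available
                nxt[i - 1] = 0
            p = prob(k, used)
            best = max(best, p * f(k + 1, *nxt) + (1 - p) * RS[k - 1])
    return best

print(round(f(1, 1, 1, 1), 2))         # should reproduce the value 2490.89
```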
3.2 Model 2: reaching a question
In this section we address a different solution approach to our problem. We saw in Section 3.1 the optimal strategy to be used if we want to maximize the expected reward, and how much we win by following it. Now we want to find the optimal strategy to follow in order to maximize the probability of reaching and answering correctly a given question. Moreover, we also give the probability of achieving this if we follow an optimal strategy.
Let us define the new problem. Recall that a state s is defined as a four-dimensional vector, as before: s = (k, l1, l2, l3). Let k̄, 1 ≤ k̄ ≤ 15, be a fixed number. Our goal is to answer question number k̄ correctly. We denote by f(s) the maximum probability of reaching and answering correctly question number k̄, starting in state s.
We evaluate f(s) in the following way. The maximum probability of reaching and answering correctly question number k̄, starting in state s, is the maximum, among the possible actions a ∈ A(s), of the probability of answering the current question correctly times the maximum probability of achieving our goal from the state t(s, a), where t(s, a) is the transition state reached after choosing action a in state s and answering correctly.
Then, we have:
$$f(k, l_1, l_2, l_3) = \max_{\substack{0 \le g_i \le l_i \\ g_i \in \mathbb{Z},\, \forall i}} \big\{\, p_{k, g_1, g_2, g_3} \cdot f(k+1,\, l_1 - g_1,\, l_2 - g_2,\, l_3 - g_3) \,\big\},$$

where p_{k,g1,g2,g3} is the probability of answering the k-th question correctly using the lifelines indicated by the g_i: the i-th lifeline is used if g_i = 1, i = 1, 2, 3.
The function f is a recursive functional, so to evaluate it by backward induction we need its value at all the states of the terminal stage. Notice that the goal in this formulation is to reach stage k̄; thus, the probability of having reached stage k̄ if we are already at stage k̄ + 1 is clearly 1. Hence, we have

$$f(\bar{k} + 1, l_1, l_2, l_3) = 1 \qquad \forall\, l_i \in \{0, 1\},\ i = 1, 2, 3.$$

Once we have the evaluation of the function at the terminal stage, the solution of this model is the evaluation of f at the departing state. If we start from the first question with all the lifelines, the departing state is (1, 1, 1, 1); if we start at the third question with only the 50:50 and audience lifelines, the departing state is (3, 1, 0, 1). In any case, the algorithm we propose solves the problem starting from any departing state and with any level of the game as the goal.
Question index   Goal: 5        Goal: 10       Goal: 13       Goal: 15
 1               No lifelines   No lifelines   No lifelines   No lifelines
 2               No lifelines   No lifelines   No lifelines   No lifelines
 3               50:50          No lifelines   No lifelines   No lifelines
 4               Audience       No lifelines   No lifelines   No lifelines
 5               Phone          No lifelines   No lifelines   No lifelines
 6                              Audience       No lifelines   No lifelines
 7                              No lifelines   No lifelines   No lifelines
 8                              No lifelines   No lifelines   No lifelines
 9                              50:50          Audience       No lifelines
10                              Phone          No lifelines   No lifelines
11                                             Phone          Phone
12                                             50:50          No lifelines
13                                             No lifelines   No lifelines
14                                                            Audience
15                                                            50:50
Probability      0.85           0.12           0.01           0.001

Table 7: Optimal strategies for Model 2
We use the estimated probabilities and the constants c^i_k, calculated as before, in a MAPLE computer program to evaluate the function f and to find optimal strategies.
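Continuing the Python sketch of Section 3.1 (and reusing its prob function and data tables), the Model 2 recursion can be evaluated in the same way; the factory below, with goal playing the role of k̄, is our illustrative translation of the MAPLE code, not the original program:

```python
from itertools import combinations
from functools import lru_cache

def max_prob(goal):
    """Maximum probability of reaching and answering question `goal`
    correctly; returns the value function g over states (k, l1, l2, l3)."""
    @lru_cache(maxsize=None)
    def g(k, l1, l2, l3):
        if k == goal + 1:              # terminal stage: the goal was answered
            return 1.0
        avail = [i for i, l in zip((1, 2, 3), (l1, l2, l3)) if l]
        best = 0.0
        for n in range(len(avail) + 1):
            for used in combinations(avail, n):
                nxt = [l1, l2, l3]
                for i in used:
                    nxt[i - 1] = 0
                best = max(best, prob(k, used) * g(k + 1, *nxt))
        return best
    return g

print(round(max_prob(5)(1, 1, 1, 1), 2))   # should be close to 0.85 (Table 7)
```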
In this model we do not have a unique solution but fifteen, because there are fifteen possible goals: the fifteen questions of the game. For the sake of brevity we only show the solutions obtained when we start in state (1, 1, 1, 1) and want to reach and answer correctly question number 5, 10, 13 or 15. The optimal strategies are shown in Table 7; the last row of the table gives the probability of reaching and answering correctly each of the goals mentioned above.
4 Further analysis of the game
We have solved the problem in a static way, because all the probabilities are determined “a priori”, that is, without knowledge of the actual questions. Actually, the game is played by changing the probability of answering correctly each time the player faces the current question. For example, at the fourth question, once she knows the actual question, she can estimate her probability of answering it correctly, based on her own knowledge of the subject. Then, changing that probability in our scheme, she evaluates the function again, starting at the current state and keeping the other probabilities unchanged.
This analysis means that the player modifies at each stage k the probability p*_k of answering correctly according to her own knowledge of the subject. This would be a realistic, dynamic way to play the game. This feature has been incorporated into our computer code, so that at each stage the player can change the probability of answering the current question correctly. Notice that this argument does not modify our recursive analysis of the problem; it only means that we allow the probability p*_k to change at each step of the analysis.
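In terms of the Python sketch of Section 3.1, this dynamic re-evaluation amounts to something like the following (solve_from is a hypothetical helper written for illustration, not part of the original MAPLE code):

```python
# Once the contestant sees the current question she replaces p*_k by her own
# estimate and re-solves the recursion from the current state onward.

def solve_from(state, p_now):
    k = state[0]
    P[k - 1] = p_now        # override only the current question's probability
    f.cache_clear()         # cached values must be recomputed with the new data
    return f(*state)

# Facing question 4 with all lifelines left, doubting between two answers:
print(solve_from((4, 1, 1, 1), 0.5))
```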
4.1 Simulation
In order to illustrate our analysis of this game, we perform a simulation of the process to check the behavior of the winning strategies proposed by our models. As mentioned in Section 2, we classify the participants into four groups as follows:
• Players in group A have the original probabilities described previously.
• Probabilities of answering correctly in group B are the probabilities for group A multiplied by 0.9.
• Probabilities of answering correctly in group C are the probabilities for group A multiplied by 0.8.
• Probabilities of answering correctly in group D are the probabilities for group A multiplied by 0.7.
In the following, we present two tables (Table 8) with the strategy that one participant of each group (A, B, C and D) should follow in order to maximize her expected reward (Model 1), and the strategy to maximize the probability of winning at least that maximum expected reward (Model 2). For example, the last row of the column of participant A under Model 1 shows the expected reward that she would get following the strategy described in that column, and the last row under Model 2 is the probability of winning at least that maximum expected reward; to win at least 2490.9 Euros, she has to answer question number 7 correctly. The other cases are analogous.
The last row in both tables thus shows the maximum expected reward (Model 1) or the probability of being successful if we follow the strategy described by Model 2.
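As a sanity check of the kind performed here, one can replay the Model 1 strategy of Table 6 many times with the estimated probabilities. A small Monte Carlo sketch, again reusing prob, R and RS from the earlier Python sketch, would look as follows:

```python
import random

# Replay the Table 6 strategy: Audience at Q5, 50:50 at Q9, Phone at Q10,
# quit when facing question 13.  The mapping below encodes that strategy.
STRATEGY = {5: (3,), 9: (1,), 10: (2,)}   # question -> lifelines used
STOP_AT = 13

def play_once(rng):
    for k in range(1, STOP_AT):
        if rng.random() > prob(k, STRATEGY.get(k, ())):
            return RS[k - 1]              # wrong answer: fall back to guarantee
    return R[STOP_AT - 1]                 # stop and keep r_12 = 36000 Euros

rng = random.Random(0)
rewards = [play_once(rng) for _ in range(100_000)]
print(sum(rewards) / len(rewards))        # should approach 2490.89
```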
To finish this section, we show a simulation of Model 1 played in its dynamic version. That is, we assume that for each actual question the probability of answering correctly is modified once the concrete question is known. Suppose the contestant is now facing the k-th question: she decides whether to answer the question, and how, depending on the degree of difficulty of the actual question. The model assumes that the probabilities of answering the following questions correctly, that is, from k + 1 on, are the original ones estimated before.
In Table 9 the strategies using the 50:50, Phone and Audience lifelines are denoted by 50, P and A, respectively, and answering without lifelines by NL. In order to simplify the simulation, we assume that the probability of answering correctly can be:
• 1 if the contestant knows the right answer;
• 0.5 if the contestant doubts between two answers;
• 0.33 if she is simply sure that one of the answers is incorrect;
• 0.25 if she knows nothing about the answer and all four of them seem possible to her.
The reader may notice that any kind of “a priori” probabilistic information, based on the knowledge of the actual player, can be incorporated into the model; this incorporation is done by computing “posterior” probabilities using Bayes’ rule. It is clear that the strategies change depending on the probabilities of the questions that the contestant faces, which in the simulation have been chosen at random using different probability distributions for each question number. The first number in each cell of Table 9 is the actual probability of answering the corresponding question correctly. As can be seen, depending on the simulated probabilities, the strategies can vary from stopping at the fifth question to stopping at the twelfth.
              Skill A                  Skill B
Question      Model 1     Model 2     Model 1     Model 2
 1            Without     Without     Without     Without
 2            Without     Without     Without     Without
 3            Without     Without     Without     50:50
 4            Without     Without     Audience    Audience
 5            Without     Phone       Phone       Phone
 6            Audience    50:50       Without     Stop
 7            Without     Audience    Without
 8            Without     Stop        Without
 9            50:50                   Without
10            Phone                   50:50
11            Without                 Without
12            Without                 Without
13            Stop                    Stop
E.R / Prob    2490.9      0.622       1289.4      0.557

              Skill C                  Skill D
Question      Model 1     Model 2     Model 1     Model 2
 1            Without     Without     Without     Without
 2            Without     Audience    Without     Without
 3            50:50       50:50       50:50       50:50
 4            Audience    Phone       Audience    Audience
 5            Phone       Stop        Phone       Phone
 6            Without                 Without     Stop
 7            Without                 Without
 8            Without                 Stop
 9            Stop
E.R / Prob    747.5       0.482       421.1       0.475

Table 8: Optimal solutions depending on the player’s skill
Question   P1          P2          P3          P4          P5          P6
 1         1/NL        1/NL        0.5/50-A    0.5/50-A    0.5/50-A    1/NL
 2         0.5/50      0.5/P       1/NL        0.33/P      1/NL        1/NL
 3         1/NL        0.33/A      0.5/P       1/NL        1/NL        0.33/50
 4         1/NL        0.5/50      0.5/NL      1/NL        1/NL        1/NL
 5         0.5/P       0.25/Stop   0.5/NL      0.33/NL     0.5/P       1/NL
 6         0.5/A                   0.33/NL     0.5/NL      0.5/NL      0.5/A
 7         0.5/NL                  1/NL        0.5/NL      1/NL        1/NL
 8         1/NL                    0.5/NL      0.5/NL      0.33/NL     0.5/NL
 9         0.33/Stop               0.33/Stop   0.33/Stop   1/NL        1/NL
10                                                         0.25/Stop   0.25/P
11                                                                     0.25/NL
12                                                                     0.25/Stop

Table 9: Simulation