DEPARTMENT-WIDE DOCTORAL WRITTEN EXAMINATIONS
of the
DEPARTMENT OF BIOSTATISTICS
School of Public Health
University of North Carolina at Chapel Hill
VOLUME I:
Closed-Book Parts
edited by Dana Quade
Institute of Statistics Mimeo Series No. 1343
Third Edition: May 1988
DEPARTMENT-WIDE DOCTORAL WRITTEN EXAMINATIONS
of the
DEPARTMENT OF BIOSTATISTICS
School of Public Health
University of North Carolina at Chapel Hill
VOLUME I
CLOSED-BOOK PARTS
assembled and edited by
DANA QUADE
with MICHAEL J SYMONS
(First and Second Editions)
First Edition: June 1981
Second Edition: November 1983
Third Edition: May 1988
TABLE OF CONTENTS
Introduction
16 September 1967: Part I
16 September 1967: Part II
21 September 1968: Part I
21 September 1968: Part II
21 February 1970: Part I
7 March 1970: Part III
26 June 1970: Part I
10 July 1970: Part III
20 February 1971: Part I
25 June 1971: Part I
25 June 1971: Extra Questions (for one student taking a special re-examination)
29 January 1972: Part I
27 January 1973: Part I
25 May 1973: Part I
26 January 1974: Part I
18 May 1974: Part I
25 January 1975: Part I
24 January 1976: Part I
22 January 1977: Part I
3 August 1977: Part I
21 January 1978: Part I
26 August 1978: Part I
11 August 1979: Part I
9 August 1980: Part I
15 August 1981: Part I
16 August 1982: Part I
8 August 1983: Part I
30 July 1984: Part I
23 July 1985: Part I
29 July 1986: Part I
3 August 1987: Part I
INTRODUCTION
This publication (Mimeo Series No. 1343) and its companion (Mimeo
Series No. 1344) contain the general written examinations which the
Department of Biostatistics has set for doctoral degrees: the
closed-book parts here and the take-home parts in No. 1344.
The first Department-wide doctoral-level written examinations
were given in September 1967, at the initiation of the new PhD
program in Biostatistics.
(Prior to that time the Department had
offered the PhD only in Public Health, for which it had not required
any Department-wide examination.)
The original format was: (A) Basic Written Examination, with two Parts,
(I) Theory and (II) Analysis, each including 5 closed-book questions,
of which candidates were to complete 4 in 3 hours, and (B) Advanced
Written Examination, again with two Parts, (I) General and (II)
Specialty Areas, each one week take-home, where Part I had 5 questions
and Part II 10, and candidates were to complete 4 from each Part.
In the 1969-1970 academic year, it was decided to drop the
Advanced Written Examination and expand the Basic Written
Examination to three Parts: (I) Theory, with 5 closed-book questions
of which 4 were to be completed in 3 hours, (II) Applications, with
5 take-home questions of which 4 were to be completed in one week,
and (III) Specialty Areas, with 6 closed-book questions of which 3
were to be completed in 3 hours. This format was used twice, in
February/March and again in June of 1970.
On 3 March 1971, the Department voted to abolish Part III of
the Basic Written Examination, and to let the Literature Review for
the Dissertation satisfy the Graduate School requirement for a
Preliminary Written Examination. This action produced a format for
"the Basics" which has continued essentially without change until
the present time.
In 1972 the time allowed for Part I (closed-book) was expanded from
3 hours to 4. Also, the following instruction was added to the
rubrics for both Parts:
Most questions should be answered in the equivalent
of less than 7 typewritten pages (300 words per
page) and under no circumstances will more than
the first 15 typewritten pages or the equivalent
be read by the grader.
This rubric was dropped from Part I in 1974. It continues for Part
II, and indeed changed from 7 and 15 pages to 5 and 10 starting in
1978; but it has never been strictly enforced.
Originally the Basics served not only as a screening
examination for the PhD, but also as the Master's Written
Examination for the MS. Thus, each candidate was evaluated using
one of three grades: pass at doctoral level, fail at doctoral level
but pass at master's level, or fail. This practice continued
through 1979, since when the MS examination has been given
separately.
In addition, although the Basics were originally intended for the
PhD only, beginning in 1979 candidates for the DrPH have also been
required to pass Part II (not Part I).
The Basics are given annually, originally in Winter, but starting
1979 in Summer. Special examinations have been given on several
occasions when there were sufficient students to warrant it; the
current regulation requires this only if there are 5 or more
students to retake a Part.
The Examinations Committee prepares and conducts all
Department-wide written examinations, and handles arrangements for
their grading. A team of two graders is appointed for each
question. Where possible, all graders are members of the Department
of Biostatistics and of the Graduate Faculty, and no individual
serves on more than one team for the same examination (the two Parts
counting separately in this context). The members of each grading
team are asked to prepare for their question an "official answer"
covering at least the key points. They agree beforehand on the
maximum score possible for each component, the total for any
question being 25.
The papers are coded so that the graders are unaware of the
candidates' identities, and each candidate's answer is marked
independently by each of the two graders. The score awarded
reflects the effective proportion correctly answered of the
question. The two graders then meet together and attempt to clear
up any major discrepancies (more than 5 points, say) between their
scores. Their joint report is to include comments on serious
shortcomings in any candidate's answer.
On the basis of a candidate's total score on a paper, the
Examinations Committee recommends to the faculty whether the
candidate is to be passed, failed, or passed conditionally. In the
last case, the condition is specified, together with a time limit.
All final decisions are by vote of the faculty. The faculty has
never specified in advance what the passing score will be, out of
the possible 200 total points per Part, but has set cut-points
varying from about 100 to 140, usually a little higher for Part II
than for Part I. Examination papers are not identified as to
candidate until after the verdicts of PASS and FAIL have been
rendered.
Once the decisions have been made, advisors are free to tell
their students unofficially; the official notification, however, is
by letter from the Chairman of the Examinations Committee. Actual
scores are never released, but the "official answers" are made
public, and candidates who are not passed unconditionally are
permitted to see the graders' comments on their papers. When
performance is not of the standard required, candidates are
reexamined at a later date set by the Examinations Committee.
Candidates whose native language is not English are not
allowed extra time on Department-wide (not individual course)
examinations. This condition may be waived for individual
candidates at the discretion of the Department Chairman upon
petition by the candidate at least one week prior to the
examination.
NOTE. Most of what follows reproduces the examinations exactly as
they were originally set; however, minor editorial changes and
corrections have been made, particularly in order to save
space. Thus, journal articles which were included with the
original examinations are not reproduced here, but complete
references to them are provided in all cases. Also, in
recent years the larger datasets have been provided in mainframe
files and/or on diskettes. Some of these datasets
are still available to the interested reader: inquiries
may be addressed to the Editor.
BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9 A.M., September 16, 1967)
1.
Let X be the number of successes in n independent Bernoulli trials with
probability p of success on each trial. Under appropriate conditions, the
distribution of X is approximately Poisson. State and prove the limit
theorem pertaining to this.
2.
(a)
Derive the density function of the ratio of two independent standard
normal random variables:
U = X/Y,
X is N(0,1), Y is N(0,1); X and Y independent.
(b)
What is the name of this distribution?
(c)
What is P(U < 0)?
3.
(a)
State and prove the Neyman-Pearson Lemma.
(b)
To what extent is this Lemma applicable in significance tests involving
a composite hypothesis?
(c)
Name and define a test criterion applicable to cases where the Lemma
cannot be used.
4.
Statistician A decided to observe a certain random process for one hour; on
doing so, he recorded that 50 events occurred during that hour. Independently, Statistician B decided to observe the process until 50 events
occurred; on doing so, he recorded that one hour was required for that to
happen.
(a)
Find the maximum likelihood estimate of the mean rate of occurrence
of events, using A's data; using B's data.
(b)
Are these estimates unbiased? What are their standard errors? If
they are not unbiased, show how to correct them for bias.
(c)
An onlooker remarked that
(i)
"A and B have really observed the same thing, i.e., 50 events
in one hour"
and
(ii)
"When 2 statisticians have observed the same thing, they
should make the same estimates."
Discuss these points carefully.
5.
X1 and X2 are independent and identically distributed random variables with
density function
f(x) = e^{-(x-θ)}   if x > θ,
f(x) = 0            otherwise;      -∞ < θ < ∞.
Show that min(X1, X2) is a sufficient statistic for θ, and find its density
function.
BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART II
(1:30 P.M., September 16, 1967)
EDITORIAL NOTE: Appended to this examination were:
1) Density function and moment-generating function for uniform, normal,
exponential, gamma, chi-squared, and Student's t distributions;
2) Table of standard normal distribution function Φ(z) for
z = 0(.01)2(.02)3(.2)4(.5)5 and Φ^{-1}(p) for
p = .00001(various).1(.05).9(various).99999.
1.
In order to determine whether the addition of an anticariogenic agent to
a dentifrice makes an appreciable difference in flavor, it was decided to
run a duo-trio test with ten panelists.
The duo-trio test consists of presenting each judge with a known standard
(the present dentifrice) and two unknowns (the new dentifrice and the
present dentifrice). Each judge is then asked to state which of the unknown
dentifrices differs from the standard.
(a)
What is the exact probability that 8 or more panelists will be right
when all are actually guessing?
(b)
What is the exact probability that 9 or more panelists are right
when all are actually guessing?
(c)
What is the probability that all 10 are right when all are actually
guessing?
(d)
If we want α ≤ .05, how many panelists must be right to reject the
null hypothesis that all panelists are guessing?
What is the alternate hypothesis?
Is this a two-tail test? Explain.
(e)
If it is desired to assure 90% probability of detecting a true 2:1
chance of correct identification, while keeping α ≤ .05, how many
judges must be used?
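For parts (a) through (c), the number of correct panelists under pure guessing can be treated as Binomial(10, 1/2), so the exact tail probabilities follow from binomial coefficients. The sketch below is only an illustration of that arithmetic; the helper name `upper_tail` is ours and is not part of the examination.

```python
from fractions import Fraction
from math import comb


def upper_tail(n, k, p=Fraction(1, 2)):
    """Exact P(X >= k) for X ~ Binomial(n, p), returned as a Fraction."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))


# Ten panelists, each guessing between the two unknowns (p = 1/2).
for k in (8, 9, 10):
    tail = upper_tail(10, k)
    print(f"P(X >= {k}) = {tail} = {float(tail):.4f}")
```

Working in exact fractions avoids rounding questions when a tail probability is compared with a nominal level such as .05.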
2.
A test has been conducted to compare the effectiveness of a new deodorant
product (A) with the effectiveness of a competitive product (C). Effectiveness is measured in terms of underarm odor scores at a fixed time after use
of the product. The objective is to reach one of the following conclusions:
(1)
A is good enough to justify a formal claim that A is better than C.
(2)
A is not good enough to justify such a claim.
Scores are defined so that they increase as effectiveness increases. The true
mean scores for products A and C are μ_A and μ_C respectively. In the standard
test μ_A is estimated by Ȳ_A and μ_C is estimated by Ȳ_C. Conclusion (1) is
reached if (Ȳ_A − Ȳ_C) is at least 1.0 unit.
In Figure 1 the performance of a single test is presented in graphical form.
The plotted curve represents the probability of reaching conclusion (1) as
a function of the true difference μ_A − μ_C.
Figure 1.
Probability of Reaching Conclusion (1)
[Graph not reproduced: probability of reaching conclusion (1) plotted against the true difference μ_A − μ_C.]
(a)
What is the null hypothesis in this test?
(b)
What is the α-risk? (Give numerical value.)
(c)
If a difference of 1.8 units is considered important, what is the
β-risk? (Give numerical value.)
For the remaining questions we assume that μ_A − μ_C = .5. Define two events
as follows:
G is the event that Ȳ_A − Ȳ_C > 1,
H is the event that Ȳ_A − Ȳ_C > 0.
From Figure 1 the probability of occurrence of G is P(G) = .2.
The probability of occurrence of H is known to be P(H) = .8.
(d)
What is the conditional probability that we will reach conclusion (1)
given that Ȳ_A is at least as large as Ȳ_C?
(e)
What is the expected value of the random variable (Ȳ_A − Ȳ_C)?
(f)
Suppose two independent tests are run.
(i)
What is the conditional probability that Ȳ_A − Ȳ_C ≥ 1 in
the second test given that we reached conclusion (1) on
the basis of the first test?
(ii)
What is the joint probability of reaching conclusion (1)
on the basis of the first test and then finding Ȳ_A − Ȳ_C < 1
in the second test?
3.
Suppose the following experiment was performed.
[Layout table not recoverable from this copy: a 4 × 4 arrangement of Time Periods 1-4 by Production Units 1-4, with one of the treatments AB, Ab, aB, ab in each cell.]
Note that the treatments form a 2² factorial, where
a represents the low level of 1st treatment,
A represents the high level of 1st treatment,
b represents the low level of 2nd treatment,
B represents the high level of 2nd treatment.
(a)
Exhibit the complete breakdown of degrees of freedom in the
ANOVA for this experiment.
(b)
Comment on the interactions which may be investigated in this
experiment.
4.
Given below are real data from a recent study to compare certain dental
treatments. The children at a North Carolina orphanage were divided at
random into 2 groups, and those in one group were given one treatment
while those in the other group were given a different treatment. For each
child a determination was made of the DMF rate (number of decayed, missing,
or filled teeth) at the beginning of the experiment and two years later.
(a)
Make a comparison of the treatments, including a test of
significance, using a method of analysis so simple that you
can complete it [word illegible].
(b)
Describe briefly how you might attack a problem like this if
you had plenty of time and computational assistance (and if
there were more data, so that an elaborate analysis would be
worthwhile).
[Data tables not recoverable from this copy. Treatment 1 lists children 1-22 and Treatment 2 lists children 1-20, each with columns Child #, Sex, Initial Age, Initial DMF Rate, and Final DMF Rate.]
*Not available; child left orphanage before end of 2-year period.
5.
It has been suggested that a linear relationship exists between the amount
of mail handled in a post office and the man-hours required to handle it.
In order to test out this hypothesis the following data were collected:

Four-week period,     Pieces of mail handled (X)     Man-hours used (Y)
fiscal year 1962      (in millions)                  (in thousands)
 1                    157                             572
 2                    161                             570
 3                    169                             599
 4                    186                             645
 5                    183                             645
 6                    184                             671
 7                    268                            1053
 8                    184                             655
 9                    180                             637
10                    188                             667
11                    184                             656
12                    182                             640
13                    179                             609

Assume the following model:
Y = β0 + β1 X + ε,
where X is a fixed (independent) variate, and ε is N(0, σ²).
Calculations
ΣX = 2405         ΣX² = 453517        X̄ = 185        Σ(X − X̄)² = 8592
ΣY = 8619         ΣY² = 5892485       Ȳ = 663        Σ(Y − Ȳ)² = 178088
ΣXY = 1633249     N = 13
(a)
Calculate the ANOVA table for the assumed model using the above
figures.
(b)
Assuming all the above calculations are correct, test the
hypothesis that β1 = 0 using an α-risk of 5%.
(c)
Comment on the validity of the error term in the Analysis.
(d)
How do you explain that the best estimate of the intercept is
negative?
(e)
Is there any evidence here that the linear model is inadequate?
Justify your answer.
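The quantities needed for the ANOVA table in part (a) can be checked directly from the thirteen data pairs. The sketch below assumes the simple linear model Y = β0 + β1 X + ε and the usual least-squares formulas; the variable names are ours. It recomputes the corrected sums of squares quoted under "Calculations" and then the regression and error sums of squares.

```python
# Data as tabulated in the question: X = pieces of mail (millions),
# Y = man-hours (thousands), for the 13 four-week periods.
x = [157, 161, 169, 186, 183, 184, 268, 184, 180, 188, 184, 182, 179]
y = [572, 570, 599, 645, 645, 671, 1053, 655, 637, 667, 656, 640, 609]
n = len(x)

# Corrected sums of squares and cross-products.
sxx = sum(v * v for v in x) - sum(x) ** 2 / n
syy = sum(v * v for v in y) - sum(y) ** 2 / n
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n

b1 = sxy / sxx                       # least-squares slope
b0 = sum(y) / n - b1 * sum(x) / n    # least-squares intercept

ss_reg = sxy ** 2 / sxx              # regression sum of squares, 1 df
ss_err = syy - ss_reg                # residual sum of squares, n - 2 df
f_stat = ss_reg / (ss_err / (n - 2))

print(f"Sxx = {sxx:.0f}, Syy = {syy:.0f}, Sxy = {sxy:.0f}")
print(f"b1 = {b1:.3f}, b0 = {b0:.1f}")
print(f"SS(regression) = {ss_reg:.1f}, SS(error) = {ss_err:.1f}, F = {f_stat:.1f}")
```

The F ratio would be referred to the F distribution with 1 and n − 2 = 11 degrees of freedom. Parts (c) through (e) still call for judgment about the data, for instance the influence of period 7, which this arithmetic does not settle.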
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9-12 A.M., September 21, 1968)
1.
Define the term "sufficient statistic".
Explain the importance of sufficient
statistics in the theory of estimation:
prove your statements as far as you
can.
Prove that the most powerful test of a simple null hypothesis against
a simple alternative is based on a sufficient statistic, if such a statistic
exists.
2.
Find the probability that, in a random sample of size 3 from an exponential
distribution, the mean will exceed the median.
3.
Assume that a curve of the form y = b0 + b1 x + b2 x² has been fitted and that
the usual assumptions of normality hold. Derive a confidence interval for
that x which produces the maximum or minimum value of y.
4.
(a)
Assume X and Y are random variables with means μ_x and μ_y (μ_y ≠ 0),
variances σ_x²/n and σ_y²/n, and covariance σ_xy/n. Derive the asymptotic
variance of X/Y.
(b)
Assume that X is a (1×p) random vector with mean vector μ and variance
matrix Σ (p×p); define q < p functions of the elements of X, viz.
f_i(X), i = 1,2,...,q,
which have derivatives up to second order. Derive
the asymptotic covariance matrix of the vector F = (f_1(X), ..., f_q(X))'.
For what kinds of functions is this expression exact?
5.
Let X be a Poisson random variable with expected value θ.
Let Y1, Y2, ..., be independent variables each having the same expected
value μ and variance σ². Let
Z = Y1 + Y2 + ... + Y_X   if X > 0,
Z = 0                     if X = 0.
Show that the correlation coefficient between X and Z does not depend on θ.
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART II
(1:30-4:30 P.M., September 21, 1968)
1.
Suppose a population consists of four strata, with the following weights:
W1 = 10%, W2 = 50%, W3 = 25%, W4 = 15%. Suppose the variances within the
strata are approximately: s1² = 16, s2² = 4, s3² = 9, s4² = 12.25. A sample
of size n = 100 has been drawn from the population, with Neyman allocation
used to determine the number of units selected from each stratum. What are
these sample sizes?
Suppose the observed means within the strata are ȳ1 = 10, ȳ2 = [illegible],
ȳ3 = 14, ȳ4 = 15; and the observed within-strata variances are
s1² = [illegible], s2² = 5, s3² = 10, s4² = 12. Give the appropriate estimate
for the population mean; also give an estimate of the sampling variance of this
estimate.
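Neyman allocation assigns sample sizes in proportion to W_h S_h, the stratum weight times the within-stratum standard deviation. The sketch below applies that rule with the weights (10%, 50%, 25%, 15%) and approximate variances (16, 4, 9, 12.25) as we read them from the question; the variable names are ours, and simple rounding to integers is an assumption rather than something the question prescribes.

```python
from math import sqrt

weights = [0.10, 0.50, 0.25, 0.15]   # stratum weights W_h
variances = [16, 4, 9, 12.25]        # approximate within-stratum variances
n_total = 100

# Neyman allocation: n_h proportional to W_h * S_h.
products = [w * sqrt(v) for w, v in zip(weights, variances)]
allocation = [n_total * p / sum(products) for p in products]
rounded = [round(a) for a in allocation]

for h, (exact, size) in enumerate(zip(allocation, rounded), start=1):
    print(f"stratum {h}: n_h = {exact:.2f} -> {size}")
```

With these inputs the rounded sizes happen to sum to 100; in general a rounding adjustment may be needed to hit the total exactly.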
NOTE: Problem #2 is on the next page.
3.
An investigator has approached you with the following data concerning the
response of white rats to a certain drug. He tells you that he has examined
10 animals at each drug level, and the responses given are means. He has
also calculated variances at each drug level.

x (drug level)     ȳ (mean response)     s²
1                  5.2                   1.40
2                  3.9                    .80
3                  3.2                    .65
4                  8.4                   3.30
5                  7.5                   3.00
6                  6.8                   2.70

Indicate what steps are needed to complete the analysis. How would you
explain your results to this investigator?
2.
An amateur weather forecaster made "rain" or "shine" predictions for 100 days.
During this period he was successful on 8 out of his 20 "rain" forecasts and
63 out of his 80 "shine" forecasts.
This information might be organized for χ² computation in any of the
following three ways. In each case the numbers in parentheses are the "expected
frequencies" based on a null hypothesis which seems natural from the way the
table is set up.
(a)
Successes     Failures     Total
71 (50)       29 (50)      100
(b)
                  Prediction
              Rain         Shine        Total
Success       8 (14.2)     63 (56.8)     71
Failure      12 (5.8)      17 (23.2)     29
Total        20            80           100
(c)
                  Predicted
Observed      Rain         Shine        Total
Rain          8 (5)        17 (20)       25
Shine        12 (15)       63 (60)       75
Total        20            80           100
χ² = 3.0
For each case state the null hypothesis on which the expected frequencies
are computed. Which of these three analyses leads to the most meaningful
answer to the question: "Has the forecaster demonstrated real ability?"
Justify your answer.
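The three layouts give different statistics because they compare the same counts with different expected frequencies. As a rough check, the sketch below applies the ordinary Pearson formula, the sum of (O − E)²/E, to each table using the observed and expected counts as read from the question; the function name is ours.

```python
def pearson_chi_square(observed, expected):
    """Pearson statistic: sum of (O - E)^2 / E over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# (a) successes and failures against a 50:50 split
x2_a = pearson_chi_square([71, 29], [50, 50])

# (b) outcome by prediction; cell order: rain/success, shine/success,
#     rain/failure, shine/failure
x2_b = pearson_chi_square([8, 63, 12, 17], [14.2, 56.8, 5.8, 23.2])

# (c) observed by predicted weather; cell order: rain/rain, rain/shine,
#     shine/rain, shine/shine
x2_c = pearson_chi_square([8, 17, 12, 63], [5, 20, 15, 60])

print(f"(a) {x2_a:.2f}   (b) {x2_b:.2f}   (c) {x2_c:.2f}")
```

Table (a) has 1 degree of freedom as a goodness-of-fit test, and tables (b) and (c) each have 1 degree of freedom as 2 × 2 tables. Which null hypothesis each one tests is the point of the question and is not decided by the arithmetic.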
4.
Two judges each ranked eleven contestants in a beauty contest, with the
following results:

Contestant     Judge 1     Judge 2
A               6           4
B               4           1
C               6           7
D               1           5
E               2           2
F               6           6
G              10           9
H               3           3
I               9          11
J              11           8
K               8          10

Using three essentially different methods: obtain a measure of the agreement
between the judges; interpret this measure; and test whether the judges agree
significantly (i.e., calculate the test statistic and indicate exactly where
and how you would look up the critical value).
5.
A common practice in medicine is to use each patient as his own control
when comparing two treatments A and B. That is: n/2 patients receive
treatment A followed by B, and n/2 receive B followed by A. Assume that
neither drug produces any appreciable residual effect and that the effect
of treatment can be measured by a normally distributed random variable.
Under what conditions is thus using each patient as his own control better,
equal, or worse as a design than administering each of drugs A and B to a
separate group of n patients?
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9-12 A.M., Saturday, February 21, 1970)
NOTE: Problem #1 is on the next page.
2.
Suppose a certain population contains two strata of sizes N1·· and N2··
respectively. Suppose samples of sizes n1·· and n2·· are taken from
the respective strata independently according to simple random sampling.
Suppose that the selected elements are classified according to two
binomial attributes, with one of them being the variable of principal
interest such as occurrence of automobile accidents, and the other
being a related covariable, such as age classified as under or over 35.
The data appear as follows:

Stratum 1
                  Age < 35     Age > 35     Total
Accident          n111         n112         n11·
No accident       n121         n122         n12·
Total             n1·1         n1·2         n1··

Stratum 2
                  Age < 35     Age > 35     Total
Accident          n211         n212         n21·
No accident       n221         n222         n22·
Total             n2·1         n2·2         n2··

a.
Indicate what the appropriate estimates of the incidence rate of
automobile accidents are for each stratum and the overall population
when no information other than that given above is available. Also
indicate an estimate of the standard error of these statistics.
b.
How is the analysis in (a) changed if it is known that the population
is subdivided as follows?

Stratum           Age < 35     Age > 35     Total
1                 N1·1         N1·2         N1··
2                 N2·1         N2·2         N2··
Total             N··1         N··2         N···
1.
A and B make independent measurements on a certain material having
magnitude a. Both A's measurement and B's measurement are subject to
random error, the former being uniformly distributed over a±2 and the
latter uniformly distributed over a±1. What is the probability that
A's and B's measurements will be within u of each other? What is the
expected value of the magnitude of the discrepancy between the two
measurements?
(Hint: Take U = |X−Y| where X is uniform on [−2,2]
and Y is uniform on [−1,1].)
NOTE: Problem #2 is on the previous page.
3.
Let F be a continuous distribution function with density function f and
unique median η; let X1, X2, ..., Xn be a random sample of n observations
from F, and let Y_n be the sample median of the X's. (Assume n is odd.)
a.
Find the density function of Y_n, say g.
b.
Show that if f is symmetric about η then so is g.
c.
Show that if E[X^k] exists for some k > 0 then so does E[Y_n^k].
d.
Show that {Y_n, n = 1,3,5,...} is a consistent estimator sequence for η.
4.
Suppose we wish to estimate a set of Poisson probabilities. Using the
maximum likelihood estimate of λ, we find the maximum likelihood estimates
are
p̂_i = [formula illegible],   i = 0,1,2,....
a.
Show that the mean of p̂_i is
[formula illegible]
where k is a Poisson variable with parameter μ = nλe^{-1/n}.
b.
Show that
[expression illegible] = e^{-λ}.
c.
What properties are associated with the estimates p̂_i?
5.
A traffic accident researcher found an average of 0.5 accidents per
driver (for a certain time period). In his report he called attention
to the fact that 40% of the accidents were "caused by" (i.e., were
on the records of) only 9% of the drivers. Assuming an adequate sample
was taken, would you accept this as evidence of "accident proneness" of
a relatively small percentage of the drivers? Justify your answer.

Poisson distribution with λ = 0.5
x          p(x)
0          .6065
1          .3033
2          .0758
3          .0126
4          .0016
5          .0002
6          .0000
Total     1.0000
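One way to weigh the researcher's remark is to ask what a pure-chance Poisson model with λ = 0.5 would already imply about the concentration of accidents. The sketch below, with names of our own choosing, reproduces the tabulated probabilities. It then computes, under that model only, the proportion of drivers with two or more accidents and the share of all accidents on their records.

```python
from math import exp, factorial

lam = 0.5


def poisson_pmf(x, lam):
    """P(X = x) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** x / factorial(x)


# Reproduce the tabulated probabilities for x = 0, ..., 6.
table = [round(poisson_pmf(x, lam), 4) for x in range(7)]

# Drivers with two or more accidents, and their share of all accidents:
# E[X; X >= 2] / E[X] = (lam - 1 * P(X = 1)) / lam.
p_two_plus = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
share = (lam - poisson_pmf(1, lam)) / lam

print(table)
print(f"P(X >= 2) = {p_two_plus:.4f}, share of accidents = {share:.4f}")
```

Comparing these two figures with the 9% and 40% quoted in the report is the comparison the question invites.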
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART III
(9-12 A.M., Saturday, March 7, 1970)
Instructions
a)
This is a closed-book examination.
b)
The time limit is three hours.
c)
Answer any three of the six questions which follow.
d)
Put the answers to different questions on separate sets of papers.
e)
Return the examination with a signed statement of the honor pledge.
1.
A recently-reported Taiwan study used a "matching" study in an attempt to
"estimate the real demographic impact of the IUD". The introduction and
methods (design) sections of the paper are reproduced and attached*. The
pre-IUD and post-IUD fertility data from the household register were used
to obtain three year pre-IUD and post-IUD to December 31, 1967 fertility
rates, which were compared to estimate the demographic impact.
a)
Discuss the advantages of this design as compared with a study of acceptors alone.
b)
Discuss the potential biases when using the two registers to identify
IUD acceptors and non-acceptors in the household register system.
c)
Assume that acceptors and non-acceptors can be accurately and completely
identified before "matching". Discuss how other factors including
migration might bias the results of the present study.
d)
How might one analyze the available data to get some idea of whether
biases exist?
* M.C. Chang, T.H. Liu and L.P. Chow. "Study by matching of the demographic impact of an IUD program", Milbank Memorial Fund Quarterly,
April, 1969 (pp. 137-140 attached to examination).
2.
In an experiment on the response of three variables to two drugs, the
following data were obtained:

                       Variable
           Case      A      B      C
Drug 1      1        5     10      8
            2        4     12      7
            3        6     12      9
            4        6     10      7
            5        4     11      9
Drug 2      6       10     12      7
            7        9     13      8
[The entries for the remaining cases of Drug 2 are scrambled in this copy;
in source order they read: 9 12 9 9 9 10 8 11 7 9 12 8.]

x̄1 = (5, 11, 8)          x̄2 = (9, 12, 8)

            ( 6   0   0 )                   ( 1/6     0        0    )
S = 1/8     ( 0   6   3 )         S⁻¹ = 8   (  0     8/39    -1/13  )
            ( 0   3   8 )                   (  0    -1/13     2/13  )

a)
Is the given inverse correct?
b)
Assuming multivariate normality: which, if any, variables are
independent? If multivariate normality is not assumed, what
statements can be made about independence?
c)
Test for equality of means in the two groups (i.e., calculate the
test statistic and indicate how to look up its significance).
d)
Discuss the effect of sample size, and number of variables, on
the test you have just performed.
3.
Assume that a health department serving a metropolitan area with a population of approximately a million persons has initiated a cervical cancer
screening program aimed at reaching approximately 5000 women. You have
been asked to serve as statistical consultant for the program, with specific responsibility for preparing a brief statistical report suggesting:
a.
Population groups the program might seek to reach, and the best
location of program services for this purpose.
b.
Statistical data and analyses needed in program operation and
evaluation.
Briefly highlight the type of report you might prepare. Include, for
example, ways you might use mortality and any available morbidity data in
determining target groups, types of statistics on program services that
might be maintained, and means by which you would seek to measure program
success.
4.
The following experiments were undertaken to measure the effect of an observable stationary patrol car on the speed of vehicles. The method of measurement used consisted in placing hidden cars with Vascar speed measuring equipment at the extreme points of a section of highway to determine the actual
speed of vehicles at these points. In the control experiment I, no observable
patrol vehicle is used; in experiment II, an observable patrol car is midway
between the two hidden points; and in experiment III, the observable patrol
car is nearer the terminal hidden point. These may be diagrammed as follows:

Experiment I:     Hidden point A ---------------------------------------------> Hidden point C
Experiment II:    Hidden point A ----------> B (obs. patrol car) --------------> Hidden point C
Experiment III:   Hidden point A ----------------------> B (obs. patrol car) --> Hidden point C

where A is the initial point and C is the terminal point. The data were as
follows:

                                    Speed at C
              Experiment I         Experiment II        Experiment III
Speed at A    <60   >60   Total    <60   >60   Total    <60   >60   Total
<60            19     8     27     139    15    154      46     4     50
>60             1     4      5      89    38    127      19     8     27
Total          20    12     32     228    53    281      65    12     77

Discuss the analysis of these data in terms of a general categorical data model.
Indicate the nature of the effects of the observable patrol car and differences
among the experiments. Finally, give approximate values for test statistics
directed at the various hypotheses of interest.
5.
A test for the presence or absence of gout is based on the serum uric acid
level x in the blood. Assume x is a normally distributed random variable.
In healthy individuals x has a mean of 5.0 mg/100ml and a standard deviation of 1.0 mg/100ml and in diseased persons it has a mean of 8.5 mg/100ml
and a standard deviation of 1.0 mg/100ml.
A test T for the presence of gout is to classify those persons with a serum
uric acid level of at least 7.0 mg/100ml as diseased.
(a)
In epidemiologic terminology, the "sensitivity" of a test for
a disease is defined as the probability of detecting the
disease when it is present. Express the sensitivity of T as
a test for gout in terms of standard tabulated functions.
(b)
The "specificity" of a test is defined as the probability of
the test not detecting the disease when it is absent. Express
the specificity of T similarly.
Alternatively, this can be regarded as a statistical test of the null
hypothesis
H0: μ = 5.0 mg/100ml
against the alternative hypothesis
Ha: μ = 8.5 mg/100ml.
The data is a single observation on the serum uric acid x of the person to
be classified.
What is the rejection region of the test T?
What is the size of the test (expressed in terms of tabulated functions)?
Relate the epidemiological concepts of sensitivity and specificity to α,
β and 1−β, the Type I and Type II error probabilities and the power of T
respectively.
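The "standard tabulated function" in the gout question is the standard normal distribution function Φ. As a numerical illustration only, the sketch below evaluates Φ through the error function and applies it to the cutoff of 7.0 mg/100ml with the means and standard deviation stated in the question; the variable names are ours.

```python
from math import erf, sqrt


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


cutoff = 7.0
mean_healthy, mean_diseased, sd = 5.0, 8.5, 1.0

# Sensitivity: P(x >= cutoff | diseased) = 1 - Phi(-1.5) = Phi(1.5)
sensitivity = 1.0 - std_normal_cdf((cutoff - mean_diseased) / sd)
# Specificity: P(x < cutoff | healthy) = Phi(2.0)
specificity = std_normal_cdf((cutoff - mean_healthy) / sd)

alpha = 1.0 - specificity   # Type I error probability (size of T)
beta = 1.0 - sensitivity    # Type II error probability

print(f"sensitivity = {sensitivity:.4f}, specificity = {specificity:.4f}")
print(f"alpha = {alpha:.4f}, beta = {beta:.4f}, power = {1 - beta:.4f}")
```

Under this reading, sensitivity plays the role of power (1 − β) and specificity that of 1 − α.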
6.
In his continuing research into ways of improving our environment, a scientist has developed a fly ointment which, when applied to the legs of young
flies, may or may not inhibit (slow down) the growth of their legs. Since
short-legged flies are known to have more trouble with takeoffs and landings
(fall on their little faces ...), if the ointment works the scientist feels
it will be a boon to future generations of mankind.
In order to prove that his ointment does indeed inhibit growth of flies'
legs the scientist performed the following experiment. Three concentrations
of ointment were prepared, containing 0 (control), 0.2 mg/l and 0.4 mg/l
of the active ingredient. Thirteen young flies (6 females and 7 males)
were captured and assigned to treatments at random. After being assigned to
treatments, the flies in each group were numbered 1, 2, ..., $n_i$, as shown in the
following table:
                        Concentration
Sex               0.0         0.2         0.4
M     cell #      1           2           3
      fly #       1,2,3,4     1           1,2
F     cell #      4           5           6
      fly #       1,2         1,2,3       1
For example, there were $n_3 = 2$ male flies who received the 0.4 concentration
ointment.
Before the ointment was applied, the lengths of each fly's fore and hind legs
were measured and recorded. The appropriate ointment was then applied to
the right fore and hind legs of each fly, and the flies were allowed to
mature under controlled conditions, after which their legs were re-measured.
The experimenters chose to analyze the data with the following model:
$y_{ij} = \mu_i + \alpha X_{ij} + e_{ij}$,   $j = 1, 2, \ldots, n_i$;   $i = 1, 2, \ldots, 6$,   (1)
where:
$y_{ij}$ is the difference between growth of the right foreleg (which received the ointment) and the growth of the left foreleg of the
j-th fly in the i-th cell.
$X_{ij}$ is the average length of the two forelegs of the j-th fly in the
i-th cell before the ointment was applied.
Questions:
a.
In the model (1) above, what is the interpretation of the $\mu_i$'s?
b.
In the model above, what is the interpretation of the $e_{ij}$'s?
c.
What assumptions about the $e_{ij}$'s are necessary if we are to use
the Gauss-Markov theorem to justify using least squares estimators of $\alpha$ and the $\mu_i$'s?
d.
What further assumptions about the $e_{ij}$'s are necessary if we
want to perform the usual tests of hypothesis?
Let $\boldsymbol{\beta} = (\mu_1, \mu_2, \mu_3, \mu_4, \mu_5, \mu_6, \alpha)'$.
We can test hypotheses of the form $C\boldsymbol{\beta} = \mathbf{0}$. Write down the C-matrices for
testing the usual hypotheses of an analysis of covariance. That is, write
down the C-matrices to test:
e.
For a "Sex" main effect;
f.
For a "Concentration" main effect;
g.
For a Sex x Concentration interaction;
h.
Whether $\alpha = 0$.
i.
Write a C-matrix to test whether there is a significant linear
trend in response, y, as concentration increases.
j.
Write a C-matrix to test whether there is a significant quadratic
trend in the response, y, as concentration changes. Write this
C-matrix so that the resulting test statistic is not correlated
with the test statistic generated by the test in part i.
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9-12 A.M., Friday, June 26, 1970)
Instructions
a)
This is a closed-book examination.
b)
The time limit is three hours.
c)
Answer any four of the five questions which follow.
d)
Put the answers to different questions on separate sets of papers.
e)
Return the examination with a signed statement of the honor pledge.
1.
In a certain population the number of children per family has a distribution
with mean 3 and variance 6; and every child, independently
of all others, has probability 1/2 of being male.
Let a male child
be chosen at random from this population, and let the random variable
X be the number of brothers he has.
Find E[X].
2.
For each of the following distributions:
write down the probability
function; specify a model, including all necessary assumptions, which
would produce data having the distribution; and give an illustrative
example from the biological or health sciences.
a) Negative binomial
b) Multinomial
c) Hypergeometric
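Question 1 of this examination (the number of brothers of a randomly chosen male child) involves size-biased sampling: a family with m boys is m times as likely to supply the chosen boy, who then has m - 1 brothers. The sketch below illustrates the idea by exact enumeration. The family-size distribution used (0, 3 or 6 children, each with probability 1/3) is purely hypothetical; it was picked only because it has mean 3 and variance 6, and the examination does not specify any particular distribution.

```python
from fractions import Fraction
from math import comb

# Hypothetical family-size distribution with mean 3 and variance 6 (an assumption).
family_size_pmf = {0: Fraction(1, 3), 3: Fraction(1, 3), 6: Fraction(1, 3)}


def expected_brothers(pmf, p_male=Fraction(1, 2)):
    """E[number of brothers] of a boy drawn uniformly from all boys.

    Uses E[X] = E[M(M - 1)] / E[M], where M is the number of boys in a family.
    """
    numerator = Fraction(0)    # accumulates E[M (M - 1)]
    denominator = Fraction(0)  # accumulates E[M]
    for n, p_n in pmf.items():
        for m in range(n + 1):
            p_m = comb(n, m) * p_male**m * (1 - p_male) ** (n - m)
            numerator += p_n * p_m * m * (m - 1)
            denominator += p_n * p_m * m
    return numerator / denominator


mean_size = sum(n * p for n, p in family_size_pmf.items())
var_size = sum(n * n * p for n, p in family_size_pmf.items()) - mean_size**2
answer = expected_brothers(family_size_pmf)
print(mean_size, var_size, answer)
```

Because E[M(M - 1)] and E[M] depend on the family-size distribution only through its mean and variance when each child is male with probability 1/2, any other distribution with the same two moments would give the same value.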
3.
Suppose $X_1$ is exponential with mean $\lambda_1$ and $X_2$ is exponential with
mean $\lambda_2$.
What is the distribution of $X_1/X_2$? What is the mean of
this distribution?
Find an expression for the $p$-th percentile.
4.
Given independent random samples, of size n each, from three normal
populations having a common known variance $\sigma^2$ but unknown means:
$\{X_{1i},\ i = 1, 2, \ldots, n\}$ from $N(\mu_1, \sigma^2)$,
$\{X_{2i},\ i = 1, 2, \ldots, n\}$ from $N(\mu_2, \sigma^2)$,
$\{X_{3i},\ i = 1, 2, \ldots, n\}$ from $N(\mu_3, \sigma^2)$,
construct a most powerful test of size $\alpha$ for testing the null hypothesis
$H_0$: $\mu_1 = \mu_2 = \mu_3 = 0$
against the alternative hypothesis
$H_A$: $\mu_1 = a_1$, $\mu_2 = a_2$, $\mu_3 = a_3$,
where $a_1$, $a_2$, $a_3$ are given constants.
Be precise as to details of the
critical region.
For the specific case
$\sigma = 2$, $a_1 = .3$, $a_2 = .4$, $a_3 = 1.2$, $\alpha = 0.025$,
find
the smallest sample size required to ensure that the power of the test
against $H_A$ will be not less than 0.975.
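The sample-size part of the preceding question can be sketched numerically. The code assumes the Neyman-Pearson test rejects for large values of T = a_1*Xbar_1 + a_2*Xbar_2 + a_3*Xbar_3, which is N(0, sigma^2 d^2 / n) under the null and N(d^2, sigma^2 d^2 / n) under the alternative, where d^2 = a_1^2 + a_2^2 + a_3^2. The function names are illustrative only.

```python
from math import ceil, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_of_test(n, a, sigma, alpha):
    """Power of the size-alpha test based on T = sum_j a_j * (mean of sample j)."""
    d = sqrt(sum(x * x for x in a))
    return STD_NORMAL.cdf(sqrt(n) * d / sigma - STD_NORMAL.inv_cdf(1.0 - alpha))


def min_sample_size(a, sigma, alpha, target_power):
    """Smallest n per population with power >= target_power."""
    d = sqrt(sum(x * x for x in a))
    root_n = (STD_NORMAL.inv_cdf(1.0 - alpha)
              + STD_NORMAL.inv_cdf(target_power)) * sigma / d
    return ceil(root_n**2)


a = (0.3, 0.4, 1.2)
n_required = min_sample_size(a, sigma=2.0, alpha=0.025, target_power=0.975)
print(n_required, power_of_test(n_required, a, 2.0, 0.025))
```

Checking `power_of_test` at `n_required` and at `n_required - 1` is a simple guard against rounding the bound in the wrong direction.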
5.
In a certain region a 15% random sample of hotels is taken and classified
according to whether or not every bedroom has a telephone.
The results are:
Total Number     Hotels in     Hotels in     Sample hotels
of bedrooms      Region        Sample        with full number
                                             of telephones
1-5                400            80              2
6-20              1500           200             50
21-50              600           100             45
51-100             300            50             35
101-               200            20             18
Total             3000           450            150
Compute the appropriate unbiased estimates of the total number of hotels
with telephones in every room and the corresponding variance, assuming
the data arose from (a) a simple random sample; (b) a stratified random
sample as indicated in the table.
Find the relative efficiency of the
two methods of selection.
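A rough numerical sketch of the two estimates follows. It assumes the usual textbook estimators for a population total of an attribute, N*p with estimated variance N^2 (1 - n/N) p(1 - p)/(n - 1), applied once to the pooled sample and once stratum by stratum. The function names are illustrative, and the ratio printed at the end is a ratio of estimated variances, which is only one possible reading of "relative efficiency".

```python
# (hotels in region N_h, hotels in sample n_h, sample hotels fully equipped a_h)
strata = [
    (400, 80, 2),
    (1500, 200, 50),
    (600, 100, 45),
    (300, 50, 35),
    (200, 20, 18),
]


def total_and_variance(N, n, a):
    """Estimated total N*p and its estimated variance for one simple random sample."""
    p = a / n
    total = N * p
    variance = N**2 * (1.0 - n / N) * p * (1.0 - p) / (n - 1)
    return total, variance


def srs_estimate(strata):
    """Treat the pooled sample as a single simple random sample."""
    N = sum(s[0] for s in strata)
    n = sum(s[1] for s in strata)
    a = sum(s[2] for s in strata)
    return total_and_variance(N, n, a)


def stratified_estimate(strata):
    """Sum the stratum totals and the stratum variances."""
    parts = [total_and_variance(N_h, n_h, a_h) for N_h, n_h, a_h in strata]
    return sum(t for t, _ in parts), sum(v for _, v in parts)


srs_total, srs_var = srs_estimate(strata)
str_total, str_var = stratified_estimate(strata)
print(srs_total, srs_var)
print(str_total, str_var)
print("ratio of estimated variances (SRS / stratified):", srs_var / str_var)
```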
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART III
(9-12 a.m., Friday, July 10, 1970)
•
INSTRUCTIONS
a)
This is a closed-book examination.
b)
The time limit is three hours.
c)
Answer any three of the six questions which follow.
d)
Put the answers to different questions on separate sets of papers.
e)
Return the examination with a signed statement of the honor pledge.
1.
Briefly specify
a.
Some ways in which vital statistics might be used in development
and evaluation of programs for detection and care of chronic
illness, such as cancer and heart disease -- including specific
reference to some analytic techniques that might be employed.
b.
Several ways in which other types of health statistics, e.g. on
morbidity and on health services provided, might be used in
development and evaluation of the chronic disease programs.
c.
Some special needs for improved analytic techniques that might be
used in applying vital or other health statistics in development
and evaluation of chronic disease programs.
2.
The following data were obtained from a sample of motorcycle
operators on campus in order to investigate the dependence of average
daily mileage on the operator's class status and marital status.
                                     Mileage
                          Very Low    Low    Medium    High    Sample Size
Undergraduate Single         39        39      42       49         169
Graduate Single               3         4       9        6          22
Undergraduate Married         3         3       0        3           9
Graduate Married             15        13       9        2          39
Discuss the statistical analysis of these data in terms of a model
which accounts for the relative importance of the marital status
effects, class status effects, and marital status x class status
interaction on mileage.
3.
In this problem, $N_p(\boldsymbol{\mu}, V)$ refers to the p-variate normal distribution
with mean vector $\boldsymbol{\mu}$ (p x 1) and variance-covariance matrix V.
(a)
What is the joint frequency function (probability density function)
of $N_p(\boldsymbol{\mu}, V)$?
(b)
What is the moment generating function of $N_p(\boldsymbol{\mu}, V)$?
(c)
If A is an m x p matrix ($m \leq p$) of rank m, and $\mathbf{Y}$ has the $N_p(\boldsymbol{\mu}, V)$
distribution, what is the distribution of $\mathbf{Z} = A\mathbf{Y}$?
Prove your
answer.
(d)
Prove:
If $\mathbf{Y}$ has the $N_p(\boldsymbol{\mu}, V)$ distribution then the components $Y_i$
are jointly independent if and only if $\mathrm{cov}(Y_i, Y_j) = 0$ for all
$i \neq j$, i.e., iff V is diagonal.
4.
Consider n randomized blocks of k ($\geq 2$) plots each, where k different
treatments are applied (once each).
The response of the plot in
the i-th block receiving the j-th treatment is $\mathbf{X}_{ij} = (X_{ij}^{(1)}, X_{ij}^{(2)})$,
where
•
and E is positive definite.
(a)
Show how to test the null hypothesis
H~l)
(b)
!l'" ••• '" !k vs !j
+ !j'
for at least one (j.j').
(2)
as the concomitant variate and assuming that
Treating X
(2)
= 0, V , show how to carry out the
.. 0, Vi' and T(2)
j
j
analysis of covariance test for the null hypothesis
Il i
In the second problem, what is a simultaneous test for the paired
differences $\tau_j - \tau_{j'}$, $1 \leq j < j' \leq k$?
5.
A "gross nuptiality table" is a "life table" in which a population
of single individuals suffers decrementation through marriage in
the absence of mortality.
a)
Describe such a table including the assumptions involved in
constructing it.
b)
Derive an expression for the average age at marriage in
the stationary life table population.
c)
Show how mortality can be incorporated as an additional source
of decrementation to arrive at what is called a "net nuptiality
table".
6.
The following data on the incidence of pneumococcal pneumonia have been
collected in Chapel Hill during the past year.
                          Boys                            Girls
Age at                            Attack                          Attack
last birthday    Cases    Pop.    rate           Cases    Pop.    rate
(years)                           (per 1000)                      (per 1000)
Under 1           15      200      75.0           13      195      66.7
1                 25      225     111.1           26      305      85.2
2                 17      210      81.0           16      215      74.4
3                 15      202      74.3           12      205      58.5
4                  9      195      46.2            7      180      38.9
Suppose you are interested in learning whether there is a sex difference in susceptibility or exposure to the disease among children under
five years of age.
a.
List at least six different tests that might be performed to
test this hypothesis.
b.
Describe in general terms the power of each test or reasons why
one would want to use it, or not use it, in the present instance.
c.
Perform at least two of these tests, and indicate what inferences
you can draw about a sex differential from each test.
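As one illustration for part c, the sketch below recomputes the attack rates from the case and population counts and then carries out a crude (age-pooled) two-sample comparison of proportions. This pooled z-test is only one of many possible choices and ignores the age stratification. The counts are read from the table above, and the helper names are assumptions made for the example.

```python
from math import sqrt
from statistics import NormalDist

# age at last birthday -> (cases, population)
boys = {"Under 1": (15, 200), "1": (25, 225), "2": (17, 210), "3": (15, 202), "4": (9, 195)}
girls = {"Under 1": (13, 195), "1": (26, 305), "2": (16, 215), "3": (12, 205), "4": (7, 180)}


def attack_rates(table):
    """Attack rate per 1000 for each age group, rounded to one decimal."""
    return {age: round(1000.0 * cases / pop, 1) for age, (cases, pop) in table.items()}


def totals(table):
    """Total cases and total population over all age groups."""
    return sum(c for c, _ in table.values()), sum(p for _, p in table.values())


def crude_two_proportion_z(table_a, table_b):
    """z statistic and two-sided p-value comparing the age-pooled attack rates."""
    cases_a, pop_a = totals(table_a)
    cases_b, pop_b = totals(table_b)
    pooled = (cases_a + cases_b) / (pop_a + pop_b)
    se = sqrt(pooled * (1.0 - pooled) * (1.0 / pop_a + 1.0 / pop_b))
    z = (cases_a / pop_a - cases_b / pop_b) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


boy_rates = attack_rates(boys)
girl_rates = attack_rates(girls)
z, p_value = crude_two_proportion_z(boys, girls)
print(boy_rates)
print(girl_rates)
print(z, p_value)
```

A stratified procedure, or a sign test on the five age-specific differences, would use the age structure that this crude comparison discards.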
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9-12 A.M., Saturday, February 20, 1971)
1.
Consider a finite population of size N with population mean $\bar{X}$ and population
variance $S^2$.
A simple random sample of size 3 is drawn with replacement
from this population.
Let $\bar{x}'$ denote the unweighted mean over the distinct
units in the sample.
a)
Show that the probabilities that the sample contains
1, 2, or 3 distinct units are $1/N^2$, $3(N-1)/N^2$, and $(N-1)(N-2)/N^2$, respectively.
b)
Show that, for estimating $\bar{X}$, $\bar{x}'$ is conditionally
unbiased (i.e., conditional upon a fixed number of
distinct units).
Show also that $\bar{x}'$ is unconditionally
unbiased for $\bar{X}$.
c)
Find the unconditional variance of the estimator $\bar{x}'$.
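The probabilities in part a) can be checked by brute force for a small population. The sketch below enumerates all N^3 equally likely ordered samples of size 3 drawn with replacement and compares the observed frequencies with 1/N^2, 3(N-1)/N^2 and (N-1)(N-2)/N^2. The function name and the choice N = 5 are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product


def distinct_unit_probabilities(N, draws=3):
    """Exact distribution of the number of distinct units in a with-replacement sample."""
    counts = {}
    for sample in product(range(N), repeat=draws):
        d = len(set(sample))
        counts[d] = counts.get(d, 0) + 1
    total = N**draws
    return {d: Fraction(c, total) for d, c in sorted(counts.items())}


N = 5
observed = distinct_unit_probabilities(N)
expected = {
    1: Fraction(1, N**2),
    2: Fraction(3 * (N - 1), N**2),
    3: Fraction((N - 1) * (N - 2), N**2),
}
print(observed)
print(expected)
```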
2.
A bivariate random variable $\mathbf{x} = (x_1, x_2)$ has a bivariate lognormal distribution if
$\mathbf{y} = (y_1, y_2)$, where $y_1 = \log_e x_1$ and $y_2 = \log_e x_2$, has a
bivariate normal distribution. That is
a)
Give explicitly the distribution of $\mathbf{x} = (x_1, x_2)$.
b)
What is the distribution of $z = x_1/x_2$?
c)
What is E(z)?
3.
In a two-organ system (e.g., kidneys, lungs) assume that the life-times
of the two organs are independent exponentials, each with density
function
$f(x) = \lambda_0 e^{-\lambda_0 x}$,   $\lambda_0 > 0$, $x > 0$.
Assume also that as soon as one organ fails, at time $x_0$, say, the
surviving organ has density function
$g(y) = \lambda_1 e^{-\lambda_1 (y - x_0)}$,   $y > x_0$.
a)
Show that at time t both organs are still functioning
with probability
$P_0(t) = e^{-2\lambda_0 t}$,   $t > 0$.
b)
Show that at time t the system is still functioning
on one organ with probability
$P_1(t) = \dfrac{2\lambda_0}{\lambda_1 - 2\lambda_0}\left(e^{-2\lambda_0 t} - e^{-\lambda_1 t}\right)$,   $t > 0$.
c)
Show also that the density function of system life is
$h(t) = \dfrac{2\lambda_0 \lambda_1}{\lambda_1 - 2\lambda_0}\left(e^{-2\lambda_0 t} - e^{-\lambda_1 t}\right)$,   $t > 0$.
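A quick numerical consistency check of the two-organ model is sketched below. It assumes the closed forms P_0(t) = exp(-2 lambda_0 t), P_1(t) = [2 lambda_0 / (lambda_1 - 2 lambda_0)] (exp(-2 lambda_0 t) - exp(-lambda_1 t)), and a system-life density equal to lambda_1 times P_1(t), with lambda_1 different from 2 lambda_0. The parameter values are arbitrary illustrations, not values from the examination.

```python
from math import exp


def p_both(t, lam0):
    """P(both organs still functioning at time t)."""
    return exp(-2.0 * lam0 * t)


def p_one(t, lam0, lam1):
    """P(system functioning on exactly one organ at time t); needs lam1 != 2*lam0."""
    return 2.0 * lam0 / (lam1 - 2.0 * lam0) * (exp(-2.0 * lam0 * t) - exp(-lam1 * t))


def system_density(t, lam0, lam1):
    """Density of system life: failure rate lam1 of the last organ times p_one."""
    return lam1 * p_one(t, lam0, lam1)


def system_survival(t, lam0, lam1):
    """P(system still functioning at time t)."""
    return p_both(t, lam0) + p_one(t, lam0, lam1)


lam0, lam1, t, eps = 0.5, 2.0, 1.3, 1e-6  # illustrative values only
numeric_density = -(system_survival(t + eps, lam0, lam1)
                    - system_survival(t - eps, lam0, lam1)) / (2.0 * eps)
print(system_density(t, lam0, lam1), numeric_density)
```

The density should agree with minus the derivative of the survival function, which is what the central difference approximates.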
4.
The following table records 292 litters of mice classified according to
the size of the litter and the number of females in the litter.
                      NUMBER OF FEMALES
SIZE OF LITTER      0     1     2     3     4
      1             8    12
      2            23    44    13
      3            10    25    48    13
      4             5    30    34    22     5
a)
With the assumption that for any specified size of litter
the number of females is binomially distributed, show
that there is no significant difference at the 5% level
among litter sizes with respect to the proportion of
females.
b) Test the significance of deviations from the assumption
of the binomial distribution (the numbers have been
chosen so that the arithmetic is easy), and then comment
(no calculations are required) on how your procedure for
this test would require modification if the first test
had shown a variation in the proportion of females with
litter size.
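Part a) can be sketched as a chi-square test of homogeneity of the proportion of females across the four litter sizes, treating each mouse as a Bernoulli trial under the binomial assumption. The counts below are read from the table, the 5% critical value 7.815 for 3 degrees of freedom is the standard tabulated figure, and the function names are illustrative.

```python
# litter size -> number of litters with 0, 1, 2, ... females
litters = {
    1: [8, 12],
    2: [23, 44, 13],
    3: [10, 25, 48, 13],
    4: [5, 30, 34, 22, 5],
}


def female_and_mouse_totals(litters):
    """For each litter size: (total females, total mice)."""
    out = {}
    for size, counts in litters.items():
        females = sum(k * c for k, c in enumerate(counts))
        mice = size * sum(counts)
        out[size] = (females, mice)
    return out


def homogeneity_chi_square(litters):
    """Pooled proportion of females, chi-square statistic, degrees of freedom."""
    totals = female_and_mouse_totals(litters)
    p = sum(f for f, _ in totals.values()) / sum(m for _, m in totals.values())
    chi2 = sum((f - m * p) ** 2 / (m * p * (1.0 - p)) for f, m in totals.values())
    return p, chi2, len(totals) - 1


CRITICAL_5_PERCENT_3_DF = 7.815  # tabulated chi-square value

p_hat, chi2, df = homogeneity_chi_square(litters)
print(p_hat, chi2, df, chi2 < CRITICAL_5_PERCENT_3_DF)
```

With these counts the pooled proportion of females is exactly one half, which is presumably what the remark about easy arithmetic in part b) refers to.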
5.
a)
Define the following terms:
i) Unbiased estimator
ii) Consistent estimator
iii) Efficient estimator
iv) Sufficient estimator
b)
Give an example of an estimator for a parameter of a
normal distribution which is unbiased but not
consistent and prove your result.
c)
Give an example of an estimator for a parameter of a
normal distribution which is consistent but not
unbiased and prove your result.
d)
Show that there is no sufficient estimator for the
location parameter $\theta$ of the Cauchy distribution
$f(x) = \dfrac{1}{\pi[1 + (x - \theta)^2]}$.
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9-12 A.M., Friday, June 25, 1971)
1.
It is desired to select a sample from a certain population in order to estimate the mean of a particular variate, within ±2% of its true value with
probability at least 95%.
It is known from previous experience with this
population that this variate has approximately the following distribution:
Variate Value          Approximate
Class Interval         Percentage
 0 < X < 10               10%
10 < X < 20               20%
20 < X < 30               40%
30 < X < 40               15%
40 < X < 50                5%
50 < X < 60               10%
Using the above information, what sample size will be necessary in order
to satisfy the objectives of the survey (at minimum cost where total cost
is presumed to be a linear function of the number of observations)?
2.
A 2 x 5 factorial experiment is to be replicated four times using 40
individuals chosen from a population of 10,000 individuals.
Use this
situation to discuss
a)
a completely randomized design
b)
a randomized block design (What might be used for blocking
factors?
What is randomized?
Describe the differences in
the sampling procedures from a completely randomized design.)
c)
a components of variance vs. a fixed effects model.
Give the mathematical model in each case with the appropriate formulas
for the partitioned sum of squares and give the corresponding degrees of
freedom.
3.
Let $X_1$, $X_2$, $X_3$ be independent Poisson variables with parameters $\lambda_1$, $\lambda_2$, $\lambda_3$,
respectively. (a) Obtain the UMP Test for
A=A*>A
(b)
4.
•
o
Suggest a test for
Suppose $X_1, X_2, \ldots, X_n$ (n > 2) are mutually independent and normally distributed
random variables with zero mean and unit variance.
Let
and
Show that the random variables U and V are mutually independent and find
the distribution of U.
5.
a)
Find the distribution function of the largest observation in a random
sample of size n from a distribution with c.d.f. F(x).
b)
Given a sample of size n from the uniform distribution on $[0, \theta]$,
obtain a confidence interval for $\theta$ of confidence coefficient $1 - \alpha$
based on the maximum likelihood estimator of $\theta$.
RE-EXAMINATION FOR . . . . .
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9 A.M. - 5 P.M., Friday, June 25, 1971)
Instructions
a)
This is a closed-book examination.
b)
The time limit is eight hours.
c)
Answer four of the five questions on Part I of the Basic Written Examination
and the two attached questions.
d)
Put the answers to different questions on separate sets of papers.
e)
Put your code letter, not your name, on each question.
f)
Return the examination with a signed statement of the honor pledge on a
page separate from your answers.
Special Question for • • • • • • (1)
Consider a model of the form
E(y) •
(a)
State an appropriate error model and derive the least squares equations
for estimating the parameters $\alpha_1$, $\beta_1$, $\alpha_2$, $\beta_2$.
(b)
What problems arise in the least squares estimation of the parameters
$\alpha_1$, $\alpha_2$, $\beta_1$, $\beta_2$?
(c)
Suggest an approximate method of solving these equations.
(d)
Suppose it is known that $\beta_1 = \beta_2 = \beta$.
Consider an alternative error model and
find an explicit formula for an estimator of $\beta$.
You may assume $\alpha_1 + \alpha_2 = \alpha$
is known in this case.
Special Question for . . . . . . (II)
We have a random sample of size n from a distribution whose probability
density function is assumed to be
$f_\theta(x)$
= 0 otherwise, where the parameter $\theta$ ($-1 \leq \theta \leq 1$)
is unknown.
a)
Find that unbiased estimator, $T_0$, of $\theta$ which is a linear function of
the sample mean $\bar{X}$.
b)
Show that if the true value of $\theta$ is 0, then no unbiased estimator
has a smaller variance than $T_0$.
c)
Show that $T_0$ minimizes the maximum (with respect to $\theta$) of the variance
of any unbiased estimator of $\theta$.
d)
Do you notice an undesirable property of $T_0$?
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9-1 A.M., Saturday, January 29, 1972)
1.
Given a 2 x 2 (two way) analysis of variance design,
(10 points)
a)
Reformulate as a one way design by showing the relationships
between the parameters of the two models.
(8 points)
b)
Show how the main effects and interactions may be tested by
way of orthogonal contrasts applied to the one way model.
(7 points)
c)
Compare the Scheffé and Tukey methods of multiple comparisons
discussing the relative advantages and disadvantages of each.
2.
The concept of a hazard function is used in reliability.
The hazard function
h(t) is defined such that h(t)dt is the probability that the failure of a
system will occur in the time interval t to (t+dt) given that no failure
occurred prior to time t.
(8 points)
a)
If f(t) and F(t) are the density function and cumulative
distribution function of the random variable T, the time to
failure, show that
$h(t) = f(t)/[1 - F(t)]$.
(10 points)
b)
Find f(t) if $h(t) = \alpha\beta t^{\alpha - 1}$ ($\alpha > 0$, $\beta > 0$, $t > 0$).
(5 points)
c)
Find the mean and variance of f(t) found in (b).
(2 points)
d)
Show that the exponential distribution is a special case
of this distribution.
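The relation between a hazard function and its density can be checked numerically. The sketch below assumes a hazard of the power form a*b*t^(a-1) and uses the general identity f(t) = h(t) exp(-integral of h from 0 to t), which gives a Weibull-type density. The names `a` and `b` stand for the two positive parameters and the test values are arbitrary.

```python
from math import exp


def hazard(t, a, b):
    """Power-law hazard h(t) = a * b * t**(a - 1)."""
    return a * b * t ** (a - 1)


def cdf(t, a, b):
    """F(t) = 1 - exp(-integral_0^t h(u) du) = 1 - exp(-b * t**a)."""
    return 1.0 - exp(-b * t**a)


def density(t, a, b):
    """f(t) = h(t) * (1 - F(t))."""
    return hazard(t, a, b) * exp(-b * t**a)


a, b, t = 1.7, 0.4, 2.5  # arbitrary illustrative values
ratio = density(t, a, b) / (1.0 - cdf(t, a, b))
print(ratio, hazard(t, a, b))

# With a = 1 the hazard is the constant b and the density is exponential.
print(density(t, 1.0, b), b * exp(-b * t))
```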
3.
Let $x_i$ ($i = 1, 2, \ldots, n+1$) be a random sample of size n+1 from a normal $N(\mu, \sigma^2)$
population.
a)
Prove that
$\left( \dfrac{\sum (x_i - \bar{x})^2}{b},\ \dfrac{\sum (x_i - \bar{x})^2}{a} \right)$
is a confidence interval for $\sigma^2$ which has confidence
coefficient $1 - \alpha$ provided
$\int_a^b f_n(y)\,dy = 1 - \alpha$   ... (i)
where
$f_n(y) = \dfrac{y^{\frac{1}{2}n - 1} e^{-\frac{1}{2}y}}{2^{\frac{1}{2}n}\,\Gamma(\frac{1}{2}n)}$,   $y > 0$.
b)
If the interval is to be the shortest, show that a and b must satisfy
$f_{n+4}(a) = f_{n+4}(b)$.
c)
Show that the likelihood-ratio test, at significance level $\alpha$,
of $H_0$: $\sigma^2 = \sigma_0^2$ against H: $\sigma^2 \neq \sigma_0^2$ is to accept $H_0$ if
$\sum (x_i - \bar{x})^2 / \sigma_0^2$
lies in the interval (a, b), where a, b now satisfy (i) and
$f_{n+3}(a) = f_{n+3}(b)$.
[You may assume that $Y = \sum (X_i - \bar{X})^2 / \sigma^2$ has density function $f_n(y)$.]
$i = 1, 2$.
Denote them by $X_1, \ldots, X_{n_1}, X_{n_1+1}, \ldots, X_{n_1+n_2}$, where the first $n_1$
come from population 1 and the remainder from population 2. For each $X_i$,
an indicator variable $Z_i$ is associated, where $Z_i = 1$ if $X_i$ is in population
1 and $Z_i = 0$ if $X_i$ is in population 2.
Let r be the sample correlation between
the X's and the Z's.
Find the relation between r and the usual unpaired two
sample t-test.
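The relation asked for can be explored numerically. The sketch below computes the pooled-variance two-sample t statistic and the correlation r between the observations and the 0/1 population indicator, and compares t with r*sqrt(N-2)/sqrt(1-r^2), where N = n_1 + n_2. The two small data sets are made up for illustration and the function names are assumptions.

```python
from math import sqrt
from statistics import mean


def pooled_t(x1, x2):
    """Usual unpaired two-sample t statistic with pooled variance."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = mean(x1), mean(x2)
    ss = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    pooled_var = ss / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))


def indicator_correlation(x1, x2):
    """Sample correlation between X and Z, with Z = 1 for population 1, else 0."""
    xs = list(x1) + list(x2)
    zs = [1.0] * len(x1) + [0.0] * len(x2)
    mx, mz = mean(xs), mean(zs)
    sxz = sum((x - mx) * (z - mz) for x, z in zip(xs, zs))
    sxx = sum((x - mx) ** 2 for x in xs)
    szz = sum((z - mz) ** 2 for z in zs)
    return sxz / sqrt(sxx * szz)


# Made-up illustrative samples
sample_1 = [5.1, 6.3, 4.8, 7.0]
sample_2 = [3.9, 4.4, 5.0, 3.2, 4.1]

t_stat = pooled_t(sample_1, sample_2)
r = indicator_correlation(sample_1, sample_2)
n_total = len(sample_1) + len(sample_2)
t_from_r = r * sqrt(n_total - 2) / sqrt(1.0 - r * r)
print(t_stat, t_from_r)
```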
5. Discuss the relative merits of using personal interviews, telephone interviews,
and mailed questionnaires as methods of data collection for each of
the following situations:
a)
A television executive wants to estimate the proportion of viewers in
the country who are watching his network at a certain hour.
b)
A newspaper editor wants to survey the attitudes of the public toward
the type of news coverage offered by his paper.
c)
A city commissioner is interested in determining how homeowners feel
about a proposed zoning change.
d)
A county health department wants to estimate the proportion of dogs
that have had rabies shots within the last year.
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9-1 A.M., Saturday, January 27, 1973)
1.
In the analysis of family records on a certain inherited disease, the
distribution of affected children is often given in the form
Pr{X=r} •
(~)
(1)
for $r = 1, 2, \ldots, s$, where
s = the number of children in a family (family size);
$\theta$ = segregation probability;
$\pi$ = ascertainment probability.
(a)
Show that (1) is a probability mass function.
(b)
Evaluate the expected number of affected children in a
family of size s, using distribution defined in (1).
(c)
For small z, $(1 - z)^k \approx (1 - kz)$. Using this fact, derive
an approximation to (1) for small $\pi$. Show that the
approximation is a probability mass function, and find
its expectation.
2.
Let X be a random variable with a gamma distribution, having density
$f(x) = \dfrac{x^{\alpha - 1} e^{-x/\beta}}{\Gamma(\alpha)\,\beta^{\alpha}}$,   $0 < x < \infty$;   $\alpha, \beta > 0$.
Let $x_1, \ldots, x_n$ be a random sample from f.
(a)
If $\beta$ is known, find a sufficient statistic for $\alpha$.
(b)
If $\beta$ is known, what is the best linear unbiased (BLUE)
estimator for $\alpha$?
What is the variance of this estimator?
(c)
If $\alpha$ is known, find a sufficient statistic for $\beta$.
(d)
If $\alpha$ is known, $\beta$ unknown, find the Cramer-Rao lower
bound on the variance of an unbiased estimator of $\beta$.
Find an estimator which achieves this lower bound.
You may assume that a family of gamma laws with fixed
$\alpha$ and unknown $\beta$ is complete.
(e)
Find the estimator for the variance of $\bar{x}$ ($\alpha$ known,
$\beta$ unknown) which is optimal in the sense of part d).
3.
Given a random sample $x_1, \ldots, x_n$ from a normal distribution $N(\mu, \sigma^2)$:
(a)
Construct the likelihood ratio test criterion for the
hypothesis
Ho :
against alternatives
It is not necessary to reduce the test criterion to
lowest terms.
•
(b)
Transform the test criterion to one with known asymptotic
distribution. and show exactly how you construct a
critical region for a test of level a.
(c)
Find the asymptotic variance of the maximum likelihood
estimator of $\mu$ under $H_0$.
4.
The Duchy of Grand Fenwick experiences a period of unprecedented economic
growth from year 1 to year $n_1$ resulting from a massive American aid program.
The end of such aid following year $n_1$, combined with a new policy of open
immigration extended to citizens of underdeveloped countries, initiates a
decreasing trend in per capita income from years $n_1$ to n, in stark contrast
to the increasing trend experienced during the period of American aid. It
is suggested that the change in per capita income, years 1 to n, may be
appropriately described by two straight lines, necessarily intersecting at
year $n_1$.
(a)
Let $y_i$ = per capita income in year i,
$\mathbf{y}$ = vector of $y_i$'s, $i = 1, \ldots, n$.
Define a parameter vector $\boldsymbol{\beta}$ and an appropriate X matrix
for the model $E\mathbf{y} = X\boldsymbol{\beta}$ described above.
(b)
Give algebraic expressions for the elements of (X'X).
(c)
Give a matrix expression for $\hat{\boldsymbol{\beta}}$, the least-squares estimates
of $\boldsymbol{\beta}$ (in terms of X and $\mathbf{y}$).
(d)
(4 points)
Find $\hat{\boldsymbol{\beta}}$ for the following data, where $n_1 = 3$:
Year                   1     2     3     4     5
Per capita income     10    25    30    25    10
(e)
Suggest two things to look at in evaluating the fit of
such a model.
(f)
Discuss briefly the effect of sampling considerations (how
per capita income was measured) and possible distributional
assumptions on the interpretation and use of this model.
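One possible parameterization for the two intersecting lines of parts (a) through (d) is sketched below: an intercept equal to the expected income at the join year, a slope before the join and a slope after it. This is only one of several equivalent choices of parameter vector and X matrix, and not necessarily the one the examiners had in mind. The data (incomes 10, 25, 30, 25, 10 in years 1 to 5, join at year 3) are taken as read from the question, and the small exact solver is written out only to keep the example self-contained.

```python
from fractions import Fraction


def design_row(year, join_year):
    """Row of X: intercept at the join, slope before the join, slope after it."""
    return [Fraction(1),
            Fraction(min(year - join_year, 0)),
            Fraction(max(year - join_year, 0))]


def solve_linear_system(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a square, nonsingular)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [row[-1] for row in m]


def fit_two_lines(years, incomes, join_year):
    """Least-squares estimates from the normal equations (X'X) beta = X'y."""
    x = [design_row(t, join_year) for t in years]
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, incomes)) for i in range(p)]
    return solve_linear_system(xtx, xty)


beta_hat = fit_two_lines([1, 2, 3, 4, 5], [10, 25, 30, 25, 10], join_year=3)
print(beta_hat)
```

With this parameterization the fitted value at the join year is the first component of `beta_hat`, and the symmetry of the data shows up as two slopes of equal size and opposite sign.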
5.
(a)
Show that for a non-negative random variable X for which $E|X|^k$
exists for some k > 0,
$EX^k = k \int_0^\infty x^{k-1} [1 - F(x)]\,dx$,   where $F(x) = P\{X \leq x\}$.
Hence, or otherwise, show that
$EX = \int_0^\infty [1 - F(x)]\,dx$.
(b)
Let X be a non-negative integer valued random variable with
$p_i = P\{X = i\}$, $i = 0, 1, 2, \ldots, \infty$. Also, let
$q_i = P\{X > i\} = p_{i+1} + p_{i+2} + \cdots$, $i \geq 0$;
$p^{[q]} = p(p-1) \cdots (p-q+1)$.
Let then
$\mu_{[k]} = \sum_{i=0}^{\infty} i^{[k]} p_i$,   k an integer ($\geq 1$).
Show that
$\mu_{[k]} = k \sum_{i=0}^{\infty} i^{[k-1]} q_i$.
(c)
Let X have the Poisson distribution, viz.,
$P\{X = i\} = e^{-m} m^i / i!$,   $i = 0, 1, \ldots, \infty$;   $m > 0$,
and let
$\mu_r = E(X - m)^r$, for $r = 0, 1, 2, \ldots$
Show that
Hence, or otherwise, evaluate the first four central moments
of X.
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9-12, Friday, May 25, 1973)
1.
Simple random samples drawn from each of three strata give data
as follows:
Stratum I
10
12
13
17
8
9
20
Stratum II
Stratum III
21
23
25
23
20
26
30
31
28
40
38
37
37
33
11
13
12
a)
Estimate the population mean, and the variance of your
estimate, under the assumption that the sample shown was
drawn under proportional allocation.
b)
Estimate the population mean and the variance of your
estimate, if the true stratum weights are equal.
c)
Assume the combined sample above was drawn by simple random
sampling without stratification.
Estimate the population
mean and its variance.
d)
Now assume, under the conditions of (a), that data is
available on a concomitant variable x, such that for each
of the y values observed above, the corresponding x value
is (y-5).
Suppose also that the actual totals for the x
variable in the population are known to be 8, 12, and
20 million in stratum I, II, III, respectively.
Calculate
the combined and separate ratio estimates of the population
total Y.
Which would you prefer in this case, and why?
2.
Let the probability $p_n$ that a family has exactly n children be
$\alpha p^n$ when $n \geq 1$, and $p_0 = 1 - \alpha p(1 + p + p^2 + \cdots)$.
Suppose that the probability
of giving birth to a boy is $\frac{1}{2}$
and that all births are independent events.
a)
Show that for $k \geq 1$ the probability that a family has exactly
k boys is $2\alpha p^k/(2 - p)^{k+1}$.
b)
Given that a family includes at least one boy, what is
the probability that there are two or more?
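The closed form in part a) can be compared with a direct summation over family sizes. The sketch below assumes the family-size law alpha*p^n for n >= 1 and a probability of 1/2 for a boy at each birth. The particular values alpha = 0.4 and p = 0.6 are arbitrary (chosen so that p_0 is non-negative), and the function names are illustrative.

```python
from math import comb


def prob_k_boys_series(k, alpha, p, terms=400):
    """P(exactly k boys), k >= 1, by summing over family sizes n >= k."""
    return sum(alpha * p**n * comb(n, k) * 0.5**n for n in range(max(k, 1), terms))


def prob_k_boys_closed(k, alpha, p):
    """Closed form 2 * alpha * p**k / (2 - p)**(k + 1), valid for k >= 1."""
    return 2.0 * alpha * p**k / (2.0 - p) ** (k + 1)


def prob_two_or_more_given_one(alpha, p, kmax=200):
    """P(at least two boys | at least one boy), summing the closed form."""
    at_least_one = sum(prob_k_boys_closed(k, alpha, p) for k in range(1, kmax))
    at_least_two = sum(prob_k_boys_closed(k, alpha, p) for k in range(2, kmax))
    return at_least_two / at_least_one


alpha, p = 0.4, 0.6  # arbitrary illustrative values
for k in (1, 2, 3):
    print(k, prob_k_boys_series(k, alpha, p), prob_k_boys_closed(k, alpha, p))
print(prob_two_or_more_given_one(alpha, p))
```

Because the closed form is geometric in k, the conditional probability in part b) does not depend on alpha.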
NOTE:
Problem #3 is on the next page.
4.
Let $X_1, \ldots, X_n$ be n independent random variables distributed
according to the $(0, \theta)$, $\theta > 0$, rectangular distribution, and let
$U_n$ =
Show that
a)
$U_n$ and $V_n$ are stochastically independent,
b)
$U_n$ is the maximum likelihood estimator of $\theta$ and
is also a sufficient statistic for $\theta$.
c)
Is $U_n$ an unbiased estimator of $\theta$?
d)
Obtain a 95% confidence interval for $\theta$ based on $U_n$.
3.
•
Suppose that a random variable Y is related to a non-stochastic
factor X over the interval -l<X<l by the relationship
Let N pairs of observations $(X_i, Y_i)$, $i = 1, 2, \ldots, N$, be available for
which $\mathrm{Var}(Y_i) = \sigma^2$, $\mathrm{Cov}(Y_i, Y_{i'}) = 0$ if $i \neq i'$; further, let $\mu_k = \frac{1}{N} \sum_{i=1}^{N} X_i^k$,
$k = 1, 2, 3$, and assume that $\mu_1 = \mu_3 = 0$.
It is desired to approximate
~
over
[-l,l) using
b)
N
N
1 N
where b = - 1:
o N i=1
1:
XiY.I 1:
1. i=l
i=l
Show that, in general,
1
J=
J
-1
can be written as J=V+B where·
••
1
V=
J
Var(Y)dX
-1
and
a=J
1
A
[E(Y)-~ }2dX.
-1
c)
For the particular situation described earlier, show
that V is minimized when $\mu_2 = 1$, that B is minimized when
$\mu_2 = 1/3$, and that the value of $\mu_2$ which minimizes J is the
solution of a cubic equation.
NOTE:
Problem #4 is on the preceding page.
5.
Assume that an individual is subject to two types of failure and the
two failure times are represented by random variables X and Y.
The
joint distribution of X and Y can be described in terms of the
probability function
$P[X > s \text{ and } Y > t] = e^{-\lambda_1 s - \lambda_2 t - \lambda_{12} \max(s,t)}$   for $s > 0$, $t > 0$
$= 1$   for $s \leq 0$, $t \leq 0$
where
max(s,t) = s if $s \geq t$
         = t if $s < t$.
a)
Find the marginal distribution functions of X and Y.
b)
Find $P[X \leq x \text{ and } Y \leq y]$.
c)
Prove $P[X > x+u \text{ and } Y > y+u \mid X > x \text{ and } Y > y] = P[X > u \text{ and } Y > u]$.
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9 a.m.-1 p.m., Saturday, January 26, 1974)
1.
Consider a population composed of the following
a.
What is the variance of the mean for a random sample of size
n=6 selected without replacement?
b.
What is the variance of the mean for the systematic sample
which selects every s-th sampling unit from a random start?
c.
If the population were divided into 6 strata as follows:
what is the variance of the estimate of the population mean
for a stratified random sample of one observation per stratum?
d.
What is the efficiency of the three sampling schemes relative
to simple random sampling?
2.
The probability distribution of a random variable Y is given by
$P[Y = y] = p\,q^{y-1}$,   $y = 1, 2, \ldots$,
and $q = 1 - p$.
a.
Show that the moment generating function of Y is given by
$M(s) = (\lambda e^{-s} - \lambda + 1)^{-1}$, where $\lambda = \frac{1}{p}$.
Let $Y_1$ and $Y_2$ be two independent observations from the above
distribution, and let $Z_1 = \max(Y_1, Y_2)$ and
$Z_2 = \min(Y_1, Y_2)$.
b.
Find the probability distribution of $Z_1$.
c.
Find the conditional distribution of $Z_1$ given $Z_2$.
3.
Let
$i = 1, 2, \ldots, n$,
where $\varepsilon_i \sim N(0, \sigma_i^2)$, $t_i$ is a known positive constant for every i, and $\alpha$ and
$\beta$ are parameters to be estimated. Obtain the maximum likelihood
estimate of $\beta$ when
a.
$\alpha$ is unknown and $\sigma_i^2 = \sigma^2$, $i = 1, 2, \ldots, n$.
b.
$\alpha = 0$ and $\sigma_i^2 = \sigma^2$, $i = 1, 2, \ldots, n$.
c.
$\alpha = 0$ and $\sigma_i^2 = t_i \sigma^2$, $i = 1, 2, \ldots, n$.
d.
$\alpha = 0$ and $\sigma_i^2 = t_i^2 \sigma^2$, $i = 1, 2, \ldots, n$.
4.
During a holiday season, a merchant will make a net profit of $\alpha$ dollars for
each unit amount of some commodity sold and will lose $\beta$ dollars for each
unit amount left unsold at the end of the season.
Suppose that the merchant
can purchase this commodity only before the season starts, and that the
total amount his customers will want to buy during the season can be viewed
as a continuous random variable with a given probability density function
f(x) for x > 0.
a.
If k is the number of units which the merchant should purchase
in order to maximize the expected value of his profit G on the
commodity, show that k is given by
$F(k) = \dfrac{\alpha}{\alpha + \beta}$.
b.
Compute k when $\alpha = 1$, $\beta = 2$, and
$f(x) = .0002\,x\,e^{-.0001 x^2}$,   $x > 0$.
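Part b can be sketched numerically. The code assumes the optimal purchase k satisfies F(k) = profit / (profit + loss), that the per-unit profit is 1 and the per-unit loss is 2, and that the demand density .0002 x exp(-.0001 x^2) has distribution function F(x) = 1 - exp(-.0001 x^2). The function and argument names are illustrative.

```python
from math import exp, log, sqrt


def demand_cdf(x, c=0.0001):
    """F(x) = 1 - exp(-c x^2), the cdf of the density 2 c x exp(-c x^2), x > 0."""
    return 1.0 - exp(-c * x * x)


def optimal_order(profit_per_unit, loss_per_unit, c=0.0001):
    """Solve F(k) = profit / (profit + loss) for k in closed form."""
    target = profit_per_unit / (profit_per_unit + loss_per_unit)
    return sqrt(-log(1.0 - target) / c)


k = optimal_order(profit_per_unit=1.0, loss_per_unit=2.0)
print(k, demand_cdf(k))
```

The critical ratio comes from setting the derivative of the expected profit with respect to k, profit*(1 - F(k)) - loss*F(k), equal to zero.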
5.
Clinical studies in which several clinics participate in evaluation of
therapies using a standard protocol have become relatively common.
Assume an experimental
design in which patients who meet protocol requirements
are randomly assigned to treatments within clinics, and that the clinics
that participate in the study are a random sample from the population of
clinics that might use the therapies.
An appropriate linear model might
be
$Y_{ijk} = \mu + \alpha_i + B_j + (\alpha B)_{ij} + \varepsilon_{ijk}$,
where $E(Y_{ijk}) = \mu + \alpha_i$; $\mu$ is the general mean, $\alpha_i$ is the differential effect
of the i-th treatment, $B_j$ is a random clinic effect, $(\alpha B)_{ij}$ is a random
effect due to interaction between the j-th clinic and the i-th treatment,
$\varepsilon_{ijk}$ is a random effect due to the k-th patient in the i-th treatment and
j-th clinic; $B_j$, $(\alpha B)_{ij}$, and $\varepsilon_{ijk}$ are independently distributed, each with
mean 0 and respective variances $\sigma_B^2$, $\sigma_{\alpha B}^2$, $\sigma_\varepsilon^2$,
for $i = 1, \ldots, t$; $j = 1, \ldots, c$; $k = 1, 2, \ldots, n_{ij}$.
a.
What is the variance of $Y_{ijk}$?
b.
What is the covariance between two observations in the same
clinic and treatment?
c.
What is the covariance between two observations in the same
clinic, but different treatments?
d.
What is the covariance matrix for a vector of treatment
means from a specified clinic?
e.
Find the covariance
and \ c
L.
j
y
2j ij'
where
and i is fixed.
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9 a.m.-1 p.m., Saturday, May 18, 1974)
1.
Suppose that X and Y are random variables with joint density
a)
Obtain the moment generating function of the joint
distribution of X and Y.
b)
Show that the correlation between $X^2$ and $Y^2$ is given
c)
Find the distribution of
d)
Show that U = X+Y and V = X-Y are independently
distributed.
2.
Let $Y_{(1)} < \cdots < Y_{(n)}$ be the ordered random variables of a sample of
size n from the rectangular $(0, \theta)$ distribution, where $0 < \theta < \infty$.
By a careless mistake, the observations $Y_{(k+1)}, \ldots, Y_{(n)}$ were recorded
incorrectly, so they were discarded.
(Here, k is a positive integer
less than n.)
a)
Show that the conditional distribution of $Y_{(1)}, \ldots, Y_{(k-1)}$,
given $Y_{(k)} = y$, is independent of $\theta$.
b)
Hence, or otherwise, obtain the maximum likelihood
estimator of $\theta$ and show that it is a function of $Y_{(k)}$
alone.
c)
If $k/n \to p$ as $n \to \infty$, for some $0 < p < 1$, what can you
say about the asymptotic distribution of the maximum
likelihood estimator of $\theta$?
3.
A bus line has length L.
The probability density for the point X at
which a passenger gets on the bus is proportional to $X(L-X)^2$, and the
probability density for the point Y at which a passenger who entered
at point X gets off the bus is proportional to (Y-X).
Find the
probability that
a)
A passenger will get on the bus before point z.
b)
A passenger who got on the bus at point x will
get off after point z.
c)
A passenger who gets off at point y got on before
point z.
•
57
•
4.
Suppose that N_1 represents the number of traffic accidents occurring at a certain highway location during the year prior to the installation of a certain improvement at that location, and that N_2 represents the number occurring during the year following the improvement. Suppose, moreover, that N_1 and N_2 are assumed to be independent Poisson random variables with means μ_1 and μ_2, respectively.
a)
What is the probability that n_1 + n_2 accidents occur during the two-year period?
b)
What is the conditional probability that n_2 accidents occur during the second year given that n_1 + n_2 occur during the two-year period?
c)
Show that the conditional probability of part (b) is the binomial probability B(n_1; n_1 + n_2, θ), where θ = 1/(1 + ρ) and ρ = μ_2/μ_1.
d)
Suggest a test of H_0: ρ = 1 versus H_1: ρ < 1 with approximate significance level α, and comment on any properties of your suggested test which you think are particularly good or bad.
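The binomial identity asked for in part (c) of the accident question can be checked numerically. The sketch below is illustrative only: the means mu1 = 3.0 and mu2 = 2.0 and the helper names are assumptions, not part of the examination, and part (c) is read as θ = 1/(1 + ρ) with ρ = μ_2/μ_1.

```python
from math import comb, exp, factorial

def poisson_pmf(n, mu):
    return exp(-mu) * mu**n / factorial(n)

def conditional_prob(n1, n2, mu1, mu2):
    """P(N1 = n1 | N1 + N2 = n1 + n2) for independent Poisson N1, N2."""
    joint = poisson_pmf(n1, mu1) * poisson_pmf(n2, mu2)
    total = poisson_pmf(n1 + n2, mu1 + mu2)  # the sum is Poisson(mu1 + mu2)
    return joint / total

def binomial_prob(n1, n, theta):
    return comb(n, n1) * theta**n1 * (1 - theta)**(n - n1)

mu1, mu2 = 3.0, 2.0        # illustrative values only (assumed)
rho = mu2 / mu1
theta = 1 / (1 + rho)      # equals mu1 / (mu1 + mu2)
for n1, n2 in [(0, 4), (2, 3), (5, 1)]:
    lhs = conditional_prob(n1, n2, mu1, mu2)
    rhs = binomial_prob(n1, n1 + n2, theta)
    print(n1, n2, round(lhs, 6), round(rhs, 6))
```

The two printed columns should agree for every pair, which is what makes a conditional (binomial) test of ρ = 1 possible without knowing the individual means.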
5.
Consider the following set of data:

    GROUP A       GROUP B       GROUP C
     X    Y        X    Y        X    Y
     3   12        4   14        1    8
     2   10        3   14        2    7
     1   10        3   12        3   10
     2   13        5   15        1    9

a)
Perform a simple linear regression of Y on X for the twelve bivariate observations given.
b)
For each observation, calculate the residual from the fitted regression line.
c)
Perform a one-way analysis of variance on the residuals, to see if the mean residuals differ significantly from group to group.
d)
Place the ANOVA test you have just performed in the context of general multiple regression analysis. (It is not necessary to get very technical.) Do parts a)-c) relate to a classical procedure familiar to you? Give a crucial restriction on the use of this technique.
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
•
PART I
(9 a.m. - 1 p.m., Saturday, January 25, 1975)
1. Let W(x) be a monotonic increasing function of x such that W(x) → 0 as x → x_0 and W(x) → ∞ as x → ∞. Let
F(x) = 1 - exp[-W(x)], x ≥ x_0,
be the cumulative distribution function (CDF) of a random variable X.
a) Prove that Y = 2W(X) has a χ² distribution with 2 degrees of freedom.
b) Let X_1, X_2, ..., X_k be k (k > 2) independent random variables, X_j having the CDF
F_{X_j}(x_j) = 1 - exp[-λ u_j(x_j)], λ > 0, j = 1, 2, ..., k,
where u_j(x_j) satisfies the conditions of W(x) in part (a) and λ (> 0) is a constant. Using part (a), show that
2λ Σ_{j=1}^{k} u_j(X_j)
has a χ² distribution with 2k degrees of freedom.
c) Suppose that λ is an unknown parameter and is estimated from the formula
λ̂
Show that λ̂ is an unbiased estimator of λ.
d) Find the variance of λ̂.
2. Consider a simplistic model of morbidity (illness) for some ailment, say migraine headache. For any given calendar week, an individual belongs to one of two states:
Well (W) if he experiences no symptoms or signs of migraine headache at any time during the week;
Morbid (M) if he experiences symptoms or signs of migraine headache at some time during the week.
For a population P under study, assume that the probability that a randomly chosen individual will be well in week (k+1) depends only on his health statuses (W or M) in weeks k and (k-1). Denote the four probabilities for the status "well in week (k+1)" as below and assume that all probabilities are strictly between 0 and 1.

    HEALTH STATUS              PROBABILITY OF BEING
    WEEK (k-1)    WEEK k       WELL IN WEEK (k+1)
        W           W               p_ww
        W           M               p_wm
        M           W               p_mw
        M           M               p_mm

a) Let the states of a Markov chain be defined by pairs X_k = (z_{k-1}, z_k), where z_j is the health status of an individual in week j. Rename the states as: 1 = (W,W); 2 = (W,M); 3 = (M,W); 4 = (M,M). Let X_n be the state random variable. Verify that {X_n: n ≥ 0} is a homogeneous Markov chain over the state space {1,2,3,4}.
b) Write out the one- and two-step transition matrices.
c) Write down the equilibrium equations for this chain.
8 pts.
d) Suppose that p_ww = 0.8, p_wm = 0.2, p_mw = 0.4, and p_mm = 0.6. Assume no emigration, immigration, births or deaths relative to the population P. After a large number of weeks, what is the probability that an individual selected randomly from P will be well in the calendar week of his selection? In the week following?
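The equilibrium calculation in part (d) of the morbidity question can be sketched numerically. This is a minimal illustration under stated assumptions: the first subscript of each probability is read as the status in week (k-1) and the second as the status in week k, as in the table, and the function names are hypothetical. Power iteration is used here in place of solving the equilibrium equations by hand.

```python
def transition_matrix(p_ww, p_wm, p_mw, p_mm):
    """States 1=(W,W), 2=(W,M), 3=(M,W), 4=(M,M), each a (week k-1, week k) pair."""
    return [
        [p_ww, 1 - p_ww, 0.0, 0.0],   # (W,W) -> (W,W) or (W,M)
        [0.0, 0.0, p_wm, 1 - p_wm],   # (W,M) -> (M,W) or (M,M)
        [p_mw, 1 - p_mw, 0.0, 0.0],   # (M,W) -> (W,W) or (W,M)
        [0.0, 0.0, p_mm, 1 - p_mm],   # (M,M) -> (M,W) or (M,M)
    ]

def step(dist, P):
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

def stationary(P, iterations=500):
    """Approximate the equilibrium distribution by repeated multiplication."""
    dist = [1.0 / len(P)] * len(P)
    for _ in range(iterations):
        dist = step(dist, P)
    return dist

P = transition_matrix(0.8, 0.2, 0.4, 0.6)
pi = stationary(P)
well_now = pi[0] + pi[2]   # states whose second coordinate is W
print([round(x, 4) for x in pi], round(well_now, 4))
```

Under these assumptions the long-run probability of being well in a given week is the total equilibrium mass on states 1 and 3.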
•
3. Consider a sequence of independent random variables X_1, X_2, X_3, ..., each having the same expected value μ and standard deviation σ. The distribution of Y_i = (X_i - μ)/σ does not depend on μ or σ, and E[(Y_i - Y_j)^(-2)] = A (< ∞) for i ≠ j.
Firstly, X_1 and X_2 are observed. Then, a further set of N observations X_3, X_4, ..., X_{N+2} is taken, where N is the least integer not less than M(X_1 - X_2)², M being a fixed positive number. The quantity
X̄ = (1/N) Σ_{i=3}^{N+2} X_i
is then calculated.
a) Show that E(N) ≥ 2Mσ².
b) Show that
for any positive number ε.
4. It is required to take samples of sizes n_1 and n_2 from two normal populations having known variances σ_1² and σ_2², the purpose being to test the null hypothesis of equality of the population means μ_1 and μ_2 against the alternative μ_1 > μ_2.
a) Setting
,
and using the notation which defines z_k by Pr(Z > z_k) = k, find the expression for V which satisfies the conditions that the level of significance be α and the power of the test against the alternative
b) For the V given in any such situation, find formulas for n_1 and n_2 (in terms of V, σ_1, σ_2) which will minimize sampling cost if the cost of making an observation in Population 1 is four times that in Population 2. What are the specific sample size values if σ_1 = 5, σ_2 = 4, α = .05, and the power is to be .90 when δ = 3?
•
5. Data is available of the form (X_i, Y_i, Z_i), i = 1, 2, ..., n.
a) Consider the statistical model
Y_i = α + βX_i + ε_i,
where α and β are fixed (unknown) parameters and the ε_i are independent random variables with zero mean and common variance σ². Give formulae for α̂ and β̂, the least squares estimators of α and β. Give formulae for the variances of α̂ and β̂. Derivations are not necessary.
b) Consider the statistical model
Y_i = α + βX_i + γZ_i + ε_i,
where α, β and γ are fixed (unknown) parameters and the ε_i are independent random variables with zero mean and common variance σ². An analysis of variance for the data, under this model, is given below:

    SOURCE      D.F.    S.S.    M.S.
    X             1      8.0     8.0
    Z/X           1      5.0     5.0
    Residual     23     65.0     2.8
    Total        25     78.0

The correlation (Pearson) between Y and Z is r_YZ = .4. Calculate:
i) r_XY·Z, the partial correlation between Y and X, adjusting for Z;
ii) r_Y·XZ, the multiple correlation of Y with X and Z;
iii) r_XZ, the Pearson correlation of X with Z.
c) Consider the statistical model
Y_i = α* + (β* + Z_i)X_i + ε_i*,
where α* and β* are fixed (unknown) parameters and the ε_i* are independent random variables with zero mean and common variance σ². Derive formulae for α̂* and β̂*, the least squares estimators of α* and β*. What are the variances of these estimators?
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9 a.m. - 1 p.m., Saturday, January 24, 1976)
1.
Let {X_k} be a sequence of independent and identically distributed random variables and let
S_0 = 0, S_k = X_1 + ... + X_k, k ≥ 1.
Also, let N be a non-negative integer-valued random variable, independent of the X_k.
(a) Show that whenever the moments exist,
E(S_N) = E(N)E(X_1)
V(S_N) = E(N)V(X_1) + V(N)[E(X_1)]²
(b) Suppose that
X_i = 0, with probability 1-p
    = 1, with probability p
0 < p < 1, and N has the Poisson distribution with parameter λ (> 0). Find E(S_N) and V(S_N).
(c) What is the actual distribution of S_N in part (b)?
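Part (c) of the random-sum question can be explored numerically before attempting a proof. The sketch below mixes binomial probabilities over a Poisson number of terms and compares the result with a Poisson probability with mean λp. The values lam = 4.0 and p = 0.3, the truncation point n_max, and the function names are illustrative assumptions.

```python
from math import comb, exp, factorial

def poisson_pmf(k, mean):
    return exp(-mean) * mean**k / factorial(k)

def random_sum_pmf(k, lam, p, n_max=150):
    """P(S_N = k): Binomial(n, p) mixed over N ~ Poisson(lam), truncated at n_max."""
    return sum(
        poisson_pmf(n, lam) * comb(n, k) * p**k * (1 - p)**(n - k)
        for n in range(k, n_max + 1)
    )

lam, p = 4.0, 0.3   # illustrative values only (assumed)
for k in range(5):
    print(k, round(random_sum_pmf(k, lam, p), 8), round(poisson_pmf(k, lam * p), 8))
```

The agreement of the two columns suggests the form of the answer; it is a numerical check, not a derivation.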
•
2.
Consider a bivariate normal distribution with mean vector ν = (ν_1, ν_2) and dispersion matrix
( 2.25  1.68 )
( 1.68  4.00 ).
Also, let (X̄_n, Ȳ_n) be the mean vector of a sample of size n from this distribution.
(i) Show that the UMP test for H_0: θ = ν_1 - ν_2 = 0 vs H_1: θ > 0, based on the set of n differences (X_i - Y_i, 1 ≤ i ≤ n), consists in rejecting H_0 if X̄_n - Ȳ_n ≥ c, where P{X̄_n - Ȳ_n ≥ c | H_0} = α, the level of significance.
(ii) Find an expression for the power function of the UMP test.
(iii) Determine n such that for θ_1 = 0.5 and α = 0.05, the power is equal to 0.95.
(iv) For the same n and (α, θ_1), determine the power of the classical sign-test based on the signs of the difference of the two variates for the n observations (vectors).
3.
Suppose that a set of three measurements (Y, X_1, X_2) is available on each experimental unit in a study.
(a)
Determine the values of (β_1, β_2) which minimize the variance of
Y - α - β_1 X_1 - β_2 X_2,
where you may assume that the dispersion matrix of (Y, X_1, X_2) (say, equal to Σ) is known. Does the solution depend on α?
(b)
Given a sample (Y_i, X_1i, X_2i), i = 1, ..., n from a trivariate normal distribution, find the maximum likelihood estimator of the conditional mean of Y given X_1 = x_1 and X_2 = x_2.
(c)
Suppose that in (b), it is given that the mean vector of the trivariate normal distribution is equal to 0. Does the solution remain the same?
4.
(a)
Let X be a random variable with a continuous distribution function F(x), -∞ < x < ∞. Find the distribution function of the random variable Y = F(X). What is this result usually called?
(b)
Illustrate the use of this result in simulating a random sample from a simple exponential distribution.
(c)
Let X_1, ..., X_n be n independent random variables with continuous distribution functions F_1(x), ..., F_n(x), all defined on (-∞, ∞). Using the result in (a) show that each of
U_n = -2 Σ_{i=1}^{n} log F_i(X_i) and V_n = -2 Σ_{i=1}^{n} log{1 - F_i(X_i)}
is distributed as chi-square with 2n degrees of freedom.
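Part (b) of the question on Y = F(X) asks for an illustration of simulating exponential variates. A minimal sketch of inversion sampling follows; the rate 2.0, the seed, the sample size and the function name are all illustrative assumptions.

```python
import math
import random

def exponential_sample(n, rate, rng):
    """Draw n Exponential(rate) values by inverting F(x) = 1 - exp(-rate * x)."""
    # If U is uniform on [0, 1), then X = -log(1 - U) / rate satisfies F(X) = U.
    return [-math.log(1.0 - rng.random()) / rate for _ in range(n)]

rng = random.Random(12345)   # fixed seed so the run is repeatable
sample = exponential_sample(20000, rate=2.0, rng=rng)
sample_mean = sum(sample) / len(sample)
print(round(sample_mean, 3))  # should be close to 1 / rate
```

Using 1 - U rather than U keeps the argument of the logarithm strictly positive, since `random()` can return 0.0 but never 1.0.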
•
5.
One way of dealing with the problem of non-response in a survey is to make efforts to collect information from a sub-sample of the units not responding in the first attempt. In a mail enquiry, n units were selected from a population of N using simple random sampling without replacement (SRSWOR). Of these, n_1 units responded. From the n_2 (= n - n_1) non-responding units, r_2 were selected with SRSWOR and the required information was obtained by personal interview.
(a)
= vr • V(>l) fixed in advance, N be the number of
2
2
l
units in the population that would have responded in the first
Let
n
attempt,
•
(b)
Let
y
N = N-N
Z
l
and
W.
J
= N.lN,
J
j
= 1,2,
be the variable under study,
means based on the
and
Show that
-,
n
units responding in the 1st attempt and
l
on the sub-sample of size r
units respectively. Show that
Z
is an unbiased estimator of the population mean
(c)
Y,
Show that
where
is the variance of the whole population and
variance within the non-response stratum.
•
be the sample
Y2
is the
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9:00 a.m. - 1:00 p.m., Saturday, January 22, 1977)
1. Suppose that after suitable observation, it is decided that the number (N) of auto accidents at a given intersection this year has a Poisson distribution with parameter λ. A typical accident involves one or more cars. Let X be a random variable denoting the number of autos involved in an accident. We assume that the numbers for different accidents are independent and identically distributed with some distribution.
(a)
Derive the distribution of T, the total number of autos involved in (N) accidents at the intersection this year. Name the resulting distribution.
(b)
Find the generating function of T.
(c)
If we are given the rule of thumb that, in practice, only one, two or three autos are involved in a given accident with corresponding frequencies in the ratio 5:3:2, what is the mean of T?
2. Let X_1, X_2, ..., X_n be a random sample from a Poisson distribution with parameter θ, i.e.,
f(x; θ) = (x!)^(-1) θ^x e^(-θ), x = 0, 1, 2, ...
with θ > 0. Let Pr(X ≤ 1) = (1 + θ)e^(-θ) = λ.
(a)
Show that Y = Σ_{i=1}^{n} X_i is a complete sufficient statistic for θ.
(b)
Let u(X_1) = 1 for X_1 ≤ 1 and u(X_1) = 0 for X_1 > 1. Show that u(X_1) is unbiased for λ.
(c)
Obtain the minimum variance unbiased estimate for λ.
(d)
Show that the statistic obtained in (c) is indeed unbiased for λ.
•
3. In a certain region a 15% random sample of hotels is taken and classified according to whether or not every bedroom has a telephone. The results are:

    Total Number    Hotels in    Hotels in    Sample Hotels With Full
    of Bedrooms     Region       Sample       Number of Telephones
    1-5                400           80              2
    6-20              1500          200             50
    21-50              600          100             45
    51-100             300           50             35
    101+               200           20             18
    Total             3000          450            150

Compute the appropriate unbiased estimates of the total number of hotels with telephones in every room and the corresponding variance, assuming the data arose from
(a)
a simple random sample without the knowledge of the strata,
(b)
a stratified random sample as indicated in the table.
(c)
Find the relative efficiency of the two methods of selection.
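The arithmetic of the hotel question can be sketched as follows. The counts are read from the table as reconstructed from the scan (an assumption about the column order), and the variance estimator p(1-p)/(n-1) with a finite population correction is one common textbook convention, not necessarily the one the examiners intended.

```python
# (hotels in region N_h, hotels in sample n_h, sample hotels fully equipped a_h)
strata = [(400, 80, 2), (1500, 200, 50), (600, 100, 45), (300, 50, 35), (200, 20, 18)]

def srs_total(strata):
    """Estimated total and variance estimate, ignoring the strata."""
    N = sum(s[0] for s in strata)
    n = sum(s[1] for s in strata)
    p = sum(s[2] for s in strata) / n
    var_p = (1 - n / N) * p * (1 - p) / (n - 1)
    return N * p, N**2 * var_p

def stratified_total(strata):
    """Estimated total and variance estimate, summing stratum by stratum."""
    total, var = 0.0, 0.0
    for N_h, n_h, a_h in strata:
        p_h = a_h / n_h
        total += N_h * p_h
        var += N_h**2 * (1 - n_h / N_h) * p_h * (1 - p_h) / (n_h - 1)
    return total, var

t_srs, v_srs = srs_total(strata)
t_str, v_str = stratified_total(strata)
print(round(t_srs, 1), round(v_srs, 1))
print(round(t_str, 1), round(v_str, 1))
print(round(v_srs / v_str, 2))   # one common definition of relative efficiency
```

The ratio of the two variance estimates is only one way to express relative efficiency; part (c) may be answered with population variances instead.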
4. Let X_1, X_2, ..., X_n be a random sample from the following probability density function:

θ_1 < x < ∞,
where -∞ < θ_1 < ∞.
(a)
Determine the most powerful test of size α for testing
against
(b)
Is the test obtained in (a) the uniformly most powerful test of size α for testing H_0: θ_2 = θ_2', with θ_1 known, against H_1: θ_2 < θ_2', with θ_1 known?
(c)
Obtain the likelihood ratio test for testing H_0: θ_1 = θ_1', with θ_2 unknown, against H_1: θ_1 > θ_1', with θ_2 unknown.
(d)
Show that the test in (c) may be based on a statistic which has an F distribution under the null hypothesis.
•
5. A wrist-watch manufacturing concern intended to study the effect of humidity on the rusting of a particular metal they used for the watches. Two countries A (a tropical one) and B (a temperate one) were chosen and within each country five different places with varying degrees of humidity were selected. For each place, ten specimens of the metal were exposed to the normal environment for a year. For the i-th country, j-th place and k-th specimen, let Y_ijk denote the amount of rusting, and let t_ij be the average amount of humidity (during the experimental year) in the j-th place of the i-th country. Note that i = A, B; j = 1, ..., 5; k = 1, ..., 10.
It is conceived that for each country, a simple linear regression of Y on t describes the model adequately. The concern wants to know:
(a)
Whether the rate of growth of rusting with humidity is the same for both the countries, and
(b)
Whether the typical rusting at 0 amount of humidity is the same for both the countries.
(i) State your assumptions carefully.
(ii) Provide suitable statistical tests for both (a) and (b).
(iii) Comment on their merits and demerits.
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
SPECIAL OFFERING OF PART I
(9:00 a.m. - 1:00 p.m., Wednesday, August 3, 1977)
1.
Sampling
Simple random samples drawn from each of three strata give data as follows:
Stratum I
Stratum II
Stratum III
10
12
13
17
8
21
23
25
23
20
26
30
31
28
40
38
37
37
33
9
20
11
13
12
(a)
Estimate the population mean, and the variance of your estimate, under the assumption that the sample shown was drawn under proportional allocation.
(b)
Estimate the population mean and the variance of your estimate, if the true stratum weights are equal.
(c)
Assume that the combined sample above was drawn by simple random sampling without stratification. Estimate the population mean and its variance.
(d)
Now assume, under the conditions of (a), that data are available on a concomitant variable x, such that for each of the y values observed above, the corresponding x value is (y-5). Suppose also that the actual totals for the x variable in the population are known to be 8, 12, and 20 in stratum I, II, III, respectively. Calculate the combined and separate ratio estimates of the population total Y. Which would you prefer in this case, and why?
•
•
2.
Hypothesis Testing
Define
(a)
Type I and Type II errors;
(b)
Power of a test;
(c)
Uniformly most powerful test;
(d)
Unbiased test.
A box contains N items numbered from 1 to N. One item is selected at random and its number X observed. We wish to test the hypothesis N = 5 at α = 0.2. The following 5 tests are thus available at the desired level:
T_1 rejects H_0 if X = 1 or X > 5;
T_2 rejects H_0 if X = 2 or X > 5;
etc.
(e)
Calculate and sketch the power curve of each of the tests T_1, T_2, ..., T_5 against the alternatives N = 1, 2, 3, 4, 5, 6, 7, .... Are these tests unbiased?
(f)
Is there a uniformly most powerful test among T_1, T_2, ..., T_5? Justify your answer.
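The power curves in part (e) of the box question can be tabulated directly, since X is uniform on 1, ..., N under each alternative. The sketch below is illustrative; the function name and the range of N shown are assumptions.

```python
from fractions import Fraction

def power(j, N, n0=5):
    """P(T_j rejects) when X is uniform on 1..N; T_j rejects if X = j or X > n0."""
    hits = (1 if j <= N else 0) + max(N - n0, 0)
    return Fraction(hits, N)

for j in range(1, 6):
    print(j, [str(power(j, N)) for N in range(1, 9)])
```

Each printed row is the power of one test at N = 1, ..., 8; comparing rows entry by entry shows where the tests differ, which is the information needed for parts (e) and (f).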
3.
Estimation
The random variable X is said to have a Pareto distribution if F(x) = Pr{X ≤ x} is given by
F(x) =
where both x_0 and a are positive numbers.
(a)
Derive the form of the probability density function f(x) corresponding to F(x).
(b)
Find E(X^k) for k = 1, 2 and V(X).
(c)
Under what condition on a does E(X^k) exist for k = 1, 2?
(d)
If a is known and if X_1, ..., X_n are independent random variables having the same distribution F(x), show that there exists a sufficient statistic for x_0.
(e)
For a specified a, find a 95% confidence interval for x_0 based on the sufficient statistic.
•
4.
Stochastic Processes
The special case of the general birth and death process in which the birth rate is λ_n = λn + a, λ and a > 0, and the death rate is μ_n = μn, μ > 0, is called a linear growth process with immigration.
(a)
Write down the transition probabilities corresponding to the above birth and death rates. (We assume time homogeneity.)
(b)
Is this a Markov process? Show why or why not.
(c)
Determine differential-difference equations satisfied by the transition probabilities in (a).
(d)
Use (c) to determine the (time-dependent) mean.
(e)
Give conditions for which limits of the expressions in (a) and (d) exist and write down the limits.
5.
Experimental Design
The problem often arises in experimental design over a large area: how many sites and how many replications for each experiment should be used to obtain some 'optimal' solution under certain conditions. We will consider the problem as follows:
Let y_ij denote the yield of some crop observed in the i-th experiment (site) (i = 1, 2, ..., m) on the j-th plot (j = 1, 2, ..., r). (We assume that the number of sites is very large, and for practical purposes infinite.)
Let
with
2
E .. ~ N(O'OI)
1)
COV(E .. ,E.,. ,)
1)
(a)
1)
= 0,
COV(E ..• u.)
1)
1
=0
.
e~
Let
I
y = -
r
m
~
~ y ..
rm 1=
. I j =I
1)
denote the observed overall average yield of this crop,
(b)
As costs of an experiment, let
i
var(y).
be the cost of Obtaining each plot
be the cost of preparing and conducting a trial within a plot.
and
Give
c
Find
an expression for the total cost
plots and
r
C of an experiment with
m
trials within a plot.
,.
Find the optimal allocation with respect to r and m, when it is desired:
(c)
to minimize var(ȳ) for given total cost, C = C_0;
(d)
to minimize total cost C for given variance var(ȳ) = V_0.
•
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9 a.m. - 1 p.m., Saturday, January 21, 1978)
1.
Consider a continuous process, denoting the interarrival time between the (i-1)th and the i-th events by T_i for i ≥ 1. Presume that the T_i are independent and identically distributed as an exponential, specifically
f(t; β) = βe^(-βt) for t ≥ 0 and β > 0.
a)
Verify that the no memory waiting time assumption is satisfied, i.e.,
Pr(T > t + u | T > u) = Pr(T > t) for every t ≥ 0, u ≥ 0.
b)
Let T_1, T_2, ..., T_n be the interarrival times for the n events observed between time zero and time t. Find the distribution of the total time W_n required to observe n events.
c)
Let N_t be the number of events occurring in an interval of length t. Find the distribution of N_t.
d)
Suppose that
f(t; α) = α² t e^(-αt) for t ≥ 0 and α > 0.
Verify whether the no memory waiting time assumption is satisfied or not.
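Parts (a) and (d) of the interarrival-time question can be compared numerically through the survival functions of the two densities. The sketch below is illustrative only: the values t = 1, u = 2 and rate 0.5 are assumptions, and a single numerical case does not replace the algebraic verification the question asks for.

```python
import math

def exp_survival(t, beta):
    # Pr(T > t) for the density beta * exp(-beta * t)
    return math.exp(-beta * t)

def gamma2_survival(t, alpha):
    # Pr(T > t) for the density alpha^2 * t * exp(-alpha * t)
    return (1.0 + alpha * t) * math.exp(-alpha * t)

def conditional_survival(surv, t, u, rate):
    """Pr(T > t + u | T > u) computed from a survival function."""
    return surv(t + u, rate) / surv(u, rate)

t, u, rate = 1.0, 2.0, 0.5   # illustrative values only (assumed)
print(conditional_survival(exp_survival, t, u, rate), exp_survival(t, rate))
print(conditional_survival(gamma2_survival, t, u, rate), gamma2_survival(t, rate))
```

The first line prints two equal numbers, while the second prints two different ones, which indicates what to expect in part (d).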
2.
Let a random variable Y = log X be normally distributed with expected value μ and standard deviation σ.
a)
Obtain the expected value of X, E(X) = ξ, say, in terms of μ and σ.
b)
Let X_1, X_2, ..., X_n be a random sample from this distribution. Find the maximum likelihood estimator of ξ (ξ̂, say).
c)
Find the expected value and variance of ξ̂ (in terms of μ and σ).
3.
For the linear model
Y_i = β_0 + β_1 x_1i + β_2 x_2i + ε_i,
suppose that (Y_i, x_1i, x_2i), i = 1, ..., n are given.
a)
What are the interpretations of the parameters (β_0, β_1, β_2)?
b)
Suppose that you want to test
where β_1, β_2 are treated as nuisance parameters. Is H_0 a linear hypothesis?
c)
State clearly the standard assumptions usually made in this context.
d)
Construct the standard analysis of variance table for this testing problem.
e)
What is the distribution of the test statistic when (i) H_0 holds and (ii) H_0 does not hold?
•
4.
a)
Define each of the following:
i) Simple random sampling,
ii) Stratified random sampling,
iii) Systematic sampling,
iv) Cluster sampling, and
v) Two stage sampling.
b)
Consider the ratio method of estimation, and let us denote by y_i and x_i the values of the primary and an ancillary characteristic under study for the i-th unit of the population (i = 1, 2, ..., N), by R = Ȳ/X̄ the ratio of the population mean of y to the population mean of x, and by R̂ = ȳ/x̄ the corresponding ratio for a sample of size n. Show that E(R̂) is approximately equal to
R{1 + ((N - n)/(Nn)) (C_x² - ρ C_x C_y)},
where C_x and C_y denote the coefficients of variation of x and y respectively, and ρ denotes the correlation coefficient between x and y.
c)
Suppose that a population is divided into two mutually exclusive classes
with
N
l
and
N
2
members respectively, so that
N
2
= Nq
Assume that a simple random sample of size n is drawn without replacement from the N units and n_1 of the observations in the sample are in class 1 and n_2 in class 2.
Define
A
R =
Show that for large values of
1
nq
•
and
N,
R =
the relative bias in
R is
given by
n2
80
•
5.
a)
For testing a simple null hypothesis vs a simple alternative, what is a most powerful test? Does it always exist?
b)
What is a uniformly most powerful test?
Let
xl •...•X be n independent random variables each having the
n
normal distribution with mean a and variance e. where a >0.
Let
HI:
(i)
(ii)
e =·1.5
.
What is the most powerful test for this problem?
Verify whether the test derived in (i) is uniformly most
powerful when
(I)
(II)
HO:
HO:
e =1
e =1
vs HI:
vs HI:
e>
e~
1
1
•
•
81
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
SPECIAL OFFERING OF PART I
(9 a.m. - 1 p.m., Saturday, August 26, 1978)
QUESTION 1
a)
State the components of variance model for the one-way layout consisting of k samples of n observations each. Use the notations σ_b² and σ_w² for the between-sample and within-sample components of variance, respectively.
b)
Using the notations SSB and SSW for the between-sample and within-sample sums of squares, respectively, set up and fill out an analysis of variance table for this one-way layout, with rows "between" and "within" and columns "DF", "SS", "MS", "EMS", and "VR".
c)
Let θ = σ_b²/σ_w². Give formulas for the point estimator of θ and for a two-sided 100(1 - α)% confidence interval on θ. Note that θ cannot be negative. How do your formulas take this fact into account?
d)
State how to test the hypothesis that θ = θ_0 against the one-sided alternative θ > θ_0. What is the power of the test against the specific alternative θ = θ_1 > θ_0?
e)
Suppose k = 3, n = 6; the 3 sample means are x̄_1 = 5, x̄_2 = 6, x̄_3 = 7; and SSW = 1800, α = .10, θ_0 = 1, θ_1 = 2. Write down the analysis of variance table for these data; calculate the point estimate and the confidence interval of (c); perform the test of (d) and evaluate its power.
NOTE:
If X is distributed as F with 2 and m degrees of freedom, then
P(X > c) = [(m/2) / (c + m/2)]^(m/2)
QUESTION 2
Suppose that X is normally distributed with mean μ and variance σ².
a)
Let -∞ < L < U < ∞. Derive the moment generating function of the conditional distribution of X given L ≤ X ≤ U.
b)
Let Y = (X - μ)/σ and φ(y) be the probability density function of Y. Show that
(d/dy)φ(y) = -yφ(y)
c)
Using (a) or (b), derive E(X | L ≤ X ≤ U) and Var(X | L ≤ X ≤ U) in terms of φ and the corresponding cdf Φ.
d)
Verify that as L → -∞ and U → +∞, the expressions in (c) converge to μ and σ², respectively.
QUESTION 3
a)
What is a likelihood function? Define a maximum likelihood estimator (MLE) of a parameter. State the usual properties of a MLE of a parameter θ.
b)
Suppose that X, the length of life of an electric tube, has the simple exponential density function
f_θ(x) = θe^(-θx), 0 ≤ x < ∞ (θ > 0)
Based on a sample of size n, n_t have been observed to have lifetime less than t (> 0) while n - n_t have lifetime ≥ t.
(i) Find the MLE θ̂_n of θ.
(ii) Verify whether θ̂_n is unbiased for θ or not.
(iii) What is the asymptotic variance of θ̂_n?
•
QUESTION 4
a)
Define
(i) Markov Process
(ii) Markov Chain
b)
For n = 1, 2, ..., let X_n = U_1 + U_2 + ... + U_n where {U_n: n ≥ 1} is a sequence of independent random variables. We consider two cases
(i) each U_n normally distributed,
(ii) each U_n takes the values 0 and 1 with probability p and q respectively (p + q = 1; 0 < p < 1).
State in each case whether or not the stochastic process {X_n: n ≥ 1} is a Markov Process or Markov Chain. Explain your reasoning.
c)
Let {X_1, X_2, ...} be a sequence of independent and identically distributed random (non-negative) variables with p.d.f. f(x) and let R be an integer valued random variable with
Probability{R = r} = p_r,
the X's and R being independent. A finite renewal process is one in which events occur at
X_1, X_1 + X_2, ..., X_1 + ... + X_R
Calculate
(i) the mean and variance of the distribution of the time to X_1 + ... + X_R
(ii) the expected number of events in (0, t).
•
QUESTION 5
a)
Explain each of the following
(i) Simple random sampling with or without replacement;
(ii) Stratified random sampling;
(iii) Cluster sampling.
b)
In a population of size N, let p be the proportion of individuals with a certain characteristic. Let
Y_i = 1, if the ith individual has the characteristic;
    = 0, otherwise,
for i = 1, ..., N. Suppose that a simple random sample of size n is drawn without replacement. Denote by Ȳ_n the sample arithmetic mean of the Y-values. Show that EȲ_n = p. Find Var(Ȳ_n).
c)
Assume that in the population described in (b) the proportion of individuals possessing the characteristic varies by age and sex. Let N_hk and n_hk denote respectively the number of people in the h-th age and k-th sex group and the size of a simple random sample (drawn without replacement) from the corresponding group (h = 1, 2, ..., H, k = 1, 2). Use the following scoring system. Denote
Y_ihk = 1, if the i-th individual in the h-th age and k-th sex group possesses the characteristic
      = 0, otherwise
(i = 1, 2, ..., N_hk, h = 1, 2, ..., H, k = 1, 2).
Use the scores to obtain an unbiased estimate of p, the proportion of individuals possessing the characteristic in the population. Find the variance of your estimate.
BASIC WRITTEN EXAMINATION IN BIOSTATISTICS
SPECIAL OFFERING OF PART I
(9 a.m. - 1 p.m., Saturday, August 11, 1979)
Question 1
Conditional on θ fixed, X and Y are independent random variables each having a Poisson distribution with expected value θ. Also, suppose that θ follows a distribution with the probability density function
g(θ) = (1/Γ(a)) e^(-θ) θ^(a-1), 0 < θ < ∞,
where a is a positive integer.
(a) Find the marginal distribution of X (and Y).
(b) Find the joint distribution of X and Y.
(c) Find the value of the (linear) correlation coefficient between X and Y.
(d) Find the regression function of Y on X, namely, E(Y | X = x).
Question 2
Let (X_11, X_12, ..., X_1n_1) and (X_21, X_22, ..., X_2n_2) be two independent random samples from Poisson distributions with parameters μ_1 and μ_2, respectively (μ_1 > 0, μ_2 > 0).
(a)
Construct the likelihood and derive the maximum likelihood estimator for μ_1 and μ_2.
(b)
Suppose that μ_1 = μ_2 = μ. Obtain the maximum likelihood estimator of μ, using data from the two independent samples.
(c)
Construct the (extended) likelihood ratio criterion for testing H_0: μ_1 = μ_2 = μ against the alternative H_1: μ_1 ≠ μ_2, using a suitable distributional approximation for the likelihood ratio criterion.
(d)
Suppose that the following data are available:
n_1 = 50, x̄_1 = 2.70; n_2 = 60, x̄_2 = 3.05
where
x̄_i = (1/n_i) Σ_{j=1}^{n_i} x_ij, i = 1, 2.
Apply the test you construct in (c) to these data, using significance level α = 0.05. What is your decision in regard to H_0?
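The computation in part (d) of the two-sample Poisson question can be sketched as below. It assumes the usual large-sample chi-square approximation with 1 degree of freedom for -2 log of the likelihood ratio. The function name is hypothetical, and the critical value 3.841 is the standard tabulated upper 5% point.

```python
import math

def poisson_lrt(n1, xbar1, n2, xbar2):
    """-2 log(likelihood ratio) for H0: mu1 = mu2 with two Poisson samples."""
    total1, total2 = n1 * xbar1, n2 * xbar2
    pooled = (total1 + total2) / (n1 + n2)   # MLE of the common mean under H0
    return 2.0 * (total1 * math.log(xbar1 / pooled)
                  + total2 * math.log(xbar2 / pooled))

stat = poisson_lrt(50, 2.70, 60, 3.05)
critical = 3.841   # upper 5% point of chi-square with 1 degree of freedom
print(round(stat, 3), stat > critical)
```

The terms that are linear in the means cancel because the pooled estimate reproduces the grand total, which is why only the logarithmic terms appear in the function.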
•
Question 3
(a)
Write down the estimator for a population proportion in the case of
a stratified random sample.
(b)
Derive a computational formula that would enable you to compute
the variance of your estimator.
(c)
Suppose that a simple random sample of size
n was selected
h
from each stratum of size Nh , h = 1, ... ,H, and "1, respondents
were obtained from each stratum. Then, for each stratum
h = 1, ... ,H, a simple random sub-sample of size t
was selected
h
for intensive follow-up from the (n -"1,) non-respondents.
h
Suppose further that the sub-samples of size t
all responded,
h
h = 1, ... ,H. Consider the following estimator Pi for a population
proportion, Pi' in category i,
A_I H
P'-N" LNh
1
h=l
•
~'\ ~i (~
+
~
- '\)
::iJ
'
H
where
out
LN •
h=l h
. of the
N=
'\i
is the number of respondents in category
i
initial respondents, and ~i is the number of
respondents in category i out of the sub-sample of size t h '
for h • 1; •••• H.
Show that the variance of
Pi may be estimated using the following
computational formula
(d)
Suppose now that in the actual survey, some of the sub-sample of non-respondents did not respond even with intensive follow-up, causing a few of the t_h to be zero. How might you modify the computational formula in part (c) to deal with this situation?
Question 4
Consider a one-way analysis of variance (ANOVA) model with fixed effects. There are k (≥ 2) treatments, indexed i = 1, ..., k, and n (≥ 2) replicates for each treatment, indexed j = 1, ..., n. The response for the jth replicate of the ith treatment is denoted by Y_ij, 1 ≤ j ≤ n, 1 ≤ i ≤ k.
(a)
Write down the usual model for this situation and any assumptions
about the model and its parameters.
(b)
In terms of the model parameters, write down the null and
alternative hypotheses for testing whether there are any treatment
effects.
(c)
Provide a complete breakdown of the usual ANOVA table, including
degrees of freedom, sum of squares, mean squares.
(d)
State the test statistic used for the test in part (b) and its
distribution under the null hypothesis of no treatment effects.
(e)
Derive the expected values of the mean square due to error and the mean square due to treatment when the null hypothesis may not be true.
(f)
Write the formula for the noncentrality parameter of the distribution
of the test statistic in part (d). Of what use is this parameter?
•
89
•
Question 5
(a)
Give definitions of the following terms (define any notation you use):
(i) persistent positive (or positive recurrent) state
(ii) persistent null (or null recurrent) state
(iii) transient state
(iv) stationary distribution
(b)
Suppose you are given a finite Markov chain with transition matrix

     1     0     0     0     0
     0    .1     0     0    .9
    .3    .1    .1    .3    .2
    .2    .2    .2    .2    .2
     0     0     0     0     1

(i) Find the conditional probability of eventual absorption into state 0, given X_0 = i, for i = 0, 1, 2, 3, 4.
(ii) Find the expected waiting time for absorption into state 0 or state 4, given X_0 = i, for i = 0, 1, 2, 3, 4.
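The absorption calculations in part (b) of the Markov chain question reduce to two small linear systems over the transient states. The sketch below assumes the 25 scanned entries are read row by row (each row then sums to 1), with states labelled 0 to 4 and states 0 and 4 absorbing. The helper `solve` is a hypothetical name for a plain Gauss-Jordan routine in exact arithmetic.

```python
from fractions import Fraction as F

P = [
    [F(1), F(0), F(0), F(0), F(0)],
    [F(0), F(1, 10), F(0), F(0), F(9, 10)],
    [F(3, 10), F(1, 10), F(1, 10), F(3, 10), F(2, 10)],
    [F(2, 10), F(2, 10), F(2, 10), F(2, 10), F(2, 10)],
    [F(0), F(0), F(0), F(0), F(1)],
]
transient = [1, 2, 3]   # assumed: states 0 and 4 are the absorbing ones

def solve(A, b):
    """Gauss-Jordan elimination with exact fractions; returns x with A x = b."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[i][n] for i in range(n)]

# (I - Q), where Q is the transition matrix restricted to the transient states
I_minus_Q = [[(1 if i == j else 0) - P[i][j] for j in transient] for i in transient]
absorb_in_0 = solve(I_minus_Q, [P[i][0] for i in transient])
mean_time = solve(I_minus_Q, [F(1)] * len(transient))
print([str(x) for x in absorb_in_0])
print([str(x) for x in mean_time])
```

The two lists give, for starting states 1, 2 and 3, the probability of ending in state 0 and the expected number of steps until absorption. For starting states 0 and 4 the answers are immediate.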
•
BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9:30 a.m. - 1:30 p.m., Saturday, August 9, 1980)
INSTRUCTIONS
a) This is a closed-book examination.
b) The time limit is four hours.
c) Answer any four (but only four) of the five questions which follow.
d) Put the answers to different questions on separate sets of papers.
e) Put your code letter, not your name, on each page.
f) Return the examination with a signed statement of honor pledge on a page separate from your answers.
g) You are required to answer only what is asked in the questions and not all you know about the topics.
••
•
91
•
Question 1
a)
Clarify the confusion of a person who makes the following statement:
"Stratified sampling and cluster sampling are conceptually identical. Both are types of designs in which a probability sample is selected from a population which is divided into groups of elements."
b)
Briefly describe stratified (simple) random sampling.
c)
Let
Y_hi = 1 if the i-th element in the h-th stratum possesses some attribute
     = 0 if otherwise,
and suppose we wish to estimate the proportion (P) of elements in a population of size N which possess the attribute. The sampling design we use is proportionate stratified (simple random) sampling and our estimator of P takes the general form
p = Σ_h W_h p_h
where W_h = N_h/N, N_h is the total number of elements in the h-th stratum, and p_h is the proportion of the n_h sample elements possessing the attribute in the h-th stratum. Show that the true variance of p is
Var(p) =
where P_h = 1 - Q_h is the proportion of all elements in the h-th stratum which possess the attribute.
d)
Assuming that N_h ≈ N_h - 1 and N ≈ N - 1, show that the true variance of the estimator (p_0) of P from a simple random sample of size n taken from the same population can be expressed as
Var(p_0) = Var(p) + ((1 - f)/n) Σ_h W_h (P_h - P)²
where f = n/N.
Question 2
Suppose that a system has two components whose life times ($X$ and $Y$, say) are independent and each has the same exponential distribution with mean $\theta\ (> 0)$. The system fails as soon as at least one of its components does so. Let $Z$ be the life-time of the system.
(a) What is the probability density function of $Z$?
(b) For $n\ (\ge 1)$ systems of the same type, let $Z_1, \dots, Z_n$ be the respective life times. Obtain the maximum likelihood estimator of $\theta$ (say, $\hat\theta_n$) based on $Z_1, \dots, Z_n$.
(c) Obtain $E\hat\theta_n$ and $\mathrm{Var}(\hat\theta_n)$.
(d) Compare $\mathrm{Var}(\hat\theta_n)$ with the Cramér-Rao bound and comment on the efficiency and sufficiency of $\hat\theta_n$.
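The setting of Question 2 can be explored by simulation. The sketch below is only illustrative: it assumes (the question does not state it) that the minimum of two independent exponential life times with mean $\theta$ is again exponential, with mean $\theta/2$, so that the likelihood is maximized at twice the sample mean. The function names and the seed are hypothetical choices.

```python
import random
import statistics


def simulate_system_lifetimes(theta, n, seed=0):
    """Draw n system lifetimes Z = min(X, Y), with X, Y exponential with mean theta."""
    rng = random.Random(seed)
    rate = 1.0 / theta  # expovariate takes the rate, i.e. 1 / mean
    return [min(rng.expovariate(rate), rng.expovariate(rate)) for _ in range(n)]


def mle_theta(lifetimes):
    """Twice the mean lifetime (assumes Z is exponential with mean theta / 2)."""
    return 2.0 * statistics.fmean(lifetimes)


if __name__ == "__main__":
    z = simulate_system_lifetimes(theta=3.0, n=200_000)
    print(mle_theta(z))  # should land close to the assumed theta of 3.0
```

Repeating the simulation over several seeds gives an empirical check on the mean and variance asked for in part (c).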
Question 3
a) Suppose that the probability of an insect laying $r\ (\ge 0)$ eggs is $e^{-m} m^r / r!$ and that the probability of an egg developing is $p\ (0 < p < 1)$. Assuming mutual independence of the eggs, show that the probability of a total of $k\ (\ge 0)$ survivors is $e^{-mp} (mp)^k / k!$.
b) Suppose that two insects of the same species are placed in two different incubators with different temperatures, and you suspect that the temperature may have some effect on the probability of an egg developing, though the distribution of the number of eggs laid should be the same. Let $p_1$ and $p_2$ be these probabilities for the two insects and let $k_1$ and $k_2$ be the actual numbers of survivors in the two incubators.
(i) Find the joint distribution of $(K_1, K_2)$.
(ii) Find the conditional distribution of $K_1$, given $K = K_1 + K_2$.
(iii) We wish to test the hypothesis $H_0: p_1 = p_2$ against $H_1: p_1 > p_2$. It would be easier to construct the test if we use the distribution defined in (ii). How is $H_0$ expressed in this case? Construct a test of size $\alpha\ (0 < \alpha < 1)$, given $k$.
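The identity in part (a) of Question 3 can be checked numerically. The sketch below sums, over the number of eggs $r$, the Poisson probability of $r$ eggs times the binomial probability that exactly $k$ of them develop, and compares the total with the Poisson probability with mean $mp$. The function names and the truncation point `r_max` are illustrative assumptions.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of k events, computed on the log scale to avoid overflow."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))


def survivors_pmf(k, m, p, r_max=200):
    """P(k survivors): sum over r eggs of P(r eggs) * P(k of the r eggs develop)."""
    return sum(
        poisson_pmf(r, m) * math.comb(r, k) * p**k * (1 - p) ** (r - k)
        for r in range(k, r_max + 1)
    )


if __name__ == "__main__":
    m, p = 4.0, 0.3
    for k in range(5):
        # The two columns should agree up to rounding and truncation error.
        print(k, survivors_pmf(k, m, p), poisson_pmf(k, m * p))
```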
Question 4
Suppose that a random variable $Y$ is related to a non-stochastic factor $x$ over the interval $-1 \le x \le 1$ by the relationship
$$\eta_x = E(Y \mid x) = \beta_0 + \beta_1 x + \beta_2 x^2 .$$
Let $(x_i, Y_i)$, $i = 1, \dots, N$, be $N$ pairs of observations available, for which $\mathrm{Var}(Y_i) = \sigma^2\ (< \infty)$, $\mathrm{Cov}(Y_i, Y_j) = 0$, $\forall\, i \ne j = 1, \dots, N$, and let
$$\mu_k = N^{-1} \sum_{i=1}^{N} x_i^k , \quad k = 1, 2, 3.$$
Assume further that $\mu_1 = \mu_3 = 0$. It is desired to approximate the function $\eta_x$ over the interval $-1 \le x \le 1$ by a linear function
$$\hat\eta_x = b_0 + b_1 x$$
where
$$b_0 = N^{-1} \sum_{i=1}^{N} Y_i \quad \text{and} \quad b_1 = \sum_{i=1}^{N} x_i Y_i \Big/ \sum_{i=1}^{N} x_i^2 .$$
(a) Express $E(b_0)$ and $E(b_1)$ in terms of $\beta$'s and $\mu$'s.
(b) Express $V(b_0)$ and $V(b_1)$ in terms of $\sigma^2$, $N$, $\beta$'s and $\mu$'s.
(c) Show that $\mathrm{Cov}(b_0, b_1) = 0$ (since $\mu_1 = 0$).
(d) Show in general that the integrated mean square error
$$J = \int_{-1}^{1} E[(\hat\eta_x - \eta_x)^2]\, dx$$
can be written as the sum of the integrated variance $V$ and the integrated squared bias $B$, where
$$V = \int_{-1}^{1} \mathrm{Var}(\hat\eta_x)\, dx , \qquad B = \int_{-1}^{1} [E(\hat\eta_x) - \eta_x]^2\, dx .$$
(e) Show that, for the particular situation described earlier, $V$ is minimized when $\mu_2 = 1$ and $B$, when $\mu_2 = 1/3$.
Question 5
Let $X_1, \dots, X_n$ be $n$ independent and identically distributed random variables with the probability density function (pdf) $f(x)$ and distribution function $F(x)$. Let $Z_1 \le \dots \le Z_n$ be the order statistics corresponding to $X_1, \dots, X_n$.
(a) For an arbitrary $k\ (1 \le k \le n - 1)$, write down the joint pdf of $Z_1, \dots, Z_{k+1}$.
(b) Hence or otherwise, obtain the conditional pdf of $Z_{k+1}$ given $Z_1, \dots, Z_k$.
(c) Show that (b) depends on $Z_1, \dots, Z_k$ only through $Z_k$.
(d) Define a Markov process.
(e) Define a Markov chain.
(f) Is the process $\{Z_1, \dots, Z_n\}$ a Markov process? Is it a Markov chain? Does it have independent increments? Does it have homogeneous (stationary) increments?
BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9:30 a.m. - 1:30 p.m., Saturday, August 15, 1981)
Q.1.
Estimates of differences are a common part of analysis from survey data. Suppose we wish to do an initial sample survey in a population with $N$ elements at a point in time, $T_1$, and then a second survey in the same population of $N$ elements at a later point in time, $T_2$. Population means for $T_1$ and $T_2$ are $\bar Y_1$ and $\bar Y_2$, respectively. Our objective is to estimate the difference, $\delta$, between $\bar Y_1$ and $\bar Y_2$. Suppose furthermore that the samples for the two surveys are both of size $n$ and are selected by the following process:
i) An SRS sample of size $n$ is selected at $T_1$.
ii) The SRS sample at $T_2$ consists of $\Pi n$ "old" elements ($0 \le \Pi \le 1$) from the sample at $T_1$ plus $(1 - \Pi)n$ "new" elements that are independently selected by SRS from the $T_2$ population elements which did not fall in the $T_1$ sample.
We assume that $n$ is small relative to $N$ so that finite population corrections can be ignored.
(a) Show that the variance of our estimator of $\delta$, $d$, is
$$\mathrm{Var}(d) = \mathrm{Var}(\bar y_1) + \mathrm{Var}(\bar y_2) - 2\Pi\, \mathrm{Cov}(\bar y_1, \bar y_2)$$
where $y_{ij}$ is the $j$-th sample observation at $T_i$ and $\bar y_i$ is the corresponding sample mean.
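(b) If element variances at $T_1$ and $T_2$ are equal, i.e.,
$$S_1^2 = S_2^2 = S^2 ,$$
then show that
$$\mathrm{Var}(d) = \frac{2 S^2}{n} (1 - \Pi \rho_{12})$$
where $\rho_{12}$ is the correlation between the element values at $T_1$ and $T_2$.
(c) Discuss the relative statistical precision of $d$ in the following situations:
i) Situation 1: $-1 < \rho_{12} < 0$ and $0 < \Pi < 1$
ii) Situation 2: $\rho_{12} = 0$ and $\Pi = 0$
iii) Situation 3: $0 < \rho_{12} < 1$ and $0 < \Pi < 1$
iv) Situation 4: $0 < \rho_{12} < 1$ and $\Pi = 1$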
Q.2. Let $X_1, \dots, X_n$ be independent and identically distributed random variables from the Poisson distribution with parameter $\lambda$. Let $Y_n = X_1 + \dots + X_n$.
(a) Show that the probability distribution of $Y_n$ is given by
$$P(Y_n = k) = (n\lambda)^k e^{-n\lambda} / k! , \quad k = 0, 1, \dots$$
(b) Calculate the moment generating function of $Y_n$: $E(e^{tY_n})$, for real $t$.
(c) Show that $Y_n$ is asymptotically normally distributed with mean $n\lambda$ and variance $n\lambda$.
(d) Show that for $m$, not necessarily an integer,
$$\lim_{m \to \infty} \sum_{k=0}^{[m]} e^{-m} m^k / k! = \tfrac{1}{2}$$
where $[m]$ stands for the largest integer contained in $m$.
[Hint: You may use (c) to prove (d)].
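The limit in part (d) can be illustrated numerically. The sketch below evaluates the partial sum $\sum_{k=0}^{[m]} e^{-m} m^k / k!$ on the log scale for increasing $m$; the function name is a hypothetical choice, and the computation is an illustration rather than a proof.

```python
import math


def poisson_cdf_at_floor(m):
    """Sum of e^{-m} m^k / k! for k = 0, ..., [m], with [m] the integer part of m."""
    upper = math.floor(m)
    log_m = math.log(m)
    # Each term is evaluated as exp(-m + k log m - log k!) to avoid overflow.
    return sum(math.exp(-m + k * log_m - math.lgamma(k + 1)) for k in range(upper + 1))


if __name__ == "__main__":
    for m in (10.0, 100.0, 1000.5, 10000.0):
        print(m, poisson_cdf_at_floor(m))  # values should drift toward 1/2
```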
Q.3.
(a) Define sufficient statistics, minimal sufficient statistics and completeness in the context of estimation theory.
(b) Let $X_1, \dots, X_n$ be $n$ independent and identically distributed random variables with the probability density function
$$f(x, \theta) = \begin{cases} 1/(\theta_2 - \theta_1), & \theta_1 \le x \le \theta_2 \\ 0, & \text{otherwise,} \end{cases} \qquad \theta_2 > \theta_1 ,$$
where $\theta = (\theta_1, \theta_2)$ is an unknown parameter (vector). Also, let $X_{(1)} \le \dots \le X_{(n)}$ be the ordered variables corresponding to $X_1, \dots, X_n$.
(i) Show that $X_{(1)}$ and $X_{(n)}$ are jointly sufficient for $\theta_1$ and $\theta_2$.
(ii) Assuming completeness and using (i), obtain a minimum variance unbiased (MVU) estimator of $\delta = \theta_2 - \theta_1$ and derive the formula for its mean square error.
(iii) Suppose that it is given that $\theta_1 = 0$. Then, what is the MVU estimator of $\delta = \theta_2$?
Q.4.
Let $Y_1, \dots, Y_n$ be $n$ independent random variables with a common probability density function (p.d.f.)
$$f_Y(y) = \theta \exp(-\theta y), \quad 0 \le y < \infty, \quad \theta > 0,$$
where $\theta$ is an unknown parameter. Further, let $Z_1, \dots, Z_n$ be an independent set of $n$ independent random variables with a common p.d.f.
$$g_Z(z) = c \exp(-cz), \quad 0 \le z < \infty, \quad c > 0,$$
where $c$ is a specified constant.
Let then
(a) Write down the likelihood function of $X_1, \dots, X_n$.
(b) Based on $X_1, \dots, X_n$, obtain the maximum likelihood estimator $\hat\theta_n$ of $\theta$.
(c) Obtain the expressions for the bias and variance of $\hat\theta_n$. What are the minimum values of $n$ for which these are finite?
(d) Show that $n^{1/2}(\hat\theta_n - \theta)$ has a bias converging to 0 as $n \to \infty$.
(e) Obtain the Cramér-Rao bound for the variance of an estimator of $\theta$ and hence, comment on the asymptotic efficiency of $\hat\theta_n$.
Q.5.
In a mathematical modeling approach to representation of biological phenomena, we often start with a very simple model so as to achieve mathematical tractability and then build increasingly sophisticated models to reflect more realism and to obtain a better fit to the empirical process. In particular, if we consider births in a population, we may start with the simple Poisson process and then, by generalizing the parameters of the resulting model, attempt to model more complicated probabilistic phenomena. For the stochastic process $X$, we use the distribution
$$p_n(t) = P[X(t) = n], \quad n = 0, 1, \dots$$
to represent the number of births in a population in the interval $(0, t)$.
(i) State three assumptions about the above probability distribution which are necessary and sufficient to insure that the distribution is Poisson. State these assumptions very carefully, being sure to specify appropriate parameters where needed.
(ii) State the Chapman-Kolmogorov equations for continuous-time Markov chains (discrete state space) and, using the above assumptions, show that the aforementioned distribution satisfies these assumptions and that they may be used to imply the following set of differential equations (or the differential-difference equation) where parameter $\lambda$ is associated with your aforementioned statement of the Poisson property:
$$p_n'(t) = -\lambda p_n(t) + \lambda p_{n-1}(t), \quad n \ge 1,$$
$$p_0'(t) = -\lambda p_0(t)$$
for all $t \ge 0$, where $\lambda > 0$.
(iii) Define the probability generating function for the stochastic process $\{X(t): t \ge 0\}$. Use this generating function and the above equation to prove that the solution to the system is the Poisson distribution.
(iv) Suppose that we generalize this model by allowing the birth rate to become a function of $t$. Recompute the above generating function and derive the new expression for $p_n(t)$. What is the expected value of this new distribution?
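As a numerical companion to parts (ii) and (iii) of Q.5, the sketch below checks that the Poisson probabilities $p_n(t) = e^{-\lambda t}(\lambda t)^n/n!$ satisfy the differential-difference equations, using a central finite difference for the time derivative. The function names and the step size `h` are illustrative assumptions; this is a spot check, not the requested derivation.

```python
import math


def poisson_p(n, t, lam):
    """p_n(t) for a Poisson process with rate lam; zero for n < 0 by convention."""
    if n < 0:
        return 0.0
    mean = lam * t
    return math.exp(-mean) * mean**n / math.factorial(n)


def time_derivative(n, t, lam, h=1e-5):
    """Central finite-difference approximation to p_n'(t)."""
    return (poisson_p(n, t + h, lam) - poisson_p(n, t - h, lam)) / (2 * h)


def right_hand_side(n, t, lam):
    """-lam * p_n(t) + lam * p_{n-1}(t); the second term vanishes when n = 0."""
    return -lam * poisson_p(n, t, lam) + lam * poisson_p(n - 1, t, lam)


if __name__ == "__main__":
    for n in range(4):
        # The two columns should agree to within the finite-difference error.
        print(n, time_derivative(n, 1.5, 2.0), right_hand_side(n, 1.5, 2.0))
```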
BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9:30 a.m. - 1:30 p.m., Monday, August 16, 1982)
Q.1. $X_1, \dots, X_n$ are independent random variables, each having the uniform distribution with probability density function
$$f(x) = \begin{cases} 1/(\theta_2 - \theta_1), & \theta_1 < x < \theta_2 \\ 0, & \text{otherwise} \end{cases}$$
where $\theta_1, \theta_2$ are unknown parameters.
(i) Find a minimal sufficient statistic for $(\theta_1, \theta_2)$. [In this context, define clearly sufficient statistics and minimal sufficient statistics.]
(ii) Given that $\theta_2 > 0$ and that $\theta_1 = k\theta_2$, $k$ known, find a minimal sufficient statistic for $\theta_2$ when
(a) $0 < k < 1$;
(b) $k = 0$;
(c) $k < 0$.
Q.2.
(a) Let $X_1, \dots, X_n$ be $n$ independent random variables with a continuous distribution function $F$, and let $X_{n:1} \le \dots \le X_{n:n}$ be the order statistics corresponding to $X_1, \dots, X_n$. If $\xi_p$ is the population $p$-quantile of $F$, so that $F(\xi_p) = p$ for $0 < p < 1$, then show that
$$P\{X_{n:r} \le \xi_p \le X_{n:s}\} = \sum_{i=r}^{s-1} \binom{n}{i} p^i (1-p)^{n-i}$$
for every $(r, s)$: $1 \le r < s \le n$. Also, show that if
$$T_n^+ = \sum_{i=1}^{n} \psi_i R_i$$
where $R_i$ is the rank of $|X_i|$ among $|X_1|, \dots, |X_n|$, for $i = 1, \dots, n$, and $\psi_i$ is equal to 1 or 0 according as $X_i$ is $>$ or $< 0$, then
$$P\{ 1 \le T_n^+ \le n(n+1)/2 - 1 \mid H_0 \} = 1 - 2^{-n+1}$$
where $H_0$ refers to the null hypothesis that $F$ is symmetric about 0.
(b) Suppose two independent observations on the random variable $X$ yield the values 3 and 26. Consider using [3, 26] as a confidence interval for the population median ($\xi_{.5}$). What confidence coefficient can be assigned to this interval, assuming $X$ is normally distributed? Without the normality assumption (based on the sign test, and also based on the Wilcoxon signed-rank test)?
(c) A third independent observation on the same random variable yields the value 4. With what confidence coefficient can it now be asserted that the population median is in the interval [3, 26], based on the assumption that $X$ is normally distributed? Without the normality assumption (again based on the sign test, and also on the Wilcoxon signed-rank test)?
EDITORIAL NOTE: A table of the cumulative distribution function of Student's t for 1(1)5 degrees of freedom was provided.
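The distribution-free part of Q.2 rests on a binomial count of how many observations fall below the quantile. The sketch below assumes the standard result that, for continuous $F$, the probability that $[X_{n:r}, X_{n:s}]$ covers $\xi_p$ is $\sum_{i=r}^{s-1} \binom{n}{i} p^i (1-p)^{n-i}$; the function name is a hypothetical choice.

```python
import math


def quantile_interval_coverage(n, r, s, p=0.5):
    """P{X_(n:r) <= xi_p <= X_(n:s)} for a continuous F, via the binomial count below xi_p."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(r, s))


if __name__ == "__main__":
    # Two observations, interval from the smallest to the largest: coverage 0.5.
    print(quantile_interval_coverage(n=2, r=1, s=2))
    # Three observations, interval from the smallest to the largest: coverage 0.75.
    print(quantile_interval_coverage(n=3, r=1, s=3))
```

These two values correspond to the sign-test confidence coefficients for the interval [3, 26] before and after the third observation; the normal-theory and signed-rank answers require separate arguments.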
Q.3.
A sample consists of $n$ independent observations $X_1, X_2, \dots, X_n$. It is known that $X_i$ has been chosen randomly from a parent population with probability density function
(Note that $\Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\, dx\ (t > 0)$.)
(a) Find $E(X_i^r)$, $r > 0$.
(b) Use the result in part (a) to find explicit expressions for the mean and variance of the geometric mean $\bar X_g = \left( \prod_{i=1}^{n} X_i \right)^{1/n}$.
(c) Assuming that $\theta$ is known, show that the likelihood ratio test criterion for testing the null hypothesis that $\alpha_i$ has a common, but unspecified, value (say, $\alpha$) in all $n$ distributions involves comparing $\bar X_g$ with the arithmetic mean $\bar X = \frac{1}{n} \sum_{i=1}^{n} X_i$.
Q.4.
Suppose that in a model for onset of a disease we distinguish two states: $S_1$ - vulnerability state and $S_2$ - onset of a disease after reaching state $S_1$. Suppose that a healthy individual (at state $S_0$) was exposed to an agent which makes him liable to enter state $S_1$. Let $X_1 > 0$ denote the time to entry from $S_0$ to $S_1$, and $f_1(x_1)$ for $0 < x_1 < a$ and $f_1(x_1) = 0$ otherwise be the probability density function (PDF) of $X_1$. Let $X_2 > 0$ be the time from incidence of vulnerability to onset of a disease, and $f_{2|1}(x_2 \mid x_1)$ be the conditional PDF of $X_2$ given $X_1$. Let $Z = X_1 + X_2$ denote the total time to onset.
(a) Obtain the expression for the probability $\Pr\{X_1 + X_2 \le z\}$.
Consider a special case, where $X_1$ and $X_2$ are independent with $f_1(x_1) = 1/T$, $0 < x_1 < T$ (uniform) and $f_2(x_2) = \lambda e^{-\lambda x_2}$, $\lambda > 0$, $x_2 > 0$ (exponential).
(b) Obtain $\Pr\{X_1 + X_2 \le z\}$.
(c) Find the PDF $f_Z(z)$.
(d) Hence, or otherwise, find the mean and variance of $Z$.
Q.5.
Suppose we have a two-stage sampling design of equal-sized clusters in which we select SRS ($a$ of $A$) clusters in the first stage and SRS ($b$ of $B$) elements within each selected cluster. If the object is to estimate the population mean per element ($\bar Y_0$), the true variance of the estimator ($\bar y_{2cs}$) is known to be
$$\mathrm{Var}(\bar y_{2cs}) = (1 - f_a) \frac{S_a^2}{a} + (1 - f_b) \frac{S_b^2}{ab}$$
where $f_a = a/A$, $f_b = b/B$, $S^2 \doteq S_a^2 + S_b^2$ and $S^2$ is the element variance in the population.
(a) Ignoring all finite population corrections and allowing that, for all intents and purposes, $S^2 = S_a^2 + S_b^2$ and the intraclass correlation coefficient can be expressed as $\rho = 1 - S_b^2 / S^2$, show that
$$\mathrm{Var}(\bar y_{2cs}) = \frac{S^2}{ab} [1 + \rho(b - 1)] \qquad (1)$$
(b) Explain what is meant by "the sampling distribution of $\bar y_{2cs}$" and how it relates to the result of Equation (1).
(c) Define a "design effect." What part of Equation (1) represents the design effect for $\mathrm{Var}(\bar y_{2cs})$?
(d) Derive the formula you would use to determine the sample size, $n = ab$, required to yield some prespecified coefficient of variation of $\bar y_{2cs}$, given that reasonably good prior guesses about the sizes of $S^2$, $\bar Y_0$, and the design effect are available.
BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9:30 a.m. - 1:30 p.m., Monday, August 8, 1983)
INSTRUCTIONS
a) This is a closed-book examination.
b) The time limit is four hours.
c) Answer any four (but only four) of the five questions which follow.
d) Put the answers to different questions on separate sets of papers.
e) Put your code letter, not your name, on each page.
f) Return the examination with a signed statement of honor pledge on a page separate from your answers.
g) You are required to answer only what is asked in the questions and not all you know about the topics.
Q.1
Suppose you are designing a two-stage cluster sample of households for a health survey being planned in a large city. You propose to obtain personal interviews from 1000 households selected from a sample of 50 primary sampling units, which for this survey would be area segments consisting of a small number of neighboring blocks. The principal goal of the survey is to estimate the proportion ($P$) of households in the city with health insurance. Your best guess is that $P$ is likely to be about 0.20 in the population and that the design effect (Deff) for your proposed sample will be about 1.50.
a) Describe in words how you would select the area segments and households within segments to produce an epsem sample of 1000 households.
b) An appropriate estimator for $P$ in the resulting sample would be the proportion ($p$) of sample households with health insurance. Ignoring the finite population correction, how large would you expect the coefficient of variation to be for $p$ as obtained in the proposed survey?
Hint: Recall that the true variance for an estimated mean from a simple random sample of size $n$ is
$$\mathrm{Var}(\bar y) = (1 - f) \frac{S^2}{n}$$
where $f$ is the sampling rate and $S^2$ is the variance among elements in the population.
c) Assuming that the anticipated design effect of 1.50 is correct, what would be your estimate of intraclass correlation for the estimate of $P$ in your design?
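The arithmetic behind parts (b) and (c) can be sketched as follows. The sketch assumes (these are not stated in the question) that the element variance of a 0/1 attribute is approximately $P(1-P)$, that the cluster-sample variance is the simple-random-sample variance multiplied by Deff, and that Deff $= 1 + \rho(b - 1)$ with $b = 1000/50 = 20$ households per segment. The function names are hypothetical.

```python
import math


def cv_of_proportion(p, n, deff):
    """Coefficient of variation of p, ignoring the fpc: sqrt(deff * P(1-P) / n) / P."""
    variance = deff * p * (1.0 - p) / n
    return math.sqrt(variance) / p


def intraclass_correlation(deff, cluster_size):
    """Solve deff = 1 + rho * (b - 1) for rho, with b the sample size per cluster."""
    return (deff - 1.0) / (cluster_size - 1.0)


if __name__ == "__main__":
    households, segments = 1000, 50
    print(cv_of_proportion(p=0.20, n=households, deff=1.50))      # roughly 0.077
    print(intraclass_correlation(1.50, households / segments))    # roughly 0.026
```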
Q.2
a) Find a probability distribution $\{r_n: n = 0, 1, \dots\}$ which satisfies the difference equation:
$$r_{n+1} + p\, r_n = r_n$$
for $0 < p < 1$. What is this distribution?
b) Show that the pgf of $\{r_n: n = 0, 1, \dots\}$, say $R(s)$, satisfies the equation:
$$p R'(s) = q [R(s)]^2$$
where $q = 1 - p$.
c) Use part (b) to derive the mean and variance of $\{r_n; n = 0, 1, \dots\}$.
d) What is the probability of occurrence at time $n$ given no previous occurrence?
e) Give an empirical example for which one might use this probability distribution and discuss the implications of the assumptions inherent in its definition; e.g., look at the ratio $r_{n+1}/r_n$. What is the mode of the distribution; i.e., at what point is the distribution at its maximum?
Q.3
Suppose that the number of new cases $X_i$ of lung cancer developing in region $i$ ($i = 1, 2$) over a certain specified period of time is to be modeled by the Poisson distribution with parameter $M_i \lambda_i$, where $M_i$ is the total number of person-years of residence for all individuals living in the $i$-th region at various times during the specified time period, and where $\lambda_i$ is the rate of lung cancer development per person-year of residence during that time period. Theoretically, $M_i$ represents the sum over all individuals residing in region $i$ (during the specified time period) of the number of years of residence for each individual.
a) Treating $M_1$ and $M_2$ as fixed constants, show that the likelihood ratio test statistic for testing $H_0: \lambda_1 = \lambda_2$ versus $H_1: \lambda_1 \ne \lambda_2$ can be written in the form
$$\left[ \left( \frac{M_1 + M_2}{M_1} \right) Y_1 \right]^{-X_1} \left[ \left( \frac{M_1 + M_2}{M_2} \right) (1 - Y_1) \right]^{-X_2}$$
where $Y_1 = X_1 / (X_1 + X_2)$. You can assume that $X_1$ and $X_2$ are independent random variables.
b) Under $H_0: \lambda_1 = \lambda_2$, and conditional on $(X_1 + X_2) = k$, a fixed positive integer, show that the rejection region for the likelihood ratio test statistic developed in part (a) can be based on the binomial distribution with "sample size" $k$ and parameter $p = M_1 / (M_1 + M_2)$. [HINT: Find the distribution of $X_1$ given $(X_1 + X_2) = k$ under $H_0: \lambda_1 = \lambda_2$ ($= \lambda$, say).]
c) Suppose that $M_i$ has a gamma distribution with parameters $\alpha_i$ and $\beta_i$. Assuming that the conditional distribution of $X_i$ given $M_i = m_i$ is Poisson with parameter $m_i \lambda_i$, find the unconditional distribution of $X_i$. When $\beta_i$ is a positive integer, show that this distribution is negative binomial.
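The hint in part (b) can be checked numerically. The sketch below compares the conditional probability of $X_1 = x_1$ given $X_1 + X_2 = k$, computed directly from the two Poisson distributions under $\lambda_1 = \lambda_2 = \lambda$, with the binomial probability with parameter $M_1/(M_1 + M_2)$. The function names and the numerical values of $M_1$, $M_2$ and $\lambda$ are illustrative assumptions.

```python
import math


def poisson_pmf(k, mean):
    """Poisson probability of k events with the given mean."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def conditional_pmf_x1(x1, k, m1, m2, lam):
    """P(X1 = x1 | X1 + X2 = k) for independent Poisson counts with means m1*lam, m2*lam."""
    joint = poisson_pmf(x1, m1 * lam) * poisson_pmf(k - x1, m2 * lam)
    total = poisson_pmf(k, (m1 + m2) * lam)  # the sum of the two counts is again Poisson
    return joint / total


def binomial_pmf(x, k, p):
    """Binomial probability of x successes in k trials."""
    return math.comb(k, x) * p**x * (1 - p) ** (k - x)


if __name__ == "__main__":
    m1, m2, lam, k = 3.0, 5.0, 0.7, 6
    for x1 in range(k + 1):
        # The two columns should agree and should not depend on lam.
        print(x1, conditional_pmf_x1(x1, k, m1, m2, lam), binomial_pmf(x1, k, m1 / (m1 + m2)))
```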
Q.4
Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be $n$ independent (bivariate) random vectors with a continuous distribution $F(x, y)$, and let $F_1(x) = F(x, \infty)$ and $F_2(y) = F(\infty, y)$, $(x, y) \in E^2$. Let $R_i$ ($Q_i$) be the rank of $X_i$ ($Y_i$) among $X_1, \dots, X_n$ ($Y_1, \dots, Y_n$), for $i = 1, \dots, n$. Consider a statistic of the form
$$L_n = \sum_{i=1}^{n} a_n(R_i)\, b_n(Q_i)$$
where $a_n(1) \le \dots \le a_n(n)$ and $b_n(1) \le \dots \le b_n(n)$ are given scores. Consider the null hypothesis $H_0$ that $F(x, y) = F_1(x) F_2(y)$, i.e., $X$ and $Y$ are stochastically independent. Further let
$$L_n^* = \sum_{\alpha=1}^{n} a_n(\alpha)\, b_n(S_\alpha),$$
where $(S_1, \dots, S_n)$ takes on each permutation of $(1, \dots, n)$ with the common probability $(n!)^{-1}$.
a) Show that under $H_0$, $L_n$ and $L_n^*$ both have the same distribution (which does not depend on $F_1$ and $F_2$).
b) Using (a) [or otherwise], prove that
$$E(L_n \mid H_0) = n \bar a_n \bar b_n \quad \text{and} \quad \mathrm{Var}(L_n \mid H_0) = (n - 1) A_n^2 B_n^2$$
where
$$\bar a_n = \frac{1}{n} \sum_{i=1}^{n} a_n(i), \qquad A_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} [a_n(i) - \bar a_n]^2$$
and $\bar b_n$ and $B_n^2$ are defined similarly.
c) Show that the distribution of $L_n$ (under $H_0$) is symmetric if either $a_n(i) + a_n(n - i + 1) = $ constant, for all $i\ (\le n)$ or $b_n(i) + b_n(n - i + 1) = $ constant, for all $i\ (\le n)$.
d) Consider the Spearman rank correlation and comment on (b) and (c) for this statistic.
Q.5
a) State the components of variance model for the one-way layout consisting of $k$ samples of $n$ observations each. Use the notations $\sigma_B^2$ and $\sigma_W^2$ for the between-sample and within-sample components of variance, respectively.
b) Using the notations SSB and SSW for the between-sample and within-sample sums of squares, respectively, set up and fill out an analysis of variance table for this one-way layout, with rows "between" and "within" and columns "DF", "SS", "MS", "EMS", and "VR".
c) Let $\theta = \sigma_B^2 / \sigma_W^2$. Give formulas for the point estimator of $\theta$ and for a two-sided $100(1 - \alpha)\%$ confidence interval on $\theta$. Note that $\theta$ cannot be negative. How do your formulas take this fact into account?
d) State how to test the hypothesis that $\theta = \theta_0$ against the one-sided alternative $\theta > \theta_0$. What is the power of the test against the specific alternative $\theta = \theta_1 > \theta_0$?
e) Suppose $k = 3$, $n = 6$, $\alpha = .10$, $\theta_0 = 1$, $\theta_1 = 2$; the 3 sample means are $\bar x_1 = 5$, $\bar x_2 = 6$, $\bar x_3 = 7$; and SSW $= 1800$. Write down the analysis of variance table for these data; calculate the point estimate and the confidence interval of (c); perform the test of (d) and evaluate its power.
NOTE: If $X$ is distributed as $F$ with 2 and $m$ degrees of freedom, then
$$P\{X > c\} = (1 + 2c/m)^{-m/2} .$$
BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9:30 a.m. - 1:30 p.m., Monday, July 30, 1984)
INSTRUCTIONS
a) This is a closed-book examination.
b) The time limit is four hours.
c) Answer any four (but only four) of the five questions which follow.
d) Put the answers to different questions on separate sets of papers.
e) Put your code letter, not your name, on each page.
f) Return the examination with a signed statement of honor pledge on a page separate from your answers.
g) You are required to answer only what is asked in the questions and not all you know about the topics.
Q1.
Suppose that $n$ integers are drawn at random and with replacement from the integers $1, 2, \dots, N$. That is, each sampled integer has probability $1/N$ of taking on any of the values $1, 2, \dots, N$, and the sampled values are independent observations. Further, assume that $N$ is an unknown parameter to be estimated.
a) If $Y_1, Y_2, \dots, Y_n$ represent the random sample of $n$ observations, show that the estimator of $N$ obtained by the method of moments is
$$\hat N_1 = (2\bar Y - 1)$$
where $\bar Y = n^{-1} \sum_{i=1}^{n} Y_i$.
b) Show that
$$\mathrm{Var}(\hat N_1) = \frac{1}{3n} (N^2 - 1) .$$
c) Show that the maximum likelihood estimator of $N$ is
$$\hat N_2 = \max(Y_1, Y_2, \dots, Y_n) .$$
d) Find an exact expression for $\Pr(\hat N_2 = k)$.
e) Assuming that
for large $N$,
and that $\left( \frac{n+1}{n} \right) \hat N_2$ is approximately an unbiased estimator of $N$, show that
for large $N$ when $n > 1$.
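The distribution asked for in part (d) can be tabulated directly. The sketch below assumes that the maximum of $n$ independent draws from $\{1, \dots, N\}$ equals $k$ with probability $(k/N)^n - ((k-1)/N)^n$, which follows from $\Pr(\max \le k) = (k/N)^n$; the function names are hypothetical.

```python
def max_pmf(k, N, n):
    """P(max of n draws = k) = P(max <= k) - P(max <= k - 1), for k = 1, ..., N."""
    return (k / N) ** n - ((k - 1) / N) ** n


def expected_max(N, n):
    """Exact expectation of the sample maximum, by summing k * P(max = k)."""
    return sum(k * max_pmf(k, N, n) for k in range(1, N + 1))


if __name__ == "__main__":
    N, n = 1000, 5
    print(sum(max_pmf(k, N, n) for k in range(1, N + 1)))  # should be very close to 1
    print((n + 1) / n * expected_max(N, n))                 # should be close to N
```

The second printed value illustrates why rescaling the maximum by $(n+1)/n$ is approximately unbiased for large $N$, as assumed in part (e).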
Q2.
Suppose that a random sample of length-of-life measurements $Y_1, Y_2, \dots, Y_n$ is to be taken on components whose length of life has an exponential distribution with mean $\theta$. It is of interest to estimate unbiasedly the parameter $\bar F(t) = e^{-t/\theta}$, the reliability at time $t$ of a component of this type.
a) Find the maximum likelihood estimator of $e^{-t/\theta}$. Note that this estimator is not an unbiased estimator of $e^{-t/\theta}$.
b) Use the Rao-Blackwell Theorem to show that
$$\hat{\bar F}(t) = \begin{cases} (1 - t/U)^{n-1}, & U > t \\ 0, & \text{otherwise,} \end{cases}$$
where $U = \sum_{i=1}^{n} Y_i$, is an unbiased estimator of $\bar F(t)$.
[HINT: Let $V = 1$ if $Y_1 > t$ and $V = 0$ otherwise; then, find $E(V \mid U = u)$.]
c) Show that $\hat{\bar F}(t)$ found in part (b) is the minimum variance unbiased estimator of $e^{-t/\theta}$.
Q3.
Let $X_1, \dots, X_n$ be $n$ independent random variables with continuous distributions $F_1, \dots, F_n$, all defined on $(-\infty, \infty)$. Let $R_i$ be the rank of $X_i$ among $X_1, \dots, X_n$, for $i = 1, \dots, n$, and for each $n$, let $a_n(1), \dots, a_n(n)$ be a set of real numbers (scores). Consider the statistic
where
are known constants, not all equal, and
where $\bar a_n = n^{-1} (a_n(1) + \dots + a_n(n))$.
b) Show that the Wilcoxon-Mann-Whitney two-sample rank sum statistic is a particular case of $L_n$.
c) Show that the Brown-Mood two-sample median test statistic is a particular case of $L_n$.
d) For the case (c) derive the exact probability law of $L_n$, under $H_0$ when $n_1$ and $n_2$ are the individual sample sizes.
e) What can you say about the distribution of $L_n$ (under $H_0$) when $n$ is small and what large sample approximation is available?
Q4.
Suppose that stratified simple random sampling is the design of choice for a survey you're helping to design and that the survey's unit costs will be the same among all strata (i.e., $J_h = J^*$; $h = 1, 2, \dots, H$).
a) Assuming the conditions stated above, show that the optimum stratum sample size will be
$$n_h^{opt} = n \frac{W_h S_h}{\sum_{h}^{H} W_h S_h}$$
where $n$ is the total sample size, $W_h$ is the proportion of the population in the $h$-th stratum, $S_h$ is the stratum element standard deviation, and "optimum" here refers to minimizing the true variance of an estimated population mean given prespecified total survey cost following the model,
$$\text{Total Cost} = J = J_0 + n J^* ,$$
where $J_0$ and $J^*$ are fixed costs not dependent on the sample size.
b) Assuming the element coefficient of variation for each stratum is the same, show that $n_h^{opt} = K Y_h$, where $Y_h$ is the stratum total and $K$ is a constant of proportionality.
c) What is the name given to the design which results when stratum unit costs and element variances are equal among strata (i.e., when $J_h = J^*$ and $S_h^2 = S^{*2}$, $h = 1, 2, \dots, H$)? Is this design epsem? Explain.
Q5.
Suppose that $\{N_1(t), t \ge 0\}$ and $\{N_2(t), t \ge 0\}$ are independent Poisson processes with rates $\lambda_1$ and $\lambda_2$:
$$P\{N_1(t) = n\} = e^{-\lambda_1 t} \frac{(\lambda_1 t)^n}{n!} .$$
a) Show that $\{N_1(t) + N_2(t), t \ge 0\}$ is Poisson with rate $\lambda_1 + \lambda_2$.
b) Show that the probability that the first event of the combined process comes from $\{N_1(t), t \ge 0\}$ is $\lambda_1 / (\lambda_1 + \lambda_2)$ independently of the time of the event.
c) Fix $t_0 < t_1 < \dots < t_n$. Find the conditional distribution of the number of arrivals for $N_1(t) + N_2(t)$ in the disjoint time intervals $(t_0, t_1], (t_1, t_2], \dots, (t_{n-1}, t_n]$ given a total of $X$ arrivals in $[t_0, t_n]$.
BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(10:00 a.m. - 2:00 p.m., Tuesday, July 23, 1985)
INSTRUCTIONS
a) This is a closed-book examination.
b) The time limit is four hours.
c) Answer any four (but only four) of the five questions which follow.
d) Put the answers to different questions on separate sets of papers.
e) Put your code letter, not your name, on each page.
f) Return the examination with a signed statement of honor pledge on a page separate from your answers.
g) You are required to answer only what is asked in the questions and not all you know about the topics.
Q1.
The normally distributed random variables $X_1, X_2, \dots, X_n$ are said to be serially correlated or to follow an autoregressive model if
$$X_i = \theta X_{i-1} + \epsilon_i , \quad i = 1, 2, \dots, n,$$
where $X_0 = 0$ and $\epsilon_1, \epsilon_2, \dots, \epsilon_n$ are independent $N(0, \sigma^2)$ random variables.
a) Find $\mathrm{corr}(X_1, X_2)$.
b) Find $f_{X_2}(x_2 \mid X_1 = x_1)$, the conditional distribution of $X_2$ given $X_1 = x_1$.
d) Prove that the joint distribution of $X_1, X_2, \dots, X_n$ is
$$f_{X_1, X_2, \dots, X_n}(x_1, x_2, \dots, x_n) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -(1/2\sigma^2) \sum_{i=1}^{n} (x_i - \theta x_{i-1})^2 \right\},$$
$-\infty < x_i < +\infty$, $i = 1, 2, \dots, n$, $x_0 = 0$.
e) Using one observation $\mathbf{x} = (x_1, x_2, \dots, x_n)$ from the joint density given in part (d), show that the likelihood ratio test of $H_0: \theta = 0$ versus $H_A: \theta \ne 0$ can be expressed directly as a function of the statistic
$$\left( \sum_{i=2}^{n} x_i x_{i-1} \right)^2 \Big/ \left( \sum_{i=1}^{n-1} x_i^2 \right) \left( \sum_{i=1}^{n} x_i^2 \right) .$$
Q2.
Suppose you are asked to design the sample for a survey of patients seen during 1984 in a large hospital. The main purpose of the survey will be to estimate $D = P_M - P_F$, where $P_M$ and $P_F$ are, respectively, the proportions of all male and female patients without health insurance. You propose to select a stratified simple random sample (selected without replacement) of size $n$ from among the 5000 patients in the survey population. Stratification will be by sex, with each stratum containing half of the population.
a) Suppose (1) that $D = 0$, and (2) that total survey cost can be expressed as
$$C = C_M n_M + C_F n_F ,$$
where the average per-unit costs for males ($C_M$) and females ($C_F$) are equal, and $n_M$ and $n_F$ are, respectively, the male and female sample sizes. Show that the optimum allocation of the $n$ patients among strata is proportionate allocation.
b) Suppose that, in fact, $P_M = 0.40$ and $P_F = 0.20$; how large a sample from your proportionate stratified sample would be needed to yield a 10 percent coefficient of variation for the unbiased estimator, $d = p_M - p_F$, of $D$, where $p_M$ and $p_F$ are, respectively, the sample proportions of male and female patients without insurance?
c) If an unstratified simple random sample had been proposed instead of the stratified simple random sample, show that the expected size of $n_F$ (the number of females in the sample) would have been $2500(n/N)$.
Q3.
a) Define the following:
i) Markov chain;
ii) homogeneous Markov chain;
iii) persistent state, transient state.
b) Let $X_1, X_2, \dots$ be independent integer-valued random variables with density $f$. Let $Y_0$ be an integer-valued random variable that is independent of the $X_i$'s, and set
$$Y_n = Y_0 + X_1 + \dots + X_n , \quad n \ge 1.$$
Show that the sequence $\{Y_n, n \ge 0\}$ is a Markov chain whose state space is the set of all integers and whose transition function $P(x, y)$ $[= P(Y_1 = y \mid Y_0 = x)]$ is given by
$$P(x, y) = f(y - x) .$$
c) Let $\{Y_n, n \ge 0\}$ be a Markov chain. Given the $n$-step transition probabilities
$$p_{ij}^{(n)} = P(Y_{m+n} = j \mid Y_m = i),$$
write down the Chapman-Kolmogorov equations.
Q4.
Let $X_1, X_2, \dots, X_n$ be random variables such that $E(X_i) = \mu$, $V(X_i) = \sigma^2$, $\mathrm{corr}(X_i, X_j) = \rho$ for every $i \ne j$, $i = 1, 2, \dots, n$ and $j = 1, 2, \dots, n$. Define
$$L_1 = \sum_{i=1}^{n} a_i X_i \quad \text{and} \quad L_2 = \sum_{i=1}^{n} b_i X_i ,$$
where the $a_i$'s and $b_i$'s are constants.
a) Find a general expression for $\mathrm{corr}(L_1, L_2)$ as a function of the $a_i$'s, $b_i$'s and $\rho$.
b) Using the result derived in part (a), or otherwise, find
c) When $L_1 = X_1$ and $L_2 = (X_2 - X_1)$, and when $E(X_2 \mid X_1 = x_1) = \alpha + \beta x_1$, express $\mathrm{corr}(L_1, L_2)$ as a function of $\beta$. What is $\mathrm{corr}(L_1, L_2)$ when $\beta = 0$?
Q5.
Consider the density function
$$f_X(x; \theta) = \theta^{-1} c^{1/\theta} x^{-[1 + 1/\theta]} , \quad x > c, \quad \theta > 0, \quad c > 0.$$
Here, $\theta$ is an unknown parameter and $c$ is a known constant.
a) For $r > 0$, find $E(X^r)$. For what set of values of $r$ does $E(X^r)$ exist?
b) For $\theta < 1$, if $X_1, X_2, \dots, X_n$ constitute a random sample of size $n$ from $f_X(x; \theta)$, show that
$$\hat\xi = \frac{\bar X}{c} - 1 , \quad \text{where } \bar X = n^{-1} \sum_{i=1}^{n} X_i ,$$
is an unbiased estimator of $\xi = \theta / (1 - \theta)$.
c) Find $V(\hat\xi)$. Under what conditions on $\theta$ does $V(\hat\xi)$ exist?
d) Find an explicit expression (as a function of $\xi$) for the Cramér-Rao lower bound for the variance of any unbiased estimator of $\xi$. Is $\hat\xi$ the minimum variance bound unbiased estimator of $\xi$?
BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
(9:00 a.m. - 1:00 p.m., Tuesday, July 29, 1986)
INSTRUCTIONS
a) This is a closed-book examination.
b) The time limit is four hours.
c) Answer any four (but only four) of the five questions which follow.
d) Put the answers to different questions on separate sets of papers.
e) Put your code letter, not your name, on each page.
f) Return the examination with a signed statement of honor pledge on a page separate from your answers.
g) You are required to answer only what is asked in the questions and not all you know about the topics.
Q1.
Suppose that we have available $n$ sets of observations of the form $(Y_i, x_{1i}, x_{2i})$, $i = 1, 2, \dots, n$, gathered under the following conditions:
i) the $x$ values, namely $\{x_{1i}\}_{i=1}^{n}$ and $\{x_{2i}\}_{i=1}^{n}$, are fixed, known constants;
ii) for each $i$, $Y_i$ is a normally distributed random variable with mean
$$E(Y_i \mid x_{1i}, x_{2i}) = \beta_0 + \beta_1 (x_{1i} - \bar x_1) + \beta_2 (x_{2i} - \bar x_2),$$
where $\bar x_1 = n^{-1} \sum_{i=1}^{n} x_{1i}$ and $\bar x_2 = n^{-1} \sum_{i=1}^{n} x_{2i}$, and with variance $V(Y_i \mid x_{1i}, x_{2i}) = \sigma^2$;
iii) $Y_1, Y_2, \dots, Y_n$ constitute a set of mutually independent random variables.
Define
$$S_{x_1}^2 = (n-1)^{-1} \sum_{i=1}^{n} (x_{1i} - \bar x_1)^2 , \qquad S_{x_2}^2 = (n-1)^{-1} \sum_{i=1}^{n} (x_{2i} - \bar x_2)^2 ,$$
and
$$\sum_{i=1}^{n} (x_{1i} - \bar x_1)(x_{2i} - \bar x_2) = (n-1)\, r\, S_{x_1} S_{x_2} .$$
a) Assuming that the regression model in (ii) above is to be fit by the method of least squares, show that the ordinary least squares estimator $\hat\beta_2$ of $\beta_2$ has variance
$$V(\hat\beta_2) = \frac{\sigma^2}{(n-1) S_{x_2}^2 (1 - r^2)} .$$
b) Now, suppose that we have available an extraneous unbiased estimator $b_1$ of $\beta_1$ (i.e., an independent estimator of $\beta_1$ not based on the $n$ observations currently under consideration). To estimate $\beta_2$, consider the regression of $[Y_i - b_1 (x_{1i} - \bar x_1)]$ on $(x_{2i} - \bar x_2)$. In particular, show that the estimator
$$b_2 = \frac{\sum_{i=1}^{n} (x_{2i} - \bar x_2) [Y_i - b_1 (x_{1i} - \bar x_1)]}{(n-1) S_{x_2}^2}$$
is an unbiased estimator of $\beta_2$ with variance
$$V(b_2) = \frac{\sigma^2 + (n-1)\, \sigma_b^2\, r^2 S_{x_1}^2}{(n-1) S_{x_2}^2} ,$$
where $\sigma_b^2 = V(b_1)$.
c) Using the results in parts (a) and (b), show that the use of the extraneous information about $\beta_1$ (namely, $b_1$) increases precision when estimating $\beta_2$ over just the use of $\hat\beta_2$ if and only if
$$(n-1)(1 - r^2) S_{x_1}^2\, \sigma_b^2 < \sigma^2 .$$
Q2.
Let $X_1, \dots,$
$X_n$ be a random sample from a Poisson distribution with parameter $\lambda$, i.e.,
$$\Pr\{X = k\} = e^{-\lambda} \lambda^k / k! , \quad k = 0, 1, \dots ; \quad \lambda > 0.$$
Define $w = \Pr\{X \le 1\} = (1 + \lambda) e^{-\lambda}$.
a) Show that $\sum_{i=1}^{n} X_i$ ($= T$, say) is a complete sufficient statistic for $w$.
b) Show that $I[X_1 \le 1]$ is an unbiased estimator of $w$. ($I_A$ stands for the indicator function of the set $A$.)
c) Obtain the minimum variance unbiased estimator of $w$.
d) Show that the statistic obtained in (c) is a function of $T$, and is indeed an unbiased estimator of $w$.
124
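The conditioning argument behind parts (b) to (d) of Q2 can be checked numerically. The sketch below is an illustration only, not part of the examination: it assumes the candidate estimator is $E[I_{[X_1 \le 1]} \mid T = t]$, uses the standard fact that $X_1$ given $T = t$ is Binomial(t, 1/n), and verifies that its expectation under T ~ Poisson($n\lambda$) matches $(1+\lambda)e^{-\lambda}$. The function names, the truncation point `t_max`, and the values of `n` and `lam` are all hypothetical choices.

```python
import math


def conditional_estimator(t: int, n: int) -> float:
    """E[I(X_1 <= 1) | T = t], using X_1 | T = t ~ Binomial(t, 1/n)."""
    q = (n - 1) / n
    if t == 0:
        return 1.0
    # P(X_1 = 0 | T = t) + P(X_1 = 1 | T = t)
    return q ** t + t * (1.0 / n) * q ** (t - 1)


def expectation_under_poisson(n: int, lam: float, t_max: int = 400) -> float:
    """Sum of conditional_estimator(t, n) * P(T = t) for T ~ Poisson(n * lam),
    truncated at t_max (an arbitrary cutoff, ample for moderate n * lam)."""
    mean = n * lam
    log_pmf = -mean  # log P(T = 0)
    total = 0.0
    for t in range(t_max + 1):
        total += conditional_estimator(t, n) * math.exp(log_pmf)
        log_pmf += math.log(mean) - math.log(t + 1)  # step to log P(T = t + 1)
    return total


n, lam = 5, 1.3  # illustrative values, not taken from the examination
w = (1 + lam) * math.exp(-lam)
print(expectation_under_poisson(n, lam), w)
```

The two printed numbers should agree to within floating-point error for any moderate choice of `n` and `lam`; this is a check of unbiasedness only and says nothing about completeness.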
Q3.
Consider the situation where our goal is to estimate the proportion (P)
with some attribute in a population of N elements that has been divided
into H strata, with the proportion ($P_h$) of the $N_h$ elements in the h-th
stratum possessing the same attribute.
8 pts.
a) Suppose that a (without replacement) simple random sample of $n_h$
elements is chosen independently in each stratum, and that the
estimator for P is $p = \sum_{h=1}^{H} W_h p_h$, where $p_h$ is the proportion of the
$n_h$ elements with the attribute and $W_h = N_h/N$. Derive the true
variance of p.
[Hint: Recall that the true variance of the stratum sample mean
($\bar{y}_h$) in this setting would be
$$\mathrm{var}(\bar{y}_h) = \frac{(1 - f_h)}{n_h}\, S_h^2 = \frac{(1 - f_h)}{n_h}\, \frac{\sum_{i=1}^{N_h} (Y_{hi} - \bar{Y}_h)^2}{(N_h - 1)},$$
where $Y_{hi}$ is the i-th observation in the h-th stratum, $\bar{Y}_h$ is the
overall stratum mean, and $f_h = n_h/N_h$.]
5 pts.
b) If proportionate stratified simple random sampling is used in Part
(a), show that
$$\mathrm{Var}(p) = \frac{(1 - f)}{n} \sum_{h=1}^{H} W_h\, \frac{N_h}{(N_h - 1)}\, P_h (1 - P_h),$$
where $n = \sum_{h=1}^{H} n_h$ and $f = n/N$.
8 pts.
c) If $\mathrm{Var}(p_0)$ represents the true variance of the proportion ($p_0$)
of a simple random sample of size n who possess the attribute,
show that
$$\mathrm{Var}(p_0) - \mathrm{Var}(p) \approx \frac{(1 - f)}{n} \sum_{h=1}^{H} W_h (P_h - P)^2.$$
[Hint: Partition $S^2 = \sum_{h=1}^{H} \sum_{i=1}^{N_h} (Y_{hi} - \bar{Y})^2 / (N-1)$ into between and within
stratum components, and note that $N(N_h - 1)/[N_h(N - 1)] \approx 1$.]
d) Show how the result from part (c) supports the general notion that
stratified sampling (as opposed to no stratification in sampling)
is most effective when internally homogeneous strata can be formed
prior to selection.
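The comparison in parts (b) to (d) of Q3 can be illustrated with a small numerical sketch. It is not part of the examination. It assumes the variance in part (a) is obtained by applying the hint stratum by stratum with $S_h^2 = N_h P_h (1 - P_h)/(N_h - 1)$ for a 0/1 attribute; the population sizes, proportions, and function names below are hypothetical.

```python
def var_stratified(N_h, P_h, n_h):
    """Variance of p = sum_h W_h p_h, built from the hint with
    S_h^2 = N_h P_h (1 - P_h) / (N_h - 1) for a 0/1 attribute."""
    N = sum(N_h)
    total = 0.0
    for Nh, Ph, nh in zip(N_h, P_h, n_h):
        W = Nh / N
        S2 = Nh * Ph * (1 - Ph) / (Nh - 1)
        total += W ** 2 * (1 - nh / Nh) * S2 / nh
    return total


def var_srs(N, P, n):
    """Variance of the sample proportion under simple random sampling."""
    return (1 - n / N) / n * N * P * (1 - P) / (N - 1)


# Hypothetical population: stratum sizes and stratum proportions.
N_h = [2000, 3000, 5000]
P_h = [0.1, 0.3, 0.7]
n = 100
N = sum(N_h)
n_h = [n * Nh // N for Nh in N_h]  # proportionate allocation: 20, 30, 50
P = sum(Nh * Ph for Nh, Ph in zip(N_h, P_h)) / N
f = n / N

# Closed form stated in part (b).
part_b = (1 - f) / n * sum(Nh / N * Nh / (Nh - 1) * Ph * (1 - Ph)
                           for Nh, Ph in zip(N_h, P_h))
# Exact gain from stratifying versus the approximation in part (c).
gain = var_srs(N, P, n) - var_stratified(N_h, P_h, n_h)
approx = (1 - f) / n * sum(Nh / N * (Ph - P) ** 2 for Nh, Ph in zip(N_h, P_h))
print(var_stratified(N_h, P_h, n_h), part_b)
print(gain, approx)
```

In this example the strata differ sharply in $P_h$, so the between-stratum term is large and the gain from stratification is substantial; making the $P_h$ values equal drives the approximate gain to zero.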
Q4.
Suppose that X is a random variable with density function f(x), and
with cumulative distribution function (cdf) $F(z) = \Pr(X < z)$.
a) Show that $E(X) = \int_{x=0}^{\infty} \int_{z=0}^{x} f(x)\, dz\, dx$.
b) Use part (a) to show that $E(X) = \int_{0}^{\infty} [1 - F(z)]\, dz$.
Suppose that X is the failure time of a certain piece of equipment. A
replacement policy might be to replace the piece either upon failure
or upon reaching age c, whichever event happens first. Then, $X_c$, the
time of replacement, is expressible as
$$X_c = \min(c, X) = \begin{cases} X & \text{if } X \le c, \\ c & \text{if } X > c. \end{cases}$$
c) Express $\Pr(X_c > z)$ in terms of the cdf of X.
d) Use parts (b) and (c) to express $E(X_c)$ in terms of the cdf of X.
e) Evaluate $E(X_c)$ for the special case when X is exponentially
distributed, i.e., when $f(x) = \theta e^{-\theta x}$.
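A numerical check of parts (d) and (e) of Q4 can be sketched as follows. It is an illustration, not part of the examination, and rests on the assumption that $\Pr(X_c > z) = 1 - F(z)$ for $z < c$ and 0 for $z \ge c$, so that $E(X_c)$ is the integral of the survival function over $[0, c]$. The midpoint rule, the step count, and the values of `theta` and `c` are arbitrary choices.

```python
import math


def expected_replacement_time(survival, c, steps=100_000):
    """Midpoint-rule approximation to the integral of survival(z) over [0, c]."""
    h = c / steps
    return h * sum(survival((i + 0.5) * h) for i in range(steps))


theta, c = 0.5, 3.0  # illustrative rate and replacement age
numeric = expected_replacement_time(lambda z: math.exp(-theta * z), c)
closed_form = (1.0 - math.exp(-theta * c)) / theta
print(numeric, closed_form)
```

Passing a different survival function to `expected_replacement_time` gives the corresponding value for other failure-time distributions.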
Q5.
Let $\{X_{ij}\}$, i = 1, 2, 3, and j = 1, 2, ..., n, be a set of independent
normal random variables with
$$E[X_{ij}] = \mu_i \quad \text{and} \quad \mathrm{Var}[X_{ij}] = \sigma^2.$$
Confidence intervals of the general form
$$\bar{X}_i \pm \lambda S_i$$
are calculated for $\mu_i$, i = 1, 2, 3, where
$$\bar{X}_i = n^{-1} \sum_{j=1}^{n} X_{ij} \quad \text{and} \quad S_i^2 = (n-1)^{-1} \sum_{j=1}^{n} (X_{ij} - \bar{X}_i)^2;$$
$\lambda$ is chosen (using appropriate percentiles of a Student's t distribution) so
that each confidence interval has exact confidence coefficient $(1 - \alpha)$.
For notational simplicity, let $I_i$ denote the set of points in the
i-th interval $(\bar{X}_i \pm \lambda S_i)$, i = 1, 2, 3; further, let $E_{ii'}$,
$i \ne i'$, denote the event
$$I_i \cap I_{i'} = \emptyset,$$
i.e., $E_{ii'}$ is the event that the confidence intervals $(\bar{X}_i \pm \lambda S_i)$ and
$(\bar{X}_{i'} \pm \lambda S_{i'})$ do not overlap.
a) Show that
$$\mathrm{pr}(E_{ii'}) = \mathrm{pr}\{|\bar{X}_i - \bar{X}_{i'}| > \lambda (S_i + S_{i'})\}.$$
b) Under the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$, show, for $i \ne i'$, that
$$\mathrm{pr}(E_{ii'} \mid H_0) \le \mathrm{pr}\{|t_{2(n-1)}| > t_{(n-1),\, 1-\alpha/2} \mid H_0\},$$
where $t_\nu$ is a Student's t random variable with $\nu$ degrees of freedom
and $\mathrm{pr}(t_\nu > t_{\nu,\, 1-\alpha/2}) = \alpha/2$.
c) It is proposed to test the null hypothesis $H_0: \mu_1 = \mu_2 = \mu_3$
versus the alternative hypothesis $H_A$: the $\mu_i$'s are not all equal by not
rejecting $H_0$ if and only if the three confidence intervals all
have at least one point in common (i.e., there is at least one point
which is simultaneously in all three confidence intervals). Using
earlier results, show that the significance level for this proposed
statistical testing procedure is at most
$$3\alpha \left(1 - \alpha + \frac{\alpha^2}{3}\right).$$
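The testing procedure in part (c) of Q5 can be explored by simulation. The sketch below is not part of the examination. It assumes n = 10 and $\alpha = 0.05$, takes the multiplier of $S_i$ to be $t_{9,\,0.975}/\sqrt{n}$ with the tabulated percentile hardcoded, and estimates the rejection rate under $H_0$ for comparison with the stated bound. All names, the seed, and the number of replications are hypothetical choices.

```python
import math
import random

T_9_975 = 2.262157  # tabulated 97.5th percentile of Student's t with 9 df


def mean_and_sd(xs):
    """Sample mean and standard deviation (divisor n - 1)."""
    m = sum(xs) / len(xs)
    var = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, math.sqrt(var)


def intervals_share_no_point(samples, lam):
    """True when the intervals mean +/- lam * sd have no point common to all."""
    lowers, uppers = [], []
    for xs in samples:
        m, s = mean_and_sd(xs)
        lowers.append(m - lam * s)
        uppers.append(m + lam * s)
    return max(lowers) > min(uppers)


def estimated_level(n, lam, reps, seed=0):
    """Monte Carlo rejection rate under H0 (all means 0, sigma = 1)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        samples = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(3)]
        rejections += intervals_share_no_point(samples, lam)
    return rejections / reps


n, alpha = 10, 0.05
lam = T_9_975 / math.sqrt(n)
level = estimated_level(n, lam, reps=20000)
bound = 3 * alpha * (1 - alpha + alpha ** 2 / 3)
print(level, bound)
```

The simulated level is only a Monte Carlo estimate, so it supports the bound without proving it; the bound itself equals $1 - (1 - \alpha)^3$.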
BASIC DOCTORAL WRITTEN EXAMINATION IN BIOSTATISTICS
PART I
10:00 a.m. - 2:00 p.m., Monday, August 3, 1987
Q.1.
Specimens of a new high-impact plastic are tested by
repeatedly striking them with a hammer until they fracture.
If each specimen has a constant probability $\theta$ of surviving
a blow, independently of the number of previous blows
received, the number of blows X required to fracture a
specimen will have the geometric distribution
$$p_X(x; \theta) = (1 - \theta)\, \theta^{x-1}, \quad x = 1, 2, \ldots.$$
a) Suppose that the results of tests on n = 200 specimens
were as follows:

Number of blows x required:                      1      2      3    4 or more   TOTAL
Number of specimens fracturing after x blows:   112     36     22       30        200

Find the maximum likelihood estimator of $\theta$, and
develop an appropriate 95% confidence interval for $\theta$.
b) Based on an inspection of the data in part (a), a
statistician suggests that, while the geometric
distribution applies to most specimens, a fraction
$(1 - \lambda)$ of them have structural flaws and therefore
always fracture on the first blow. Given this
additional consideration, show that the proportions of
specimens fracturing after one, two, three, and four
or more blows are, respectively,
$$(1 - \lambda\theta), \quad \lambda\theta(1 - \theta), \quad \lambda\theta^2(1 - \theta), \quad \text{and} \quad \lambda\theta^3.$$
c) If $x_i$ specimens are observed in the i-th category
(i = 1, 2, 3, 4), where $\sum_{i=1}^{4} x_i = n$, use the results in
part (b) to find explicit expressions for the maximum
likelihood estimators of $\lambda$ and $\theta$.
Q.2.
Consider systematic sampling from a serially numbered list of N population
elements in which a sample is chosen by drawing a random integer between 1 and
k, say g, and selecting the element with the corresponding serial number and
every k-th element thereafter on the list. Instead of the simple case where
N is an integral multiple of the sample size (n) for all possible random starts,
suppose that N = nk + r, where r is an integer such that $1 \le r < k$, i.e.,
N divided by the actual sample size is not an integer.
a.
Explicitly describe how this design is equivalent to a one-stage
sample of a single cluster from a set of unequal sized clusters.
b.
Let $y_{gj}$ denote the measurement of interest for the element bearing
the serial number g + (j-1)k in the population; g = 1, 2, ..., k and
j = 1, 2, ..., $n_g$, where
$$n_g = \begin{cases} n + 1 & \text{if } 1 \le g \le r, \\ n & \text{if } r < g \le k. \end{cases}$$
Also let $\bar{y}_g = \sum_{j=1}^{n_g} y_{gj} / n_g$ denote the sample mean when the sample's
random start is g. Show that the expected value of $\bar{y}_g$ among all
possible samples is
$$E(\bar{y}_g) = \sum_{g=1}^{k} \bar{y}_g / k,$$
and that $\bar{y}_g$ will be an unbiased estimator of the population mean,
$$\bar{Y} = \sum_{g=1}^{k} \sum_{j=1}^{n_g} y_{gj} / N,$$
if N is an integral multiple of the sample size for all possible
random starts.
d.
Derive the true variance of $\bar{y}_g$ among all possible samples.
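The behaviour described in Q.2 can be seen by enumerating every possible random start for a small list. The sketch below is an illustration, not part of the examination; the measurements (squares of the serial numbers) and the choices k = 3, N = 12 and N = 11 are hypothetical. Exact rational arithmetic is used so that equality with the population mean can be tested without rounding.

```python
from fractions import Fraction


def systematic_means(y, k):
    """Sample mean for each random start g = 1, ..., k (every k-th element)."""
    return [Fraction(sum(y[g::k]), len(y[g::k])) for g in range(k)]


def expectation_and_variance(y, k):
    """Mean and variance of the sample mean over the k equally likely starts."""
    means = systematic_means(y, k)
    expectation = sum(means) / k
    variance = sum((m - expectation) ** 2 for m in means) / k
    return expectation, variance


k = 3
for N in (12, 11):  # N = 12 gives r = 0; N = 11 gives n = 3, r = 2
    y = [i * i for i in range(1, N + 1)]  # hypothetical measurements
    population_mean = Fraction(sum(y), N)
    expectation, variance = expectation_and_variance(y, k)
    print(N, expectation, population_mean, variance)
```

With N = 12 every start yields the same sample size and the expectation equals the population mean; with N = 11 the two printed means differ, because the unequal cluster sizes are not reflected in the equal selection probabilities.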
Q.3
a) Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent Poisson
variables with respective means $\mu_1, \mu_2, \ldots, \mu_n$. Let
$k_1, k_2, \ldots, k_n$ be known positive constants. Consider
the null hypothesis
$$H_0: \mu_i = k_i \theta, \quad i = 1, 2, \ldots, n,$$
where $\theta$ (> 0) is an unknown parameter. Show that the
likelihood ratio statistic ($-2 \ln \Lambda$) for testing $H_0$
versus the alternative hypothesis $H_A$ that there are no
structural restrictions on the $\mu_i$'s
takes the form
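$$-2 \ln \Lambda = 2 \sum_{i=1}^{n} Y_i \ln\left(\frac{Y_i}{k_i \hat{\theta}}\right),$$
where $\hat{\theta} = \sum_{i=1}^{n} Y_i \Big/ \sum_{i=1}^{n} k_i$.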
b) Suppose that $k_1 = k_2 = k_3 = 1{,}000$, $Y_1 = 10$, $Y_2 = 11$,
and $Y_3 = 12$. Test $H_0$ versus $H_A$ using part (a). What
is the P-value of your test? Do you reject $H_0$ or not?
c) Use the data in part (b) to construct an appropriate
95% confidence interval for $\theta$.
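The calculation asked for in parts (b) and (c) of Q.3 can be sketched numerically. This is an illustration, not part of the examination, and it makes several assumptions: that the likelihood ratio statistic reduces to $2 \sum Y_i \ln(Y_i / (k_i \hat{\theta}))$ with $\hat{\theta} = \sum Y_i / \sum k_i$; that the usual chi-square reference distribution with n - 1 = 2 degrees of freedom applies, for which the upper tail probability is $e^{-x/2}$; and that a Wald-type interval is an acceptable choice in part (c). Other intervals are equally defensible.

```python
import math


def poisson_lr_statistic(y, k):
    """-2 ln(Lambda) for H0: mu_i = k_i * theta against unrestricted means."""
    theta_hat = sum(y) / sum(k)
    stat = 0.0
    for yi, ki in zip(y, k):
        if yi > 0:  # a zero count contributes nothing (0 * ln 0 taken as 0)
            stat += yi * math.log(yi / (ki * theta_hat))
    return 2.0 * stat, theta_hat


y = [10, 11, 12]
k = [1000, 1000, 1000]
stat, theta_hat = poisson_lr_statistic(y, k)

# With n - 1 = 2 degrees of freedom the chi-square upper tail is exp(-x / 2).
p_value = math.exp(-stat / 2)

# Wald-type interval, using Var(theta_hat) = theta / sum(k) evaluated at theta_hat.
half_width = 1.96 * math.sqrt(theta_hat / sum(k))
interval = (theta_hat - half_width, theta_hat + half_width)
print(stat, p_value, interval)
```

The closed-form tail probability is specific to 2 degrees of freedom; for other values of n a chi-square table or a statistics library would be needed.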
Q.4.
Data have been gathered from hospital records to study
the effect of treatment of a certain disease. A Markov
model with 4 states is to be used for the study:
0 - state of being under treatment
1 - death after treatment and due to the disease
2 - recovery
3 - state of being lost after recovery or death from
other causes.
DATA:
                      Following State
Previous State      0      1      2      3    Totals
      0            60     12     83      0      155
      2            72      0     77     27      176
Assume only one state transition is possible between
observations.
a) What is your estimate of the matrix of transition
probabilities?
b) How would you classify each of the states?
What is the
period of the chain?
c) What is the probability that a patient who is under
treatment will survive for 4 observation periods or
longer?
d) What is the probability that a patient in recovery
will be 'lost' to observation or die of other causes?
e) What is the probability that a patient in treatment
will eventually die of the disease?
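Parts (a), (d) and (e) of Q.4 lend themselves to a short computational sketch. It is an illustration, not part of the examination. The counts are those read from the DATA table (row totals 155 and 176); states 1 and 3 are treated as absorbing, which is an assumption based on the state descriptions and the absence of observed transitions out of them. Function and variable names are hypothetical.

```python
from fractions import Fraction

# Observed transition counts, keyed by previous state; states 1 and 3 have no
# observed exits and are treated here as absorbing (an assumption).
counts = {
    0: [60, 12, 83, 0],
    2: [72, 0, 77, 27],
}

# Row-wise relative frequencies as estimates of the transition probabilities.
P = {s: [Fraction(c, sum(row)) for c in row] for s, row in counts.items()}


def absorption_probabilities(P, target):
    """Probability of eventual absorption in `target`, starting from 0 and from 2.

    Solves the two first-step equations
        a0 = P[0][target] + P[0][0] a0 + P[0][2] a2
        a2 = P[2][target] + P[2][0] a0 + P[2][2] a2
    by Cramer's rule.
    """
    det = (1 - P[0][0]) * (1 - P[2][2]) - P[0][2] * P[2][0]
    a0 = (P[0][target] * (1 - P[2][2]) + P[0][2] * P[2][target]) / det
    a2 = (P[2][target] * (1 - P[0][0]) + P[2][0] * P[0][target]) / det
    return a0, a2


die_from_0, die_from_2 = absorption_probabilities(P, target=1)
lost_from_0, lost_from_2 = absorption_probabilities(P, target=3)
print(P[0], P[2])
print(die_from_0, lost_from_2)
```

Exact fractions are used so that the two absorption probabilities from a given starting state can be checked to sum to one. Part (c) is left aside because its answer depends on how "survive" is interpreted.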
Q.5.
Let $X_1, \ldots, X_n$ denote a random sample from the density
$f_\theta$ defined by:
$$f_\theta(x) = \begin{cases} \theta/(\theta + x)^2 & \text{if } x > 0, \\ 0 & \text{if } x \le 0, \end{cases}$$
where $\theta \in \Theta = \{\theta: \theta > 0\}$.
a) Compute the Fisher information $I_n(\theta)$ for this set of
n observations. If $\hat{\theta}_n$ is the maximum likelihood
estimator (you don't have to solve the likelihood
equation), find the asymptotic distribution of
$\sqrt{n}(\hat{\theta}_n - \theta)$ as $n \to \infty$.
b) For 0 < p < 1, show that if $Y_{n,p}$ is the sample p-tile,
then $\sqrt{n}(Y_{n,p} - \xi_p)$ is asymptotically normal $\left(0,\ p\theta^2/(1-p)^3\right)$.
[Hint: You may use without proof that if $\xi_p$ is the p-tile
of a distribution $F(\cdot)$ with continuous density $f(\cdot)$,
then the sample p-tile $Y_{n,p}$ is asymptotically
normal $\left(\xi_p,\ \dfrac{p(1-p)}{f^2(\xi_p)\, n}\right)$.]
c) Find a constant $c_p$ such that $\{c_p Y_{n,p}\}$ is a consistent
sequence of estimators of $\theta$, and determine the
asymptotic law of $\sqrt{n}(c_p Y_{n,p} - \theta)$.
d) Determine the asymptotic relative efficiency of
$\{c_p Y_{n,p}\}$ to $\{\hat{\theta}_n\}$. What value of p yields the best
efficiency?
e) Since $\hat{\theta}_n$ is difficult to compute, give a computable
sequence of estimators $\tilde{\theta}_n$ that are asymptotically as
efficient as $\hat{\theta}_n$.
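Two of the quantities in Q.5 can be checked numerically. The sketch below is an illustration, not part of the examination. It assumes the cdf $F(x) = x/(\theta + x)$ implied by the density, evaluates the per-observation Fisher information $E[(\partial \log f_\theta(X)/\partial\theta)^2]$ by a midpoint rule after the substitution u = F(x), and compares the variance given by the hint with the expression stated in part (b). The function names and the values of `theta` and `p` are hypothetical.

```python
import math


def fisher_information_one_obs(theta, steps=100_000):
    """Midpoint-rule value of E[(d/dtheta log f_theta(X))^2].

    Uses X = theta * u / (1 - u) with u uniform on (0, 1), which follows from
    the cdf F(x) = x / (theta + x).
    """
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        x = theta * u / (1 - u)
        score = 1 / theta - 2 / (theta + x)  # d/dtheta of log f_theta(x)
        total += score ** 2
    return total * h


def quantile_variance(theta, p):
    """p (1 - p) / f(xi_p)^2, the hint's variance for sqrt(n) (Y_{n,p} - xi_p)."""
    xi_p = p * theta / (1 - p)  # solves F(xi_p) = p
    density = theta / (theta + xi_p) ** 2
    return p * (1 - p) / density ** 2


theta, p = 2.0, 0.25  # illustrative values
print(fisher_information_one_obs(theta))
print(quantile_variance(theta, p), p * theta ** 2 / (1 - p) ** 3)
```

The information for n observations is n times the printed per-observation value, so the two functions together give the ingredients needed for the efficiency comparison in part (d).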