TAMS28 MATHEMATICAL STATISTICS Exam code: TEN1 English

Course code: TAMS28
Exam code: TEN1
MATHEMATICAL STATISTICS
16 March 2017, 14:00-18:00
Examiner: Zhenxia Liu (Tel: 070 0895208). Please answer in ENGLISH if you can.
a. Permitted aids: a calculator and the formula and table collection published by MAI.
b. Grading: 8-11 points give grade 3; 11.5-14.5 points give grade 4; 15-18 points give grade 5.
English Version
1
(3 points)
Suppose that a random variable X has the probability density function
$$f(x) = \frac{4}{3} - \frac{2}{3}x \quad \text{for } 0 \le x \le 1.$$
(1.1). (1p) Calculate the probability $P(X < \tfrac{1}{2})$.
(1.2). (1p) Calculate the mean E(X).
(1.3). (1p) Calculate the standard deviation D(X).
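Solution. (1.1)
$$P(X < 1/2) = \int_0^{1/2} \left(\frac{4}{3} - \frac{2}{3}x\right) dx = \frac{7}{12}.$$
(1.2)
$$E(X) = \int_0^1 x\left(\frac{4}{3} - \frac{2}{3}x\right) dx = \frac{4}{9}.$$
(1.3) The variance is
$$V(X) = E(X^2) - (E(X))^2 = \int_0^1 x^2\left(\frac{4}{3} - \frac{2}{3}x\right) dx - \left(\frac{4}{9}\right)^2 = \frac{13}{162}.$$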
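The integrals in this problem are polynomial, so the values asked for in (1.1) to (1.3) can be verified with exact rational arithmetic. The following is a minimal sketch; the helper names `integrate_poly` and `moment` are illustrative and not part of the exam.

```python
from fractions import Fraction
from math import sqrt

# Density f(x) = 4/3 - (2/3) x on [0, 1], stored as polynomial
# coefficients [c0, c1] meaning c0 + c1*x.
DENSITY = [Fraction(4, 3), Fraction(-2, 3)]


def integrate_poly(coeffs, a, b):
    """Exact integral of sum(c_k * x**k) over [a, b]."""
    total = Fraction(0)
    for k, c in enumerate(coeffs):
        total += c * (Fraction(b) ** (k + 1) - Fraction(a) ** (k + 1)) / (k + 1)
    return total


def moment(order):
    """E(X**order): multiplying the density by x**order shifts the coefficients."""
    shifted = [Fraction(0)] * order + DENSITY
    return integrate_poly(shifted, 0, 1)


prob_below_half = integrate_poly(DENSITY, 0, Fraction(1, 2))
mean = moment(1)
variance = moment(2) - mean ** 2
std_dev = sqrt(variance)  # converted to float for the square root

print(prob_below_half, mean, variance, round(std_dev, 4))
```

Working with `Fraction` avoids rounding, so the results can be compared directly with 7/12, 4/9 and 13/162.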
So the standard deviation is
$$D(X) = \sqrt{V(X)} = \sqrt{13/162}.$$
2
(3 points)
Suppose that the random variables $X_1, X_2, \ldots, X_{160}$ are independent and have the same distribution, given by

X_i     1      2
f(x)    1/2    1/2

(2.1). (1p) Find the mean $\mu = E(X)$ and the variance $\sigma^2 = V(X)$.
(2.2). (2p) Use the central limit theorem to find the probability $P(X_1 + X_2 + \ldots + X_{160} \le 260)$.
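Solution. (2.1)
$$\mu = E(X) = 1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{2} = \frac{3}{2} = 1.5.$$
$$\sigma^2 = V(X) = E(X^2) - (E(X))^2 = 1^2 \cdot \frac{1}{2} + 2^2 \cdot \frac{1}{2} - \left(\frac{3}{2}\right)^2 = \frac{5}{2} - \frac{9}{4} = \frac{1}{4}.$$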
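The mean and variance of a discrete distribution are weighted sums over its probability function. A short sketch of that computation for the two-point distribution of this problem (the dictionary name `pmf` is illustrative):

```python
from fractions import Fraction

# Probability function from the problem: P(X = 1) = P(X = 2) = 1/2.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 2)}

# Sanity check: the probabilities must sum to one.
total_probability = sum(pmf.values())

mu = sum(x * p for x, p in pmf.items())
second_moment = sum(x ** 2 * p for x, p in pmf.items())
sigma2 = second_moment - mu ** 2  # V(X) = E(X^2) - (E(X))^2

print(mu, second_moment, sigma2)
```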
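(2.2)
$$\begin{aligned}
P(X_1 + \ldots + X_{160} \le 260) &= P\left(\frac{X_1 + \ldots + X_{160}}{160} \le \frac{260}{160}\right) = P(\bar X \le 26/16) \\
&= P\left(\frac{\bar X - \mu}{\sigma/\sqrt{n}} \le \frac{26/16 - \mu}{\sigma/\sqrt{n}}\right) \overset{\text{CLT}}{\approx} P\left(N(0,1) \le \frac{26/16 - 1.5}{\sqrt{1/4}/\sqrt{160}}\right) \\
&= P(N(0,1) \le 3.16) = 0.9992.
\end{aligned}$$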
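The normal approximation above can be reproduced with the standard library. As an extra check that is not part of the exam solution, the sketch also computes the exact probability, using the observation that each $X_i - 1$ is a Bernoulli(1/2) variable, so the sum equals 160 plus a Bin(160, 1/2) variable.

```python
from math import comb, sqrt
from statistics import NormalDist

n = 160
mu, sigma = 1.5, 0.5          # from (2.1): sigma = sqrt(1/4)
threshold = 260

# CLT: the sum is approximately normal with mean n*mu and
# standard deviation sigma*sqrt(n).
z = (threshold - n * mu) / (sigma * sqrt(n))
p_clt = NormalDist().cdf(z)

# Exact check (an addition, not in the exam solution):
# {sum <= 260} is the event {Bin(160, 1/2) <= 100}.
k_max = threshold - n
p_exact = sum(comb(n, k) for k in range(k_max + 1)) / 2 ** n

print(round(z, 2), round(p_clt, 4), round(p_exact, 4))
```

The exact value differs slightly from the approximation because the CLT calculation uses no continuity correction.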
3
(3 points)
In an investigation of a new virus suspension, samples are taken at random and the number of virus particles in each sample is determined. For 100 samples, the following results were obtained:

Number of virus particles i    ≤3    4     5     ≥6
Number of samples N_i          20    23    16    41

Using a $\chi^2$-test at significance level $\alpha = 0.05$, test the hypothesis $H_0$ that the number of virus particles in a sample is a Poisson random variable with mean $\mu = 5$.
Solution. Let X = number of virus particles in a sample. We want to test
$$H_0: X \sim Po(5) \quad \text{vs.} \quad H_1: X \nsim Po(5).$$
The observed counts and the class probabilities under $H_0$ are:
$N_1 = 20$, $p_1 = P(Po(5) \le 3) = 0.2650$ (from the Poisson table);
$N_2 = 23$, $p_2 = P(Po(5) = 4) = 0.1755$ (from the Poisson table);
$N_3 = 16$, $p_3 = P(Po(5) = 5) = 0.1755$ (from the Poisson table);
$N_4 = 41$, $p_4 = P(Po(5) \ge 6) = 1 - P(Po(5) \le 5) = 0.384$.
Since $n \cdot p_i > 5$ for all $i$ (note that $n = 100$), the $\chi^2$ approximation applies and we have
$$TS = \sum_{i=1}^{4} \frac{(N_i - n \cdot p_i)^2}{n \cdot p_i} = 3.6,$$
$$C = (\chi^2_{\alpha}(4 - 1 - 0), +\infty) = (\chi^2_{0.05}(3), +\infty) = (7.82, +\infty).$$
Since $TS \notin C$, we do not reject $H_0$.
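The test statistic can be recomputed from the observed counts. The sketch below evaluates the Poisson probabilities directly rather than reading them from a table, so the result may differ from the hand calculation in the last decimals; the critical value 7.82 is the table value quoted in the solution, not computed here.

```python
from math import exp, factorial

MEAN = 5.0
observed = [20, 23, 16, 41]   # classes: <=3, 4, 5, >=6
n = sum(observed)


def poisson_pmf(k, mean):
    """P(Po(mean) = k)."""
    return exp(-mean) * mean ** k / factorial(k)


p_le3 = sum(poisson_pmf(k, MEAN) for k in range(4))
p_4 = poisson_pmf(4, MEAN)
p_5 = poisson_pmf(5, MEAN)
p_ge6 = 1.0 - p_le3 - p_4 - p_5
probs = [p_le3, p_4, p_5, p_ge6]

# Expected counts under H0 and the chi-square test statistic.
expected = [n * p for p in probs]
ts = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CRITICAL = 7.82               # chi-square_{0.05}(3), table value from the solution
reject = ts > CRITICAL

print([round(p, 4) for p in probs], round(ts, 2), reject)
```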
4
(3 points)
Suppose we have two independent populations $X \sim N(\mu_X, \sigma)$ and $Y \sim N(\mu_Y, \sigma)$. A sample is taken from each population and the results are:

sample from X:   58.0   57.5   57.8   59.6   59.7   58.3,   $\bar x = 58.5$,   $s_X = 0.9$.
sample from Y:   61.4   65.0   64.9   60.8   61.6   63.2,   $\bar y = 62.8$,   $s_Y = 1.8$.

(4.1). (1.5p) Construct a 95% confidence interval for σ.
(4.2). (1.5p) Test the hypothesis with a significance level α = 0.05 :
H0 : µX = µY ,
H1 : µX < µY .
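Solution. (4.1) Since $\sigma$ is the common standard deviation of both populations, we estimate it from both samples using the pooled sample variance $s^2$. The confidence interval for $\sigma^2$ is
$$\begin{aligned}
I_{\sigma^2} &= \left(\frac{(n_1 + n_2 - 2)s^2}{\chi^2_{\alpha/2}(n_1 + n_2 - 2)}, \frac{(n_1 + n_2 - 2)s^2}{\chi^2_{1-\alpha/2}(n_1 + n_2 - 2)}\right) = \left(\frac{(6 + 6 - 2) \cdot 2.025}{\chi^2_{0.025}(6 + 6 - 2)}, \frac{(6 + 6 - 2) \cdot 2.025}{\chi^2_{0.975}(6 + 6 - 2)}\right) \\
&= \left(\frac{10 \cdot 2.025}{20.5}, \frac{10 \cdot 2.025}{3.24}\right),
\end{aligned}$$
where $s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = 2.025$, with $s_1 = s_X$ and $s_2 = s_Y$. Therefore the confidence interval for $\sigma$ is
$$I_{\sigma} = \left(\sqrt{\frac{10 \cdot 2.025}{20.5}}, \sqrt{\frac{10 \cdot 2.025}{3.24}}\right) = (0.99, 2.50).$$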
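The interval above follows from the pooled variance and two chi-square quantiles. A small sketch of the arithmetic, using the rounded summary statistics given in the problem and the table values quoted in the solution (the quantiles are entered as constants, not computed):

```python
from math import sqrt

n1, n2 = 6, 6
s_x, s_y = 0.9, 1.8           # sample standard deviations as given

df = n1 + n2 - 2
pooled_var = ((n1 - 1) * s_x ** 2 + (n2 - 1) * s_y ** 2) / df

# Table values quoted in the solution.
CHI2_UPPER = 20.5             # chi-square_{0.025}(10)
CHI2_LOWER = 3.24             # chi-square_{0.975}(10)

# Interval for sigma^2, then square roots for the interval for sigma.
var_interval = (df * pooled_var / CHI2_UPPER, df * pooled_var / CHI2_LOWER)
sd_interval = tuple(sqrt(v) for v in var_interval)

print(round(pooled_var, 3), [round(v, 2) for v in sd_interval])
```

Recomputing $s_X$ and $s_Y$ from the raw observations would give slightly different values, since the problem states them rounded to one decimal.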
(4.2) Since the two population standard deviations are the same (both equal $\sigma$), we use case 1.3' in CI-1'. The test statistic is
$$TS = \frac{(\bar x - \bar y) - 0}{s \cdot \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{(58.5 - 62.8) - 0}{\sqrt{2.025} \cdot \sqrt{\frac{1}{6} + \frac{1}{6}}} = -5.23,$$
and the rejection region is
$$C = (-\infty, -t_{\alpha}(n_1 + n_2 - 2)) = (-\infty, -t_{0.05}(10)) = (-\infty, -1.81).$$
Since $TS \in C$, we reject $H_0$. That is, we conclude that $\mu_X < \mu_Y$.
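The pooled two-sample t statistic can be checked in a few lines. This sketch uses the summary values from the problem and the table value $t_{0.05}(10) = 1.81$ quoted in the solution as a constant.

```python
from math import sqrt

n1, n2 = 6, 6
x_bar, y_bar = 58.5, 62.8
pooled_var = 2.025            # s^2 from (4.1)

# Standard error of the difference of means under equal variances.
std_error = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)
ts = ((x_bar - y_bar) - 0) / std_error

T_CRITICAL = 1.81             # t_{0.05}(10), table value from the solution
reject = ts < -T_CRITICAL     # one-sided test, H1: mu_X < mu_Y

print(round(ts, 2), reject)
```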
5
(3 points)
Suppose that a population has the probability density function
$$f(x) = \theta \cdot e^{-\theta x}, \quad \text{if } 0 < x < \infty,$$
where $\theta > 0$ is an unknown parameter. A sample $\{x_1, x_2, \ldots, x_n\}$ from this population is given.
(5.1). (1.5p) Find a point estimate $\hat\theta_{MM}$ of $\theta$ using the method of moments.
(5.2). (1.5p) Find a point estimate $\hat\theta_{ML}$ of $\theta$ using the maximum-likelihood method.
Solution. (5.1) The first equation from the method of moments is $E(X) = \bar x$. Since $E(X) = \int_0^{\infty} x f(x)\,dx = \frac{1}{\theta}$ (this can also be seen directly by recognizing that $f(x)$ is the pdf of an exponential random variable), we have
$$\frac{1}{\theta} = \bar x, \quad \text{thus} \quad \hat\theta_{MM} = \frac{1}{\bar x}.$$
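To illustrate the method of moments, the sketch below first checks $E(X) = 1/\theta$ by numerical integration for one value of $\theta$, and then applies $\hat\theta_{MM} = 1/\bar x$ to a small sample. The sample values, the choice $\theta = 2$ and the integration settings are assumptions made for illustration; the exam gives no data for this problem.

```python
from math import exp
from statistics import mean


def exponential_mean_numeric(theta, upper=40.0, steps=20_000):
    """Midpoint-rule approximation of the integral of x*theta*exp(-theta*x).

    The infinite range is truncated at `upper`, which must be large
    compared with 1/theta for the neglected tail to be negligible.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += x * theta * exp(-theta * x) * h
    return total


theta_true = 2.0
numeric_mean = exponential_mean_numeric(theta_true)   # close to 1/theta_true

sample = [0.5, 1.2, 0.3, 2.0, 1.0]   # hypothetical data, not from the exam
theta_mm = 1 / mean(sample)

print(round(numeric_mean, 4), round(theta_mm, 4))
```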
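(5.2) The likelihood function is
$$L(\theta) = f(x_1) f(x_2) \cdots f(x_n) = \theta \cdot e^{-\theta x_1} \cdot \theta \cdot e^{-\theta x_2} \cdots \theta \cdot e^{-\theta x_n} = \theta^n \cdot e^{-\theta \sum_{i=1}^{n} x_i}.$$
Maximizing $L(\theta)$ is equivalent to maximizing $\ln L(\theta)$, where
$$\ln L(\theta) = n \ln(\theta) - \theta \sum_{i=1}^{n} x_i.$$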
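The log-likelihood is strictly concave in $\theta$ (its second derivative is $-n/\theta^2 < 0$), so its maximizer can also be located numerically. The following sketch uses a ternary search on a hypothetical sample and compares the result with $n / \sum x_i$; the data, the search interval and the helper name `maximize_concave` are assumptions for illustration only.

```python
from math import log

sample = [0.5, 1.2, 0.3, 2.0, 1.0]   # hypothetical data, not from the exam
n, total = len(sample), sum(sample)


def log_likelihood(theta):
    """ln L(theta) = n*ln(theta) - theta*sum(x_i)."""
    return n * log(theta) - theta * total


def maximize_concave(f, lo, hi, iterations=200):
    """Ternary search for the maximizer of a concave function on [lo, hi]."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1   # the maximizer lies to the right of m1
        else:
            hi = m2   # the maximizer lies to the left of m2
    return (lo + hi) / 2


theta_numeric = maximize_concave(log_likelihood, 1e-6, 100.0)
theta_closed_form = n / total

print(round(theta_numeric, 4), round(theta_closed_form, 4))
```

The numerical maximizer agrees with the closed form up to the precision of the search, which is a useful sanity check when a derivative is easy to get wrong.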
By setting $\frac{d \ln L(\theta)}{d\theta} = 0$, we get $\frac{n}{\theta} - \sum_{i=1}^{n} x_i = 0$. Therefore
$$\hat\theta_{ML} = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar x}.$$
6
(3 points)
In a scientific paper, measurements of the thermal conductivity of polymer melts obtained with the "short-hot-wire" method were reported. The measured quantities are thermal conductivity y and temperature x (unit: 1000 °C), and the data are analyzed according to the models:
Model 1: $Y = \beta_0' + \beta_1' x + \varepsilon'$, $\varepsilon' \sim N(0, \sigma')$
Model 2: $Y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$, $\varepsilon \sim N(0, \sigma)$
Model 1. Regression Analysis: y versus x

The regression equation is
y = 0.254 - 0.0451 x

Predictor       Coef    SE Coef       T       P
Constant    0.253770   0.006334   40.07   0.000
x           -0.04510    0.03847   -1.17   0.261

S = 0.0108651   R-Sq = 8.9%   R-Sq(adj) = 2.4%

Analysis of Variance
Source           DF          SS          MS      F       P
Regression        1   0.0001622   0.0001622   1.37   0.261
Residual Error   14   0.0016527   0.0001180
Total            15   0.0018149

Model 2. Regression Analysis: y versus x, x^2

The regression equation is
y = 0.221 + 0.553 x - 2.08 x^2

Predictor       Coef    SE Coef        T       P
Constant    0.221269   0.003085    71.72   0.000
x            0.55278    0.04768    11.59   0.000
x^2          -2.0814     0.1617   -12.87   0.000

S = ???   R-Sq = 93.4%   R-Sq(adj) = 92.4%

Analysis of Variance
Source           DF           SS           MS       F       P
Regression        2   0.00169471   0.00084736   91.62   0.000
Residual Error   13   0.00012023   0.00000925
Total            15   0.00181494

Source   DF       Seq SS
x         1   0.00016224
x^2       1   0.00153247

(6.1). (1p) How does the analysis indicate that Model 1 works very poorly? Explain your answer using an appropriate numerical value from the analysis.
(6.2). (1p) Is the term $x^2$ useful as an explanatory variable in Model 2? Explain your answer using an appropriate 95% confidence interval or test.
(6.3). (1p) Estimate $\sigma$ in Model 2.
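Solution. (6.1). In Model 1, R-Sq = 8.9%, which is very low. R-Sq is the proportion of the variation in y that is explained by the regression on x, so a low R-Sq means that x explains very little of the variation in y. Consistently, the P-value for x is 0.261, so its coefficient is not significantly different from zero at the 5% level. So Model 1 works very poorly.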
(6.2). Yes, it is. We can see this by constructing a (two-sided) 95% confidence interval for the coefficient $\beta_2$ of $x^2$, which is
$$\hat\beta_2 \pm t_{0.025}(16 - 2 - 1) \cdot se(\hat\beta_2) = -2.0814 \pm 2.16 \cdot 0.1617 = -2.0814 \pm 0.349 = (-2.431, -1.732).$$
Since 0 is not in this confidence interval, we conclude that $\beta_2 \ne 0$. Therefore $x^2$ is useful as an explanatory variable.
(6.3). $\sigma \approx s = \sqrt{s^2} = \sqrt{SS_E/(n - k - 1)} = \sqrt{0.00012023/13} \approx 0.00304$.
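The numbers used in (6.2) and (6.3) can be recomputed from the Minitab output for Model 2. In this sketch the critical value $t_{0.025}(13) = 2.16$ is the table value quoted in the solution and is entered as a constant; the variable names are illustrative.

```python
from math import sqrt

n, k = 16, 2                  # observations and predictors in Model 2

# Values read from the Minitab output for Model 2.
beta2_hat, se_beta2 = -2.0814, 0.1617
ss_error, ss_total = 0.00012023, 0.00181494
T_CRITICAL = 2.16             # t_{0.025}(13), table value from the solution

df_error = n - k - 1

# (6.2) Two-sided 95% confidence interval for beta_2.
half_width = T_CRITICAL * se_beta2
ci = (beta2_hat - half_width, beta2_hat + half_width)
x2_is_useful = not (ci[0] <= 0 <= ci[1])

# (6.3) Estimate of sigma, i.e. the value hidden as "S = ???".
s = sqrt(ss_error / df_error)

# Cross-check against the reported R-Sq = 93.4%.
r_squared = 1 - ss_error / ss_total

print([round(v, 3) for v in ci], x2_is_useful, round(s, 5), round(r_squared, 3))
```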