Likelihood Statistics

Anja Vest
IEKP, Uni Karlsruhe
October 2005
• Event weights
• Confidence levels
• Likelihood plots
• TLimit ROOT class
Likelihood Ratio
• Experimental result = configuration of events that agrees with either a
– pure background (b) hypothesis
– signal + background (s+b) hypothesis
• Discriminator: e.g. Mrec (a peak in the Mrec distribution → "signal observation")
(a 2D discriminator is also possible, e.g. Mrec and the b-tag content of an event)
• Divide Mrec into bins i = 1, 2, ..., Nbins, each containing Ni observed candidates
→ histogram
• Likelihood ratio: ranks an experimental result between being
b-like or s+b-like
Anja Vest, IEKP, Uni Karlsruhe
Likelihood Ratio
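Likelihood ratio:

$$Q = \frac{L(s+b)}{L(b)} = \frac{P_{\mathrm{Poisson}}(\mathrm{Data} \mid s+b)}{P_{\mathrm{Poisson}}(\mathrm{Data} \mid b)} = \frac{\exp(-(s_{\mathrm{tot}} + b_{\mathrm{tot}}))}{\exp(-b_{\mathrm{tot}})} \prod_{i=1}^{N_{\mathrm{bins}}} \left( \frac{s_i + b_i}{b_i} \right)^{N_i}$$

Simpler: a weighted sum over all observed events:

$$-2 \ln Q(M_H) = 2 s_{\mathrm{tot}} - 2 \sum_{i=1}^{N_{\mathrm{bins}}} N_i \ln\left( 1 + \frac{s_i(M_H)}{b_i} \right)$$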
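The weighted-sum form of −2lnQ translates almost directly into code. The following is a minimal sketch in standard C++ (no ROOT), assuming the per-bin signal expectation, background expectation and observed counts are available as plain vectors; the function name `minusTwoLnQ` is illustrative and not part of any library.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// -2 ln Q = 2*s_tot - 2 * sum_i N_i * ln(1 + s_i / b_i)
// s: expected signal per bin, b: expected background per bin (must be > 0),
// n: observed candidates per bin. The function name is illustrative.
double minusTwoLnQ(const std::vector<double>& s,
                   const std::vector<double>& b,
                   const std::vector<double>& n) {
    if (s.size() != b.size() || s.size() != n.size())
        throw std::invalid_argument("s, b and n must have the same number of bins");
    double sTot = 0.0;
    double weightedSum = 0.0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (b[i] <= 0.0)
            throw std::invalid_argument("background expectation must be positive in every bin");
        sTot += s[i];
        weightedSum += n[i] * std::log1p(s[i] / b[i]);  // event weight ln(1 + s_i/b_i)
    }
    return 2.0 * sTot - 2.0 * weightedSum;
}
```

Each observed event in bin i contributes the weight ln(1 + si/bi), so bins with a high signal-to-background ratio pull −2lnQ towards negative (s+b-like) values.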
The likelihood probability density function
Likelihood p.d.f.:
histogram generated when performing a large number of experiments
• Generate for example 50000 experiments without signal hypothesis (b-only)
• For each experiment:
event configuration (Ni) and MH
⇒ −2lnQ(MH )
⇒ p.d.f.
• Repeat the same for the signal + background hypothesis (s + b)
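The procedure above can be sketched with the C++ standard library alone. This is an illustration under stated assumptions, not the TLimit implementation: bin contents are fluctuated with independent Poisson distributions, systematic uncertainties are ignored, and the function name `toyMinusTwoLnQ` and its parameters are hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Generate nToys pseudo-experiments and return their -2 ln Q values.
// truth: per-bin expectation used to draw the pseudo-data
//        (b for the b-only hypothesis, s+b for the s+b hypothesis).
// s, b:  per-bin signal and background expectations entering the statistic.
// All vectors are assumed to have the same length and positive entries.
std::vector<double> toyMinusTwoLnQ(const std::vector<double>& truth,
                                   const std::vector<double>& s,
                                   const std::vector<double>& b,
                                   int nToys, std::mt19937& rng) {
    double sTot = 0.0;
    for (double si : s) sTot += si;

    std::vector<std::poisson_distribution<int>> poisson;
    for (double mu : truth) poisson.emplace_back(mu);

    std::vector<double> values;
    values.reserve(nToys);
    for (int t = 0; t < nToys; ++t) {
        double weightedSum = 0.0;
        for (std::size_t i = 0; i < s.size(); ++i) {
            const int ni = poisson[i](rng);               // fluctuated bin content N_i
            weightedSum += ni * std::log1p(s[i] / b[i]);  // N_i * ln(1 + s_i/b_i)
        }
        values.push_back(2.0 * sTot - 2.0 * weightedSum);
    }
    return values;
}
```

Calling the function once with `truth = b` and once with `truth = s + b` gives the two samples whose histograms are the b-only and s+b p.d.f.s of −2lnQ for the chosen test mass.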
p.d.f. for a Higgs test mass of 115 GeV
[Figure: five pseudo-experiments (Experiment 1 to Experiment 5), each shown as events/3 GeV versus reconstructed mass (80 to 115) with the signal+background expectation. A further panel shows the p.d.f. of −2lnQ for test mass = 115 GeV, with markers labelled 1 to 5; the negative side is labelled "s+b like" and the positive side "b like".]
Likelihood p.d.f.s for different Higgs masses
[Figure: p.d.f. of −2lnQ for test masses of 112, 115 and 118 GeV (three panels). A further panel shows −2lnQ versus test mass (110 to 120), with the labels "likelihood of mh=115", "median likelihood of bg experiments" and "median likelihood of s+b experiments".]
The likelihood probability density function
[Figure: −2 ln(Q) versus mH (GeV/c²) from 100 to 120, LEP. Curves: observed, expected background, expected signal + background.]
The likelihood probability density function
Example:
observed likelihood:
−2lnQ = −3
Confidence levels
• CLb:
background confidence level,
measures the compatibility of the experiment with the background
• 1 − CLb:
probability for a b-only experiment to give a more s+b-like likelihood than the
observed one
• <1 − CLb> = 0.5 (expectation for background-only experiments), irrespective of the Higgs mass
Correspondence between 1 − CLb and the resulting significance
(Gaussian approximation):

1 − CLb        Significance
0.32           1σ
0.046          2σ
2.7 · 10^-3    3σ
6.3 · 10^-5    4σ
5.7 · 10^-7    5σ
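The numbers in the table match the two-sided tail probability of a Gaussian, P(|x| > nσ) = erfc(n/√2). The two-sided convention is inferred from the values and is not stated on the slide. A minimal sketch to reproduce them (the function name is illustrative):

```cpp
#include <cmath>

// Two-sided Gaussian tail probability for a deviation of nSigma standard
// deviations: P(|x| > nSigma) = erfc(nSigma / sqrt(2)).
double twoSidedTailProbability(double nSigma) {
    return std::erfc(nSigma / std::sqrt(2.0));
}
```

Evaluating the function for n = 1, ..., 5 gives values that round to the table entries.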
The likelihood probability density function
[Figure: 1 − CLb versus mH (GeV/c²) from 100 to 120, LEP, logarithmic scale from 1 down to 10^-5 with the 2σ, 3σ and 4σ levels marked. Curves: observed, expected for signal+background, expected for background.]
Confidence levels
• CLs+b:
– measures the compatibility of the experiment with the s+b hypothesis
– A larger CLs+b means that the experimental result is more s+b-like, but not
necessarily more s-like
⇓
If CLs+b is small (< 5%), the s+b hypothesis can be excluded at more than 95%
confidence level, but this does not mean that the signal hypothesis is excluded at
that level.
• CLs:
– There is no way to measure the signal confidence level directly, because of the
presence of significant background
– The signal confidence level is therefore defined a priori as:

$$CL_s = \frac{CL_{s+b}}{CL_b}$$
The likelihood probability density function
[Figure: CLs versus mH (GeV/c²) from 100 to 120, LEP, logarithmic scale from 1 down to 10^-6. Curves: observed and expected for background. The mass values 114.4 and 115.3 are marked.]
Confidence levels
CL = 1 − CLs
(obviously conservative since the coverage probability is in general larger than the CL)
Using ROOT to calculate Significances and Limits
• TLimit is a ROOT add-on.
It computes limits using the likelihood ratio method (originally implemented by
Tom Junk in Fortran 77)
• Classes:
– TLimitDataSource
Takes the signal, background and data histograms to form a channel. More
channels can be added using AddChannel(), as well as different systematic
sources.
– TLimit
Actual algorithm to compute 95% C.L. limits using the semi-Bayesian likelihood ratio
method. It takes a TLimitDataSource as input and runs a set of MC
experiments in order to compute the limits. If needed, the inputs (si and bi) are
fluctuated according to their systematics. The output is a TConfidenceLevel.
– TConfidenceLevel
Final result of the TLimit algorithm. It is created just after the time-consuming
part and can be stored in a TFile for further processing. It contains lightweight methods
to return CLs, CLb and other interesting quantities, e.g. Get5sProbability().
Using ROOT to calculate Significances and Limits
TConfidenceLevel* ComputeLimit
(TLimitDataSource* data, Int_t nmc, TRandom* generator,
Double_t (*stat)(Double_t, Double_t, Double_t))
• data: the input TLimitDataSource
• nmc: number of MC experiments to produce
• generator: MC generator used. Default: TRandom3
• stat: function used as statistic. Default: TLimit::LogLikelihood
Using the ROOT class TConfidenceLevel
#include <iostream>
#include "TFile.h"
#include "TH1.h"
#include "TLimit.h"
#include "TLimitDataSource.h"
#include "TConfidenceLevel.h"
using std::cout;
using std::endl;

// Macro entry point; the name limit() follows the ROOT session in the example output
void limit() {
  // Get the histograms
  TFile* infile = new TFile("Results.root", "READ");
  infile->cd();
  TH1D* sh = (TH1D*)infile->Get("signal_histo");
  TH1D* bh = (TH1D*)infile->Get("background_histo");
  TH1D* dh = (TH1D*)infile->Get("data_histo");
  // Compute the limits (pass the histograms read above: signal, background, data)
  cout << "Computing limits... " << endl;
  TLimitDataSource* mydatasource = new TLimitDataSource(sh, bh, dh);
  TConfidenceLevel* myconfidence = TLimit::ComputeLimit(mydatasource, 50000);
  // Observed confidence levels
  cout << "CLs     : " << myconfidence->CLs() << endl;
  cout << "CLsb    : " << myconfidence->CLsb() << endl;
  cout << "CLb     : " << myconfidence->CLb() << endl;
  // Expected values for background-only experiments
  cout << "< CLs > : " << myconfidence->GetExpectedCLs_b() << endl;
  cout << "< CLsb > : " << myconfidence->GetExpectedCLsb_b() << endl;
  cout << "< CLb > : " << myconfidence->GetExpectedCLb_b() << endl;
  cout << "3 sigma probability : " << myconfidence->Get3sProbability() << endl;
  cout << "5 sigma probability : " << myconfidence->Get5sProbability() << endl;
  myconfidence->Draw();
}
Example: mytest.C
Signal and background compared to data, using a set of randomly created histograms.
[Figure: signal and background histograms compared to data.]
Output:
root [0] .L mytest.C
root [1] limit()
Computing limits...
CLs     : 0.0206003
CLsb    : 0.0116837
CLb     : 0.56716
< CLs > : 0.0152156
< CLsb > : 0.00760808
< CLb > : 0.50002
3 sigma probability : 0.40762
5 sigma probability : 0.01254
[Figure: distribution of −2lnQ from the MC experiments. Statistics box b_hist: Entries 50000, Mean 5.906, RMS 4.595.]
Example: mytest.C
Signal and background compared to data, using a set of randomly created histograms.
[Figure: signal and background histograms compared to data.]
Output:
root [0] .L mytest.C
root [1] limit()
Computing limits...
CLs     : 0.0539502
CLsb    : 0.0414866
CLb     : 0.76898
< CLs > : 0.0152156
< CLsb > : 0.00760808
< CLb > : 0.50002
3 sigma probability : 0.40762
5 sigma probability : 0.01254
[Figure: distribution of −2lnQ from the MC experiments. Statistics box b_hist: Entries 50000, Mean 5.906, RMS 4.595.]
Example: mytest.C
Signal and background compared to data, using a set of randomly created histograms.
[Figure: signal and background histograms compared to data.]
Output:
root [0] .L mytest.C
root [1] limit()
Computing limits...
CLs     : 5.54126e-10
CLsb    : 3.07529e-10
CLb     : 0.55498
< CLs > : 2.79282e-10
< CLsb > : 1.39646e-10
< CLb > : 0.50002
3 sigma probability : 0.9998
5 sigma probability : 0.97468
[Figure: distribution of −2lnQ from the MC experiments. Statistics box b_hist: Entries 50000, Mean 39.87, RMS 11.24.]