
Concluding Talk: Physics
Gary Feldman
Harvard University
PHYSTAT 05
University of Oxford
15 September, 2005
Topics

I will restrict my comments to two topics, both of
which I am interested in and both of which
received some attention at this meeting:


Event classification
Nuisance parameters
Event Classification

The problem: Given a measurement of an event
X = (x1,x2,…xn), find the function F(X) which returns
1 if the event is signal (s) and 0 if the event is
background (b), so as to optimize a figure of merit,
say s/√b for discovery or s/√(s+b) for an
established signal.
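As a quick numerical check of these figures of merit (a minimal sketch; the function names are my own):

```python
import math

def fom_discovery(s: float, b: float) -> float:
    """Discovery figure of merit s / sqrt(b)."""
    return s / math.sqrt(b)

def fom_established(s: float, b: float) -> float:
    """Figure of merit s / sqrt(s + b) for an established signal."""
    return s / math.sqrt(s + b)

# Example: 25 signal events on top of 100 background events.
print(fom_discovery(25, 100))    # 2.5
print(fom_established(25, 100))  # ~2.236
```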
Theoretical Solution


In principle the solution is straightforward: Use a
Monte Carlo simulation to calculate the likelihood
ratio Ls(X)/Lb(X) and derive F(X) from it. By the
Neyman-Pearson Theorem, this is the optimum
solution.
Unfortunately, this does not work due to the
“curse of dimensionality.” In a high-dimension
space, even the largest data set is sparse with the
distance between neighboring events comparable
to the radius of the space.
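The sparseness is easy to demonstrate with a toy simulation (a sketch under my own assumptions: uniform points in a unit hypercube, Euclidean distance):

```python
import math
import random

def mean_nn_distance(n_points: int, dim: int, seed: int = 0) -> float:
    """Average nearest-neighbour distance among n uniform points in [0,1]^dim."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(dim)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(pts):
        # brute-force nearest neighbour (fine for a demonstration)
        total += min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
    return total / n_points

# Even 500 events become sparse as the dimension grows: the nearest
# neighbour in 20-D is almost as far away as a typical random pair.
for d in (2, 10, 20):
    print(d, round(mean_nn_distance(500, d), 2))
```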
Practical Solutions



Thus, we are forced to substitute cleverness for
brute force.
In recent years, physicists have come to learn that
computers may be cleverer than they are.
They have turned to machine learning: One gives
the computer samples of signal and background
events and lets the computer figure out what F(X)
is.
Artificial Neural Networks


Originally most of this effort was in artificial neural
networks (ANN). Although used successfully in
many experiments, ANNs tend to be finicky and
often require real cleverness from their creators.
At this conference, there was an advance in ANNs
reported by Harrison Prosper. The technique is to
average over a collection of networks. Each
network is constructed by sampling the weight
probability density constructed from the training
sample.
Trees and Rules


In the past couple of years, interest has started to
shift to other techniques, such as decision trees,
at least partially sparked by Jerry Friedman’s talk
at PHYSTAT 03.
A single decision tree
has limited power, but
its power can be increased
by techniques that
effectively sum many
trees.
(Cartoon from Roe's talk.)
Rules and Bagging Trees


Jerry Friedman gave a talk on rules, which
effectively combines a series of trees.
Harrison Prosper gave a talk (for Ilya Narsky) on
bagging (Bootstrap AGGregatING) trees. In this
technique, one builds a collection of trees by
selecting a sample of the training data and,
optionally, a subset of the variables.
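The bagging recipe can be sketched in a few lines (my own toy implementation, not Narsky's: a one-cut "stump" stands in for a full decision tree):

```python
import random

def train_stump(sample, features):
    """Trivial stand-in for a tree: one median cut on the best single feature.
    Returns a function mapping a feature vector to 0 (background) or 1 (signal)."""
    best = None
    for f in features:
        cut = sorted(x[f] for x, _ in sample)[len(sample) // 2]  # median
        for sign in (1, -1):  # try both orientations of the cut
            n_correct = sum(1 for x, y in sample
                            if (1 if sign * (x[f] - cut) > 0 else 0) == y)
            if best is None or n_correct > best[0]:
                best = (n_correct, f, cut, sign)
    _, f, cut, sign = best
    return lambda x: 1 if sign * (x[f] - cut) > 0 else 0

def bagged_classifier(data, n_trees=50, feature_frac=0.5, seed=0):
    """Bagging: each tree sees a bootstrap sample of the events and,
    optionally, a random subset of the variables; outputs are averaged."""
    rng = random.Random(seed)
    n_feat = len(data[0][0])
    trees = []
    for _ in range(n_trees):
        sample = [rng.choice(data) for _ in range(len(data))]  # bootstrap
        feats = rng.sample(range(n_feat), max(1, int(feature_frac * n_feat)))
        trees.append(train_stump(sample, feats))
    return lambda x: sum(t(x) for t in trees) / len(trees)  # average vote
```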

Results on the significance of B → γeν at BaBar:
  Single decision tree:    2.16 σ
  Boosted decision trees:  2.62 σ (not optimized)
  Bagging decision trees:  2.99 σ
Boosted Decision Trees


Byron Roe gave a talk on the use of boosted trees
in MiniBooNE. Misclassified events in one tree are
given a higher weight and a new tree is generated.
Repeat to generate 1000 trees. The final classifier
is a weighted sum of all of the trees.
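The reweighting loop can be sketched as follows (an AdaBoost-style toy of my own, not the exact MiniBooNE code; a one-cut stump again stands in for a tree):

```python
import math

def weighted_stump(data, weights):
    """Minimal weighted learner: the best single cut on feature 0,
    trying each event's value as the cut (illustration only)."""
    best = None
    for xc, _ in data:
        cut = xc[0]
        for sign in (1, -1):
            err = sum(w for (x, y), w in zip(data, weights)
                      if (1 if sign * (x[0] - cut) > 0 else 0) != y)
            if best is None or err < best[0]:
                best = (err, cut, sign)
    _, cut, sign = best
    return lambda x: 1 if sign * (x[0] - cut) > 0 else 0

def boosted_classifier(data, n_trees=50):
    """Misclassified events get a larger weight, a new tree is trained on
    the reweighted sample, and the final classifier is a weighted vote."""
    weights = [1.0 / len(data)] * len(data)
    ensemble = []  # (tree weight alpha, tree)
    for _ in range(n_trees):
        tree = weighted_stump(data, weights)
        err = sum(w for (x, y), w in zip(data, weights) if tree(x) != y)
        err = min(max(err, 1e-10), 1.0 - 1e-10)     # guard the log
        alpha = 0.5 * math.log((1.0 - err) / err)   # better trees weigh more
        weights = [w * math.exp(alpha if tree(x) != y else -alpha)
                   for (x, y), w in zip(data, weights)]
        norm = sum(weights)
        weights = [w / norm for w in weights]       # renormalize
        ensemble.append((alpha, tree))
    return lambda x: 1 if sum(a * (1 if t(x) == 1 else -1)
                              for a, t in ensemble) > 0 else 0
```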
(Plot comparing boosted trees to an ANN, as a function of the % of signal
retained, for 21 and 52 input variables. Boosted trees are also more robust.)
Other Talks

There were a couple of other talks on this subject
by Puneet Sarda and Alex Gray, which I could not
attend.
Nuisance Parameters

Nuisance parameters are parameters with
unknown true values for which coverage is
required in a frequentist analysis.



They may be statistical, such as the number of background
events in a sideband used for estimating the background
under a peak.
They may be systematic, such as the shape of the
background under the peak, or the error caused by the
uncertainty of the hadronic fragmentation model in the
Monte Carlo.
Most experiments have a large number of systematic
uncertainties.
New Concerns for the LHC

Although the statistical treatment of these
uncertainties is probably the analysis question
that I have been asked the most, Kyle Cranmer has
pointed out that these issues will be even more
important at the LHC.



If the statistical error is O(1) and the systematic error is
O(0.1), then the systematic error will contribute as its
square, or O(0.01), and it does not much matter how you
treat it.
However, at the LHC, we may have processes with 100
background events and 10% systematic errors.
Even more critical, we want 5σ for a discovery level.
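The arithmetic, for concreteness (numbers are illustrative):

```python
import math

# Errors add in quadrature, so a systematic of O(0.1) against a
# statistical error of O(1) contributes only its square, O(0.01):
print(math.sqrt(1.0**2 + 0.1**2))  # ~1.005: a negligible 0.5% shift

# At the LHC, with 100 background events (stat ~ sqrt(100) = 10) and a
# 10% systematic on the background (syst ~ 10), the two are comparable:
print(math.sqrt(10.0**2 + 10.0**2))  # ~14.1: the treatment now matters
```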
Why 5σ?



LHC searches: 500 searches, each of which has
100 resolution elements (mass, angle bins, etc.):
5 × 10^4 chances to find something.
One experiment: false positive rate at 5σ:
(5 × 10^4)(3 × 10^-7) = 0.015. OK.
Two experiments:
Allowable false positive rate: 10.
2 × (5 × 10^4)(1 × 10^-4) = 10 → 3.7σ required.
Required verification by the other experiment:
(1 × 10^-3)(10) = 0.01 → 3.1σ required.
Caveats: Is the significance real? Are there common
systematic errors?
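These rates follow from the one-sided Gaussian tail probability; a quick check (a sketch using only the numbers on this slide):

```python
import math

def one_sided_p(z: float) -> float:
    """One-sided Gaussian tail probability of a z-sigma fluctuation."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

trials = 500 * 100  # 500 searches x 100 resolution elements = 5e4 chances

# One experiment: expected false positives at 5 sigma.
print(trials * one_sided_p(5.0))  # ~0.015 -- OK

# Two experiments, allowing ~10 false positives between them:
# p per trial = 10 / (2 * 5e4) = 1e-4, which corresponds to ~3.7 sigma.
print(one_sided_p(3.7))  # ~1.1e-4

# Verifying the other experiment's ~10 candidates at the 0.01 level
# needs p ~ 1e-3 per candidate, i.e. ~3.1 sigma.
print(one_sided_p(3.1))  # ~9.7e-4
```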
A Cornucopia of Techniques


At this meeting we have seen a wide variety of
techniques discussed for constructing confidence
intervals in the presence of nuisance parameters.
Everyone has expressed a concern that their
methods cover, at least approximately. This
appears to be important for LHC physics in light of
Cranmer’s concerns.
Bayesian with Coverage


Joel Heinrich presented a decision by CDF to do
Bayesian analyses with priors that cover.
The advantage is Bayesian conditioning with
frequentist coverage. It is possibly the maximum
amount of work for the experimenter.
Example of coverage with flat priors for a
single-channel Poisson with normalization and
background nuisance parameters:
Bayesian with Coverage

Example of coverage with flat priors and with 1/ε and
1/b priors for a 4-channel Poisson with normalization
and background nuisance parameters. (Two panels:
flat priors; 1/ε and 1/b priors.)
Frequentist/Bayesian Hybrid




Fredrik Tegenfeldt presented a likelihood-ratio ordered (LR)
Neyman construction after integrating out the nuisance
parameters with flat priors. In a single-channel test, there
was no undercoverage.
What happens in a multi-channel case? My guess is that
the confidence belt will be distorted by the use of flat priors,
but that the method will still cover due to the construction.
Cranmer considers a similar technique, as was used for the
LEP Higgs searches.
Both are called “Cousins-Highland,” although probably neither
actually is.
Profile Likelihood

44 years ago, Kendall and Stuart told us how to
eliminate nuisance parameters and do a LR
construction:
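The formula on the slide did not survive extraction; in standard notation, the Kendall-Stuart prescription is to eliminate the nuisance parameters θ with the profile likelihood ratio

```latex
\lambda(\mu) \;=\;
\frac{L\bigl(\mu,\,\hat{\hat{\theta}}(\mu)\bigr)}{L(\hat{\mu},\,\hat{\theta})},
```

where \hat{\hat{\theta}}(\mu) maximizes L for fixed signal parameter \mu, and (\hat{\mu}, \hat{\theta}) is the global maximum; the Neyman construction is then done in \mu alone, ordering by \lambda(\mu).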
One (Minor) Problem

The Kendall-Stuart prescription leads to the
problem that, for Poisson problems, as the
nuisance parameter becomes better and better known,
the confidence intervals do not converge to the limit of
its being perfectly known. The reason is that the
introduction of a nuisance parameter breaks the
discreteness of the Poisson distribution.
(Plot from Punzi's talk.)
One More Try

Since this was referred to in a parallel session as “the Feldman
problem” and since two plenary speakers made fun of my Fermilab
Workshop plots, I will try to explain them again.
(Plots from my Fermilab Workshop talk: confidence belts in n versus b
for r = 1, for r << 1, and for b known exactly.)
The Cousins-Highland Problem




This correction also solves what Bob Cousins and I
refer to as the Cousins-Highland problem (as
opposed to the Cousins-Highland method).
Cousins and Highland turned to a Bayesian
approach to calculate the effect of a normalization
error because the frequentist approach gave an
answer with the wrong sign.
We now understand this was due to simply
breaking the discreteness of the Poisson
distribution.
In one test case, using this correction reproduced
the Cousins-Highland result to within a factor of 2.
Use of Profile Likelihood

Wolfgang Rolke presented a talk on eliminating the
nuisance parameters via profile likelihood, but
with the Neyman construction replaced by the
−Δln L hill-climbing approximation. This is also
what MINUIT does. The coverage is good with
some minor undercoverage. Cranmer also
considers this method.
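A minimal sketch of the hill-climbing approximation (my own toy, not Rolke's code): scan the profile negative log-likelihood and keep the parameter values within Δln L of the minimum.

```python
def profile_interval(nll, mu_grid, delta=0.5):
    """Approximate interval from a -ln L scan: keep mu values whose
    profile NLL lies within `delta` of the minimum (0.5 for a 68%
    interval on one parameter, as in MINUIT)."""
    values = [nll(mu) for mu in mu_grid]
    nll_min = min(values)
    inside = [mu for mu, v in zip(mu_grid, values) if v - nll_min <= delta]
    return min(inside), max(inside)

# Toy case where profiling is trivial: a Gaussian measurement x = 3.0
# with unit error gives NLL = 0.5 * (mu - 3)^2 (constants dropped),
# so the 68% interval should come out close to [2, 4].
nll = lambda mu: 0.5 * (mu - 3.0) ** 2
grid = [0.01 * i for i in range(601)]  # mu from 0 to 6
print(profile_interval(nll, grid))  # ~(2.0, 4.0)
```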
Full Neyman Constructions


Both Giovanni Punzi and Kyle Cranmer attempted
full Neyman constructions for both signal and
nuisance parameters.
I don’t recommend you try this at home for the
following reasons:



The ordering principle is not unique. Both Punzi and
Cranmer ran into some problems.
The technique is not feasible for more than a few
nuisance parameters.
It is unnecessary since removing the nuisance
parameters through profile likelihood works quite well.
Cranmer’s (Revised) Conclusions

In Cranmer’s talk,
he had an
unexpected result
for the coverage of
Rolke’s method
(“profile”). He did
in fact have
an error and it is
corrected here:
Final Comments on
Nuisance Parameters



My preference is to eliminate at least the major
nuisance parameters through profile likelihood
and then do a LR Neyman construction. It is
straightforward and has excellent coverage
properties.
However, whatever method you choose, you
should check the coverage of the method.
Cranmer makes the point that if you can check the
coverage, you can also do a Neyman construction.
I don’t completely agree, but it is worth
considering.
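Checking coverage is itself mechanical (a toy sketch of my own: throw pseudo-experiments at known true values and count how often the reported interval contains the truth):

```python
import math
import random

def poisson(rng, mean):
    """Draw a Poisson variate (Knuth's method; fine for small means)."""
    limit, k, p = math.exp(-mean), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def check_coverage(make_interval, mu_true, b_true, n_toys=5000, seed=0):
    """Fraction of pseudo-experiments whose interval contains mu_true."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_toys):
        n = poisson(rng, mu_true + b_true)  # observed count
        lo, hi = make_interval(n, b_true)
        covered += lo <= mu_true <= hi
    return covered / n_toys

# A deliberately crude 68% interval (illustration only): background
# subtraction with a sqrt(n) error.
def naive_interval(n, b):
    err = math.sqrt(max(n, 1))
    return (n - b) - err, (n - b) + err

print(check_coverage(naive_interval, mu_true=5.0, b_true=10.0))
```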