Iterative dynamically stabilized (IDS) method of data unfolding (*)
Bogdan MALAESCU
CERN
PHYSTAT 2011
Workshop on unfolding
(*) arXiv:0907.3791
Outline
• Introduction: main effects to deal with
• Additional problems in practice
• An iterative unfolding method
• A complex example
• Discussion and conclusions
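Introduction: detector effects, folding and unfolding
[Figure: example of a transfer matrix A_ij obtained from MC (bin indices i, j), showing the effects of resolution and distortion]
• Folding:
P: true spectrum → data; $P_{ij} = \frac{A_{ij}}{\sum_{k=1}^{N_{\text{bins}}} A_{kj}}$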
• Unfolding of detector effects (acceptance corrected afterwards)
• Unfolding is not a simple numerical problem: one must use a regularization method.
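The folding matrix defined above can be sketched in a few lines of NumPy. This is only an illustration: the 3-bin matrix `A` and the helper name `folding_matrix` are hypothetical, and the convention that `A[i, j]` counts MC events generated in bin j and reconstructed in bin i is an assumption read off the normalization over the first index.

```python
import numpy as np

def folding_matrix(A):
    """Return P with P[i, j] = A[i, j] / sum_k A[k, j].

    A[i, j] is taken as the number of MC events generated in bin j and
    reconstructed in bin i, so each populated column of P sums to 1.
    """
    A = np.asarray(A, dtype=float)
    generated = A.sum(axis=0)  # generated (true) MC spectrum t_j
    # Columns without generated events are left at zero (no division by 0).
    return np.divide(A, generated, out=np.zeros_like(A), where=generated > 0)

# Hypothetical 3-bin transfer matrix (event counts), for illustration only.
A = np.array([[80.0, 10.0, 0.0],
              [20.0, 70.0, 15.0],
              [0.0, 20.0, 85.0]])
P = folding_matrix(A)
t = A.sum(axis=0)  # generated MC spectrum
r = P @ t          # folding the generated MC gives the reconstructed MC
```

Folding the generated MC spectrum with `P` reproduces the reconstructed MC spectrum (the row sums of `A`) by construction.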
Problems in practice: fluctuations due to background subtraction
[Figure: spectra illustrating folding, background subtraction and unfolding]
• A “standard” unfolding could propagate large fluctuations
into precise regions of the spectrum
• The uncertainties of the data points must be taken into
account in the unfolding! (used to compute the significance
of data-MC differences in each bin)
Problems in practice: transfer matrix simulation ≠ perfect
Detector simulation (folding): systematic uncertainty
New structures in data:
• must also be corrected for detector effects
• could bias MC normalization (needed in the unfolding, for data-MC
comparison)
[Figure: new structure in data (not simulated), compared with MC using the standard normalization and MC using the improved normalization]
• Key: use the significance of data-MC differences in each bin
Ingredient for the unfolding procedure: a regularization function
• Used to “measure” significance in the bin-by-bin comparison of experimental data and MC simulation
• Allows fluctuations and significant new structures in data to be treated differently
• Important for the dynamical regularization of fluctuations
• Depends monotonically on the absolute data – MC difference (Δx), the corresponding uncertainty (σ) and a parameter λ (scale factor):
$f(\Delta x, \sigma, \lambda) = 1 - e^{-\left(\frac{\Delta x}{\lambda \sigma}\right)^{2}}$
• Behavior at small/large parameter values is important, but the exact choice of the function is not critical
• Used at all the steps of the unfolding procedure, with different values for λ
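As a minimal sketch, the regularization function above can be written as follows. The function name `regularization` and the explicit handling of λ = 0 (taken here as the limit in which every difference is fully significant) are choices made for this illustration, and σ > 0 is assumed.

```python
import numpy as np

def regularization(delta_x, sigma, lam):
    """f = 1 - exp(-(|delta_x| / (lam * sigma))**2), with values in [0, 1]."""
    delta_x = np.abs(np.asarray(delta_x, dtype=float))
    sigma = np.asarray(sigma, dtype=float)
    if lam == 0:
        # Limit lam -> 0: every difference counts as fully significant.
        return np.ones(np.broadcast(delta_x, sigma).shape)
    return 1.0 - np.exp(-(delta_x / (lam * sigma)) ** 2)

f_small = regularization(0.5, 1.0, 1.5)  # difference well inside the uncertainty
f_large = regularization(6.0, 1.0, 1.5)  # difference of six standard deviations
```

A small difference relative to λσ gives f close to 0 (treated as a fluctuation), a large one gives f close to 1 (treated as a significant structure); λ → ∞ drives f to 0 everywhere.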
Model for the test of the method
Transfer matrix model:
• For the folding
• Fluctuated matrix used for the unfolding
[Figure: transfer matrix model (reconstructed MC vs. generated MC), showing the resolution effect and a systematic transfer of events]
Model for the test of the method
[Figure: generated MC, truth data (generated MC + new structures), reconstructed MC and data; differences Data - Reconstructed MC and Data - Generated MC, with the new structures indicated]
Ingredients for the unfolding procedure: the MC normalization procedure
• First estimation of the number of events in data, corresponding to structures simulated by MC:
$N_D^{MC} = \sum_{k=1}^{n} \left(d_k - B_k^d\right)$
where $d_k$ is the number of data events in the bin k and $B_k^d$ is the number of background subtraction fluctuation events in the bin k
• A better estimation (with iterations):
$N_D^{MC\,\prime} = N_D^{MC} + \sum_{k=1}^{n} \left(1 - f\left(|\Delta d_k|, \tilde{\sigma}(d_k), \lambda_N\right)\right) \cdot \Delta d_k$
$\Delta d_k = d_k - B_k^d - \frac{N_D^{MC}}{N_{MC}} \cdot r_k$
$\tilde{\sigma}^2(d_k) = \sigma^2(d_k) + \left(\frac{N_D^{MC}}{N_{MC}}\right)^2 \sigma^2(r_k)$
• The same method is applied at the level of (corrected spectrum / generated MC)
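The normalization iteration above can be sketched as follows. The function names, the fixed number of iterations and the toy spectrum (data following twice the MC shape, plus an unsimulated structure of 500 events in one bin) are assumptions made for illustration; `r` is taken to be the reconstructed MC spectrum.

```python
import numpy as np

def f_reg(dx, sigma, lam):
    """Regularization function 1 - exp(-(|dx| / (lam * sigma))**2), lam > 0."""
    return 1.0 - np.exp(-(np.abs(dx) / (lam * sigma)) ** 2)

def mc_normalization(d, sigma_d, r, sigma_r, B_d, lam_N, max_iter=50):
    """Iterate N' = N + sum_k (1 - f_k) * delta_d_k, starting from sum_k (d_k - B_d_k)."""
    N_MC = r.sum()
    N = np.sum(d - B_d)  # first estimation
    for _ in range(max_iter):
        scale = N / N_MC
        delta = d - B_d - scale * r  # data minus normalized reconstructed MC
        sigma = np.sqrt(sigma_d**2 + (scale * sigma_r) ** 2)
        # Only differences judged not significant contribute to the update.
        N = N + np.sum((1.0 - f_reg(delta, sigma, lam_N)) * delta)
    return N

# Toy input (hypothetical): data follow 2 x MC, plus 500 extra events in bin 4.
r = np.full(10, 100.0)
d = 2.0 * r
d[4] += 500.0
N_first = d.sum()
N_better = mc_normalization(d, np.sqrt(d), r, np.sqrt(r), np.zeros(10), lam_N=2.0)
```

In this toy case the first estimation counts the new structure as if it were simulated, while the iterated estimation converges towards the number of data events that the MC shape actually describes.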
Ingredients for the unfolding procedure: the MC normalization procedure
• Relative improvement of the normalization: $(N_D - N_D^{MC})/N_D$
• The number of iterations is important only in the unstable region
• The size of the unstable region depends on the amplitude of fluctuations in background subtraction
• Study performed directly on data!
[Figure: relative improvement of the normalization versus λN, for 50 iterations (at most); unstable and stable regions and the λN choice are indicated]
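Ingredients for the unfolding procedure: one step of the unfolding method
Folding: P: true spectrum → data; $P_{ij} = \frac{A_{ij}}{\sum_{k=1}^{n} A_{kj}}$
Unfolding matrix (like d’Agostini method): $\tilde{P}_{ij} = \frac{A_{ij}}{\sum_{k=1}^{n} A_{ik}}$
By construction:
$r_i = \sum_{k=1}^{n} P_{ik} \cdot t_k$ (general equation)
$t_j = \sum_{k=1}^{n} \tilde{P}_{kj} \cdot r_k$ (only approximate for spectra other than MC)
Unfolding: compare data and reconstructed MC spectra
$u_j = \frac{N_D^{MC}}{N_{MC}} \cdot t_j + B_j^u + \sum_{k=1}^{n} \left[ f\left(|\Delta d_k|, \tilde{\sigma}(d_k), \lambda\right) \cdot \Delta d_k \cdot \tilde{P}_{kj} + \left(1 - f\left(|\Delta d_k|, \tilde{\sigma}(d_k), \lambda\right)\right) \cdot \Delta d_k \cdot \delta_{kj} \right]$
• $\frac{N_D^{MC}}{N_{MC}} \cdot t_j$: true MC
• $B_j^u$: fluctuation in background subtraction
• $f \cdot \Delta d_k \cdot \tilde{P}_{kj}$: significant difference (unfolded)
• $(1 - f) \cdot \Delta d_k \cdot \delta_{kj}$: not significant difference (fixed)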
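One unfolding step can be sketched as below, for the case where data and result use the same binning. The helper names and the toy matrix are hypothetical, and the MC uncertainty is assumed to be Poissonian (σ²(r_k) = r_k), which the slides do not specify.

```python
import numpy as np

def f_reg(dx, sigma, lam):
    """Regularization function 1 - exp(-(|dx| / (lam * sigma))**2), lam > 0."""
    return 1.0 - np.exp(-(np.abs(dx) / (lam * sigma)) ** 2)

def unfold_step(d, sigma_d, A, N_D, B_d, B_u, lam):
    """One unfolding step; A[i, j] = MC events generated in bin j, reconstructed in bin i."""
    t = A.sum(axis=0)       # generated MC spectrum t_j
    r = A.sum(axis=1)       # reconstructed MC spectrum r_i
    P_unf = A / r[:, None]  # unfolding matrix, each row sums to 1
    scale = N_D / t.sum()   # N_D^MC / N_MC
    delta = d - B_d - scale * r
    sigma = np.sqrt(sigma_d**2 + scale**2 * r)  # assumes sigma(r_k)^2 = r_k
    f = f_reg(delta, sigma, lam)
    # Significant part is unfolded with P_unf, the rest stays in its own bin.
    return scale * t + B_u + (f * delta) @ P_unf + (1.0 - f) * delta

# Hypothetical 3-bin example: data = 2 x reconstructed MC plus a bump in bin 1.
A = np.array([[80.0, 10.0, 0.0],
              [20.0, 70.0, 15.0],
              [0.0, 20.0, 85.0]])
r = A.sum(axis=1)
t = A.sum(axis=0)
zeros = np.zeros(3)
d = 2.0 * r + np.array([0.0, 60.0, 0.0])
u = unfold_step(d, np.sqrt(d), A, d.sum(), zeros, zeros, lam=1.5)
u_mc = unfold_step(2.0 * r, np.sqrt(2.0 * r), A, (2.0 * r).sum(), zeros, zeros, lam=1.5)
```

With the standard normalization and no background term, the step conserves the number of events, and data with exactly the MC shape are mapped onto the scaled generated MC.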
1st step of the unfolding method
Choice: λ = λL = ∞ (all differences between data and reconstructed MC spectra treated as not significant)
[Figure: reconstructed MC, truth data (generated MC + new structures), data and corrected spectrum; differences Corrected spectrum - Generated MC, Data - Reconstructed MC and Data - Generated MC, with the new structures indicated. Also shown: the result if one would choose λL = 0]
Ingredients for the unfolding procedure:
Comparison of the corrected spectrum and generated MC:
• Estimation of large fluctuations in background subtraction: not significant deviations, with large uncertainties
$B_j^u = \left(1 - f\left(|\Delta u_j|, \tilde{\sigma}(u_j), \lambda_S\right)\right) \cdot \Delta u_j$
$\Delta u_j = u_j - \frac{N_D^{MC}}{N_{MC}} \cdot t_j$ (ratio taken from the normalization procedure)
• Transfer matrix improvement: use significant structures
$A'_{ij} = A_{ij} + f\left(|\Delta u_j|, \tilde{\sigma}(u_j), \lambda_M\right) \cdot \Delta u_j \cdot \frac{N_{MC}}{N_D^{MC}} \cdot P_{ij}$, for $i \in [1; n]$
$\Delta u_j = u_j - B_j^u - \frac{N_D^{MC}}{N_{MC}} \cdot t_j$
The folding matrix (P), describing detector effects, stays unchanged. Only the generated MC spectrum is improved.
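The two updates above can be sketched as follows. The function names and the toy numbers are hypothetical; the combined uncertainty σ̃(u_j) is assumed here to be built by analogy with σ̃(d_k), with a Poissonian MC term, since the slides do not spell it out.

```python
import numpy as np

def f_reg(dx, sigma, lam):
    """Regularization function; lam = 0 is taken as 'always significant' (f = 1)."""
    dx = np.abs(dx)
    if lam == 0:
        return np.ones_like(dx)
    return 1.0 - np.exp(-(dx / (lam * sigma)) ** 2)

def background_fluctuations(u, sigma_u, A, N_D, lam_S):
    """B_u[j] = (1 - f) * delta_u[j], with delta_u = u - (N_D / N_MC) * t."""
    t = A.sum(axis=0)
    scale = N_D / t.sum()
    delta_u = u - scale * t
    sigma = np.sqrt(sigma_u**2 + scale**2 * t)  # assumed form of sigma-tilde(u_j)
    return (1.0 - f_reg(delta_u, sigma, lam_S)) * delta_u

def improve_transfer_matrix(u, sigma_u, A, N_D, B_u, lam_M):
    """A'[i, j] = A[i, j] + f * delta_u[j] * (N_MC / N_D) * P[i, j]."""
    t = A.sum(axis=0)
    P = A / t  # folding matrix, each column sums to 1
    scale = N_D / t.sum()
    delta_u = u - B_u - scale * t
    sigma = np.sqrt(sigma_u**2 + scale**2 * t)  # assumed form of sigma-tilde(u_j)
    weight = f_reg(delta_u, sigma, lam_M) * delta_u / scale
    return A + weight * P  # weight[j] multiplies column j

# Hypothetical 3-bin example: corrected spectrum = 2 x generated MC plus a structure.
A = np.array([[80.0, 10.0, 0.0],
              [20.0, 70.0, 15.0],
              [0.0, 20.0, 85.0]])
u = 2.0 * A.sum(axis=0) + np.array([0.0, 80.0, 0.0])
N_D = 600.0
B_u = background_fluctuations(u, np.sqrt(u), A, N_D, lam_S=1.5)
A_new = improve_transfer_matrix(u, np.sqrt(u), A, N_D, B_u, lam_M=1.5)
A_max = improve_transfer_matrix(u, np.sqrt(u), A, N_D, np.zeros(3), lam_M=0)
```

Because each column of `A` is rescaled as a whole, the folding matrix computed from `A_new` is identical to the original one. With `lam_M=0` and no background term, the generated MC spectrum becomes proportional to the corrected spectrum. Nothing in this sketch prevents negative entries in the improved matrix.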
The Iterative Unfolding Method
• 1st unfolding, where the large fluctuations due to background subtraction are kept unchanged
1) Estimation of large fluctuations due to background subtraction
2) Transfer matrix improvement (hence of the unfolding probability matrix)
3) Improved unfolding
Dynamical regularization: from the treatment of fluctuations in each bin, at each step of the procedure
When should the iterations stop?
• Comparison of data and reconstructed MC
• Study the number of needed iterations, with toys
The parameters used at the different steps are chosen with a model for data. One can (in general) give up some of the parameters (by performing a maximal unfolding & transfer matrix modification).
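The sequence of steps above can be put together in a compact sketch. It is not the available ROOT implementation and relies on several simplifying assumptions: identical binning for data and result (so the same vector `B` is used at both levels), the standard MC normalization, Poissonian MC uncertainties, and a crude stand-in for the uncertainty of the corrected spectrum. All names are hypothetical.

```python
import numpy as np

def f_reg(dx, sigma, lam):
    """Regularization function; lam = 0 is taken as 'always significant' (f = 1)."""
    dx = np.abs(dx)
    if lam == 0:
        return np.ones_like(dx)
    return 1.0 - np.exp(-(dx / (lam * sigma)) ** 2)

def unfold(d, sigma_d, A, B, lam):
    """One unfolding step with the current transfer matrix A (reco i, gen j)."""
    t, r = A.sum(axis=0), A.sum(axis=1)
    scale = np.sum(d - B) / t.sum()  # standard MC normalization
    delta = d - B - scale * r
    f = f_reg(delta, np.sqrt(sigma_d**2 + scale**2 * r), lam)
    return scale * t + B + (f * delta) @ (A / r[:, None]) + (1.0 - f) * delta

def ids_sketch(d, sigma_d, A, lam_L, lam_U, lam_S, lam_M, n_iter):
    A = np.array(A, dtype=float)
    B = np.zeros_like(d)
    u = unfold(d, sigma_d, A, B, lam_L)  # 1st unfolding
    for _ in range(n_iter):
        t = A.sum(axis=0)
        scale = np.sum(d - B) / t.sum()
        sigma_u = np.sqrt(sigma_d**2 + scale**2 * t)  # crude stand-in (assumption)
        # 1) remaining large fluctuations from background subtraction
        du = u - scale * t
        B = (1.0 - f_reg(du, sigma_u, lam_S)) * du
        # 2) transfer matrix improvement (the folding matrix A / t is unchanged)
        du = u - B - scale * t
        A = A + f_reg(du, sigma_u, lam_M) * du / scale * (A / t)
        # 3) improved unfolding with the updated matrix
        u = unfold(d, sigma_d, A, B, lam_U)
    return u, A

# Hypothetical closure test: fold a known truth, unfold with a flat MC.
P = np.array([[0.9, 0.1, 0.0, 0.0],
              [0.1, 0.8, 0.1, 0.0],
              [0.0, 0.1, 0.8, 0.1],
              [0.0, 0.0, 0.1, 0.9]])
A0 = 1000.0 * P
truth = np.array([1000.0, 1500.0, 800.0, 1000.0])
d = P @ truth
u0, _ = ids_sketch(d, np.sqrt(d), A0, 1.5, 0, 0, 0, n_iter=0)
u20, _ = ids_sketch(d, np.sqrt(d), A0, 1.5, 0, 0, 0, n_iter=20)
```

The example uses the "maximal" setting (λ = 0 for the iterated steps), so no parameter besides λL matters. Comparing the refolded result `P @ u` with the data `d` is one possible way to monitor when to stop iterating.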
Results after iterations
[Figure: Data - Reconstructed MC and Data - improved reconstructed MC, with the new structures and the estimation of background fluctuations indicated]
Unfolding Result
[Figure: initial reconstructed MC, truth data (initial generated MC + new structures), data and corrected spectrum; differences Data - Initial reconstructed MC, Data - Initial generated MC and Corrected spectrum - Initial generated MC, with the new structures indicated]
• Statistical uncertainties propagated using pseudo-experiments (“toys”).
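A minimal sketch of toy-based uncertainty propagation is given below. It assumes Poisson-distributed data and takes any callable as the unfolding; the simple matrix unfolding used in the example, the function name `toy_covariance` and the numbers are hypothetical stand-ins, not the full method.

```python
import numpy as np

def toy_covariance(d, unfold, n_toys=1000, seed=1):
    """Covariance of the unfolded spectrum from Poisson-fluctuated copies of the data."""
    rng = np.random.default_rng(seed)
    toys = np.array([unfold(rng.poisson(d).astype(float)) for _ in range(n_toys)])
    return np.cov(toys, rowvar=False)  # shape (n_bins, n_bins)

# Hypothetical example with a plain unfolding-matrix multiplication.
A = np.array([[80.0, 10.0, 0.0],
              [20.0, 70.0, 15.0],
              [0.0, 20.0, 85.0]])
P_unf = A / A.sum(axis=1, keepdims=True)  # unfolding matrix, rows sum to 1
d = np.array([180.0, 270.0, 210.0])
cov = toy_covariance(d, lambda x: x @ P_unf, n_toys=2000)
```

The off-diagonal elements of `cov` give the bin-to-bin correlations introduced by the unfolding; for a nonlinear procedure the toys take this into account without an analytic error propagation.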
Discussion
Studied but not discussed:
• N bins data ≠ N bins result (rebinning in the unfolding or afterwards)
• Effect of rebinning on correlations
• Effect of regularization on uncertainties and
correlations (see Kerstin’s talk)
• Treatment of bins with negative number of
events (data)
• Empty bins in MC
• Preventing the existence of negative bins in the
improved generated MC
Conclusion
• New general method for the unfolding of
binned data
• Can treat problems that were not considered
previously
• Dynamic regularization procedure, bin by bin
at each step
• This method allows one to keep some control
of bin to bin correlations in the unfolded
spectrum
• ROOT code is available
Backup
Zoom on the narrow resonance region
A simple example for the use of the unfolding method
Simplified example:
• Reduced effects of the transfer matrix
• Smoother « bias », without structures
• No « dips » in the spectrum
• No important fluctuations from background subtraction
• Statistics reduced by a factor 20
[Figure: Data - Initial reconstructed MC and Data - Final reconstructed MC (after one iteration), compared with the data uncertainties]
A simple example for the use of the unfolding method
Simplified unfolding method:
• Standard normalization for the MC
• No estimation of remaining fluctuations (from background subtraction)
• 1st unfolding with λ = λL (= 1.5, justified by a study, see next)
• One iteration with λU = λM = 0
[Figure: effect of the 1st unfolding and effect of the 2nd unfolding, compared with the data uncertainties]
A test with known « generated data » (before folding)
• Use (data – reconstructed MC) as bias with respect to the
generated MC, in order to build « generated data » (toys)
• Folding with the matrix Aij
• (Do not) Fluctuate the folded data
• Unfolding with the matrix A’ij (Aij fluctuated)
• Compare the result with the « generated data »
[Figure: Data - Initial reconstructed MC and Data - Final reconstructed MC (after one iteration), compared with the data uncertainties]
No extra data fluctuations: test of systematic effects
With statistical data fluctuations: stability test
A test with known « generated data » (before folding)
Bias measurement after unfolding (without statistical
fluctuations of folded data) in large bins
[Figure: Result - generated data after the 1st step and after the 2nd step, compared with the data uncertainties]
• The 1st unfolding provides a good result
• λL = 1.5: very small bias and reduced correlations with respect to the case λL = 0
A simple example for the use of the unfolding method
• Diagonal uncertainties after the 1st unfolding: larger in the non-trivial case (fewer correlations between the bins)
[Figure: data uncertainties compared with the uncertainties after the 1st unfolding for λL = 1.5 and for λL = 0]