Uncertainty Quantification for Machine Learning and Statistical Models

UncertaintyQuantificationfor
MachineLearningandStatisticalModels
DavidJ.Stracuzzi
Jointworkwith:MaxChen,MichaelDarling,
StephenDauphin,MattPeterson,
andChrisYoung
Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation,
a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National
Nuclear Security Administration under contract DE-AC04-94AL85000.
SAND2017-2966C
InverseProblems
D=M(T)
T=M-1(D)
§ Mhasaknownform
§ DeterminingM-1 isill-posed(unstable,non-unique)
§ Example:determineEarth’ssubsurfacedensity(T)from
gravitationalfieldmeasurements(D)
§ MisbasedonNewton’slaw,F=Gm/r2
§ DisalmostcertaintyinsufficienttoparameterizeM-1
§ Regularizeandoptimizetogeta“best”solution
§ Uncertaintycomesfrom
§ D
§ Regularization+Optimization(definitionof“best”)
§ Modelparameters
2
MachineLearningand
StatisticalModeling
D=M(T)
T=M-1(D)
§ Mhasanunknown form
§ Substitutestatisticsfordomainknowledge
§ Uncertaintycomesfrom
§
§
§
§
D
Regularization+Optimization(definitionof“best”)
Modelform(andparameters)
Inference
§ Lackofdomainknowledgeimpactsselectionofmodel
structure,regularizationmethods,anddefinitionof“best”
3
Example:SeismicOnsetDetection
Given
§ Waveformdatacontainingboth
noiseandseismicsignals
Produce
§ Signalonsettime
§ Precisioniscriticaltodownstream
processing
Known
§ Wavepropagationtheory
Unknown
§ Pointoforigin
Simplified:signalis3D.
Groundtruthdoesnotexist.
§ Eventsizeortype
§ Composition/densityofmedium(rock)
4
SeismicDetectionModels
Model
/
,
§ ARMA(p,q):𝑥" = 𝑐 + 𝜀" + ∑)-. 𝜙) 𝑥"*) + ∑)-. 𝜃) 𝜀"*)
BasicApproach
§ Modelthenoise(M1)andthesignal
plusnoise(M2)separately
§ Optimizemodelparameters𝜃, 𝜙
viamaximumlikelihood
§ Akaike informationcriterion(AIC)
toselectp,q
§ Pointatwhichtwomodelsmeet
isthe“bestguess”signalonset
5
SeismicDetectionUncertainty
Method1:ParametricBootstrap
§ Constructasamplingdistributionforthe
inputwaveform
§ SampleM1 &M2 tocreatenewwaveforms
§ Fitanewmodeltoeachsampleandrecordk
Method2:ModelSampling
§ M1=ARMA(p1,q1);M2=ARMA(p2,q2)
§ <p1,q1,p2,q2>determinesk andalikelihood
§ Samplefrom<p1,q1,p2,q2>andfitthedata
§ Usethelikelihoodstoconstruct
probabilitydistributionoverk
Notclearthattheseproducesimilardistributions
6
UncertaintyQuantificationforStatisticalModels
inverse uncertainty quantification problem
classic machine learning problem
approximate
unobservable
inferred state
solution
uncertainty
inference
=
inference
errors
+
model
induction /
calibration
model-form
uncertainty
input
indirect
regularization observations
+
regularization measurement
+ errors
effects
Key Question: How much do sensor observations really tell us about the world?
• Stats / ML research communities do not typically frame questions this way.
• Most work focuses on building a better statistical model
• “What” the model says emphasized over “how well”
2
ImpactonDownstreamAnalyses
§ Severalanalysesdependononsettime:
§
§
§
§
Location(hypocenter)
Eventtype(naturalorman-made)&size
Subsurfacetomography
Earthmodel
§ Provideadditionalinformationtohumananalysts
§
§
Distributionoveronsettimes(vs.confidenceinterval)
Relativereliabilityofdatapointsandsensors
§ Relymoreoncurrentdata,lessonhistorical
dataandmodelingassumptions
§
Improvesensitivity?
§ Nearterm:
§
§
§
Modelselection(improvefitanduncertainty)
Validateagainstsynthetic&knownevents
Compareuncertaintyresultsagainstanalystpicks
8
Example2:MultimodalImageAnalysis
§ Given
§
Co-registeredimageryfrommultiplesources
§ Produce
§
§
Segmentationwithassociatedclassprobabilitiesand
uncertainties,or
Detectionofspecificobjects
§ Motivation
§
§
§
§
Optical Image
Trendtowardautomatedimageanalysis
Trendtoward”layered”analysisandcombining
informationsources
Openquestion:Howreliablearetheresults?
Openquestion:Whichdataisuseful?
§ Currently
§
§
§
Asmanyanalyticmethodsastherearetasks
Someignoreuncertainty
Othersevaluateitintheirownuniquelysaneway
Lidar Image
9
Approach
1.
Mergedatasources
§
2.
Samplethedatatocreateanew“image”
§
3.
4.
Pixel=R+G+B+Height
Non-parametricbootstrap
Probabilisticallyclusterthepixels
§
GaussianMixture
§
NonparametricMixture
Recordtheclassprobabilitiesforeachpixel
§ Keepahistogramforeachclassateachpixel
5.
Bootstrap:Repeat2,3,4manytimes
§ Histogramsrepresenttheposteriorprobability
distributions
6.
Futurework:Translateunsupervised
classesintosemanticlabels
10
MultimodalImageAnalysis
Source Imagery
Probability
Most
Most LikelyofClass
Likely Class
Class Entropy
Top row: optical image only
Bottom row: combined optical and lidar imagery
Uncertainty
11
CloserLookattheUncertainty
Optical Only
Combined Imagery
12
ValueofInformation
§ Evaluatedatasufficiency
§ Howmuchdoesagivendatasourcecontributetoananalyticresult?
§
§
Whichdatadowereallyneed?
Whichdatadowewishwehad?
Delta Probability
Source
Max Likelihood
Probability Map
Delta Uncertainty
ResourceTaskingUnderUncertainty
Relax assumption that desired information is
actually obtained from a scheduled collect
Uncertainty
Representations
Stochastic
Optimization
Risk
Additional Value
Target / Environmental
Conditions
Quality Distribution
of Acquired
Information
Optimum
Risk
Additional Value
0%
Costs
Quality/Validation/Compliance 100%
Sensor Performance
Characteristics
Goal:Maximizeutilityofavailabledataandresources
Approach:Quantifyimpactofdataondetectionquality
§
Useuncertaintyasbasisforvalue
§
Definevalueasimprovedlabel,geospatial,ortemporaldiscrimination
§
Incorporateinformationvaluesintooptimizationobjectivefunctions
14
Summary
§ Uncertaintyanalysisdoesnot“buildabettermodel”
Itindicateshowwellagivenmodelcapturesthedata
§ Researchistobridgethetheory– applicationgap
§
§
§
Identifyingwhichinformationisuseful
Diggingitoutfromdeepwithinalgorithms
Propagatinguncertaintythroughlayersofanalysis
§ Importantquestions
§
What’stherelationshipbetweenuncertaintiesgeneratedby
§
Measurementerrors&datasampling(bootstrap)
§
Modelselectionandinductionprocesses(modelsampling)
§
Inference(MCMC)
§
…andhowdowecombinethem?Orshouldwe?
§
Whatissuesarisewhenwepropagateuncertaintiesfromonestatistical
inverseproblemtoanother?
15
UncertaintyQuotes
“Wedemandrigidlydefinedareasofdoubtanduncertainty!”
– DouglasAdams,TheHitchhiker'sGuidetotheGalaxy
“Itain’t whatyoudon’tknowthatgetsyouintotrouble.
It’swhatyouknowforsurethatjustain’t so.” – UncertainOrigin
“Uncertaintyisanuncomfortableposition.
Butcertaintyisanabsurdone.” – Voltaire
"Ignorancemorefrequentlybegetsconfidencethandoesknowledge…”
– CharlesDarwin
“Asfarasthelawsofmathematicsrefertoreality,theyarenotcertain;
andasfarastheyarecertain,theydonotrefertoreality.”
– AlbertEinstein
POC: David Stracuzzi
[email protected]