Maximum Likelihood Approach for Symmetric Distribution Property

MaximumLikelihoodApproachfor
SymmetricDistributionProperty
Estimation
Jayadev Acharya,Hirakendu Das,Alon Orlitsky,Ananda Suresh
Cornell,Yahoo,UCSD,Google
Propertyestimation
โ€ข ๐’‘:unknown discretedistributionover๐’Œ elements
โ€ข ๐’‡(๐’‘):apropertyof๐’‘
โ€ข ๐œบ:accuracyparameter,๐œน: errorprobability
โ€ข Given:accesstoindependentsamplesfrom๐’‘
โ€ข Goal:estimate๐’‡(๐’‘) to±๐œบ withprobability > ๐Ÿ โˆ’ ๐œน
Usually,๐œน constant,say0.1
Focusofthetalk:large๐’Œ
Samplecomplexity
๐‘‹. , ๐‘‹0 , โ€ฆ , ๐‘‹2 :independentsamplesfrom๐‘
๐‘“5(๐‘‹.2):estimateof๐‘“(๐‘)
Samplecomplexityof๐‘“5 (๐‘‹.2):
๐‘† ๐‘“5, ๐œ€, ๐‘˜, ๐›ฟ = min{๐‘›: โˆ€๐‘, Pr |๐‘“5(๐‘‹.2 ) โˆ’ ๐‘“(๐‘)| โ‰ฅ ๐œ€ โ‰ค ๐›ฟ}
Samplecomplexityof๐‘“:
๐‘† ๐‘“, ๐œ€, ๐‘˜, ๐›ฟ = min ๐‘† ๐‘“5 , ๐œ€, ๐‘˜, ๐›ฟ
G5
Symmetricproperties
Permutingsymbollabelsdoesnotchange๐‘“(๐‘)
Examples:
โ€ข ๐ป ๐‘ โ‰œ โˆ’ โˆ‘K ๐‘K โ‹… log ๐‘K
โ€ข ๐‘† ๐‘ โ‰œ โˆ‘K 1QRST
Renyi entropy,distancetouniformity,unseensymbols,
divergences,etc
Sequencemaximumlikelihood
๐‘K : #times๐‘ฅ appearsin๐‘‹.2 (multiplicity)
๐‘W :empiricaldistribution
๐‘KW =
๐‘K
๐‘›
๐‘“ W ๐‘‹.2 = ๐‘“(๐‘W )
Empiricalestimatorsaremaximumlikelihoodestimators
๐‘W = arg max ๐‘(๐‘‹.2 )
Q
Callthissequencemaximumlikelihood (SML)
Empiricalentropyestimation
Empiricalestimator:
๐ปW
๐‘K
๐‘›
=๐ป
= Z log .
๐‘›
๐‘K
K
๐‘˜
W
๐‘† ๐ป , ๐œ€, ๐‘˜, 0.1 = ฮ˜
๐œ€
๐‘‹.2
๐‘W
Variouscorrectionsproposed:
Miller-Maddow,Jackknifedestimator,Coverageadjusted,โ€ฆ
Samplecomplexity:ฮฉ(๐‘˜)
๐ป, ๐œ€, ๐‘˜, 0.1 = ๐‘œ(๐‘˜) (existential)
Note:For๐œ€ โ‰ช 1/๐‘˜,empiricalestimatorsarethebest
[Paninskiโ€™03]:๐‘†
Entropyestimation
[ValiantValiantโ€™11a]: ConstructiveproofsbasedonLP:
d
๐‘† ๐ป, ๐œ€, ๐‘˜, 0.1 = ฮ˜c efg d
โ€ข [YuWangโ€™14, HanJiaoVenkatWeissmanโ€™14, ValiantValiant11b]:
Simplifiedalgorithms,andexactrates:
๐‘† ๐ป, ๐œ€, ๐‘˜, 0.1 = ฮ˜
d
cefg d
Supportcoverage
Expectednumberofsymbolswhen๐‘ issampled๐‘š times
๐‘†i ๐‘ = Z(1 โˆ’ 1 โˆ’ ๐‘K
i)
K
Goal:Estimate๐‘†i ๐‘ to±(๐œ€ โ‹… ๐‘š)
[OrlitskySureshWuโ€™16, ZouValiantetalโ€™16]:
๐‘›=ฮ˜
i
.
log
efg i
c
samplesfor๐›ฟ = 0.1
Knownresultssummary
Manysymmetricproperties:entropy,supportsize,distance
touniform,supportcoverage
โ€ข Differentestimatorforeachproperty
โ€ข Sophisticatedresultsfromapproximationtheory
Mainresult
Simple,MLbasedplug-inmethodthatissample-optimal
forentropy,support-coverage,distancetouniform,
supportsize.
Profiles
Profileofasequenceisthemultiset ofmultiplicities:
ฮฆ ๐‘‹.2 = {๐‘K }
๐‘‹.2 = 1๐ป, 2๐‘‡ , ๐‘œ๐‘Ÿ๐‘‹.2 = 2๐ป, 1๐‘‡ ,ฮฆ ๐‘‹.2 = {1,2}
Symmetricpropertiesdependonmultiset ofprobabilities
Coinsw/bias0.4,and0.6havesamesymmetricproperty
Optimalestimatorshavesame outputfor sequenceswith
sameprofile.
Profilesaresufficientstatistic
Profilemaximumlikelihood [OSVZโ€™04]
Probabilityofaprofile:
๐‘ ฮฆ(๐‘‹.2) =
Z
๐‘(๐‘Œ.2)
opq :r opq sr tpq
Maximizetheprofileprobability:
uvw
๐‘r
= arg max ๐‘(ฮฆ ๐‘‹.2 )
Q
๐‘‹.2 = 1๐ป, 2๐‘‡ :
SML:(2/3, 1/3)
PML:(1/2, 1/2)
PMLforsymmetricproperties
Toestimateasymmetric๐‘“(๐‘):
โ€ข Find๐‘uvw ฮฆ(๐‘‹.2 )
โ€ข Output๐‘“(๐‘uvw )
Advantages:
โ€ข Notuningparameters
โ€ข Notfunctionspecific
Mainresult
PMLissample-optimal forentropy,supportcoverage,
distancetouniformity,andsupportsize.
Ingredients
GuaranteeforPML.
If๐‘› = ๐‘† ๐‘“, ๐œ€, ๐‘˜, ๐›ฟ ,then๐‘† ๐‘“
If ๐’ = ๐‘บ ๐’‡, ๐œบ, ๐’Œ, ๐’†}๐Ÿ‘
๐’
#profilesoflength๐‘› <
๐‘uvw
, 2๐œ€, ๐‘˜, ๐›ฟ
Wy q
โ‹… .T โ‰ค๐‘›
,then ๐‘บ ๐’‡ ๐’‘๐‘ท๐‘ด๐‘ณ , ๐Ÿ๐œบ, ๐’Œ, ๐ŸŽ. ๐Ÿ โ‰ค ๐’
Wy q
.T
Ingredients
๐‘› = ๐‘† ๐‘“, ๐œ€, ๐›ฟ ,achievedbyanestimator๐‘“5 (ฮฆ(๐‘‹ 2 )):
๐‘:underlyingdistribution.
โ€ข Profilesฮฆ ๐‘‹ 2 suchthat๐‘ ฮฆ ๐‘‹ 2
> ๐›ฟ,
๐‘uvw ฮฆ โ‰ฅ ๐‘ ฮฆ > ๐›ฟ
uvw
๐‘r
ฮฆ โ‰ฅ๐‘ ฮฆ >๐›ฟ
uvw
uvw
๐‘“ ๐‘r
โˆ’ ๐‘“ ๐‘ โ‰ค ๐‘“ ๐‘r
โˆ’ ๐‘“5 ฮฆ + ๐‘“5 ฮฆ โˆ’ ๐‘“ ๐‘
โ€ข Profileswith๐‘ ฮฆ ๐‘‹ 2
< 2๐œ€
< ๐›ฟ,
๐‘(๐‘ ฮฆ ๐‘‹ 2 < ๐›ฟ < ๐›ฟ โ‹… #๐‘๐‘Ÿ๐‘œ๐‘“๐‘–๐‘™๐‘’๐‘ ๐‘œ๐‘“๐‘™๐‘’๐‘›๐‘”๐‘กโ„Ž๐‘›
Ingredients
Bettererrorprobabilityguarantees.
Recall:
๐‘˜
๐‘† ๐ป, ๐œ€, ๐‘˜, 0.1 = ฮ˜
๐œ€ โ‹… log ๐‘˜
StrongererrorguaranteesusingMcDiarmidโ€™s inequality:
๐‘†
ลฝ.โ€ข
}2
๐ป, ๐œ€, ๐‘˜, ๐‘’
=ฮ˜
d
cโ‹…efg d
Withtwicethesampleserrordropsexponentially
Similarresultsforotherproperties
Mainresult
PMLissample-optimal forentropy,unseen,distanceto
uniformity,andsupportsize.
EvenapproximatePML isoptimalforthese.
Algorithms
โ€ข EMalgorithm[Orlitsky etal]
โ€ข ApproximatePMLviaBethePermanents[Vontobel]
โ€ข ExtensionsofMarkovChains[VatedkaVontobel]
PolynomialtimealgorithmsforapproximatingPML
Summary
โ€ข Symmetricpropertyestimation
โ€ข PMLplug-inapproach
โ€ข Universal,simpletostate
โ€ข Independentofparticularproperties
โ€ข Directions:
โ€ข Efficientalgorithmsโ€“ forapproximatePML
โ€ข Reliesheavilyonexistenceofotherestimators
InFisherโ€™swordsโ€ฆ
Ofcoursenobodyhasbeenabletoprovethatmaximum
likelihoodestimatesarebestunderallcircumstances.
Maximumlikelihoodestimatescomputedwithallthe
informationavailablemayturnouttobeinconsistent.
Throwingawayasubstantialpartoftheinformationmay
renderthemconsistent.
R.A.Fisher
References
1. Acharya,J.,Das,H.,Orlitsky,A.,&Suresh,A.T.(2016).AUnifiedMaximumLikelihoodApproachforOptimalDistributionPropertyEstimation.arXiv preprint
arXiv:1611.02960.
2. Paninski,L.(2003).Estimationofentropyandmutualinformation.Neuralcomputation,15(6),1191-1253.
3. Wu,Y.,&Yang,P.(2016).Minimax ratesofentropyestimationonlargealphabetsviabestpolynomialapproximation.IEEETransactionsonInformationTheory,
62(6),3702-3720.
4. Orlitsky,A.,Suresh,A.T.,&Wu,Y.(2016).Optimalpredictionofthenumberofunseenspecies.ProceedingsoftheNationalAcademyofSciences,201607774.
5. Orlitsky,A.,Santhanam,N.P.,Viswanathan,K.,&Zhang,J.(2004,July).Onmodelingprofilesinsteadofvalues.InProceedingsofthe20thconferenceon
Uncertaintyinartificialintelligence (pp.426-435).AUAIPress.
6. Valiant,G.,&Valiant,P.(2011,June).Estimatingtheunseen:ann/log(n)-sampleestimatorforentropyandsupportsize,shownoptimalvianewCLTs.In
Proceedingsoftheforty-thirdannualACMsymposiumonTheoryofcomputing (pp.685-694).ACM.
7. Jiao,J.,Venkat,K.,Han,Y.,&Weissman,T.(2015).Minimax estimationoffunctionals ofdiscretedistributions. IEEETransactionsonInformationTheory, 61(5),
2835-2885.
8. Vatedka,S.,&Vontobel,P.O.(2016,July).Patternmaximumlikelihoodestimationoffinite-statediscrete-timeMarkovchains.InInformationTheory(ISIT),2016
IEEEInternationalSymposiumon (pp.2094-2098).IEEE.
9. Vontobel,P.O.(2014,February).TheBetheandSinkhorn approximationsofthepatternmaximumlikelihoodestimateandtheirconnectionstotheValiantValiantestimate.InInformationTheoryandApplicationsWorkshop(ITA),2014 (pp.1-10).IEEE.
10. Valiant,G.,&Valiant,P.(2011,October).Thepoweroflinearestimators.InFoundationsofComputerScience(FOCS),2011IEEE52ndAnnualSymposiumon (pp.
403-412).IEEE.
11. Orlitsky,A.,Sajama,S.,Santhanam,N.P.,Viswanathan,K.,&Zhang,J.(2004).Algorithmsformodelingdistributionsoverlargealphabets.InInformationTheory,
2004.ISIT2004.Proceedings.InternationalSymposiumon (pp.304-304).IEEE.