MaximumLikelihoodApproachfor
SymmetricDistributionProperty
Estimation
Jayadev Acharya,Hirakendu Das,Alon Orlitsky,Ananda Suresh
Cornell,Yahoo,UCSD,Google
Propertyestimation
โข ๐:unknown discretedistributionover๐ elements
โข ๐(๐):apropertyof๐
โข ๐บ:accuracyparameter,๐น: errorprobability
โข Given:accesstoindependentsamplesfrom๐
โข Goal:estimate๐(๐) to±๐บ withprobability > ๐ โ ๐น
Usually,๐น constant,say0.1
Focusofthetalk:large๐
Samplecomplexity
๐. , ๐0 , โฆ , ๐2 :independentsamplesfrom๐
๐5(๐.2):estimateof๐(๐)
Samplecomplexityof๐5 (๐.2):
๐ ๐5, ๐, ๐, ๐ฟ = min{๐: โ๐, Pr |๐5(๐.2 ) โ ๐(๐)| โฅ ๐ โค ๐ฟ}
Samplecomplexityof๐:
๐ ๐, ๐, ๐, ๐ฟ = min ๐ ๐5 , ๐, ๐, ๐ฟ
G5
Symmetricproperties
Permutingsymbollabelsdoesnotchange๐(๐)
Examples:
โข ๐ป ๐ โ โ โK ๐K โ
log ๐K
โข ๐ ๐ โ โK 1QRST
Renyi entropy,distancetouniformity,unseensymbols,
divergences,etc
Sequencemaximumlikelihood
๐K : #times๐ฅ appearsin๐.2 (multiplicity)
๐W :empiricaldistribution
๐KW =
๐K
๐
๐ W ๐.2 = ๐(๐W )
Empiricalestimatorsaremaximumlikelihoodestimators
๐W = arg max ๐(๐.2 )
Q
Callthissequencemaximumlikelihood (SML)
Empiricalentropyestimation
Empiricalestimator:
๐ปW
๐K
๐
=๐ป
= Z log .
๐
๐K
K
๐
W
๐ ๐ป , ๐, ๐, 0.1 = ฮ
๐
๐.2
๐W
Variouscorrectionsproposed:
Miller-Maddow,Jackknifedestimator,Coverageadjusted,โฆ
Samplecomplexity:ฮฉ(๐)
๐ป, ๐, ๐, 0.1 = ๐(๐) (existential)
Note:For๐ โช 1/๐,empiricalestimatorsarethebest
[Paninskiโ03]:๐
Entropyestimation
[ValiantValiantโ11a]: ConstructiveproofsbasedonLP:
d
๐ ๐ป, ๐, ๐, 0.1 = ฮc efg d
โข [YuWangโ14, HanJiaoVenkatWeissmanโ14, ValiantValiant11b]:
Simplifiedalgorithms,andexactrates:
๐ ๐ป, ๐, ๐, 0.1 = ฮ
d
cefg d
Supportcoverage
Expectednumberofsymbolswhen๐ issampled๐ times
๐i ๐ = Z(1 โ 1 โ ๐K
i)
K
Goal:Estimate๐i ๐ to±(๐ โ
๐)
[OrlitskySureshWuโ16, ZouValiantetalโ16]:
๐=ฮ
i
.
log
efg i
c
samplesfor๐ฟ = 0.1
Knownresultssummary
Manysymmetricproperties:entropy,supportsize,distance
touniform,supportcoverage
โข Differentestimatorforeachproperty
โข Sophisticatedresultsfromapproximationtheory
Mainresult
Simple,MLbasedplug-inmethodthatissample-optimal
forentropy,support-coverage,distancetouniform,
supportsize.
Profiles
Profileofasequenceisthemultiset ofmultiplicities:
ฮฆ ๐.2 = {๐K }
๐.2 = 1๐ป, 2๐ , ๐๐๐.2 = 2๐ป, 1๐ ,ฮฆ ๐.2 = {1,2}
Symmetricpropertiesdependonmultiset ofprobabilities
Coinsw/bias0.4,and0.6havesamesymmetricproperty
Optimalestimatorshavesame outputfor sequenceswith
sameprofile.
Profilesaresufficientstatistic
Profilemaximumlikelihood [OSVZโ04]
Probabilityofaprofile:
๐ ฮฆ(๐.2) =
Z
๐(๐.2)
opq :r opq sr tpq
Maximizetheprofileprobability:
uvw
๐r
= arg max ๐(ฮฆ ๐.2 )
Q
๐.2 = 1๐ป, 2๐ :
SML:(2/3, 1/3)
PML:(1/2, 1/2)
PMLforsymmetricproperties
Toestimateasymmetric๐(๐):
โข Find๐uvw ฮฆ(๐.2 )
โข Output๐(๐uvw )
Advantages:
โข Notuningparameters
โข Notfunctionspecific
Mainresult
PMLissample-optimal forentropy,supportcoverage,
distancetouniformity,andsupportsize.
Ingredients
GuaranteeforPML.
If๐ = ๐ ๐, ๐, ๐, ๐ฟ ,then๐ ๐
If ๐ = ๐บ ๐, ๐บ, ๐, ๐}๐
๐
#profilesoflength๐ <
๐uvw
, 2๐, ๐, ๐ฟ
Wy q
โ
.T โค๐
,then ๐บ ๐ ๐๐ท๐ด๐ณ , ๐๐บ, ๐, ๐. ๐ โค ๐
Wy q
.T
Ingredients
๐ = ๐ ๐, ๐, ๐ฟ ,achievedbyanestimator๐5 (ฮฆ(๐ 2 )):
๐:underlyingdistribution.
โข Profilesฮฆ ๐ 2 suchthat๐ ฮฆ ๐ 2
> ๐ฟ,
๐uvw ฮฆ โฅ ๐ ฮฆ > ๐ฟ
uvw
๐r
ฮฆ โฅ๐ ฮฆ >๐ฟ
uvw
uvw
๐ ๐r
โ ๐ ๐ โค ๐ ๐r
โ ๐5 ฮฆ + ๐5 ฮฆ โ ๐ ๐
โข Profileswith๐ ฮฆ ๐ 2
< 2๐
< ๐ฟ,
๐(๐ ฮฆ ๐ 2 < ๐ฟ < ๐ฟ โ
#๐๐๐๐๐๐๐๐ ๐๐๐๐๐๐๐กโ๐
Ingredients
Bettererrorprobabilityguarantees.
Recall:
๐
๐ ๐ป, ๐, ๐, 0.1 = ฮ
๐ โ
log ๐
StrongererrorguaranteesusingMcDiarmidโs inequality:
๐
ลฝ.โข
}2
๐ป, ๐, ๐, ๐
=ฮ
d
cโ
efg d
Withtwicethesampleserrordropsexponentially
Similarresultsforotherproperties
Mainresult
PMLissample-optimal forentropy,unseen,distanceto
uniformity,andsupportsize.
EvenapproximatePML isoptimalforthese.
Algorithms
โข EMalgorithm[Orlitsky etal]
โข ApproximatePMLviaBethePermanents[Vontobel]
โข ExtensionsofMarkovChains[VatedkaVontobel]
PolynomialtimealgorithmsforapproximatingPML
Summary
โข Symmetricpropertyestimation
โข PMLplug-inapproach
โข Universal,simpletostate
โข Independentofparticularproperties
โข Directions:
โข Efficientalgorithmsโ forapproximatePML
โข Reliesheavilyonexistenceofotherestimators
InFisherโswordsโฆ
Ofcoursenobodyhasbeenabletoprovethatmaximum
likelihoodestimatesarebestunderallcircumstances.
Maximumlikelihoodestimatescomputedwithallthe
informationavailablemayturnouttobeinconsistent.
Throwingawayasubstantialpartoftheinformationmay
renderthemconsistent.
R.A.Fisher
References
1. Acharya,J.,Das,H.,Orlitsky,A.,&Suresh,A.T.(2016).AUnifiedMaximumLikelihoodApproachforOptimalDistributionPropertyEstimation.arXiv preprint
arXiv:1611.02960.
2. Paninski,L.(2003).Estimationofentropyandmutualinformation.Neuralcomputation,15(6),1191-1253.
3. Wu,Y.,&Yang,P.(2016).Minimax ratesofentropyestimationonlargealphabetsviabestpolynomialapproximation.IEEETransactionsonInformationTheory,
62(6),3702-3720.
4. Orlitsky,A.,Suresh,A.T.,&Wu,Y.(2016).Optimalpredictionofthenumberofunseenspecies.ProceedingsoftheNationalAcademyofSciences,201607774.
5. Orlitsky,A.,Santhanam,N.P.,Viswanathan,K.,&Zhang,J.(2004,July).Onmodelingprofilesinsteadofvalues.InProceedingsofthe20thconferenceon
Uncertaintyinartificialintelligence (pp.426-435).AUAIPress.
6. Valiant,G.,&Valiant,P.(2011,June).Estimatingtheunseen:ann/log(n)-sampleestimatorforentropyandsupportsize,shownoptimalvianewCLTs.In
Proceedingsoftheforty-thirdannualACMsymposiumonTheoryofcomputing (pp.685-694).ACM.
7. Jiao,J.,Venkat,K.,Han,Y.,&Weissman,T.(2015).Minimax estimationoffunctionals ofdiscretedistributions. IEEETransactionsonInformationTheory, 61(5),
2835-2885.
8. Vatedka,S.,&Vontobel,P.O.(2016,July).Patternmaximumlikelihoodestimationoffinite-statediscrete-timeMarkovchains.InInformationTheory(ISIT),2016
IEEEInternationalSymposiumon (pp.2094-2098).IEEE.
9. Vontobel,P.O.(2014,February).TheBetheandSinkhorn approximationsofthepatternmaximumlikelihoodestimateandtheirconnectionstotheValiantValiantestimate.InInformationTheoryandApplicationsWorkshop(ITA),2014 (pp.1-10).IEEE.
10. Valiant,G.,&Valiant,P.(2011,October).Thepoweroflinearestimators.InFoundationsofComputerScience(FOCS),2011IEEE52ndAnnualSymposiumon (pp.
403-412).IEEE.
11. Orlitsky,A.,Sajama,S.,Santhanam,N.P.,Viswanathan,K.,&Zhang,J.(2004).Algorithmsformodelingdistributionsoverlargealphabets.InInformationTheory,
2004.ISIT2004.Proceedings.InternationalSymposiumon (pp.304-304).IEEE.
© Copyright 2026 Paperzz