Generating a Synthetic Population of Individuals in Households

© Copyright JASSS
Maxime Lenormand and Guillaume Deffuant (2013)
Generating a Synthetic Population of Individuals in Households: Sample-Free Vs Sample-Based Methods
Journal of Artificial Societies and Social Simulation 16 (4) 12
<http://jasss.soc.surrey.ac.uk/16/4/12.html>
Received: 27-Aug-2012 Accepted: 17-Jun-2013 Published: 31-Oct-2013
Abstract
We compare a sample-free method proposed by Gargiulo et al. (2010) and a sample-based method proposed by Ye et al. (2009) for generating a synthetic population, organised in households, from various statistics. We generate a reference population for a French region including 1310 municipalities and measure how both methods approximate it from a set of statistics derived from this reference population. We also perform a sensitivity analysis. The sample-free method better fits the reference distributions of both individuals and households. It is also less data demanding but it requires more pre-processing. The quality of the results for the sample-based method is highly dependent on the quality of the initial sample.
Keywords:
Synthetic Populations, Sample-Free, Iterative Proportional Updating, Sample-Based Method, Microsimulation
Introduction
1.1
For two decades, the number of micro-simulation models, which simulate the evolution of large populations with an explicit representation of each individual, has been constantly increasing along with computing capabilities and the availability of longitudinal data. When implementing such an approach, the first problem is to properly initialise a large number of individuals with adequate attributes. Indeed, in most cases, exhaustive individual data are excluded from the public domain for privacy reasons. In general, only data aggregated at various levels (municipality, county, ...), which guarantee this privacy, are available. Sometimes individual data are available for a sample of the population, these data also being chosen so as to guarantee privacy (for instance by omitting the individual's location of residence). This paper focuses on the problem of generating a virtual population with the best use of these data, especially when the goal is to generate both individuals and their organisation in households.
1.2
Two main families of methods, both requiring a sample of the population, aim at tackling this problem:
The synthetic reconstruction methods (SR) (Wilson & Pownall 1976). These methods generally use Iterative Proportional Fitting (Deming & Stephan 1940) and a sample of the target population to obtain the joint-distributions of interest (Huang & Williamson 2002; Ye et al. 2009; Arentze et al. 2007; Guo & Bhat 2007; Beckman et al. 1996). Many of the SR methods match the observed and simulated household joint-distribution or individual joint-distribution, but not both simultaneously. To circumvent this limitation, Ye et al. (2009), Arentze et al. (2007) and Guo & Bhat (2007) proposed different techniques to match both household and individual attributes. Here, we focus on the Iterative Proportional Updating developed by Ye et al. (2009).
The combinatorial optimization (CO) methods. These methods create a synthetic population by zone using marginals of the attributes of interest and a subset of a sample of the target population for each zone (for a complete description see Huang & Williamson (2002); Voas & Williamson (2000)).
1.3
Recently, sample-free SR methods appeared (Gargiulo et al. 2010; Barthelemy & Toint 2012). The sample-free SR methods build households by picking individuals from a set that initially comprises the whole population and progressively shrinks. In Barthelemy & Toint (2012), if there is no appropriate individual in the current set, the individual is picked from the already generated households, whereas in Gargiulo et al. (2010) individuals are picked from the set only. Both approaches are illustrated on real-life examples: Barthelemy & Toint (2012) generated a synthetic population of Belgium at the municipality level and Gargiulo et al. (2010) generated the population of two municipalities in the Auvergne region (France). These methods can be used in the usual situation where no sample is available and one must use only distributions of attributes (of individuals and households). Hence, they overcome a strong limitation of the previous methods. It is therefore important to assess whether this larger scope of the sample-free methods implies a loss of accuracy compared with the sample-based methods.
1.4
The aim of this paper is to contribute to this assessment. To this end, we compare, on an example, the sample-based IPU method proposed by Ye et al. (2009) with the sample-free approach proposed by Gargiulo et al. (2010).
1.5
In order to compare the methods, the ideal case would be to have a population with complete data available about individuals and households. It would allow us to measure precisely the accuracy of each method under different conditions. Unfortunately, we do not have such data. In order to put ourselves in a similar situation, we generate a virtual population and then use it as a reference to compare the selected methods, as in Barthelemy & Toint (2012). All the algorithms presented in this paper are implemented in Java on a desktop machine (PC Intel 2.83 GHz).
1.6
In the first section we formally present the two methods. In the second section we present the comparison results. Finally, we discuss our results.
Details of the chosen methods
Sample-free method (Gargiulo et al. 2010)
2.1
We consider a set of n individuals X to dispatch in a set of m households Y in order to obtain a set of filled households P. Each individual x is characterised by a type $t_x$ from a set of q different individual types T (attributes of the individual). Each household y is characterised by a type $u_y$ from a set of p different household types U (attributes of the household). We define $n_T = \{n_{t_k}\}_{1 \le k \le q}$ as the number of individuals of each type and $n_U = \{n_{u_l}\}_{1 \le l \le p}$ as the number of households of each type. Each household y of a given type $u_y$ has a probability of being filled by a subset of individuals L, in which case the content of the household equals L, which is denoted $c(y) = L$. We use this probability to iteratively fill the households with the individuals of X.
$$\mathbb{P}(c(y) = L \mid u_y) \qquad (1)$$
2.2
The iterative algorithm used to dispatch the individuals into the households according to Equation 1 is described in Algorithm 1. The algorithm starts with the list of individuals X and the list of households Y, defined by their types. Then it iteratively picks a household at random and, from its type and Equation 1, derives a list of individual types. If this list of individual types is available in the current list of individuals X, then this filled household is added to the result, and the current lists of individuals and households are updated. This operation is repeated until one of the lists X or Y is empty, or a maximum number of iterations is reached.
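The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Java implementation: `fill_households`, `draw_content` and the toy types are hypothetical names, and `draw_content` stands in for a sampler of Equation 1 that returns a list of individual types for a given household type.

```python
import random
from collections import Counter

def fill_households(individuals, households, draw_content, max_iter=10_000, seed=0):
    """Sketch of the list-based filling loop.

    individuals  : list of individual types (the set X)
    households   : list of household types (the set Y)
    draw_content : hypothetical sampler standing in for Equation 1; given a
                   household type and an RNG, returns a list of individual types
    """
    rng = random.Random(seed)
    pool = Counter(individuals)      # remaining individuals, counted by type
    remaining = list(households)     # households not yet filled
    filled = []
    for _ in range(max_iter):
        if not remaining or sum(pool.values()) == 0:
            break                    # one of the two lists is empty
        i = rng.randrange(len(remaining))
        wanted = Counter(draw_content(remaining[i], rng))
        # Accept the draw only if every requested type is still available.
        if all(pool[t] >= c for t, c in wanted.items()):
            pool.subtract(wanted)
            filled.append((remaining.pop(i), sorted(wanted.elements())))
    return filled, pool, remaining
```

A rejected draw leaves both lists untouched, so the same household can be drawn again later with a different content.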
2.3
In the case of the generation of a synthetic population, we can replace the selection of the list L by the selection of the individuals one at a time, by order of importance in the household. In this case Equation 2 replaces Equation 1.
$$\mathbb{P}(x_1^y \mid u_y) \times \mathbb{P}(x_2^y \mid u_y, x_1^y) \times \mathbb{P}(x_3^y \mid u_y, x_1^y, x_2^y) \times \dots \qquad (2)$$
2.4
The iterative algorithm associated with this probability is described in Algorithm 2. The principle is the same as previously, but it is quicker. Instead of generating the whole list of individuals in the household before checking it, one generates this list one individual at a time and, as soon as one of its members cannot be found in X, the iteration stops and one tries another household.
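A minimal Python sketch of this member-by-member variant follows. It is an illustration only: the function names and the `draw_member` callback (a stand-in for one factor of Equation 2, returning `None` when the household is complete) are assumptions, and the sketch visits households in a shuffled order with a cap on the number of trials per household rather than reproducing the authors' exact loop.

```python
import random
from collections import Counter

def try_fill(u, pool, draw_member, rng):
    """Draw members one at a time; give up as soon as a drawn type is unavailable."""
    members, taken = [], Counter()
    while True:
        t = draw_member(u, members, rng)   # hypothetical sampler for Equation 2
        if t is None:                      # household is complete
            return members
        if pool[t] - taken[t] < 1:         # type no longer available: stop early
            return None
        members.append(t)
        taken[t] += 1

def fill_one_by_one(individuals, households, draw_member, max_trials=1000, seed=0):
    rng = random.Random(seed)
    pool = Counter(individuals)            # remaining individuals by type
    order = list(households)
    rng.shuffle(order)                     # visit households in random order
    filled, unfilled = [], []
    for u in order:
        for _ in range(max_trials):
            members = try_fill(u, pool, draw_member, rng)
            if members is not None:
                pool.subtract(members)     # remove the chosen individuals
                filled.append((u, members))
                break
        else:
            unfilled.append(u)             # no valid content found within the cap
    return filled, unfilled, pool
```

The early return in `try_fill` is what makes this variant quicker: a trial is abandoned at the first unavailable member instead of after a full list has been drawn.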
2.5
In practice this stochastic approach is data driven. Indeed, the types T and U are defined in accordance with the data available, and the complexity of extracting the distributions of Equation 2 increases with $n_T$ and $n_U$. The distributions defined in Equation 2 are called distributions for affecting individual into household (that is, for assigning individuals to households). In concrete applications, one often needs to estimate $n_T$, $n_U$ and the probability distributions presented in Equation 2. Because of this estimation, Algorithm 2 cannot converge in a reasonable time with the stopping criterion (Y ≠ ∅), which is equivalent to allowing an infinite number of "filling" trials per household. In this case, we can replace the stopping criterion by a maximal number of iterations per household and then put the remaining individuals in the remaining households using relaxed distributions for affecting individual into household.
2.6
In a perfect case where all the data are available and time is unlimited, the algorithm would find a perfect solution. When the data are partial and time is constrained, it is interesting to assess how well this method makes use of the available data.
The sample-based approach (General Iterative Proportional Updating)
2.7
This approach, proposed by Ye et al. (2009), starts with a sample $P_s$ of P, and the purpose is to define a weight $w_i$ associated with each individual and each household of the sample in order to match the total number of each type of individuals in X and households in Y to reconstruct P. The method used to reach this objective is the Iterative Proportional Updating (IPU). The algorithm proposed in Ye et al. (2009) is described in Algorithm 3. In this algorithm, for each type of households or individuals j, the purpose is to match the weighted sum $w_j$ with the estimated constraint $e_j$ by adjusting the weights. $e_j$ is an estimate of the total number of households or individuals of type j in P. This estimation is done separately for each individual and household type using a standard IPF procedure with marginal variables. When the match between the weighted sample and the constraints becomes stable, the algorithm stops. The procedure then generates a synthetic population by drawing at random the filled households of $P_s$ with probabilities corresponding to the weights. This generation is repeated several times and one chooses the result with the best fit with the observed data.
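The weight adjustment can be illustrated with a short Python sketch. It follows the general idea of IPU as described above (for each constraint j in turn, rescale the weights of the sample households that contribute to j so that the weighted sum equals $e_j$), but the function names, the average relative error used as stopping rule and the tolerance are assumptions, and the zero-cell and zero-marginal corrections are not handled.

```python
def weighted_sums(D, w):
    """Weighted column sums of the frequency matrix D (rows = sample households)."""
    return [sum(D[i][j] * w[i] for i in range(len(D))) for j in range(len(D[0]))]

def fit(D, w, e):
    """Average relative gap between weighted sums and constraints (assumed measure)."""
    s = weighted_sums(D, w)
    return sum(abs(s[j] - e[j]) / e[j] for j in range(len(e))) / len(e)

def ipu(D, e, max_iter=1000, tol=1e-9):
    w = [1.0] * len(D)                      # start from unit weights
    prev = fit(D, w, e)
    for _ in range(max_iter):
        for j in range(len(e)):             # one sweep over all constraints
            s = sum(D[i][j] * w[i] for i in range(len(D)))
            if s > 0:
                ratio = e[j] / s
                for i in range(len(D)):
                    if D[i][j] > 0:         # only households contributing to j
                        w[i] *= ratio
        cur = fit(D, w, e)
        if abs(prev - cur) < tol:           # the match has become stable
            break
        prev = cur
    return w

# Toy sample: two households (types A and B) holding 1 and 2 persons of one type.
D = [[1, 0, 1],
     [0, 1, 2]]
weights = ipu(D, [3, 2, 7])                 # constraints: 3 A, 2 B, 7 persons
```

After each column update, that column's weighted sum matches its constraint exactly; earlier columns may drift, which is why the sweep is repeated until the overall fit stops changing.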
Table 1: IPU Table. The light grey table represents the frequency matrix D showing the household type U and the frequency of different individual types T within each filled household for the sample $P_s$. The dimension of D is $|P_s| \times (p+q)$, where $|P_s|$ is the cardinality of the sample $P_s$, q the number of individual types and p the number of household types. An element $d_{ij}$ of D represents the contribution of filled household i to the frequency of individual/household type j.
Generating a synthetic population of reference for the comparison
3.1
Because we cannot access any population with complete data available about individuals and households, we generate a virtual population and then use it as a reference to compare the selected methods, as in Barthelemy & Toint (2012).
3.2
We start from statistics about the population of Auvergne (a French region) in 1990 and use the sample-free approach presented above. The Auvergne region is composed of 1310 municipalities, with 1,321,719 inhabitants gathered in 515,736 households. Table 2 presents summary statistics on the Auvergne municipalities.
Table 2: Summary statistics on the Auvergne municipalities
Statistics | Min | Max | Average
Households | 8 | 63,226 | 408.2
Individuals | 26 | 136,180 | 1,011.7
Generation of the individuals
3.3
For each municipality of the Auvergne region we generate a set X of individuals with a stochastic procedure. For each individual of the age pyramid (distribution 1 in Table 3), we randomly choose an age in the bin and then randomly draw an activity status according to distribution 2 in Table 3.
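As an illustration, the two draws can be sketched in Python as follows. The data layout (dictionaries keyed by age bin), the uniform choice of an integer age within the bin and the status labels are assumptions made for the example; they are not taken from the authors' code.

```python
import random

def generate_individuals(age_pyramid, activity_by_age, seed=0):
    """age_pyramid     : {(lo, hi): number of individuals in the bin [lo, hi)}
    activity_by_age : {(lo, hi): {status: probability}} for the same bins
    Returns a list of (age, activity_status) pairs."""
    rng = random.Random(seed)
    people = []
    for (lo, hi), count in age_pyramid.items():
        statuses, probs = zip(*activity_by_age[(lo, hi)].items())
        for _ in range(count):
            age = rng.randrange(lo, hi)                       # assumed uniform in the bin
            status = rng.choices(statuses, weights=probs)[0]  # draw from distribution 2
            people.append((age, status))
    return people

# Hypothetical toy inputs for one municipality.
pyramid = {(5, 15): 3, (25, 35): 2}
activity = {(5, 15): {"student": 1.0},
            (25, 35): {"active": 0.8, "student": 0.2}}
X = generate_individuals(pyramid, activity)
```

Because the loop runs over the counts of the age pyramid, the number of individuals per age bin is reproduced exactly, while the activity status is only matched in distribution.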
Generation of the households
3.4
For each municipality of the Auvergne region we generate a set Y of households according to the total number of individuals n = |X| with a stochastic procedure. We draw households at random according to distribution 3 in Table 3 while the sum of the capacities is below n, and then we determine the last household so that n equals the sum of the sizes of the households.
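A possible reading of this procedure is sketched below in Python. The text does not say how the last household is determined, so truncating its size while keeping the drawn type is purely an assumption of this sketch, as are the function name and the toy distribution.

```python
import random

def generate_households(n, type_size_dist, seed=0):
    """type_size_dist: {(household_type, size): probability} (distribution 3).
    Draws households until their sizes sum exactly to n."""
    rng = random.Random(seed)
    categories, probs = zip(*type_size_dist.items())
    households, total = [], 0
    while total < n:
        htype, size = rng.choices(categories, weights=probs)[0]
        # Assumption: the last household is truncated so that sizes sum to n.
        size = min(size, n - total)
        households.append((htype, size))
        total += size
    return households

# Hypothetical joint distribution of type and size.
dist = {("single", 1): 0.3,
        ("couple without children", 2): 0.3,
        ("couple with children", 4): 0.4}
Y = generate_households(101, dist)
```

With this truncation the last household may end up with a size that is unusual for its type, which a fuller implementation would need to handle.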
Distributions for affecting individual into household
Single
The age of individual 1 is determined using distribution 4 in Table 3.
Monoparental
The age of individual 1 is determined using distribution 4 in Table 3.
The ages of the children are determined according to the age of individual 1 (an individual can have a child after age 15 and before age 55) and distribution 6 in Table 3.
Couple without child
The age of individual 1 is determined using distribution 4 in Table 3.
The age of individual 2 is determined using distribution 5 in Table 3.
Couple with child
The age of individual 1 is determined using distribution 4 in Table 3.
The age of individual 2 is determined using distribution 5 in Table 3.
The ages of the children are determined according to the age of individual 1 and distribution 6 in Table 3.
Other
The age of individual 1 is determined using distribution 4 in Table 3.
The ages of the other individuals are determined according to the age of individual 1.
Table 3: Data description
ID | Description | Level
1 | Number of individuals grouped by ages | Municipality (LAU2)
2 | Distribution of individuals by activity status according to the age | Municipality (LAU2)
3 | Joint-distribution of households by type and size | Municipality (LAU2)
4 | Probability to be the head of household according to the age and the type of household | Municipality (LAU2)
5 | Probability of having a couple according to the difference of age between the partners (from "16 years" to "21 years") | National level
6 | Probability to be a child (child = live with parent) of household according to the age and the type of household | Municipality (LAU2)
3.5
To obtain a synthetic population P with households Y filled by individuals X, we use Algorithm 2, where we approximate Equation 2 with distributions 4, 5 and 6 in Table 3. We put no constraint on the number of individuals in the age pyramid, hence the reference population does not give any advantage to the sample-free method. Figure 4 and Figure 5 show the values obtained for individual and household attributes for the Auvergne region and for Marsac-en-Livradois, a municipality drawn at random among the 1310 Auvergne municipalities. These figures show the results obtained with the reference, the sample-free and the sample-based populations.
Comparing sample-free and sample-based approaches
4.1
The attributes of individuals and households are described in Table 4 and Table 5, respectively. The joint-distributions of the attributes for individuals and for households give, respectively, the number of individuals of each individual type $n_T = \{n_{t_k}\}_{1 \le k \le q}$ and the number of households of each household type $n_U = \{n_{u_l}\}_{1 \le l \le p}$. In this case, q = 130 and p = 17. It is important to note that p is not equal to 6 × 5 = 30 because we remove from the list of household types the inconsistent values, for example single households of size 5. We do the same for the individual types (removing, for example, retired individuals aged between 0 and 5).
Table 4: Individual level attributes
Attribute | Value
Age | [0,5[; [5,15[; [15,25[; [25,35[; [35,45[; [45,55[; [55,65[; [65,75[; [75,85[; 85 and more
Activity Status | Student; Active
Family Status | Head of a single household; Head of a monoparental household; Head of a couple without children household; Head of a couple with children household; Head of an other household; Child of a monoparental household; Child of a couple with children household; Partner; Other
Table 5: Household level attributes
Attribute | Value
Size | 1 individual; 2 individuals; 3 individuals; 4 individuals; 5 individuals; 6 and more individuals
Type | Single; Monoparental; Couple without children; Couple with children; Other
Fitting accuracy measures
4.2
We need fitting accuracy measures to evaluate the adequacy between the observed O and estimated E household and individual distributions. The first measure is the Proportion of Good Prediction (PGP) (Equation 3); we choose this first indicator for its ease of interpretation. In Equation 3 we multiply by 0.5 because, as we have $\sum_k O_k = \sum_k E_k$, each misclassified individual or household is counted twice (Harland et al. 2012).
$$PGP = 1 - \frac{1}{2} \, \frac{\sum_k |O_k - E_k|}{\sum_k O_k} \qquad (3)$$
4.3
We use the $\chi^2$ distance to perform a statistical test. Obviously the modalities with a zero value for the observed distribution are not included in the $\chi^2$ computation. If we consider a distribution with p modalities different from zero in the observed distribution, the $\chi^2$ distance follows a $\chi^2$ distribution with p-1 degrees of freedom.
$$\chi^2 = \sum_k \frac{(O_k - E_k)^2}{O_k} \qquad (4)$$
4.4
For more details on the fitting accuracy measures see Voas & Williamson (2001).
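Both measures are simple to compute. The Python sketch below assumes the observed and estimated distributions are given as two lists of counts over the same modalities; the function names are hypothetical, and the p-value of the test (which needs the $\chi^2$ cumulative distribution function) is left out.

```python
def pgp(observed, estimated):
    """Proportion of Good Prediction: 1 minus half the total absolute gap,
    relative to the observed total."""
    total = sum(observed)
    gap = sum(abs(o - e) for o, e in zip(observed, estimated))
    return 1.0 - 0.5 * gap / total

def chi2_distance(observed, estimated):
    """Chi-square distance, skipping modalities with a zero observed value.
    Returns the distance and the degrees of freedom (non-zero modalities minus 1)."""
    kept = [(o, e) for o, e in zip(observed, estimated) if o > 0]
    distance = sum((o - e) ** 2 / o for o, e in kept)
    return distance, len(kept) - 1

# Toy example: two individuals are misclassified out of 60.
O = [10, 20, 30, 0]
E = [12, 18, 30, 0]
score = pgp(O, E)
distance, dof = chi2_distance(O, E)
```

In the toy example the absolute gap is 4 because each of the two misclassified individuals is counted once in the modality it left and once in the modality it joined, which is the reason for the factor 0.5.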
Sample-free approach
4.5
To test the sample-free approach, we extract from the reference population, for each municipality, the distributions presented in Table 3. Then we use the procedure used for generating the reference population, but now with the constraints on the number of individuals from the age pyramid derived from the reference (remember that we did not have such constraints when generating the reference population). Then we fill the households with the individuals one at a time using the distributions for affecting individual into household. We limit the number of iterations to 1000 trials per household: if after 1000 trials a household is not filled, we put individuals at random in this household and change its type to "other". We repeat the process 100 times and choose, for each municipality, the synthetic population minimizing the $\chi^2$ distance between simulated and reference distributions for affecting individual into household.
4.6
In order to assess the robustness of the stochastic sample-free approach, we generate 10 synthetic populations per municipality, yielding 13,100 synthetic municipality populations in total. For each of them and for each distribution for affecting individual into household, we compute the p-value associated with the $\chi^2$ distance between the reference and estimated distributions. As we can see in Figure 1a, the algorithm is quite robust.
4.7
To validate the algorithm we compute the proportion of good predictions for each of the 13,100 synthetic populations and for each joint-distribution. We obtain an average of 99.7% of good predictions for the household distribution and 91.5% of good predictions for the individual distribution (Figure 1b). We also compute the p-value of the $\chi^2$ distance between the estimated and reference distributions for each of the synthetic populations and for each joint-distribution. Among the 13,100 synthetic populations, 100% are statistically similar to the observed one at a 95% level of confidence for the household joint-distribution and 94% for the individual joint-distribution.
4.8
In order to understand the effect of the maximal number of iterations per household, we repeat the previous tests for different values of this parameter (1, 10, 100, 500, 1000, 1500 and 2000) and we compute the mean proportion of good predictions obtained for both individuals and households. We note that beyond 100 iterations the quality of the results no longer changes.
Figure 1: (a) Boxplots of the p-values obtained with the $\chi^2$ distance between the estimated distributions and the observed distributions for each distribution for affecting individual into household, municipalities and replications. Th[...] presented in Table 3. The red line represents the risk 5% for the $\chi^2$ test. (b) Boxplots of the proportion of good predictions for each joint-distribution, municipalities and replications. (c) Average proportion of good predictions as a [...] iteration by households. Blue circles for the households. Red triangles for the individual.
IPU
4.9
To use the IPU algorithm we need a sample of filled households and marginal variables. In order to obtain these data we pick at random a significant sample of 25% of the households from the reference population P, and we also extract from P the two one-dimensional marginals (Size and Type distributions) that we need to build the household joint-distribution with IPF, and the three two-dimensional marginals (Age × Activity Status, Age × Family Status and Family Status × Activity Status) that we need to build the individual joint-distribution with IPF. Then we apply Algorithm 3, using the recommendations of Ye et al. (2009) for the well-known zero-cell and zero-marginal problems, to obtain a weighted sample $P_s$. With this sample we generate the synthetic population P 100 times and choose the one with the lowest $\chi^2$ distance between reference and simulated individual joint-distributions.
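For readers unfamiliar with IPF, the two-dimensional case (such as building a Size × Type table from its two one-dimensional marginals) can be sketched in Python as below. The seed table is assumed here to come from the sample counts; the function name, the tolerance and the toy numbers are hypothetical, and the zero-cell and zero-marginal treatments recommended by Ye et al. (2009) are not included.

```python
def ipf_2d(seed_table, row_totals, col_totals, max_iter=100, tol=1e-10):
    """Rescale a seed table alternately by rows and by columns until its
    margins match the target totals."""
    table = [list(map(float, row)) for row in seed_table]
    for _ in range(max_iter):
        for i, target in enumerate(row_totals):        # fit the row margins
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        for j, target in enumerate(col_totals):        # fit the column margins
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns are exact after the second step; check the rows.
        err = max(abs(sum(table[i]) - t) for i, t in enumerate(row_totals))
        if err < tol:
            break
    return table

# Uniform seed: the result is the product of the margins divided by the total.
joint = ipf_2d([[1, 1], [1, 1]], row_totals=[30, 70], col_totals=[40, 60])
```

A cell that is zero in the seed stays zero throughout, which is how inconsistent combinations can be excluded, but also why zero cells coming from a small sample are a known difficulty.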
4.10
To check the results obtained with the IPU approach, we generate 10 synthetic populations per municipality using different randomly selected samples of 25% of the households. For each of these synthetic populations and for each joint-distribution we compute the proportion of good predictions (Figure 2a). We obtain an average of 98.6% of good predictions for the household distribution and 86.9% of good predictions for the individual distribution. To determine the error of estimation due to the IPF procedure we compute the proportion of good predictions for the estimated and the IPF-reference distributions. As we can see in Figure 2b, the results are improved for the household distribution but not for the individual distribution. We also compute the p-value of the $\chi^2$ distance between the estimated and observed distributions for each of the synthetic populations and for each joint-distribution. Among the 13,100 synthetic populations, 100% are statistically similar to the observed one at a 95% level of confidence for the household joint-distribution and 61% for the individual joint-distribution. We obtained a similarity between the estimated and the IPF-objective distributions of 100% at a 95% level of confidence for the household distribution and 64% for the individual distribution.
4.11
In order to check the sensitivity of the results to the size of the sample, we plot, in Figure 2c, the average proportion of good predictions of the 13,100 household and individual joint-distributions for different values of the percentage of the reference households drawn at random in the sample (5, 10, 15, 20, 25, 30, 35, 40, 45 and 50). We note that the results are always good for the household distribution, but for the individuals the results are good only from a random sample of at least 25% of the reference household population. Not surprisingly, the overall quality of the results increases with this parameter.
Figure 2: (a) Boxplots of the proportion of good predictions for a comparison between the estimated distribution and the observed distribution for each municipality and replication. (b) Boxplots of the proportion of good predictions for [...] distribution and the IPF-objective distribution for each municipality and replication. (c) Average proportion of good predictions as a function of the sample size. Blue circles for the households. Red triangles for t[...]
Discussion
5.1
The sample-free method is less data demanding but it requires more data pre-processing. Indeed, this approach requires extracting the distributions for affecting individual into household from the data. The sample-free method gives a better fit between observed and simulated distributions, for both household and individual distributions, than the IPU approach. We can observe in Figure 3 that, for both methods, the goodness-of-fit is negatively correlated with the number of inhabitants. This observation is especially true for the IPU method because it depends on the number of individuals in the sample. Indeed, the lower the number of individuals, the higher the number of sparse cells in the individual distribution. The results obtained with the IPU approach depend on the quality of the initial sample. The execution time on a desktop machine (PC Intel 2.83 GHz) is almost the same for 100 maximal iterations per household with the sample-free method and for a sample of 25% of the reference households drawn at random with the sample-based approach (Table 6).
5.2
To conclude, the sample-free method gives globally better results in this application on small French municipalities. These results confirm those of Barthelemy & Toint (2012), who compared their sample-free method for working with data from different sources with a sample-based method (Guo & Bhat 2007) and obtained similar conclusions. Of course, these conclusions cannot be generalized to all sample-free and sample-based methods without further investigation. However, these results confirm that it is possible to accurately initialise micro-simulation (or agent-based) models using widely available data (and without any sample of households).
Figure 3: Maps of the average proportion of good predictions ((a) sample-free and (b) IPU) and the number of inhabitants ((c)) by municipality for the Auvergne case study. For (a)-(b), in blue 0.5 ≤ PGP < 0.75; in green 0.75 ≤ PGP < 0.9; in red 0.9 ≤ PGP. For (c), in green, the number of inhabitants is lower than 350; in red, the number of inhabitants is higher than 350. Base maps source: Cemagref - DTM - Développement Informatique Système d'Information et Base de Données: F. Bray & A. Torre IGN (Géofla®, 2007).
Table 6: Average execution time for the two approaches for different parameter values.
IPU: Sample size (%) | IPU: Time | Iterative: Iterations | Iterative: Time
5 | 13 min | 1 | 40 min
10 | 24 min | 10 | 41 min
15 | 29 min | 100 | 45 min
20 | 38 min | 500 | 58 min
25 | 45 min | 1000 | 66 min
30 | 53 min | 1500 | 78 min
40 | 74 min | 2000 | 88 min
Appendix A:
Figure 4: Barplots of individual and household attributes for the Auvergne region. (a) Household size. (b) Household type. (c) Individual age distribution. In black, the reference population. In dark grey, the population obtained with the sample-free method (1000 maximal iterations). In light grey, the population obtained with the sample-based method (25% of the reference household population). The bars represent the standard deviations obtained with 10 replications.
Figure 5: Barplots of individual and household attributes for Marsac-en-Livradois, a municipality drawn at random among the 1310 Auvergne municipalities. (a) Household size. (b) Household type. (c) Individual age distribution. In black, the reference population. In dark grey, the population obtained with the sample-free method (1000 maximal iterations). In light grey, the population obtained with the sample-based method (25% of the reference household population). The bars represent the standard deviations obtained with 10 replications.
Acknowledgements
This publication has been funded by the Prototypical policy impacts on multifunctional activities in rural municipalities collaborative project, European Union 7th Framework Programme (ENV 2007-1), contract no. 212345. The work of the first author has been funded by the Auvergne region.
References
ARENTZE, T., TIMMERMANS, H. & HOFMAN, F. (2007). Creating synthetic household populations: Problems and approach. Transportation Research Record: Journal of the Transportation Research Board 2014, 85-91. [doi:10.3141/2014-11]
BARTHELEMY, J. & TOINT, P. L. (2013). Synthetic Population Generation Without a Sample. Transportation Science 47(2), 266-279. [doi:10.1287/trsc.1120.0408]
BECKMAN, R. J., BAGGERLY, K. A. & MCKAY, M. D. (1996). Creating synthetic baseline populations. Transportation Research Part A: Policy and Practice 30(6 PART A), 415-429. [doi:10.1016/0965-8564(96)00004-3]
DEMING, W. E. & STEPHAN, F. F. (1940). On a least squares adjustment of a sample frequency table when the expected marginal totals are known. Annals of Mathematical Statistics 11, 427-444. [doi:10.1214/aoms/1177731829]
GARGIULO, F., TERNES, S., HUET, S. & DEFFUANT, G. (2010). An iterative approach for generating statistically realistic populations of households. PLoS ONE 5(1). [doi:10.1371/journal.pone.0008828]
GUO, J. Y. & BHAT, C. R. (2007). Population synthesis for microsimulating travel behavior. Transportation Research Record: Journal of the Transportation Research Board 2014, 92-101. [doi:10.3141/2014-12]
HARLAND, K., HEPPENSTALL, A., SMITH, D. & BIRKIN, M. (2012). Creating realistic synthetic populations at varying spatial scales: A comparative critique of population synthesis techniques. Journal of Artificial Societies and Social Simulation 15(1), 1. http://jasss.soc.surrey.ac.uk/15/1/1.html.
HUANG, Z. & WILLIAMSON, P. (2002). A comparison of synthetic reconstruction and combinatorial optimization approaches to the creation of small-area microdata. Working paper, Department of Geography, University of Liverpool.
VOAS, D. & WILLIAMSON, P. (2000). An evaluation of the combinatorial optimisation approach to the creation of synthetic microdata. International Journal of Population Geography 6(5), 349-366. [doi:10.1002/1099-1220(200009/10)6:5<349::AID-IJPG196>3.0.CO;2-5]
VOAS, D. & WILLIAMSON, P. (2001). Evaluating goodness-of-fit measures for synthetic microdata. Geographical and Environmental Modelling 5(2), 177-200. [doi:10.1080/13615930120086078]
WILSON, A. G. & POWNALL, C. E. (1976). A new representation of the urban system for modelling and for the study of micro-level interdependence. Area 8(4), 246-254.
YE, X., KONDURI, K., PENDYALA, R. M., SANA, B. & WADDELL, P. (2009). A methodology to match distributions of both household and person attributes in the generation of synthetic populations. In: 88th Annual Meeting of the Transportation Research Board.