Validating Model Assumptions
Stephen Eubank
Institute of Medicine workshop on the role of community-based mitigation strategies, Oct 25-26, 2006
Network Dynamics and Simulation Science Laboratory

1st task element: evaluate and improve model utility
• Conclusions and recommendations regarding
  – “strengths and weaknesses of the models presented”
  – “strategies to improve predictive ability and usefulness”
• Strengths and weaknesses in model assumptions
  – Most relevant to Longini’s presentation of results (more complete list appended)
  – Appropriate level of complexity
• Improving usefulness
  – Validation: Building confidence
  – Suitability: Addressing the questions
“All models are wrong. But some are useful” [G.E.P. Box]

Model assumptions are both explicit and implicit
• Explicit assumptions
  – Equations: How does the world change?
  – Parameter values: What is the current state of the world?
• Implicit assumptions
  – Intended scope of the model
  – What’s included and what’s left out
• Absence of explicit parameter ⇒ no assumption is made

All infectious disease models make assumptions about transmission
For influenza:
• Opportunities arise from physical proximity: “social network”
• Likelihood depends on many poorly understood factors, so fit to historical attack rate
[Figure: likelihood of transmitting vs. days since infection (from 0 days). Latency 1.2 d; incubation 1.7 d; possibly symptomatic 3.5 d; symptomatic (67%), asymptomatic (33%); case serial interval = 3.2 days. Longini, et al., Science 309, 1083 (2005)]

Simple (S-I-R) models make drastic assumptions about opportunities & likelihood
Large classes of people (e.g. 5-14 year olds) are indistinguishable
⇒ uniform mixing: any infectious person has opportunity to transmit to any susceptible
⇒ likelihood of transmission is the same for each (infectious, susceptible) pair.
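To make the uniform-mixing assumption concrete, here is a minimal sketch of a single-class S-I-R model. It is not the code of any model discussed in this talk. The function name, the Euler time step, and the parameter values (a transmission rate, conventionally written β, of 0.5 per day and a recovery rate of 0.25 per day) are illustrative assumptions. Every infectious person contributes equally to the risk of every susceptible person, which is exactly the "same likelihood for each pair" assumption.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Euler integration of the uniform-mixing S-I-R equations.

    beta  : transmission rate per day (illustrative value chosen by the caller)
    gamma : recovery rate per day (1 / mean infectious period)
    Returns a list of (day, S, I, R) tuples, one entry per day.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            # Uniform mixing: new infections depend only on the totals S and I.
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history


# Hypothetical population of 10,000 with 10 initial cases.
history = simulate_sir(beta=0.5, gamma=0.25, s0=9990, i0=10, r0=0, days=200)
final_day, s_end, i_end, r_end = history[-1]
print(f"attack rate after {final_day} days: {r_end / 10000:.2f}")
```

Because the only handle on transmission is the single rate `beta`, an intervention can be expressed in this model only as "reduce `beta` by x%", with no way to say how that reduction is achieved.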
Individual-based models allow more detailed representations
Each individual is represented separately in the computer
• each person comes into contact with only some of the others
• likelihood of transmission may depend on who’s involved
[Figure: distribution of expected # transmissions per infectious person]

The VBI social network is generated from individuals’ daily activities
Daily activities are estimated from census & surveys
Demographics: Age, Gender, Income, Job, Household members, # vehicles, etc.
Daily Activities and Locations: Work, Shopping, School, College, Home, etc.

VBI social network is based on calibrated, peer-reviewed urban mobility models
• R. J. Beckman, K. A. Baggerly, and M. D. McKay, Creating synthetic base-line populations, Transportation Research Part A 30 (1996) 415 - 429.
• 5th - 9th biennial National Academies’ Transportation Research Board Conferences on Application of Transportation Planning Methods, 1995 - 2003.
• Transportation Research Board annual meetings, 1998 - 2006.
• Z. Toroczkai and S. Eubank, Agent-Based Modeling as a Decision-Making Tool, The Bridge, National Academies Press 35 (Winter, 2005) 99 - 108.
• Eubank et al., Modelling disease outbreaks in realistic urban social networks, Nature 429 (2004) 180 - 184.

Why So Complex? Simple Models Don’t Address the Question
“Since all models are wrong the scientist cannot obtain a ‘correct’ one by excessive elaboration.” [Box]
Compartmental models lead to results like “Reduce transmission rates by x%”
This can be achieved by a combination of
• Reducing opportunities for transmission
• Reducing likelihood of transmission
Robustness in the Strategy of Scientific Model Building, in Robustness in Statistics, ed. R. Launer and G. Wilkinson, Academic Press, (1979) p 201 - 236.

What does it mean to reduce transmission by 50%?
[Figure panels: original group of people in contact; split into a few unequal groups; split into 2 equal groups; reduce likelihood of transmission]

Model Choice Constrains Scenario Representation
Representations in the VBI model
• Closing schools
  – Replace school activities with “home” or next planned activity
  – Require adult to stay home in each household with student
• “Generic” social distancing, compliance rate x
  – Replace all non-essential activities with “home”
  – Applied to x% of households chosen at random
• No additional assumption about changes in transmission rate, because it’s determined by duration of contact

Level of complexity depends on the question & the system
For questions about targeted interventions on typical social networks, how much detail do the models need?
• For some questions on some networks, need all the details
• For all questions on all networks, need some details
Appropriate level can be studied by sensitivity testing, but only if we know the questions.

How can we validate (build confidence in) models?
“Since all models are wrong the scientist must be alert to what is importantly wrong. It is inappropriate to be concerned about mice when there are tigers abroad.” [Box]
• Verification
• Comparing to historical data
• Prediction
• Structural validity
• Suitability
Robustness in the Strategy of Scientific Model Building, in Robustness in Statistics, ed. R. Launer and G. Wilkinson, Academic Press, (1979) p 201 - 236.

Modeling and simulation are part of Box’s “Iteration between theory and practice”
[Figure: iteration among model, simulation, and modified model]
Science and Statistics, JASA 71, 356 (1976) p 791-799.

[Figure: two panels, “Model based situation assessment in a data rich environment” and “Model based forecasting in a data poor, complex, or open environment”. Labels: external shocks, interventions, passage of time, correlated observable, observation, measurements, state estimation, states of a model, simulation of model]

1) Verification: do simulations deduce correctly? Yes, with high confidence.
• Reproduce known results in special cases, e.g.
  – Assumptions tuned to reproduce other models
  – Extreme cases
• Software engineering appropriate for research software
  – Modular design and testing
  – Documentation

2) How do results compare to historical data? They compare well, but that doesn’t boost confidence.
• In-sample (calibration)
  – Data used in model induction
• Out-of-sample (generalization):
  – Models are sufficiently flexible to fit most epidemic curves
  – Many different models can fit the same data
  – More detailed analysis is difficult, because in each outbreak:
    • pathogen is different
    • social network is poorly specified
    • interventions and reactions typically not known
    • outbreak only partially observed

3) Have the models correctly predicted a future event? No, and they never will.
• What would it mean to correctly predict the long-term “outcome” of a
  – complex
  – adaptively controlled
  – open
  – stochastic
  process?
• We would have to know
  – How to infer the exact state of the system from noisy observations
  – What external shocks will occur
  – What controls would be applied (human behavior)

4) Are the models structurally valid? Yes, as far as is currently possible.
• Structure known to be relevant is represented
• Structure hypothesized to be relevant can be represented
  – structure of simple models’ social network is wrong
  – details of complex models’ social networks are wrong
  – relative impact of errors is not yet well understood

5) Are the models suitable for the intended purpose?
Yes, because they
• faithfully represent key aspects of the question (Structural validity, but for relevance, not correctness)
• address other confidence building criteria
• suggest correlated observables
• identify gaps in data
(Halloran and Longini Jr., Science, 3 February 2006: 615-616)

The “best” model is not always the most suitable
• Newtonian mechanics: require observations and deductions that are not feasible
• Keep your eye on the ball: rely on frequent situation assessment when plan is executed before and after

Suitability Depends Crucially on the Question
• Batter’s model includes
  – Pitcher is right-handed
  – Sun is in 1st baseman’s eyes
  – Historical data on right fielder
• Outfielders’ models include
  – Batter is left-handed
  – Where’s the play?
  – Historical stats on batter
Each player has different concerns

Improving models’ usefulness is not just a modeling effort
Modeling and simulation help everyone see the big picture:
• not the final box score, but
• what lineup to start against the opposing pitcher
Models are most useful in an environment integrating
• surveillance – for situational awareness
• simulation – for course of action analysis
• human reasoning – for hypothesizing about the system
in a way that
• makes assumptions transparent
• represents all participants’ interests
• frames the decisions fairly

Conclusions
• MIDAS models are suitable for assessing TLC. We are in the process of evaluating and understanding their similarities and differences.
• The models would be more useful if they were part of a decision support environment.
• The need for such an environment, and for understanding how best to use models of complex systems, spans many agencies and areas, including public health.
Network Dynamics and Simulation Science Lab
• Chris Barrett
• Julia Paul
• Dick Beckman
• Keith Bissett
• Stephen Eubank
• Madhav Marathe
• Henning Mortveit
• Paula Stretz
• Anil Vullikanti
• Achla Marathe
• Bryan Lewis
• Karla Atkins
• Martin Holzer
• Jiangzhuo Chen

Supplementary Information

Assumptions for Infectious Disease Models
Assumptions are grouped in following slides into 4 categories
1. Effect of pathogen on the host & transmissibility: natural history model
2. Process of transmission between hosts: transmission model
3. Opportunities for transmission: contact model
4. Assumptions in representing scenarios
For each group, we present general structural assumptions, specific parameterizations, references, and note particularly important consequences and alternatives. All remarks are specific to VBI model.

I. Natural history model structure
Model for the progress of disease in humans:
• a finite state system with timed stochastic transitions
• each person’s state of health is chosen from a fixed set
• transition probabilities can be conditioned on:
  – static variables, e.g. demographics
  – dynamic variables, e.g. treatment during the simulated outbreak
State attributes include:
• Susceptibility & Infectivity
• Prodrome, Symptoms & Incapacitation
• Distribution of residence times in each state
• Probability of transition to other states

Natural history assumptions in this work
States and transitions were designed to correspond closely to peer reviewed estimates
• influenza model developed by Elveback,
• as modified and parameterized by Longini,
• incorporating information contributed to the Consultation on Influenza 10/27/04.

Natural History references
Elveback, L. R., et al., (1976) Am. J. Epidemiol. 103, 152–165.
Longini, et al., Science 309, 1083 (2005)
Consultation on Influenza expert consultants:
Paul Glezen, Influenza Research Center
Robin Bush, University of California Irvine
Richard Compans, Emory University
Nancy Cox, Nat’l Center for Infectious Diseases
Marja Esveld, World Health Organization
Kathleen Gensheimer, Maine Bureau of Health
Frederick Hayden, University of Virginia
Mark Lipsitch, Harvard School of Public Health
Marshall McBean, Univ. of Minnesota School of Public Health
Arnold Monto, University of Michigan
Peter Palese, Mount Sinai School of Medicine
Lone Simonsen, Fogarty International Center

Particularly important natural history assumptions
• Transmissibility (infectivity and susceptibility)
• Infectivity
  – before symptoms are evident, including entirely asymptomatic infections ⇒ treatment of diagnosed cases can help slow spread, but cannot stop it
  – constant infectivity while infected ⇒ fraction of transmissions before diagnosis
• Antiviral efficacy
  – against susceptibility, infectivity and symptoms ⇒ AV prophylaxis can reduce spread, but can also disguise infectious cases.
• Age-dependent proclivity to withdraw to the home when symptomatic
• Distribution of latent and incubation periods ⇒ serial interval

Alternative natural history assumptions
• A continuous model, e.g. viral load
• Parameter values and distributions
• Dependence on demographics

II. Transmission model structure
For an aerosol-borne, human-human transmissible disease, probability of transmission from an infectious person to a susceptible can depend on
1. Infectivity and susceptibility (states of health)
2. Duration of contact
3. Activities (school, work, home, etc.)
4. Demographics of infectious and susceptible
In addition, the model must specify how the presence of multiple infectives and/or susceptibles influences person-person transmission.

Transmission Assumptions in This Work
1. Susceptibility and infectivity are specified by the natural history model
2. Transmission is a Bernoulli process for each contact
   a) ∃ possibility of transmission whenever people are co-located
   b) duration of contact influences the probability of transmission
3. Activities do not directly influence the probability of transmission
4. Demographics do not directly influence the probability of transmission
In addition, transmission from A to B doesn’t depend on presence of C.

Transmission Model References
Extensive literature, for a small sample:
• M.E.J. Newman “The spread of epidemic disease on networks”, Phys. Rev. E 66 (2002) 016128 and references therein:
  – E. Ackerman
  – F. Ball
  – J. Koopman
  – M. Kretzschmar
  – D. Mollison
  – M. Morris
  – R. Pastor-Satorras
  – L. Sander
  – L. Sattenspiel
  – A. Vespignani

Particularly important transmission assumptions
• Probability of transmission depends on duration of contact:
  ⇒ Foci of transmission will be locations where people are in contact for long periods: home, work, school
• No demographic factors in susceptibility or infectivity + No activity type factors
  ⇒ No explicit bias in transmissions from child to child, student to teacher, etc. other than as a consequence of duration

Alternative transmission assumptions
• Different probability of transmission vs contact duration
  – e.g. initial burst of transmission, then lower rate
• Different treatment of multiple infectives
  – e.g. transmission probability is max of individual probabilities
• Different treatment of multiple susceptibles
  – e.g. scale transmission probability by number present

III. Contact Model structure
1. Create synthetic people in households with correct joint demographics
2. Associate a state of health with each synthetic person
3. Synthetic people move among physical locations (street addresses, city blocks, institutions) and engage in activities at those locations
4. Below a certain level of resolution, simple assumptions about contacts among people at locations are introduced
5. Each person’s activities are repeated every day, except for dynamic events such as withdrawal to home, quarantine, etc.

Contact assumptions in this work
Methodology for generating contacts was developed in TRANSIMS, a DoT-funded transportation modeling system.
1. Synthetic populations are generated by statistical techniques
   • iterative proportional fitting to census data.
   • “census” of synthetic population indistinguishable from actual census for block groups
2. Activity lists are assigned to households using decision trees based on household demographics, e.g.
   • for adults: household size, age of householder, HH income, number vehicles
   • for kids < 5: number of adults, number of workers
   • for kids > 14: worker status

Contact assumptions in this work
3. Locations assigned based on gravity model (next slide) resolved to
   • street addresses in Cook County,
   • 1/4 city blocks in remaining 10 counties
4. Sub-location contacts
   • Activity-specific “rooms” with maximum capacity
   • Rules for daily distribution of people to “rooms”
     – Workers return to same room each day
     – Shoppers enter random room each trip

Gravity Model for Location Choice
A location D for performing each non-home activity is chosen based on
• the activity to be performed, a
• the location of the previous activity, O
• the attractiveness of the destination location for activity a, A_{D,a} (determined from the land use data)
  – for work: (Dun & Bradstreet) number employees at location
  – for shopping: number of retail employees at location
  – for schools: employees at public / private elementary - college
• the travel cost (time, distance or a combination), c(O,D)
b is a calibration constant fit to the CATS survey data.

Particularly Important Contact Assumptions
• No inter-“room” mixing, e.g. at schools ⇒ ??
• Closed network, e.g. no long-distance travelers ⇒ ??
• Periodic repetition of contacts ⇒ ??

Particularly Important Contact Assumptions
In general, a difficult open question.
Addressed by Consultation on Social Networks expert consultants:
Albert-Laszlo Barabasi, University of Notre Dame
Peter Dodds, Columbia University
Martina Morris, University of Washington
Alan Penn, University College London
Babak Pourbohloul, British Columbia Centre for Disease Control
Mark Handcock, University of Washington
James Koopman, University of Michigan
Edward Laumann, University of Chicago
Alessandro Vespignani, Indiana University
Alun Lloyd, North Carolina State University
Jacco Wallinga, National Institute of Public Health and the Environment (Netherlands)

IV. Scenario representations
• Seeding
  – 4 randomly chosen people per day were placed into a latent state. Everyone else was susceptible.
• Closing schools
  – All children attending specific school change their activity patterns
    • If “compliant”, spend the day at home
    • If “non-compliant”, spend school hours at the next activity instead
  – One adult in each household stays home
    • Non-worker if possible
    • Randomly chosen if all work
• Reducing contacts at work
  – Reduce the maximum occupancy of any room

IV. Scenario representations, cont’d
• Liberal leave
  – Incapacitated adults (determined by health state) stay home
• Generic social distancing
  – Replace all non-anchor (work, school) activities with “home” for compliant households
• Triggering interventions
  – midnight on day threshold illnesses exceeded in non-intervention case
• Household TAP
  – All household members of diagnosed case prophylaxed within 24 hours
• Diagnosis
  – Only people in states with symptom level above threshold diagnosed
  – Diagnosis immediate on entering state

Slides I can’t part with yet

Checkpoints for modelers
• Scope
  √ Address the purpose
  √ Address the salient questions
  √ Include factors that audiences care about
  – Don’t overreach
• Find the relevant audience
  √ Understand the context
  √ Accept the burden of effort
  √ Attempt continuing dialog
  – Pay attention to reputation
• Assumptions
  √ Don’t assume the impossible
  √ Acknowledge data limitations
  – When predicting, show track record
• Focus on the problem, not the model
  – Simpler is better
  – Make models and analyses transparent
• Tell a story that makes sense
  – Explain clearly
  √ Get reviews
  √ Compare and collaborate
  √ Recognize time constraints
T. Karas, Sandia Report SAND 2004-2888, 2004.
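The scenario representations listed under "IV. Scenario representations" amount to rewriting people's activity lists. The sketch below illustrates two of them, school closure and generic social distancing. It is an illustration under stated assumptions, not the VBI implementation: the data layout (a schedule as a list of `(activity, minutes)` pairs, households as a dict of dicts), the function names, and the choice to leave durations unchanged are all hypothetical. The rule that one adult stays home with each student is omitted for brevity.

```python
import random


def close_schools(schedule, compliant):
    """Replace school time with home (compliant) or the next planned activity."""
    result = []
    for index, (activity, minutes) in enumerate(schedule):
        if activity != "school":
            result.append((activity, minutes))
        elif compliant:
            result.append(("home", minutes))
        else:
            # Non-compliant: spend school hours at the next non-school activity.
            following = [a for a, _ in schedule[index + 1:] if a != "school"]
            result.append((following[0] if following else "home", minutes))
    return result


def generic_social_distancing(households, compliance_rate, rng,
                              anchors=("home", "work", "school")):
    """Send all non-anchor activities home for a random fraction of households."""
    n_compliant = round(compliance_rate * len(households))
    compliant_ids = set(rng.sample(sorted(households), n_compliant))
    updated = {}
    for household_id, members in households.items():
        if household_id in compliant_ids:
            members = {
                person: [(a if a in anchors else "home", m) for a, m in schedule]
                for person, schedule in members.items()
            }
        updated[household_id] = members
    return updated, compliant_ids


# Hypothetical child schedule: (activity, minutes).
child = [("home", 480), ("school", 390), ("other", 90), ("home", 480)]
print(close_schools(child, compliant=True))
print(close_schools(child, compliant=False))
```

Neither function touches a transmission rate. In a model where transmission depends on duration of contact, changing where people spend their minutes is the whole intervention.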
A Household’s Activities

2002266  Person 1  Age = 40
Activity   Time (Min)
Home       465
Work       225
Other      45
Work       245
Home       135
Other      60
Home       150

2002855  Person 2  Age = 29
Activity   Time (Min)
Home       465
Other      5
Other      1
Other      30
Other      240
Home       80
Other      11
Home       141
Other      110
Home       240

2001740  Person-3  Age = 28
Activity   Time (Min)
Home       480
Work       45
College    60
Work       360
College    285
Home       150

2011342  Person-4  Age = 56
Activity   Time (Min)
Home       1440

[Figure: timeline of each person’s activities (H = Home, W = Work, O = Other, C = College), with axis marks at 6 AM, noon, and 6 PM]

Infectious Disease Simulation: from contact network to chain (web) of transmission
From: a weighted graph representing opportunities for transmission
To: subgraphs representing who infects whom and when

Questions about targeted interventions require structured populations
[Figure: activities that lead to transmission between households]

What are we going to do?!?
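The step from a weighted contact graph to a who-infects-whom subgraph can be sketched in a few lines. This is a toy illustration, not the VBI simulation. It assumes edge weights are minutes of contact per day, that each contact is an independent Bernoulli trial, and that the probability of transmission takes the form 1 - exp(-rate × minutes). The slides state that duration influences the probability but do not give that functional form. The fixed infectious period, the absence of a latent period, and all names and parameter values are likewise hypothetical.

```python
import math
import random


def transmission_probability(minutes, rate_per_minute):
    """Chance of transmission over one contact; assumed form 1 - exp(-rate * minutes)."""
    return 1.0 - math.exp(-rate_per_minute * minutes)


def simulate_outbreak(contacts, seeds, rate_per_minute, infectious_days, max_days, rng):
    """Return {person: (infector, day)} for every infection; seeds have infector None.

    contacts maps each person to {neighbor: minutes of contact per day}; the
    same contacts repeat every day.
    """
    infected_by = {person: (None, 0) for person in seeds}
    infectious_until = {person: infectious_days for person in seeds}
    for day in range(1, max_days + 1):
        infectious = [p for p, last in infectious_until.items() if day <= last]
        if not infectious:
            break
        newly_infected = {}
        for source in infectious:
            for target, minutes in contacts.get(source, {}).items():
                if target in infected_by or target in newly_infected:
                    continue
                # Independent Bernoulli trial per contact: A -> B ignores any C.
                if rng.random() < transmission_probability(minutes, rate_per_minute):
                    newly_infected[target] = (source, day)
        for person, record in newly_infected.items():
            infected_by[person] = record
            infectious_until[person] = day + infectious_days
    return infected_by


# Hypothetical four-person contact graph; weights are minutes per day.
contacts = {
    "A": {"B": 480, "C": 30},
    "B": {"A": 480, "D": 60},
    "C": {"A": 30},
    "D": {"B": 60},
}
tree = simulate_outbreak(contacts, seeds=["A"], rate_per_minute=0.001,
                         infectious_days=4, max_days=30, rng=random.Random(1))
for person, (infector, day) in tree.items():
    print(person, "infected by", infector, "on day", day)
```

The returned dictionary is the transmission tree: each entry is a directed edge of the contact graph, labeled with the day on which transmission occurred. Long-duration edges, such as the 480-minute contact between A and B, are the most likely to appear in it.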