Synthetic estimators in Ireland
Anthony Staines
DCU

What are synthetic estimators?
  Estimates of something you haven't got
  Typically estimates for a small area of something
  Making maximum use of what you have

Example
  Lung cancer risk
  Smoking is a key explanation
  Suppose you want to study the geography of lung cancer

What you have
  Smoking data from a national survey by age and sex
  Small area level data on population and cancer incidence by age and sex

What you can do at once
  Estimate prevalence for small areas included in the study
  Using the sample in the study

What's wrong with this?
  The areas you need may not be included
  The estimates will be very imprecise

You can do better
  In some obvious ways
  And some not so obvious

What you assume
  National age and sex specific rates apply in each small area

And so
  From these you calculate small area specific prevalence estimates
  This is indirect standardisation
  Can be done smarter
    Requiring aggregation properties to hold
    Adding in area level covariates (urban/rural etc.)

Can you do better?
  Yes
  How?

Model based estimators
  These have a long history
  Many diverse applications
  Combine survey data and some kind of 'census data'
  'Census data' is data available for every area of interest

Roughly
  Use the survey data to estimate relationships at the relevant level between survey covariates and the census data

Then
  Assume the same relationship applies in the other areas

Issues
  Modelling can be hard
  Remember these are predictive models, not explanatory models
  Data not easy to get at the right small area level

Models
  Models using individual level covariates only
  Models using area level covariates only
  Models combining individual and area level covariates

Limits
  Available data
  Confidentiality
  Complexity of methods, esp. multi-level methods
  Validation

Spatial data limits
  Have to be able to link survey and census to the same set of small areas
  Given the primitive systems in the UK and the nearly non-existent systems in the Republic this is a lot of work
  Errors here will lead to biased estimates

Confidentiality
  Need to respect confidentiality of survey respondents
  May limit the data available for these purposes
  May need to design survey and survey consent process carefully to get good estimates

Modelling
  Can become very complex
  Clustered survey designs
  Survey weights
  Variable selection
  Model diagnostics

What and where to model
  Data may exist at many different geographies
  Multi-level models with individual, household, local and regional effects can be considered
  GIS might be very useful here for data handling
  Not advisable to aggregate covariates at different spatial levels
  This is just making a bad embedded synthetic estimator

Validation
  Not easy to do, but essential
  How do you validate your synthetic estimates?
  Cross-validation?
  Another survey?
  ?

Options
  How about Health Atlas Ireland?
  This is a system built for the HSE (led by Howard Johnson) to plan health services
  It already has
    Maps
    Census
    HIPE
    Mortality data

Census output options
  Recently they have developed a very flexible census output system
  Uses census data at ED level
  Locations of houses
  Assumes that all the houses in a DED are exchangeable

Census output options
  Allocates census data to any given area
  Directly weighted by using the number of households and the ED composition of the desired area

Futures?
  Modern design of surveys
  Could readily be extended to do SA from almost any survey data where the necessary geographical data have been collected
  Greatly improves value for money of large scale surveys
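
Worked sketch: synthetic estimate by indirect standardisation

The "What you assume" and "And so" slides describe applying national age and sex specific rates to each small area's population. A minimal sketch of that calculation follows. All numbers, area names and the function name synthetic_prevalence are hypothetical, chosen only to illustrate the arithmetic; they are not taken from any survey or census.

```python
# Hypothetical national smoking prevalence by (age band, sex),
# of the kind a national survey might provide.
national_rates = {
    ("18-44", "M"): 0.30,
    ("18-44", "F"): 0.25,
    ("45+", "M"): 0.20,
    ("45+", "F"): 0.15,
}

# Hypothetical census population counts for two small areas,
# broken down by the same age and sex strata.
area_populations = {
    "Area A": {("18-44", "M"): 400, ("18-44", "F"): 400,
               ("45+", "M"): 100, ("45+", "F"): 100},
    "Area B": {("18-44", "M"): 100, ("18-44", "F"): 100,
               ("45+", "M"): 400, ("45+", "F"): 400},
}


def synthetic_prevalence(population, rates):
    """Prevalence expected if the national stratum rates applied locally."""
    expected_smokers = sum(count * rates[stratum]
                           for stratum, count in population.items())
    return expected_smokers / sum(population.values())


estimates = {area: synthetic_prevalence(pop, national_rates)
             for area, pop in area_populations.items()}

for area, value in estimates.items():
    print(f"{area}: {value:.3f}")
```

With these made-up inputs the younger Area A gets a higher estimate (0.255) than the older Area B (0.195). The difference comes entirely from population structure, which is exactly what the assumption on the slide implies.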
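
Worked sketch: a model based estimator with one area level covariate

The "Roughly" and "Then" slides describe estimating a relationship in the surveyed areas and assuming it holds elsewhere. The sketch below uses the simplest possible case, a straight line fitted by ordinary least squares to one area level census covariate. This is an illustrative assumption, not the method used in any particular study; real applications would need survey weights, clustering and model diagnostics as the "Modelling" slide notes. The data and the names fit_line, surveyed and census_covariate are hypothetical.

```python
# Hypothetical surveyed areas: census covariate (e.g. a deprivation score)
# paired with the smoking prevalence observed in the survey.
surveyed = {
    "A": (0.10, 0.22),
    "B": (0.20, 0.26),
    "C": (0.30, 0.30),
}

# Hypothetical census covariate, available for every area of interest,
# including areas D and E that were not in the survey.
census_covariate = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.25, "E": 0.05}


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


xs = [x for x, _ in surveyed.values()]
ys = [y for _, y in surveyed.values()]
intercept, slope = fit_line(xs, ys)

# Assume the same relationship applies in every area, surveyed or not.
predicted = {area: intercept + slope * x
             for area, x in census_covariate.items()}

for area, value in sorted(predicted.items()):
    print(f"{area}: {value:.3f}")
```

The model is used purely for prediction here, in line with the reminder on the "Issues" slide that these are predictive models, not explanatory models.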
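
Worked sketch: allocating ED level census counts to an arbitrary area

The "Census output options" slides describe allocating census data to any given area, weighted by the number of households and the ED composition of that area, on the assumption that houses within an ED are exchangeable. One possible reading of that description is sketched below. It is an assumption about how such a weighting could work, not the actual Health Atlas Ireland implementation, and the function name allocate_to_area and all figures are hypothetical.

```python
def allocate_to_area(ed_counts, ed_households, households_in_area):
    """Allocate ED level census counts to a target area.

    Each ED contributes its census count multiplied by the share of its
    households that lie inside the target area. This treats every house
    in an ED as exchangeable, so a house carries an equal share of the
    ED total.
    """
    total = 0.0
    for ed, n_inside in households_in_area.items():
        share = n_inside / ed_households[ed]
        total += share * ed_counts[ed]
    return total


# Hypothetical census population and household totals per ED.
ed_counts = {"ED1": 1000, "ED2": 600}
ed_households = {"ED1": 400, "ED2": 200}

# Hypothetical target area: a quarter of ED1's houses and all of ED2's.
households_in_area = {"ED1": 100, "ED2": 200}

area_population = allocate_to_area(ed_counts, ed_households, households_in_area)
print(area_population)
```

With these made-up figures the target area receives a quarter of ED1's population plus all of ED2's, giving 850. The exchangeability assumption is what makes the household share a valid weight; where housing within an ED is very heterogeneous, the allocation will be correspondingly rough.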