A theoretical framework for register-based statistics --- Can we carry on without it?
Li-Chun Zhang
Statistics Norway
[email protected]

Statistical data by combination of sources: Coverage, content & relevance

Quality: Statistical vs. administrative register
• Wallgren & Wallgren (2007, Wiley):
– “An administrative register is maintained to store records on all objects to be administered.” (Ideally)
– “A statistical register is based on data from administrative registers that have been processed to suit statistical purposes.”
• A defining distinction in perspectives
– Administrative register: Individual data of all importance
– Statistical register: Properties at various aggregated levels

Quality of register-based statistics
Micro-data quality of a statistical register
• Notable lag of theoretical framework (Platek and Särndal, Holt, Nanopoulos, 2001)
– A framework for quality assessment
– Theoretical frameworks for different quality aspects

Process accuracy vs. statistical accuracy: Any unbiased, efficient estimators based on statistical registers?
• Process accuracy
– Matching/mismatching rate
– Extent of duplicates
– Amount of missing values
– …
• Statistical accuracy
– Coverage
– Relevance
– Inherent stochastic variation

An example of the UK claimant register (Holt, 2007, TAS)
– people claiming unemployment related benefits
– entire population of claimants (say 1.5 million)
– no sampling error and arguably a perfect measure
– derived once each month on the same working day
– daily variation about 10,000 in this count

A historic parallel: Survey sampling before Neyman (1934)
• The representative method (Kiær, 1895) with a three-stage design using 1890 census as frame:
– 1st: 128 counties and 23 towns throughout the country
– 2nd: cohorts of males of age 17, 22, 27, 32, etc.
– 3rd: persons with surname initial A, B, C, L, M, N
• ISI-committee 1924 report: “I think I may venture to say that nowadays there is hardly one statistician, who in principle will contest the legitimacy of the representative method”. (Jensen)
• Representative sampling (Neyman, 1934): Thus, if we are interested in a collective character X of a population and use methods of sampling and of estimation, allowing us to ascribe to every possible sample, $\Sigma$, a confidence interval $X_1(\Sigma), X_2(\Sigma)$ such that the frequency of errors in the statements $X_1(\Sigma) \le X \le X_2(\Sigma)$ does not exceed the limit $1-\varepsilon$ prescribed in advance, whatever the unknown properties of the population, I should call the method of sampling representative and the method of estimation consistent.

Comparisons to non-sampling errors in sample survey and census

Sample Survey          Census                 Register-based survey
Coverage errors        Coverage errors        Relevance errors
Non-response errors    Non-response errors    Integration errors
Measurement errors     Measurement errors
Sampling errors

• Unidentified units in register & non-response in survey
– Related to under-coverage
– Yes, imputation. But a quite different theory!
– Example: register households

Coverage errors
Matching/mismatching errors
Missing-link errors
Aggregation errors (Partial classification)

‘Imputation’ of household identity: Which imputation methods do you use? Hot-deck?

• Definitional error in register source & measurement error
– Related to relevance
– Yes, a kind of measurement error. But bias dominates! And often clearly different in different sub-populations.
– Example: register unemployment (REG_unemp)

REG_unemp = ILO_unemp + Bias + Random_error

A theory for detailed statistics: Signal or noise?
$$\text{Parameter}_{d,t+1} = \text{Parameter}_{d,t} + g(x_{d,t}, x_{d,t+1}) + u_{d,t+1} + e_{d,t+1}(N_{d,t+1})$$

$d$ = domain of interest
$t$, $t+1$ = reference time points
$x_{d,t}$, $x_{d,t+1}$ = explanatory variables/covariates/auxiliary information
$g(x)$ = description/model of underlying structural change
$u_{d,t+1}$ = random domain effect beyond structural explanation
$N_{d,t+1}$ = domain population size
$e_{d,t+1}$ = random errors governed by the law of large numbers

A theory for micro-data quality
• Reality at “Storgata 9”:
– H0101: Astrid (72) - widow
– H0102: Tommy (32) & Jenny (29) & Ronny (2) - cohabitation
– H0201: Olav (29) & Lena (29) - cohabitation since Census 2001
– H0202: Knut (27) - single
• Register:
– H0101: Astrid (72) - widow
– H0101: Tommy (32) & Jenny (29) & Ronny (2) - cohabitation
– H0101: Olav (29) - single
– ?: Lena (29) - single
– ?: Knut (27) - single
Imputed cohabitation in household register
• Only Astrid is correctly registered. But when/how does it matter?

Administrative register => Individual data of all importance => Unit-specific error
Statistical register => A theory of types
– How real is a record: how are variables related to each other
– How representative is a record: distribution of the types
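
The contrast between unit-specific error and a theory of types in the Storgata 9 example can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: each person is encoded as a (dwelling ID, status) pair copied from the two lists, every household member carries the household's status, an unknown dwelling ("?") is encoded as `None`, and the function names `correct_units` and `status_counts` are hypothetical.

```python
from collections import Counter

# Assumed encoding: person -> (dwelling ID, status), copied from the slide.
# None stands for the "?" (unknown dwelling) entries in the register.
reality = {
    "Astrid": ("H0101", "widow"),
    "Tommy": ("H0102", "cohabitation"),
    "Jenny": ("H0102", "cohabitation"),
    "Ronny": ("H0102", "cohabitation"),
    "Olav": ("H0201", "cohabitation"),
    "Lena": ("H0201", "cohabitation"),
    "Knut": ("H0202", "single"),
}
register = {
    "Astrid": ("H0101", "widow"),
    "Tommy": ("H0101", "cohabitation"),
    "Jenny": ("H0101", "cohabitation"),
    "Ronny": ("H0101", "cohabitation"),
    "Olav": ("H0101", "single"),
    "Lena": (None, "single"),
    "Knut": (None, "single"),
}


def correct_units(truth, reg):
    """Administrative view: persons whose whole record matches reality."""
    return sorted(p for p in truth if reg.get(p) == truth[p])


def status_counts(records):
    """Statistical view: distribution of status types, ignoring identity."""
    return Counter(status for _, status in records.values())


unit_ok = correct_units(reality, register)
true_types = status_counts(reality)
reg_types = status_counts(register)

# Register count minus true count, per status type.
type_diff = {
    s: reg_types[s] - true_types[s]
    for s in sorted(set(true_types) | set(reg_types))
}

print(unit_ok)    # ['Astrid']
print(type_diff)  # {'cohabitation': -2, 'single': 2, 'widow': 0}
```

At the unit level six of the seven records are wrong, whereas at the type level the error reduces to two persons shifted from "cohabitation" to "single". Knut's status is right even though his dwelling is unknown, so whether his record counts as an error depends on which statistic is being produced.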