Appl. Statist. (2005) 54, Part 3, pp. 645–658

Nonparametric estimation of spatial segregation in a multivariate point process: bovine tuberculosis in Cornwall, UK

Peter Diggle and Pingping Zheng, Lancaster University, UK
and Peter Durr, Veterinary Laboratories Agency, Weybridge, UK

[Received July 2003. Final revision August 2004]

Summary. The paper is motivated by a problem in veterinary epidemiology, in which spatially referenced breakdowns of bovine tuberculosis are classified according to their genotype and year of occurrence. We develop a nonparametric method for addressing spatial segregation in the resulting multivariate spatial point process, with associated Monte Carlo tests for the null hypotheses that different genotypes are randomly intermingled and that there are no temporal changes in spatial segregation. Our spatial segregation estimates use a kernel regression method with bandwidth selected by a multivariate cross-validated likelihood criterion.

Keywords: Bovine tuberculosis; Monte Carlo test; Multivariate point process; Spatial segregation

1. Introduction

A multivariate spatial point process is a stochastic process that generates points in two-dimensional space, each point being one of two or more qualitatively distinguishable types. Spatial segregation occurs if, within some planar region of interest, particular types of point predominate in particular subregions, rather than being randomly intermingled. In this paper, we assume an underlying multivariate inhomogeneous Poisson point process and investigate spatial segregation via the nonparametric estimation of ratios of componentwise intensities of the process.

The work was motivated by the following problem in veterinary epidemiology, in which the points identify farms which experience one or more cases of bovine tuberculosis (BTB) whereas the types refer to different strains of the disease. However, the methodology could also be useful in other disciplines where data of this kind arise, e.g.
in human epidemiology (Richardson et al., 2002) or in ecology (Pielou, 1961, 1977).

Address for correspondence: Pingping Zheng, Department of Mathematics and Statistics, Fylde College, Lancaster University, Lancaster, LA1 4YF, UK. E-mail: [email protected]
2005 Royal Statistical Society 0035–9254/05/54645

BTB is a serious disease of cattle that is caused by the bacterium Mycobacterium bovis (M. bovis) and which is endemic in parts of the UK. As part of control measures, herds are regularly inspected for BTB by using a comparative tuberculin skin test. When disease is detected in a cattle herd and M. bovis is successfully cultured from at least one test positive animal, a deoxyribonucleic acid typing technique known as spoligotyping can then be used to determine the genotype of M. bovis that is responsible for the BTB breakdown (Durr, Hewinson and Clifton-Hadley, 2000). If a particular genotype predominates in a given locality, a working hypothesis is that any new breakdown within that locality which is of the locally predominant type is likely to be a consequence of internal cross-infection, whereas a new breakdown of a different type is more likely to be the result of importation of infected animals from a remote location. A second possible explanation for an unexpected type in a subregion is a mutation event. However, the evidence to date is that spoligotypes are genetically stable, so this is unlikely over a timespan of a few years. Characterizing the locally predominant genotypes, and the extent to which different genotypes are spatially segregated, is therefore a potentially useful tool in monitoring the progress of the disease within an administrative region (Durr, Clifton-Hadley and Hewinson, 2000).

The data that were available to us consist of the spatial locations of a subset of BTB breakdowns within cattle herds in the county of Cornwall, UK, over the years 1989–2002 inclusive.
Because it is not always possible to isolate M. bovis from test positive cattle, our analyses were necessarily restricted to those BTB breakdowns where M. bovis was successfully cultured and spoligotyped. We shall use these data to estimate the extent to which different spoligotypes are spatially segregated, to test for spatial segregation of the different spoligotypes and to assess possible changes in the spatial segregation over time.

Section 2 describes the Cornwall BTB data in more detail. In Section 3 of the paper, we formally define the estimation problem and clarify the distinction between spatial segregation and the related concepts of spatial clustering or spatial aggregation. We also propose a nonparametric method of estimation based on kernel regression within a generalized additive modelling framework, together with associated Monte Carlo significance tests. The method is a multivariate generalization of a method proposed by Kelsall and Diggle (1998) for the estimation of relative risk from case–control data in spatial epidemiology. Section 4 discusses the application of the method to the Cornwall BTB data, including descriptive analyses of spatial segregation among different genotypes, and of temporal changes in the spatial segregation. Section 5 discusses possible extensions and alternative methodological approaches.

The methods are implemented in a suite of R functions which are available from the second author's Web page: www.maths.lancs.ac.uk/~zhengp1/tb.

2. The Cornwall bovine tuberculosis data

2.1. Description of the data

The data cover annual or biennial inspections of beef, dairy and mixed cattle farms throughout the county of Cornwall, UK, over the years 1989–2002. Each herd that was tested has its spatial location recorded as the single-point Ordnance Survey map reference of the corresponding farm, to a spatial resolution of 100 m.
Note, however, that georeferencing of individual farms can be problematic (Durr and Froggatt, 2002). Within the 14-year period that is covered by the data, 2404 skin test positive animals were slaughtered and assigned a spoligotype. When disease is confirmed in an animal, it is not possible to determine its date of onset; hence times of onset are censored to the left and the temporal resolution of the data is generally between 1 and 2 years, corresponding to the cycle of farm inspections.

Table 1. Frequency distribution of confirmed cases of BTB in Cornwall, classified by spoligotype and year of occurrence

         Cases with the following spoligotypes:
Year       9     12     15     20    Others   Unresolved   Total
1989       4     10     11      2       3          2          30
1990       6      7     11      2       1          0          27
1991      23      7      7      4       0          0          41
1992      19     13      7      2       1          0          42
1993      19      1      7      2       0          0          29
1994      12      0      2      0       1          1          15
1995       9      3      5      0       0          0          17
1996      33      5      5     15       3          0          61
1997      42      5      8      6       3          1          64
1998      81     17      8     19       7          5         132
1999      79     14     17      6      14          5         130
2000      74      9     35     18      11          4         147
2001      37      5     15      9       1          1          67
2002      56     13     28     19       1          1         117
Total    494    109    166    104      26         20         919

We call a skin test positive animal a 'reactor'. A detectably infected animal is a 'confirmed reactor' and a herd with one or more confirmed reactors is a 'confirmed breakdown'. Reactors are slaughtered and subjected to post mortem examination. Tissue samples are taken from at least one reactor per breakdown, to attempt the isolation of M. bovis. Since 1997, following the introduction of the spoligotyping technique, standard practice has been to perform genotyping on at least one positive culture from the majority of confirmed breakdowns. Isolates before 1997 which had been successfully freeze dried and stored were recultured and had their spoligotype retrospectively determined.
In most breakdowns only one spoligotype was assigned, either because only one reactor animal was identified or else, in multireactor breakdowns, only one was cultured or if more than one were cultured they all gave identical results. In a small number of multireactor breakdowns, more than one spoligotype was identified, and the herd was classified according to the predominant spoligotype. This process resulted in 919 confirmed breakdowns, in 20 of which the classification by spoligotype was unresolved. Henceforth, we refer to each confirmed breakdown as a case and abbreviate spoligotype to type.

Table 1 shows the frequencies of cases classified by type and year of occurrence. The four most common types are, in order of frequency, 9, 15, 12 and 20. Together, these account for 873 out of the 919 cases. The total numbers of cases per year in fact reflect the intensity of the testing programme and do not directly measure the severity of the epidemic. We shall not attempt to incorporate the rarer types explicitly into our analysis of spatial segregation.

Fig. 1 gives the overall spatial distribution of the 919 cases. The unit of length in Fig. 1, and thereafter in this paper, is 1 m. The map gives a strong visual impression of spatial aggregation of cases, with areas of high incidence in the north-east and south-west of Cornwall. Spatial aggregation could reflect either the infectious aetiology of the disease or an inhomogeneous spatial distribution of herds at risk. For comparison, Fig. 2 shows the locations of all 4353 currently known cattle farms in Cornwall. Their spatial distribution shows less variation in intensity than does the case distribution, but it is nevertheless clearly non-uniform, reflecting a lack of cattle farms in some parts of Cornwall. This underlines the importance of assessing spatial segregation in a way which allows for spatial heterogeneity in the underlying population at risk.

[Fig. 1. Spatial distribution of all cases over the 14 years (axes: Eastings, Northings)]

3. Estimation of spatial segregation

3.1. Descriptors of spatial structure

A common approach in the exploratory analysis of spatial point pattern data is to identify spatial structure through rejection of a null hypothesis which formalizes the notion of an absence of structure. Within this paradigm, a test statistic is chosen to be sensitive to the alternative hypothesis of interest. Often, two or more tests are applied to the same data, but using statistics that are intended to be sensitive to different kinds of departure from the null hypothesis.

The simplest example of this approach is a test of complete spatial randomness, by which we mean that the data form a partial realization of a homogeneous spatial Poisson process. We then define an aggregated pattern as one which deviates significantly from this null hypothesis in such a way that the points of the pattern tend to form local concentrations. No particular mechanism is implied. For this reason, we prefer to reserve the term clustering to describe a point process in which points form functional groups, e.g. a process in which parent points give rise to collections of offspring in their vicinity. By contrast, we use the word heterogeneous to describe a process in which the intensity, or mean number of points per unit area, varies spatially. In general, clustering and heterogeneity are not empirically distinguishable. It is possible to formulate a point process which can equally well be interpreted as a process of independent points in a heterogeneous environment, or as a process of clusters of related events in a homogeneous environment (Bartlett, 1964).

Spatial segregation is a descriptor of structure in a multivariate pattern.
We say that a multivariate pattern exhibits spatial segregation if, for at least some $j \neq i$, the conditional intensity of type $j$ points at $x$ given a point of type $i$ at $x$ is less than the marginal intensity of type $j$ points at $x$. A process which generates patterns of this kind is virtually bound also to generate patterns which are marginally aggregated. However, qualitatively similar patterns could also be generated by a process with independent, but strongly clustered, type-specific components.

[Fig. 2. Spatial distribution of the known and spatially discrete herds in Cornwall (axes: Eastings, Northings)]

We conclude firstly that tests for spatial structure are only useful if they are designed to detect particular kinds of structure and are not based on assumptions which are so restrictive as to render the hypothesis under test self-evidently false a priori. For example, in testing for spatial segregation we would not want to assume that, under the null hypothesis of no segregation, the component patterns were completely random. Secondly, the detailed scientific interpretation of spatial structure almost invariably requires subject-matter knowledge or information in addition to the data themselves. Thirdly, tests are best constructed within an explicitly declared modelling framework, so that underlying assumptions are transparent, and in a way which leads naturally to an associated method for estimating quantities of interest in the event that the null hypothesis is rejected.

3.2. The Poisson process model

To develop a method for detecting and estimating spatial segregation, we shall assume that the data are a partial realization of a multivariate, spatially inhomogeneous Poisson point process. The key assumption in the Poisson process model is that different cases are stochastically independent.
In our motivating application, this would clearly be violated if cases were defined at the individual animal level. Strictly, the infectious nature of the disease also renders it incorrect at the herd level. However, in the absence of detailed data on the spatiotemporal spread of the disease, it is reasonable to use the Poisson process model to describe the resulting spatial distribution of the disease, essentially as a consequence of the duality between spatial heterogeneity and spatial clustering that was originally established by Bartlett (1964).

The model assumes that the component processes, each corresponding to cases of a particular type, are independent Poisson processes with respective intensity functions $\lambda_k(x)$, $k = 1, \ldots, m$, where $k$ denotes type. The $\lambda_k(\cdot)$ are in turn derived as the product of two functions: $\lambda_0(x)$, the intensity function for the univariate Poisson process of herds at risk, and $\rho_k(x)$, the probability that a herd at location $x$ will generate a case of type $k$ within the period of time that is under consideration. Note that the data cannot identify the occurrence of multiple breakdowns within the same herd and year.

We now define relative risk surfaces $\rho_{jk}(x) = \rho_j(x)/\rho_k(x)$, for all $j \neq k$. Note that $\rho_{jk}(x) = \lambda_j(x)/\lambda_k(x)$ because $\lambda_j(x) = \lambda_0(x)\rho_j(x)$. Similarly, if $p_k(x)$ denotes the conditional probability that a case known to occur at location $x$ is of type $k$, then

$$p_k(x) = \lambda_k(x) \Big/ \sum_{j=1}^{m} \lambda_j(x) = \rho_k(x) \Big/ \sum_{j=1}^{m} \rho_j(x),$$

and $p_j(x)/p_k(x) = \rho_{jk}(x)$. These expressions show that relative risks can be estimated without assuming any particular form for the spatial distribution of herds at risk.

We say that the underlying Poisson process is completely unsegregated if $\rho_k(x) = \alpha_k \rho(x)$ for some spatial function $\rho(\cdot)$ or, equivalently, $p_k(x) = p_k$. In other words, different types may be more or less common but show no propensity to occur in relatively greater numbers in particular subregions.
It follows that, under no spatial segregation, all the relative risk surfaces $\rho_{jk}(x)$ are spatially constant. At the opposite extreme, complete spatial segregation occurs if only one type of event can occur at any particular location. In this case, for each $x$, $p_k(x) = 1$ for some particular $k = k(x)$ and $p_k(x) = 0$ for all other $k$. In practice, less extreme forms of partial segregation would be expected to occur and would be expressed quantitatively through the spatial behaviour of the set of estimated functions $\hat{p}_k(x)$, $k = 1, \ldots, m$. We call the functions $p_k(x)$ the type-specific probabilities.

3.3. Kernel estimation of type-specific probabilities

We propose to estimate the $p_k(x)$ through a multivariate adaptation of the kernel smoothing methodology that was proposed by Kelsall and Diggle (1995, 1998) for case–control data in human epidemiology. The adaptation proceeds as follows. The data are represented as a set of multinomial outcomes $Y_i$, $i = 1, \ldots, n$, where, for each of $k = 1, \ldots, m$, the outcome $Y_i = k$ denotes a breakdown of type $k$ at the location $x_i$, and the corresponding multinomial cell probabilities are the type-specific probabilities $p_k(x_i)$. We propose a kernel regression estimator for the probability surfaces $p_k(x)$. This takes the form

$$\hat{p}_k(x) = \sum_{i=1}^{n} w_{ik}(x)\, I(Y_i = k), \tag{1}$$

where, for each of $k = 1, 2, \ldots, m$,

$$w_{ik}(x) = w_k(x - x_i) \Big/ \sum_{j=1}^{n} w_k(x - x_j),$$

and $w_k(\cdot)$ is the kernel function with bandwidth $h_k > 0$; hence

$$w_k(x) = \frac{w_0(x/h_k)}{h_k^2}, \tag{2}$$

where $w_0(\cdot)$ is the standardized form of the kernel function. In the results reported below, we use the Gaussian kernel $w_0(x) = \exp(-\|x\|^2/2)$, where $\|x\|$ denotes the Euclidean distance of the point $x$ from the origin. The log-likelihood function is

$$L(p_1, \ldots, p_m) = \sum_{i=1}^{n} \sum_{k=1}^{m} I(Y_i = k) \log\{p_k(x_i)\}, \tag{3}$$

where $I(\cdot)$ is the indicator function.
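As an illustration only, the kernel regression estimator (1) with the Gaussian kernel can be sketched in a few lines of Python. This is not the authors' R implementation: the function names, the toy coordinates and the use of a single common bandwidth `h` for all types are assumptions made for the sketch.

```python
import math

def gaussian_kernel(dx, dy, h):
    """Scaled Gaussian kernel w(u) = w0(u / h) / h**2 with w0(u) = exp(-||u||**2 / 2)."""
    return math.exp(-(dx * dx + dy * dy) / (2.0 * h * h)) / (h * h)

def type_probabilities(x, locations, labels, h):
    """Kernel regression estimate of p_k(x) for every type k seen in `labels`.

    Uses one common bandwidth h for all types (an assumption of this sketch),
    so the estimates sum to 1 at every x.
    """
    weights = [gaussian_kernel(x[0] - ex, x[1] - ny, h) for ex, ny in locations]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase h")
    estimate = {k: 0.0 for k in set(labels)}
    for w, label in zip(weights, labels):
        estimate[label] += w / total  # normalised weight goes to the case's own type
    return estimate

# Toy data (hypothetical, not the Cornwall data): two well separated groups.
locations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
             (10.0, 10.0), (11.0, 10.0), (10.0, 11.0)]
labels = [9, 9, 9, 15, 15, 15]

near_first_group = type_probabilities((0.5, 0.5), locations, labels, h=2.0)
midway = type_probabilities((5.5, 5.5), locations, labels, h=2.0)
print(near_first_group)
print(midway)
```

In this toy example the estimate near the first group is dominated by type 9, whereas between the two groups neither type has probability close to 0 or 1. The factor $1/h^2$ cancels in the normalised weights and is retained only to mirror the definition of $w_k$.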
In a parametric model for the $p_k(x)$, a widely accepted method of parameter estimation is to choose parameter values to maximize the right-hand side of equation (3). In the kernel setting, to do so would lead to the unhelpful bandwidth choices $h_k = 0$, giving $\hat{p}_k(x_i) = 1$ or $\hat{p}_k(x_i) = 0$ according to whether the corresponding $Y_i$ does or does not equal $k$. To circumvent this, we use a cross-validated log-likelihood function.

Two variants of the cross-validated form of function (3) could be defined, according to whether we do or do not choose the same bandwidth for all $m$ components of the $p$-surface. Using a common bandwidth $h$ gives the desirable property that $\sum_{k=1}^{m} \hat{p}_k(x) = 1$, for every location $x$. The cross-validated log-likelihood function for $h$ is then defined as

$$L_c(h) = \sum_{i=1}^{n} \sum_{k=1}^{m} I(Y_i = k) \log\{\hat{p}_k^{(i)}(x_i)\}, \tag{4}$$

where $\hat{p}_k^{(i)}(x_i)$ denotes the kernel estimator (1), based on all of the data except $(x_i, Y_i)$.

3.4. Monte Carlo inferences

Kelsall and Diggle (1998) used Monte Carlo sampling to assess whether their estimated risk surface showed significant departure from spatially constant risk. In a similar fashion, we here use Monte Carlo methods to test the null hypothesis of no spatial variation in the relative risk surfaces between pairs of different spoligotypes. Recall that $\lambda_k(x)$, $k = 1, 2, \ldots, m$, denote the type-specific intensity functions and that

$$p_k(x) = \lambda_k(x) \Big/ \sum_{j=1}^{m} \lambda_j(x).$$

The null hypothesis is

$$H_0: \lambda_k(x) = \alpha_k \lambda_0(x), \qquad k = 1, 2, \ldots, m;$$

hence, under hypothesis $H_0$, $p_k(x) = \alpha_k$, for all $x$. The $\alpha_k$ can be estimated by $\hat{\alpha}_k = n_k/n$, where $n_k$ is the number of cases of type $k$ and $n$ is the total number of cases. Hence, our suggested statistic to test $H_0$ is

$$T = \sum_{i=1}^{n} \sum_{k=1}^{m} \{\hat{p}_k(x_i) - \hat{\alpha}_k\}^2. \tag{5}$$

For each Monte Carlo simulation under $H_0$, we relabel the data at random while preserving the observed number of cases of each type.
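The random relabelling test based on statistic (5) can be sketched as follows. The sketch is illustrative rather than a reproduction of the authors' code: the function names and toy data are hypothetical, the bandwidth is assumed to be held fixed across simulations, and the p-value is computed as the rank-based proportion (number of simulated values exceeding the observed one, plus one, divided by the total number of values including the observed one).

```python
import math
import random

def weight_matrix(locations, h):
    """Row-normalised Gaussian kernel weights between all pairs of case locations."""
    rows = []
    for xi, yi in locations:
        row = [math.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / (2.0 * h * h))
               for xj, yj in locations]
        total = sum(row)  # never zero: the weight of a case on itself is 1
        rows.append([w / total for w in row])
    return rows

def segregation_statistic(weights, labels):
    """T = sum_i sum_k {p_hat_k(x_i) - alpha_hat_k}^2, as in equation (5)."""
    n = len(labels)
    t = 0.0
    for k in sorted(set(labels)):
        indicator = [1.0 if label == k else 0.0 for label in labels]
        alpha_hat = sum(indicator) / n  # marginal proportion of type k
        for row in weights:
            p_hat = sum(w * z for w, z in zip(row, indicator))
            t += (p_hat - alpha_hat) ** 2
    return t

def monte_carlo_p_value(locations, labels, h, s=200, seed=1):
    """Observed statistic and Monte Carlo p-value from s - 1 random relabellings."""
    rng = random.Random(seed)
    weights = weight_matrix(locations, h)  # assumption: h is not re-selected per simulation
    t_obs = segregation_statistic(weights, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(s - 1):
        rng.shuffle(shuffled)  # permuting labels preserves the number of cases of each type
        if segregation_statistic(weights, shuffled) > t_obs:
            exceed += 1
    return t_obs, (exceed + 1) / s

# Toy data (hypothetical): two 3 x 3 grids of cases, one type per grid.
locations = ([(float(i), float(j)) for i in range(3) for j in range(3)]
             + [(10.0 + i, 10.0 + j) for i in range(3) for j in range(3)])
labels = ["a"] * 9 + ["b"] * 9

t_obs, p_value = monte_carlo_p_value(locations, labels, h=2.0)
print(t_obs, p_value)
```

Because the kernel weights depend only on the locations, they are computed once and reused for every relabelling; only the labels change between simulations.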
We denote the value of $T$ for the original data as $t_1$, and the values after simulated random relabelling as $t_2, t_3, \ldots, t_s$. The p-value for a Monte Carlo test of significance is $p = (k + 1)/s$, where $k$ is the number of $t_j > t_1$.

3.5. Case–control data

When the absolute risk of disease is of interest, rather than relative risk between two different types, the analysis that was described above can still be used, by identifying non-cases among the population at risk as an additional type, say type 0. The interpretation of the relative risk surfaces $\rho_{jk}(x)$ when neither $j$ nor $k$ is 0 remains the same. The interpretation of the type-specific probabilities is slightly different, since at each location we now have $\sum_{k=1}^{m} p_k(x) = 1 - p_0(x)$, which could be spatially varying even in the absence of type-specific spatial segregation.

3.6. Investigating temporal changes in spatial segregation

When each case is allocated to one of a discrete set of time periods, $t = 1, 2, \ldots, r$ say, we shall use the kernel method to estimate type-specific probability surfaces $\hat{p}_k(x, t)$ for each $t$. For comparability between time periods, we use a common bandwidth $h$ for all types and all time periods, which we choose by maximizing the sum over time periods of the cross-validated log-likelihood criterion (4) for cases within each time period. In this context, the null hypothesis of interest is that the type-specific probability surfaces do not change over time: hence,

$$H_0: p_k(x, t) = p_k(x),$$

where $k = 1, 2, \ldots, m$ and $t = 1, 2, \ldots, r$, and a suggested test statistic for a Monte Carlo test of $H_0$ is

$$T = \sum_{k=1}^{m} \sum_{t=1}^{r} \sum_{x \in X} \{\hat{p}_k(x, t) - \bar{p}_k(x)\}^2, \tag{6}$$

where $X$ denotes the set of all case locations irrespective of type or time period, $\hat{p}_k(x, t)$ is the estimated type-specific probability surface for type $k$ in time period $t$ and

$$\bar{p}_k(x) = r^{-1} \sum_{t=1}^{r} \hat{p}_k(x, t).$$

Because the true type-specific probability surfaces under hypothesis $H_0$ are unknown, we propose an approximate Monte Carlo test in which we sample case labels from the estimated time constant type-specific probability surfaces $\bar{p}_k(x)$, holding the number of cases of each type in each time period fixed at their observed values. Simulation results suggest that the use of estimated $\bar{p}_k(x)$ renders the test slightly anticonservative, but that the effect of this is too small to affect the results that are reported for the Cornwall BTB data in Section 4.

4. Application to the Cornwall bovine tuberculosis data

4.1. Spatial segregation over the 14-year period

Fig. 3 shows the spatial distributions of cases corresponding to each of the four most common spoligotypes. The visual impression is of strong spatial segregation, with each of the four types predominating in particular subregions.

Fig. 4 shows the cross-validated log-likelihood $L_c(h)$ for the system consisting of the four most common types of case, covering the 14-year period and using a Gaussian kernel. The optimal choice of bandwidth is $h_{\mathrm{opt}} = 5015$ m. Fig. 5 shows the resulting estimated type-specific probabilities $\hat{p}_k(x)$, which confirm that there is strong spatial segregation among the different spoligotypes. The maximum type-specific probabilities are 0.999, 0.914, 0.934 and 0.927 for spoligotypes 9, 12, 15 and 20 respectively whereas the minimum type-specific probabilities are 0.015 for type 9, and zero (to three decimal places) for types 12, 15 and 20. All four types therefore show wide spatial variation in the estimates $\hat{p}_k(x)$.
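The bandwidth choice reported above maximizes the cross-validated log-likelihood (4) over $h$. A minimal sketch of that search, assuming a simple grid of candidate bandwidths and hypothetical toy data (the authors' own optimization procedure is not described in this paper), is:

```python
import math

def cv_log_likelihood(locations, labels, h):
    """Cross-validated log-likelihood L_c(h) of equation (4), common bandwidth h.

    For case i only the term with k equal to its own type contributes, so the
    inner sum reduces to the log of the leave-one-out estimate for that type.
    """
    total = 0.0
    for i, (xi, yi) in enumerate(locations):
        same = 0.0       # kernel weight from other cases of the same type as case i
        everyone = 0.0   # kernel weight from all other cases
        for j, (xj, yj) in enumerate(locations):
            if j == i:
                continue  # leave case i out of its own estimate
            w = math.exp(-((xi - xj) ** 2 + (yi - yj) ** 2) / (2.0 * h * h))
            everyone += w
            if labels[j] == labels[i]:
                same += w
        if same == 0.0:
            return float("-inf")  # leave-one-out probability of the observed type is 0
        total += math.log(same / everyone)
    return total

def select_bandwidth(locations, labels, grid):
    """Return the candidate bandwidth in `grid` that maximises L_c(h)."""
    return max(grid, key=lambda h: cv_log_likelihood(locations, labels, h))

# Toy data (hypothetical): two 3 x 3 grids of cases, one type per grid.
locations = ([(float(i), float(j)) for i in range(3) for j in range(3)]
             + [(10.0 + i, 10.0 + j) for i in range(3) for j in range(3)])
labels = ["a"] * 9 + ["b"] * 9

grid = [0.5, 1.0, 2.0, 5.0, 20.0, 100.0]
best = select_bandwidth(locations, labels, grid)
print(best, cv_log_likelihood(locations, labels, best))
```

With perfectly segregated toy data such as these, the criterion favours the smaller bandwidths in the grid, because heavy smoothing pulls every leave-one-out estimate towards the marginal proportions. Data with some local mixing of types penalise very small bandwidths as well, which is what gives a curve such as Fig. 4 an interior maximum.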
Note that the corresponding marginal proportions of the four types are 0.566, 0.125, 0.190 and 0.119. The most common type 9 has two separate foci, a major one in the east which extends over a relatively large area, and a smaller one in the west of Cornwall. Type 12 has a single focus towards the west. Type 15 has a single focus in the central part of Cornwall. Type 20 occurs predominantly in the extreme west. The local maximum to the east of the main concentration of type 20 cases arises from two near-coincident but otherwise isolated cases.

The Monte Carlo test for spatial segregation among different spoligotypes is perhaps redundant in view of the very strong segregation that is observed in the data and the smoothed type-specific probability maps, but it is reported for completeness. Using $s = 1000$, i.e. 999 simulated random relabellings of the spoligotypes among all cases, the test rejected the null hypothesis with a p-value of 0.001, i.e. the observed value of the test statistic (5) was greater than all 999 simulated values.

[Fig. 3. Spatial distributions of the four most common spoligotype data over the 14 years: (a) spoligotype 9; (b) spoligotype 12; (c) spoligotype 15; (d) spoligotype 20 (axes: Eastings, Northings)]

[Fig. 4. Cross-validated log-likelihood for the four most common types over the 14 years (axes: h, Lc)]

[Fig. 5. Estimated type-specific probabilities for the four most common types over the 14 years: (a) spoligotype 9; (b) spoligotype 12; (c) spoligotype 15; (d) spoligotype 20 (axes: Eastings, Northings; probability scale 0.0–1.0)]

4.2. Changes in the spatial segregation over time

Before 1997 the annual number of cases is too small for the application of nonparametric smoothing methods. To investigate temporal changes in the spatial segregation of M. bovis we therefore consider only the years 1997–2002 and define three time periods $t$ corresponding to the years 1997–1998, 1999–2000 and 2001–2002. Figs 6 and 7 show the estimated type-specific probability surfaces for the four most common spoligotypes in each time period, using a common bandwidth of 9647 m estimated by our cross-validation criterion. The increase in $h$ by comparison with the analysis reported in Section 4.1 is to be expected because of the smaller number of cases within individual time periods.

The Monte Carlo test for changes in the type-specific probability surfaces over time gives a p-value of 0.015 with $s = 1000$. This result suggests that the relatively subtle effects which appear on visual inspection of Figs 6 and 7 nevertheless represent genuine changes over time.
In general terms, the predominant effect is of a progressive increase in the degree of segregation over time. Thus spoligotype 15 becomes progressively more dominant in north central Cornwall, whereas spoligotype 20 shows near dominance in the far west in 2001–2002. Spoligotype 9 remains dominant in the east of Cornwall in all three time periods, but its territory is confined to an area that is closer to the eastern boundary in 2001–2002 than in 1997–1998. Finally the distribution of spoligotype 15 appears to be relatively stable over the three time periods.

5. Discussion

We have demonstrated a nonparametric method for the estimation of spatial segregation between different types of event in a multivariate point process. Application to the Cornwall BTB data confirms the existence of strong spatial segregation between different spoligotypes and identifies significant temporal changes in the spatial segregation between 1997 and 2002.

We have chosen to classify each breakdown according to its predominant spoligotype. When more than one spoligotype is identified within a single breakdown, the likely explanation is that the herd in question has experienced two or more infection events within the annual or biennial interval between successive tests. If we could distinguish such multiple events, we would still be able to identify type-specific probabilities by using control data on the locations of all herds at risk, but there would no longer be a constraint that these probabilities should sum to 1 at each location. However, the current data do not support an analysis of this kind; specifically, when a breakdown involves more than one animal with the same associated spoligotype, the number of independent infection events cannot be determined.

The immediate use of our methodology is to confirm (or not) visual impressions of spatial segregation that are obtained by smoothly mapping the data.
A potentially more important use of estimated type-specific probability surfaces is that they can assist in the management of new cases, in particular when investigating bought-in animals. If, for example, an animal came from a farm in north central Cornwall but was subsequently identified in a herd breakdown on a farm near the eastern boundary, a spoligotype 9 would suggest post-arrival infection whereas a spoligotype 15 would suggest an incipient breakdown on the source farm. A second potential application would be to enable a comparison between the spatial variation in spoligotype distributions among farm animals and in wildlife reservoirs.

[Fig. 6. Estimated type-specific probabilities for (a) spoligotype 9, 1997–1998, (b) spoligotype 12, 1997–1998, (c) spoligotype 9, 1999–2000, (d) spoligotype 12, 1999–2000, (e) spoligotype 9, 2001–2002, and (f) spoligotype 12, 2001–2002 (probability scale 0.0–1.0)]

[Fig. 7. Estimated type-specific probabilities for (a) spoligotype 15, 1997–1998, (b) spoligotype 20, 1997–1998, (c) spoligotype 15, 1999–2000, (d) spoligotype 20, 1999–2000, (e) spoligotype 15, 2001–2002, and (f) spoligotype 20, 2001–2002 (probability scale 0.0–1.0)]

A useful extension to the methodology that is described in the current paper would be to allow adjustment for the effects of known risk factors at the herd level, which might themselves be spatially structured. A generalized additive model (Hastie and Tibshirani, 1990) with a logit link function can be applied in the form

$$\operatorname{logit}\{p_j(x, u)\} = u'\beta_j + g_j(x), \tag{7}$$

where $u$ is the vector of herd covariates and $g_j(x)$ is a smooth function of $x$.

Our kernel smoothing method has the advantage of transparency, but it represents only one of several different approaches which could have been used.
Obvious competitors include spline smoothing methods (Wood, 2003) and hierarchical stochastic models in which the set of underlying type-specific probability surfaces are modelled as a realization of a latent multivariate spatial stochastic process, so extending to the multivariate setting the model-based geostatistics framework of Diggle et al. (1998). For example, the functions $g_j(x)$ in equation (7) could be replaced by $S_j(x)$ where $S(x) = \{S_1(x), \ldots, S_m(x)\}$ is a multivariate spatial Gaussian process.

It would be interesting to develop an overtly spatiotemporal model for the evolution of different spoligotypes. However, this would require data on the date of onset of each case, information which is not obtainable from the current annual or biennial testing protocol except in a heavily censored form.

Acknowledgements

We thank Roger Sainsbury of the State Veterinary Service, who helped to collect the Cornwall spoligotyping data sets, and Jackie Inwald and Si Palmer of the Department of Bacterial Diseases, Veterinary Laboratories Agency, Weybridge, who carried out the spoligotyping. This work was supported by the Department for Environment, Food and Rural Affairs ('SE3020') and by the UK Engineering and Physical Sciences Research Council through the award of a Senior Fellowship to Peter Diggle (grant GR/S48059/01).

References

Bartlett, M. S. (1964) The spectral analysis of two-dimensional point processes. Biometrika, 51, 299–311.
Diggle, P. J., Tawn, J. A. and Moyeed, R. A. (1998) Model-based geostatistics (with discussion). Appl. Statist., 47, 299–350.
Durr, P. A., Clifton-Hadley, R. S. and Hewinson, R. G. (2000) Molecular epidemiology of bovine tuberculosis: II, Applications of genotyping. Rev. Scient. Tech. Off. Int. Epizoo., 19, 689–701.
Durr, P. A. and Froggatt, A. E. A. (2002) How best to geo-reference farms?: a case study from Cornwall, England. Prev. Veter. Med., 56, 51–62.
Durr, P. A., Hewinson, R. G. and Clifton-Hadley, R. S. (2000) Molecular epidemiology of bovine tuberculosis: I, Mycobacterium bovis genotyping. Rev. Scient. Tech. Off. Int. Epizoo., 19, 675–688.
Hastie, T. J. and Tibshirani, R. J. (1990) Generalized Additive Models. London: Chapman and Hall.
Kelsall, J. E. and Diggle, P. J. (1995) Kernel estimation of relative risk. Bernoulli, 1, 3–16.
Kelsall, J. E. and Diggle, P. J. (1998) Spatial variation in risk of disease: a nonparametric binary regression approach. Appl. Statist., 47, 559–573.
Pielou, E. C. (1961) Segregation and symmetry in two-species populations as studied by nearest-neighbour relationships. J. Ecol., 49, 255–269.
Pielou, E. C. (1977) Mathematical Ecology, 2nd edn. New York: Wiley.
Richardson, M., van Lill, S. W. P., van der Spuy, G. D., Munch, Z., Booysen, C. N., Beyers, N., van Helden, P. D. and Warren, R. M. (2002) Historic and recent events contribute to the disease dynamics of Beijing-like Mycobacterium tuberculosis isolates in a high incidence region. Int. J. Tubercul. Lung Dis., 6, 1001–1011.
Wood, S. N. (2003) Thin plate regression splines. J. R. Statist. Soc. B, 65, 95–114.