GRTS for the Average Joe: A GRTS Sampler for Windows

Trent McDonald, WEST, Inc.
2003 Central Avenue, Cheyenne, WY 82001
(307) 634-1756; e-mail: [email protected]

Abstract: Generalized random tessellation stratified (GRTS) samples are useful spatial sampling designs for a number of reasons. However, actually drawing a GRTS sample can be so complicated that some practitioners opt for a simpler design. In this paper, I describe a computer program designed to draw GRTS samples of discrete sample units that are located in either 1 or 2 dimensions. This program, S-Draw, is a Fortran-based application that will run on any computer running a Windows operating system. The program reads a sample frame from a standard ASCII file and writes the sample to another ASCII file. While the existence of this program will not illuminate the theoretical details and justification of GRTS samples, it will, I hope, make drawing a GRTS sample accessible to non-statistically inclined researchers.

Keywords: generalized random tessellation stratified designs; programs; environmental sampling; S-Draw

1. INTRODUCTION

For good reasons, generalized random tessellation stratified (GRTS) samples (Stevens and Olsen 1999; Stevens and Olsen 2003; Stevens and Olsen 2004) are gaining popularity as a sampling scheme for large-scale, long-term environmental surveys. GRTS samples with reverse hierarchical ordering are designed such that for any sample size, say n, the first n units in the sample will be spatially balanced (i.e., “spread out”). In fact, any contiguous set of n units in a reverse hierarchically ordered GRTS sample constitutes a spatially balanced set of sample units (Stevens and Olsen 2004). A spatially balanced GRTS sample makes it easy both to add units in a way that does not compromise spatial balance and to maximize the overlap (co-location) of multiple studies such that all sample sizes are spread out.
The ease with which spatially balanced units are added to single or multiple studies is the chief advantage of GRTS samples over the next-most popular design, systematic sampling. Another advantage of GRTS samples, although rarely realized in practice, is that they avoid the alignment problems and subsequent adverse effects on estimates that can occur with systematic sampling (Stevens and Olsen 2003).

However, even though GRTS sampling is a good idea statistically, in practice the theory behind GRTS samples and the details required to actually draw one are difficult to understand. This difficulty stems, in part, from the flexibility of GRTS. The GRTS methodology can be applied to discrete units located in 1 dimension, discrete units located in 2 dimensions, and points in continuous 2-dimensional areas, all with either equal or unequal inclusion probabilities. Even when GRTS sampling is understood conceptually, the programming required to actually implement the procedure can be overwhelming.

In this paper, I describe a Windows-based computer program (that I call S-Draw) that will draw a GRTS sample of discrete units located in 1 or 2 dimensions. This program can be used to closely approximate a point sample in a continuous 2-dimensional area by defining a fine grid of points over the area and inputting the grid locations into the program as a discrete frame. My hope is that this program will make drawing a GRTS sample accessible to the average scientist, even if they do not fully understand the details of the methodology.

This paper is intended to describe the methods programmed into S-Draw, and as such I do not cover the theoretical details of GRTS samples (see Stevens and Olsen 2004). In the next section, I outline the methods used in the program and its basic capabilities. Various formats of the discrete sample frame are also described. In Examples, I present a few examples for illustration.
I close with a short discussion of the program’s performance, planned enhancements, availability, and where suggestions for future versions can be submitted.

2. METHODS

Due to the increased popularity of GRTS samples, and the fact that I was being called upon to actually draw them, I desired an easy-to-use computer program that implemented the GRTS methodology with the reverse hierarchical ordering described in Stevens and Olsen (2004). To be useful, the program needed to handle very large sampling problems, run quickly, and have a simple graphical user interface with a minimum of input parameters. I also wanted a command-line interface for the program (that I eventually called S-DrawB) that would facilitate batch processing and simulation. I chose to write such a program for Windows operating systems using the Fortran 95 language. I chose Fortran because of the speed with which it manipulates large arrays of numbers, and because Fortran produces stand-alone applications that do not require another program (such as R, S-Plus, or SAS) to run. I used the well-established Fortran 95 compiler available from Lahey Computer Systems, Inc. (version 7.1), and the Windows API routines available in their Wisk library, to implement the program. I call the program S-Draw because samples are often denoted by “S”, and this program draws S’s.

I designed the S-Draw program to draw samples of discrete units that are located in either 1 or 2 dimensions. An example of a discrete 1-dimensional unit is a river segment identified by its distance from the mouth. Examples of discrete 2-dimensional units include grid cells identified by the location of their centers, and river segments identified by the UTM (or latitude and longitude) coordinates of their lower endpoints. The coordinates of all units in the population to be sampled are input into S-Draw by specifying the name of a text file containing the coordinates and other information, such as identifiers and weights.
This text file I call the sample frame. It contains one row per population unit, and a varying number of fields (columns) on each line depending on the type of sample being drawn. These fields are summarized in Table 1. All fields on a line in the sample frame are separated by 1 or more spaces or commas.

I also designed S-Draw to produce simple samples that do not have an explicit sampling frame. If a sampling frame, in the form of a text file, is not specified, S-Draw will accept an arbitrary population size and draw an equi-probable sample, assuming units are located in 1 dimension. In this case, units are assumed to be located at coordinates 1, 2, …, up to the population’s size, and units are identified by their 1-D coordinate.

Following specification of the input parameters, the first step taken by S-Draw is to map the identity of all units in the population to a line segment in the interval (0,n] in a way that preserves some spatial proximity. This is accomplished by mapping either the 1-D or 2-D coordinates of all units onto the 1-D interval (0,n]. The mapping implemented in S-Draw is the quadrant-recursive function suggested by Stevens and Olsen (2004). The bounding box for populations of units located in 2-D consists of the minimum and maximum horizontal and vertical coordinates. The bounding interval for populations of units located in 1-D consists of the minimum and maximum coordinate. In the quadrant-recursive map, each unit’s coordinates are converted into either a base-2 (for units located in 1 dimension) or base-4 (for units located in 2 dimensions) number of the form x1.x2.x3. … .xK, where xi is a digit representing the quadrant of the unit’s location at the ith stage of the recursive mapping, and K is the number of recursive levels used in the hierarchical ordering scheme. If randomization is called for (the default in S-Draw), the digits at each level of the hierarchical identifier are randomly permuted.
That is, the frame is randomized by randomly mapping the unique digits at the ith stage of the recursive map onto themselves in a 1-to-1 way. If randomization is not called for, no permutation of digits is done. After randomization, the new base-2 or base-4 representation of each unit’s location is converted back to a base-10 number, and the entire frame is sorted in ascending order according to this base-10 number. The order of units in the sorted frame is the order in which units are assigned to line segments in the interval (0,n]. Because of the way digits in the base-2 or base-4 representations of each unit’s location are constructed and permuted, units that are close (in 1- or 2-dimensional space) also tend to be close in the 1-D order generated by the quadrant-recursive map.

The parameter K of the quadrant-recursive mapping can be controlled by the pixelsize parameter in S-Draw. Conceptually, pixelsize is the length of one side of a square quadrant at the lowest level of the quadrant-recursive map. If pixelsize in S-Draw is set smaller than the minimum distance between unit locations, quadrant-recursive mapping will continue until all units occupy quadrants by themselves. In general, S-Draw sets

$$K = \left\lceil \log_2\!\left(\frac{\text{range}}{\text{pixelsize}}\right) \right\rceil = \left\lceil \frac{\ln(\text{range}/\text{pixelsize})}{\ln(2)} \right\rceil,$$

where range is the maximum extent of the population’s bounding box along a single dimension, and the “ceiling” function $\lceil x \rceil$ returns the smallest integer greater than or equal to x. The true size of the smallest quadrant in the recursive mapping is then $\text{range}/2^K$. S-Draw randomizes the order of all units in the same lowest-level pixel. This is equivalent to randomizing the order of all units in the frame with the same base-2 or base-4 representation. To draw a simple random sample using S-Draw, pixelsize can be set to a value larger than range.
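
The quadrant-recursive map and the formula for K can be illustrated with a short Python sketch. It is not the S-Draw Fortran code. The function names, the numbering of quadrants (0 = lower left, 1 = lower right, 2 = upper left, 3 = upper right), the use of a square bounding box with side range, and the flooring of K at zero are all assumptions made for illustration. One permutation of the digits is drawn per level, which is my reading of the description above.

```python
import math
import random

def num_levels(range_, pixelsize):
    """K = ceil(log2(range / pixelsize)); floored at 0 (an assumption)."""
    return max(0, math.ceil(math.log2(range_ / pixelsize)))

def quadrant_digits(x, y, xmin, ymin, range_, K):
    """Base-4 digits x1..xK of a point in a square box of side range_.

    Quadrant numbering (0=lower left, 1=lower right, 2=upper left,
    3=upper right) is a hypothetical choice for this sketch.
    """
    # Rescale to the unit square; keep boundary points strictly below 1.
    u = min((x - xmin) / range_, 1.0 - 1e-12)
    v = min((y - ymin) / range_, 1.0 - 1e-12)
    digits = []
    for _ in range(K):
        u, v = 2.0 * u, 2.0 * v
        bu, bv = int(u), int(v)      # which half in each direction
        digits.append(2 * bv + bu)   # quadrant digit at this level
        u, v = u - bu, v - bv        # zoom into the chosen quadrant
    return digits

def randomized_key(digits, perms):
    """Permute the digit at each level, then convert base 4 to base 10."""
    key = 0
    for digit, perm in zip(digits, perms):
        key = 4 * key + perm[digit]
    return key

# Small demonstration on a 100 x 100 bounding box with pixelsize = 1.
rng = random.Random(1)
points = [(10.0, 20.0), (90.0, 80.0), (15.0, 25.0), (85.0, 75.0)]
K = num_levels(100.0, 1.0)
perms = [rng.sample(range(4), 4) for _ in range(K)]  # one permutation per level
keys = [randomized_key(quadrant_digits(x, y, 0.0, 0.0, 100.0, K), perms)
        for x, y in points]
order = sorted(range(len(points)), key=lambda i: keys[i])
```

In this sketch, points that share a first-level quadrant always receive keys in the same block of values, so they stay next to each other in the sorted frame regardless of which permutation is drawn.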
In addition to regular quadrant-recursive mapping, S-Draw can process an arbitrary hierarchical ordering that has been predefined and stored in the frame by the user. This feature allows users, for example, to use triangle-recursive mapping that defines sub-triangles inside a bounding triangle by connecting the midpoints of each side. In fact, any hierarchical ordering that produces an identifier of the form k1.k2.k3. … .kK, where ki is a number identifying the subregion of the unit’s location at the ith level of the hierarchy, can be used. Under this option, the user must construct the hierarchical identifiers outside S-Draw and include them in the frame. No recursive mapping is done inside S-Draw. Instead, the digits at each level of the hierarchy are randomly permuted and reassigned to the same level, the new identifiers are converted from the mixed-base numbers that were input to base-10 numbers, and the frame is sorted according to this base-10 number. This has the effect of hierarchically sorting the frame based upon digits in the first level, then digits in the second level within levels of the first, then digits in the third level within levels of the first two, and so on.

This option was included because for certain problems it may be easier to construct the hierarchical identifiers k1.k2.k3. … .kK than it is to construct the coordinates of individual units. This will generally only happen when a geographic information system (GIS) is not available. For example, a spatially balanced sample of stream segments in the United States could be drawn by assigning to all segments in the U.S. a hierarchical identifier of the form state.county.watershed.segment, where state is the number of the state in which the segment resides (i.e., 1, 2, …, 50), county is the number of the county within the state where the segment resides, watershed is the number of the segment’s watershed within the county, and segment is the number of the segment within the watershed.
Following the order of units established by the random quadrant-recursive map or the predefined hierarchical identifiers, and the permutation of units in the same pixel, units are assigned to a line segment within (0,n] with a length that is a direct function of the unit’s sample weight specified in the frame. If all weights in the frame are equal, or if weights are not given, the length of each unit’s line segment is n/N, where N is population size, and an equi-probable sample is drawn. If weights are not equal, the length of each unit’s line segment is set to $\pi_i = n \, w_i / \sum_i w_i$, where $w_i$ is the weight value for unit i. If any $\pi_i > 1.0$, they are set equal to 1.0 and the remaining $\pi_i$ are rescaled so that all $\pi_i$ sum to n.

To draw the GRTS sample, a systematic sample of size n is drawn from the ordered line segments on (0,n] by first choosing a random start, say m, between 0 and 1. Units that are associated with line segments that contain one of the points in the sequence {m, m+1, m+2, …, m+(n-1)} are then included in the sample. Because units that are close together in space tend to have line segments in (0,n] that are close together, the systematic sample across (0,n] assures that sample locations will be spatially spread out.

If reverse hierarchical ordering (Stevens and Olsen 2004) is called for by the S-Draw user (the default), S-Draw assigns the integers 1, 2, ..., n to units in the realized sample, and then converts those numbers to either base-2 (units in 1 dimension) or base-4 (units in 2 dimensions) numbers. Once converted, S-Draw reverses the digits of these base-2 or base-4 numbers, converts the reversed-digit numbers back to base-10, and sorts the sample according to this base-10 number. If n is an integer power of 2, reverse hierarchical reordering of a 1-D sample forces units from the first half of the bounding interval to be followed immediately by a unit from the second half of the bounding interval.
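
The three steps just described (segment lengths, systematic selection, and reverse hierarchical reordering) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the S-Draw code: the function names are hypothetical, weights are assumed positive with n no larger than N, and the digit reversal uses zero-based positions padded to a common number of digits, a detail the text above does not specify.

```python
import random

def segment_lengths(weights, n):
    """pi_i = n * w_i / sum(w), capped at 1 with the remainder rescaled."""
    N = len(weights)
    pi = [0.0] * N
    capped = set()
    while True:
        free = [i for i in range(N) if i not in capped]
        total = sum(weights[i] for i in free)
        remaining = n - len(capped)       # sample size left for uncapped units
        over = []
        for i in free:
            pi[i] = remaining * weights[i] / total
            if pi[i] > 1.0:
                over.append(i)
        if not over:
            return pi
        for i in over:                    # cap and rescale the others again
            pi[i] = 1.0
            capped.add(i)

def systematic_sample(pi, m):
    """Indices whose segment on (0, n] contains m, m+1, ..., m+(n-1)."""
    sample, upper, target = [], 0.0, m
    for i, length in enumerate(pi):
        upper += length                   # right end of unit i's segment
        if target <= upper:               # each length <= 1: at most one hit
            sample.append(i)
            target += 1.0
    return sample

def reverse_hierarchical_order(sample, base):
    """Reorder by digit-reversed position (base 2 for 1-D, base 4 for 2-D)."""
    n = len(sample)
    width = 1
    while base ** width < n:              # digits needed to write n - 1
        width += 1

    def reversed_key(pos):
        key = 0
        for _ in range(width):
            pos, digit = divmod(pos, base)
            key = key * base + digit      # lowest digit becomes the highest
        return key

    return [sample[p] for p in sorted(range(n), key=reversed_key)]

# 20 equally weighted units in frame order, n = 4, random start m in [0, 1).
rng = random.Random(2004)
pi = segment_lengths([1.0] * 20, 4)
sample = systematic_sample(pi, rng.random())
reordered = reverse_hierarchical_order(sample, base=2)
```

Because segment ends are accumulated in floating point, a start m extremely close to 1 could in principle miss the last segment; a production implementation would need to guard against that.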
For 2-D samples and assuming n is a power of 4, the effect of this reverse hierarchical ordering is to contiguously place one unit from each quadrant in the re-ordered sample. That is, if n is a power of 4, any four adjacent units in a reordered 2-D sample will consist of one unit from quadrant 1, one unit from quadrant 2, one unit from quadrant 3, and one unit from quadrant 4. The same phenomenon is true at lower levels of the quadrant-recursive map. That is, if n is a power of 4, units separated by exactly 4 positions in the reordered 2-D sample will consist of one unit from the first sub-quadrant of a particular quadrant, one from the second sub-quadrant of that same quadrant, one from the third sub-quadrant of that same quadrant, and one from the fourth sub-quadrant of that same quadrant. This ordering assures that any contiguous set of units in the sample will be spatially balanced over the population. Simulation (Stevens and Olsen 2004) shows that this spatial balance is present for general sample sizes, not just powers of 2 or 4.

3. EXAMPLES

S-Draw is controlled by filling in the boxes in its graphical user interface (Figure 1), clicking the appropriate radio buttons or check boxes, and selecting “OK” to run. All parameters in S-Draw have default values except sample size, population size, and input frame. The minimum set of parameters needed to run S-Draw is either sample size and population size, or sample size and input frame. The default output file is named “sample_yyyymmdd_hhmmss.txt”, where yyyy is the year in which this particular run of S-Draw was started, mm is the month, dd is the day, hh is the hour, mm is the minute, and ss is the second. When finished, this output file will contain a list of all the input parameters, as well as a list of all units in the sample. The sample listing includes the coordinate(s) of the unit, the actual first-order inclusion probability of the unit, and an identifier for the unit (either input or made up inside S-Draw).
The sample listing is in reverse hierarchical order if that option was checked. The seed for the sequence of random draws can be specified by the user so that particular samples can be replicated on multiple runs. The random seed can be any integer value between -2 billion and +2 billion, but if it is set to -1, a seed is constructed from the computer’s clock and written to the output file.

If sample size is set to 20 and population size is set to 100, S-Draw will produce a 1-D GRTS sample of size 20 with reverse hierarchical ordering, assuming the 100 units are located at coordinates 1, 2, 3, …, 100. If sample size is 20, population size is 100, and pixelsize is 100, S-Draw will produce a simple random sample of size 20 in reverse hierarchical order. In this case, reverse hierarchical ordering has no real effect other than to shuffle the already randomly ordered sample. If sample size is 20, population size is 100, pixelsize is 1, and the randomize box is unchecked, S-Draw will produce a fixed-size systematic sample assuming units are ordered from 1 to 100.

S-Draw is capable of drawing randomized or un-randomized variable probability systematic (VPS) samples (Brewer and Hanif 1982; Sunter 1986; Stehman and Overton 1994; McDonald 1996). Un-randomized VPS samples are produced by specifying sample weights in an input frame file and unchecking the randomize box. Randomized VPS designs are produced by specifying sample weights in an input frame file, leaving the randomize box checked, and specifying a pixelsize that is larger than the maximum difference between unit coordinates, which are assumed to be 1, 2, …, N in the 1-D case. These parameters by-pass the quadrant-recursive mapping, and the frame is completely randomized. Both types of samples may be output in reverse hierarchical order, although doing so for the randomized VPS design is redundant.

In August and September of 2003, the U.S.
Fish and Wildlife Service funded an aerial survey of golden eagles (Aquila chrysaetos) in the western half of the United States. Aerial survey transects were 100 km in length and oriented east-west. A dense grid of potential transect start points was constructed by overlaying the study area (Figure 2) with points spaced 2 km apart north-south and 100 km apart east-west. Portions of transects in this initial frame that covered Department of Defense lands, Department of Energy lands, “no fly” National Parks, large urban areas, large bodies of water, and lands > 10,000 feet in elevation were removed. The resulting list of potential transects contained 27,058 starting points. The frame file containing these points consisted of one line per point, and the following fields on each line (separated by spaces): STARTPNT_X, STARTPNT_Y, and LINE_ID. A total of 17,500 km of transect was targeted which, based on the average clipped transect length, translated to a desired sample size of 208 transects. This sample size was doubled to allow for an adequate list of alternate transects that could be flown if conditions did not permit sampling of the original. An equi-probable 2-dimensional GRTS sample of size 416 was taken using S-Draw, where the first 208 transects in the reverse hierarchically ordered sample were considered the original sample and the second 208 transects were considered alternates. The original and alternate samples of transects are plotted in Figure 2. For this sample draw, the parameters of S-Draw were set as follows: 2-dimensional sample was checked, coordinates in the frame was checked, ID’s in the frame was checked, sample size was set to 416, population size was blank, pixelsize was 1, randomize was checked, output in reverse hierarchical order was checked, and input frame listed the name of the text file containing coordinates and ID’s of all 27,058 points.
The final report on this golden eagle survey is available on-line at http://mountain-prairie.fws.gov/species/birds/golden_eagle/.

4. DISCUSSION

S-Draw uses dynamic memory allocation for all arrays. Consequently, the number of records in the sampling frame is limited only by a computer’s memory. I tested S-Draw on my laptop, which contains a Pentium 4 processor and 512 MB of RAM. I generated random point locations in the bounding box [0,100000] × [0,100000] and asked S-Draw to take an equi-probable GRTS sample. S-Draw was able to draw a 2-dimensional GRTS sample of size 500 from a population of 100,000 units in 4.2 seconds. For this sample, I requested 28 levels of quadrant-recursive mapping and reverse hierarchical ordering. Drawing a GRTS sample of size 500 from 1,000,000 units with 17 levels of recursive mapping took 44.3 seconds. During this run I noticed significant hard disk activity, implying that S-Draw was using slow virtual memory on the hard drive (I had 4 other programs open at the time, including Word, S-Plus, and Eudora). I believe S-Draw would have performed better on a machine with more RAM. From knowledge of the algorithms, plus these and other tests I have not mentioned, I believe the order of the program to be approximately O(N). That is, I would expect the program to complete in approximately 4.45e-5N seconds, where N is population size.

In the future, it would be convenient if S-Draw were able to directly read GIS export files (i.e., .e00 files) containing point coverages as the input sample frame. Enhancing S-Draw to read .e00 files, and take a GRTS sample of the points therein, should not be difficult and is planned. If S-Draw were capable of reading .e00 files, another logical extension would be to allow S-Draw to draw true point samples inside a region bounded by polygons.
Utilizing publicly available routines, it is theoretically possible for S-Draw to directly read ArcGIS binary files, which would by-pass exporting the frame to a text file.

In addition to enhancing the frame files that S-Draw is capable of reading, I plan to produce a Windows dynamic link library (.dll) that contains the core GRTS sampling routines. When this is accomplished, it will be possible to call the S-Draw Fortran code from within an S-Plus or R function. When this enhancement is added, users will no longer be required to export their sampling frame data from S-Plus and R to a text file.

I will send anyone who contacts me a copy of S-Draw free of charge. The program is also available on the West-Inc web site, www.west-inc.com. Send any suggestions or bug reports to [email protected]. Although I have tested it thoroughly on multiple problems, I offer no warranties or guarantees regarding S-Draw’s performance.

REFERENCES

Brewer, K. R. W. and Hanif, M. (1982), Sampling with Unequal Probabilities, New York: Springer-Verlag.

McDonald, T. L. (1996), "Analysis of finite population surveys: sample size and testing considerations," Unpublished dissertation, Oregon State University.

Stehman, S. V. and Overton, W. S. (1994), "Environmental sampling and monitoring," in Handbook of Statistics, Patil, G. P. and Rao, C. R. (eds.).

Stevens, D. L. and Olsen, A. R. (1999), "Spatially restricted surveys over time for aquatic resources," Journal of Agricultural, Biological, and Environmental Statistics, 4, 415-428.

Stevens, D. L. and Olsen, A. R. (2003), "Variance estimation for spatially balanced samples of environmental resources," Environmetrics, 14, 593-610.

Stevens, D. L. and Olsen, A. R. (2004), "Spatially balanced sampling of natural resources," Journal of the American Statistical Association, 99, 262-278.

Sunter, A. (1986), "Solutions to the problem of unequal probability sampling without replacement," International Statistical Review, 54, 33-50.
Table 1: Order of fields in the sample frame file for all possible combinations of input parameters to S-Draw. Characters are allowed on each line following the fields listed, but they are ignored. Fields are separated by "white space", which is any non-numeric character for numbers (including spaces and commas, but not the decimal point and negative sign), and a space for the alphanumeric ID field. These formats only apply when a frame file is specified.

Sample         Data included in frame              Order of fields
structure      Coordinates   Weights   ID's        in frame file
1-D            Yes           Yes       Yes         x wgt id
1-D            Yes           Yes       No          x wgt
1-D            Yes           No        Yes         x id
1-D            Yes           No        No          x
1-D            No            Yes       Yes         wgt id
1-D            No            Yes       No          wgt
1-D            No            No        Yes         id
1-D            No            No        No          [# lines counted]
2-D            Yes           Yes       Yes         x y wgt id
2-D            Yes           Yes       No          x y wgt
2-D            Yes           No        Yes         x y id
2-D            Yes           No        No          x y
Pre-defined*                 Yes       Yes         k1 … kK wgt id
Pre-defined*                 Yes       No          k1 … kK wgt
Pre-defined*                 No        Yes         k1 … kK id
Pre-defined*                 No        No          k1 … kK

* K specified on the first line of the frame file

Figure 1: The user interface for S-Draw showing the input parameters.

Figure 2: A map of the GRTS sample locations for 100 km aerial transects flown during the U.S. Fish and Wildlife Service’s golden eagle survey in August and September 2003. A grid of 27,058 potential starting points with spacing of 2 km north-south and 100 km east-west was constructed and input into S-Draw. Twice the necessary number of transects was selected so that an adequate list of alternate transects was obtained.
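
The frame formats in Table 1 can be illustrated with a small Python reader for a single frame line. This is a sketch of my reading of the separator rules in the Table 1 caption, not the parser used inside S-Draw. The function name, its arguments, the sample lines, and the decision to ignore exponent notation are all assumptions made for illustration.

```python
import re

# Plain decimal numbers such as 12, -3.5 or .25 (exponent notation is
# not handled; that is an assumption of this sketch).
NUMBER = re.compile(r"-?(?:\d+\.?\d*|\.\d+)")

def parse_frame_line(line, n_numeric, has_id):
    """Read n_numeric leading numbers, then an optional ID, from one line.

    n_numeric counts the numeric fields expected before the ID, e.g. 3 for
    the 2-D format "x y wgt id". Anything after the listed fields is ignored.
    """
    values = []
    pos = 0
    for _ in range(n_numeric):
        match = NUMBER.search(line, pos)   # skip any non-numeric separators
        if match is None:
            raise ValueError(f"expected {n_numeric} numeric fields: {line!r}")
        values.append(float(match.group()))
        pos = match.end()
    unit_id = None
    if has_id:
        # The ID is taken as the next space-delimited token.
        tokens = line[pos:].replace(",", " ").split()
        if not tokens:
            raise ValueError(f"missing ID field: {line!r}")
        unit_id = tokens[0]
    return values, unit_id

# Hypothetical lines in the "x y wgt id" and "x wgt" formats.
print(parse_frame_line("512000, 4410000 1.5 T0042 trailing note", 3, True))
print(parse_frame_line("3.0 0.25", 2, False))
```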