Proceedings of International Symposium on City Planning 2013

The genetic algorithm geospatial modeling of the residential distribution: The case of Kwang-Joo City of Korea

Nae-Young Choei (1), Seong-Hun Kim (2), Inhan Kang (3)

Abstract

This study investigates the causes of the population concentration in Kwang-Joo City near Seoul, Korea. The City is taken as the case locality because it has shown the second most rapid growth in both residential population and total building floor area among more than 140 localities in the Capital Region over the three most recent consecutive years. At the same time, the study explores genetic algorithms as a method for geospatial modeling of residential densities, using the Korean National Geographic Information System (NGIS) as property-level ancillary data. A gridded map of population distribution is built from the Korea Land Information System (KLIS) and the electronic Architectural Information System (eAIS). The genetic algorithm model is then formulated with a hierarchical weighting system in which the categorical weights of variable groups and the individual weights of their subordinate variables are sought jointly to explain the residential densities. The model is run with a carefully pretested set of reproductive plan parameters. The genetic algorithm solutions are then discussed in an effort to probe the advantages of the method in facilitating analysis for numerous potential geospatial applications.

Keywords: genetic algorithms; ancillary data; residential density; geospatial modeling

(1) Professor of Urban Planning at Hongik University, Seoul, Korea, corresponding author.
(2) Graduate student of Urban Planning at Hongik University, Seoul, Korea.
(3) Undergraduate student of Urban Planning at Hongik University, Seoul, Korea.

1. Introduction

The Korean Capital Region is well known for its uncontrolled development during the nation's rapid economic growth, and it now contains about 48% of the national population. In particular, recent evidence indicates that, among more than 140 localities in the Region, the second fastest growing is Kwang-Joo City, which lies immediately to the southeast of Seoul. One reason for its fast growth is thought to be its adjacency to Bun-Dang Newtown, built about two decades ago. Bun-Dang Newtown remains a more preferred bedroom community than competing localities, as it is filled with high-priced apartment complexes that accommodate rather affluent former Seoulites, most of whom commute to the Kang-Nam (River South) area of Seoul. Such a phenomenal population concentration in Kwang-Joo City, however, is suspected to be an instance of the so-called free-rider problem: without sufficient urban infrastructure of their own, such as well-planned roads, parks, green areas, and renowned schools in the existing City, the incoming residents exploit the adjacent public facilities of the Newtown, for whose provision they did not pay.

This paper, in this context, has two major objectives: one is to determine whether the population distribution in Kwang-Joo is indeed related to easy access to Bun-Dang Newtown; the other is to explore the genetic algorithm technique as a means of identifying an explanatory model for that purpose. For modeling, population data from 'source' enumeration areas (EAs) in Kwang-Joo City are disaggregated into 'target' grid populations with the aid of the property-level ancillary data available from the Korean National Geographic Information System (NGIS). To benchmark the performance of the genetic algorithm model, a regression model is run in parallel.

The paper is organized as follows. Section 2 describes the study area and the NGIS data.
The gridded population surface is constructed and the training dataset is selected in Section 3. Section 4 deals with genetic algorithm model building, Section 5 interprets the model outcomes, and the conclusion appears in Section 6.

2. Study area and data

2.1 Study area

Kwang-Joo City is located to the southeast of Seoul in Korea, as shown in Figure 1. Lying at 37°22′N and 127°06′E, with a population of 259,387 (602 persons/km²) forming 96,584 households as of 2012, Kwang-Joo City currently has ten wards within its total area of 430.96 km². As can be seen in Figure 2, the four western wards comprise the most urbanized sector of the City, with population densities higher than 10 persons/ha. The three northern wards, on the other hand, consist mostly of rugged mountain terrain with extremely low residential densities of less than 2 persons/ha. The remaining three southeastern wards are a mixture of highlands and fluvial lowlands; the latter are filled mostly with irrigated rice fields and have medium population densities. Bun-Dang Newtown is located to the west of the City, as indicated in the figure, sharing one contact point across the ridge that divides the two jurisdictions.

2.2 Data description

The population data for the study are from the national databases built annually from resident registration by the Ministry of Security and Public Administration (MOSPA). The data are collected and reported each year at the ward level, the smallest statistical area. Except for a few undetected cases of false resident registration, the integrity and precision of the data are guaranteed by the MOSPA. Ward-based population data for three consecutive years, 2010 to 2012, were extracted for Kwang-Joo City and used in the study.
With the permission of the MOLIT (Ministry of Land, Infrastructure and Transport), two authentic NGIS databases have also been made available for the study as ancillary data: the Korea Land Information System (KLIS) and the electronic Architectural Information System (eAIS). The KLIS was completed in 2007 by merging two earlier national databases: the Parcel Based Land Information System (PBLIS) of the MOSPA and the Land Management Information System (LMIS) of the MOLIT. The cadastral information from the PBLIS and the land-use/planning information from the LMIS were thereby combined into the single georeferenced database of KLIS. The eAIS, on the other hand, was constructed by combining the Building Management System (BMS) of the MOSPA and the Construction Administration Management System (CAMS) of the MOLIT, so that the building permit information from the BMS and the building register information from the CAMS were consolidated into one database. The study data within the boundary of Kwang-Joo City were constructed by joining the two databases on their corresponding Parcel Numbering Unit (PNU) codes.

Figure 1. The location and administrative boundaries of Kwang-Joo City, the case area. a) The location and size of Gyeonggi Province in Korea. b) The location and shape of Kwang-Joo City in the Korean Capital Region.

Figure 2. Choroplethic population density maps of ten wards in Kwang-Joo City.

3. Building gridded population surface

3.1 Gridded population and choice of the cell size

Since the areal units used to report population data (enumeration districts, census tracts, wards, local government units) do not have any natural or meaningful geographical identity, analyses based on zonal data may be subject to the modifiable areal unit problem (MAUP), where observed patterns and relationships are sensitive to the spatial arrangement of the arbitrarily defined zones or to combinations of them (Openshaw 1984). One promising way to avoid this well-documented problem is to generate a gridded population map (Martin and Bracken 1991). The advantages of the gridded population surface are frequently summarized as follows: 1) the regular grid can be easily re-aggregated to any areal arrangement required; 2) producing ecological data in grid form is one way of ensuring compatibility between heterogeneous data sets; 3) data in grid form make multi-resolution and multi-source information fusion easier; and 4) the converted grid form can provide a way of avoiding some of the problems imposed by artificial boundaries (Martin and Bracken 1991, Yue et al. 2003, Liao et al. 2010). In this context, the study constructs a 200 by 200 meter gridded population surface over the entire jurisdiction of Kwang-Joo City for model building.

3.2 Generation of the gridded population map

3.2.1 Disaggregation of EA population into the gridded map

Suppose there are $n$ wards ($i = 1, \dots, n$) with total population $P$ in the study area, which has $H$ housing units on its land parcels. Consider also that each ward $i$ has $h_i$ housing units accommodating population $P_i$, so that $\sum_i h_i = H$ and $\sum_i P_i = P$. If $A_i$ is the aggregate floor area and $a_{ij}$ is the individual floor area of the $j$-th housing unit in ward $i$, then the average per capita floor area $\bar{a}_i$ for each resident in ward $i$ follows as:

$$\bar{a}_i = \frac{A_i}{P_i} = \frac{1}{P_i} \sum_{j=1}^{h_i} a_{ij} \qquad (1)$$

Here, the floor area is not just the building coverage on the ground but the total floor area of the building (in the case of multi-storied structures).
The exact floor area of each building is thus an important factor in obtaining a correct measure of the per capita floor area $\bar{a}_i$ in Equation (1) above. In this study, accurate floor area data have been extracted from the eAIS database described earlier. Now imagine that the study area is disaggregated into grid cells $c$ of cell size $s$ (= 200 m). Then, if the centroid of a parcel is located within the boundary of a cell $c$, the housing floor area on that parcel can be allocated to $c$. If we let $F_c$ be the total floor area of housing in the cell $c$, then $p_{ic}$, the population of cell $c$ in ward $i$, can be calculated as:

$$p_{ic} = \frac{F_c}{\bar{a}_i} \qquad (2)$$

The rationale for using the ward-based average floor area in the denominator of Equation (2) is twofold. First, the ward populations $P_i$ of the EAs are the smallest population data available. Second, and more important, the extreme heterogeneity in residential densities, settlement patterns, and dwelling types between the more urbanized wards and the less densely populated agrarian or forested wards should be taken fairly into account ward by ward, while the homogeneity of residential circumstances within each ward is consistently maintained.

3.2.2 Outcomes of the gridded population map

The resulting matrix of cells, each containing a population estimate, is shown in Figure 3. The source populations of the EAs (wards) are disaggregated into the target 200 m grid, so that the settlement geography is rendered far more sharply than in the simple greyscale choroplethic population density map drawn by the source areal unit of wards, as in Figure 2. The distinction is particularly vivid in the agrarian and forested wards. In Figure 2, populations of the six northern and southeastern wards are marked only by their averages (as low as 0.36 to 6.65 persons/ha), whereas the gridded map in Figure 3 pinpoints the populated areas of the same wards by removing non-resident cells in areas of widespread rurality and sparsity.
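The floor-area disaggregation of Section 3.2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the parcel layout `(x, y, floor_area)`, the function names, and the toy numbers are all hypothetical, and parcel centroids are assumed to be given in a projected coordinate system measured in metres.

```python
from collections import defaultdict


def per_capita_floor_area(ward_population, unit_floor_areas):
    """Equation (1): aggregate housing floor area of a ward per resident."""
    return sum(unit_floor_areas) / ward_population


def cell_index(x, y, cell_size=200.0):
    """Grid cell (column, row) that contains a parcel centroid at (x, y)."""
    return (int(x // cell_size), int(y // cell_size))


def disaggregate(ward_population, parcels, cell_size=200.0):
    """Equation (2): allocate floor area to cells by parcel centroid, then
    divide each cell's floor area by the ward's per capita floor area.

    parcels: list of (x, y, floor_area) tuples for ONE ward (hypothetical layout).
    """
    a_bar = per_capita_floor_area(ward_population, [p[2] for p in parcels])
    cell_floor = defaultdict(float)
    for x, y, area in parcels:
        cell_floor[cell_index(x, y, cell_size)] += area
    return {cell: area / a_bar for cell, area in cell_floor.items()}


# Toy ward: 40 residents, three parcels falling into two 200 m cells.
parcels = [(50.0, 60.0, 300.0), (150.0, 20.0, 100.0), (250.0, 90.0, 600.0)]
cells = disaggregate(ward_population=40, parcels=parcels)
```

Because every unit of floor area in the ward is assigned to exactly one cell, the cell populations sum back to the ward population, which is the mass-preserving property the procedure relies on.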
In like manner, the highly populated cells in the four western wards clearly indicate where the actual population is confined and concentrated in those wards. Such extreme cells in the populated western wards, however, could undermine the robustness of either heuristic or stochastic model fitting and weaken the predictive potential of the fitted models unless they are properly eliminated. The next step, in this context, is to sort out the training dataset on which the models are trained, so as to enhance the reliability of the estimated population.

Figure 3. Gridded population map of 200m cell-size estimated with NGIS ancillary data.

3.3 Selection of training dataset

To select the potentially explanatory set of training data cells, the following steps were applied. First, the two northernmost wards were excluded, as they are not only too steep and mountainous for habitation but, more important, are doubly designated as greenbelt zone and water supply protection zone, in which any development activity is legally prohibited. Second, cells that accommodate uninhabitable facilities such as golf links, university campuses, watersheds, and agriculture promotion zones were ruled out. Third, cells on steep slopes over 30% were left out, as they are unsuitable for physical development of any kind. Fourth, all uninhabited cells were set aside, since they cannot contribute to any meaningful interpretation of the residential distribution. Finally, among the remaining cells, only those lying both within the 10th to 90th percentile and within ±1.0 standard deviation of the population mean were chosen, in an effort to eliminate the extreme outlying population counts that could distort the central tendency of the majority of the training data cells.
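The last two screening steps (dropping uninhabited cells, then trimming outliers) can be sketched as follows. This is only an illustration: the zoning and slope exclusions are assumed to have been applied upstream with GIS masks, the paper does not say which percentile interpolation or which standard deviation estimator was used (the inclusive quantile method and the population standard deviation are assumptions here), and the function name and toy data are hypothetical.

```python
import statistics


def select_training_cells(cell_pop):
    """Keep inhabited cells that lie inside the 10th-90th percentile band
    AND within one standard deviation of the mean cell population.

    cell_pop: dict mapping a cell id to its estimated population.
    """
    # Step four of the text: set aside uninhabited cells.
    inhabited = {c: p for c, p in cell_pop.items() if p > 0}
    values = list(inhabited.values())

    # Nine cut points for n=10; the first and last are the 10th/90th percentiles.
    deciles = statistics.quantiles(values, n=10, method="inclusive")
    p10, p90 = deciles[0], deciles[-1]

    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD (an assumption)

    return {
        c: p
        for c, p in inhabited.items()
        if p10 <= p <= p90 and abs(p - mean) <= sd
    }


# Toy grid: one empty cell, one very sparse cell, one extreme cell.
toy = dict(enumerate([0, 1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 500]))
selected = select_training_cells(toy)
```

Applying both criteria jointly is stricter than either alone: the percentile band removes a fixed share of cells at each tail, while the standard deviation band removes cells far from the mean regardless of how many there are.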
The selected cells are seen to include populated clusters in rural wards while excluding most of the under-populated hinterlands. Figure 4 shows the 280 data points finally chosen. In extensive pretests, the dataset obtained this way outperformed all of the many alternative datasets obtainable by varying the cut-off criteria set out above. The later discussion of the study models is hence confined to the outcomes obtained with this dataset alone.

Figure 4. The selected 280 training data cells dispersed across the jurisdiction of Kwang-Joo City.

4. Methodological approach

4.1 Conceptual basis of genetic algorithm

The mechanics of genetic algorithms seek to mimic nature's approach to the evolution of species, as is well described elsewhere (Holland 1975, Fisher and Leung 1998, Gen and Cheng 2000, Venkatesan et al. 2004, Brandl 2007, Liao et al. 2010, Yim et al. 2010). They draw on the population genetics of natural selection and apply it to problems that are traditionally treated by optimization. Genetic algorithms require the parameter set of the optimization problem to be coded as finite-length strings of binary, real (floating-point), or integer numbers. The string is called a chromosome (or individual) and the bits in the string are called genes. Genetic algorithms work simultaneously with a group of different individuals called a population. Their power comes from the implicit parallelism of operating on these populations, which offers a better chance of finding the global optimum: successive generations of the population are bred in such a way that the average fitness of succeeding generations improves.
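As a concrete picture of the bit-string encoding described above, the following minimal Python sketch shows a chromosome as a list of bits together with the two standard operators that breed one generation from the next: single-point crossover and bit-flip mutation. It illustrates the general technique only; the function names are hypothetical, and nothing here is claimed to reproduce the software or operator variants used in the study.

```python
import random


def crossover(parent_a, parent_b, point):
    """Single-point crossover: the two children swap tails after `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [1 - bit if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(1)          # fixed seed so a run can be repeated
a = [1, 1, 1, 1, 1, 1, 1, 1]    # two 8-bit parent chromosomes
b = [0, 0, 0, 0, 0, 0, 0, 0]

point = rng.randrange(1, len(a))             # random cut, never at the ends
child_a, child_b = crossover(a, b, point)    # each child mixes both parents
mutant = mutate(a, rate=0.01, rng=rng)       # rarely differs from `a`
```

Crossover recombines material that already exists in the population, whereas mutation is the only operator that can introduce a bit value absent from every current individual, which is why even a small mutation rate matters.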
To breed successive generations, genetic operators are applied to the individuals in the breeding population, generating new offspring by recombining elements from the population. Though the recombinations are random, average fitness tends to increase because of the biased selection process. Typically, the genetic operators act at a randomly chosen point in the string. Figure 5 illustrates the effects of the crossover and mutation operations on binary numbers. In Figure 5(a), two candidate chromosomes A and B, each represented by its dyadic expansion of n bits, undergo a crossover operation at a randomly chosen bit location k to produce two offspring chromosomes A' and B'. The first k bits of A' are the same as those of A, while the last (n - k) bits of A' are those of B. The digits of B' are formed similarly. Crossover is equivalent to breeding because it involves trading portions of two parents to create two offspring that contain random portions of the parental gene structures. Mutation, on the other hand, randomly changes constituent nodes of a parent to create a new offspring. In Figure 5(b), a candidate chromosome undergoes mutation at two of its bits, each of which is flipped to its complement (0 to 1, or 1 to 0). Mutation is equivalent to a random search heuristic and helps ensure that no region of the search space remains permanently unexplored.

Figure 5. Illustration of the mechanisms of the crossover and mutation operation. a) Crossover Operation. b) Mutation Operation.

4.2 Input variables chosen from the planning perspective

In genetic algorithms, as in any data-driven prediction model, the selection of appropriate model input variables is very important, and the choice is generally based on a priori knowledge of causal relationships and/or physical and ecological insight into the problem. Numerous studies consider topographic features as the basic factors that influence the population distribution, such as: 1) altitude (Mennis 2003, Yue et al. 2003, Flowerdew et al. 2007, Langford et al. 2008); 2) slope (Cai et al. 2006, Liao et al. 2010); and 3) hydrology (Yuan et al. 1997, Flowerdew et al. 2007). Transport network, school catchment, markets, service centers, and land use classification are all important factors that are also frequently taken as major input variables (for instance, Mennis 2003, Balk et al. 2006, Langford et al. 2008, and Liu et al. 2008).

Table 1. The list of the twenty-one input variables ($v_{i,j}$) of the study model, classified by five major categories ($C_i$).

Category                   Code  Variable                      Unit                      Expected effect
C1 Topographic Feature     v1,1  Elevation                     Altitude (m)              Negative as magnitude increases
                           v1,2  Slope                         %
C2 Infrastructures         v2,1  Stream                        Real-valued distance (m)  Positive as accessibility increases
                           v2,2  Road
                           v2,3  School
                           v2,4  Park
                           v2,5  Bridge
C3 Public Services         v3,1  Library
                           v3,2  Ward Office
                           v3,3  Health Center
                           v3,4  Fire Station
                           v3,5  University
                           v3,6  Large Discount Mart
C4 Regional Access         v4,1  Highway IC & JC Points
                           v4,2  Industrial Complex
                           v4,3  Central Park in Bun-Dang
                           v4,4  Main Commercial in Bun-Dang
                           v4,5  Entrance to Bun-Dang
C5 Land Use Regulation     v5,1  Green Belt                                              Negative as magnitude increases
                           v5,2  Water Protect
                           v5,3  Agrarian Improvement

Table 1 summarizes the 21 input variables of the study model chosen from the planning perspective. They are classified into categories $C_i$ for $i = 1, \dots, 5$, and the variables are denoted by category as $v_{i,j}$ for $j = 1, \dots, m_i$, where $m_i$ is the number of variables in $C_i$. For the topographic feature variables of C1, the scores are predetermined as in Table 2.

Table 2. Scores of the elevation (v1,1) and slope (v1,2) variables in the topographic feature category C1.

Score   v1,1 Elevation h (m)   v1,2 Slope d (%)
5       h ≤ 100                d ≤ 5
4       100 < h ≤ 200          5 < d ≤ 8
3       200 < h ≤ 300          8 < d ≤ 10
2       300 < h ≤ 400          10 < d ≤ 12
1       400 < h ≤ 500          12 < d ≤ 15
0       500 < h                15 < d

For the rest of the variables, values are measured as straight-line distances from each cell centroid to the target points of facility locations (C2 to C4) or to the target polygons of areal zones (C5). For the latter, the distance is measured from the cell centroid to the point on the target zone boundary that lies on the line drawn from the cell perpendicular to the boundary. Category C2 comprises the infrastructures and is included, in particular, to see how decisive the impact of these infrastructures is on the residents' locational choice. The stream (v2,1) is included in C2 instead of C1 since it is seen to function more as an amenity factor, providing waterfront greenery and open spaces just like parks (v2,4) and green areas. The variables of C3 are local public services for the neighborhoods, and those of C4 represent the major regional access points that are thought to affect the location of population in Kwang-Joo City: highway interchanges/junctions, industrial complexes, and the central park and commercial area of Bun-Dang Newtown. Specifically, the main entrance point from Kwang-Joo City to Bun-Dang Newtown (v4,5) is taken as the single most important variable, since it provides the major access corridor from the City to the Newtown and thus could indicate the level of dependency of Kwang-Joo citizens on the Newtown.

Figure 6 and Figure 7 show the topographic features of Kwang-Joo City, elevation and slope, respectively. The elevation map is composed from the Digital Elevation Model (DEM) previously constructed by the National Geographic Information Institute (NGII), whereas the slope analysis map was drawn from the digital contour information in the spatial database of KLIS.
Figure 8 and Figure 9 depict the locations of all the variables in the categories of Infrastructures, Public Services, Regional Access Points, and Regulatory Land Uses. It is notable that most of the facilities are stationed in or near the populated sector, so that their distribution largely coincides with that of the training data cells shown in Figure 4.

Figure 6. Elevation map of Kwang-Joo City drawn from the DEM data.

Figure 7. Slope analysis map of Kwang-Joo City drawn from the KLIS digital data.

Figure 8. Point locations of the Public Service and Regional Access Point variables.

Figure 9. Infrastructure and Regulatory Land Use category variable points and areas.

4.3 Models, fitness functions, and measurement

4.3.1 Formulation of population estimation models

Consider that $W_i$ and $w_{i,j}$ are the hierarchical, two-level parametric weighting factors for the categories $C_i$ and the variables $v_{i,j}$, respectively, for $i = 1, \dots, 5$ and $j = 1, \dots, m_i$, where $m_i$ is the number of variables in $C_i$ as denoted earlier, and $k$ is the rescaling factor that preserves the actual populations (that is, makes the estimate mass-preserving). Then the relative contribution of each $C_i$, together with its subordinate $v_{i,j}$'s, can be determined by their respective weights $W_i$ and $w_{i,j}$ using the genetic algorithm. If $\hat{p}_c$ denotes the estimated cell population in each training cell $c$, it can then be formulated as:

$$\hat{p}_c = k \sum_{i=1}^{5} W_i \sum_{j=1}^{m_i} w_{i,j} \, v_{i,j}(c) \qquad (3)$$

where $v_{i,j}(c)$ is the value of variable $v_{i,j}$ at cell $c$.

4.3.2 Fitness function for model evaluation

The fitness of the genetic algorithm model, as well as that of its corresponding regression model, can be evaluated with a fitness function that benchmarks the performance of the model. Here, a typical goodness-of-fit measure, the correlation coefficient (as used by Muttil and Lee (2005) and Parasuraman et al. (2007)), has been adopted as the fitness function.
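The hierarchical weighting of Section 4.3.1 and a correlation-based fitness can be sketched together in Python. Several points below are assumptions made for illustration rather than details confirmed by the paper: the estimate is taken to be a rescaled, category-weighted sum of variable-weighted inputs; each weight is assumed to be decoded from an 8-bit gene by dividing its integer value by 255; and the fitness is the ordinary Pearson correlation coefficient. All function names and numbers are hypothetical.

```python
import math


def decode_gene(bits):
    """Map a list of bits to a weight in [0, 1] (assumed encoding)."""
    value = int("".join(str(b) for b in bits), 2)
    return value / (2 ** len(bits) - 1)


def estimate(category_weights, variable_weights, cell_values, k=1.0):
    """Rescaled, category-weighted sum of variable-weighted inputs for a cell.

    category_weights: one weight per category.
    variable_weights: one list of weights per category.
    cell_values:      one list of variable values per category, same shape.
    """
    total = 0.0
    for W, w_row, v_row in zip(category_weights, variable_weights, cell_values):
        total += W * sum(w * v for w, v in zip(w_row, v_row))
    return k * total


def pearson_r(observed, estimated):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(var_o * var_e)


# Toy case: two categories, with one and two variables respectively.
weight = decode_gene([0, 0, 1, 1, 0, 1, 1, 1])
p_hat = estimate([0.5, 0.5], [[1.0], [0.25, 0.75]], [[4.0], [2.0, 6.0]])
r = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In a genetic algorithm run, each candidate chromosome would be decoded into one full set of category and variable weights, the estimate computed for every training cell, and the resulting correlation with the observed cell populations used as that chromosome's fitness. Note that the correlation is unaffected by the rescaling factor `k`, so a fitness of this kind fixes only the relative weights.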
The correlation coefficient ($R$) evaluates the linear correlation between the observed and the computed values, and, as a fitness function, $R$ should be maximized. Let $n$ be the number of training cells presented to the models; $p_c$ and $\hat{p}_c$ be the observed and estimated cell populations for $c = 1, \dots, n$; and $\bar{p}$ and $\bar{\hat{p}}$ be the means of $p_c$ and $\hat{p}_c$. Then $R$ can be calculated as:

$$R = \frac{\sum_{c=1}^{n} (p_c - \bar{p})(\hat{p}_c - \bar{\hat{p}})}{\sqrt{\sum_{c=1}^{n} (p_c - \bar{p})^2 \, \sum_{c=1}^{n} (\hat{p}_c - \bar{\hat{p}})^2}} \qquad (4)$$

4.3.3 Parameter settings and hardware configuration

Table 3 summarizes the major genetic algorithm parameter settings adopted in this study; they provided the best model performance in extensive trial runs. The most important settings include a population size of 200, a crossover rate of 0.9, a mutation rate of 0.01, and a chromosome length of 8 bits. The genetic algorithm model of this study was run on a quad-core 2.66 GHz CPU with 16 GB RAM under the 64-bit Windows 7 operating system.

Table 3. Settings of the reproductive plan parameters for the genetic algorithm model.

Parameter                                                                                        Value
Population size: number of individuals in the breeding pool                                      200
Generation: generations over which individuals evolve                                            200
Crossover rate: probability that an individual will be the offspring of two parents              0.9
Mutation rate: probability that an individual mutates                                            0.01
Reproduction rate: probability that an individual will be the offspring of one parent            1 - (crossover rate + mutation rate)
Generation gap: fraction of the population to be replaced after each iteration                   0.98
Elitist strategy: reentry of the best individuals without genetic operation                      on
Diversity operator: allowance of non-uniform mutation                                            on
Chromosome length: number of genes in a chromosome                                               8 bits
Random seed: reproduction of the results of previously run evolutionary cycles                   1
Termination of evolution: number of generations over which the best fitness remains unchanged    100

5. Results

5.1 Genetic algorithm model outputs

5.1.1 Performance of the 12-variable model

Although all 21 variables are intuitive candidates for explaining the population distribution in general urban settings, some of them could be redundant under the specific locational peculiarities of Kwang-Joo City. The first step, following this line of reasoning, was to screen out the much less relevant variables so as to curtail unnecessarily long runtimes of the genetic algorithm model. This was done using the stepwise regression technique. The reduced set of 12 more relevant variables was taken from one of the intermediate steps of the stepwise regression runs, so as to leave enough room for the genetic algorithm to complete the remaining calibration of the model specification.

The 12 screened variables, which belong to 5 categories, are listed in Table 4 together with their estimated weights obtained from the three best genetic algorithm runs among a total of 20 repeated runs. Category weights ($W_i$) and variable weights ($w_{i,j}$) are reported in the left and right halves of the table, respectively, while their sums are shown at the bottom. The fitness values $R$ of the three best genetic algorithm runs among the 20 repeated runs are reported in Table 5. As can be seen in that table, the values lie between 0.428 and 0.429.

To ensure that the model has not converged to a local maximum, the traces of the evolutionary process should be checked graphically. When the model is trapped, the plots usually show that the coordinates of the fittest individual cease to change over a range of generations and stay on a flat spot. This type of lack of convergence can be assessed by restarting the algorithm with different starting values and random number seeds and comparing the resulting solutions (Meyer 2003).
If the solutions are farther apart than a given tolerance, it indicates a problem with convergence. In this case, a larger population size and/or more mutation should be tried. Figure 10 illustrates the tracks of the correlation coefficient $R$, the fitness criterion, for the three best runs of the model. They follow different paths along the way but eventually converge to a value near 0.429 after 1,300 generations, which supports the view that the global optimum has been reached. It is notable that the tracks do not increase consistently but sometimes plummet and recover. Such abrupt changes are also signs of the intermittent occurrence of beneficial mutations.

Table 4. Summary of category- and variable weighting factors for population estimation obtained by the three best runs of full-variable genetic algorithm model.

Category weights (Best GA Solutions)
Category                      1st     2nd     3rd
C1 Topographic feature        0.259   0.243   0.239
C2 Infrastructures            0.247   0.247   0.294
C3 Public Services            0.039   0.055   0.031
C4 Regional access points     0.455   0.506   0.498
C5 Land Use Regulation        0.000   0.000   0.000

Variable weights (Best GA Solutions)
Variable                      1st     2nd     3rd
v1,1 Elevation                0.216   0.067   0.075
v1,2 Slope                    0.784   0.925   0.937
v2,1 Stream                   0.004   0.000   0.004
v2,4 Park                     0.000   0.000   0.047
v2,5 Bridge                   0.976   1.000   0.992
v3,1 Library                  0.118   0.000   0.431
v3,2 Public Office            0.525   0.969   0.855
v3,5 University               0.400   0.831   0.671
v4,1 IC&JC                    0.000   0.000   0.004
v4,5 Entrance to Bun-Dang     0.996   0.996   0.999
v5,1 Green Belt               0.671   0.020   0.459
v5,3 Agrarian Improvement     0.333   0.973   0.553

5.1.2 Interpretation of the 12-variable model output

In Table 4, the category weights of C3 (public services) and C5 (land use regulation) are near zero in all three runs, which indicates that the locations of these facilities and zones have minimal influence on population distribution in Kwang-Joo City.
This is understood to be because the public services are already evenly installed, so that their proximity stays within a fairly uniform range from every training data point. Regulatory land use zones such as greenbelts and agrarian improvement zones are by definition uninhabitable, so their result is readily understandable.

The weights of C1 (topographic features) and C2 (infrastructures), however, are surprising, since the impact of each of these categories is about half that of C4 (regional access points). Unlike the public services, the infrastructures have a reasonable impact, comparable to that of major topographic features such as elevation and slope. As expected, C4 (regional access points) is the category with the weight of greatest relative importance, accounting for about half (0.455-0.506) of the entire explanatory power, contributed almost solely by the "entrance to Bun-Dang Newtown" (v4,5), which shows absolute predominance (0.996-0.999) within it.

In all, the three categories C1, C2, and C4 seem to account substantially for the population in Kwang-Joo City. Among their variables, however, the park variable (v2,4; 0.000-0.047) makes a null or meager contribution to C2. Only the six remaining variables, namely v1,1 and v1,2 in C1, v2,1 and v2,5 in C2, and v4,1 and v4,5 in C4, therefore seem to persist as candidate variables to be used in the best subset model runs.

Table 5. Summary of the fitness values achieved by the fitness criteria of R for the genetic algorithm model run with the 12 screened variables.

Fitness Value   1st     2nd     3rd
R               0.429   0.428   0.428

Figure 10. Tracks of the fitness value of correlation coefficient (R) obtained by the three best runs of the genetic algorithm model using the 12 screened variables.

5.1.3 Final model of six-variable genetic algorithms

Using the six best variables selected above, the genetic algorithm subset model was run, and the tracks of $R$ obtained from the three best runs are illustrated in Figure 11.
The values converge to the global maximum of 0.424 after only 700 generations. Faster convergence to the optimum, with minor fluctuation and in far fewer iterations, implies that the composition of the subset variables significantly enhanced the explanatory capability of the model. The model runtime statistics are reported in Table 6. Based on the 20 repeated runs, the average runtime is about 9 minutes and 18 seconds, with a standard deviation of 2 minutes and 49 seconds. While the best runs evolved for nearly 800 generations, the average generation at which the optimum value was reached is 548, with a standard deviation of 143 generations.

Table 6. Runtime statistics of the 200 population size genetic algorithm model based on 20 runs.

Genetic algorithms subset model (POP SIZE: 200)
        Generations   Runtimes
Mean    548           9:18
SD      143           2:49

Figure 11. Tracks of the fitness value of correlation coefficient (R) obtained by the three best runs of the final genetic algorithm model using the six variables.

Table 7 summarizes the model outputs. In the table, category weights ($W_i$) and variable weights ($w_{i,j}$) are reported in the upper and lower parts, respectively, together with their sums. It can be seen that the weight totals are near to, but not exactly, 1.0 (or 3.0 in the case of the variable weights, which are summed over three categories). This is because the tolerance is set to be internally modified to the smallest possible number instead of zero: the chances for the fitness values to arrive at feasible optima are usually enhanced significantly this way. While the weight of C1 has slightly increased, the dominance of the slope (v1,2) within it has lowered somewhat to 0.620-0.749, yet it still far outweighs the impact of elevation (v1,1).
For category C2 (infrastructures), the contribution of the bridges (v2,5) is still absolute, ranging from 0.992 to 1.000. This indicates that the rather complicated watershed system spread across the lowlands of the City is a considerable obstacle to internal circulation within the jurisdiction, so that proximity to the bridges that cross the streams is an important attraction factor when residents choose where to live.

Table 7. Summary of category- and variable weighting factors for population estimation obtained by the three best runs of six-variable genetic algorithm final model.

Category weights (Best GA Solutions)
Category                      1st     2nd     3rd
C1 Topographic feature        0.263   0.263   0.247
C2 Infrastructures            0.299   0.298   0.247
C4 Regional access points     0.500   0.502   0.502
Sum of Category Weights       1.062   1.063   0.996

Variable weights (Best GA Solutions)
Factor (Variable)             1st     2nd     3rd
v1,1 Elevation                0.372   0.376   0.251
v1,2 Slope                    0.620   0.624   0.749
v2,1 Stream                   0.002   0.001   0.001
v2,5 Bridge                   1.000   0.992   0.996
v4,1 IC&JC                    0.125   0.110   0.063
v4,5 Entrance to Bun-Dang     0.875   0.875   0.929
Sum of Variable Weights       2.994   2.978   2.989

Finally, and most important, the weights of category C4 (regional access points) and its dominant variable v4,5 (entrance to Bun-Dang Newtown) are 0.500-0.502 and 0.875-0.929, respectively, in the final model, and it is notable that these ranges did not deviate much from those of the initial 12-variable model. While highway interchanges are important access points for inter-regional transportation, their contribution of 0.063-0.125 is far below that of the entrance to Bun-Dang Newtown. In sum, the connection to the Newtown explains almost 45% of the entire population distribution in Kwang-Joo City (a category weight of about 0.50 multiplied by a variable weight of about 0.9).

6. Conclusion

The attractions of new modeling techniques depend on awareness of what they can offer and on empirical illustrations of what can be gained by comparison with current best practice.
Should classes of models with superior performance be found, there would be a basis for the development of a new inductive approach (Openshaw 1988). This study has, in the same context, examined the genetic algorithm technique as a geodemographic parameter estimation method. The genetic algorithm solutions appear to be quite a reasonable explanatory tool for understanding the residential distribution of the population and, further, for testing the presumed urban phenomena in this case of Kwang-Joo City. To investigate the generality of the findings, however, further research is needed: to explore relative performance on multiple training datasets, and to identify the extent to which the results depend on the choice of goodness-of-fit statistic.

In connection with the specific topic of the free-rider problem of fast-growing towns and localities, as in this case of Kwang-Joo City, it has been seen that the dominant explanatory cause of the population distribution is the connecting gateway through which residents of the nearby city access the fully equipped and well-planned infrastructure available in the Newtown. The capacities of the infrastructure in the Newtown are set at the optimal level to adequately serve the planned population in accordance with the initial Newtown master plan. Non-paying additional users from the adjacent cities not only cause congestion and inconvenience to the Newtown dwellers, thereby raising an equity problem, but also bring about uncontrolled growth in the neighboring localities before their authorities can secure enough time and fiscal resources to prepare an adequate growth plan to accommodate the inflowing population. A regulatory provision to counteract such a phenomenon is deemed urgent in the Capital Region, since the sporadically planned newtowns there are expected to be completed in the coming years.

References

Balk, D.L., Deichmann, U., Yetman, G., Pozzi, F., Hay, S.I. and Nelson, A., 2006, Determining global population distribution: methods, applications and data. Advances in Parasitology, 62, 120-156.
Brandl, B., 2007, Automated modelling in empirical social sciences using a genetic algorithm. In: Moreno Diaz et al. (Eds.), Computer Aided Systems Theory-Eurocast 2007, Lecture Notes in Computer Science, Vol. 4739, 912-919.
Cai, Q., Rushton, G., Bhaduri, B., Bright, E. and Coleman, P., 2006, Estimating small-area populations by age and sex using spatial interpolation and statistical inference methods. Transactions in GIS, 10(4), 577-598.
Fisher, M. and Leung, Y., 1998, A genetic algorithms based evolutionary computational neural network for modelling spatial interaction data. The Annals of Regional Science, 32, 437-458.
Flowerdew, R., Feng, Z. and Manley, D., 2007, Constructing data zones for Scottish Neighbourhood Statistics. Computers, Environment and Urban Systems, 31, 76-90.
Gen, M. and Cheng, R., 2000, Genetic algorithms and engineering optimization. John Wiley and Sons, Inc., New York.
Holland, J.H., 1975, Adaptation in Natural and Artificial Systems. The University of Michigan Press, Ann Arbor.
Langford, M., Higgs, G., Radcliffe, J. and White, S., 2008, Urban population distribution models and service accessibility estimation. Computers, Environment and Urban Systems, 32, 66-80.
Liao, Y., Wang, J., Meng, B. and Li, X., 2010, Integration of GP and GA for mapping population distribution. International Journal of Geographical Information Science, 24(1), 47-67.
Liu, X.H., Kyriakidis, P. and Goodchild, M., 2008, Population-density estimation using regression and area-to-point residual kriging. International Journal of Geographical Information Science, 22(4), 431-447.
Martin, D. and Bracken, I., 1991, Techniques for modelling population-related raster databases. Environment and Planning A, 23, 1069-1075.
Mennis, J., 2003, Generating surface models of population using dasymetric mapping. The Professional Geographer, 55(1), 31-42.
Meyer, M.C., 2003, An evolutionary algorithm with applications to statistics. Journal of Computational and Graphical Statistics, 12(2), 265-281.
Muttil, N. and Lee, J., 2005, Genetic programming for analysis and real-time prediction of coastal algal blooms. Ecological Modelling, 189, 363-376.
Openshaw, S., 1984, Ecological fallacies and the analysis of areal census data. Environment and Planning A, 16, 17-31.
Openshaw, S., 1988, Building an automated modelling system to explore a universe of spatial interaction models. Geographical Analysis, 20(1), 31-46.
Parasuraman, K., Elshorbagy, A. and Carey, S.K., 2007, Modelling the dynamics of the evapotranspiration process using genetic programming. Hydrological Sciences Journal, 52(3), 563-578.
Venkatesan, R., Krishnan, T. and Kumar, V., 2004, Evolutionary estimation of macro-level diffusion models using genetic algorithms: an alternative to nonlinear least squares. Marketing Science, 23(3), 451-464.
Yim, K.W., Wong, S.C., Chen, A., Wong, C.K. and Lam, W.H.K., 2010, A reliability-based land use and transportation optimization model. Transportation Research Part C (in press), doi:10.1016/j.trc.2010.05.019
Yuan, Y., Smith, R.M. and Limp, W.F., 1997, Remodeling census population with spatial information from LandSat TM imagery. Computers, Environment and Urban Systems, 21(3), 245-258.
Yue, T.X., Wang, Y.A., Chen, S.P., Liu, J.Y., Qiu, D.S., Deng, X.Z., Liu, M.L. and Tian, Y.Z., 2003, Numerical simulation of population distribution in China. Population and Environment, 25(2), 141-163.