Proceedings of International Symposium on City Planning 2013
The genetic algorithm geospatial modeling of the residential
distribution: The case of Kwang-Joo City of Korea
Nae-Young Choei1, Seong-Hun Kim2, Inhan Kang3
Abstract
This study tries to delve into the causes of the population concentration in Kwang-Joo City near Seoul, Korea. The City is taken as the case locality since it has shown the
second most rapid growth in both residential population and total building floor area
among more than 140 localities in the Capital Region during the three recent
consecutive years. Simultaneously, the study tries to explore the genetic algorithms as a
method for geospatial modeling of the residential densities using the Korean National
Geographic Information Systems (NGIS) as property-level ancillary data. The gridded
map of population distribution is built from Korea Land Information System (KLIS)
and electronic Architectural Information System (eAIS). Then the functional form of the
genetic algorithm model is formulated to have a hierarchical weighting system in which
categorical weights of variable groups and individual weights of subordinate variables
are sought bilaterally to explain the residential densities. The model is run with a carefully
pretested set of reproductive plan parameters. The genetic algorithm solutions
are then discussed in an effort to probe their unique advantages in facilitating the analysis
processes for numerous potential geospatial applications.
Keywords: genetic algorithms; ancillary data; residential density; geospatial modeling
1 Professor of Urban Planning at Hongik University, Seoul, Korea, corresponding author.
2 Graduate student of Urban Planning at Hongik University, Seoul, Korea.
3 Undergraduate student of Urban Planning at Hongik University, Seoul, Korea.
1. Introduction
The Korean Capital Region is well known for its uncontrolled development during the
nation’s rapid economic growth, and it now contains about 48% of the national
population. In particular, recent evidence indicates that, among more than 140
localities in the Region, the second fastest growing one is Kwang-Joo
City, which lies just southeast of Seoul. One of the reasons for its fast
growth is thought to be its adjacency to Bun-Dang Newtown, built about two decades
ago. Bun-Dang Newtown is still a more preferred bedroom town than other competing
localities, as it is filled with high-priced apartment complexes that accommodate rather
affluent former Seoulites who mostly commute to the Kang-Nam (River South) area in
Seoul. Such a phenomenal population concentration in Kwang-Joo City, however, is
suspected to reflect the so-called free-rider problem in that, without sufficient urban
infrastructure such as well-planned roads, parks, green areas, and renowned schools in
the existing City, the incoming residents are exploiting the advantages of adjacent
public facilities in the Newtown that they did not pay to provide.
This paper, in this context, has two major objectives: one is to determine whether
the population distribution in Kwang-Joo is indeed related to the easy access
to Bun-Dang Newtown; the other is to explore the genetic algorithm technique for
identifying an explanatory model for that purpose. For modeling, population data from
‘source’ enumeration areas (EAs) in Kwang-Joo City are disaggregated into ‘target’
grid populations with the aid of the property-level ancillary data available from the
Korean National Geographic Information System (NGIS). To benchmark the
performance of the genetic algorithm model, a regression model is run in
parallel.
The paper is organized as follows. Section 2 describes the study area and the NGIS
data. The gridded population surface is constructed and the training dataset is selected in
Section 3. Section 4 deals with genetic algorithm model building, Section 5
interprets the model outcomes, and the conclusion appears in Section 6.
2. Study area and data
2.1 Study area
Kwang-Joo City is located to the southeast of Seoul City in Korea, as shown in Figure
1. Lying at 37°22′N and 127°06′E, and with a population of 259,387 (602
persons/km²) forming 96,584 households as of 2012, Kwang-Joo City currently has ten
wards within its total area of 430.96 km². As can be seen in Figure 2, the four
western wards form a much more urbanized sector than the rest of the City, with
population densities higher than 10 persons/ha. The three northern wards, on the other
hand, are mostly rugged mountain terrain with extremely low residential densities of
less than 2 persons/ha. The remaining three southeastern wards are a mix of
highlands and fluvial lowlands, the latter filled mostly with irrigated rice fields and
having medium population densities. Bun-Dang Newtown is located to the west of the City as
indicated in the figure, sharing one contact point across the ridge that divides the two
jurisdictions.
2.2 Data description
The population data for the study are from the national databases built annually
from the resident registration maintained by the Ministry of Security and Public
Administration (MOSPA). The data are collected and reported each year at the ward level, the
minimum statistical area. Except for a few undetected cases of false
resident registration, the integrity and the precision of the data are guaranteed by the
MOSPA. Ward-based population data for the three consecutive years from 2010 to
2012 for Kwang-Joo City are extracted and used in the study.
With the permission of the MOLIT (Ministry of Land, Infrastructure and Transport),
two official NGIS databases have also been made available for the study as ancillary
data: the Korea Land Information System (KLIS) and the electronic Architectural
Information System (eAIS). The KLIS was completed in 2007 by incorporating
two previous national databases: the Parcel Based Land Information System (PBLIS) of
the MOSPA and the Land Management Information System (LMIS) of the MOLIT.
The cadastral information from the PBLIS and the land-use/planning information from
the LMIS have been combined into one georeferenced database, the KLIS. The eAIS, on
the other hand, has been constructed by combining the Building Management System
(BMS) of the MOSPA and the Construction Administration Management System
(CAMS) of the MOLIT so that the building permit information from the BMS and the
building register information from the CAMS have been consolidated into one database.
By joining the two databases on the corresponding Parcel Numbering Unit (PNU) codes,
the study data within the boundary of Kwang-Joo City have been constructed.
a) The location and size of the Gyeonggi Province in Korea.
b) The location and shape of Kwang-Joo City in the Korean Capital Region.
Figure 1. The location and administrative boundaries of Kwang-Joo City, the case area.
Figure 2. Choropleth population density maps of ten wards in Kwang-Joo City.
3. Building gridded population surface
3.1 Gridded population and choice of the cell size
Since the areal units used to report population data (enumeration districts, census
tracts, wards, local government units) do not have any natural or meaningful
geographical identity, analyses based on zonal data may be subject to the modifiable
areal unit problem (MAUP), where observed patterns and relationships are sensitive to
the spatial arrangement of the arbitrarily defined zones or to combinations of them
(Openshaw 1984). One promising alternative to avoid this well documented problem is
to generate a gridded population map (Martin and Bracken 1991). The advantages of the
gridded population surface are frequently summarized as follows: 1) the regular grid can
be easily re-aggregated to any areal arrangement required; 2) producing ecological data
in grid form is one way of ensuring compatibility between heterogeneous data sets; 3)
data in grid form make multi-resolution and multi-source information fusion easier; and
4) converted grid form can provide a way of avoiding some of the problems imposed by
artificial boundaries (Martin and Bracken 1991, Yue et al. 2003, Liao et al. 2010). In
the context just described, this study constructs a 200 by 200 meter gridded
population surface over the entire jurisdiction of Kwang-Joo City for model building.
3.2 Generation of the gridded population map
3.2.1 Disaggregation of EA population into the gridded map
Suppose there are $m$ wards ($W_i$, $i = 1, \dots, m$) with total population $P$ in the study area, which has $N$
housing units on land parcels $l_j$ for $j = 1, \dots, N$. Consider also that each ward $W_i$
has $n_i$ housing units accommodating population $P_i$, so that $\sum_i n_i = N$ and $\sum_i P_i = P$. If
$A_i$ is the aggregate floor area and $a_{ij}$ is the individual floor area of the $j$-th
housing unit in ward $W_i$, then the average per capita floor area $\bar{a}_i$ for each resident in
ward $W_i$ follows as:

$$\bar{a}_i = \frac{A_i}{P_i} = \frac{1}{P_i} \sum_{j=1}^{n_i} a_{ij} \qquad (1)$$

Here, the floor area is not just the building coverage on the ground but the total floor
area of the building (in the case of a multi-storied structure). The exact floor area data of
each building are thus an important factor in obtaining a correct measure of $\bar{a}_i$ in Equation
(1) above. In this study, the accurate floor area data have been extracted from the
eAIS database mentioned above.
Now imagine that the study area is disaggregated into grid cells ($g_k$) of cell size $s$
(= 200 m). Then, if the centroid of a parcel $l_j$ is located within the boundary of a cell
$g_k$, the housing floor area on $l_j$ could be allocated to $g_k$. If we let $F_k$ be the total floor
area of housing in the cell $g_k$, then $p_k$, the cell population of $g_k$ in ward $W_i$, could be
calculated as:

$$p_k = \frac{F_k}{\bar{a}_i} \qquad (2)$$
The rationale for using the ward-based average floor areas in the denominator of
Equation (2) in calculating the cell populations lies not only in the fact that the ward (EA) populations are the
smallest population data available but, more importantly, in the extreme heterogeneity in
residential densities, settlement patterns, and dwelling types between the more urbanized
wards and the less densely populated agrarian or forested wards, which should be taken into
account on an individual ward basis, while the homogeneity of residential
circumstances within a ward is consistently maintained.
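The two-step disaggregation can be illustrated with a minimal sketch. It assumes a hypothetical record layout (ward identifier, parcel centroid coordinates in meters, housing floor area) and made-up numbers; it is not the procedure actually run on the KLIS/eAIS data. Cells are keyed by ward so that a cell straddling a ward boundary keeps each ward's own per capita floor area.

```python
from collections import defaultdict

def grid_populations(ward_pop, units, cell_size=200.0):
    """Disaggregate ward populations to grid cells by housing floor area.

    ward_pop: {ward_id: registered population}
    units:    iterable of (ward_id, x, y, floor_area), one per housing unit,
              where x, y is the parcel centroid in meters (hypothetical layout).
    Returns {(ward_id, col, row): estimated cell population}.
    """
    ward_floor = defaultdict(float)   # aggregate floor area per ward
    cell_floor = defaultdict(float)   # housing floor area per (ward, cell)
    for ward, x, y, area in units:
        ward_floor[ward] += area
        cell = (ward, int(x // cell_size), int(y // cell_size))
        cell_floor[cell] += area

    result = {}
    for (ward, col, row), area in cell_floor.items():
        per_capita = ward_floor[ward] / ward_pop[ward]   # as in Equation (1)
        result[(ward, col, row)] = area / per_capita     # as in Equation (2)
    return result

# Illustrative data only: two wards, three housing units.
units = [("A", 50, 50, 300.0), ("A", 250, 50, 100.0), ("B", 450, 450, 80.0)]
cells = grid_populations({"A": 40, "B": 4}, units)
```

Because every cell in a ward is divided by the same per capita floor area, the cell populations of a ward add up to the ward's registered population, which is the mass-preserving property the method relies on.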
3.2.2 Outcomes of the gridded population map
The resulting matrix of cells, each containing a population estimate, is shown in
Figure 3. The source populations of EAs (wards) are disaggregated into target 200m
grids so that the settlement geography is drastically enhanced compared with the simple grey-scaled choropleth population density map that would otherwise be drawn by the
source areal unit of wards as in Figure 2. The distinction is particularly vivid in the
agrarian and forested wards. In Figure 2, the populations of the six northern and
southeastern wards are marked only by their averages (as low as 0.36 to 6.65
persons/ha), whereas the gridded map in Figure 3 pinpoints the populated
areas of the same wards by removing non-resident cells in areas of widespread rurality and sparsity.
In like manner, the highly populated cells in the four western wards clearly
indicate exactly where the actual population is confined and concentrated in those wards.
Such extreme cells in the populated western wards, however, could undermine the
robustness of either heuristic or stochastic model fitting and weaken the predictive
potential of the fitted models unless they are properly eliminated. The next procedure, in
this context, should be to sort out the training dataset on which the models would be trained
or constrained so as to enhance the likelihood of the estimated population.
Figure 3. Gridded population map of 200m cell-size estimated with NGIS ancillary data.
3.3 Selection of training dataset
To select the potentially explanatory set of training data cells, the following steps
were applied: first, the two northernmost wards were excluded as they are not only too steep
and mountainous for habitation but, more importantly, are doubly designated as
greenbelt zone and water supply protection zone, in which any development activities
are legally prohibited; second, those cells that accommodate uninhabitable facilities
such as golf courses, university campuses, watersheds, and agriculture promotion
zones were also ruled out; third, the cells on steep slopes over 30% were left out
too, as they are unsuitable for physical development for any purpose; fourth, all the
uninhabited cells were set aside since they could not contribute any meaningful
interpretation of the residential distribution; and, finally, only those cells that lie
between the 10th and 90th percentiles and within plus/minus 1.0 standard deviation of the
population mean of the remaining cells were chosen in an effort to eliminate
the extreme outlying population counts that could distort the central tendency of the
majority of the training data cells.
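The final trimming step can be sketched as follows. This is an illustration under stated assumptions: the percentile interpolation rule and the use of the population (rather than sample) standard deviation are choices made here, since the text does not specify them, and the input list is made up.

```python
import statistics

def select_training_cells(pops, lo_pct=10, hi_pct=90, n_sd=1.0):
    """Keep cell populations inside both the percentile band and mean +/- n_sd * SD."""
    # 99 cut points; cuts[i - 1] is the i-th percentile (inclusive method assumed).
    cuts = statistics.quantiles(pops, n=100, method="inclusive")
    lo, hi = cuts[lo_pct - 1], cuts[hi_pct - 1]
    mean = statistics.fmean(pops)
    sd = statistics.pstdev(pops)          # population SD, an assumption
    return [p for p in pops
            if lo <= p <= hi and abs(p - mean) <= n_sd * sd]

# Illustrative cell populations: twenty ordinary cells and one extreme cell.
pops = list(range(1, 21)) + [500]
selected = select_training_cells(pops)
```

In this toy input the extreme cell of 500 and the two lowest and one highest ordinary counts fall outside the percentile band, so only the central counts survive.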
The selected cells are seen to include populated clusters in rural wards while excluding
most of the under-populated hinterlands. Figure 4 shows the finally chosen 280
data points. In extensive pretests, the dataset obtained this way outperformed any of
the myriad alternative datasets obtainable by varying the cut-off criteria set out
above. The later discussion of the study models is hence confined to the outcomes run with
this dataset alone.
Figure 4. The selected 280 training data cells dispersed across the jurisdiction of
Kwang-Joo City.
4. Methodological approach
4.1 Conceptual basis of genetic algorithm
The mechanics of genetic algorithms seek to mimic the approach of nature in the
evolution of species as well described elsewhere (Holland 1975, Fisher and Leung 1998,
Gen and Cheng 2000, Venkatesan et al. 2004, Brandl 2007, Liao et al. 2010, Yim et al.
2010). They are based on the population genetics in natural selection that are connected
to the traditional approaches in problem optimization. The genetic algorithms require
the parameter set of the optimization problem to be coded as finite-length bit-strings of
binary numbers, real (floating-point) numbers, or integer numbers. The string is called a
chromosome (or individual) and the bits in the string are called genes. The genetic
algorithms work simultaneously with a group of different individuals called a
population. The power of genetic algorithms comes from implicit parallelism through
operating on these populations that offer a better chance of finding the global optimum:
successive generations of population are bred in such a way that the average fitness of
succeeding generations improves. For this, genetic operators are applied to the
individuals in the breeding population, generating new offspring by recombining
elements from the population. Though the recombinations are random, average fitness
tends to increase because of the biased selection process. Typically, the genetic
operators act at a randomly chosen point in the string. Figure 5 illustrates the effects of
the crossover and mutation operations on binary numbers. In Figure 5(a), two candidate
chromosomes $A$ and $B$, represented by their dyadic expansion of $n$ bits, are
undergoing a crossover operation at the randomly chosen bit location $k$ to produce two
offspring chromosomes $A'$ and $B'$. The first $k$ bits of $A'$ are the same as those of $A$,
while the last ($n - k$) bits of $A'$ are those of $B$. The digits of $B'$ are similarly formed.
Crossover is equivalent to breeding because it involves trading portions of two parents
to create two offspring that contain random portions of parental gene structures.
Mutation, on the other hand, randomly changes constituent nodes of a parent to create a
new offspring. In Figure 5(b), a candidate chromosome $A$ is undergoing mutation at
the two bits $b_i$ and $b_j$ to not-$b_i$ ($\lnot b_i$) and not-$b_j$ ($\lnot b_j$). Mutation is equivalent to a
random search heuristic and ensures that no point in the individual search space remains
unexplored.
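The two operators can be sketched on bit strings as below. The function names and the example chromosomes are illustrative, not taken from the software used in the study.

```python
import random

def crossover(a, b, k):
    """Single-point crossover of two equal-length bit strings at position k."""
    # Each child keeps the first k bits of one parent and the rest of the other.
    return a[:k] + b[k:], b[:k] + a[k:]

def mutate(a, positions):
    """Flip the bits of string a at the given positions."""
    bits = list(a)
    for i in positions:
        bits[i] = "1" if bits[i] == "0" else "0"
    return "".join(bits)

rng = random.Random(1)
parent_a, parent_b = "11110000", "00001111"
k = rng.randrange(1, len(parent_a))        # random cut point between 1 and n - 1
child_a, child_b = crossover(parent_a, parent_b, k)
mutant = mutate(parent_a, [0, 7])          # flips the first and last bits
```

Whatever cut point is drawn, the first `k` bits of `child_a` come from `parent_a` and the remaining `n - k` bits from `parent_b`, matching the description of Figure 5(a).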
a) Crossover Operation.
b) Mutation Operation.
Figure 5. Illustration of the mechanisms of the crossover and mutation operation.
4.2 Input variables chosen from the planning perspective
In genetic algorithms, as in any data-driven prediction model, the selection of
appropriate model input variables is very important, and the choice is generally based
on a priori knowledge of causal relationship and/or physical and ecological insight into
the problem. Numerous studies consider topographic features as the basic factors that
influence the population distribution such as: 1) altitude (Mennis 2003, Yue et al. 2003,
Flowerdew et al. 2007, Langford et al. 2008); 2) slope (Cai et al. 2006, Liao et al.
2010); and 3) hydrology (Yuan et al. 1997, Flowerdew et al. 2007). Transport network,
school catchment, markets, service centers, and land use classification are all important
factors that are also taken frequently as major input variables (for instance, Mennis 2003,
Balk et al. 2006, Langford et al. 2008, and Liu et al. 2008).
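Table 1. The list of the twenty-one input variables ($v_{i,j}$) of the study model, classified by five major categories ($C_i$).

Category                     Code   Variable                      Unit                       Expected effect
C1 Topographic Feature       v1,1   Elevation                     Altitude (m)               Negative as magnitude increases
                             v1,2   Slope                         %                          Negative as magnitude increases
C2 Infrastructures           v2,1   Stream                        Real-valued distance (m)   Positive as accessibility increases
                             v2,2   Road                          Real-valued distance (m)   Positive as accessibility increases
                             v2,3   School                        Real-valued distance (m)   Positive as accessibility increases
                             v2,4   Park                          Real-valued distance (m)   Positive as accessibility increases
                             v2,5   Bridge                        Real-valued distance (m)   Positive as accessibility increases
C3 Public Services           v3,1   Library                       Real-valued distance (m)   Positive as accessibility increases
                             v3,2   Ward Office                   Real-valued distance (m)   Positive as accessibility increases
                             v3,3   Health Center                 Real-valued distance (m)   Positive as accessibility increases
                             v3,4   Fire Station                  Real-valued distance (m)   Positive as accessibility increases
                             v3,5   University                    Real-valued distance (m)   Positive as accessibility increases
                             v3,6   Large Discount Mart           Real-valued distance (m)   Positive as accessibility increases
C4 Regional Access Points    v4,1   Highway IC & JC               Real-valued distance (m)   Positive as accessibility increases
                             v4,2   Industrial Complex            Real-valued distance (m)   Positive as accessibility increases
                             v4,3   Central Park in Bun-Dang      Real-valued distance (m)   Positive as accessibility increases
                             v4,4   Main Commercial in Bun-Dang   Real-valued distance (m)   Positive as accessibility increases
                             v4,5   Entrance to Bun-Dang          Real-valued distance (m)   Positive as accessibility increases
C5 Land Use Regulation       v5,1   Green Belt                    Real-valued distance (m)   Negative as magnitude increases
                             v5,2   Water Protect                 Real-valued distance (m)   Negative as magnitude increases
                             v5,3   Agrarian Improvement          Real-valued distance (m)   Negative as magnitude increases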
Table 1 summarizes the 21 input variables of the study model, chosen from the
planning perspective. They are classified into categories $C_i$ for $i = 1, \dots, 5$, and the
variables are denoted categorically as $v_{i,j}$ for $j = 1, \dots, n_i$, where $n_i$ is the number of
variables in $C_i$. For the topographic feature variables of $C_1$, the scores are predetermined as
in Table 2.
Table 2. Scores of the elevation ($v_{1,1}$) and slope ($v_{1,2}$) variables in the topographic feature category $C_1$.

Score   v1,1 Elevation h (m)   v1,2 Slope d (%)
5       h ≤ 100                d ≤ 5
4       100 < h ≤ 200          5 < d ≤ 8
3       200 < h ≤ 300          8 < d ≤ 10
2       300 < h ≤ 400          10 < d ≤ 12
1       400 < h ≤ 500          12 < d ≤ 15
0       500 < h                15 < d
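The scoring in Table 2 amounts to a lookup against upper class limits, which can be sketched as follows. The constant and function names are illustrative.

```python
from bisect import bisect_left

ELEVATION_BREAKS = [100, 200, 300, 400, 500]   # upper class limits in meters
SLOPE_BREAKS = [5, 8, 10, 12, 15]              # upper class limits in percent

def score(value, breaks):
    """Return 5 for the lowest class, falling to 0 above the last break.

    bisect_left counts the breaks strictly below value, so a value equal to
    a break stays in the lower class, as the "<=" bounds of Table 2 require.
    """
    return 5 - bisect_left(breaks, value)

elevation_score = score(250, ELEVATION_BREAKS)   # 200 < h <= 300
slope_score = score(9, SLOPE_BREAKS)             # 8 < d <= 10
```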
For the rest of the variables, values are measured as straight-line distances from each cell
centroid to the target points of facility locations ($C_2$ to $C_4$) or to the target polygons of
areal zones ($C_5$). For the latter, the distance is measured from the cell centroid to the point on the target zone
boundary that lies on the line drawn from the cell perpendicular to the boundary.
Category $C_2$ comprises the infrastructures and is included, in particular, to see how
decisive the impacts of these infrastructures are on the residents’ locational choice. The
stream ($v_{2,1}$) is included in $C_2$ instead of $C_1$ since it is seen to function more as an
amenity factor providing waterfront greenery and open spaces, just like parks ($v_{2,4}$)
and green areas in $C_2$. Variables of $C_3$ are local public services for the
neighborhoods, and those of $C_4$ represent the major regional access points that are
thought to affect the location of population in Kwang-Joo City. They are highway
interchanges/junctions, industrial complexes, and the central park and main commercial area of Bun-Dang Newtown. Specifically, the main entrance point from Kwang-Joo City to Bun-Dang Newtown ($v_{4,5}$)
is taken as the most important variable, as it provides the major
access corridor from the City to the Newtown and thus could indicate the level of
dependency of the Kwang-Joo citizens on the Newtown.
Figure 6 and Figure 7 show the topographic features of elevation and slope of
Kwang-Joo City, respectively. The elevation map is derived from the Digital Elevation
Model (DEM) previously constructed by the National Geographic Information
Institute (NGII), whereas the slope analysis map was drawn from the digital contour
information in the spatial database of KLIS.
Figure 8 and Figure 9 depict the locations of all the variables in the categories
of Infrastructures, Public Services, Regional Access Points, and Regulatory Land Uses.
It is notable that most of the facilities are located in or near the populated sector so
that their distribution largely coincides with that of the training data cells shown in
Figure 4.
Figure 6. Elevation map of Kwang-Joo City drawn from the DEM data.
Figure 7. Slope analysis map of Kwang-Joo City drawn from the KLIS digital data.
Figure 8. Point locations of the Public Service and Regional Access Point variables.
Figure 9. Infrastructure and Regulatory Land Use category variable points and areas.
4.3 Models, fitness functions, and measurement
4.3.1 Formulation of population estimation models
Consider that $\alpha_i$ and $\beta_{i,j}$ are the hierarchically bivariate parametric weighting
factors for the categories $C_i$ and variables $v_{i,j}$, respectively, for $i = 1, \dots, 5$ and
$j = 1, \dots, n_i$, where $n_i$ is the number of variables in $C_i$ as denoted earlier, and $\lambda$ is
the rescaling factor for the actual populations to be preserved (to be mass-preserving).
Then the relative contribution of each $C_i$ together with its subordinate $v_{i,j}$’s could be
determined by their respective weights $\alpha_i$ and $\beta_{i,j}$ using the genetic algorithms. If $\hat{p}_k$
denotes the cell population in each training cell, it could then be formulated as:
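One plausible reading of the hierarchical weighting, assumed here rather than taken from the paper's own equation, is a linear form in which each category weight multiplies the weighted sum of its subordinate variables and a rescaling factor is applied to the total. The sketch below implements that assumed form only; the dictionary layout, the normalized input values, and the numbers are hypothetical.

```python
def estimate_cell(values, cat_w, var_w, scale=1.0):
    """Assumed hierarchical weighted sum for one cell.

    Computes scale * sum_i cat_w[i] * sum_j var_w[i][j] * values[i][j].
    values, var_w: {category: [one number per variable in that category]}
    cat_w:         {category: category weight}
    """
    total = 0.0
    for cat, alpha in cat_w.items():
        inner = sum(b * v for b, v in zip(var_w[cat], values[cat]))
        total += alpha * inner
    return scale * total

# Hypothetical two-category example with inputs already scaled to [0, 1].
cat_w = {"C1": 0.25, "C4": 0.75}
var_w = {"C1": [0.5, 0.5], "C4": [1.0]}
values = {"C1": [1.0, 0.0], "C4": [0.5]}
estimate = estimate_cell(values, cat_w, var_w)   # 0.25 * 0.5 + 0.75 * 0.5
```

Under this reading, the weight a single variable carries in the whole model is the product of its own weight and its category's weight.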
4.3.2 Fitness function for model evaluation
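The fitness of the genetic algorithm model as well as its corresponding regression
model could be evaluated by using a fitness function that benchmarks the performance
of the model. Here, a typical measure, the correlation coefficient (as used by
Muttil and Lee (2005) and Parasuraman et al. (2007)), has been adopted as the fitness
function. The correlation coefficient ($R$) evaluates the linear correlation between the
observed and the computed values, and, as a fitness function, $R$ should be maximized.
Let $n$ be the number of training cells presented to the models; $p_k$ and $\hat{p}_k$ be the
observed and estimated counterparts for $k = 1, \dots, n$; and $\bar{p}$ and $\bar{\hat{p}}$ be the means
of $p_k$ and $\hat{p}_k$. Then, $R$ could be calculated using the following equations:

$$\bar{p} = \frac{1}{n} \sum_{k=1}^{n} p_k, \qquad \bar{\hat{p}} = \frac{1}{n} \sum_{k=1}^{n} \hat{p}_k$$

$$R = \frac{\sum_{k=1}^{n} (p_k - \bar{p})(\hat{p}_k - \bar{\hat{p}})}{\sqrt{\sum_{k=1}^{n} (p_k - \bar{p})^2 \, \sum_{k=1}^{n} (\hat{p}_k - \bar{\hat{p}})^2}}$$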
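As a fitness function, the correlation coefficient can be computed directly from the observed and estimated cell populations. The sketch below uses the standard Pearson formula with illustrative inputs.

```python
import math

def correlation(observed, estimated):
    """Pearson correlation coefficient R between two equal-length sequences."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_e = sum(estimated) / n
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    ss_o = sum((o - mean_o) ** 2 for o in observed)
    ss_e = sum((e - mean_e) ** 2 for e in estimated)
    return cov / math.sqrt(ss_o * ss_e)

# Illustrative values only: a partially matching estimate.
r = correlation([1, 2, 3, 4], [1, 3, 2, 4])
```

Since $R$ is insensitive to linear rescaling of the estimates, maximizing it calibrates the relative weights; matching the population totals is left to the separate rescaling factor.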
4.3.3 Parameter settings and hardware configuration
Table 3 summarizes the major genetic algorithm model parameter settings adopted in
this study; they were found to provide the best performance of the model after
extensive trial runs. Some of the most important settings include a population size of
200, a crossover rate of 0.9, a mutation rate of 0.01, and a chromosome length of 8 bits.
The genetic algorithm model of this study was run on a system with a quad-core 2.66
GHz CPU and 16 GB RAM under the 64-bit Windows 7 OS environment.
Table 3. Settings of the reproductive plan parameters for the genetic algorithm model.

Parameter                                                                                   Value
Population size: number of individuals in the breeding pool                                200
Generation: generations over which individuals evolve                                      200
Crossover rate: probability that an individual will be the offspring of two parents        0.9
Mutation rate: probability that an individual mutates                                      0.01
Reproduction rate: probability that an individual will be the offspring of one parent      1 - (crossover rate + mutation rate)
Generation gap: fraction of the population to be replaced after each iteration             0.98
Elitist strategy: reentry of the best individuals without genetic operation                on
Diversity operator: allowance of non-uniform mutation                                      on
Chromosome length: number of genes in a chromosome                                         8-bits
Random seed: reproduction of the results of previously run evolutionary cycles             1
Termination of evolution: number of generations that the best fitness unchanged            100
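The settings of Table 3 can be held in a small configuration, from which the reproduction rate follows by arithmetic. The decoding of an 8-bit gene into a weight between 0 and 1 shown here is an assumption, since the paper does not state how genes map to weights; several reported weights, such as 0.996 (about 254/255) and 0.004 (about 1/255), would be consistent with it. All names are illustrative.

```python
GA_SETTINGS = {
    "population_size": 200,
    "generations": 200,
    "crossover_rate": 0.9,
    "mutation_rate": 0.01,
    "generation_gap": 0.98,
    "chromosome_bits": 8,
    "random_seed": 1,
    "stall_generations": 100,   # stop when best fitness is unchanged this long
}

# Reproduction rate as defined in Table 3: 1 - (crossover rate + mutation rate).
reproduction_rate = 1 - (GA_SETTINGS["crossover_rate"] + GA_SETTINGS["mutation_rate"])

def decode_gene(bits):
    """Map a bit string to a weight in [0, 1] (assumed decoding, not from the paper)."""
    return int(bits, 2) / (2 ** len(bits) - 1)

high_weight = decode_gene("11111110")   # 254 / 255
low_weight = decode_gene("00000001")    # 1 / 255
```

With 8 bits a weight can take only 256 distinct values, so the resolution of any single weight under this assumed decoding is about 0.004.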
5. Results
5.1 Genetic algorithm model outputs
5.1.1 Performance of the 12-variable model
Although all 21 variables are intuitive candidates for explaining
the population distribution in many real-world urban settings,
some of them could be redundant or excessive under the specific locational
peculiarities of Kwang-Joo City. The first step, in this line of reasoning, was to screen out
the much less relevant variables to curtail the unnecessarily long runtimes
of the genetic algorithm model runs. This was done by utilizing the stepwise regression
technique. Here, a reduced set of 12 more relevant variables was chosen from one
of the intermediate steps in the course of the stepwise regression runs, in an attempt to
leave enough room for the genetic algorithm to finalize the remaining calibration of the
model specification. Those 12 screened variables, which belong to 5 categories, are listed
in Table 4 accompanied by their estimated weights obtained from the three best
genetic algorithm runs among a total of 20 repeated runs. Category weights ($\alpha_i$) and variable
weights ($\beta_{i,j}$) are reported in the left and right halves of the table, respectively, while their
sums ($\sum \alpha_i$ and $\sum \beta_{i,j}$) are shown at the bottom. The fitness values of $R$ of the
three best genetic algorithm runs among the 20 repeated runs are also reported in
Table 5. As can be seen in the table, the $R$ values lie between 0.428 and
0.429.
To ensure that the model did not converge to a local maximum,
the traces of the evolutionary process should be checked graphically. When the
model is trapped, the plots usually show that the coordinates of the fittest individual
cease to change over a range of generations and stay on a flat spot. This type of lack
of convergence can be assessed by restarting the algorithm with different starting
values and random number seeds and comparing the resulting solutions (Meyer 2003). If
the solutions are farther apart than a given tolerance, this indicates a problem with
convergence. In this case, a larger population size and/or more mutation should be tried.
Figure 10 illustrates the tracks of the correlation coefficient $R$, the fitness criterion, of the
three best runs of the model. They follow different paths but eventually
converge to a value near 0.429 after 1,300 generations, which suggests that the global
optimum has been reached. It is notable that the tracks do not consistently
increase but sometimes plummet and recover. Such abrupt changes are also signs of the
intermittent occurrence of beneficial mutations.
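The restart check described above can be sketched as a comparison of the best solutions from repeated runs against a tolerance. The tolerance value is hypothetical; the fitness values and category weights are those reported for the three best runs in Tables 5 and 4.

```python
def fitness_agrees(best_fitness, tol=0.005):
    """True when the best fitness values of all runs lie within tol of each other."""
    return max(best_fitness) - min(best_fitness) <= tol

def solution_spread(solutions):
    """Largest difference between runs in any single coordinate of the solution."""
    return max(max(col) - min(col) for col in zip(*solutions))

# Fitness (R) of the three best runs, as reported in Table 5.
fitness_ok = fitness_agrees([0.429, 0.428, 0.428])

# Category weights C1..C5 of the same runs, as reported in Table 4.
category_weights = [
    [0.259, 0.247, 0.039, 0.455, 0.000],
    [0.243, 0.247, 0.055, 0.506, 0.000],
    [0.239, 0.294, 0.031, 0.498, 0.000],
]
spread = solution_spread(category_weights)
```

Comparing the weight vectors as well as the fitness values is the stricter test, because runs can reach nearly the same fitness with somewhat different weights.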
Table 4. Summary of category- and variable weighting factors for population estimation
obtained by the three best runs of full-variable genetic algorithm model.

Category weights                 Best GA Solutions
Category                         1st     2nd     3rd
C1 Topographic feature           0.259   0.243   0.239
C2 Infrastructures               0.247   0.247   0.294
C3 Public Services               0.039   0.055   0.031
C4 Regional access points        0.455   0.506   0.498
C5 Land Use Regulation           0.000   0.000   0.000

Variable weights                 Best GA Solutions
Variable                         1st     2nd     3rd
v1,1 Elevation                   0.216   0.067   0.075
v1,2 Slope                       0.784   0.925   0.937
v2,1 Stream                      0.004   0.000   0.004
v2,4 Park                        0.000   0.000   0.047
v2,5 Bridge                      0.976   1.000   0.992
v3,1 Library                     0.118   0.000   0.431
v3,2 Public Office               0.525   0.969   0.855
v3,5 University                  0.400   0.831   0.671
v4,1 IC&JC                       0.000   0.000   0.004
v4,5 Entrance to Bun-Dang        0.996   0.996   0.999
v5,1 Green Belt                  0.671   0.020   0.459
v5,3 Agrarian Improvement        0.333   0.973   0.553
5.1.2 Interpretation of the 12-variable model output
In Table 4, the category weights of $C_3$ (public services) and $C_5$ (land use
regulation) resulted in near-zero values in all three runs, which indicates that the
locations of these facilities and zones have minimal influence on population distribution in
Kwang-Joo City. A likely explanation is that the public services are already evenly
distributed so that their proximity stays within a fairly uniform range from every
training data point. Regulatory land use zones such as greenbelts and agrarian
improvement zones are certainly uninhabitable, and the consequence is readily
understandable.
The weights of $C_1$ (topographic features) and $C_2$ (infrastructures), however, are
surprising since the impact of each is about half of that of $C_4$ (regional
access points). Unlike the public services, the infrastructures have
an impact as large as that of major topographic features such as elevation and slope.
As expected, $C_4$ (regional access points) is the category with weights of substantial
relative importance, accounting for about half (0.455-0.506) of the entire explanatory power,
almost solely through the contribution of the “entrance to Bun-Dang Newtown” ($v_{4,5}$),
which shows absolute predominance (0.996-0.999) within it.
In all, the three categories $C_1$, $C_2$, and $C_4$ seem to substantially account for the
population in Kwang-Joo City. In terms of their variables, however, $v_{2,4}$ (0.000-0.047)
indicates no or meager contribution to $C_2$. Only the six remaining variables,
$v_{1,1}$ and $v_{1,2}$ in $C_1$; $v_{2,1}$ and $v_{2,5}$ in $C_2$; and $v_{4,1}$ and $v_{4,5}$ in $C_4$, therefore seem to
persist as candidate variables to be used in the best subset model runs.
Table 5. Summary of the fitness values achieved by the fitness criteria of R for the
genetic algorithm model run with the 12 screened variables.

Fitness Value   1st     2nd     3rd
R               0.429   0.428   0.428
Figure 10. Tracks of the fitness value of correlation coefficient (R) obtained by the three
best runs of the genetic algorithm model using the 12 screened variables.
5.1.3 Final model of six-variable genetic algorithms
Using the six best variables selected above, the genetic algorithm subset model has
been run, and the tracks of $R$ obtained from the three best runs are again illustrated in
Figure 11. The values converge to the global maximum of 0.424 after only 700
generations. Faster convergence to the optimum, with minor fluctuation and in far fewer
iterations, implies that the composition of the subset variables significantly enhanced the
explanatory capability of the model.
The model runtime statistics are also reported in Table 6. Based on the 20 repeated
runs, the average runtime is about 9 minutes and 18 seconds with a standard
deviation of 2 minutes and 49 seconds. While the best runs evolved up to nearly 800
generations, the average generation at which the optimum value was reached is 548, with a
standard deviation of 143 generations.
Table 6. Runtime statistics of the 200 population size genetic algorithm model based on 20 runs.

Genetic algorithms subset model (POP SIZE: 200)
        Generations   Runtimes
Mean    548           9:18
SD      143           2:49
Figure 11. Tracks of the fitness value of correlation coefficient (R) obtained by the three
best runs of the final genetic algorithm model using the six variables.
Table 7 summarizes the model outputs. In the table, category weights ($\alpha_i$) and
variable weights ($\beta_{i,j}$) are reported in the left and right halves of the table, respectively,
while their sums ($\sum \alpha_i$ and $\sum \beta_{i,j}$) are shown at the bottom. It can be seen that the
weight totals sum to nearly, but not exactly, 1.0 (or 3.0 in the case of the variable weights for
$C_1$, $C_2$, and $C_4$). This is because the tolerance is set to be internally modified to the smallest
possible number instead of zero: the chances for the fitness values to arrive at feasible
optima are usually enhanced significantly this way.
While the weight of $C_1$ has slightly increased, the dominance of the slope ($v_{1,2}$) in it
has slightly lowered to 0.620-0.749, yet it still far overrides the impact of elevation ($v_{1,1}$).
For category $C_2$ (infrastructures), the contribution of the bridges ($v_{2,5}$) is still
absolute, ranging from 0.992 to 1.000, indicating that the rather complicated watershed
system spread across the lowlands of the City is a considerable obstacle to internal
circulation within the jurisdiction, so that adjacency to bridges for crossing the
streams is an important attraction factor for residents in locating their residences.
Table 7. Summary of category and variable weighting factors for population estimation
obtained by the three best runs of the six-variable final genetic algorithm model.

Categories                   Best GA Solutions       | Factors (Variables)           Best GA Solutions
                             1st     2nd     3rd     |                               1st     2nd     3rd
C1  Topographic feature      0.263   0.263   0.247   | v1,1  Elevation               0.372   0.376   0.251
                                                     | v1,2  Slope                   0.620   0.624   0.749
C2  Infrastructures          0.299   0.298   0.247   | v2,1  Stream                  0.002   0.001   0.001
                                                     | v2,5  Bridge                  1.000   0.992   0.996
C4  Regional access points   0.500   0.502   0.502   | v4,1  IC&JC                   0.125   0.110   0.063
                                                     | v4,5  Entrance to Bun-Dang    0.875   0.875   0.929
Sum of Category Weights      1.062   1.063   0.996   | Sum of Variable Weights       2.994   2.978   2.989
Finally, and most importantly, the weights of category C4 (regional access points)
and its dominant variable v4,5 (entrance to Bun-Dang Newtown) are 0.500-0.502 and
0.875-0.929, respectively, in the final model, and it is notable that these ranges
did not deviate much from those of the initial 12-variable model. While highway
interchanges are important access points for inter-regional transportation, their
contribution of 0.063-0.125 is not comparable to that of the entrance to Bun-Dang
Newtown. In sum, it is evident that the connection to the Newtown explains almost 45%
of the causes of the entire population distribution in Kwang-Joo City.
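The figure of almost 45% is consistent with multiplying the weight of the regional access category by the within-category weight of the Bun-Dang entrance. The sketch below assumes that reading: the product rule and the helper name `effective_share` are illustrative assumptions rather than a formula stated in this section, and only the weights are taken from Table 7.

```python
# Table 7 weights for the three best runs (1st, 2nd, 3rd).
c4_weight = (0.500, 0.502, 0.502)   # category C4: regional access points
v45_weight = (0.875, 0.875, 0.929)  # variable v4,5: entrance to Bun-Dang
v41_weight = (0.125, 0.110, 0.063)  # variable v4,1: IC&JC


def effective_share(category, variable):
    """Assumed overall share: category weight times within-category weight."""
    return tuple(c * v for c, v in zip(category, variable))


bundang_share = effective_share(c4_weight, v45_weight)
interchange_share = effective_share(c4_weight, v41_weight)

for run, (b, i) in enumerate(zip(bundang_share, interchange_share), start=1):
    print(f"run {run}: Bun-Dang entrance {b:.3f}, IC&JC {i:.3f}")
```

Under this assumption the Bun-Dang entrance accounts for roughly 0.44 to 0.47 of the total weight in each run, while the interchanges stay below 0.07.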
6. Conclusion
The attractions of new modeling techniques depend on awareness of what they can
offer and on empirical illustrations of what can be gained by comparison with current
best practice. Should classes of models with superior performance be found, there
would be a basis for the development of a new inductive approach (Openshaw 1988).
In the same context, this study has examined the genetic algorithm technique as a
geodemographic parameter estimation method. The genetic algorithm solutions appear
to be a reasonable explanatory tool for understanding the residential distribution of
the population and, further, for testing the presumed urban phenomena in the case of
Kwang-Joo City. To investigate the generality of the findings, however, further
research is needed to explore relative performance on multiple training datasets and
to identify the extent to which the results depend on the choice of goodness-of-fit
statistic.
With regard to the specific topic of the free-rider problem of fast-growing towns
and localities, as in the case of Kwang-Joo City, the dominant explanatory cause of
the population distribution is the gateway through which residents of the nearby city
access the fully equipped and well-planned infrastructure available in the Newtown.
The capacities of the infrastructure in the Newtown are set at the optimal level to
adequately serve the planned population in accordance with the initial Newtown master
plan. Non-paying additional users from the adjacent cities not only cause congestion
and inconvenience to the Newtown dwellers, thereby raising an equity problem, but also
bring about uncontrolled growth in the neighboring localities before their authorities
can secure enough time and fiscal resources to prepare an adequate growth plan to
accommodate the inflowing population. Regulatory provisions to counteract this
phenomenon are deemed urgent in the Capital Region, since a number of sporadically
planned newtowns are expected to be completed in the coming years.
References
Balk, D.L., Deichmann, U., Yetman, G., Pozzi, F., Hay, S.I. and Nelson, A., 2006,
Determining global population distribution: methods, applications and data.
Advances in Parasitology 62, 120-156.
Brandl, B., 2007, “Automated modelling in empirical social sciences using a genetic
algorithm,” in Lecture Notes in Computer Science, Vol.4739, Computer Aided
Systems Theory-Eurocast 2007, Eds. Moreno Diaz et al., 912-919.
Cai, Q., Rushton, G., Bhaduri, B., Bright, E. and Coleman, P., 2006, Estimating
small-area populations by age and sex using spatial interpolation and statistical
inference methods. Transactions in GIS, 10(4), 577-598.
Fischer, M. and Leung, Y., 1998, A genetic algorithms based evolutionary computational
neural network for modelling spatial interaction data. The Annals of Regional
Science, 32, 437-458.
Flowerdew, R., Feng, Z. and Manley, D., 2007, Constructing data zones for Scottish
Neighbourhood Statistics. Computers, Environment and Urban Systems, 31, 76-90.
Gen, M. and Cheng, R., 2000, Genetic algorithms and engineering optimization. John
Wiley and Sons, Inc., New York.
Holland, J.H., 1975, Adaptation in Natural and Artificial Systems. The University of
Michigan Press: Ann Arbor.
Langford, M., Higgs, G., Radcliffe, J. and White, S., 2008, Urban population distribution
models and service accessibility estimation. Computers, Environment and
Urban Systems, 32, 66-80.
Liao, Y., Wang, J., Meng, B. and Li, X., 2010, Integration of GP and GA for mapping
population distribution. International Journal of Geographical Information
Science, 24(1), 47-67.
Liu, X.H., Kyriakidis, P. and Goodchild, M., 2008, Population-density estimation using
regression and area-to-point residual kriging. International Journal of
Geographical Information Science, 22(4), 431-447.
Martin, D. and Bracken, I., 1991, Techniques for modelling population-related raster
databases. Environment and Planning A, 23, 1069-1075.
Mennis, J., 2003, Generating surface models of population using dasymetric mapping.
The Professional Geographer, 55(1), 31-42.
Meyer, M.C., 2003, An evolutionary algorithm with applications to statistics. Journal of
Computational and Graphical Statistics, 12(2), 265-281.
Muttil, N. and Lee, J., 2005, Genetic programming for analysis and real-time prediction
of coastal algal blooms. Ecological Modelling, 189, 363-376.
Openshaw, S., 1984, Ecological fallacies and the analysis of areal census data.
Environment and Planning A, 16, 17-31.
Openshaw, S., 1988, Building an automated modelling system to explore a universe of
spatial interaction models. Geographical Analysis, 20(1), 31-46.
Parasuraman, K., Elshorbagy, A. and Carey, S.K., 2007, Modelling the dynamics of the
evapotranspiration process using genetic programming. Hydrological Sciences
Journal, 52(3), 563-578.
Venkatesan, R., Krishnan, T. and Kumar, V., 2004, Evolutionary estimation of
macro-level diffusion models using genetic algorithms: an alternative to nonlinear
least squares. Marketing Science, 23(3), 451-464.
Yim, K.W., Wong, S.C., Chen, A., Wong, C.K. and Lam, W.H.K., 2010, A
reliability-based land use and transportation optimization model. Transportation
Research Part C (in press), doi:10.1016/j.trc.2010.05.019
Yuan, Y., Smith, R.M. and Limp, W.F., 1997, Remodeling census population with
spatial information from LandSat TM imagery. Computers, Environment and
Urban Systems, 21(3), 245-258.
Yue, T.X., Wang, Y.A., Chen, S.P., Liu, J.Y., Qiu, D.S., Deng, X.Z., Liu, M.L. and Tian,
Y.Z., 2003, Numerical simulation of population distribution in China.
Population and Environment, 25(2), 141-163.