Population Change in Regional and Metropolitan Areas: Cluster Analysis Using Australian Data
Karim Mardaneh
The Business School Working Paper Series:
001 - 2013
Karim Mardaneh - The Business School, University of Ballarat, Mt Helen,
Victoria, Australia 3353
Copyright © 2011
Working papers are in draft form. This working paper is distributed for purposes of comment and
discussion only. It may not be reproduced without permission of the copyright holder. Copies of working
papers are available from the author.
CRICOS Provider No. 00103D
Abstract
A large body of research focuses on population growth and decline; however, less attention has been paid to the
possible effects of socio-economic factors and sustenance activities on population change. Using Australian
Bureau of Statistics Census data for 2001-2006, the study examines the role of socio-economic factors and sustenance
activities in population change in both regional and metropolitan areas. The novelty of the study is twofold. Conceptually
it compares the combined role of socio-economic factors and sustenance activities on population change; and,
empirically it uses cluster analysis technique to conduct the analysis. The results suggest that the explanatory factors for
regional areas are mainly socio-economic factors and for metropolitan areas mainly relate to sustenance activities. Policy
implications of the study indicate the need for study on regional development in relation to socio-economic factors, and
particularly income, education and age, and their impact on viability of the regional areas.
Key Words: Population, Regional, Clustering, k-means, Sustenance activities
Introduction
The current comparative study examines, compares, and contrasts the impact of some socio-economic
factors (individual weekly income, education level, age group) and sustenance activities (retail trade; mining;
agriculture, forestry and fishing; and manufacturing) on population growth and decline in regional (non-metropolitan) and metropolitan areas. The most important activity in providing sustenance to the residents of
an area is referred to as the key sustenance activity (Gibbs and Martin, 1959). It determines the level of
resources available to the population, and thus affects the area’s pattern of population change; its study is
crucial in economic analysis (Stimson et al., 2006).
The existing literature focuses mainly on socio-economic factors, and some studies consider sustenance activities.
However, the possible population variation due to the combined effect of socio-economic factors and sustenance
activities across regional and metropolitan areas has not been investigated. This study attempts to fill this
gap in the literature by examining the possible population variation and patterns of activities within
geographical areas using these factors and activities.
Researchers have examined population growth and decline based on the economic functions of population
centres, employing a range of analytical methodologies (see, for example, Brown, 1968; Lee and Carter,
1992; Beer and Clower, 2009; Beer and Maude, 1995; Calthorpe and Fulton, 2001; Sanderson, 1998). Smith
(1965) used industry of employment data to study the similarity of functional specialisation in different towns.
Similar research has been conducted by Beer and colleagues (see, for example, Beer and Keane, 2000;
Beer et al., 2003) in more recent years. Some researchers have used cluster analysis on larger data sets
(see, for example, Freestone et al., 2003; Sorensen and Weinand, 1991). Other researchers (see, for
example, Beer and Clower, 2009; Beer and Maude, 1995) used both cluster and regression analysis to
examine changes in the economic functions of towns between 1961 and 1991.
Cluster analysis is the task of assigning a set of objects into groups (called clusters) so that objects in the
same cluster are more similar (in some sense or another) to each other than to those in other clusters
(Bagirov, 2008; Bagirov and Mardaneh, 2006; Mardaneh, 2007). Clustering algorithms can be used to
analyse large data sets comprising a myriad of economic, social and demographic variables for numerous
samples (Statistical Local Areas or SLAs in this study). They seek to group samples with similar
characteristics and ensure maximum statistical separation from other contrasting clusters. In this process of
pattern recognition, they simplify understanding of those large data sets. Empirical studies of the
performance of clustering algorithms suggest that one of the iterative clustering methods (e.g., k-means
clustering) is preferable to hierarchical methods (e.g., Ward’s clustering) (Punj and Stewart, 1983). As
clustering algorithms include more and more observations, their performance tends to deteriorate.
However, the k-means algorithm, despite the possibility of converging to a local minimum (Bagirov, 2008), has
proven to be a more efficient clustering algorithm in many studies (see, for example, Bayne et al., 1980;
Mezzich, 1978; Milligan, 1980), and it is more robust than any of the hierarchical methods with respect to the
presence of outliers (Mardaneh, in-press).
The study addresses the clustering of geographical areas by taking an alternative approach originally
proposed by Beer et al. (2003) and Freestone et al. (2003). In this sense, the major contribution of the study
to the urban studies literature is that it considers the combined effect of socio-economic factors and
sustenance activities on population change. This requires large data sets and sophisticated clustering
techniques for pattern recognition, which have seldom been used in population change research. This study
uses the k-means clustering algorithm (one of the algorithms that can be used in cluster analysis) to cluster
SLAs. Additionally, multiple regression analysis is used to determine the impact of the generated optimal
clusters on population change.
The paper is organised as follows. The first section reviews the existing literature on the effects of
socio-economic factors and sustenance activities on population change and sets a framework for explaining
the possible population variation and pattern of activities. The second section describes the data
and the analytical approach. The third section explains the growth and decline of regional and metropolitan SLAs.
The fourth section presents the results of the cluster and regression analyses. The discussion and the
conclusion follow.
Literature Review
Population change has been the focus of research for a long time. Many studies have considered a range of
socio-economic factors influencing population change (see, for example, Black and Henderson, 1999; Goetz
and Debertin, 1996; Millward, 2005; O’Connor et al., 2001; Polese and Shearmur, 2006; Watts, 2009; Nam
and Reilly, 2012). A few of these studies are outlined here. Sustenance activities are defined as activities
through which a community’s resources such as labour and capital support the area’s population (Krout,
1982). Population growth of an area may not lie in the diversification of the economy alone but in the ability
to select optimum economic activities (e.g. agriculture, mining, manufacturing, specialised industrial and
service sectors) for sustenance specialisation (Murdock et al., 1991).
Shumway and Davis (1996) used sustenance activities to examine the impact of each on net migration.
Frisbie and Poston (1975, 1978) examined population change against a number of socio-economic factors
such as income as well as sustenance activities. Frisbie and Poston (1976) examined differences in the
structure of sustenance activities in non-metropolitan counties of the U.S. for the 1960s and 1970s. Krout
(1982) examined the impact of some sustenance activities (mining, manufacturing, agriculture, service, retail)
on net migration for each of the 1960-1970 and 1970-1974 periods in the U.S. Poindexter and Clifford
(1983) assessed the importance of sustenance activities in the period 1970 to 1980 in non-metropolitan
counties of the U.S. Some studies consider service industries as sustenance activities (see, for example,
Frisbie and Poston, 1976; Hutton, 2004; Poindexter and Clifford, 1983).
The findings of the above-mentioned studies are in some cases conflicting (e.g. agriculture and mining
industries having a positive or a negative impact on net migration). They indicate that in most cases population
increase and decrease are directly affected by socio-economic factors such as median income. Further, the
factor most commonly identified in these studies as positively influencing population change is the
constellation of sustenance activities, specifically in industries such as retail, wholesale and service.
However, there is a lack of agreement on the extent of the impacts of socio-economic factors and sustenance activities on population change. This issue is investigated in this study by separating out the regional
and metropolitan areas.
Research on population change in both regional and metropolitan areas has been of continuing interest due
to changing government policies. Stimson et al. (2006) specified the changing economic development
policies and their impact on population change across different regions. During the 1950s and 1960s
regional economic development became a priority for the governments. From the 1970s, globalisation
highlighted the issue of national and regional competitiveness. As a result, substantial economic
restructuring affected the economy and demographics of the regions, towns and cities. This economic
structural adjustment caused population growth in some regions and population decline in others (Stimson et
al., 1998). From the 1970s to the 1990s, these policy-based structural adjustments led to new approaches in
understanding regional economic development.
Through economic restructuring and adjustment, some industries became known as ‘sustenance activities’
and were pivotal to the growth of some regions while other regions declined. Stimson et al. (2006) indicate that
the distribution of sustenance activities varies between regional and metropolitan areas. Thus, their
impact on population growth and decline is expected to differ by location.
Almost two-thirds of the population of Australia live in metropolitan areas (Australian Bureau of Statistics,
2010b). Because of their sparser population distribution, regional areas lack critical mass. As a result,
population growth and decline in these areas are influenced by only a few, relatively simple factors,
whereas metropolitan areas are subject to a larger number of factors and a higher level of complexity.
Metropolitan areas create new growth industries, generate employment (Karlsson, 1999), influence
population change (Curran, 2010; Stimson et al., 1998; Krakover, 1985) and generate prosperity (Stimson et
al., 2006). The generative power of sustenance activities, as a base for understanding population change,
is stronger in metropolitan areas, with their critical mass and complexity.
This brief overview of the research literature identifies a gap in which there is uncertainty in terms of relative
impact on population change between socio-economic factors and sustenance activities. Clustering analysis
that separates regional and metropolitan data may assist in clarifying and disentangling this uncertainty. This
is the research that is pursued below.
Objectives, Analytical Methodology and Data
The objective of this study is to examine and compare metropolitan and regional population variations in the
context of socio-economic factors and sustenance activities.
The study builds upon previous studies (see, for example, Beer and Maude, 1995; Beer, 1999; Beer and
Clower, 2009) and uses the k-means clustering algorithm to cluster SLA data because of its recognised ability
to analyse large data sets. The k-means algorithm treats each sample (an SLA in this study) in a data set as a
point in n-dimensional space ($\mathbb{R}^n$), chooses k centres (also called centroids) and assigns each point to
the cluster with the nearest centre. The centre is the average of all the points in the cluster; that is, its coordinates
are the arithmetic mean for each dimension separately over all the points in the cluster. This current study
proposes that both socio-economic factors and sustenance activities and their varied presence in different
regions play a significant role in population growth and decline.
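The assignment and averaging steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in the study: the function name `kmeans`, the random choice of initial centres, the `seed` and `iterations` parameters, and the toy points are all assumptions made for the example.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Cluster points (equal-length lists of floats) into k groups.

    Illustrative sketch: initial centres are k randomly chosen points,
    which is an assumption and not a detail taken from the paper.
    """
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centre
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centres[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centres will not move
        labels = new_labels
        # Update step: each centre becomes the per-dimension arithmetic mean
        # of the points assigned to it (an empty cluster keeps its old centre).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

# Toy data: two well-separated groups of three points each.
toy = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
toy_labels, toy_centres = kmeans(toy, k=2)
print(toy_labels, toy_centres)
```

Because the result depends on the initial centres, the algorithm can settle on a local rather than a global optimum; running it from several starting points is a common precaution.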
For the analysis a two stage process is conducted, which is explained as follows:
1. In the first stage of the analysis SLAs are clustered based on the identified variables. This is to
cluster all SLAs using each variable of the industry of employment, individual weekly income,
education level, age group separately. The output of clustering is clusters relevant to each of the
mentioned variables and each cluster includes a set of SLAs. Clusters and the variables are then
cross-tabulated to obtain the mean value for each category of a variable (e.g. cross-tabulate industry
of employment with cluster 1, which will show the mean value for categories of the industry of
employment: mining, retail trade, etc.). See Appendix 1 for an example of the industry of
employment clusters. This represents the percentage of employed people under each category of
the industry of employment. Then there is an identification of variables which have the highest mean
value within the cluster (e.g. retail trade has the highest mean within cluster 1). Thus the cluster is
named after the category with the highest mean value. For example, since retail trade has the
highest mean value in cluster 1, then it is called ‘retail trade’ cluster.
2. In the second stage of the analysis the generated clusters are used in a regression model to
examine their impact on population change. This stage aims to identify clusters which are impacting
on population growth and decline.
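The first stage (cross-tabulating clusters with categories and naming each cluster after its highest-mean category) can be sketched as follows. The function names `cluster_category_means` and `name_clusters` and the three-SLA toy data are hypothetical and serve only to illustrate the procedure; they are not taken from the study.

```python
from collections import defaultdict

def cluster_category_means(shares, labels):
    """Mean percentage of each category within each cluster.

    shares: one dict per SLA, mapping category name -> percentage employed.
    labels: cluster identifier for each SLA, in the same order as shares.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row, lab in zip(shares, labels):
        counts[lab] += 1
        for category, value in row.items():
            sums[lab][category] += value
    return {
        lab: {cat: total / counts[lab] for cat, total in cats.items()}
        for lab, cats in sums.items()
    }

def name_clusters(means):
    """Name each cluster after the category with the highest mean value."""
    return {lab: max(cats, key=cats.get) for lab, cats in means.items()}

# Hypothetical percentages for three SLAs and two categories.
toy_shares = [
    {"retail trade": 12.0, "mining": 2.0},
    {"retail trade": 14.0, "mining": 4.0},
    {"retail trade": 5.0, "mining": 30.0},
]
toy_labels = [1, 1, 2]
toy_means = cluster_category_means(toy_shares, toy_labels)
print(name_clusters(toy_means))
```

In this toy case cluster 1 is named after retail trade and cluster 2 after mining, mirroring the naming rule described in stage 1.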
This study uses census data for 2001-2006 sourced from the ABS (Australian Bureau of
Statistics, 2010a). The data include industry of employment (19 categories), occupation type (8 categories),
employment status (6 categories), individual weekly income (18 categories), education level (5 categories),
and age group (21 categories) (see Appendix 2 for details). Both industry of employment and occupation
type are derived from sustenance activities. The other four factors (employment status, individual weekly
income, education level, age group) are socio-economic factors.
The study used all industry and occupation categories. The other variables (e.g. individual weekly income
and employment status) were more detailed than the analysis required, so for simplicity they were merged
into fewer categories. The six employment status
categories were merged into employed, unemployed and not in the labour force. The eighteen income
categories were collapsed into five: negative income, nil income, $1-$999, $1000-$1999, and
$2000 and more. Age groups were merged into three new categories: 25-39, 40-64, and 65 and older. Education
levels were collapsed into two new categories, described here as high level of tertiary and postgraduate, and
low level of tertiary and postgraduate.
It was necessary to analyse regional and metropolitan data separately in order to compare
and contrast the results. The study separated SLA data into regional and metropolitan data based on the
Australian Standard Geographical Classification (ASGC) (see ABS, 2006 and ABS, 2009). Following this
classification, the study treated ‘major urban’ areas (SLAs with populations exceeding 100,000) as
metropolitan SLAs. A decision was also taken to include in the ‘metropolitan’ category some SLAs that are
technically ‘regional’: SLAs with fewer than 100,000 people that lie on the fringes of ‘major urban’ areas.
These ‘fringe areas’ fall outside the expanding metropolitan sphere but are the most likely to participate in the
growth and change happening within metropolitan areas, and many of their residents travel to metropolitan
areas for work (Budge, 2006). All remaining SLAs (with populations of less than 100,000) were considered
regional. This procedure split the 1431 SLAs into 726 regional and 705 metropolitan SLAs. Some SLAs
showed extreme population growth or decline values and were treated as outliers. Because outliers could
skew the data and distort the results, they were eliminated, leaving 689 regional and 659 metropolitan SLAs
for the analysis. Tables 1 and 2 provide a breakdown of these SLAs by category of population change.
Allowance was made for the fact that there had been changes in some SLAs between 2001 and 2006.
Regional and Metropolitan SLA Growth and Decline
Population growth and decline rates are presented in Tables 1 and 2 in six categories encompassing all
growth and decline rates. While 111 regional SLAs experienced a decline of more than 5%, only 34
metropolitan SLAs experienced the same level of decline. In total, 38% of regional SLAs had a declining
population, whereas the rate for metropolitan areas was only 20%. Within the declining group, 16% of regional
SLAs and only 5% of metropolitan SLAs experienced a decline of more than 5%. This pattern reverses
when population increase is considered. Within the increasing group, 11% of regional SLAs and 16% of
metropolitan SLAs experienced an increase of more than 15%.
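The percentages quoted above follow from the SLA counts reported in Tables 1 and 2. The short sketch below recomputes them; the counts are copied from the tables, while the helper name `share` and the list layout are illustrative assumptions.

```python
# Number of SLAs per change category, copied from Tables 1 and 2.
# Order: decline >5%, decline 0-5%, increase 0-5%, increase 5-10%,
# increase 10-15%, and the final (largest increase) category.
REGIONAL = [111, 153, 178, 118, 50, 79]
METROPOLITAN = [34, 102, 211, 131, 77, 104]

def share(counts, positions):
    """Percentage of SLAs that fall in the categories at the given positions."""
    return 100.0 * sum(counts[i] for i in positions) / sum(counts)

for name, counts in (("regional", REGIONAL), ("metropolitan", METROPOLITAN)):
    print(
        name,
        "declining:", round(share(counts, [0, 1]), 1),
        "decline >5%:", round(share(counts, [0]), 1),
        "largest increase:", round(share(counts, [5]), 1),
    )
```

The regional declining share comes out at roughly 38% and the metropolitan share at roughly a fifth, in line with the figures in the text.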
Table 1. Categories of population change (%) for regional SLAs

| Categories of change (%) | No. of SLAs within each category | % of SLAs within each category | Population (000’s) 2001 | Population (000’s) 2006 |
|---|---|---|---|---|
| Decline more than 5 | 111 | 16 | 289.9 | 262.0 |
| Decline 0 to 5 | 153 | 22 | 1015.3 | 995.9 |
| Increase 0 to 5 | 178 | 26 | 1627.2 | 1667.2 |
| Increase 5-10 | 118 | 17 | 1431.1 | 1530.3 |
| Increase 10-15 | 50 | 8 | 504.8 | 568.2 |
| Increase 15-20 | 79 | 11 | 779.9 | 968.3 |
| Total | 689 | 100 | 5648.2 | 5991.9 |
Table 2. Categories of population change (%) for Metropolitan SLAs

| Categories of change (%) | No. of SLAs within each category | % of SLAs within each category | Population (000’s) 2001 | Population (000’s) 2006 |
|---|---|---|---|---|
| Decline more than 5 | 34 | 5 | 96.1 | 89.5 |
| Decline 0 to 5 | 102 | 15 | 2190.9 | 2153.0 |
| Increase 0 to 5 | 211 | 32 | 6100.1 | 6240.2 |
| Increase 5-10 | 131 | 20 | 2606.2 | 2785.1 |
| Increase 10-15 | 77 | 12 | 1140.0 | 1282.8 |
| Increase 15-20 | 104 | 16 | 1633.4 | 2032.9 |
| Total | 659 | 100 | 13766.7 | 14583.6 |
Results
A two-stage process of analysis was followed to produce the results outlined in this paper. The k-means
clustering algorithm was used in the first stage to cluster all SLAs once for each of the used variables
(industry of employment, occupation type, employment status, individual weekly income, education level,
and age group). In the second stage the clusters generated in stage one were used in a multiple regression
model to examine their impact on population growth and decline.
Clustering analysis
SLAs were clustered using the k-means algorithm. To find the optimal number of clusters for each
variable, solutions with 2 to 10 clusters were tested. The optimal number of clusters is the number that best
reflects the underlying cluster structure of the data set and most clearly separates the clusters
according to their centroids (mean values). When the number of clusters is more than optimal, artificial
clusters are generated; when the number is less than optimal, clusters are merged and, as a result, the
centroids are not well separated. Additionally, when the number of clusters is not optimal, the chance of
one category of a variable appearing repeatedly within more than one cluster increases. Clustering the data
for the industry of employment, occupation type and individual weekly income resulted in an optimal number
of 7 clusters. The optimal number for employment status was 3, and it was 2 for both age group and
education level.
Clusters for regional data are as follows:
Industry of employment (7 clusters: IndusC1 – IndusC7);
Occupation type (7 clusters: OccupC1 – OccupC7);
Employment status (3 clusters: EmployC1 – EmployC3);
Individual weekly income (7 clusters: IncomeC1 – IncomeC7);
Education level (2 clusters: EduC1 – EduC2);
Age group (2 clusters: AgeC1 – AgeC2).
Table 3. Variables and categories, clusters and mean values for each cluster (Regional)

| Variable | Cluster | Category (name) | Highest mean score (%) |
|---|---|---|---|
| Industry of employment | IndusC1 | Retail Trade | 12.20 |
| | IndusC2 | Public Administration and Safety | 53.20 |
| | IndusC3 | Accommodation and Food Services | 23.75 |
| | IndusC4 | Professional, Scientific and Technical Services | 13.12 |
| | IndusC5 | Mining | 30.98 |
| | IndusC6 | Agriculture, Forestry and Fishing | 23.01 |
| | IndusC7 | Agriculture, Forestry and Fishing | 46.41 |
| Occupation type | OccupC1 | Professional, Scientific and Technical Services | 28.75 |
| | OccupC2 | Technicians and Trades Workers | 16.75 |
| | OccupC3 | Managers | 24.20 |
| | OccupC4 | Labourers | 37.98 |
| | OccupC5 | Labourers | 37.17 |
| | OccupC6 | Managers | 41.13 |
| | OccupC7 | Machinery Operators And Drivers | 18.57 |
| Employment status | EmployC1 | Employed | 47.11 |
| | EmployC3 | Unemployed | 2.79 |
| | EmployC3 | Not in labour | 29.32 |
| Individual weekly income | IncomeC2 | Nil income | 6.39 |
| | IncomeC5 | Negative income | 1.91 |
| | IncomeC6 | $1000-$1999 | 18.48 |
| | IncomeC6 | $2000 and more | 8.83 |
| | IncomeC7 | $1-$999 | 56.56 |
| Education level | EduC2 | High Tertiary and Postgraduate | 29.72 |
| Age group | AgeC1 | 40-64 years old | 34.25 |
| | AgeC1 | 65 and more | 14.47 |
| | AgeC2 | 25-39 years old | 22.29 |
Table 4. Variables and categories, clusters and mean values for each cluster (Metropolitan)

| Variable | Cluster | Category (name) | Highest mean score (%) |
|---|---|---|---|
| Industry of employment | IndusC1 | Health Care and Social Assistance | 13.17 |
| | IndusC2 | Retail Trade | 11.27 |
| | IndusC3 | Manufacturing | 15.40 |
| | IndusC4 | Retail Trade | 12.29 |
| | IndusC5 | Professional, Scientific and Technical Services | 12.90 |
| | IndusC6 | Public Administration and Safety | 33.37 |
| | IndusC7 | Agriculture, Forestry and Fishing | 20.12 |
| Occupation type | OccupC1 | Managers | 23.95 |
| | OccupC2 | Labourers | 17.99 |
| | OccupC3 | Professional, Scientific and Technical Services | 36.88 |
| | OccupC4 | Professional, Scientific and Technical Services | 26.74 |
| | OccupC5 | Technicians and Trades Workers | 16.68 |
| | OccupC6 | Professional, Scientific and Technical Services | 19.83 |
| | OccupC7 | Technicians and Trades Workers | 18.39 |
| Employment status | EmployC1 | Unemployed | 3.36 |
| | EmployC1 | Not in labour | 27.88 |
| | EmployC2 | Employed | 50.36 |
| Individual weekly income | IncomeC1 | $1000-$1999 | 23.55 |
| | IncomeC3 | Nil income | 5.83 |
| | IncomeC3 | $2000 and more | 11.32 |
| | IncomeC5 | Negative income | 0.34 |
| | IncomeC5 | $1-$999 | 54.95 |
| Education level | EduC2 | High Tertiary and Postgraduate | 40.44 |
| Age group | AgeC1 | 40-64 years old | 31.95 |
| | AgeC1 | 65 and more | 15.89 |
| | AgeC2 | 25-39 years old | 22.25 |
The clusters generated for the metropolitan data were similar to those for the regional data; however, there were two clusters for
employment status (EmployC1 – EmployC2) and five clusters for income (IncomeC1 – IncomeC5). Tables 3 and 4
provide cluster descriptions.
Cross-tabulating a variable (e.g. industry of employment) with a relevant cluster (e.g. IndusC1) reveals the
mean scores for the different categories (e.g. retail trade, mining, etc.) of that variable. For example,
within cluster 1 (IndusC1) the categories and their mean values are: agriculture, forestry and fishing (4.95),
mining (1.52), manufacturing (10.33), construction (9.22), retail trade (12.20) and so on. Each value
represents the percentage of people employed in that category of the industry of employment. A cluster
is then named after the category with the highest mean score. As Table 3 shows, the retail trade category
has the highest mean score (12.20) within cluster IndusC1, so the cluster is treated as the retail trade
cluster. In Table 4, the Retail Trade category has the highest mean score within two clusters
(IndusC2 and IndusC4), so it is reported twice.
As indicated in Table 3, individual weekly income has seven clusters, so the number of clusters (7)
exceeds the total number of categories (5). In this case, clusters are reported only if they include at least one
income category with the highest mean score. For regional individual weekly income, the highest mean
score (56.56) belongs to the $1-$999 category within cluster 7 (IncomeC7). For employment status,
the employed category had the highest mean score within all three relevant clusters (meaning that
the majority of people belong to this category), so the values for the other categories (unemployed, not in labour force)
were masked. To overcome this problem, the two categories (unemployed and not in labour force) were reported
for cluster 3, where their mean scores were higher. For this reason the cluster (EmployC3) is reported twice
in Table 3.
Each occupation type cluster includes a combination of different occupation categories with different mean
scores. In this case, the highest mean score for regional belongs to managers within cluster 6 (OccupC6)
and for metropolitan belongs to professional, scientific and technical services within cluster 3 (OccupC3).
This indicates that the proportion of employed people in the managers and professionals categories is higher
than all the other occupation types. Both managers and professional, scientific and technical services
categories happen to have the highest mean score within more than one cluster.
For education, for both regional and metropolitan the highest mean score relates to the high level of tertiary
and postgraduate category (cluster EduC2, where higher proportion of people have high rather than low level
of tertiary and postgraduate education). For age group, two clusters emerged, with the highest mean score
belonging to the 40-64 years old category for both regional and metropolitan. Cluster C1 (AgeC1) appears to
include the highest mean score for both the 40-64 years old and 65 and older categories for both areas.
As a result of the above process, the categories with the highest mean score are selected
to represent each cluster (e.g. IndusC1, AgeC2, etc.) and are summarised in Tables 3 and 4. An overall
comparison of the clusters between regional and metropolitan areas shows that in regional areas
employment in the public administration and safety industry and in the managers occupation is higher. The share of people
in the 40-64 years old age group is also higher in regional areas. In metropolitan areas, employment in the professional,
scientific and technical services occupation is higher. More people in metropolitan areas also have
income levels of $1000-$1999 or $2000 and more, and a high level of tertiary and postgraduate education.
Regression analysis
Initial regression analysis was conducted by including sets of clusters (e.g. IndusC1, AgeC1, etc.) as dummy
variables, not the master variables (e.g. industry of employment, age group, etc.) in separate regional and
metropolitan regression models. In other words these clusters (as independent variables) were used to
examine their impact on population change (as dependent variable).
This analysis revealed that for regional areas neither occupation type nor employment status had a
statistically significant impact on population change. Therefore, these two were not included in the analysis.
Their importance may have been subsumed under industry of employment categories. For metropolitan
areas only the industry of employment and the age group showed a statistically significant impact on
population change, thus only these two were included in the analysis.
Under each variable (industry of employment, individual weekly income, education level, age group) one of
the clusters, with the lowest coefficient value (associated with the lowest population growth) constitutes the
‘base’ (reference) cluster and all the other clusters (of a particular variable) are compared against this base
cluster. As a rule when incorporating these clusters (as dummy variables) into a regression model, only n-1
(where n signifies the number of clusters) clusters (variables) are entered to represent the required
information. As a result these base clusters (IndusC6; IncomeC5; EduC1; AgeC1) do not appear in Tables 5
and 6. Comparison of other clusters to the base cluster helps to explain whether other clusters contribute to
population growth or not, and if they do contribute what is the significance of their contribution compared to
the base (lowest) cluster. Tables 5 and 6 provide appropriate summary information of statistical results, and
the proportion of SLAs inside each cluster that are facing population growth or decline.
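The n-1 dummy coding described above can be sketched as follows. The function name `dummy_code` and the four-SLA example are illustrative assumptions; only the rule of dropping the base cluster comes from the text.

```python
def dummy_code(labels, base):
    """Build 0/1 indicator columns for every cluster except the base cluster.

    labels: cluster name for each SLA (for one variable).
    base:   the reference cluster, which gets no column of its own.
    Returns (column_names, rows), one row of indicators per SLA.
    """
    columns = sorted(set(labels) - {base})
    rows = [[1 if lab == col else 0 for col in columns] for lab in labels]
    return columns, rows

# Hypothetical cluster memberships for four SLAs, with IndusC6 as the base.
columns, rows = dummy_code(["IndusC1", "IndusC6", "IndusC3", "IndusC6"], base="IndusC6")
print(columns)
print(rows)
```

An SLA in the base cluster has zeros in every column, so each estimated coefficient is read as a difference from the base cluster.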
Table 5. Regional model regression results

| Independent variable | Category (name) | Coefficient | % of SLAs in cluster with negative population growth | % of SLAs in cluster with positive population growth | Net % change inside cluster |
|---|---|---|---|---|---|
| Intercept | | 0.660 | | | |
| Industry of employment | | | | | |
| IndusC1 | Retail Trade | 0.026 | 28 | 72 | 6 |
| IndusC2 | Public Administration and Safety | -0.014 | 48 | 52 | -1 |
| IndusC3 | Accommodation and Food Services | 0.052 | 29 | 71 | 5 |
| IndusC4 | Professional, Scientific and Technical Services | 0.049 | 10 | 90 | 19 |
| IndusC5 | Mining | -0.100 | 53 | 47 | -2 |
| IndusC7 | Agriculture, Forestry and Fishing | -0.034 | 62 | 38 | -1 |
| Individual weekly income | | | | | |
| IncomeC1 | $1-$999 | 0.007 | 28 | 72 | 7 |
| IncomeC2 | Nil income | 0.006 | 43 | 57 | 3 |
| IncomeC3 | $1-$999 | 0.026 | 27 | 73 | 11 |
| IncomeC4 | $1-$999 | 0.021 | 19 | 81 | 6 |
| IncomeC6 | $1000-$1999 & $2000 and more | 0.136 | 3 | 97 | 36 |
| IncomeC7 | $1-$999 | -0.002 | 40 | 60 | 3 |
| Education level | | | | | |
| EduC2 | High level of tertiary and postgraduate | 0.014 | 15 | 85 | 24 |
| Age group | | | | | |
| AgeC2 | 25-39 years old | 0.022 | 35 | 65 | 12 |

Regional model: $R^2$ = 0.353; adjusted $R^2$ = 0.340.
Coefficients in bold: Significant at the 95% level
Table 6. Metropolitan model regression results

| Independent variable | Category (name) | Coefficient | % of SLAs in cluster with negative population growth | % of SLAs in cluster with positive population growth | Net % change inside cluster |
|---|---|---|---|---|---|
| Intercept | | 2.236 | | | |
| Industry of employment | | | | | |
| IndusC1 | Health Care and Social Assistance | 0.304 | 9 | 91 | 7 |
| IndusC2 | Retail Trade | 0.334 | 14 | 86 | 9 |
| IndusC3 | Manufacturing | 0.222 | 20 | 80 | 7 |
| IndusC4 | Retail Trade | 0.383 | 14 | 86 | 9 |
| IndusC5 | Professional, Scientific and Technical Services | 0.296 | 11 | 89 | 8 |
| IndusC7 | Agriculture, Forestry and Fishing | 0.224 | 0 | 100 | 6 |
| Age group | | | | | |
| AgeC2 | 25-39 years old | 0.202 | 37 | 63 | 5 |

Metropolitan model: $R^2$ = 0.073; adjusted $R^2$ = 0.052.
Coefficients in bold: Significant at the 95% level
Regression Models
Regional regression model: includes dummy variables associated with the industry of employment, individual
weekly income, education level, and age group:
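\[
\begin{aligned}
\text{Population change} ={}& a + b_1\,\text{IndusC1} + b_2\,\text{IndusC2} + b_3\,\text{IndusC3} + b_4\,\text{IndusC4} + b_5\,\text{IndusC5} + b_6\,\text{IndusC7} \\
&+ b_7\,\text{IncomeC1} + b_8\,\text{IncomeC2} + b_9\,\text{IncomeC3} + b_{10}\,\text{IncomeC4} + b_{11}\,\text{IncomeC6} + b_{12}\,\text{IncomeC7} \\
&+ b_{13}\,\text{EduC2} + b_{14}\,\text{AgeC2}
\end{aligned}
\]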
Metropolitan regression model: includes dummy variables associated with the industry of employment, and
age group:
\[
\text{Population change} = a + b_1\,\text{IndusC1} + b_2\,\text{IndusC2} + b_3\,\text{IndusC3} + b_4\,\text{IndusC4} + b_5\,\text{IndusC5} + b_6\,\text{IndusC7} + b_7\,\text{AgeC2}
\]
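To illustrate how a dummy-variable model of this form is read, the sketch below evaluates the regional equation for a given set of cluster memberships. The intercept and coefficients are copied from Table 5; the function name `fitted_value` and the example memberships are hypothetical, and the result is simply the model's fitted value in whatever units the dependent variable was measured.

```python
# Intercept and coefficients copied from Table 5 (regional model).
# Base clusters (IndusC6, IncomeC5, EduC1, AgeC1) have no coefficient
# and therefore contribute nothing beyond the intercept.
INTERCEPT = 0.660
COEFFICIENTS = {
    "IndusC1": 0.026, "IndusC2": -0.014, "IndusC3": 0.052,
    "IndusC4": 0.049, "IndusC5": -0.100, "IndusC7": -0.034,
    "IncomeC1": 0.007, "IncomeC2": 0.006, "IncomeC3": 0.026,
    "IncomeC4": 0.021, "IncomeC6": 0.136, "IncomeC7": -0.002,
    "EduC2": 0.014, "AgeC2": 0.022,
}

def fitted_value(memberships):
    """Fitted value for an SLA, given the cluster it belongs to for each variable."""
    return INTERCEPT + sum(COEFFICIENTS.get(cluster, 0.0) for cluster in memberships)

# An SLA in every base cluster versus a hypothetical SLA in four non-base clusters.
print(fitted_value(["IndusC6", "IncomeC5", "EduC1", "AgeC1"]))
print(fitted_value(["IndusC4", "IncomeC6", "EduC2", "AgeC2"]))
```

The difference between the two fitted values is the sum of the four coefficients involved, which is how each coefficient expresses a shift relative to the base clusters.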
Tables 5 and 6 show the regression results for the regional and metropolitan data separately. In the
regional regression model, more of the socio-economic factors (i.e. income, education, age) appear as significant.
In the metropolitan regression model, it is mainly the sustenance activities (i.e. industry of employment) that are
significant.
The goodness-of-fit measure (R²) indicates the overall fit between the model and the data (Gow, 2007). For
the regional model, the R² value (0.353) indicates that the independent variables used in this analysis explain
about 35% of the variation in the regional population data. Since the metropolitan regression model yielded a
relatively low R² (0.073), it is difficult to draw firm conclusions from it. Nevertheless, the relevant metropolitan
results are discussed briefly later to enable a comparison with the regional results.
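For reference, R² and adjusted R² can be computed as in the following sketch. The formulas are the standard textbook ones. The function names and the four data points are hypothetical, and the number of observations behind the paper's own values is not stated in this section.

```python
def r_squared(y, y_hat):
    """Share of the variation in y that is explained by the fitted values y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total sum of squares
    return 1.0 - ss_res / ss_tot

def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalise R^2 for the number of predictors (the intercept is not counted)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

# Hypothetical observed and fitted values, not taken from the paper.
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.5, 1.5, 3.5, 3.5]
r2 = r_squared(y, y_hat)
adj = adjusted_r_squared(r2, n_obs=len(y), n_predictors=1)
print(round(r2, 3), round(adj, 3))  # 0.8 0.7
```

Because the adjustment grows with the number of predictors, the adjusted value is always at or below R², as in the regional (0.353 versus 0.340) and metropolitan (0.073 versus 0.052) models.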
The relationships between the independent variables (sustenance activities and socio-economic factors) and the
dependent variable (population change) are not identical in magnitude or direction. For regional data,
three industry clusters were positively associated with population growth and are therefore linked with attracting
population to regional areas: accommodation and food services; professional, scientific and technical
services; and retail trade. Two industries produced negative associations with population growth: agriculture,
forestry and fishing; and mining.
Accommodation and food services (IndusC3) has the highest positive coefficient of 0.052. This industry has
a positive correlation with the 25-39 years old age group (r= 0.15, p<0.001) and the high level of tertiary and
postgraduate education cluster (r= 0.39, p<0.001). Professional, scientific and technical services (IndusC4)
produced the second-highest positive coefficient (0.049). This industry shows a positive correlation with the
25-39 years old age group (r= 0.28, p<0.001), the high level of tertiary and postgraduate education cluster
(r= 0.71, p<0.001), and the $2000 and more income level group (r= 0.44, p<0.001). Retail trade (IndusC1)
has the next level of positive coefficient (0.026). Retail trade correlates with the 40-64 years old (r= 0.23,
p<0.001) and the 65 and older age groups (r= 0.49, p<0.001) as well as the $1000-$1999 income level (r=
0.12, p<0.001).
The agriculture, forestry and fishing cluster (IndusC7) shows a negative coefficient of -0.034. This industry is
negatively correlated with both the 25-39 years old age group (r= -0.03, p<0.352) and the high
level of tertiary and postgraduate education cluster (r= -0.05, p<0.174), but neither correlation is statistically significant.
Similarly, the mining industry (IndusC5) has a negative impact on population growth, with a
coefficient of -0.100. A positive correlation exists between mining and the $2000 and more income level (r=
0.57, p<0.001). In the metropolitan model (Table 6), manufacturing (IndusC3) shows the lowest coefficient value (0.222) and a negative
correlation with the high level of tertiary and postgraduate education cluster (r= -0.10, p<0.001).
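The r values quoted above are correlation coefficients between pairs of SLA-level series. A minimal sketch of the Pearson calculation is given below. The paper does not state exactly which quantities were correlated, so the two 0/1 membership indicators used here are an assumption, and the function name and data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length numeric series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical membership indicators for six SLAs (1 = SLA belongs to the cluster).
industry_cluster = [1, 1, 0, 0, 1, 0]
age_cluster = [1, 0, 0, 0, 1, 1]
r = pearson_r(industry_cluster, age_cluster)
print(round(r, 2))  # 0.33
```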
As shown in Table 5, for regional areas the highest coefficient (0.136) belongs to the “$1000-$1999 and $2000
and more” income level (IncomeC6). The high level of tertiary and postgraduate education cluster (EduC2) has a positive
coefficient of 0.014, and is strongly correlated with the $1000-$1999 (r= 0.63, p<0.001) and the $2000
and more (r= 0.47, p<0.001) income levels. In terms of age, the 25-39 years old age group (AgeC2) has a
positive coefficient of 0.022. Not surprisingly, the 25-39 years old age group has a positive correlation (r=
0.20, p<0.001) with the high level of tertiary and postgraduate education.
The lack of strong explanatory power in the metropolitan regression model means that the following
results can only be broadly indicative. They are included for comparison with the regional
areas. For metropolitan data, six industry clusters were positively associated with population growth: retail
trade (2 clusters); health care and social assistance; professional, scientific and technical services;
agriculture, forestry and fishing; and manufacturing. The retail trade clusters show the highest (0.383) and
the second highest (0.334) positive coefficients. Health care and social assistance (IndusC1) has the third
highest coefficient value of 0.304. This industry shows a positive correlation with the 65 and older age group
(r= 0.13, p<0.001) as well as the $2000 and more income level (r= 0.10, p<0.001). The professional, scientific
and technical services industry shows the next strongest positive coefficient (0.296). This industry shows a
very strong correlation (r= 0.70, p<0.001) with the $2000 and more income level, as well as a correlation with the high
level of tertiary and postgraduate education (r= 0.15, p<0.001). The agriculture, forestry and fishing industry (IndusC7) has the
second lowest coefficient of 0.224 and a negative correlation (r= -0.13, p<0.001) with the $2000 and more
income level. Manufacturing shows the lowest coefficient value among the industry clusters (0.222). As shown in Table 6, for
metropolitan areas the 25-39 years old age group (AgeC2) shows a positive coefficient of 0.202 and very
strong positive correlations with the $1-$999 (r= 0.97, p<0.001) and the $1000-$1999 income levels (r= 0.94,
p<0.001), and with the high level of tertiary and postgraduate education (r= 0.96, p<0.001).
Discussion
This discussion concentrates on the regional results and only briefly covers the metropolitan results for
comparison purposes. Of the factors included in the analysis, socio-economic factors, in addition to
sustenance activities, explain most of the variation in population (in this study) for regional areas. The impact
of sustenance activities and other socio-economic factors varies in both type and strength. This results in a
great diversity of sustenance activities that support regions. Towns and cities in regional areas are specific
and diverse, and they should all be considered as fine gradations of a general trend. For regional data,
accommodation and food services; professional, scientific and technical services; and retail trade are all
positively related to population change, while mining and agriculture, forestry and fishing are negatively
related to population change.
Sustenance activities
For regional data three industry clusters were positively associated with population growth and therefore
linked with attracting population to regional areas:
The most significant effect was exerted by the accommodation and food services industry. It attracts young
people with higher education; however, this attraction could be temporary. Young people are highly
mobile and can move in and out of industries quickly. This is an industry that changes with fashion
and with trends in what is ‘in’ to visit, which makes demand (and thus the required supply) in this industry very
volatile. It also takes a relatively heavier toll during economic downturns. Within this industry cluster, 71%
of SLAs are gaining population and 29% are losing population.
The professional, scientific and technical services industry has the second highest impact on regional population
change. This industry is labour intensive and absorbs highly educated people, which can result in a high
degree of interaction between this industry, education and government. This in turn could help with the
economic development of specific regions. The interaction between the University of Ballarat in Australia
and the adjoining technology park is an example of this (Harvey, 2008). Within this industry cluster, 90% of
SLAs are gaining population and only 10% are losing population.
The retail trade industry has the next level of impact on regional population change. This industry is more
domestically based and depends to a great extent on trade within its own local community. Within
this industry cluster, 72% of SLAs are gaining population, as opposed to only 28% of SLAs that are
losing population.
The mining industry shows the highest negative impact on population change in regional areas. This industry is
relatively labour intensive and its income level is attractive; however, it does not attract many highly
educated people. As a result, the balance of population change for mining clusters in regional areas is
negative. Stimson et al. (1998) identified this earlier: they found that many of the regions that relied on
mining over the decade 1986-1996 experienced a loss in their national share of employment in mining.
Many people in this industry fly in and fly out, since the work is in relatively remote areas, and this practice
has increased over the last decade (Storey, 2001). The loss indicated for the mining clusters of this study could be
due to the fact that the data used (Census 2001-2006) do not fully reflect the mining boom of
the late 2000s. For the mining industry cluster, 53% of SLAs are losing population and 47% are gaining
population.
The agriculture, forestry and fishing industry is a commodity export industry and has a negative impact on
population change in regional areas. This industry is not labour intensive, and highly educated people are
absorbed by the professional, scientific and technical services industry, which indirectly provides services to
agriculture, forestry and fishing. Within the agriculture, forestry and fishing cluster, up to 62% of SLAs are
facing population loss and only 38% are gaining population. The explanation of the variation in regional
population data also relies on socio-economic factors, as described below:
Socio-economic factors
Income
Socio-economic factors impacting population change in regional areas have been identified and explained in
previous work (Mardaneh, 2010). The income levels of “$1000-$1999 and $2000 and more” have an impact on
population change. The $1000-$1999 income level is strongly associated with the accommodation and food services
and the retail trade sectors. Similarly, incomes of $2000 and more are strongly associated with the professional,
scientific and technical services, and the mining industries. Within this income-level cluster, 97% of SLAs
are gaining population and only 3% are losing population. The analysis of individual weekly income reveals that
weekly incomes over $1000 have a positive impact on population change.
Education
The high level of tertiary and postgraduate education cluster shows a positive impact on population change for
regional areas. This shows that a higher level of education contributes positively to population growth.
This in turn indicates the importance of placing educational institutions (universities and schools) close to
these population centres. Within the high level of tertiary and postgraduate education cluster, 85% of SLAs
are gaining population and 15% are losing population.
Age
The 40-64 and the 65 and older age groups do not have as much impact on population growth, whereas the 25-39
years old age group shows a positive impact on population change for regional areas. Ageing patterns are
crucial in understanding how to sustain population, and age categories in regional Australia appear to be
skewed compared with the urban capital centres. For the 25-39 years old age group cluster, 65% of SLAs are
gaining population and 35% are losing population.
In the metropolitan model, the explanatory factors appear mainly to be sustenance activities, and all
industries were positively related to population change. However, the relevant results are limited. Due to the
high population density in metropolitan areas, there are other factors (e.g. communication,
transportation, health provision, proximity to the metropolitan centre, size of the regional centre, etc.) that
could explain more of the variation in population (Chi, 2012; Ding, 2012). Since this study attempts to
compare identical factors between regional and metropolitan areas, those other possible factors were not
included; they could be the subject of a separate study.
The study reveals a negative net percentage of population change in regional as opposed to metropolitan
areas. This study builds upon the work of other researchers (Beer and Clower, 2009; Freestone et al.,
2003; Frisbie and Poston, 1975, 1978; Smith, 1965) but takes a different approach to investigating the impact
of sustenance activities and socio-economic factors on population change in regional and metropolitan
areas. The findings of the current study reinforce the importance of the role of these factors in the
development of the regions.
Conclusion
The paper introduced a new approach that can be used in the study of population change in regional and
metropolitan areas. The main contribution of the study to the urban studies literature is twofold. First, the
study explores the combined effect of socio-economic factors and sustenance activities on
population change. As the results show, rates of employment in some industries and occupations (e.g.
public administration and safety) in regional areas are much higher than in the traditional industries and
occupations (e.g. agriculture, forestry and fishing). This indicates that the development of some traditional
industries in regional areas does not necessarily attract population to these areas, as people’s
workplace and place of residence can sometimes be quite different. Socio-economic factors appear from this study to play
a more important role in regional areas (in addition to sustenance activities), particularly in the service-based
industries. Other factors not included in the study should be considered for a full explanation of population
variation in metropolitan areas.
The second contribution to the field is that the study used two distinct methods: a clustering algorithm (k-means)
is used to cluster SLAs in regional and metropolitan areas; and the impact of the emerging clusters on
population growth and decline is investigated separately for each cluster using regression models. In
summary, for regional areas the clusters contributing to population change are industries in association with
some socio-economic factors. For metropolitan areas, these are clusters that are mainly linked with service-based
sustenance activities. However, this is a weaker link in the results because there are many other
factors affecting population change in metropolitan areas.
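The first of the two stages can be sketched as follows. This is a plain Lloyd-style k-means, which is not necessarily the variant used in the study, followed by the conversion of cluster labels into the 0/1 dummy variables that a regression would take as inputs. The function names, the choice of the first k points as starting centres, and the five SLA profiles are all hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd-style k-means.

    Assumption: centres are initialised at the first k points, which keeps
    the sketch deterministic but is not a recommended initialisation.
    """
    centres = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centre.
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centres]
            labels[i] = dists.index(min(dists))
        # Update step: move each centre to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centres[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centres

def to_dummies(labels, k):
    """One 0/1 indicator per cluster for each observation."""
    return [[1 if lab == j else 0 for j in range(k)] for lab in labels]

# Hypothetical SLA profiles: (share employed in retail, share employed in mining).
slas = [(0.30, 0.02), (0.05, 0.40), (0.28, 0.03), (0.04, 0.38), (0.33, 0.01)]
labels, centres = kmeans(slas, k=2)
dummies = to_dummies(labels, 2)
print(labels)  # [0, 1, 0, 1, 0]
```

The resulting dummy columns play the role of variables such as IndusC1 or AgeC2 in the regression stage.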
The results of this study using the clustering method, particularly for regional areas, reveal specific patterns and help
to fill the gap in the regional economic development literature. Also, the more significant role of sustenance
activities in metropolitan vis-à-vis regional areas is identified. There are therefore implications for both
regional and metropolitan economic policy making. It is believed that, in order to generate a balance in
population change between regional and metropolitan areas, the sustenance activities and socio-economic
factors should be considered separately for regional and metropolitan areas.
These results indicate the need for more study of regional development in order to understand the impact of
population change on the viability of smaller communities. Metropolitan areas, on the other hand, are much
more complex, and the more important factors impacting population change in those areas still need to be identified.
References
Australian Bureau of Statistics, (2006) Australian Standard Geographical Classification (ASGC), Statistical
Geography 1, ABS catalogue No. 1216.0.
Australian Bureau of Statistics, (2009) Australian Standard Geographical Classification (ASGC), ABS
catalogue No. 1216.0.
Australian Bureau of Statistics, (2010a) Census CDATA online, 2006.
Australian Bureau of Statistics, (2010b) Regional Population Growth, Australia 2008-09, ABS catalogue No.
3218.0.
Bagirov, A. M. (2008) Modified global k-means algorithm for minimum sum-of-squares clustering problems,
Pattern Recognition, 41, pp. 3192−3199.
Bagirov, A. M. and Mardaneh, K. (2006) Modified global k-means Algorithm for Clustering in Gene
Expression Datasets, in: M. Boden and T. L. Bailey (Eds), Intelligent systems for Bioinformatics
2006, pp. 23−28. Australian Computer Society (ACS), Hobart.
Bayne, C. K., Beauchamp, J. J., Begovich, C. L. and Kane, V. E. (1980) Monte Carlo Comparisons of
Selected Clustering Procedures. Pattern Recognition, 12, pp. 51-62.
Beer, A. (1999) Regional Cities Within Australia’s Evolving Urban System, 1991-96, Australasian Journal of
Regional Studies, 5(3), pp. 329−348.
Beer, A. and Clower T. (2009) Specialisation and Growth: Evidence from Australia’s Regional Cities, Urban
Studies, 46, pp. 369−388.
Beer, A. and Keane, R. (2000) Population decline and service provision in regional Australia: A South
Australian Case Study, People and Place, 8(2), pp. 69−75.
Beer, A. and Maude, A. (1995) Regional Cities in The Australian Urban system, 1961-1991, Urban Policy
and Research, 13(3), pp. 135−148.
Beer, A., Maude, A. and Pritchard, B. (2003) Developing Australia’s Regions: theory and practice, pp.1−271.
University of New South Wales Press, Sydney.
Black, D. and Henderson, V. (1999) A Theory of Urban Growth, Journal of Political Economy, 107(2), pp.
252−284.
Brown, B. B. (1968) Delphi Process: A Methodology Used for the Elicitation of Opinions of Experts, ASTME
vectors, pp. 1−14.
Budge, T. (2006) Sponge Cities and Small Towns: a New Economic Partnership, in M. Rogers and D. R.
Jones (Eds), The Changing Nature of Australia’s Country Towns, pp. 38-52. VURRN press, Ballarat.
Calthorpe, P. and Fulton, W. (2001) The regional City: planning for the end of sprawl, pp.1−75. Washington
DC, Island press.
Chi, G. (2012) The Impacts of Transport Accessibility on Population Change across Rural, Suburban and
Urban Areas: A Case Study of Wisconsin at Sub-county Levels, Urban Studies, 49(12), pp. 2711−2731.
Curran, W. (2010) In Defence of Old Industrial Spaces: Manufacturing, Creativity and Innovation in
Williamsburg, Brooklyn, International Journal of Urban and Regional Research, 34(4), pp. 871−885.
Ding, C. (2012) Transport Development, Regional Concentration and Economic Growth, Urban Studies,
1(17), pp. 1-17.
Freestone, R., Murphy, P. and Jenner, A. (2003) The functions of Australian towns, revisited, Tijdschrift voor
economische en sociale geografie, 94(2), pp. 188−204.
Frisbie, W. P. and Poston, JR. D.L. (1975) Components of Sustenance Organization and Nonmetropolitan
Population Change: a Human Ecological Investigation, American Sociological review, 40(6), pp.
773−784.
Frisbie, W. P. and Poston, JR. D.L. (1976) The Structure of Sustenance Organization and Population
Change in Nonmetropolitan America, Rural Sociology, 41(3), pp. 354−370.
Frisbie, W. P. and Poston, JR. D.L. (1978) Sustenance Differentiation and Population Redistribution, Social
Forces, 57(1), pp. 42−56.
Goetz, S. J. and Debertin, D. L. (1996) Rural Population Decline in the 1980s: Impacts of Farm Structure and
Federal Farm Programs, American Journal of Agricultural Economics, 78, pp. 517−529.
Gow, D. J. (2007) Fundamentals of multiple regression analysis. 2010 ACSPRI summer course lecture
notes, pp.1−330. The Australian National University, Canberra.
Harvey, T. (2008) Transition- The IBM story, pp. 77-84. Switzer media and publishing, Woollahra, NSW.
Hutton, T.A. (2004) Service industries, globalization, and urban restructuring within the Asia-Pacific: new
development trajectories and planning responses, Progress in Planning, 61, pp. 1−74.
Karlsson, C. (1999) Spatial Industrial Dynamics in Sweden: Urban growth Industries, Growth and Change,
30, pp. 184−212.
Krakover, S. (1985) Spatio-Temporal Structure of Population Growth in Urban Regions: The Case of Tel-Aviv
and Haifa, Israel, Urban Studies, 22, pp. 317-328.
Krout, J. A. (1982) The Changing Impact of Sustenance Organization Activities on Non-Metropolitan Net
Migration, Sociological Focus, 15(1), pp. 1−13.
Lee, R. D. and Carter, L. R. (1992) Modelling and Forecasting U.S. Mortality, Journal of the American
Statistical Association, 87(419), pp. 659−671.
Mardaneh, K. (2010) Clustering Australian Regional Areas: An Optimisation Approach, in: P. Dalziel (Ed)
Innovation and Regions: Theory, Practice and Policy, pp. 99−110. AERU research group, Canterbury.
Mardaneh, K. (2012) Small-to-Medium Enterprises and Economic Growth: A Comparative Study of
Clustering Techniques. Journal of Modern Applied Statistical Methods, 11(2), 469-478.
Mezzich, J. E. (1978) Evaluation Clustering Methods for Psychiatric Diagnosis. Biological Psychiatry, 13, pp.
265-281.
Milligan, G. W. (1980) An Examination of the Effect of Six Types of Error Perturbation on Fifteen Clustering
Algorithms. Psychometrika, 45, pp. 325-342.
Millward, H. (2005) Rural Population Change in Nova Scotia, 1991-2001: Bivariate and multivariate analysis
of key drivers, The Canadian Geographer 49(2), pp. 180−197.
Murdock, S. H., Backman, K., Hwang, S.S. and Hamm, R.R. (1991) International Dimensions of Post-1980
Internal Migration in the United States: The Role of Sustenance Specialization and Dominance,
Sociological Inquiry, 61(4), pp. 492−504.
Nam, K-M. and Reilly, J.M. (2012) City Size Distribution as a Function of Socioeconomic Conditions: An
Eclectic Approach to Downscaling Global Population, Urban Studies, pp. 1-18.
O’Connor, K., Stimson, R. and Daly, M. (2001) Australia’s Changing Economic Geography: A society
Dividing, pp.1−120. Oxford University Press, Melbourne.
Poindexter, J.R. and Clifford, W.B. (1983) Components of Sustenance Organization and Nonmetropolitan
Population Change: The 1970s, Rural Sociology 48(3), pp. 421−435.
Polese, M. and Shearmur, R. (2006) Why some regions will decline: A Canadian case study with thoughts on
local development strategies, Papers in Regional Science, 85(1), pp. 23−46.
Punj, G., and Stewart, D.W. (1983) Cluster analysis in Marketing Research: Review and Suggestions for
Application. Journal of Marketing Research, 10, pp. 134-148.
Sanderson, W. C. (1998) Knowledge Can Improve Forecasts: A Review of Selected Socioeconomic
Population Projection Models, Population and Development Review, 24, pp. 88−117.
Shumway, J. M. and Davis, J. A. (1996) Nonmetropolitan Population Change in the Mountain West: 1970−1995,
Rural Sociology, 61(3), pp. 513−529.
Smith, A., Courvisanos, J., Tuck, J. and Mceachern, S. (2011) Building innovation capacity: the role of
human capital formation in enterprises, in: P. Curtin, J. Stanwick and F. Beddie (Eds) Fostering
Enterprise: The Innovation and Skills Nexus – Research Readings, pp.1−15. NCVER, Adelaide.
Smith, R. H.T. (1965) Method and Purpose in Functional Town Classification, Annals of the Association of
American Geographers, 55(3), pp. 539−548.
Sorensen, T. and Weinand, H. (1991) Regional Well-Being in Australia Revisited, Australian Geographical
Studies, 29(1), pp. 42−70.
Stimson, R. J., Shuaib, F.A. and O’Connor, K.B. (1998) Population and Employment in Australia, Regional
Hot Spots and Cold Spots, 1986 to 1996, pp.1−124. University of Queensland Press, Queensland.
Stimson, R. J., Stough, R.R. and Roberts, B.H. (2006) Regional Economic Development, Analysis and
Planning Strategy, pp.1−112. Springer-Verlag, Berlin.
Storey, K. (2001) Fly-in/fly-out and fly over: mining and regional development in Western Australia,
Australian Geographer, 32 (2), pp. 133−148.
Watts, M. J. (2009) The impact of Spatial Imbalance and Socioeconomic Characteristics on Average
Distance Commuted in the Sydney Metropolitan Area, Urban Studies, 46(2), pp. 317-339.