Population Change in Regional and Metropolitan Areas: Cluster Analysis Using Australian Data

Karim Mardaneh

The Business School Working Paper Series: 001 - 2013

Karim Mardaneh - The Business School, University of Ballarat, Mt Helen, Victoria, Australia 3353

Copyright © 2011

Working papers are in draft form. This working paper is distributed for purposes of comment and discussion only. It may not be reproduced without permission of the copyright holder. Copies of working papers are available from the author.

CRICOS Provider No. 00103D

Abstract

A large body of research focuses on population growth and decline; however, less attention has been paid to the possible effects of socio-economic factors and sustenance activities on population change. Using Australian Bureau of Statistics Census data for 2001-2006, the study examines the role of socio-economic factors and sustenance activities in population change in both regional and metropolitan areas. The novelty of the study is twofold. Conceptually, it compares the combined role of socio-economic factors and sustenance activities in population change; empirically, it uses cluster analysis to conduct the analysis. The results suggest that the explanatory factors for regional areas are mainly socio-economic, whereas those for metropolitan areas mainly relate to sustenance activities. The policy implications point to the need for further study of regional development in relation to socio-economic factors, particularly income, education and age, and their impact on the viability of regional areas.

Key Words: Population, Regional, Clustering, k-means, Sustenance activities

Introduction

This comparative study examines and contrasts the impact of selected socio-economic factors (individual weekly income, education level, age group) and sustenance activities (retail trade; mining; agriculture, forestry and fishing; and manufacturing) on population growth and decline in regional (non-metropolitan) and metropolitan areas. The most important activity in providing sustenance to the residents of an area is referred to as the key sustenance activity (Gibbs and Martin, 1959). It determines the level of resources available to the population, thus affecting the area’s pattern of population change, and its study is crucial in economic analysis (Stimson et al., 2006). The existing literature mainly focuses on socio-economic factors, and some studies consider sustenance activities. However, the possible population variation due to the combined effect of socio-economic factors and sustenance activities across regional and metropolitan areas has not been investigated. This study attempts to fill this gap in the literature by examining the possible population variation and patterns of activities within geographical areas using these factors and activities.

Researchers have examined population growth and decline based on the economic functions of population centres, employing a range of analytical methodologies (see, for example, Brown, 1968; Lee and Carter, 1992; Beer and Clower, 2009; Beer and Maude, 1995; Calthorpe and Fulton, 2001; Sanderson, 1998). Smith (1965) used industry of employment data to study the similarity of functional specialisation in different towns. Similar research has been conducted by Beer and colleagues in more recent years (see, for example, Beer and Keane, 2000; Beer et al., 2003). Some researchers have applied cluster analysis to larger data sets (see, for example, Freestone et al., 2003; Sorensen and Weinand, 1991).
Other researchers (see, for example, Beer and Clower, 2009; Beer and Maude, 1995) used both cluster and regression analysis to examine changes in the economic functions of towns between 1961 and 1991.

Cluster analysis is the task of assigning a set of objects to groups (called clusters) so that objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters (Bagirov, 2008; Bagirov and Mardaneh, 2006; Mardaneh, 2007). Clustering algorithms can be used to analyse large data sets comprising a myriad of economic, social and demographic variables for numerous samples (Statistical Local Areas, or SLAs, in this study). They seek to group samples with similar characteristics and to ensure maximum statistical separation from other, contrasting clusters. Through this process of pattern recognition, they simplify the understanding of large data sets. Empirical studies of the performance of clustering algorithms suggest that iterative clustering methods (e.g., k-means clustering) are preferable to hierarchical methods (e.g., Ward’s clustering) (Punj and Stewart, 1983). As clustering algorithms are applied to more and more observations, their performance tends to deteriorate. However, the k-means algorithm, despite the possibility of converging to a local minimum (Bagirov, 2008), has proven to be a more efficient clustering algorithm in many studies (see, for example, Bayne et al., 1980; Mezzich, 1978; Milligan, 1980), and it is more robust than any of the hierarchical methods with respect to the presence of outliers (Mardaneh, in-press).

The study addresses the clustering of geographical areas by taking an alternative approach originally proposed by Beer et al. (2003) and Freestone et al. (2003).
In this sense, the major contribution of the study to the urban studies literature is that it considers the combined effect of socio-economic factors and sustenance activities on population change. This requires large data sets and sophisticated clustering techniques for pattern recognition, which have seldom been used in population change research. This study uses the k-means clustering algorithm (one of the algorithms that can be used in cluster analysis) to cluster SLAs. Additionally, multiple regression analysis is used to determine the impact of the generated optimal clusters on population change.

The paper is organised as follows. The first section reviews the existing literature on the effects of socio-economic factors and sustenance activities on population change and sets a framework for explaining the possible population variation and pattern of activities. The second section describes the data and the analytical approach. The third section explains growth and decline in the regional and metropolitan SLAs. The fourth section presents the results of the cluster and regression analyses. Finally, the discussion and the conclusion follow.

Literature Review

Population change has been the focus of research for a long time. Many studies have considered a range of socio-economic factors influencing population change (see, for example, Black and Henderson, 1999; Goetz and Debertin, 1996; Millward, 2005; O’Connor et al., 2001; Polese and Shearmur, 2006; Watts, 2009; Nam and Reilly, 2012). A few of these studies are outlined here.

Sustenance activities are defined as activities through which a community’s resources, such as labour and capital, support the area’s population (Krout, 1982). Population growth of an area may not lie in the diversification of the economy alone but in the ability to select optimum economic activities (e.g.
agriculture, mining, manufacturing, specialised industrial and service sectors) for sustenance specialisation (Murdock et al., 1991). Shumway and Davis (1996) used sustenance activities to examine the impact of each on net migration. Frisbie and Poston (1975, 1978) examined population change against a number of socio-economic factors, such as income, as well as sustenance activities. Frisbie and Poston (1976) examined differences in the structure of sustenance activities in non-metropolitan counties of the U.S. for the 1960s and 1970s. Krout (1982) examined the impact of some sustenance activities (mining, manufacturing, agriculture, service, retail) on net migration for each of the 1960-1970 and 1970-1974 periods in the U.S. Poindexter and Clifford (1983) assessed the importance of sustenance activities in the period 1970 to 1980 in non-metropolitan counties of the U.S. Some studies consider service industries as sustenance activities (see, for example, Frisbie and Poston, 1976; Hutton, 2004; Poindexter and Clifford, 1983).

The findings of the above-mentioned studies are in some cases conflicting (e.g. agriculture and mining industries having a positive or a negative impact on net migration). They indicate that in most cases population increase and decrease are directly affected by socio-economic factors such as median income. Further, the most common factor identified in these studies as positively influencing population change is the constellation of sustenance activities, specifically in industries such as retail, wholesale and service. However, there is a lack of agreement on the extent of the impact of socio-economic factors and sustenance activities on population change. This issue is investigated in this study by separating out the regional and metropolitan areas.

Research on population change in both regional and metropolitan areas has been of continuing interest due to changing government policies. Stimson et al. (2006) specified the changing economic development policies and their impact on population change across different regions. During the 1950s and 1960s regional economic development became a priority for governments. From the 1970s, globalisation highlighted the issue of national and regional competitiveness. As a result, substantial economic restructuring affected the economy and demographics of regions, towns and cities. This economic structural adjustment caused population growth in some regions and population decline in others (Stimson et al., 1998). From the 1970s to the 1990s, these policy-based structural adjustments led to new approaches to understanding regional economic development. Through economic restructuring and adjustment, some industries became known as ‘sustenance activities’ and were pivotal to the growth of some regions while other regions declined.

Stimson et al. (2006) indicate that the distribution of sustenance activities varies between regional and metropolitan areas. Thus, their impact on population growth and decline is expected to differ by location. Almost two-thirds of the population of Australia live in metropolitan areas (Australian Bureau of Statistics, 2010b). Because of their sparser population distribution, regional areas lack critical mass. As a result, population growth and decline in these areas are influenced by only a few, relatively simple factors, whereas metropolitan areas involve more factors and a higher level of complexity. Metropolitan areas create new growth industries, generate employment (Karlsson, 1999), influence population change (Curran, 2010; Stimson et al., 1998; Krakover, 1985) and generate prosperity (Stimson et al., 2006).
The generative power of sustenance activities, as a base for understanding population change, is stronger in metropolitan areas, which have critical mass and complexity.

This brief overview of the research literature identifies a gap: there is uncertainty about the relative impact of socio-economic factors and sustenance activities on population change. Cluster analysis that separates regional and metropolitan data may assist in clarifying and disentangling this uncertainty. This is the research pursued below.

Objectives, Analytical Methodology and Data

The objective of this study is to examine and compare metropolitan and regional population variations in the context of socio-economic factors and sustenance activities. The study builds upon previous studies (see, for example, Beer and Maude, 1995; Beer, 1999; Beer and Clower, 2009) and uses the k-means clustering algorithm to cluster SLA data because of its recognised ability to analyse large data sets. The k-means algorithm treats each sample in a data set (an SLA in this study) as a point in n-dimensional space (R^n), chooses k centres (also called centroids) and assigns each point to the cluster with the nearest centre. The centre is the average of all the points in the cluster; that is, its coordinates are the arithmetic mean, for each dimension separately, over all the points in the cluster.

This study proposes that both socio-economic factors and sustenance activities, and their varied presence in different regions, play a significant role in population growth and decline. The analysis is conducted as a two-stage process, explained as follows:

1. In the first stage of the analysis SLAs are clustered based on the identified variables. All SLAs are clustered separately for each variable: industry of employment, individual weekly income, education level and age group.
The output of clustering is a set of clusters for each of the mentioned variables, and each cluster includes a set of SLAs. Clusters and variables are then cross-tabulated to obtain the mean value for each category of a variable (e.g. cross-tabulating industry of employment with cluster 1 shows the mean value for the categories of industry of employment: mining, retail trade, etc.). See Appendix 1 for an example of the industry of employment clusters. This mean value represents the percentage of employed people under each category of the industry of employment. The category with the highest mean value within the cluster is then identified (e.g. retail trade has the highest mean within cluster 1), and the cluster is named after that category. For example, since retail trade has the highest mean value in cluster 1, it is called the ‘retail trade’ cluster.

2. In the second stage of the analysis the generated clusters are used in a regression model to examine their impact on population change. This stage aims to identify the clusters that affect population growth and decline.

This study uses census data for 2001-2006 sourced from the ABS (Australian Bureau of Statistics, 2010a). The data include industry of employment (19 categories), occupation type (8 categories), employment status (6 categories), individual weekly income (18 categories), education level (5 categories), and age group (21 categories) (see Appendix 2 for details). Both the industry of employment and occupation type are derived from sustenance activities. The other four factors (employment status, individual weekly income, education level, age group) are socio-economic factors. The study used all industry and occupation categories. Since other variables were too detailed (e.g. individual weekly income; employment status, etc.)
and that level of detail was not necessary for the analysis, for simplicity those were merged into fewer categories. As a result, the six employment status categories were merged into employed, unemployed and not in the labour force. The eighteen income categories were collapsed into five categories: negative income, nil income, $1-$999, $1000-$1999, and $2000 and more. Age groups were merged into three new categories: 25-39, 40-64, and 65 and older. Education levels were collapsed into two new categories, described here as high level of tertiary and postgraduate, and low level of tertiary and postgraduate.

It was necessary to examine and analyse regional and metropolitan data separately in order to compare and contrast the results. The study separated SLA data into regional and metropolitan data based on the Australian Standard Geographical Classification (ASGC) (see ABS, 2006 and ABS, 2009). A decision was taken to include some SLAs that are technically ‘regional’ within the ‘metropolitan’ category. According to the ASGC’s classification, the study considered ‘major urban’ areas (SLAs with populations exceeding 100,000) as metropolitan SLAs. SLAs with fewer than 100,000 people that were on the fringes of ‘major urban’ areas were also considered metropolitan. Although these ‘fringe areas’ fall outside the expanding metropolitan sphere, they are the most likely to participate in the growth and change happening within metropolitan areas. In addition, a high proportion of their population travels for work to metropolitan areas (Budge, 2006). All remaining SLAs (with populations of less than 100,000) were considered regional. This procedure resulted in a split of 1431 SLAs into 726 regional and 705 metropolitan SLAs. Some SLAs showed extreme population growth or decline values and were considered outliers.
Outliers could skew the data and distort the analysis results, so they were eliminated, and 689 regional SLAs and 659 metropolitan SLAs were used in the analysis. Tables 1 and 2 provide a breakdown of these SLAs in terms of categories of population change. Allowance was made for the fact that there had been changes in some SLAs between 2001 and 2006.

Regional and Metropolitan SLA Growth and Decline

Population growth and decline rates are presented in Tables 1 and 2 in six categories encompassing all growth and decline rates. While 111 regional SLAs experienced a decline of more than 5%, only 34 metropolitan SLAs experienced the same level of decline. In total, 38% of regional SLAs had a declining population, whereas the rate for metropolitan areas is only 20%. A decline of more than 5% was experienced by 16% of regional SLAs but only 5% of metropolitan SLAs. This scenario reverses when population increase is considered: 11% of regional SLAs and 16% of metropolitan SLAs experienced an increase of more than 15%.

Table 1. Categories of population change (%) for regional SLAs

Category of change (%) | No. of SLAs | % of SLAs | Population 2001 (000’s) | Population 2006 (000’s)
Decline more than 5 | 111 | 16 | 289.9 | 262.0
Decline 0 to 5 | 153 | 22 | 1015.3 | 995.9
Increase 0 to 5 | 178 | 26 | 1627.2 | 1667.2
Increase 5-10 | 118 | 17 | 1431.1 | 1530.3
Increase 10-15 | 50 | 8 | 504.8 | 568.2
Increase 15-20 | 79 | 11 | 779.9 | 968.3
Total | 689 | 100 | 5648.2 | 5991.9

Table 2. Categories of population change (%) for metropolitan SLAs

Category of change (%) | No. of SLAs | % of SLAs | Population 2001 (000’s) | Population 2006 (000’s)
Decline more than 5 | 34 | 5 | 96.1 | 89.5
Decline 0 to 5 | 102 | 15 | 2190.9 | 2153.0
Increase 0 to 5 | 211 | 32 | 6100.1 | 6240.2
Increase 5-10 | 131 | 20 | 2606.2 | 2785.1
Increase 10-15 | 77 | 12 | 1140.0 | 1282.8
Increase 15-20 | 104 | 16 | 1633.4 | 2032.9
Total | 659 | 100 | 13766.7 | 14583.6

Results

A two-stage process of analysis was followed to produce the results outlined in this paper. In the first stage the k-means clustering algorithm was used to cluster all SLAs once for each variable (industry of employment, occupation type, employment status, individual weekly income, education level, and age group). In the second stage the clusters generated in stage one were used in a multiple regression model to examine their impact on population growth and decline.

Clustering analysis

SLAs were clustered using the k-means algorithm. To find the optimal number of clusters for each variable, solutions with 2 to 10 clusters were tested. The optimal number of clusters is the number that best reflects the underlying cluster structure of the data set and that separates the clusters most clearly according to their centroids (mean values). When the number of clusters is more than optimal, artificial clusters are generated; when the number is less than optimal, clusters are merged and, as a result, the centroids are not well separated. Additionally, when the number of clusters is not optimal, the chance of one category of a variable appearing repeatedly within more than one cluster increases. Clustering the data for the industry of employment, occupation type and individual weekly income resulted in an optimal number of 7 clusters. The optimal number for employment status was 3, and it was 2 for both the age group and the education level.
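
The procedure described above (running k-means for a range of cluster counts, then naming each cluster after the category with the highest centroid value) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the study's actual implementation: the nine toy SLAs and their employment shares are hypothetical, the farthest-first initialisation is chosen only to make the run deterministic, and the within-cluster sum of squares is one common way to compare cluster counts (the paper does not state which criterion was used). Because the toy data set has only nine points, it tests 2 to 4 clusters rather than 2 to 10.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def farthest_first(points, k):
    """Deterministic initial centres (an assumption, not the paper's method)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: assign to nearest centre, then recompute centres."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centre if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def within_ss(points, centroids, labels):
    """Total squared distance of points to their assigned centres."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))

def name_clusters(centroids, categories):
    """Name each cluster after the category with the highest mean value."""
    return [categories[max(range(len(categories)), key=lambda j: c[j])]
            for c in centroids]

# Hypothetical employment shares (%) per SLA; not data from the study.
categories = ["Retail Trade", "Mining", "Agriculture, Forestry and Fishing"]
slas = [
    [12.0, 1.0, 5.0], [13.0, 2.0, 4.0], [11.5, 1.5, 6.0],   # retail-heavy
    [5.0, 30.0, 3.0], [6.0, 28.0, 4.0], [4.0, 33.0, 2.0],   # mining-heavy
    [6.0, 1.0, 45.0], [7.0, 2.0, 47.0], [5.0, 1.0, 44.0],   # agriculture-heavy
]

results = {}
for k in range(2, 5):
    cents, labs = kmeans(slas, k)
    results[k] = (within_ss(slas, cents, labs), name_clusters(cents, categories))
    print(k, round(results[k][0], 1), results[k][1])
```

In this toy run, two clusters force the retail-heavy and mining-heavy SLAs into one merged group, while four clusters make one category name two clusters at once, which mirrors the symptoms of a non-optimal cluster count described above.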

Clusters for regional data are as follows: Industry of employment (7 clusters: IndusC1 – IndusC7); Occupation type (7 clusters: OccupC1 – OccupC7); Employment status (3 clusters: EmployC1 – EmployC3); Individual weekly income (7 clusters: IncomeC1 – IncomeC7); Education level (2 clusters: EduC1 – EduC2); Age group (2 clusters: AgeC1 – AgeC2).

Table 3. Variables and categories, clusters and mean values for each cluster (Regional)

Variable | Cluster | Category (name) | Highest mean score (%)
Industry of employment | IndusC1 | Retail Trade | 12.20
Industry of employment | IndusC2 | Public Administration and Safety | 53.20
Industry of employment | IndusC3 | Accommodation and Food Services | 23.75
Industry of employment | IndusC4 | Professional, Scientific and Technical Services | 13.12
Industry of employment | IndusC5 | Mining | 30.98
Industry of employment | IndusC6 | Agriculture, Forestry and Fishing | 23.01
Industry of employment | IndusC7 | Agriculture, Forestry and Fishing | 46.41
Occupation type | OccupC1 | Professional, Scientific and Technical Services | 28.75
Occupation type | OccupC2 | Technicians and Trades Workers | 16.75
Occupation type | OccupC3 | Managers | 24.20
Occupation type | OccupC4 | Labourers | 37.98
Occupation type | OccupC5 | Labourers | 37.17
Occupation type | OccupC6 | Managers | 41.13
Occupation type | OccupC7 | Machinery Operators And Drivers | 18.57
Employment status | EmployC1 | Employed | 47.11
Employment status | EmployC3 | Unemployed | 2.79
Employment status | EmployC3 | Not in labour | 29.32
Individual weekly income | IncomeC2 | Nil income | 6.39
Individual weekly income | IncomeC5 | Negative income | 1.91
Individual weekly income | IncomeC6 | $1000-$1999 | 18.48
Individual weekly income | IncomeC6 | $2000 and more | 8.83
Individual weekly income | IncomeC7 | $1-$999 | 56.56
Education level | EduC2 | High Tertiary and Postgraduate | 29.72
Age group | AgeC1 | 40-64 years old | 34.25
Age group | AgeC1 | 65 and more | 14.47
Age group | AgeC2 | 25-39 years old | 22.29

Table 4. Variables and categories, clusters and mean values for each cluster (Metropolitan)

Variable | Cluster | Category (name) | Highest mean score (%)
Industry of employment | IndusC1 | Health Care and Social Assistance | 13.17
Industry of employment | IndusC2 | Retail Trade | 11.27
Industry of employment | IndusC3 | Manufacturing | 15.40
Industry of employment | IndusC4 | Retail Trade | 12.29
Industry of employment | IndusC5 | Professional, Scientific and Technical Services | 12.90
Industry of employment | IndusC6 | Public Administration and Safety | 33.37
Industry of employment | IndusC7 | Agriculture, Forestry and Fishing | 20.12
Occupation type | OccupC1 | Managers | 23.95
Occupation type | OccupC2 | Labourers | 17.99
Occupation type | OccupC3 | Professional, Scientific and Technical Services | 36.88
Occupation type | OccupC4 | Professional, Scientific and Technical Services | 26.74
Occupation type | OccupC5 | Technicians and Trades Workers | 16.68
Occupation type | OccupC6 | Professional, Scientific and Technical Services | 19.83
Occupation type | OccupC7 | Technicians and Trades Workers | 18.39
Employment status | EmployC1 | Unemployed | 3.36
Employment status | EmployC1 | Not in labour | 27.88
Employment status | EmployC2 | Employed | 50.36
Individual weekly income | IncomeC1 | $1000-$1999 | 23.55
Individual weekly income | IncomeC3 | Nil income | 5.83
Individual weekly income | IncomeC3 | $2000 and more | 11.32
Individual weekly income | IncomeC5 | Negative income | 0.34
Individual weekly income | IncomeC5 | $1-$999 | 54.95
Education level | EduC2 | High Tertiary and Postgraduate | 40.44
Age group | AgeC1 | 40-64 years old | 31.95
Age group | AgeC1 | 65 and more | 15.89
Age group | AgeC2 | 25-39 years old | 22.25

Clusters generated for metropolitan data were similar to the regional ones; however, there were two clusters for employment (EmployC1 – EmployC2) and five clusters for income (IncomeC1 – IncomeC5). Tables 3 and 4 provide cluster descriptions. Cross-tabulating a variable (e.g. industry of employment) with a relevant cluster (e.g. IndusC1) reveals the mean scores for the different categories (e.g. retail trade, mining, etc.) of that particular variable. For example, within regional cluster 1 (IndusC1) the categories and their mean values are: agriculture, forestry and fishing (4.95), mining (1.52), manufacturing (10.33), construction (9.22), retail trade (12.20) and so on.
In effect, these values represent the percentage of people employed under each category of the industry of employment. A cluster is then named after the category that has the highest mean score. As Table 3 shows, the retail trade category has the highest mean score (12.20) within cluster IndusC1, so the cluster is considered the retail trade cluster. In Table 4, the Retail Trade category has the highest mean score within two clusters (IndusC2 and IndusC4), so it is reported twice.

As indicated in Table 3, individual weekly income includes seven clusters, and the number of clusters (7) exceeds the total number of categories (5). In this case, clusters are only reported if they include at least one category of income with the highest mean score. For regional individual weekly income the highest mean score (56.56) belongs to the $1-$999 category within cluster 7 (IncomeC7). For employment status, since the employed category had the highest mean score within all three relevant clusters (meaning that the majority of people belong to this category), the values for the other categories (unemployed, not in labour force) were masked. To overcome this problem, the two categories (unemployed and not in labour force) with the higher mean scores within cluster 3 were selected. For this reason the cluster (EmployC3) is reported twice in Table 3.

Each occupation type cluster includes a combination of different occupation categories with different mean scores. In this case, the highest mean score for regional areas belongs to managers within cluster 6 (OccupC6), and for metropolitan areas it belongs to professional, scientific and technical services within cluster 3 (OccupC3). This indicates that the proportion of employed people in the managers and professionals categories is higher than in all the other occupation types.
Both the managers and the professional, scientific and technical services categories have the highest mean score within more than one cluster. For education, in both regional and metropolitan areas the highest mean score relates to the high level of tertiary and postgraduate category (cluster EduC2, where a higher proportion of people have a high rather than a low level of tertiary and postgraduate education). For age group, two clusters emerged, with the highest mean score belonging to the 40-64 years old category for both regional and metropolitan areas. Cluster 1 (AgeC1) includes the highest mean score for both the 40-64 years old and the 65 and older categories in both areas.

As a result of the above process, the categories with the highest mean score are selected to represent each cluster (e.g. IndusC1, AgeC2, etc.); they are summarised in Tables 3 and 4. An overall comparison of the clusters between regional and metropolitan areas shows that in regional areas employment in the public administration and safety industry and in the managers occupation is higher. The share of people in the 40-64 years old age group is also higher in regional areas. In metropolitan areas employment in the professional, scientific and technical services occupation is higher. More people in metropolitan areas also have income levels of $1000-$1999 and $2000 and more, and a high level of tertiary and postgraduate education.

Regression analysis

The initial regression analysis was conducted by including sets of clusters (e.g. IndusC1, AgeC1, etc.) as dummy variables, rather than the master variables (e.g. industry of employment, age group, etc.), in separate regional and metropolitan regression models. In other words, these clusters (as independent variables) were used to examine their impact on population change (as the dependent variable). This analysis revealed that for regional areas neither occupation type nor employment status had a statistically significant impact on population change.
Therefore, these two were not included in the analysis. Their importance may have been subsumed under the industry of employment categories. For metropolitan areas only the industry of employment and the age group showed a statistically significant impact on population change, so only these two were included in the analysis.

Under each variable (industry of employment, individual weekly income, education level, age group), the cluster with the lowest coefficient value (associated with the lowest population growth) constitutes the ‘base’ (reference) cluster, and all the other clusters of that variable are compared against it. As a rule, when incorporating these clusters (as dummy variables) into a regression model, only n-1 clusters (where n is the number of clusters) are entered to represent the required information. As a result, the base clusters (IndusC6; IncomeC5; EduC1; AgeC1) do not appear in Tables 5 and 6. Comparing the other clusters to the base cluster helps to explain whether they contribute to population growth and, if they do, how significant their contribution is compared to the base (lowest) cluster. Tables 5 and 6 summarise the statistical results and the proportion of SLAs inside each cluster that are facing population growth or decline.

Table 5. Regional model regression results

Variable | Independent variable | Category (name) | Coefficient | % of member SLAs with negative population growth | % of member SLAs with positive population growth | Net % change inside cluster
 | Intercept | | 0.660 | | |
Industry of employment | IndusC1 | Retail Trade | 0.026 | 28 | 72 | 6
Industry of employment | IndusC2 | Public Administration and Safety | -0.014 | 48 | 52 | -1
Industry of employment | IndusC3 | Accommodation and Food Services | 0.052 | 29 | 71 | 5
Industry of employment | IndusC4 | Professional, Scientific and Technical Services | 0.049 | 10 | 90 | 19
Industry of employment | IndusC5 | Mining | -0.100 | 53 | 47 | -2
Industry of employment | IndusC7 | Agriculture, Forestry and Fishing | -0.034 | 62 | 38 | -1
Individual weekly income | IncomeC1 | $1-$999 | 0.007 | 28 | 72 | 7
Individual weekly income | IncomeC2 | Nil income | 0.006 | 43 | 57 | 3
Individual weekly income | IncomeC3 | $1-$999 | 0.026 | 27 | 73 | 11
Individual weekly income | IncomeC4 | $1-$999 | 0.021 | 19 | 81 | 6
Individual weekly income | IncomeC6 | $1000-$1999 & $2000 and more | 0.136 | 3 | 97 | 36
Individual weekly income | IncomeC7 | $1-$999 | -0.002 | 40 | 60 | 3
Education level | EduC2 | High level of tertiary and postgraduate | 0.014 | 15 | 85 | 24
Age group | AgeC2 | 25-39 years old | 0.022 | 35 | 65 | 12

Regional model: R² = 0.353; adjusted R² = 0.340; Coefficients in bold: Significant at the 95% level

Table 6. Metropolitan model regression results

Variable | Independent variable | Category (name) | Coefficient | % of member SLAs with negative population growth | % of member SLAs with positive population growth | Net % change inside cluster
 | Intercept | | 2.236 | | |
Industry of employment | IndusC1 | Health Care and Social Assistance | 0.304 | 9 | 91 | 7
Industry of employment | IndusC2 | Retail Trade | 0.334 | 14 | 86 | 9
Industry of employment | IndusC3 | Manufacturing | 0.222 | 20 | 80 | 7
Industry of employment | IndusC4 | Retail Trade | 0.383 | 14 | 86 | 9
Industry of employment | IndusC5 | Professional, Scientific and Technical Services | 0.296 | 11 | 89 | 8
Industry of employment | IndusC7 | Agriculture, Forestry and Fishing | 0.224 | 0 | 100 | 6
Age group | AgeC2 | 25-39 years old | 0.202 | 37 | 63 | 5

Metropolitan model: R² = 0.073; adjusted R² = 0.052; Coefficients in bold: Significant at the 95% level

Regression Models

Regional regression model: includes dummy variables associated with the industry of employment, individual weekly income, education level, and age group:

\[
\begin{aligned}
\text{Population change} = a &+ b_1\,\text{IndusC1} + b_2\,\text{IndusC2} + b_3\,\text{IndusC3} + b_4\,\text{IndusC4} + b_5\,\text{IndusC5} + b_6\,\text{IndusC7} \\
&+ b_7\,\text{IncomeC1} + b_8\,\text{IncomeC2} + b_9\,\text{IncomeC3} + b_{10}\,\text{IncomeC4} + b_{11}\,\text{IncomeC6} + b_{12}\,\text{IncomeC7} \\
&+ b_{13}\,\text{EduC2} + b_{14}\,\text{AgeC2}
\end{aligned}
\]

Metropolitan regression model: includes dummy variables associated with the industry of employment and age group:

\[
\text{Population change} = a + b_1\,\text{IndusC1} + b_2\,\text{IndusC2} + b_3\,\text{IndusC3} + b_4\,\text{IndusC4} + b_5\,\text{IndusC5} + b_6\,\text{IndusC7} + b_7\,\text{AgeC2}
\]

Tables 5 and 6 show the regression results for regional and metropolitan data separately. In the regional regression model more socio-economic factors (i.e. income, education, age) appear as significant. In the metropolitan regression model it is mainly sustenance activities (i.e. industry of employment) that are significant. The goodness-of-fit measure (R²) indicates the overall fit between the model and the data (Gow, 2007). Since the regression analysis yielded a relatively low R² (0.073) for the metropolitan regression model, it is difficult to draw any concrete conclusions from it. Nevertheless, the relevant issues are discussed briefly later to enable a comparison with the regional results. For the regional model, the R² value indicates that the independent variables used in this analysis explain about 35% of the variation in regional population change. The relationship between the independent variables (sustenance activities and socio-economic factors) and the dependent variable (population change) is not identical in magnitude or direction across clusters.
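
The dummy-variable coding described above (entering n-1 clusters and reading each coefficient relative to the base cluster) can be illustrated with a small sketch. It is an illustration under stated assumptions rather than the study's estimation code: the six observations and their population change values are hypothetical, only one variable (industry of employment) is included, and the hand-written normal-equations solver stands in for whatever statistical package was actually used.

```python
def dummy_code(labels, base):
    """Build a design matrix with an intercept and n-1 dummy columns."""
    levels = sorted(set(labels) - {base})  # the base cluster gets no column
    rows = [[1.0] + [1.0 if lab == lev else 0.0 for lev in levels]
            for lab in labels]
    return rows, ["Intercept"] + levels

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

def r_squared(X, y, coef):
    """Share of the variance in y explained by the fitted model."""
    fitted = [sum(c * x for c, x in zip(coef, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical cluster memberships and population change; not study data.
labels = ["IndusC6", "IndusC6", "IndusC1", "IndusC1", "IndusC5", "IndusC5"]
change = [0.00, 0.02, 0.03, 0.05, -0.10, -0.08]

X, names = dummy_code(labels, base="IndusC6")
coef = ols(X, change)
r2 = r_squared(X, change, coef)
for name, value in zip(names, coef):
    print(name, round(value, 3))
print("R squared:", round(r2, 3))
```

With a single set of dummies, the intercept equals the mean population change of the base cluster and each coefficient equals the difference between that cluster's mean and the base mean, which is why a positive coefficient reads as higher growth than the base cluster and a negative one as lower growth.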
For regional data, three industry clusters were positively associated with population growth and are therefore linked with attracting population to regional areas: accommodation and food services; professional, scientific and technical services; and retail trade. Two industries produced negative associations with population growth: agriculture, forestry and fishing; and mining.

Accommodation and food services (IndusC3) has the highest positive coefficient among the industry clusters (0.052). This industry has a positive correlation with the 25-39 years old age group (r = 0.15, p<0.001) and the high level of tertiary and postgraduate education cluster (r = 0.39, p<0.001). Professional, scientific and technical services (IndusC4) produced the second-highest positive coefficient (0.049). This industry shows a positive correlation with the 25-39 years old age group (r = 0.28, p<0.001), the high level of tertiary and postgraduate education cluster (r = 0.71, p<0.001), and the $2000 and more income level group (r = 0.44, p<0.001). Retail trade (IndusC1) has the next highest positive coefficient (0.026). Retail trade correlates with the 40-64 years old (r = 0.23, p<0.001) and the 65 and older age groups (r = 0.49, p<0.001), as well as with the $1000-$1999 income level (r = 0.12, p<0.001).

The agriculture, forestry and fishing cluster (IndusC7) shows a negative coefficient of -0.034. This industry is negatively correlated with both the 25-39 years old age group (r = -0.03, p = 0.352) and the high level of tertiary and postgraduate education cluster (r = -0.05, p = 0.174), but neither correlation is statistically significant. Similarly, the mining industry (IndusC5) is negatively associated with population growth, with a coefficient of -0.100. A positive correlation exists between mining and the $2000 and more income level (r = 0.57, p<0.001).

As shown in Table 5, for regional areas the highest coefficient overall (0.136) belongs to the "$1000-$1999 and $2000 and more" income level (IncomeC6). The high level of tertiary and postgraduate education cluster (EduC2) has a positive coefficient of 0.014 and is correlated with both the $1000-$1999 income level (r = 0.63, p<0.001) and the $2000 and more income level (r = 0.47, p<0.001). In terms of age, the 25-39 years old age group (AgeC2) has a positive coefficient of 0.022. Not surprisingly, the 25-39 years old age group has a positive correlation (r = 0.20, p<0.001) with the high level of tertiary and postgraduate education.

The lack of strong explanatory results from the metropolitan regression model means that the following results can only be broadly indicative. They are included for comparison with the regional areas. For metropolitan data, six industry clusters were positively associated with population growth: retail trade (two clusters); health care and social assistance; professional, scientific and technical services; agriculture, forestry and fishing; and manufacturing. The two retail trade clusters show the highest (0.383) and the second highest (0.334) positive coefficients. Health care and social assistance (IndusC1) has the third highest coefficient (0.304). This industry shows a positive correlation with the 65 and older age group (r = 0.13, p<0.001) as well as with the $2000 and more income level (r = 0.10, p<0.001). The professional, scientific and technical services industry shows the next strongest positive coefficient (0.296). This industry shows a very strong correlation with the $2000 and more income level (r = 0.70, p<0.001) and a weaker one with the high tertiary and postgraduate education cluster (r = 0.15, p<0.001). The agriculture, forestry and fishing industry (IndusC7) has the second lowest coefficient (0.224) and a negative correlation (r = -0.13, p<0.001) with the $2000 and more income level. Manufacturing (IndusC3) is negatively correlated with the high tertiary and postgraduate education cluster (r = -0.10, p<0.001).
Manufacturing shows the lowest coefficient value (0.222). As shown in Table 6, for metropolitan areas the 25-39 years old age group (AgeC2) shows a positive coefficient of 0.202 and very strong positive correlations with the $1-$999 (r = 0.97, p<0.001) and $1000-$1999 (r = 0.94, p<0.001) income levels and with the high tertiary and postgraduate education cluster (r = 0.96, p<0.001).

Discussion

This discussion concentrates on the regional results and covers the metropolitan results only briefly, for comparison. Of the factors included in the analysis, socio-economic factors, in addition to sustenance activities, explain most of the variation in population for regional areas in this study. The impact of sustenance activities and other socio-economic factors varies in both type and strength. This results in a great diversity of sustenance activities that support regions. Towns and cities in regional areas are specific and diverse, and they should all be considered as fine gradations of a general trend. For regional data, accommodation and food services, professional, scientific and technical services, and retail trade are all positively related to population change, while mining and agriculture, forestry and fishing are negatively related to population change.

Sustenance activities

For regional data, three industry clusters were positively associated with population growth and are therefore linked with attracting population to regional areas. The most significant effect was exerted by the accommodation and food services industry. It attracts young people with higher education; however, this attraction could be temporary. Young people are highly mobile and can move in and out of industries quickly. This is an industry that changes with fashion and with trends in what is 'in' to visit, which makes demand (and thus required supply) in this industry very volatile. It also suffers a relatively heavier toll during economic downturns.
Within this industry cluster, 71% of SLAs are facing population increase, as opposed to 29% that are losing population.

The professional, scientific and technical services industry has the second highest impact on regional population change. This industry is labour intensive and absorbs highly educated people, which can result in a high level of interaction between this industry, education and the government. This in turn could help with the economic development of specific regions. The interaction between the University of Ballarat in Australia and the adjoining technology park is an example of this (Harvey, 2008). Within this industry cluster, 90% of SLAs are gaining population and only 10% are losing population.

The retail trade industry has the next level of impact on regional population change. This industry is more domestically based and depends to a great extent on trade activities within its own local community. Within this industry cluster, 72% of SLAs are facing population increase, as opposed to only 28% of SLAs that are losing population.

The mining industry shows the highest negative impact on population change in regional areas. This industry is relatively labour intensive and its income level is attractive; however, it does not attract many highly educated people. As a result, the balance of population change for mining clusters in regional areas is negative. Stimson et al. (1998) identified this earlier: they found that many of the regions that relied on mining over the decade 1986-1996 experienced a loss in their national share of employment in mining. Many people in this industry fly in and fly out, as it involves work in relatively remote areas, and this practice has increased over the last decade (Storey, 2001).
The loss indicated for the mining clusters in this study could be due to the fact that the data used (Census 2001-2006) do not fully reflect the mining boom of the late 2000s. For the mining industry cluster, 53% of SLAs are losing population and 47% are gaining population.

The agriculture, forestry and fishing industry is a commodity export industry and has a negative impact on population change in regional areas. This industry is not labour intensive, and highly educated people are absorbed by the professional, scientific and technical services industry, which indirectly provides services to agriculture, forestry and fishing. Within the agriculture, forestry and fishing cluster, 62% of SLAs are facing population loss and only 38% are gaining population.

Explanation of the variation in regional population data also relies on socio-economic factors, as described below.

Socio-economic factors

Income

Socio-economic factors impacting population change in regional areas have been identified and explained in previous work (Mardaneh, 2010). The "$1000-$1999 and $2000 and more" income levels have an impact on population change. The $1000-$1999 income level is strongly associated with the accommodation and food services and the retail trade sectors. Similarly, incomes of $2000 and more are strongly associated with the professional, scientific and technical services and the mining industries. Within this income cluster, 97% of SLAs are gaining population and only 3% are losing population. Analysis of individual weekly income reveals that weekly incomes over $1000 have a positive impact on population change.

Education

The high level of tertiary and postgraduate education cluster shows a positive impact on population change for regional areas. This suggests that a higher level of education contributes positively to population growth. This in turn indicates the importance of placing educational institutions (universities and schools) close to these population centres.
Within the high level of tertiary and postgraduate education cluster, 85% of SLAs are gaining population and 15% are losing population.

Age

The 40-64 and the 65 and older age groups do not have as much impact on population growth, whereas the 25-39 years old age group shows a positive impact on population change for regional areas. Ageing patterns are crucial in understanding how to sustain population, and age categories in regional Australia appear to be skewed compared with the urban capital centres. For the 25-39 years old age group cluster, 65% of SLAs are gaining population and 35% are losing population.

In the metropolitan model, the explanatory factors appear to be mainly sustenance activities, and all industries were positively related to population change. However, the relevant results are limited. Owing to the high population density in metropolitan areas, there are other factors (e.g. communication, transportation, health provision, proximity to the metropolitan centre, size of the regional centre, etc.) that could explain more of the variation in population (Chi, 2012; Ding, 2012). Since this study attempts to compare identical factors between regional and metropolitan areas, those other possible factors were not included; they could be the subject of a separate study. The study reveals a negative net percentage of population change in regional as opposed to metropolitan areas.

This study builds upon the work of other researchers (Beer and Clower, 2009; Freestone et al., 2003; Frisbie and Poston, 1975, 1978; Smith, 1965) but takes a different approach to investigate the impact of sustenance activities and socio-economic factors on population change in regional and metropolitan areas. The findings of the current study reinforce the importance of the role of these factors in the development of the regions.
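The cluster-level percentages quoted throughout the discussion above (the share of SLAs in a cluster gaining or losing population) can be tabulated with a short sketch. This is an illustration under stated assumptions, not the author's procedure: the function name `cluster_growth_summary` and the records are hypothetical, SLAs with exactly zero change are counted in neither share, and the net percentage change is assumed here to be the change in the cluster's total population, a definition the paper does not spell out.

```python
from collections import defaultdict


def cluster_growth_summary(records):
    """Summarise (cluster, start_population, end_population) records per cluster."""
    groups = defaultdict(list)
    for cluster, start, end in records:
        groups[cluster].append((start, end))

    summary = {}
    for cluster, rows in groups.items():
        n = len(rows)
        losing = sum(1 for start, end in rows if end < start)
        gaining = sum(1 for start, end in rows if end > start)
        total_start = sum(start for start, _ in rows)
        total_end = sum(end for _, end in rows)
        summary[cluster] = {
            # share of SLAs with population decline / growth, in whole percent
            "pct_negative": round(100 * losing / n),
            "pct_positive": round(100 * gaining / n),
            # assumed definition: change in the cluster's total population
            "net_pct_change": round(100 * (total_end - total_start) / total_start, 1),
        }
    return summary


# Hypothetical SLA records, for illustration only (not Census data).
records = [
    ("Mining", 1000, 950),
    ("Mining", 2000, 2100),
    ("Mining", 1500, 1400),
    ("Retail Trade", 3000, 3200),
    ("Retail Trade", 2500, 2600),
    ("Retail Trade", 4000, 3900),
    ("Retail Trade", 1000, 1100),
]

summary = cluster_growth_summary(records)
for name, stats in summary.items():
    print(name, stats)
```

A cluster can show a majority of SLAs gaining population and still have a small or negative net change if the losing SLAs are large, which is one reason the tables report the shares and the net change as separate columns.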
Conclusion

The paper introduced a new approach that can be used in the study of population change in regional and metropolitan areas. The main contribution of the study to the urban studies literature is twofold. First, the study explores the combined effect of socio-economic factors and sustenance activities on population change. As the results show, rates of employment in some industries and occupations (e.g. public administration and safety) in regional areas are much higher than in the traditional industries and occupations (e.g. agriculture, forestry and fishing). This indicates that the development of some traditional industries in regional areas does not necessarily attract population to these areas, as people's workplace and where they live can sometimes be quite different. Socio-economic factors appear from this study to play a more important role in regional areas (in addition to sustenance activities), particularly in the service-based industries. Other factors not included in the study should be considered for a full explanation of population variation in metropolitan areas.

The second contribution to the field is that the study used two distinct methods: a clustering algorithm (k-means) is used to cluster SLAs in regional and metropolitan areas, and the impact of the emerging clusters on population growth and decline is then investigated using regression models. In summary, for regional areas the clusters contributing to population change are industries in association with some socio-economic factors. For metropolitan areas these are clusters that are mainly linked with service-based sustenance activities. However, this is a weaker link in the results because many other factors affect population change in metropolitan areas. The results of this study using the clustering method, particularly for regional areas, reveal specific patterns and help to fill the gap in the regional economic development literature.
Also, the more significant role of sustenance activities in metropolitan vis-à-vis regional areas is identified. There are therefore implications for both regional and metropolitan economic policy making. To generate a balance in population change between regional and metropolitan areas, sustenance activities and socio-economic factors should be considered separately for regional and metropolitan areas. These results indicate the need for more study on regional development to understand the impact of population change on the viability of smaller communities. Metropolitan areas, on the other hand, are much more complex, and the more important factors impacting population change in those areas still need to be identified.

References

Australian Bureau of Statistics (2006) Australian Standard Geographical Classification (ASGC), Statistical Geography 1, ABS catalogue No. 1216.0.
Australian Bureau of Statistics (2009) Australian Standard Geographical Classification (ASGC), ABS catalogue No. 1216.0.
Australian Bureau of Statistics (2010a) Census CDATA online, 2006.
Australian Bureau of Statistics (2010b) Regional Population Growth, Australia 2008-09, ABS catalogue No. 3218.0.
Bagirov, A. M. (2008) Modified global k-means algorithm for minimum sum-of-squares clustering problems, Pattern Recognition, 41, pp. 3192-3199.
Bagirov, A. M. and Mardaneh, K. (2006) Modified global k-means Algorithm for Clustering in Gene Expression Datasets, in: M. Boden and T. L. Bailey (Eds), Intelligent Systems for Bioinformatics 2006, pp. 23-28. Australian Computer Society (ACS), Hobart.
Bayne, C. K., Beauchamp, J. J., Begovich, C. L. and Kane, V. E. (1980) Monte Carlo Comparisons of Selected Clustering Procedures, Pattern Recognition, 12, pp. 51-62.
Beer, A. (1999) Regional Cities Within Australia's Evolving Urban System, 1991-96, Australasian Journal of Regional Studies, 5(3), pp. 329-348.
Beer, A. and Clower, T. (2009) Specialisation and Growth: Evidence from Australia's Regional Cities, Urban Studies, 46, pp. 369-388.
Beer, A. and Keane, R. (2000) Population decline and service provision in regional Australia: A South Australian Case Study, People and Place, 8(2), pp. 69-75.
Beer, A. and Maude, A. (1995) Regional Cities in The Australian Urban System, 1961-1991, Urban Policy and Research, 13(3), pp. 135-148.
Beer, A., Maude, A. and Pritchard, B. (2003) Developing Australia's Regions: theory and practice, pp. 1-271. University of New South Wales Press, Sydney.
Black, D. and Henderson, V. (1999) A Theory of Urban Growth, Journal of Political Economy, 107(2), pp. 252-284.
Brown, B. B. (1968) Delphi Process: A Methodology Used for the Elicitation of Opinions of Experts, ASTME Vectors, pp. 1-14.
Budge, T. (2006) Sponge Cities and Small Towns: a New Economic Partnership, in: M. Rogers and D. R. Jones (Eds), The Changing Nature of Australia's Country Towns, pp. 38-52. VURRN Press, Ballarat.
Calthorpe, P. and Fulton, W. (2001) The Regional City: planning for the end of sprawl, pp. 1-75. Island Press, Washington DC.
Chi, G. (2012) The Impacts of Transport Accessibility on Population Change across Rural, Suburban and Urban Areas: A Case Study of Wisconsin at Sub-county Levels, Urban Studies, 49(12), pp. 2711-2731.
Curran, W. (2010) In Defence of Old Industrial Spaces: Manufacturing, Creativity and Innovation in Williamsburg, Brooklyn, International Journal of Urban and Regional Research, 34(4), pp. 871-885.
Ding, C. (2012) Transport Development, Regional Concentration and Economic Growth, Urban Studies, 1(17), pp. 1-17.
Freestone, R., Murphy, P. and Jenner, A. (2003) The functions of Australian towns, revisited, Tijdschrift voor economische en sociale geografie, 94(2), pp. 188-204.
Frisbie, W. P. and Poston, D. L., Jr. (1975) Components of Sustenance Organization and Nonmetropolitan Population Change: a Human Ecological Investigation, American Sociological Review, 40(6), pp. 773-784.
Frisbie, W. P. and Poston, D. L., Jr. (1976) The Structure of Sustenance Organization and Population Change in Nonmetropolitan America, Rural Sociology, 41(3), pp. 354-370.
Frisbie, W. P. and Poston, D. L., Jr. (1978) Sustenance Differentiation and Population Redistribution, Social Forces, 57(1), pp. 42-56.
Goetz, S. J. and Debertin, D. L. (1996) Rural Population Decline in the 1980s: Impacts of Farm Structure and Federal Farm Programs, American Journal of Agricultural Economics, 78, pp. 517-529.
Gow, D. J. (2007) Fundamentals of multiple regression analysis, 2010 ACSPRI summer course lecture notes, pp. 1-330. The Australian National University, Canberra.
Harvey, T. (2008) Transition: The IBM story, pp. 77-84. Switzer Media and Publishing, Woollahra, NSW.
Hutton, T. A. (2004) Service industries, globalization, and urban restructuring within the Asia-Pacific: new development trajectories and planning responses, Progress in Planning, 61, pp. 1-74.
Karlsson, C. (1999) Spatial Industrial Dynamics in Sweden: Urban Growth Industries, Growth and Change, 30, pp. 184-212.
Krakover, S. (1985) Spatio-Temporal Structure of Population Growth in Urban Regions: The Case of Tel-Aviv and Haifa, Israel, Urban Studies, 22, pp. 317-328.
Krout, J. A. (1982) The Changing Impact of Sustenance Organization Activities on Non-Metropolitan Net Migration, Sociological Focus, 15(1), pp. 1-13.
Lee, R. D. and Carter, L. R. (1992) Modelling and Forecasting U.S. Mortality, Journal of the American Statistical Association, 87(419), pp. 659-671.
Mardaneh, K. (2010) Clustering Australian Regional Areas: An Optimisation Approach, in: P. Dalziel (Ed), Innovation and Regions: Theory, Practice and Policy, pp. 99-110. AERU Research Group, Canterbury.
Mardaneh, K. (2012) Small-to-Medium Enterprises and Economic Growth: A Comparative Study of Clustering Techniques, Journal of Modern Applied Statistical Methods, 11(2), pp. 469-478.
Mezzich, J. E. (1978) Evaluating Clustering Methods for Psychiatric Diagnosis, Biological Psychiatry, 13, pp. 265-281.
Milligan, G. W. (1980) An Examination of the Effect of Six Types of Error Perturbation on Fifteen Clustering Algorithms, Psychometrika, 45, pp. 325-342.
Millward, H. (2005) Rural Population Change in Nova Scotia, 1991-2001: Bivariate and multivariate analysis of key drivers, The Canadian Geographer, 49(2), pp. 180-197.
Murdock, S. H., Backman, K., Hwang, S. S. and Hamm, R. R. (1991) International Dimensions of Post-1980 Internal Migration in the United States: The Role of Sustenance Specialization and Dominance, Sociological Inquiry, 61(4), pp. 492-504.
Nam, K-M. and Reilly, J. M. (2012) City Size Distribution as a Function of Socioeconomic Conditions: An Eclectic Approach to Downscaling Global Population, Urban Studies, pp. 1-18.
O'Connor, K., Stimson, R. and Daly, M. (2001) Australia's Changing Economic Geography: A Society Dividing, pp. 1-120. Oxford University Press, Melbourne.
Poindexter, J. R. and Clifford, W. B. (1983) Components of Sustenance Organization and Nonmetropolitan Population Change: The 1970s, Rural Sociology, 48(3), pp. 421-435.
Polese, M. and Shearmur, R. (2006) Why some regions will decline: A Canadian case study with thoughts on local development strategies, Papers in Regional Science, 85(1), pp. 23-46.
Punj, G. and Stewart, D. W. (1983) Cluster Analysis in Marketing Research: Review and Suggestions for Application, Journal of Marketing Research, 10, pp. 134-148.
Sanderson, W. C. (1998) Knowledge Can Improve Forecasts: A Review of Selected Socioeconomic Population Projection Models, Population and Development Review, 24, pp. 88-117.
Shumway, J. M. and Davis, J. A. (1996) Nonmetropolitan Population Change in the Mountain West: 1970-1995, Rural Sociology, 61(3), pp. 513-529.
Smith, A., Courvisanos, J., Tuck, J. and McEachern, S. (2011) Building innovation capacity: the role of human capital formation in enterprises, in: P. Curtin, J. Stanwick and F. Beddie (Eds), Fostering Enterprise: The Innovation and Skills Nexus – Research Readings, pp. 1-15. NCVER, Adelaide.
Smith, R. H. T. (1965) Method and Purpose in Functional Town Classification, Annals of the Association of American Geographers, 55(3), pp. 539-548.
Sorensen, T. and Weinand, H. (1991) Regional Well-Being in Australia Revisited, Australian Geographical Studies, 29(1), pp. 42-70.
Stimson, R. J., Shuaib, F. A. and O'Connor, K. B. (1998) Population and Employment in Australia, Regional Hot Spots and Cold Spots, 1986 to 1996, pp. 1-124. University of Queensland Press, Queensland.
Stimson, R. J., Stough, R. R. and Roberts, B. H. (2006) Regional Economic Development, Analysis and Planning Strategy, pp. 1-112. Springer-Verlag, Berlin.
Storey, K. (2001) Fly-in/fly-out and fly over: mining and regional development in Western Australia, Australian Geographer, 32(2), pp. 133-148.
Watts, M. J. (2009) The Impact of Spatial Imbalance and Socioeconomic Characteristics on Average Distance Commuted in the Sydney Metropolitan Area, Urban Studies, 46(2), pp. 317-339.