Insights from analysing tourist expenditure using quantile regression

António Almeida
University of Madeira

Brian Garrod*
Aberystwyth University, UK

*Corresponding author. School of Management and Business, Aberystwyth University, Llanbadarn Campus, Aberystwyth, Ceredigion, Wales, SY23 3AL, UK

Abstract
Mature tourism destinations increasingly need to diversify their products and markets. To succeed, such strategies require a detailed understanding of potential tourists’ levels and patterns of spending. Empirical studies of tourist expenditure have tended to use ordinary least squares regression for this purpose. The technique has several important limitations, however. Chief among them is its inability to distinguish between tourists whose spending is above average and those whose spending is below it. Some researchers therefore recommend an alternative estimation technique, known as quantile regression, which does allow such distinctions to be made. This study uses a single dataset, collected among rural tourists in Madeira, to analyse the determinants of tourist expenditure with both techniques. This enables a direct comparison and illustrates the additional insights to be gained from quantile regression.

Key words: Quantile regression; Expenditure levels; Expenditure patterns; Determinants; Madeira

1. Introduction
Mature tourism destinations tend to operate in slow-moving and highly competitive markets (Chapman & Speake, 2011; García-Falcón & Medina-Muñoz, 1999; Ismeri Europa, 2011). Consequently, they often exhibit slow-growing or even declining tourist arrivals, a reduction in tourists’ average length of stay, declining tourist spending, over-dependence on repeat tourists and erratic business performance (Sharpley, 2002, 2003).
Diversification, into either new products or new markets, has frequently been recommended as a suitable strategic response (Baum & Hagen, 1999; Benur & Bramwell, 2015; Boukas & Ziakas, 2013; Canavan, 2014; Sharpley, 2002). The objective is typically to attract new tourists with higher-than-average spending (Chapman & Speake, 2011). Tourist expenditure patterns have been a recurrent theme in micro-level tourism research (Thrane & Farstad, 2011), but most previous studies have used ordinary least squares (OLS) regression as the estimation method (Brida & Scuderi, 2012). A major limitation of this technique, however, is that it provides estimates only at the conditional mean of the dependent variable. OLS regression therefore does not allow the researcher to explore the determinants of expenditure among those who spend more or less than the average. This is unhelpful when the purpose of the exercise is to identify high-spending tourists so that they can be targeted effectively by marketing campaigns (Kozak & Martin, 2012). Some researchers recommend an alternative estimation method, known as quantile regression, which can analyse the determinants of tourist expenditure at different levels, or ‘quantiles’, of expenditure (e.g. Lew & Ng, 2011; Thrane, 2014). This makes it particularly suitable for segmented markets such as those found in tourism. Quantile regression can thus generate more nuanced, and ultimately more useful, information to guide strategies such as product or market diversification. This research note demonstrates the additional insights that can be gained by using quantile regression. Both OLS and quantile regression are applied to a single dataset, collected from rural tourists in Madeira, to illustrate the merits of the latter.

2. Theoretical background
A substantial number of studies have attempted to identify the key determinants of tourist expenditure. Brida and Scuderi (2012), for example, review some 86 such studies.
Wang and Davidson (2010), meanwhile, review 27 micro-economic studies of tourist expenditure. Given the space restrictions of this research note, interested readers are referred to these works. Most of these studies used variants of OLS regression. Some recent studies, however, have begun to use quantile regression as an alternative estimation method for analysing tourism expenditure (e.g. Chen & Chang, 2012; Kuo & Lu, 2013; Lew & Ng, 2011; Marrocu et al., 2015; Othmen, 2013; Saayman & Saayman, 2012, 2014; Santos & Cabral Vieira, 2012; Thrane, 2014).

The majority of tourist expenditure studies to date have included a mixture of socio-demographic and trip-related variables (Brida & Scuderi, 2012). Among the former, the most frequently used variables are age, gender, income and level of education. Among the latter, the variables typically included are accommodation type, travelling group, time of booking, number of visits per year and repeat visits. Destination attributes, tourist motivations and indicators of satisfaction are also sometimes examined.

The literature emphasises several advantages of quantile regression over OLS. Firstly, OLS estimates the conditional mean function by minimising the sum of squared residuals. It therefore ignores any differences in the effects of the explanatory variables across the conditional distribution, i.e. across different ‘quantiles’ (Zhang & Leonard, 2014). Often such differential effects are the most crucial piece of information in explaining the relationship in question. Secondly, traditional OLS regression assumes that the conditional variance of the dependent variable is constant (i.e. homoscedastic) for all values of the covariates. In the data used here, however, the Breusch-Pagan/Cook-Weisberg test for heteroscedasticity rejects the null hypothesis of constant variance.
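To make the heteroscedasticity check mentioned above concrete, the sketch below implements a Breusch-Pagan/Cook-Weisberg-style score test for a single regressor in plain Python. It is a minimal illustration only, assuming normally distributed errors and using the fitted values as the variance predictor (our understanding of the default form of this test). The function names and the data are hypothetical; this is not the authors' Stata code, and the data are not the Madeira survey.

```python
import math


def ols_line(x, y):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def bp_cook_weisberg(x, y):
    """Score test of H0: constant error variance (one regressor).

    Squared OLS residuals are scaled by sigma^2 = RSS/n and regressed on the
    fitted values; half the explained sum of squares of that auxiliary
    regression is compared with a chi-squared distribution with 1 d.f.
    Returns (statistic, p_value).
    """
    a, b = ols_line(x, y)
    fitted = [a + b * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    n = len(y)
    sigma2 = sum(e * e for e in resid) / n
    g = [e * e / sigma2 for e in resid]  # scaled squared residuals, mean 1
    ga, gb = ols_line(fitted, g)          # auxiliary regression on fitted values
    g_mean = sum(g) / n
    ess = sum((ga + gb * f - g_mean) ** 2 for f in fitted)
    stat = ess / 2.0
    # Upper tail of chi-squared(1): P(Z^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


if __name__ == "__main__":
    x = list(range(1, 21))
    # Hypothetical data whose error spread grows with x (heteroscedastic)...
    y_het = [2 + 0.5 * xi + (-1) ** xi * 0.2 * xi for xi in x]
    # ...and hypothetical data whose error spread is constant (homoscedastic).
    y_hom = [2 + 0.5 * xi + (-1) ** xi * 1.0 for xi in x]
    print(bp_cook_weisberg(x, y_het))  # large statistic, small p-value
    print(bp_cook_weisberg(x, y_hom))  # statistic near 0, p-value near 1
```

A large statistic, as in the first hypothetical series, is the pattern that leads to rejecting constant variance and motivates estimators that do not rely on it.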
Thirdly, traditional OLS regression is over-influenced by outliers, and the usual remedy of excluding outliers prevents the analyst from using the full conditional distribution of the dependent variable. Consequently, Uematsu et al. (2013, p.151) refer to quantile regression as a ‘more flexible semi-parametric regression model to explore potentially complex and heterogeneous relationship’.

3. Methodology
Following the example of Santos and Cabral Vieira (2012), an OLS regression was first used to estimate the coefficients at the conditional mean. Quantile regression was then applied to take account of unobserved heterogeneity across quantiles and heteroscedasticity in the error terms. The differences between the OLS and quantile regression parameter estimates can then be identified and considered. A log-linear functional form is used in this study, relating expenditure to various socio-demographic and travel-related variables. Following Hung et al. (2012), the model is specified as follows:

$$\ln E_i = Z_i'\beta_\theta + u_{\theta i}, \qquad \mathrm{Quant}_\theta(\ln E_i \mid Z_i) = Z_i'\beta_\theta = \sum_k \beta_{\theta k} Z_{ik}$$

where E_i is the level of expenditure of sampled individual i; θ is the quantile of the expenditure distribution (θ = 0.25, 0.50 and 0.75, i.e. the 25th, 50th and 75th quantiles); β_θk are the parameters to be estimated at each quantile, measuring the relationship between the kth determinant and expenditure; Z_i is the vector of expenditure determinants; u_θi is the disturbance term; and Quant_θ(ln E_i | Z_i) denotes the θth quantile of the dependent variable conditional on the vector Z_i (i.e. the independent variables). The coefficients are estimated following the procedure described by Koenker and Hallock (2001), using the qreg command in Stata. According to Mueller and Loomis (2014), qreg estimates the regression using the linear programming techniques described in Armstrong et al. (1979).
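The contrast between the two estimators, and the robustness of quantile regression to outliers, can be seen in a small example. Quantile regression minimises a sum of asymmetrically weighted absolute residuals (the ‘check’ function), which is a linear program. The sketch below is a hypothetical, pure-Python illustration for one regressor. It does not reproduce qreg's simplex algorithm. Instead, it relies on the fact that some optimal line passes through two observations and simply enumerates all such lines. This is exact, but only practical for small samples. The function names and data are assumptions for illustration.

```python
from itertools import combinations


def check_loss(u, theta):
    """Check function rho_theta(u) = u * (theta - 1[u < 0])."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def quantile_line(x, y, theta):
    """Exact theta-quantile regression of y on (1, x), for small samples.

    Some optimal solution of this linear program passes through two
    observations with distinct x values, so every such line is tried and
    the one with the smallest total check loss is kept. O(n^3) time.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        loss = sum(check_loss(yk - intercept - slope * xk, theta)
                   for xk, yk in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    return best[1], best[2]


def ols_line(x, y):
    """Least-squares intercept and slope, for comparison."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


if __name__ == "__main__":
    x = list(range(1, 10))
    y = [1 + 2 * xi for xi in x]
    y[-1] = 100  # one hypothetical extreme spender
    print("median regression:", quantile_line(x, y, 0.5))  # (1.0, 2.0)
    print("OLS:", ols_line(x, y))  # slope pulled well above 2
```

The median (θ = 0.5) line ignores the single extreme value and recovers the pattern followed by the other eight points. The OLS slope, by contrast, is dragged upwards. Changing `theta` to 0.25 or 0.75 fits the corresponding conditional quantile instead.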
It also estimates the variance-covariance matrix of the coefficients using a method proposed by Koenker and Bassett (1982) and Rogers (1992).

The data come from a survey of rural tourists in Madeira carried out between June and November 2010. Properties affiliated to the Madeira Rural Houses Association were approached to solicit their support. Printed copies of the self-administered questionnaire were left with receptionists and were either handed to guests or left in bedrooms. The reliability and effectiveness of property owners in distributing and collecting the questionnaires were checked every fortnight. In this way, 173 usable questionnaires were obtained. The socio-demographic profile of the sample was considered reasonably representative of tourists in general, albeit with a slight over-representation of German and French nationals.

4. Findings
The variables included in the model are identified in Table 1, while Table 2 presents the results of the OLS and quantile regression models. For ease of interpretation, the quantile regression coefficients are reported for the 25th, 50th and 75th quantiles. The signs of the coefficients of the explanatory variables were generally in line with expectations, and most coefficients were statistically significant. Table 2 also reports the R2 for the OLS regression and the pseudo R2 for each of the three quantiles. This goodness of fit is well in line with previous studies in this area (e.g. Marrocu et al., 2015).

*** Table 1 near here ***

*** Table 2 near here ***

For the three quantiles under analysis, the variables achieving statistical significance were not consistent with those found using OLS. Only length of stay and travelling in a family are significant both in the OLS model and at all three quantiles. For these two variables, both the signs and the values of the coefficients are also largely consistent across the two methods.
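The pseudo R2 reported for the quantile models is not the OLS R2. A common definition, due to Koenker and Machado (1999), compares the minimised check loss of the fitted model with that of a model containing only an intercept. As we understand it, this is also the quantity Stata reports for qreg. The θ-quantile fit of the intercept-only model is simply a sample quantile of the dependent variable. The sketch below assumes that fitted values from an already-estimated quantile regression are available. The function name and the data are hypothetical.

```python
def check_loss(u, theta):
    """Check function rho_theta(u) = u * (theta - 1[u < 0])."""
    return u * (theta - (1.0 if u < 0 else 0.0))


def pseudo_r2(y, fitted, theta):
    """1 - V_hat / V_tilde for a theta-quantile regression.

    V_hat:   total check loss of the fitted model's residuals.
    V_tilde: total check loss of the best intercept-only model. That
             objective is piecewise linear and convex in the constant, so
             its minimum is attained at one of the observed y values.
    """
    v_hat = sum(check_loss(yi - fi, theta) for yi, fi in zip(y, fitted))
    v_tilde = min(sum(check_loss(yi - c, theta) for yi in y) for c in y)
    return 1.0 - v_hat / v_tilde


if __name__ == "__main__":
    y = [1, 2, 3, 4, 5, 6]
    fitted = [1, 2, 3, 4, 5, 5]  # hypothetical fit that misses one point by 1
    print(pseudo_r2(y, fitted, 0.5))  # 1 - 0.5 / 4.5, about 0.889
```

Because both losses are measured at the same θ, the pseudo R2 values in Table 2 describe the fit at each quantile separately. They are not directly comparable with the OLS R2.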
Moreover, the test for equality of the coefficient estimates across quantiles rejects the null hypothesis. This allows us to conclude that the OLS coefficients alone provide poor insight into the determinants of expenditure among rural tourists at different expenditure levels. This is illustrated in Figure 1. The shaded area represents the 95% confidence interval of the quantile estimate, the dashed line represents the OLS (mean) estimate and the solid line shows the quantile regression estimate. The estimated impact of almost all of the variables clearly varies across the expenditure distribution. Furthermore, comparing the OLS coefficients with those at the 50th quantile shows that OLS either under- or over-estimates the effect size of a given variable.

*** Figure 1 near here ***

By way of comparison, the OLS model suggests that ‘length of stay’, ‘first visit’, ‘travelling in a family’ and German or Portuguese country of origin all have a statistically significant effect on total expenditure at the 5% significance level. The variables ‘income’ and ‘gender’ are significant at the 10% level, while ‘education’, ‘activities pursued’ and travelling with a ‘low-cost’ company are not significant. Country of origin also exerts a strong influence on total expenditure: being German (perhaps counter-intuitively) or Portuguese (more in line with expectations) has a negative effect on spending relative to British tourists. The quantile regression, in contrast, suggests that the effect of country of origin is not consistent across quantiles. British country of origin is the reference group, so the coefficients must be interpreted as effects relative to British spending levels at each quantile. Accordingly, at the 50th quantile a tourist from Portugal would be expected to spend 14% less than a tourist from Britain.
This negative effect is much greater at the 25th quantile and much smaller at the 75th. In other words, it diminishes as the level of expenditure increases. For German country of origin, however, the negative effect actually strengthens from the lower quantile, through the middle quantile and into the upper quantile. This should be of some concern to Madeira’s destination marketing organisation, as Germany has often been identified as a potentially lucrative inbound tourism market.

As expected, the mean effect of income (i.e. estimated using OLS) is positive. This supports the intuition that high-income tourists need to be targeted for Madeira’s diversification strategy to be effective. The quantile regression, however, tells a rather different story. The estimated coefficient on income is positive at all three quantiles, but the strength of its influence varies over the conditional expenditure distribution. Indeed, the effect of income on expenditure is not significant at the 75th quantile: other things being equal, high-income tourists are not reliably high-spending ones.

The OLS estimation found no significant association between the number of activities pursued and tourist expenditure. The quantile regression, however, suggests that the impact of this variable differs across expenditure levels. Each additional activity is associated with a decrease in total expenditure of about 8.5% at the lower quantile and 6.6% at the middle quantile. The most plausible interpretation of this finding is that tourists staying in rural areas typically wish to pursue activities such as hiking, taking scenic drives and having picnics. These are low-cost activities, with fewer spending opportunities associated with them than the alternatives the tourists might otherwise have undertaken. The variables ‘gender’, ‘education’ and ‘age’, meanwhile, displayed a similar pattern of variation over the conditional expenditure distribution.
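Because the dependent variable is in logs, a coefficient b corresponds to an exact proportional change of exp(b) − 1 in the conditional quantile of expenditure. This holds because quantiles are preserved under the monotone log transformation. Reading b directly as a percentage, as in the figures quoted above, is a close approximation when b is small. The short sketch below applies the exact conversion to a few coefficients copied from Table 2. The helper name is hypothetical.

```python
import math


def pct_effect(coef):
    """Exact % change in expenditure implied by a log-scale coefficient."""
    return 100.0 * (math.exp(coef) - 1.0)


# Selected coefficients from Table 2 (dependent variable: log total expenditure).
table2 = {
    "activities pursued, 25th quantile": -0.085,
    "activities pursued, 50th quantile": -0.066,
    "travelling in a family, 75th quantile": 0.408,
}

if __name__ == "__main__":
    for label, b in table2.items():
        # Approximate reading (100 * b) next to the exact conversion.
        print(f"{label}: approx {100 * b:+.1f}%, exact {pct_effect(b):+.1f}%")
```

For the small activity coefficients the two readings differ by less than half a percentage point. For larger coefficients, such as travelling in a family at the 75th quantile, the exact conversion is noticeably larger than the approximate one.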
Except for nationality, therefore, the task of identifying high-spending rural tourists is unrelated to socio-demographic characteristics. The coefficients at the 75th quantile indicate that being of British origin (the reference category), travelling in a family and the holiday being a first-time visit may therefore demarcate the ideal target in terms of attracting high-expenditure rural tourists to Madeira.

5. Conclusions
This study has used both OLS and quantile regression to investigate the determinants of expenditure in a sample of 173 rural tourists in Madeira. A key finding is that the coefficient estimates at the upper quantile differ significantly from those at the middle and lower quantiles. This suggests that segmenting the market on the basis of average tourist characteristics could well be an inefficient strategy, targeting the wrong tourists in the wrong ways. The quantile regression estimates thus offer a more detailed and realistic picture of the determinants of tourist expenditure than OLS does. Crucially, the quantile regression results do not support the view that income significantly influences expenditure at the upper quantile. Another feature of the analysis is that the marginal effect of many key variables becomes less important at higher levels of total expenditure. Only the effects of ‘length of stay’, ‘travelling in a family’, ‘first visit’ and the German origin market are significant at the 75th quantile. The results suggest that, in attempting to diversify its tourism offer into rural tourism, Madeira should focus its efforts on first-time tourists, family groups and the British. This may still be a rather heterogeneous group, but it at least provides a starting point for marketers to refine further, and then to target and position towards. The important point is that this segmentation is fundamentally different from the one identified using OLS.
If researchers continue to use OLS, marketers will continue to target potential rural tourists poorly. The quantile method is greatly to be preferred.

References
Armstrong, R. D., Frome, E. L., & Kung, D. S. (1979). Algorithm 79-01: A revised simplex algorithm for the absolute deviation curve fitting problem. Communications in Statistics – Simulation and Computation, 8(2), 175–190.
Baum, T., & Hagen, L. (1999). Responses to seasonality: The experiences of peripheral destinations. International Journal of Tourism Research, 1(5), 299–312.
Benur, A. M., & Bramwell, B. (2015). Tourism product development and product diversification in destinations. Tourism Management, 50, 213–224.
Brida, J., & Scuderi, R. (2012). Determinants of tourist expenditure: A review of microeconometric models. Tourism Management Perspectives, 6, 28–40.
Canavan, B. (2014). Sustainable tourism: Development, decline and de-growth. Management issues from the Isle of Man. Journal of Sustainable Tourism, 22(1), 127–147.
Chapman, A., & Speake, J. (2011). Regeneration in a mass-tourism resort: The changing fortunes of Bugibba, Malta. Tourism Management, 32(3), 482–491.
Chen, C. M., & Chang, K. L. (2012). The influence of travel agents on travel expenditures. Annals of Tourism Research, 39(2), 1258–1263.
Hung, W. T., Shang, J. K., & Wang, F. C. (2012). Another look at the determinants of tourism expenditure. Annals of Tourism Research, 39(1), 495–498.
Ismeri Europa (2011). Growth factors in the outermost regions. Final Report, Vol. II. European Commission.
Koenker, R., & Bassett, G. W. (1982). Robust tests for heteroscedasticity based on regression quantiles. Econometrica, 50(1), 43–61.
Kozak, M., & Martin, D. (2012). Tourism life cycle and sustainability analysis: Profit-focused strategies for mature destinations. Tourism Management, 33(1), 188–194.
Kuo, H. I., & Lu, C. L. (2013). Expenditure-based segmentation: Application of quantile regression to analyse the travel expenditures of baby boomer households. Tourism Economics, 19(6), 1429–1441.
Lew, A., & Ng, P. (2011). Using quantile regression to understand tourist spending. Journal of Travel Research, 20(10), 1–11.
Marrocu, E., Paci, R., & Zara, A. (2015). Micro-economic determinants of tourist expenditure: A quantile regression approach. Tourism Management, 50, 13–30.
Mueller, J., & Loomis, J. B. (2014). Does the estimated impact of wildfires vary with the housing price distribution? A quantile regression approach. Land Use Policy, 41, 121–127.
Othmen, A. B. (2013). Nature-based tourists in the Gironde Estuary: Examining and identifying the relationship between their expenditure and the motivations for their visit. Review of Economic Analysis, 5(1), 70–85.
Rogers, W. (1992). Quantile regression standard errors. Stata Technical Bulletin, 9, 16–19.
Saayman, M., & Saayman, A. (2012). Determinants of spending: An evaluation of three major sporting events. International Journal of Tourism Research, 14(2), 124–138.
Saayman, M., & Saayman, A. (2014). How deep are scuba divers' pockets? Tourism Economics, 20(4), 813–829.
Santos, C., & Cabral Vieira, J. (2012). An analysis of tourists' expenditures in a tourist destination: OLS, quantile regression and instrumental variable estimators. Tourism Economics, 18(3), 555–576.
Sharpley, R. (2002). Rural tourism and the challenge of tourism diversification: The case of Cyprus. Tourism Management, 23(3), 233–244.
Sharpley, R. (2003). Tourism, modernisation and development on the island of Cyprus: Challenges and policy responses. Journal of Sustainable Tourism, 11(2–3), 246–265.
Thrane, C. (2014). Modelling micro-level tourism expenditure: Recommendations on the choice of independent variables, functional form and estimation technique. Tourism Economics, 20(1), 51–60.
Thrane, C., & Farstad, E. (2011). Domestic tourism expenditures: The non-linear effects of length of stay and travel party size. Tourism Management, 32(1), 46–52.
Uematsu, H., Khanal, A., & Mishra, A. (2013). The impact of natural amenity on farmland values: A quantile regression approach. Land Use Policy, 33, 151–160.
Wang, Y., & Davidson, M. C. (2010). A review of micro-analyses of tourist expenditure. Current Issues in Tourism, 13(6), 507–524.
Zhang, L., & Leonard, T. (2014). Neighborhood impact of foreclosure: A quantile regression approach. Regional Science and Urban Economics, 48, 133–143.

Table 1: Measurement of variables and expected signs
Dependent variable: Travel expenditure
Explanatory variables
  Socio-demographic variables
    Income (7 levels of income)
    Age (7 levels of age)
    Gender (dummy variable; 1 if male, 0 if female)
    Education (7 levels)
    Portuguese national (dummy variable; 1 if Portuguese national, 0 otherwise)
    German national (dummy variable)
    Dutch national (dummy variable)
  Travel-related variables
    Length of stay
    Activities
    Travelling with a low-cost company (dummy variable)
    Travelling in a family (dummy variable)
    First visit (dummy variable)
Expected signs: + + +/+ + + + + + +

Table 2: Model estimation results (dependent variable: log total expenditure)
                          OLS               25th quantile     50th quantile     75th quantile
                          Coef.  St. Er.    Coef.  St. Er.    Coef.  St. Er.    Coef.  St. Er.
Length of stay            0.132  0.013**    0.105  0.015**    0.109  0.011**    0.113  0.014**
Age                       0.015  0.039      0.028  0.038      0.011  0.035      0.014  0.053
Gender                    0.162  0.097+     0.163  0.098+     0.193  0.086*     0.115  0.131
Income                    0.067  0.036+     0.079  0.033*     0.110  0.032**    0.065  0.052
Education                 0.021  0.030      0.040  0.032      0.003  0.027      0.021  0.042
First visit               0.260  0.084**    0.092  0.063      0.233  0.076      0.273  0.134*
Low cost                 -0.213  0.109     -0.280  0.116*    -0.001  0.097     -0.136  0.149
Travelling in a family    0.290  0.114*     0.235  0.114*     0.282  0.099**    0.408  0.144**
Portuguese national      -0.367  0.166*    -0.832  0.171**   -0.592  0.149**   -0.320  0.224
German national          -0.262  0.129*    -0.100  0.125     -0.306  0.115**   -0.354  0.186**
Dutch national            0.055  0.143      0.146  0.142      0.020  0.128     -0.043  0.189
Activities pursued       -0.042  0.027     -0.085  0.027**   -0.066  0.024**    0.014  0.035
Constant                  4.024  0.308      4.063  0.318      4.230  0.272      4.412  0.399
R2                        0.599
Pseudo R2                                  0.4174            0.3612            0.3344
Notes: + significant at the 10% level; * significant at the 5% level; ** significant at the 1% level.

Figure 1: Quantile regression versus OLS regression. [Panels plot the coefficient estimates against the quantile (0 to 1) for: Intercept, LengthofStay, Age, Gender, Income, Academic, FirstVisit, LowCostCarrier, Family, Portuguese, German, Dutch, Activities.]