Cadair - Aberystwyth University

Insights from analysing tourist expenditure using quantile regression
António Almeida
University of Madeira
Brian Garrod*
Aberystwyth University, UK
*Corresponding author. School of Management and Business, Aberystwyth University,
Llanbadarn Campus, Aberystwyth, Ceredigion, Wales, SY23 3AL, UK
Abstract
Mature tourism destinations increasingly need to diversify their products
and markets. To be successful, such strategies require a very detailed
understanding of potential tourists’ levels and patterns of spending. Empirical
studies of tourist expenditure have tended to employ ordinary least squares
regression for this purpose. There are, however, a number of important
limitations to this technique, chief among which is its inability to distinguish
between tourists who have higher- and lower-than-average levels of spending.
As such, some researchers recommend the use of an alternative estimation
technique, known as quantile regression, which does allow such distinctions to
be made. This study uses a single dataset, collected among rural tourists in
Madeira, to analyse the determinants of tourist expenditure using both
techniques. This enables direct comparison to be made and illustrates the
additional insights to be gained by using quantile regression.
Key words: Quantile regression; Expenditure levels; Expenditure patterns;
Determinants; Madeira
1. Introduction
Mature tourism destinations tend to operate in slow-moving and highly
competitive markets (Chapman & Speake, 2011; García-Falcón & Medina-Muñoz, 1999; Ismeri Europa, 2011). Consequently, they often exhibit slow-growing or even declining tourism arrivals, a reduction in tourists’ average
length of stay, declining tourist spending, over-dependence on repeat tourists
and erratic business performance (Sharpley, 2002, 2003). Diversification, either
into new products or new markets, has frequently been recommended as a
suitable strategic response (Baum & Hagen, 1999; Benur & Bramwell, 2015;
Boukas & Ziakas, 2013; Canavan, 2014; Sharpley, 2002). The objective is
typically to attract new tourists with higher-than-average spending (Chapman &
Speake, 2011).
While tourist expenditure patterns have been a recurrent theme in micro-level
tourism research (Thrane & Farstad, 2011), most previous studies have
adopted ordinary least squares (OLS) regression as the estimation method
(Brida & Scuderi, 2012). A major limitation of this technique is, however, that it
provides estimates only at the mean level of the dependent variable. As such,
using OLS regression does not allow the researcher to explore the determinants
of expenditure among those who spend more or less than the average. This is
unhelpful when the purpose of the exercise is to identify high-spending tourists
so that they can be effectively targeted by marketing campaigns (Kozak &
Martin, 2012).
Some researchers have recommended an alternative estimation method, known
as quantile regression, which is able to analyse the determinants of tourist
expenditure at different levels, or ‘quantiles’, of expenditure (e.g. Lew & Ng,
2011; Thrane, 2014). This makes it particularly suitable for segmented markets
such as those involved in tourism. As such, quantile regression can generate more
nuanced and ultimately more useful information to guide strategies such as
product or market diversification. This research note demonstrates the
additional insights that can be gained by using quantile regression. Both OLS
and quantile regression are applied to a single data set, collected from rural
tourists in Madeira, to illustrate the merits of the latter.
2. Theoretical background
A substantial number of studies have attempted to identify the key determinants
of tourist expenditure. Brida and Scuderi (2012), for example, review some 86
such studies. Wang & Davidson (2010), meanwhile, conduct a review of 27
micro-economic studies of tourist expenditure. Given space restrictions in this
research note, interested readers are referred to these works. Most of these
studies used variants of OLS regression. Some recent studies are now,
however, using quantile regression as an alternative estimation method for the
analysis of tourism expenditure (e.g. Chen & Chang, 2012; Kuo & Lu, 2013;
Lew & Ng, 2011; Marrocu et al., 2015; Othmen, 2013; Saayman & Saayman,
2012, 2014; Santos & Cabral Vieira, 2012; Thrane, 2014).
The majority of tourist expenditure studies to date have included a mixture of
both socio-demographic and trip-related variables (Brida & Scuderi, 2012). In
the case of the former, the most frequently used variables are age, gender,
income and level of education. With regard to the latter, the variables typically
included are accommodation type, travelling group, time of booking, number of
visits per year and repeat visits. Destination attributes, tourist motivations and
indicators of satisfaction are also sometimes examined.
The literature emphasises several advantages of quantile regression over OLS.
Firstly, OLS estimates conditional mean functions based on the minimisation of
the sum of squared residuals. This has the effect of ignoring the differential
effects of the explanatory variables across the entire conditional distribution, i.e.
across different ‘quantiles’ (Zhang & Leonard, 2014). Often such differential
effects can be the most crucial piece of information in explaining the relationship
in question. Secondly, traditional OLS regression assumes that the conditional
variance of the dependent variable is constant (i.e. homoscedastic) for all
values of the covariates. However, for the data used in this study, the
Breusch-Pagan/Cook-Weisberg test for heteroscedasticity rejects the null hypothesis of constant
variance. Thirdly, traditional OLS regression is over-influenced by outliers, and
the usual approach of excluding outliers prevents the analyst from using the full
conditional distribution of the dependent variable. Consequently, Uematsu et al.
(2013, p.151) refer to quantile regression as a ‘more flexible semi-parametric
regression model to explore potentially complex and heterogeneous
relationship’.
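The contrast between the two estimation criteria can be made concrete with a short sketch (in Python, chosen purely for illustration). OLS minimises squared residuals, and for a single sample with no covariates the minimiser is the mean; quantile regression instead minimises the ‘check’ loss ρ_θ(u) = u(θ − 1[u < 0]), whose minimiser is the θth sample quantile. The spending figures and function names below are hypothetical and are not drawn from the Madeira survey. Replacing the largest observation with an outlier moves the mean substantially but leaves the median and the 75th quantile unchanged, which is the robustness property referred to above.

```python
def check_loss(u: float, tau: float) -> float:
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile(data: list[float], tau: float) -> float:
    """Return a constant minimising the summed check loss.

    The objective is piecewise linear with kinks at the observations, so a
    minimiser can always be found among the data points themselves.
    """
    return min(data, key=lambda c: sum(check_loss(y - c, tau) for y in data))


def summarise(data: list[float]) -> tuple[float, float, float]:
    """Return (mean, median, 75th quantile).

    The mean minimises the summed squared loss (the OLS criterion with only an
    intercept); the quantiles minimise the summed check loss.
    """
    mean = sum(data) / len(data)
    return mean, sample_quantile(data, 0.5), sample_quantile(data, 0.75)


# Hypothetical spending figures (not the Madeira survey data).
typical = [40.0, 55.0, 60.0, 70.0, 85.0, 90.0, 100.0]
with_outlier = [40.0, 55.0, 60.0, 70.0, 85.0, 90.0, 400.0]

print(summarise(typical))       # mean about 71.43; median 70.0; 75th quantile 90.0
print(summarise(with_outlier))  # mean about 114.29; median and 75th quantile unchanged
```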
3. Methodology
Following the example of Santos and Cabral Vieira (2012), an OLS regression
was first used to estimate the coefficients based on the conditional mean. The
quantile regression approach was then applied to take into consideration
unobserved heterogeneity across quantiles and heteroscedasticity among the
error terms. The differences between the OLS and quantile regression
parameter estimates can then be identified and considered.
A log-linear functional form is used in this study, which relates expenditure to
various socio-demographic and travel-related variables. Following the example
of Hung et al. (2012), the model is specified as follows:
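$$\ln E_i = k_\theta + Z_i'\beta_\theta + u_{\theta i}, \qquad \mathrm{Quant}_\theta(\ln E_i \mid Z_i) = k_\theta + Z_i'\beta_\theta$$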
where E_i represents the level of expenditure of each sampled individual; θ is the
expenditure distribution quantile (θ = 0.25, 0.50 and 0.75, i.e. the 25th, 50th and
75th quantiles); β_θ and k_θ are the parameters to be estimated at each expenditure
distribution quantile, measuring the relationships between the determinants of
expenditure and expenditure; Z_i represents the vector of expenditure determinants;
u_θi is the disturbance term; and Quant_θ(ln E_i | Z_i) denotes the θth quantile of
the (log) dependent variable conditional on the vector Z_i (i.e. the independent
variables). The coefficients are estimated based on the procedure described by
Koenker and Hallock (2001) using the ‘qreg’ command in Stata. According to Mueller and
Loomis (2014), qreg estimates the regression using linear programming
techniques as described in Armstrong et al. (1979). It also estimates the
variance-covariance matrix of the coefficients by using a method proposed by
Koenker and Bassett (1982) and Rogers (1992).
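Stata's qreg solves the minimisation problem by linear programming. The sketch below is deliberately simpler and is not that algorithm: for the special case of an intercept plus a single covariate, at least one solution of the underlying linear programme passes exactly through two observations, so an exhaustive search over pairs of observations recovers an exact minimiser. It runs in O(n³) time and is intended only to make the objective concrete. The data and function names are hypothetical, constructed so that the spread of spending widens as the covariate increases; the quantile slopes then differ, while the OLS slope reflects only the conditional mean.

```python
from itertools import combinations


def check_loss(u: float, tau: float) -> float:
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def quantile_fit(x: list[float], y: list[float], tau: float) -> tuple[float, float]:
    """Fit Quant_tau(y | x) = k + b * x by minimising the summed check loss.

    Exhaustive search over lines through pairs of observations with distinct x:
    with two parameters, some optimal solution of the linear programme
    interpolates two observations, so the search is exact (but O(n^3)).
    """
    best = None  # (loss, k, b)
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # no unique line through two points with equal x
        b = (y[j] - y[i]) / (x[j] - x[i])
        k = y[i] - b * x[i]
        loss = sum(check_loss(yy - (k + b * xx), tau) for xx, yy in zip(x, y))
        if best is None or loss < best[0]:
            best = (loss, k, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


def ols_fit(x: list[float], y: list[float]) -> tuple[float, float]:
    """Closed-form OLS intercept and slope for a single covariate."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xx - x_bar) * (yy - y_bar) for xx, yy in zip(x, y))
    sxx = sum((xx - x_bar) ** 2 for xx in x)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data: the spread of y widens as x increases (heteroscedasticity).
x = [1.0] * 5 + [2.0] * 5 + [3.0] * 5
y = [8.0, 9.0, 10.0, 11.0, 12.0,
     16.0, 18.0, 20.0, 22.0, 24.0,
     24.0, 27.0, 30.0, 33.0, 36.0]

for tau in (0.25, 0.5, 0.75):
    print(tau, quantile_fit(x, y, tau))  # (intercept, slope)
print("OLS", ols_fit(x, y))
```

On these data the 25th, 50th and 75th quantile fits return slopes of 9, 10 and 11 (each with an intercept of 0), whereas OLS returns a single slope of 10: a miniature version of effects that vary across the expenditure distribution.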
The data were from a survey of rural tourists in Madeira carried out between
June and November 2010. Properties affiliated to the Madeira Rural Houses
Association were approached to solicit their support. Printed copies of the self-administered questionnaire were left with the receptionists and either handed to
guests or left in bedrooms. The reliability and effectiveness of property owners
in distributing and collecting the questionnaires were checked every fortnight. In
this way, 173 usable questionnaires were received. The socio-demographic
profile was considered to be reasonably representative of tourists in general,
albeit with a slight over-representation of German and French nationals in terms
of country of origin.
4. Findings
The variables included in the model are identified in Table 1, while Table 2
presents the results from the OLS and quantile regression models. In the case
of the quantile regression, the estimated coefficients refer to the 25th, 50th and
75th quantiles for ease of interpretation. The signs of the coefficients of the
explanatory variables were generally in line with expectations and most
coefficients were statistically significant. The R2 for the OLS regression and the
pseudo R2 for the three quantiles are shown. This goodness of fit is well in line
with previous studies in this area (e.g. Marrocu et al., 2015).
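The pseudo R² reported for each quantile is a different statistic from the OLS R², so the two are not directly comparable. One common definition, and the one we understand Stata's qreg to report, is 1 − V̂/Ṽ, where V̂ is the minimised sum of check losses for the full model and Ṽ is the corresponding sum for an intercept-only model (i.e. deviations about the raw sample quantile). A minimal sketch, using hypothetical data and hypothetical median-regression fitted values rather than the Madeira results:

```python
def check_loss(u: float, tau: float) -> float:
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def pseudo_r2(y: list[float], fitted: list[float], tau: float) -> float:
    """Pseudo R^2 = 1 - V_full / V_null at quantile tau."""
    # Summed check loss of the full model around its fitted conditional quantile.
    v_full = sum(check_loss(yi - fi, tau) for yi, fi in zip(y, fitted))
    # Intercept-only model: the best constant is a sample quantile, and a
    # minimiser can always be found among the observations themselves.
    v_null = min(sum(check_loss(yi - c, tau) for yi in y) for c in y)
    return 1.0 - v_full / v_null


# Hypothetical data and median-regression fitted values (fitted line: 10 * x).
x = [1.0] * 5 + [2.0] * 5 + [3.0] * 5
y = [8.0, 9.0, 10.0, 11.0, 12.0,
     16.0, 18.0, 20.0, 22.0, 24.0,
     24.0, 27.0, 30.0, 33.0, 36.0]
fitted = [10.0 * xi for xi in x]

print(round(pseudo_r2(y, fitted, 0.5), 3))  # 0.679
```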
*** Table 1 near here ***
*** Table 2 near here ***
For the three quantiles under analysis, the variables achieving statistical
significance were not consistent with those found using OLS. Indeed, only
length of stay and travelling in a family are significant both in the OLS model
and at all three quantiles, and for these two variables the consistency broadly
extends to both the signs and the magnitudes of the coefficients. Moreover,
the test for equal coefficient estimates across different quantiles suggests the
rejection of the null hypothesis, which allows us to conclude that the OLS
coefficients alone provide a poor insight into the determinants of expenditure
among rural tourists at different expenditure levels. This is illustrated in Figure 1.
The shaded area represents the 95% confidence interval of the quantile
estimate, while the dashed line represents the OLS mean estimate and the solid
line shows the quantile regression estimate. It is evident that the estimated
impact of almost all of the variables varies across the expenditure distribution.
Furthermore, by comparing the coefficients of the OLS and the 50th quantile, it
can be seen that OLS either under- or over-estimates the effect size of a
given variable relative to the median regression.
*** Figure 1 near here ***
By way of comparison, the OLS model suggests that the variables ‘length of
stay’, ‘first visit’, ‘travelling in a family’ and German or
Portuguese country of origin all have a statistically significant effect on total
expenditure at the 5% significance level. The variables ‘income’ and ‘gender’
are significant at the 10% level, while the variables ‘education’, ‘activities
pursued’ and travelling with a ‘low-cost’ company are not significant. It is also evident that the country
of origin exerts a strong influence over total expenditure: to be German
(perhaps counter-intuitively) or Portuguese (which is more in line with
expectations) has a negative effect on spending.
The quantile regression, in contrast, suggests that the effect of country of origin
is not consistent across all quantiles. British country of origin is the reference
group, so the coefficients must be interpreted as effects relative to UK spending
levels at different quantiles. Accordingly, at the 50th quantile a tourist from
Portugal would be expected to spend roughly 45% less than a tourist from
Britain (coefficient −0.592; exp(−0.592) − 1 ≈ −0.45). This negative effect is much greater at the 25th
quantile and much less at the 75th: in other words, the effect diminishes as the
level of expenditure increases. In the case of German country of origin, however,
the negative effect actually strengthens from the lower quantile, through the
middle quantile and up into the upper quantile. This should be of some concern
to Madeira’s destination marketing organisation, as Germany has often been
identified as a potentially lucrative inbound tourism market.
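Because expenditure enters the model in logarithms, a dummy-variable coefficient β corresponds to a proportional difference of exp(β) − 1 relative to the reference group (here, British tourists); reading 100·β directly as a percentage is only a first-order approximation, which becomes poor for coefficients as large as those on Portuguese origin. The sketch below applies this standard conversion to the Portuguese-national coefficients reported in Table 2. It illustrates the arithmetic only and is not output reproduced from the original analysis; the function name is hypothetical.

```python
import math


def percent_difference(coef: float) -> float:
    """Percentage difference in expenditure implied by a dummy coefficient
    in a log-linear model: 100 * (exp(coef) - 1)."""
    return 100.0 * (math.exp(coef) - 1.0)


# Portuguese-national coefficients from Table 2 (reference group: British).
portuguese = {"25th": -0.832, "50th": -0.592, "75th": -0.320}

for quantile, coef in portuguese.items():
    print(f"{quantile} quantile: {percent_difference(coef):.1f}%")
# 25th quantile: -56.5%
# 50th quantile: -44.7%
# 75th quantile: -27.4%
```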
As expected, income was found, on average (i.e. using OLS), to be
positively related to total expenditure. This supports the intuition that high-income tourists need to be targeted to make Madeira’s diversification strategy
effective. The quantile regression, however, tells a rather different story. While
the estimated coefficient on income is positive across all quantiles, the strength
of its influence varies over the conditional expenditure distribution. Indeed, the
effect of income on expenditure is not significant at the 75th quantile: other
things being equal, high-income tourists are not reliably high-spending ones.
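The significance markers in Table 2 can be checked from the reported coefficients and standard errors. A minimal sketch, assuming a normal approximation to the ratio of coefficient to standard error (with 173 observations this is close to the t distribution that, as we understand it, Stata reports, and the difference does not move any of the income results across a cut-off), reproduces the pattern for income: significant at the 10%, 5% and 1% levels for OLS, the 25th and the 50th quantile respectively, and not significant at the 75th quantile. The function names are hypothetical.

```python
import math


def two_sided_p(coef: float, se: float) -> float:
    """Two-sided p-value for H0: coefficient = 0, using a normal approximation."""
    z = abs(coef / se)
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


def stars(p: float) -> str:
    """Significance markers following the Table 2 legend."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    if p < 0.10:
        return "+"
    return ""


# Income coefficients and standard errors from Table 2.
income = {
    "OLS": (0.067, 0.036),
    "25th": (0.079, 0.033),
    "50th": (0.110, 0.032),
    "75th": (0.065, 0.052),
}

for model, (coef, se) in income.items():
    p = two_sided_p(coef, se)
    print(f"{model}: z = {coef / se:.2f}, p = {p:.3f} {stars(p)}")
```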
The OLS estimation found no association between the number of activities
pursued and tourist expenditure. The quantile regression, however, suggested
that the impact of this variable differed across different expenditure levels. Each
additional activity was associated with a decrease in total expenditure of approximately 8.5% in the
lower quantile and 6.6% in the middle quantile. The best interpretation of this
finding is that tourists staying in rural areas typically wish to pursue activities
such as hiking, taking scenic drives, having picnics, and so forth: low-cost
activities that have fewer spending opportunities associated with them than others
they could have been undertaking.
The variables ‘gender’, ‘education’ and ‘age’, meanwhile, displayed a similar
pattern of results in terms of their variation over the conditional expenditure
distribution: none of them is significant at the 75th quantile. Except for
nationality, therefore, the task of identifying high-spending rural tourists is
unrelated to socio-demographic characteristics.
Examining the coefficients at the 75th quantile indicates that being of British
origin (the reference category), travelling in a family and being on a first-time
visit may demarcate the ideal target in terms of attracting high-expenditure rural
tourists to Madeira.
5. Conclusions
This study has employed both OLS and quantile regression to investigate the
determinants of the expenditure of a sample of 173 rural tourists to Madeira. A key
finding of this study is that the coefficient estimates at the upper quantile differ
significantly from those in the middle and lower quantiles of the sample. This
suggests that segmenting the market on the basis of average tourist
characteristics could well be an inefficient strategy, targeting the wrong tourists
in the wrong ways.
The estimates provided by the quantile regression thus offer a more detailed
and realistic picture of the determinants of tourist expenditure than those
provided by OLS. Crucially, the quantile regression results do not support the
view that income significantly influences expenditure at the upper quantile.
Another feature of the analysis is that the marginal effect of many key variables
becomes less important at higher levels of total expenditure. Only the effects of
‘length of stay’, ‘travelling in a family’, ‘first visit’ and the German origin market
are significant at the 75th quantile.
The results suggest that in attempting to diversify its tourism offer into rural
tourism, Madeira’s efforts should be focused on first-time tourists, family groups
and the British. This may still be a rather heterogeneous group, but it at least
provides a starting point for marketers to further refine, and then to target and
position towards. The important point is that this is fundamentally different to the
segmentation that was identified using OLS. If researchers continue to use OLS,
marketers will continue to target potential rural tourists poorly. The quantile
method is greatly to be preferred.
References
Armstrong, R. D., Frome, E. L., & Kung, D. S. (1979). Algorithm 79-01: A revised
simplex algorithm for the absolute deviation curve fitting problem.
Communications in Statistics: Simulation and Computation, 8(2), 175–190.
Baum, T., & Hagen, L. (1999). Responses to seasonality: The experiences of
peripheral destinations. International Journal of Tourism Research, 1(5), 299–
312.
Benur, A. M., & Bramwell, B. (2015). Tourism product development and product
diversification in destinations. Tourism Management, 50, 213–224.
Brida, J., & Scuderi, R. (2012). Determinants of tourist expenditure: A review of
microeconometric models. Tourism Management Perspectives, 6, 28–40.
Canavan, B. (2014). Sustainable tourism: Development, decline and de-growth.
Management issues from the Isle of Man. Journal of Sustainable Tourism, 22(1),
127–147.
Chapman, A., & Speake, J. (2011). Regeneration in a mass-tourism resort: The
changing fortunes of Bugibba, Malta. Tourism Management, 32(3), 482–491.
Chen, C. M., & Chang, K. L. (2012). The influence of travel agents on travel
expenditures. Annals of Tourism Research, 39(2), 1258–1263.
Hung, W. T., Shang, J. K., & Wang, F. C. (2012). Another look at the
determinants of tourism expenditure. Annals of Tourism Research, 39(1), 495–
498.
Ismeri Europa (2011). Growth factors in the outermost regions, Final Report Vol.
II, European Commission.
Koenker, R., & Bassett, G.W. (1982). Robust tests for heteroscedasticity based
on regression quantiles. Econometrica, 50(1), 43–61.
Kozak, M., & Martin, D. (2012). Tourism life cycle and sustainability analysis:
Profit-focused strategies for mature destinations. Tourism Management, 33(1),
188–194.
Kuo, H. I., & Lu, C. L. (2013). Expenditure-based segmentation: Application of
quantile regression to analyse the travel expenditures of baby boomer
households. Tourism Economics, 19(6), 1429–1441.
Lew, A., & Ng, P. (2011). Using quantile regression to understand tourist
spending. Journal of Travel Research, 20(10), 1–11.
Marrocu, E., Paci, R., & Zara, A. (2015). Micro-economic determinants of tourist
expenditure: A quantile regression approach. Tourism Management, 50, 13–30.
Mueller, J., & Loomis, J.B. (2014). Does the estimated impact of wildfires vary
with the housing price distribution? A quantile regression approach. Land Use
Policy, 41, 121–127.
Othmen, A. B. (2013). Nature-based Tourists in the Gironde Estuary: Examining
and identifying the relationship between their expenditure and the motivations
for their visit. Review of Economic Analysis, 5(1), 70–85.
Rogers, W. (1992). Quantile regression standard errors. Stata Technical Bulletin,
9, 16–19.
Saayman, M., & Saayman, A. (2012). Determinants of spending: An evaluation
of three major sporting events. International Journal of Tourism Research,
14(2), 124–138.
Saayman, M., & Saayman, A. (2014). How deep are scuba divers' pockets?
Tourism Economics, 20(4), 813–829.
Santos, C., & Cabral Vieira, J. (2012). An analysis of tourists' expenditures in a
tourist destination: OLS, quantile regression and instrumental variable
estimators. Tourism Economics, 18(3), 555–576.
Sharpley, R. (2002). Rural tourism and the challenge of tourism diversification:
The case of Cyprus. Tourism Management, 23(3), 233–244.
Sharpley, R. (2003). Tourism, modernisation and development on the island of
Cyprus: Challenges and policy responses. Journal of Sustainable Tourism,
11(2-3), 246–265.
Thrane, C. (2014). Modelling micro-level tourism expenditure:
Recommendations on the choice of independent variables, functional form and
estimation technique. Tourism Economics, 20(1), 51–60.
Thrane, C., & Farstad, E. (2011). Domestic tourism expenditures: The non-linear
effects of length of stay and travel party size. Tourism Management, 32(1), 46–
52.
Uematsu, H., Khanal, A., & Mishra, A. (2013). The impact of natural amenity on
farmland values: A quantile regression approach. Land Use Policy, 33, 151–160.
Wang, Y., & Davidson, M. C. (2010). A review of micro-analyses of tourist
expenditure. Current Issues in Tourism, 13(6), 507–524.
Zhang, L., & Leonard, T. (2014). Neighborhood impact of foreclosure: A quantile
regression approach. Regional Science and Urban Economics, 48,
133–143.
Table 1: Measurement of variables and expected signs
Variables used                                                  Expected sign
Dependent variable
  Travel expenditure
Explanatory variables
  Socio-demographic variables
    Income (7 levels of income)                                 +
    Age (7 levels of age)                                       +
    Gender (dummy variable; 1 if male; 0 if female)             +/+
    Education (7 levels)                                        +
    Portuguese national (dummy variable; 1 if Portuguese
      national; 0 otherwise)
    German national (dummy variable)
    Dutch national (dummy variable)
  Travel-related variables
    Length of stay                                              +
    Activities                                                  +
    Travelling with a low-cost company (dummy variable)         +
    Travelling in a family (dummy variable)                     +
    First visit (dummy variable)                                +
Table 2: Model estimation results (dependent variable: log total expenditure)
                        OLS               25th quantile     50th quantile     75th quantile
                         Coef.  St. Er.    Coef.  St. Er.    Coef.  St. Er.    Coef.  St. Er.
Length of stay           0.132  0.013**    0.105  0.015**    0.109  0.011**    0.113  0.014**
Age                      0.015  0.039      0.028  0.038      0.011  0.035      0.014  0.053
Gender                   0.162  0.097+     0.163  0.098+     0.193  0.086*     0.115  0.131
Income                   0.067  0.036+     0.079  0.033*     0.110  0.032**    0.065  0.052
Education                0.021  0.030      0.040  0.032      0.003  0.027      0.021  0.042
First visit              0.260  0.084**    0.092  0.063      0.233  0.076      0.273  0.134*
Low cost                -0.213  0.109     -0.280  0.116*    -0.001  0.097     -0.136  0.149
Travelling in a family   0.290  0.114*     0.235  0.114*     0.282  0.099**    0.408  0.144**
Portuguese national     -0.367  0.166*    -0.832  0.171**   -0.592  0.149**   -0.320  0.224
German national         -0.262  0.129*    -0.100  0.125     -0.306  0.115**   -0.354  0.186**
Dutch national           0.055  0.143      0.146  0.142      0.020  0.128     -0.043  0.189
Activities pursued      -0.042  0.027     -0.085  0.027**   -0.066  0.024**    0.014  0.035
Constant                 4.024  0.308      4.063  0.318      4.230  0.272      4.412  0.399
R2                       0.599
Pseudo R2                                 0.4174            0.3612            0.3344
+ significant at the 90% level; * significant at the 95% level; ** significant at the 99% level
Figure 1: Quantile regression versus OLS regression
[Figure: panels plot the estimated coefficient against the quantile (0 to 1) for Intercept, Length of stay, Age, Gender, Income, Academic, First visit, Low-cost carrier, Family, Portuguese, German, Dutch and Activities.]