Identifying Optimal High-Risk Driver Segments for Safety Messaging

Identifying Optimal High-Risk Driver Segments for Safety Messaging:
A Geodemographic Modeling Approach
George Eguakun, M.B.A., Ph.D. Candidate
Manager
Traffic Safety Program Evaluation
Saskatchewan Government Insurance (SGI)
Regina Operations Center
5104 Donnelly Crescent, Regina, Saskatchewan, Canada, S4X 4C9
Tel: 1 306 775 6274; Fax, 1 306 352 3154
Email: [email protected]
Peter Y. Park, Ph.D., P.Eng. (Corresponding Author)
Associate Professor
Department of Civil and Geological Engineering
University of Saskatchewan
57 Campus Drive, Saskatoon, SK, Canada, S7N 5A9
Tel: 1 306 966 1314; Fax, 1 306 966 5427
Email: [email protected]
Kwei Quaye, Ph.D. MA (Econ), P.Eng.
Assistant Vice President
Traffic Safety Services and Driver Development
Saskatchewan Government Insurance (SGI)
Head Office
2260 11th Avenue, Regina, Saskatchewan, Canada, S4P 0J9
Tel: 1 306 775 6182; Fax, 1 306 352 3154
Email: [email protected]
Word Count: Abstract (230) + Main Body (4,767) + Figures (2,000) = 6,997 words
Presented at the 94th Annual Meeting of Transportation Research Board,
January 11 – 15, 2015
Eguakun, Park, Quaye
Abstract
Given the public safety risk posed by high-risk drivers, most traffic safety agencies consider this
group a key target for strategic planning purposes. The aim of this research is to develop a
framework that can be used to efficiently and effectively target high-risk drivers. The specific
objectives are to establish whether high-risk drivers are homogenous, and if not, to determine the
optimal set of primary and secondary clusters for efficient and effective targeting with minimal
resources. The study area is Saskatchewan, Canada. Multiple databases (including traffic
collisions, insurance claims and conviction data) formed the basis for the research. In this study,
high-risk drivers (HRDs) are defined as all drivers who are both enrolled in Saskatchewan Government Insurance’s
(SGI) Driver Improvement Program and in the negative or penalty zone of SGI’s Safety Driver
Rating scale as a result of accumulated demerit points. Geodemographic modeling, using the
neighbourhood as the unit of analysis, a large number of variables, and a set of probabilistic
clustering techniques, was used in the analysis. The results indicate that the high-risk driver
group is heterogeneous, falling into sub-clusters with varying collision and traffic behaviour
profiles. The study found that, in Saskatchewan, high-risk drivers are mainly in the major cities
(56%), rural municipalities (18%) and towns (15%). The optimal primary high-risk segments for
efficient targeting are those major cities and towns where both the risk of collision involvement
and the concentration of high-risk drivers are higher than in the general driver population. Drivers in the
primary target area for messaging show higher levels of distracted, impaired and aggressive
driving behaviours, driver-inexperience, extreme fatigue, falling asleep behind the wheel, and
inattention.
1. Introduction
1.1 Problem statement
This paper uses a geodemographic modeling approach to identify the most appropriate high-risk
driver segments for safety messages aimed at high-risk drivers (HRDs). Identifying the most
appropriate segments of HRDs, and then targeting appropriate messages at these groups, should
maximize changes in driving behaviour and reductions in the number of collisions while making
efficient use of resources. While HRDs constitute a relatively small percentage of the driving
population, they are believed to account for a significant proportion of deaths and injuries in
collisions. Research conducted in Canada (1) indicates that only three to four percent of the
driver population are HRDs, but about 12 percent of fatalities and eight percent of injuries involve
HRDs. In New Zealand, it has been estimated that about 33 percent of at-fault collisions are due
to HRDs (2).
The percentage of HRD involvement in collisions may vary across jurisdictions
depending on the way HRDs are defined in each jurisdiction. For example, McSaveney and
Jones (2) defined HRDs as drivers who have a history of dangerous and reckless driving.
Dangerous and reckless driving included disqualified driving, unlicensed driving, involvement in
illegal street car racing, repeat drink/drug driving, high Blood Alcohol Content (BAC) offenders,
repeat speeding offenders, and high-level speed offenders. The Traffic Injury Research
Foundation (1) defined HRDs as suspended/prohibited drivers and repeat traffic offenders with a
pattern of illegal driving behaviours (e.g., drivers with recurring incidences of alcohol/drug
impaired violations, traffic violations, and collision involvement).
However HRDs are defined, it is fair to say that there is a consensus amongst traffic
safety agencies/jurisdictions around the world that HRDs pose serious public safety issues, and
thus it is not surprising that most traffic safety agencies select HRDs as a target safety area in
plans designed to reduce collision deaths and injuries (3, 4). For example, Canada’s national
Road Safety Strategy (RSS) 2015 (for the five-year period 2011 to 2015) identified HRDs as a
key target area of safety concern. The presumed rationale is that high-risk driving behaviour
often leads to increased risk of collision involvement (5). RSS 2015 commits each of the
Canadian provinces and territories to developing appropriate methods that can effectively and
efficiently identify and deliver appropriate traffic safety messages to each group of HRDs. This
approach entails drawing on every possible source of information to answer pertinent questions
about who HRDs are, where they are located, and what their risk profiles are.
One of the easiest and most common ways of identifying HRDs and categorizing them
into various segments is to develop multiple HRD groups based primarily on a few selected
driver characteristics. For instance, young drivers, impaired drivers, repeat traffic offenders and
collision-involved drivers can each be viewed as a unique HRD group (6, 7). A problem with this
simple approach is that it treats each HRD segment (e.g., young drivers, impaired drivers, repeat
speeding violators, etc.) as a unique group although there may well be overlaps between the
groups (e.g., a group of young and impaired drivers). It is certainly reasonable to assume that a
young driver who is a repeat traffic offender, but who has no impaired driving history, would
have a different risk profile from a young and impaired driver who does not engage in repeat
traffic violations.
The way in which groups are defined can be important in the efficient and effective use
of financial resources invested in safety messages designed to reduce the number of collisions.
The lack of a clearly distinct boundary between groups invariably leads to difficulties in
targeting and formatting safety messages. The aim of this research is to develop a framework that
traffic safety professionals can use to efficiently and effectively segment and target high-risk
drivers.
1.2 Literature review
Effective segmentation (i.e., identification) of consumers is a key consideration in, for example,
the social marketing industry. To overcome the overlapping syndrome and other issues
when targeting products, social marketers use an advanced segmentation technique known as
geodemographic modeling (8, 9). The technique has been described as the most cost-effective
technique available for determining and segmenting target groups for disseminating public
policy statements (10, 11).
The modeling effort requires: 1) collecting massive amounts of data on consumer
characteristics and purchasing behaviour, 2) constructing statistical models of consumer identity,
and 3) mapping and analyzing distributions of consumers’ market-related activities (12, 13). The
marketing industry typically includes demographic data (age, gender, and income), geographic
data (rural, urban) and psychographic data (lifestyle) in its models. These data are integrated
and analyzed using advanced data management technologies, statistical models, and
geographical information systems (14) to understand consumers’ purchasing activities and
related characteristics, for example, where consumers live and what they choose to buy. The
approach assumes that like-minded consumers tend to “cluster together” in spatially proximate
neighbourhoods. Heitgerd and Lee (15) applied a geodemographic model to conduct public
health risk assessments. They wanted to identify risky clusters in order to conduct efficient
targeting of health education activities. They used the neighbourhood as a geospatial unit of
analysis to establish national priority sites.
Although geodemographic modeling is widely used in areas such as social marketing, it
has been little used in traffic safety (16). Cambois and Fontaine’s (17) study can be regarded as
an early attempt to classify HRD groups by defining unique segments that are mutually
exclusive. The study used traffic collision data and other driver information, such as
aggressive driving, speeding, red-light running, and driving impaired, as the basis for
segmentation. Blatt and Furman (18) used a geodemographic modeling approach to investigate
whether collisions in rural areas involved urban dwellers or rural dwellers. They concluded that
most rural collisions involved residents living in rural areas and small towns. Shankar and
Warkell (19) applied a geodemographic modeling approach to analyze fatal motorcycle
collisions. They then identified which safety message targets and specific media channels were
most appropriate to each segment of road users involved in the fatal motorcycle collisions.
Anderson (20) applied geodemographic modeling to determine drivers’ injury risk in London,
UK. Anderson found distinct spatial and statistical patterns in certain groups of drivers who were
more likely to be at risk of being involved in a collision. An important area which is largely
unexplored in the few studies identified above is the homogeneity or lack of homogeneity of the
HRD group using the neighbourhood as a unit of analysis.
3. Study objectives and scope
The goal of this research is to develop a framework that traffic safety professionals can use for
the efficient and effective segmentation and targeting of high-risk drivers. The specific
objectives are 1) to establish whether HRDs are homogenous, and if not, to identify
neighbourhood clusters in which they dominate; and 2) to determine the optimal set of primary
clusters that could be reached with minimal resources. In this study, HRDs are defined as all
drivers who are both enrolled in Saskatchewan Government Insurance’s (SGI) Driver
Improvement Program and in the negative or penalty zone of SGI’s Safety Driver Rating scale as
a result of accumulated demerit points. The Driver Improvement Program is a progressive
sanctions program for problem drivers who exceed nine demerit points. The Safety Driver Rating
Scale defines a driver’s safety position on a scale from -10 to 19 as part of a Safety Recognition
Program. The Safety Recognition Program uses the safety rating scale to determine driver safety
points. It consists of four zones: the negative or penalty zone (-10 to <0), the neutral zone (0), the
vehicle insurance discount zone (>0 to 12), and the platinum customers zone (>12 to 19). A rating of 0 or
greater is an indication of safe driving. A driver in this zone could be in a discount position while
a driver in the negative zone attracts a financial penalty for each sliding point in that zone. For
example, suppose a driver is at 2 in the Safety Zone. She is convicted of running a stop sign and
loses 4 points. This moves her to -2 points in the Penalty zone. She is immediately assessed a
one-time financial penalty of $50 for the incident. Such a driver will be included in our high-risk
driver group.
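The zone boundaries and the worked example above can be expressed as a short sketch. This is only an illustration of the definition given in the text, not SGI's actual rating system: the function names `rating_zone` and `is_high_risk` and the zone labels are hypothetical, and the financial penalty schedule is deliberately left out because the text gives only one example amount.

```python
def rating_zone(rating):
    """Map a Safety Driver Rating (-10 to 19) to one of the four zones in the text."""
    if not -10 <= rating <= 19:
        raise ValueError("rating must lie between -10 and 19")
    if rating < 0:
        return "penalty"    # negative or penalty zone (-10 to <0)
    if rating == 0:
        return "neutral"    # neutral zone (0)
    if rating <= 12:
        return "discount"   # vehicle insurance discount zone (>0 to 12)
    return "platinum"       # platinum customers (>12 to 19)


def is_high_risk(rating, in_driver_improvement_program):
    """Study definition of an HRD: enrolled in the program AND in the penalty zone."""
    return in_driver_improvement_program and rating_zone(rating) == "penalty"


# Worked example from the text: a driver at 2 loses 4 points for running a stop sign.
rating_before = 2
rating_after = rating_before - 4
print(rating_zone(rating_before), "->", rating_zone(rating_after))
```

Both conditions are checked in `is_high_risk` because the study definition requires program enrolment as well as a negative rating.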
The study considers high-risk driver groups on the basis of driving experience. It is expected that
analyzing HRDs separately by taking into account their driving experience, driver behaviour,
collision risk and geodemographic profiles will allow us to identify true high-risk drivers. It
should also enable us to exclude drivers whose high-risk driving behaviour, e.g., excessive
speeding, is due to a lack of driving experience rather than a disposition towards high-risk
driving.
4. Data Sources and Preparation
The efficient segmentation of the HRD group required bringing together a number of datasets
from SGI: 1) the traffic collision database (2006-2011), 2) the vehicle characteristics database, and 3) the
problem driver database from the Driver Improvement and Safe Driver Recognition programs. Other
databases include the most recent census data from Statistics Canada and the convictions
database from the Saskatchewan Department of Justice. The traffic collision database included
collision characteristics, driver and vehicle occupant information. The census data included
updated postal codes, and the convictions database included summary offence tickets and
criminal code convictions. A total of 30,453 high-risk drivers who were in the penalty zone of
the safety rating scale were extracted as the subjects in this study.
The datasets were prepared and classified into geodemographic, risk profile and traffic
behaviour categories using the following steps:
1. To create the geodemographic category, high-risk drivers were initially assigned random
codes and aggregated into groups using postal codes. For privacy purposes, the only personal
data collected were gender and age. The HRD group dataset was then linked to the most
recent census data using postal codes as unique identifiers. This allowed the subsequent
extraction of geodemographic variables for each low-level neighbourhood.
2. Risk profiles were developed using age and gender variables, which have been found to
significantly influence collision risk (21-28). In this study, we estimated the probability of
involvement in a collision when the driver was a male or female. For age, we estimated the
probability of involvement in a collision when the driver was less than 25 years (young), over
65 years (elderly), and between 25 and 65 years (other). The probabilities, which were used
as inputs into the subsequent cluster analyses, were estimated using logistic regression
models similar to those described in Guo and Fang (29). The data elements used in the
estimation process were derived by merging the aggregated codes with the traffic collision
database. Table 1 shows the two main categories (geodemographic and collision risk) and
the variables for each category.
TABLE 1 Main Categories and Aggregated Variables used in the Cluster Analysis
Main Category           Aggregated Variable   Variable Description
Geodemographic          %_gender              Percent male Drivers
                        %_age                 Percent young Drivers
                        Postal_Code           Neighbourhood unique identifier
                        Dwellings             Housing units in Neighbourhood
                        Persons_per_family    Count of family members in dwelling
                        Aboriginal_popn       Number of Aboriginals
                        Employment Status     Employed versus unemployed
                        Education_level       Level of Education (5 levels)
                        Average_Income        Average neighbourhood income
Collision Risk Profile  Gender_Prob           Prob(Male|Collision)
                        Young_driverProb      Prob(Driver<25 years|Collision)
                        Elderly_driverProb    Prob(Driver>65 years|Collision)
                        Other (Reference)     Prob(25<=Driver<=65 years|Collision)
3. As many variables in the collision and summary offence datasets might describe traffic
behaviour, it was necessary to aggregate variables that measure common underlying
constructs. For example, human condition, as a major contributing factor associated with
collisions, includes driver inattention, inexperience, distraction, driving while impaired, having
been drinking, falling asleep, being fatigued, losing consciousness, and many other variables.
Using factor analysis techniques, a human condition index was created that comprised four
underlying constructs. Six other indices were similarly developed covering constructs
describing issues such as following too closely, impaired driving, and use of seat belts. The
seven indices are presented in Table 2. The factor analysis procedure used to develop the
underlying constructs in this study was similar to that described by Saccomanno and Lai
(30), with the constructs being prime candidates for modeling purposes as they are deemed to
be non-collinear.
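Step 1 of the preparation steps above (grouping coded drivers by postal code and deriving neighbourhood-level variables such as %_gender and %_age) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually run on the SGI data: the record layout, the field names, the placeholder postal codes "A1A" and "B2B", and the function `aggregate_by_postal_code` are all hypothetical.

```python
from collections import defaultdict


def aggregate_by_postal_code(drivers, young_age_limit=25):
    """Group driver records by postal code and compute percent male and percent young."""
    groups = defaultdict(list)
    for driver in drivers:
        groups[driver["postal_code"]].append(driver)

    summary = {}
    for code, members in groups.items():
        n = len(members)
        summary[code] = {
            "n_drivers": n,
            # share of male drivers in the neighbourhood (%_gender in Table 1)
            "pct_male": 100.0 * sum(m["gender"] == "M" for m in members) / n,
            # share of drivers under 25 (%_age in Table 1)
            "pct_young": 100.0 * sum(m["age"] < young_age_limit for m in members) / n,
        }
    return summary


# Hypothetical records: only gender and age are kept, as in the study.
sample = [
    {"postal_code": "A1A", "gender": "M", "age": 22},
    {"postal_code": "A1A", "gender": "F", "age": 40},
    {"postal_code": "B2B", "gender": "M", "age": 70},
    {"postal_code": "B2B", "gender": "M", "age": 19},
    {"postal_code": "B2B", "gender": "M", "age": 30},
    {"postal_code": "B2B", "gender": "M", "age": 45},
]
neighbourhoods = aggregate_by_postal_code(sample)
print(neighbourhoods)
```

The resulting dictionary is keyed by postal code, which is the same key the text uses to link the HRD dataset to the census data.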
TABLE 2 Traffic Behaviour Indices derived from Factor Analysis Procedure
Behaviour Index             Orthogonal Factors  Description of Variable Components
                            (Constructs)
Human Condition Index (HC)  HC1                 Driver inexperience or confused state
                            HC2                 Medical disability, defective vision or hearing
                            HC3                 Extreme fatigue, falling asleep
                            HC4                 Losing consciousness or sickness
Human Action Index (HA)     HA1                 Following too closely
                            HA2                 Disregard for traffic control device
                            HA3                 Backing unsafely
                            HA4                 Taking evasive action such as hard braking
Impaired Driver Index (ID)  ID1                 Ability impaired by alcohol or drugs
Aggressive Driver Index     AD                  Speeding, running red lights, tailgating, weaving in
                                                and out of traffic, failing to yield the right of way
Distracted Driving Index    DD                  Inattention, distracted in or out of vehicle
Medical Index               MI                  Medically at risk
Seat Belt Use Index         SBI                 Seat belt use rate
5. Modeling Procedure
The modeling procedure used in this study was based on two analyses: cluster analysis and the
creation of a collision risk index and a high-risk driver penetration index.
5.1 Cluster Analysis
The first part of the cluster analysis was designed to investigate whether we should treat the
high-risk group as a homogenous group or whether we should treat it as a number of sub-units.
We tested the hypothesis that the high-risk driver group is homogenous and therefore cannot be
clustered. To test the hypothesis, the main categories and the traffic behaviour indices presented
in Tables 1 and 2 were used as primary inputs for a cluster analysis, and the number of clusters
was pre-specified. The clustering procedure used for this part of the analysis has been widely
used in traffic safety research (29, 30, 31).
The choice of clustering procedure was guided by the need to develop a set of high-risk
driver clusters with minimal overlap of attributes while preventing outliers in the dataset from creating
small clusters consisting of only a few observations. The cluster analysis procedure selected
allows these two principles to be addressed adequately. The procedure uses the K-means
model to develop cluster centroids that are far apart from each other. It assigns each observation to
the cluster whose centroid is closest, minimizing the within-cluster sum of
squares. The cluster centroids are then replaced by the cluster means, and the iteration is repeated
until all high-risk drivers are assigned. The K-means clustering procedure used in
this study is considered suitable for large datasets, as is the case in this study with 30,453
observations.
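The assignment and update steps described above can be illustrated with a small, dependency-free sketch. It is not the software used in the study: the farthest-point rule for choosing starting centroids is an assumption made here to mimic "centroids that are far apart", and the function names and the toy two-dimensional points are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Pick k starting centroids that are far apart (an assumed, simple heuristic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # take the point whose nearest already-chosen centroid is farthest away
        centroids.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        )
    return centroids


def k_means(points, k, max_iter=100):
    """Return (labels, centroids) after alternating assignment and mean updates."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        # assignment step: each observation joins the cluster with the closest centroid
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # update step: replace each centroid by the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(toy_points, k=2)
print(labels, centroids)
```

In the study the inputs would be the Table 1 variables and Table 2 indices for each neighbourhood rather than toy coordinates, and the number of clusters `k` is pre-specified, as stated in Section 5.1.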
5.2 Collision Risk Index and High-Risk Drivers Penetration Index
Two indices were created to refine the main clusters and the sub-clusters: the collision risk index
and the high-risk driver penetration index.
The collision risk index (CI) was estimated by dividing the number of collisions in a
cluster by the total number of collisions, and dividing this number by the number of drivers in
the cluster divided by the total number of drivers. The number of drivers was defined as the
number of licensed drivers. The CI for each cluster was estimated as follows:
$$\mathrm{CI} = \frac{\text{Cluster Collisions} / \text{Total Collisions}}{\text{Cluster Driving Population} / \text{Total Driving Population}} \qquad (1)$$
Equation 1 indicates whether a cluster is overrepresented (Collision Index >1) or
underrepresented (Collision Index <1) in the number of collisions. The equation treats the cluster
as an entity and provides an indication of the collision risk relative to the Saskatchewan driving
population.
The high-risk drivers penetration index (PI) was estimated by dividing the number of
high-risk drivers in a cluster by the total number of high-risk drivers, and dividing this number
by the population of the cluster divided by the total population. The PI for each cluster was
estimated as follows:
$$\mathrm{PI} = \frac{\text{High-Risk Drivers in Cluster} / \text{Total High-Risk Drivers}}{\text{Cluster Population} / \text{Total Population}} \qquad (2)$$
The Penetration Index provides an indication of whether high-risk drivers are overrepresented
(PI>1) or underrepresented (PI<1) in any given cluster when compared with the general driver
population profile. Unlike the collision index, which relies on collision incidents, the penetration
index (PI) is a marketing term that signifies the degree to which traffic safety messages can reach
a target group, equivalent to the market penetration rate. A PI, as computed in the study, is a
normalized ratio that can be skewed in either direction. Thus, the threshold could be determined
using the centroid for all cluster PIs, determined as $\frac{1}{n}\sum_{i=1}^{n} \mathrm{PI}_i$, where n is the total number of final
clusters to be identified. A high penetration cluster means that messages targeted at that cluster
would reach more high-risk drivers than messages targeted at a low penetration cluster. The two
indices are discussed further in Section 6.3.
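Both indices share the same form, a cluster's share of events divided by its share of a base population, so they can be sketched with one helper. The counts below are invented purely for illustration and do not come from the study data; the names `representation_index` and `pi_centroid` are hypothetical.

```python
def representation_index(cluster_count, total_count, cluster_base, total_base):
    """(cluster share of events) / (cluster share of the base population)."""
    return (cluster_count / total_count) / (cluster_base / total_base)


def pi_centroid(pi_values):
    """Mean of the cluster PIs, used in the text as a candidate threshold."""
    return sum(pi_values) / len(pi_values)


# Hypothetical cluster: 500 of 1,000 collisions, 2,500 of 10,000 licensed drivers.
ci = representation_index(500, 1000, 2500, 10000)

# Hypothetical cluster: 300 of 1,200 high-risk drivers, 2,000 of 16,000 residents.
pi = representation_index(300, 1200, 2000, 16000)

# Hypothetical PIs for four clusters and their centroid.
threshold = pi_centroid([0.8, 1.0, 1.2, 1.8])

print(ci, pi, threshold)
```

With these made-up counts the cluster holds half of the collisions but only a quarter of the drivers, so its CI is above 1 and the cluster is overrepresented in collisions in the sense of Equation 1.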
6. RESULTS
6.1 Clusters and Sub-Clusters
Table 3 presents the results of the cluster analysis for HRDs in Saskatchewan. The cluster
analysis suggests that HRDs can be segmented into a number of main clusters and sub-clusters using
postal codes as the unit of analysis. The main clusters (high-level postal codes aggregated using census
subdivisions) are cities (accounting for 56% of all HRDs by population), followed by
rural municipalities (18%) and towns (15%).
Within each main cluster, the cluster analysis found three to five sub-clusters. Sub-cluster 13
is the dominant sub-cluster for cities and includes Regina, Saskatoon, Moose Jaw, Humboldt and
Swift Current. Sub-cluster 33 dominates the villages (and includes Holdfast, Avonlea and
Medstead), and sub-cluster 25 dominates the towns (and includes Shaunavon, Milestone and
Southey). In the rural municipalities, most HRDs are in sub-clusters 41, 43 and 45 (which
include Garry, Tisdale and Wilton); on the Indian Reserves, most HRDs are in sub-cluster 55
(which includes Buffalo River Dene, Wapachewunak and Red Pheasant).
TABLE 3 Derived Neighborhood High Risk Clusters by Location
Main Cluster        Prefix  Sub-cluster Identity  Population  % of Total
City                1       11                    24,310      56%
                            13                    345,025
                            14                    99,355
                            15                    93,320
Town                2       22                    34,895      15%
                            23                    28,265
                            24                    9,725
                            25                    66,170
Village             3       31                    1,425       4%
                            32                    700
                            33                    36,155
                            34                    2,765
                            35                    1,925
Rural Municipality  4       41                    50,495      18%
                            43                    41,690
                            45                    83,625
Indian Reserves     5       51                    2,165       5%
                            52                    6,765
                            53                    11,670
                            55                    27,150
Others              6       62                    1,775       2%
                            63                    10,575
                            65                    4,940
Table 4 shows the number of drivers in each of the five main sub-clusters, the percentage
who are male, and the dominant age category (relative to the total driving population). The Table
includes a Traffic Safety Behaviour Index composed of seven constructs. These constructs are
described in Table 2. Each traffic safety behaviour construct is assigned points according to the
correlation coefficient between the construct and the sub-cluster. If the correlation coefficient is
significant and high, indicating a high level of undesirable traffic behaviour, the behaviour
construct is assigned 3 points. A moderate level of undesirable behaviour is assigned 2 points,
and a low level is assigned 1 point. Each sub-cluster therefore receives a total of 7 to 21 points. A high
total score is defined as 16.5 to 21 points; a moderate total score is defined as 11.8 to 16.4 points;
and a low total score is defined as 7 to 11.7 points.
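The scoring rule can be sketched as below. The text does not state the correlation cut-offs that separate high, moderate and low levels, so the sketch takes the level of each construct as given; the example levels, the dictionary keys (borrowed from the Table 2 abbreviations) and the function names are hypothetical.

```python
POINTS = {"low": 1, "moderate": 2, "high": 3}


def behaviour_score(levels):
    """Sum the points for the seven traffic behaviour indices of one sub-cluster."""
    if len(levels) != 7:
        raise ValueError("expected one level for each of the seven indices")
    return sum(POINTS[level] for level in levels.values())


def score_band(total):
    """Classify a total score using the bands given in the text."""
    if not 7 <= total <= 21:
        raise ValueError("total must lie between 7 and 21")
    if total >= 16.5:
        return "high"
    if total >= 11.8:
        return "moderate"
    return "low"


# Hypothetical sub-cluster: levels for the seven indices listed in Table 2.
example_levels = {
    "HC": "high", "HA": "high", "ID": "moderate", "AD": "high",
    "DD": "high", "MI": "low", "SBI": "moderate",
}
total = behaviour_score(example_levels)
print(total, score_band(total))
```

Because every construct contributes a whole number of points, the fractional band limits in the text never leave a total unclassified.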
Table 4 indicates that the high-risk sub-clusters differ in the percentage of males, the
dominant age category, and the traffic behaviours. The Indian Reserve sub-cluster 55 has the
worst score for traffic behaviours (21), followed by the main sub-cluster for towns (19) and the
main sub-cluster for rural municipalities (15). Both the Indian Reserve and towns sub-clusters
are dominated by young HRDs (aged 15 to 34). The results for the percentage
of HRDs who are male are less clear cut.
TABLE 4 Main Sub-Clusters showing Characteristics and Traffic Safety
Behaviour Index
6.2 Spatial Representation of Clusters and Sub-Clusters
Figures 1 and 2, produced with ArcGIS software, show the location of the sub-clusters. Figure 1 shows
the two major cities (Regina and Saskatoon). Figure 2 shows the non-major cities and also the
towns and villages, which are combined. The maps were created by layering the
final clusters over shapefiles for Saskatchewan.
The Figures show that a sub-cluster of HRDs is not entirely dependent on location.
Sub-clusters 11 and 14, for example, bring together Saskatoon and Regina HRDs who share
similar characteristics, and sub-clusters 13 and 15 bring together Saskatoon and Regina
HRDs who share similar characteristics with non-major city HRDs.
6.3 Target Clusters Behaviour
Identification of the main clusters, sub-clusters and their locations does not determine which
sub-clusters are prime candidates for targeting. The collision risk index (CI) and high-risk drivers
penetration index (PI) introduced in Section 5.2 were used to create the perceptual map shown in
Figure 3. With respect to the cluster penetration index (PI), the centroid stabilizes at a PI of 1.2.
A PI of less than 1 is considered low, a PI between 1 and 1.2 is considered high, a PI between
1.2 and 1.5 is considered higher, and a PI greater than 1.5 is considered extremely high. Setting the
threshold at 1.2 ensures that clusters with higher to extreme penetration are treated as primary
targets, which enhances penetration effectiveness. The quadrant with a high CI and a high PI should contain the most important
sub-clusters. In Figure 3, the Primary Target quadrant identifies these sub-clusters. Secondary
clusters were defined as those lying in the quadrant with a high CI, but a low PI. In Figure 3, the
Secondary Target quadrant identifies these sub-clusters.
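The banding and quadrant rules in this section can be written out as a short sketch. Several details are assumptions made only for illustration: the text does not say which band the boundary values 1.2 and 1.5 belong to, it does not give a numeric cut-off for a "high" CI (the sketch uses CI > 1, the overrepresentation condition from Equation 1), and the label "not targeted" and the function names are hypothetical.

```python
def penetration_band(pi):
    """Label a penetration index using the bands described in the text."""
    if pi < 1.0:
        return "low"
    if pi < 1.2:          # assumption: 1.2 itself is counted as "higher"
        return "high"
    if pi <= 1.5:         # assumption: 1.5 itself is counted as "higher"
        return "higher"
    return "extremely high"


def target_quadrant(ci, pi, ci_threshold=1.0, pi_threshold=1.2):
    """Place a sub-cluster on the perceptual map (thresholds are assumptions)."""
    if ci <= ci_threshold:
        return "not targeted"          # low collision risk
    if pi >= pi_threshold:
        return "primary"               # high CI and high PI
    return "secondary"                 # high CI but low PI


# Hypothetical index values for three sub-clusters.
for name, ci, pi in [("A", 1.4, 1.3), ("B", 1.4, 0.9), ("C", 0.7, 1.6)]:
    print(name, penetration_band(pi), target_quadrant(ci, pi))
```

Keeping the two thresholds as parameters makes it easy to re-run the classification if a different centroid value is obtained for the PI.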
Figure 3 shows that six sub-clusters (11, 14, 15, 22, 23 and 24) are prime candidates for
targeting HRDs with safety messages. These sub-clusters represent about 29% of the high-risk
driver population and are associated with the major cities and the towns: Regina and Saskatoon
(Cluster 11); Martensville (Cluster 14); Moose Jaw and Prince Albert (Cluster 15); Vonda and
Liberty (Cluster 22); Invermay, Weldon, and Norquay (Cluster 23); and Canora, Kyle, and
Herbert (Cluster 24).
FIGURE 1 Geospatial Presentation of High-Risk Sub-Clusters in Saskatoon
and Regina
FIGURE 2 Geospatial Presentation of High-Risk Sub-Clusters in Towns, Villages
and Non-Major Cities in Saskatchewan
[Figure 3 image: perceptual map with Collision Index on one axis and High Risk Driver Penetration on the other. Secondary Target quadrant: sub-clusters 13, 25, 32, 33. Primary Target quadrant: sub-clusters 11, 14, 15, 22, 23, 24.]
FIGURE 3 Perceptual Map showing Collision Index and Driver Penetration
Index Values and Primary and Secondary Quadrants for Sub-Clusters of
High-Risk Drivers
Table 5 shows the geodemographic variables and traffic safety behaviour index for the
six primary target sub-clusters. The Table shows that the dominant age category and
behaviour score of the primary clusters vary. Clusters 11 (found primarily in Regina and
Saskatoon) and 14 (mainly in Saskatoon and Martensville) have behaviour scores of 21 and 17,
respectively, an indication that they should be given priority attention when selecting target
clusters.
City-based sub-clusters had particularly high levels of distracted and aggressive driving
behaviours, especially when compared with sub-clusters for towns. City-based sub-clusters were
also more likely to be associated with driver inexperience, extreme fatigue, falling asleep behind
the wheel, inattention, driving too fast for road conditions, exceeding the speed limit, and
following too closely. Sub-clusters associated with towns were more likely to have medically
at-risk drivers due to the higher proportion of seniors in those clusters.
Although sub-clusters 11 and 14 both scored high on the behaviour index, Figure 3
shows that the risk of collision involvement was significantly higher for Cluster 11, probably due
to a greater proportion of younger drivers in sub-cluster 11 than in sub-cluster 14.
HRDs in sub-cluster 15 differed from those in sub-clusters 11 and 14 as they were less likely to
engage in impaired driving, be medically at risk, or fail to use seat belts.
Town sub-clusters 22, 23 and 24 were appropriately ranked on the basis of their traffic
safety behavioural score.
TABLE 5 Demographic Characteristics and Traffic Safety Behaviour Index Scores
for Primary Target Sub-Clusters
7. DISCUSSION/CONCLUSION
Our research links geodemographic attributes of drivers, driver experience, risk profiles and
collisions data to gain insight into segmenting high risk road users. The study provides a
framework for identifying optimal high-risk driver segments through geodemographic modeling.
Like consumer market segments, traffic safety segments can be defined by certain attributes. The
study uses a number of geodemographic and traffic safety behaviour variables to define clusters
of drivers whose high risk behaviours should be targeted in safety messaging programs. The
geodemographic attributes include gender, age, income, postal code, etc., and the traffic safety
behaviour attributes include following too closely, impaired driving, use of seat belts, etc.
We combine the geodemographic and traffic safety behaviour attributes to develop main
sub-clusters and main cluster categories of high risk drivers. The sub-clusters are further divided
into primary and secondary targets for traffic safety messaging. The perceptual map shows that
there is a linear relationship between the cluster collision index and the penetration index, an
indication of the extent to which the developed clusters truly reflect high-risk behavior of the
primary and secondary targets. For example, the results indicate that the dominant high-risk
clusters are not necessarily the primary targets. Cluster 13 from Table 4 does not appear as a
primary target on the perceptual map, although it is identified as a dominant cluster for high-risk
drivers. This could be attributed to the fact that this cluster ranks low on overall traffic safety
behavior, collisions and violations, probably due to the dominant age being 65+. On the other
hand, Cluster 11, which is associated with the city category, ranks high because it is dominated
by younger (25-54 year old) high-risk drivers. It is apparent that the clusters and primary targets
portray the influence of age on collision involvement and traffic behaviour.
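As an illustrative sketch of how a cluster index of this kind can be computed, the Python snippet below follows a common geodemographic convention: a cluster's share of high-risk drivers is divided by its share of all licensed drivers and scaled so that 100 means proportional representation. The formula, the function name `penetration_index` and all counts are assumptions made for illustration; they are not the definitions or the data used in this study.

```python
def penetration_index(hrd_in_cluster, hrd_total, drivers_in_cluster, drivers_total):
    """Return 100 * (cluster share of HRDs) / (cluster share of licensed drivers)."""
    if hrd_total <= 0 or drivers_in_cluster <= 0 or drivers_total <= 0:
        raise ValueError("totals and the cluster driver count must be positive")
    hrd_share = hrd_in_cluster / hrd_total
    driver_share = drivers_in_cluster / drivers_total
    return 100.0 * hrd_share / driver_share


# Hypothetical counts for three made-up clusters (not data from this study).
clusters = {
    "A": {"hrd": 300, "drivers": 2000},
    "B": {"hrd": 100, "drivers": 2000},
    "C": {"hrd": 100, "drivers": 1000},
}
hrd_total = sum(c["hrd"] for c in clusters.values())
drivers_total = sum(c["drivers"] for c in clusters.values())

indices = {
    name: penetration_index(c["hrd"], hrd_total, c["drivers"], drivers_total)
    for name, c in clusters.items()
}
# Clusters ordered from most over-represented to least.
ranked = sorted(indices, key=indices.get, reverse=True)
```

Under this convention, an index above 100 marks a cluster that holds more high-risk drivers than its share of licensed drivers would suggest, which is one way a dominant cluster can still fall short of being a primary target.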
The approach used in this study has the potential to minimize the overlapping syndrome
of segmenting high-risk drivers. For example, when sub-cluster 11 high-risk drivers are targeted,
the message can target a number of traffic safety behaviours including aggressive driving and
driver distraction while at the same time focusing on a particular location. This is consistent with
the view that like-minded people tend to cluster together in similar neighbourhoods.
The limitations of the study must also be considered. Although the approach presented in this
study overcomes the limitation of not being able to reach high-risk drivers who have not come to
the attention of law enforcement officers, the way we estimate risk does not allow the inclusion
of the expected loss of an outcome: we estimate risk loosely by examining the number of
traffic-collision involvements in a given cluster relative to the corresponding exposure value (for
licensed drivers or population), but risk is a probability concept which must be applied to
determine the expected loss of an outcome. The lack of consideration for determining a high-risk
driver cluster’s probability of future involvement in a collision is also a weakness. We are also
aware that geographic information is not perfect and therefore cannot be regarded as an absolutely
accurate reflection of reality (33).
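The distinction between an exposure-based rate and an expected loss can be made concrete with a short Python sketch. All names and numbers are hypothetical. Treating the observed rate as a probability of involvement, and multiplying it by an average loss per involvement, is a textbook simplification assumed here for illustration; it is not a method applied in this study.

```python
def involvement_rate(involvements, exposure):
    """Collision involvements per unit of exposure (e.g. per licensed driver)."""
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    return involvements / exposure


def expected_loss(probability, mean_loss):
    """Expected loss = probability of an outcome * average loss of that outcome."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return probability * mean_loss


# Hypothetical clusters X and Y with the same exposure (1,000 licensed drivers).
rate_x = involvement_rate(involvements=50, exposure=1000)
rate_y = involvement_rate(involvements=40, exposure=1000)

# Assumed average dollar loss per involvement; Y's collisions are costlier.
loss_x = expected_loss(rate_x, mean_loss=10_000)
loss_y = expected_loss(rate_y, mean_loss=20_000)

# X ranks first on the rate, but Y ranks first on expected loss per driver.
higher_rate = "X" if rate_x > rate_y else "Y"
higher_loss = "X" if loss_x > loss_y else "Y"
```

In this made-up case the two rankings disagree, which is why a rate-only measure can understate the priority of a cluster whose collisions are less frequent but more severe.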
The limitations notwithstanding, this study shows that high-risk drivers cannot be
considered homogenous as the group includes clear sub-clusters with varying collision risks,
geodemographic attributes and traffic behaviour profiles. Different sub-clusters should be
targeted differently, and it is important to identify primary and secondary targets, especially
when resources are limited.
In the case of Saskatchewan, the geodemographic modeling techniques used in this study
have identified that high-risk drivers are found mainly in the major cities (56%), rural municipalities
(18%) and towns (15%).
8. FUTURE WORK AND PRACTICAL APPLICATIONS
This study provides traffic safety stakeholders responsible for highways and infrastructure,
transportation, city roads, and insurance with an innovative intelligence-based resource for targeting
the most relevant HRD groups, thereby making the best use of limited resources. For the approach to be
successful, transportation professionals need quality data from a variety of sources. Defining the
high-risk driver clearly at the outset will narrow the focus for successful identification of
optimal primary and secondary sub-clusters for targeting purposes.
Traditionally, mass media target the whole of the HRD group regardless of the within-cluster
risk variance, making the targeting inefficient. This study enables transportation
professionals to focus on the main clusters of high risk, i.e., those drivers who generate
considerable negative incidents from undesirable traffic behaviours. For example, Clusters 11
and 14 rank very high on the traffic safety behaviour index used in this study, and are prime
targets for messaging. The findings presented in this paper suggest that we should target city
20-54 year olds with messages on aggressive driving, impaired driving and distracted driving. In the
towns, scores on the traffic safety behaviour index are mostly lower than in the cities. If
resources are available for messaging in towns, we should target 55-64 year olds in Cluster 24.
This should be the case in spite of the fact that Cluster 25, with a higher proportion of females
aged 35-44, dominates the neighbourhood.
It would be interesting for future work to investigate which distinct geodemographic
high-risk clusters are likely to be involved in future collisions, and at what loss outcome, as
opposed to focusing on where in the road network future collisions are likely to occur. Such
work would consider and predict likely variations in high-risk behaviour across different
neighbourhoods and would further refine our ability to undertake focused traffic safety targeting.
An important area that was largely unexplored in our study is the extent to which
individual high-risk drivers contribute to the collective group at different hierarchical levels. It is
our view that such a study could be accomplished through the use of robust statistical techniques that
employ model building and hypothesis testing within a theoretical framework. It would also be
interesting to explore various clustering methods to see if different methods produce visibly
different results.
Future work could also employ psychographic and social values data to further define the
target cluster profiles. An economic appraisal of the scientific risk potential of the high-risk
drivers would provide estimates of the costs of high-risk drivers in dollar terms and would be
useful in an analysis of the return on investment from campaigns that target high-risk drivers
with traffic safety messages.
REFERENCES
1. Traffic Injury Research Foundation. Study of the Profile of High-Risk Drivers. TIRF.
Ottawa, ON. 1995.
2. McSaveney, J., and Jones, W. High-Risk Drivers: An Exercise in Collision Data Analysis
with What Was at Hand. Australasian Transport Research Forum 2011 Proceedings, 28-30
September 2011, Adelaide, Australia. http://www.patrec.org/atrf.aspx. Accessed
Dec 20, 2012.
3. NZ Ministry Of Transportation. Safer Journeys: New Zealand’s Road Safety Strategy 2010 –
2020. New Zealand Ministry of Transport.
www.transport.govt.nz/saferjourneys/documents/saferjourneystrategy.pdf. Accessed Oct 5,
2012
4. US DOT Speed Management Strategic Initiative. FHWA, FMCSA, NHTSA, DOT HS 809
925. 2005.
5. Canadian Council of Motor Transport Administrators (CCMTA). Road Safety Strategy
2011-2015. http://www.ccmta.ca/crss-2015/_files/road_safety_strategy_2015.pdf. Accessed Jan 25,
2013.
6. Alberta Ministry of Transportation. Alberta Safety Plan.
www.transportation.alberta.ca/content/doctype48/production/traffic safetyplan.pdf. Accessed
Nov. 1, 2012.
7. Canadian Council of Motor Transport Administrator. Strategy to Deal with the High-Risk
Driver. High-Risk Driver (HRD) Task Force. June 2001. Ottawa, ON.
http://www.ccmta.ca/english/committees/rsrp/highrisk/pdf/hr_hrd_strategy.pdf. Accessed
Nov 6, 2012.
8. Walsh, D., Chapman, R.E. Rudd, B.A. Social Marketing for Public Health, Health Affairs,
Summer, 1993. Pp. 104-19.
9. Ott, C.H., and Haertlein, C. Social Norms Marketing: A Prevention Strategy to Decrease
High-Risk Drinking Among College Students. Nursing Clinics of North America, 37(2), 2002, pp.
351-364.
10. Ashby D, I., Longley P, A. Geocomputation, Geodemographics and Resource Allocation for
Local Policing. Transactions in GIS 9(1) 2005, pp. 53-72
11. Farr, M., Wardlow, J., Jones, C. Tackling Health Inequalities Using Geodemographics: A
Social Marketing Approach. International Journal of Marketing Research, 50(4), 2008, pp.
449-467.
12. Goss, J. Geodemographics. International Encyclopedia of the Social & Behavioural, 2009,
pp 6166-6169.
13. Birkin, M., Clarke, G.P. Geodemographics. International Encyclopedia of Human
Geography, 2009, pp. 382-389.
14. Petersen, J., Gibing, M., Longley, P., Mateos, P., Atkinson, P., Ashby, D. Geodemographics
as a Tool for Targeting Neighbourhoods in Public Health Campaigns. Journal of
Geographical Systems, 13.2: 2011, pp.173 (20).
15. Heitgerd, J.L., Lee, C.V. A New Look at Neighbourhoods near National Priorities List Sites.
Social Science & Medicine, 57(6), 2003, pp. 1117-1126.
16. Harman B. and Murphy M. The Application of Social Marketing in Reducing Road Traffic
Collisions among Young Male Drivers: An Investigation Using Physical Fear Threat
Appeals. International Journal of Business Management. July 2008.
www.ccsenet.org/journal/index.php/ijbm/article/download/1412/1370. Accessed Dec. 20,
2012.
17. Cambois, M.A., Fontaine, H. Surveys Measuring Risk Exposure and the Combining of
Results with Other Data. Accident Analysis & Prevention, 14(5), 1982, pp. 387-396.
18. Blatt, J., Furman, S.M. Residence Location of Drivers Involved in Fatal Collisions. Accident
Analysis & Prevention, 30(6), 1998, pp. 705-711.
19. Shankar, U., and Warkell, K. Geo-Demographic Analysis of Fatal Motorcycle Collisions.
Washington, D.C.: U.S. Dept. of Transportation, National Highway Traffic Safety
Administration, DOT HS 809 197, 2007.
20. Anderson, T.K. Using Geodemographic to Measure and Explain Social and
Environmental Differences in Road Traffic Collision Risk. Environment and Planning A,
42(9), 2010, pp. 2186-2200.
21. Murat Karacasu, Arzu Er. An Analysis on Distribution of Traffic Faults in Accidents, Based on
Driver's Age and Gender: Eskisehir Case. Procedia - Social and Behavioural Sciences, Volume 20,
2011, Pages 776-785.
22. Elena Santamariña-Rubio, Katherine Pérez, Marta Olabarria, Ana M. Novoa. Gender differences in
road traffic injury rate using time travelled as a measure of exposure. Accident Analysis & Prevention,
Volume 65, April 2014, Pages 1-7
23. Dana Yagil. Gender and age-related differences in attitudes toward traffic laws and traffic
violations. Transportation Research Part F: Traffic Psychology and Behaviour, Volume 1,
Issue 2, December 1998, Pages 123-135
24. Dawn L. Massie, Kenneth L. Campbell, Allan F. Williams. Traffic Accident involvement
rates by driver age and gender. Accident Analysis & Prevention, Volume 27, Issue 1,
February 1995, Pages 73-87
25. Lu Ma, Xuedong Yan. Examining the nonparametric effect of drivers’ age in rear-end
accidents through an additive logistic regression model. Accident Analysis & Prevention,
Volume 67, June 2014, Pages 129-136
26. Mohamed A. Abdel-Aty, A.Essam Radwan. Modeling traffic accident occurrence and
involvement. Accident Analysis & Prevention, Volume 32, Issue 5, September 2000, Pages
633-642
27. Guangnan Zhang, Kelvin K.W. Yau, Guanghan Chen. Risk factors associated with traffic
violations and accident severity in China. Accident Analysis & Prevention, Volume 59,
October 2013, Pages 18-25.
28. Beatriz González-Iglesias, José Antonio Gómez-Fraguela, Mª Ángeles Luengo-Martín.
Driving anger and traffic violations: Gender differences. Transportation Research Part F:
Traffic Psychology and Behaviour, Volume 15, Issue 4, July 2012, Pages 404-412
29. Guo, F., and Fang, Youjia. Individual Driver Risk Assessment Using Naturalistic Driving
Data. Accident Analysis and Prevention, Volume 61(2013) pp. 3-9.
30. Saccomanno, F.F. and Lai, X. A Model for Evaluating Countermeasures at
Highway-Railway Grade Crossings. Transportation Research Record: Journal of the Transportation
Research Board, No. 1918, Transportation Research Board of the National Academies,
Washington, D.C. 2005, pp. 18-25.
31. Donmez, B., Boyle, L.N., Lee, J.D. Differences in Off-Glances: Effects on Young Drivers’
Performance. Journal of Transportation Engineering-ASCE 136 (5), pp. 403-409.
32. Schneider, J., Kasper, B., (2003). Lifestyles, Choice of Housing and Daily Mobility: The
Lifestyle Approach in the Context of Spatial Mobility and Planning. International Social
Science Journal. 55, 2003, pp.319-332.
33. Duckham, M., Mason, K., Stell, J., Worboys, M. A Formal Approach to Imperfection
in Geographic Information. Computers, Environment and Urban Systems, 25(1), 2001, pp.
89-103.