Evaluation of IVIS/ADAS using driving simulators
Comparing performance measures in different environments
Thomas Engen, Lone-Eirin Lervåg, Terje Moen
SINTEF Transport Research
1. BACKGROUND
Evaluation of traffic safety measures has traditionally been conducted through observational
studies, either before-and-after studies or comparative studies. Experimental studies have
traditionally been seen as somewhat difficult to conduct, but they are now becoming more
readily available.
In this paper we demonstrate how to measure change in performance due to the use of
ADAS/IVIS and show how experiments can represent real life. Results from experiments are
compared with observational studies, and experiments conducted in different environments
are evaluated and compared with each other. We outline how measured change in
performance can be used to evaluate ADAS/IVIS. We show that experimental studies in
driving simulators, on test tracks and in real traffic can be valid tools, but that creating a good
experimental design demands special expertise. In several different projects we have
compared driving simulator results with results from observational studies in real traffic,
other experiments and literature reviews.
2. INTRODUCTION
2.1 Experimental studies
Experimental studies have traditionally been seen as somewhat difficult to conduct, but they
are now becoming more readily available. Studies of IVIS/ADAS need an experimental
design because we want to study the effect of IVIS/ADAS before mass production and
common use in vehicles. At the same time we must validate the experimental methods to
improve the truthfulness of the inferences we draw from the experiments.
2.2 Behavioural validation
Physical validation and behavioural validation are the two main approaches to validating
simulators. Physical validation concerns parameters such as how the simulated car performs
compared to a real-world car. Behavioural validation is an assessment of how the driver reacts
and performs within the virtual world of a simulation. This paper deals with behavioural
validation.
There are different methods for behavioural validation (see, for example, Kaptein, Theeuwes
et al. 1996). In this paper the validation of behaviour in the driving simulator is based on
reducing four threats to validity (Cook and Campbell 1979):
© Association for European Transport and contributors 2009
• Statistical conclusion validity: the degree to which appropriate statistics are used to
conclude whether the presumed independent and dependent variables co-vary, that is,
the validity of inferences about the correlation (co-variation) between treatment and
outcome.
• Internal validity: the degree to which the result of an experiment can be attributed to
the manipulation of the independent variable rather than to some other, uncontrolled
variable.
• Construct validity: the degree of certainty with which a measurement device
accurately measures the theoretical construct it is designed to measure.
• External validity: the degree to which the results of an experiment can be generalized
to different persons, settings, treatment variables and measurement variables.
Evaluation research places great emphasis on statistical conclusion validity and internal
validity. Confidence intervals and other statistical tools are used to reduce the former threat.
Within traffic safety research, controlling for regression effects and for factors such as
increases in traffic volume is used to reduce threats to internal validity. Threats to construct
validity are minimized because effects are traditionally measured by accident rates.
Meta-analyses are used to reduce threats to external validity.
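The confidence intervals mentioned above can be sketched in a few lines of Python. This is a
minimal illustration, not the analysis used in the studies: the reaction times are made-up
values, the function name `mean_ci` is hypothetical, and the interval uses a normal
approximation (for small samples a t quantile would give a somewhat wider interval).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(samples, level=0.95):
    """Return (mean, lower, upper) for a normal-approximation confidence interval."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two observations")
    m = mean(samples)
    # Standard error of the mean from the sample standard deviation.
    se = stdev(samples) / sqrt(n)
    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return m, m - z * se, m + z * se


# Hypothetical reaction times in seconds (illustrative values only).
reaction_times = [0.9, 1.1, 1.3, 0.8, 1.2, 1.0, 1.4, 0.7]
m, lo, hi = mean_ci(reaction_times)
print(f"mean = {m:.2f} s, 95% CI = [{lo:.2f}, {hi:.2f}] s")
```

Two conditions whose intervals barely overlap, or overlap widely, can then be compared
with more care than a comparison of means alone allows.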
2.3 Measuring the influence of IVIS/ADAS
There are several different methods to measure the influence of IVIS/ADAS. One must put
special emphasis on construct validity in order to select performance indicators and measures
that accurately describe the effects of IVIS/ADAS. Theories both of workload influence and
of measuring primary and secondary tasks have been developed. All of these theories rely on
the possibility of measuring different variables. Measurable variables can be sorted into two
main categories:
• Physiological Measures (Heart Rate Variability, Respiration Rate Variability,
Galvanic Skin Response, Muscle Tension)
• Driving performance (Lateral Control, Longitudinal control, Visual management,
Interaction with other vehicles)
Physiological measures are difficult to obtain outside of experiments. We have primarily
focused on studies where we can compare the results of lateral control, longitudinal control
and interaction with other vehicles.
Driving performance can be measured using several methods. Registrations can be made
both of naturalistic tasks and of newly introduced artificial secondary tasks that are not
normally handled in traffic but might indicate a change in workload.
Table 1: Driving performance indicators (columns: naturalistic driving tasks, artificial
driving tasks)
Continuous registrations: Lane tracking; Speed; Time gap; Steering wheel reversal; Line
tracking
Incident registrations: Reaction time; Number of errors; Peripheral Detection Task
Further description and discussion of different performance indicators and measures can be
found in a report from the FESTA project (Kircher 2008).
2.4 Theoretical properties of driving simulators, test tracks and real traffic evaluation
When doing evaluation one must be aware that the validity of the results cannot be absolute;
there will always be some uncertainty. Validity refers to “the approximate truth of an
inference” (Cook and Campbell 1979). Evidence of validity may come from other sources of
information, such as past findings and theories. Through a literature review (Engen 2008) we
have found research projects that validate driving simulators in the following ways:
• Direct comparison with real life data
• A comparison of the driving simulator with physiological tests and a questionnaire
• Expert testing
• Validation compared to specific driver characteristics
• Stability over time and driver characteristics
• Driving training
In general, most of the research found that driving simulators were to some extent valid tools
for behavioural research.
Although the different tools can be valid for research, one must be aware that the different
methods and environments have their specific drawbacks:
• Driving simulator – cannot produce a real sense of danger; realism may be lacking
because of the low resolution of devices (screens, audio…).
• Test track – lacks the danger of interaction with other vehicles.
• Real traffic – extreme and dangerous situations cannot be tested.
2.5 Evaluation procedure
To minimize the threat to construct validity, we have created a four-step procedure for
designing experiments for the evaluation of IVIS/ADAS:
1. Describe which characteristics of the IVIS/ADAS are to be evaluated.
2. Decide which measures and performance indicators can be used for the evaluation.
3. Create scenarios where the performance indicators can be measured and calculated.
4. Combine several scenarios to form a research design.
If one of the purposes of the evaluation is to compare the effects of the IVIS/ADAS with
previous research, one should seek to use the same performance indicators, scenarios and to a
large extent the same research design.
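The four steps can be written down as a small data structure, which makes it easy to check
that every chosen indicator is measured in at least one scenario. The sketch below is only an
illustration: the class names `Scenario` and `ResearchDesign`, the method
`uncovered_indicators` and the example values are all hypothetical and do not come from the
studies described here.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """Step 3: a scenario in which the named indicators can be measured."""
    name: str
    indicators: list[str]


@dataclass
class ResearchDesign:
    """Step 4: several scenarios combined into one design."""
    system_characteristics: list[str]  # step 1
    indicators: list[str]              # step 2
    scenarios: list[Scenario] = field(default_factory=list)

    def uncovered_indicators(self):
        """Indicators chosen in step 2 that no scenario measures yet."""
        covered = {i for s in self.scenarios for i in s.indicators}
        return [i for i in self.indicators if i not in covered]


# Hypothetical example values, for illustration only.
design = ResearchDesign(
    system_characteristics=["visual distraction from an in-vehicle display"],
    indicators=["speed", "time gap", "reaction time"],
    scenarios=[Scenario("car following on a rural road", ["speed", "time gap"])],
)
print(design.uncovered_indicators())  # indicators still lacking a scenario
```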
2.6 Environments used for the experiments
Experimental studies can be conducted in different environments:
• Driving simulator
• Test track
• Real traffic
The driving simulator used for the studies presented in this paper consists of a 1997 Renault
Scenic with a three-axis moving platform, a vibration system in the chassis and a four-channel
sound system. The visual representation of the road is presented on three screens in front of
the driver and two screens behind the driver, using a total of five projectors. The three front
screens are rear-projected and together provide a 180° horizontal and 47° vertical field of
view. The two rear screens together provide a 90° horizontal and 47° vertical field of view.
The instrumented vehicle is a Volvo V70 2.4s. Data sources in the car fall into two groups:
standard sensors built into the car by the manufacturer, and extra sensors specially mounted
on the instrumented car.
The simulated test track used for validation of the driving simulator is a model of the
real-world test track “Lånke”.
Figure 1: Test track model in real life and in the driving simulator
3. COMPARISON OF RESULTS BETWEEN DRIVING SIMULATOR
AND REAL TRAFFIC
3.1 Reaction time
The reaction time studies conducted in the driving simulator were compared to real-life
measurements, previous research, and measurements of reaction time in a video-based
simulator. The reaction time in the driving simulator was measured by introducing
near-collision situations. Staging near-collision situations with an instrumented vehicle is
ethically problematic, and involving test drivers in an accident is unethical. Measuring
near-collision situations is therefore most practical in a driving simulator. Creating
near-collision situations in a driving simulator poses at least two serious problems:
• The near-collision situation is designed by the researcher and represents only that one
specific situation. It is difficult to generalize the results.
• Presenting more than one near-collision situation can make the test driver aware of the
purpose of the project.
The driving simulator results were compared to several other sources:
• Literature review documenting earlier research and results about reaction times
• Measurements of reaction times at junctions in real traffic
• Previous tests in a video-based simulator
The reaction time found in the driving simulator varied a great deal between situations. The
values were, however, reasonable and comparable to the results from all the other sources
(Engen and Giæver 2004). In this case study, the most important threat to the validity of the
test was that subjects might learn what was being measured. For reaction time it is very
important that each situation comes as a surprise to the test subjects, but we found that the
subjects' alertness level increased after only one incident.
[Figure: bar chart "Mean reaction time in different traffic situations". Reaction time (s), from
0.00 to 2.00, for situations 1 to 8; bars show means, error bars show the 95% CI of the mean.]
Figure 2: Mean reaction time for different traffic situations.
3.2 Speed and lateral position
Speed and lateral position were measured under comparable conditions through
observational studies in real traffic and experiments in the driving simulator. Typical
Norwegian rural roads were used, and speed and lateral position were analysed in several
situations.
[Figure: bar chart "Speed in the driving simulator", road width 8.5 m. Mean speed (km/h)
with 95% CI error bars for two scenarios (lane width 3.25 m with shoulder width 1.0 m; lane
width 3.0 m with shoulder width 1.25 m) on five road sections: straight road, left curves with
radius 360 m and 500 m, right curves with radius 1250 m and 2500 m. The labelled means
range from 81.2 to 84.2 km/h.]
Figure 3: Effects of road characteristics on mean speed
The results were similar in real traffic and in the driving simulator (Giæver and Engen
2005), but there were some key differences. It should be emphasized that even though
measuring speed and lateral position is relatively easy, finding one real-world speed that can
be compared to data from the driving simulator is not. The difference in mean between
simulator and roadside measurements could be just as small as the difference between
different roadside measurements.
The most important result from the study of speed and lateral position was that the driving
simulator results have a smaller standard deviation than real-world measurements. This is to
be expected, because real-world measurements are more prone to stochastic variability. The
control of confounding variables that is possible in a driving simulator can create more exact
results, but at the same time a good understanding of these confounding variables is needed
to create sound scenarios.
3.3 Time gap
Measurements of time gap were made both in the driving simulator and with the
instrumented vehicle (Engen 2008). This case study was intended specifically to test the
method, not to find the precise time gap. In the driving simulator both the mean time gap and
its standard deviation were much smaller than those measured with the instrumented vehicle.
The time gap found with the instrumented vehicle was comparable to previous results from
roadside registrations (Giæver 1993).
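As an illustration of how a time gap statistic of this kind can be computed, the sketch below
assumes the common definition of time gap as the bumper-to-bumper distance to the lead
vehicle divided by the speed of the following vehicle, and it discards gaps above a cut-off
before computing the statistics. The function names, the sample values and the cut-off
`MAX_GAP_S` are assumptions made for the example; the definition and cut-off actually used
in the study are not stated here.

```python
from statistics import mean, stdev


def time_gap(distance_gap_m, follower_speed_kmh):
    """Time gap in seconds: distance gap divided by the follower's speed."""
    speed_ms = follower_speed_kmh / 3.6
    if speed_ms <= 0:
        raise ValueError("the following vehicle must be moving")
    return distance_gap_m / speed_ms


def summarise(gaps_s, max_gap_s):
    """Mean and standard deviation of the time gaps at or below the cut-off."""
    kept = [g for g in gaps_s if g <= max_gap_s]
    return mean(kept), stdev(kept)


# Hypothetical samples: (distance gap in m, follower speed in km/h).
samples = [(20.0, 72.0), (30.0, 72.0), (40.0, 72.0), (400.0, 72.0)]
gaps = [time_gap(d, v) for d, v in samples]

# Assumed cut-off in seconds; longer gaps are treated as free driving.
MAX_GAP_S = 5.0
gap_mean, gap_sd = summarise(gaps, MAX_GAP_S)
print(f"mean = {gap_mean:.2f} s, sd = {gap_sd:.2f} s")
```

The choice of cut-off matters: a higher limit admits more free-flowing vehicles and raises
both the mean and the standard deviation.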
[Figure: two histograms of frequency against time gap (s), from 0 to 15 s, each marking the
maximum time gap used for statistics. "Histogram of time gap from driving simulator":
Mean = 1.13, Std. Dev. = 0.64. "Histogram of time gap from instrumented vehicle":
Mean = 2.57, Std. Dev. = 1.12.]
Figure 4: Time gap distribution.
In the case of time gap measurements in the driving simulator, the importance of
understanding confounding variables was even more evident. The driving simulator scenario
was designed in an overly simplistic way compared to a real-world situation, which led to
very small time gaps. Conversely, the inability to control both the instrumentation and the
traffic situation with the instrumented vehicle probably led to recorded time gaps that were
too large compared to a queued situation. As in the speed and lateral position case study, the
standard deviation in the simulator study was much smaller than that of the instrumented
vehicle.
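The size of the difference can be worked out from the summary statistics printed in Figure 4.
The short calculation below takes the smaller mean and standard deviation to belong to the
driving simulator, as the text states; the variable names are illustrative, and the coefficient
of variation is added here only as a supplementary measure that the study itself does not
report.

```python
# Summary statistics as read from Figure 4 (time gap in seconds).
simulator = {"mean": 1.13, "sd": 0.64}
vehicle = {"mean": 2.57, "sd": 1.12}

# Ratio of the standard deviations and of the variances.
sd_ratio = vehicle["sd"] / simulator["sd"]
variance_ratio = sd_ratio ** 2

# Coefficient of variation: spread relative to the mean.
cv_simulator = simulator["sd"] / simulator["mean"]
cv_vehicle = vehicle["sd"] / vehicle["mean"]

print(f"SD ratio (vehicle / simulator): {sd_ratio:.2f}")
print(f"variance ratio: {variance_ratio:.2f}")
print(f"CV simulator: {cv_simulator:.2f}, CV vehicle: {cv_vehicle:.2f}")
```

On these figures the absolute spread is clearly smaller in the simulator, while the spread
relative to the mean is not, so it is worth stating which of the two is meant when
variability is compared across environments.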
4. COMPARISON OF RESULTS BETWEEN DRIVING SIMULATOR
AND TEST TRACK
4.1 Speed and steering wheel movement
A comparison of driving performance under the influence of alcohol in the driving simulator
and on the test track revealed that the values of most traffic behaviour variables in the
simulator do not differ considerably from the corresponding values on the test track. The
variables tested were: response time, speed, steering wheel reversals, steering wheel
movement speed, number of cones knocked down during serpentine driving, stopping,
distance to tracking line and self-reported experience (Sakshaug 2008).
Mean speed and steering wheel reversals per second showed much the same pattern on the
test track as in the simulator.
[Figure: two histograms of frequency against mean speed (km/h). "Round track driving on test
track - Mean speed": Mean = 42.7878, Std. Dev. = 7.31103, N = 300. "Round track driving in
simulator - Mean speed": Mean = 44.1, Std. Dev. = 9.846, N = 307.]
Figure 5: Round track driving. Speed distribution.
Previously we found that the standard deviation of speed in real road registrations is larger
than in the driving simulator. This is not the case here: the standard deviation was not smaller
in the driving simulator than on the test track. This is probably because the test track, unlike
road registrations, has no uncontrolled factors due to other traffic.
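A comparison of this kind can be made from summary statistics alone. The sketch below
computes Welch's t statistic and the variance ratio from the values printed in Figure 5. It
rests on two assumptions: that the statistics with the larger standard deviation belong to the
simulator panel, as the text implies, and that the N observations in each panel can be treated
as independent, which may not hold if the same drivers contributed several rounds. The
function name `welch_t` is illustrative.

```python
from math import sqrt


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t statistic and approximate degrees of freedom from summary data."""
    va, vb = sd_a ** 2 / n_a, sd_b ** 2 / n_b
    t = (mean_a - mean_b) / sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


# Round track mean speed (km/h) as read from Figure 5: (mean, sd, N).
track = (42.7878, 7.31103, 300)
sim = (44.1, 9.846, 307)

t, df = welch_t(*sim, *track)
variance_ratio = (sim[1] / track[1]) ** 2
print(f"t = {t:.2f}, df = {df:.0f}, variance ratio = {variance_ratio:.2f}")
```

The t statistic addresses the difference in mean speed, and the variance ratio addresses the
difference in spread discussed above; both should be read with the independence assumption
in mind.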
[Figure: two histograms of frequency against mean speed (km/h). "Serpentine driving on test
track - Mean speed distribution": Mean = 33.435, Std. Dev. = 3.94255, N = 144. "Serpentine
driving in simulator - Mean speed", with normal curve: Mean = 33.69, Std. Dev. = 5.332,
N = 145.]
Figure 6: Serpentine driving. Mean speed distribution.
4.2 Determining distance to object in the driving simulator
The driver's ability to judge the distance to an object is quite different in a simulator than in
the real world (Sakshaug 2008). This is because the simulator image is 2D, whereas the real
world is 3D. In addition, the g-forces in the driving simulator used were smaller than in the
real world. This gives the driver fewer cues about the lateral and longitudinal acceleration of
the vehicle.
During a serpentine exercise conducted both on a test track and on a corresponding simulated
test track, the subjects knocked down or touched more cones in the simulator than on the
real-world test track. It was also easier to assess the distance to a stop line on the test track
than in the simulator.
[Figure: two panels, "Simulator driving - no of cones knocked down" and "Driving on test
track - No of cones knocked down". Each shows the number of cones knocked down, for
serpentine cones and for start/stop line cones, by test condition: baseline before (sober), test
drive 1 (sober), test drives 2 to 4 (BAC levels 1 to 3) and baseline after (sober).]
Figure 7: Serpentine driving. Number of cones knocked down
However, the mean distance from the line was found to vary with test drive category in the
same way in both environments. The serpentine driving task was clearly more difficult in the
simulator than on the test track.
5. CONCLUSION
Experimental studies are important tools for the evaluation of IVIS/ADAS, but creating a
good experimental design demands special expertise. For example, an important strength of
driving simulator experiments lies in controlling the confounding variables, but this also
produces results with less variance than experiments in real traffic.
In this paper we have described several experimental methods for evaluating IVIS/ADAS
and outlined some of their constraints and advantages. Further research using experimental
methods will increase the validity of using such methods.
BIBLIOGRAPHY
Cook, T. D. and D. T. Campbell (1979). Quasi-experimentation: design & analysis issues for
field settings. Chicago, Rand McNally College Pub. Co.
Engen, T. (2008). Use and validation of driving simulators. Faculty of Engineering Science
and Technology, Department of Civil and Transport Engineering. Trondheim,
Norwegian University of Science and Technology. Doctoral thesis.
Engen, T. and T. Giæver (2004). Reaksjonstid i vegtrafikken. Trondheim, SINTEF, Teknologi
og samfunn, Veg og samferdsel.
Giæver, T. (1993). Trafikkavvikling under vinterforhold. SINTEF Rapport. Trondheim,
SINTEF Samferdselsteknikk.
Giæver, T. and T. Engen (2005). Testing av visuell midtdeler. Trondheim, SINTEF,
Teknologi og samfunn, Veg og samferdsel.
Kaptein, N. A., J. Theeuwes, et al. (1996). "Driving simulator validity: Some considerations."
Transportation Research Record (1550): 30-36.
Kircher, K. (2008). D2.1 – A Comprehensive Framework of Performance Indicators and their
Interaction.
Sakshaug, K. (2008). VALIDAD Pilot Experiment in Simulator and on Test Track. Results of
Data Analysis. Trondheim, SINTEF: 37.