
Analysis of Eye-Tracking Experiments performed on a Tobii T60
Chris Weigle and David C. Banks
University of Tennessee/Oak Ridge National Lab, Joint Institute for Computational Sciences
ABSTRACT
Commercial eye-gaze trackers have the potential to be an important tool for quantifying the benefits of new visualization
techniques. The expense of such trackers has made their use relatively infrequent in visualization studies. As such, it is
difficult for researchers to compare multiple devices – obtaining several demonstration models is impractical in cost and
time, and quantitative measures from real-world use are not readily available. In this paper, we present a sample protocol
to determine the accuracy of a gaze-tracking device.
Keywords: user studies, usability, eye-gaze tracking
1. INTRODUCTION
Many researchers are concerned with validating the benefit of one visualization technique versus another. A plausible
way to perform this validation is via user studies, where a user is presented with different visual stimuli (representing
different visualization techniques) and is instructed to perform some perceptual or cognitive task. In addition to standard
performance measures (e.g. reaction time, precision, and accuracy), it may be informative to capture and analyze user
attention. Including information on attention, for instance, can distinguish between time spent on critical features and time spent on distracting elements in the visualization.
Asking users to self-report their attention is impractical and unnatural. In order to reduce the latency and motor-control
issues associated with pointing to the screen or moving a mouse, a gaze-tracking system may be employed to determine
where on the screen a user’s attention is directed. The use of a gaze-tracker requires that the researcher determine the
characteristics of the gaze-tracking hardware system. Because commercial systems (including software and support) can
cost $40,000 or more, it is difficult for the experimenter to gather data comparing different devices. Specifications for gaze-position error when attending to fixed points on the screen are readily available, but information about gaze-position error when following a moving object is not.
We arranged for a Tobii Technology T60 demonstration unit to be brought to our lab (Tobii sales representatives provide
this service as their schedules allow). We had prepared a set of animations for the purpose of collecting data under conditions reflecting a user-study scenario. This mock user study consisted of a single participant, seven animations, and 10 repeated
sessions of which 9 were successfully recorded. This paper will focus solely on the simplest of these animations, a ball
bouncing along a predictable path across a set of boxes.
This paper describes a sample protocol we used in a mock user study whose purpose was to test the hypothesis that the gaze-tracking device accurately reports the location of the user’s gaze when the user is instructed to look at a moving target. Other experimenters may use this approach and compare against our results as part of their system testing when considering whether to employ a particular gaze-tracking technology for validating visualization techniques in a laboratory environment. A particular concern in this paper is whether gaze reports provided by the eye-gaze tracker should be accepted as valid simply because the system marks them so.
Section 2 reviews eye-tracking technologies in common commercial use. Particular attention is paid to near-infrared,
image-based corneal-reflection technologies used by Tobii and many other commercial vendors. Section 3 summarizes the
specs of the relevant Tobii Technology products including the T60 demonstrated in our lab. Section 4 describes the mock
user study. Section 5 presents analysis of the mock user study data, with particular attention given to issues of validity of the
eye-tracking reports and the determination of outliers in the reports. Section 6 summarizes our findings and conclusions.
Further author information: (Send correspondence to C.W.)
C.W.: [email protected]
D.C.B.: [email protected]
2. EYE TRACKING TECHNOLOGIES
Eye trackers are systems that estimate the direction of a user’s gaze. Early eye tracking systems are surveyed by Young and
Sheena,1 and more recent systems are surveyed by Duchowski.2 Eye-tracking systems either track the eye-in-head gaze
direction (requiring a separate report for head orientation) or the combined eye-head gaze direction. Generally, any system
affixed to the user’s head is tracking eye-in-head direction, and any remote system is tracking eye-head gaze direction.
Either system can ultimately estimate the screen-space location of the point-of-regard of an eye.
Eye-tracking systems may be intrusive (requiring some form of physical contact between the sensor apparatus and the
user)3–5 or non-intrusive (typically using camera-based techniques).6–13 Although intrusive techniques require contact with
the user, they are generally more robust to user movement as the sensor remains fixed with respect to the user’s eyes and
can be significantly more accurate than non-intrusive techniques.14 Non-intrusive techniques may restrict user movements,
particularly head movements, to keep the user’s eyes fixed with respect to the system sensors.15 It is not uncommon for
camera-based eye tracking to require the use of a chin rest to largely immobilize the user’s head.
Today most vendors providing commercial systems offer remote tracking via image-based techniques.16–23 Such systems are comfortable for extended use, relatively fast to set up and calibrate for each user, and provide sufficient data rates
and accuracy for most uses (though not for some vision research).24, 25 These systems may also tolerate small amounts of
head movement without invalidating the calibration.2, 26 However, Schnipke and Todd describe the difficulties they faced
in using commercial remote eye trackers, in particular noting a 62.5% rate of failure at acquiring tracking at all.27
Camera-based eye tracking systems come in several designs and may search for different features of the eye in the
images they capture.5, 15 Many commercial trackers require some form of calibration procedure for each individual
user.12, 14, 28 Some commercial camera-based trackers may be considered intrusive; they are attached to a head mount so the cameras sit at a very close, fixed distance from the eye.17–19, 23 More commonly, commercial camera-based trackers are remote.16, 19–22 The amount of head motion allowed in remote camera-based systems varies by vendor from a few millimeters to approximately one cubic foot of working volume.29 Systems allowing only a few millimeters of head motion typically require that a head restraint, e.g. a chin rest, be employed. Figure 1 illustrates several camera-based commercial systems.
(a) Arrington Research17
(b) Applied Science Laboratories21
(c) Tobii Technology20
Figure 1. Commercially available eye trackers may be head mounted (a), may require a head restraint (b), or may allow some free head
motion (c).
Camera-based trackers rely on determining properties of the eye from images. Figure 2 shows some of the main
anatomical features of the human eye. The most common features of the eye used for tracking are the pupil-iris boundary
and the iris-sclera boundary (also called the limbus). The limbus is often obscured by the eyelids, leading to low accuracy
in the vertical tracking component. The pupil-iris boundary, however, exhibits lower contrast under visible light than the limbus. It is therefore common to use infrared or near-infrared light sources in pupil-tracking systems, as they provide better contrast than visible light. Near-infrared light sources are almost invisible to the human eye, but readily detectable
by many video cameras. Figure 3 depicts the eye as seen by a camera.
The light source may be placed either on- or off-axis with the camera. When the source is on-axis, the camera sees the
pupil illuminated similar to the red-eye effect in flash photography. Thus on-axis tracking is also referred to as bright-pupil
tracking. Similarly, off-axis tracking is also referred to as dark-pupil tracking. Some systems use both on- and off-axis
sources, combining the two images of the eye to better isolate the pupil. Figure 3 depicts the eye under each tracking
modality.
Figure 2. Major anatomical features of the human eye (image reproduced from30 ).
(a) The most common features captured by camera-based trackers, the pupil and the first corneal reflection
(image reproduced from18 ).
(b) The eye as imaged during dark-pupil (left) and light-pupil
(right) tracking. Note the dark pupil (A), the light pupil (B), and
the first corneal reflection (C) (image reproduced from31 ).
Figure 3. The eye as seen by camera-based tracking systems.
Multiple reflections of the light source may be captured in the images of the eye and used to compute gaze direction.
These reflections are called Purkinje images, and are the reflection of the light source off of different layers of the anatomy
of the eye. The first Purkinje image is the reflection off the outer surface of the cornea, and the second Purkinje image is the
reflection off the inner surface of the cornea. Similarly, the third and fourth Purkinje images are generated at the surfaces
of the lens. The first Purkinje image, also called the corneal reflection, is easy to detect and is commonly used in camera-based trackers.10, 12, 16–23, 32 The remaining Purkinje images have been the basis of some camera-based eye trackers,15, 33
but require specialized equipment to detect. Figure 4 illustrates the anatomical sources of the first 4 Purkinje images.
Figure 4. Incoming light reflects off anatomical surfaces in the eye to cause the Purkinje images.
By determining the relationship between the pupil and the corneal reflection, camera-based eye trackers can compute
the gaze direction of the eye along with head position. The relationship between the source and camera is fixed by the
vendor so (assuming the eye to be a sphere) the corneal reflection does not vary with eye rotation and may be used as
a point of reference. The vector between the pupil center and corneal reflection can be mapped to screen coordinates
via a calibration procedure. The corneal reflection does move with head movement, but the calibration will tolerate small head motions.
Typical calibration procedures require the user to fixate on several predetermined screen positions, one at a time.
The parameters mapping the pupil-reflection vector to screen space are then solved for as an over-constrained polynomial system. The number of parameters and equations depends on the order of the system and the number of fixation points. For instance, a second-order system with 9 fixation points requires solving two systems of 6 unknowns and 9 equations,
one each for the x and y screen coordinates. First and second order systems are the most common, and give sufficient
accuracy from relatively few fixation points.
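As a concrete illustration of this fitting step, the sketch below (our own, not Tobii’s calibration code) solves such a second-order mapping from pupil-reflection vectors to screen coordinates given nine fixation points; the array names and the least-squares formulation in NumPy are assumptions for illustration.

import numpy as np

def design_matrix(v):
    """Second-order terms of the pupil-reflection vectors: 6 unknowns per screen coordinate."""
    vx, vy = v[:, 0], v[:, 1]
    return np.column_stack([np.ones_like(vx), vx, vy, vx * vy, vx**2, vy**2])

def fit_calibration(v, s):
    """Fit the mapping by least squares.
    v: (9, 2) pupil-center/corneal-reflection difference vectors at the fixation points.
    s: (9, 2) known screen coordinates of those fixation points.
    Returns a (6, 2) coefficient array (one column each for screen x and y)."""
    A = design_matrix(v)                        # 9 equations, 6 unknowns per coordinate
    coeffs, *_ = np.linalg.lstsq(A, s, rcond=None)
    return coeffs

def gaze_to_screen(coeffs, v):
    """Map new pupil-reflection vectors to screen coordinates."""
    return design_matrix(np.atleast_2d(v)) @ coeffs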
Morimoto and Mimica report on their simulations of low-order calibration with a pupil-center corneal-reflection tracker
for fixed and moving head positions.14 They found the average gaze position error to be less than 1mm on the display or 1◦
of visual angle for a fixed head position (though not uniform over the screen). After moving the head 100mm horizontally,
the average gaze position error was found to be 10mm on the display. A vertical head movement of 100mm led to an
average gaze position error of 20mm on the display. Head motion transverse to the screen (along the z axis) could produce
significant errors in reported gaze position, with a 100mm head motion producing an average gaze position error of 40mm
on the display.
3. TOBII TECHNOLOGY SYSTEMS
Tobii Technology offers camera-based eye tracker systems where the light source and camera are permanently affixed to a
monitor. Tobii currently offers two lines of such systems, the new T/X-series line and the existing 50-series line. Of the
two lines, the model T60 and the model 1750 are the most directly comparable systems.
Tobii trackers utilize near-infrared pupil-center/corneal-reflection eye tracking.29 All Tobii eye trackers communicate
via a Tobii Eye Tracker (TET) server. The TET server is responsible for performing image processing on the camera video
feeds and for mapping from image parameters to screen coordinates. The TET server communicates over TCP/IP allowing
the server to run separately from the PC receiving the tracker reports.
3.1 Model T60
The model T60 provides a 17in LCD monitor with an integrated eye tracking system. The system is accurate to within 0.5◦
with less than 0.3◦ drift over time and less than 1◦ error due to head motion.29 The T60 performs binocular tracking (it
tracks both eyes simultaneously) at 60Hz, and allows for head movement within a 44x22x30cm volume centered 70cm
from the camera. The T60 also combines both bright and dark pupil tracking.
The T60 integrates the TET server into the monitor – a dedicated processor housed inside the monitor case processes
all camera data and reports tracking information to any TCP/IP connected client. This offloads the processing duties from
the PC collecting tracking reports, and according to Tobii reduces the latency in computing the gaze position compared to
burdening a PC with the computations. Tobii reports an average latency of 33ms between camera exposure and receipt of
the gaze report on the data analysis PC.29
3.2 ClearView
ClearView is Tobii Technology’s current tool for capture and analysis of gaze data.34 ClearView provides for calibration
of study participants. ClearView allows creation of studies composed of images, animations, webpages, live video, real
world objects, or other applications. Aside from gaze data, ClearView can also record mouse clicks (e.g. for use with web
browser stimulus). ClearView also provides analysis tools to visually present stimuli and gaze data together either as a
replay of full or partial sessions or as a heat map of participant gaze patterns. ClearView also allows the export of the study
data to plain text files.
4. MOCK USER STUDY
We prepared a mock user study for the Tobii visit. The goal of the study was to provide hard data we could analyze to
better understand the performance characteristics of the Tobii line of products. We had originally hoped to also compare
two models of Tobii eye trackers, but Tobii was unable to provide a demonstration model from the 50-series. Tobii’s
ClearView tool was used to calibrate users, display stimuli, and capture gaze reports.
4.1 Stimuli
Although the full mock user study consisted of seven short animations, this paper will focus on only one stimulus (illustrated
in Figure 5) – a 25-pixel radius red ball bouncing predictably left-to-right across 4 green boxes, resting on each box for
33ms, then returning along the same path back to the left-most box. This animation plays for 12 seconds at 30Hz.
Figure 5. The bouncing-ball stimulus. The image shows the ball and the ball’s motion path. After moving left-to-right from the left-most
box to the right-most box, the ball reverses direction and returns to the starting box.
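Since the later analysis compares gaze reports against the target position determined from the animation (Section 5.3), it is useful to have the per-frame target path explicitly. The sketch below reconstructs such a path; only the four boxes, the one-frame (~33ms) rests, the 30Hz rate, and the 12-second duration come from the stimulus description, while the box coordinates, hop height, and parabolic hop shape are illustrative assumptions.

import numpy as np

# Stimulus parameters from the paper: 25 px radius ball, 4 boxes, 30 Hz, 12 s,
# ~33 ms (one frame) rest on each box. The geometry below is assumed.
FPS, DURATION_S, REST_FRAMES = 30, 12.0, 1
BOX_X = [300, 420, 540, 660]     # assumed x positions of the four box tops (px)
BOX_Y, HOP_HEIGHT = 400, 100     # assumed box-top height and hop height (px)

def target_path():
    """Per-frame (x, y) center of the ball: left-to-right across the boxes,
    then back along the same path, stretched to fill the 12 s animation."""
    n_frames = int(FPS * DURATION_S)
    forward_frames = n_frames // 2
    hop_frames = (forward_frames - len(BOX_X) * REST_FRAMES) // (len(BOX_X) - 1)
    forward = []
    for i, x0 in enumerate(BOX_X):
        forward += [(x0, BOX_Y)] * REST_FRAMES               # rest on box i
        if i + 1 < len(BOX_X):
            x1 = BOX_X[i + 1]
            for k in range(1, hop_frames + 1):
                t = k / hop_frames
                x = x0 + t * (x1 - x0)
                y = BOX_Y - 4 * HOP_HEIGHT * t * (1 - t)     # parabolic hop
                forward.append((x, y))
    path = forward + forward[::-1]                           # return along the same path
    path += [path[-1]] * (n_frames - len(path))              # pad rounding remainder
    return np.array(path[:n_frames])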
4.2 User
There was a single user who ran the complete session 10 times in one sitting. The user attended to one specific feature of
each animation for the first 5 sessions, and one specific, but different, feature for the second 5 sessions. For example, the
user attended to the bottom of the bouncing ball in stimulus three the first 5 times and the top of the ball the second 5 times.
Due to an unknown error, the data from the first session were lost. The data from the remaining nine sessions are intact.
4.3 Calibration
The calibration of the lone user was difficult. The process was repeated 4 times, with the calibration tool indicating that
at least 2 of the calibration points showed a poor fit in the computed mapping. The final calibration still contained 2 such
mis-fit points.
The difficulties may be because the lone user wore glasses. A second individual, a soft-contact wearer, was calibrated
with no difficulties. However, no sessions were recorded for this individual.
5. DATA ANALYSIS
This section describes our analysis of the mock user study data. We were interested in determining two things about the eye tracker: first, how much error is present in the tracking reports, and what that tells us about the usable resolution of the device; second, whether all reported gaze positions are valid, and whether we can reasonably determine which reports are not.
For the bouncing-ball stimulus the first half of the trials tracked the bottom edge of the ball and the second half tracked
the top edge of the ball. Therefore we expect the y coordinates of the two session types to differ by 50 pixels (the diameter
of the red bouncing ball) if the eye tracker is accurate.
5.1 Characteristics of the Gaze Reports
Figure 6(a) presents the raw data overlaid onto the illustration of the stimulus motion path. Figure 6(b) presents the data as a gaze path averaged over 100ms intervals. The top- and bottom-edge tracking sessions are color coded separately. We see that the top-edge data clusters near, but slightly above, the top of the area swept by the animated ball. We also see that the bottom-edge data clusters nearer the center of the swept area than its bottom.
Figure 7 compares the binocular gaze path of our user following the bottom edge of the bouncing ball for a single
session versus all such sessions. The short portion of the gaze path not following the motion of the bouncing ball is
believed to be target pursuit after the stimulus first appears. However, the initial target pursuit does appear to have a longer
duration than expected; initial open-loop-like pursuit∗ typically lasts only ∼100ms.35
∗ In smooth pursuit, the eyes move in an attempt to keep the image of the moving target fixed within the fovea. Initially, the eyes
move to the last known position of the target and can not use new information to correct initial pursuit. This is similar to an open-loop
feedback control system which can not use feedback to adapt to a changing system. Conversely, a closed-loop control system can use
feedback to adapt to changes in stimulus. When the eyes are engaged in smooth pursuit, they behave like a closed-loop system.
Figure 6. Gaze data for following the top or bottom edge of the moving ball. In (a) the raw data are presented. In (b) the data have been
averaged over 100ms intervals and connected to show a gaze path. The data are overlaid on the time-lapse image of the animation.
Figure 8 shows per-eye position-dependent error in the y-coordinate. The motion of the bouncing ball from trough to
peak to trough is always the same, the troughs all have the same y coordinate, and the peaks all have the same y coordinate.
Therefore it would be expected that a linear least-squares fit of the gaze data for the full stimulus would have slope 0. The binocular-averaged gaze data does in fact have this property; the data for each individual eye, however, does not. This holds true for single and multiple sessions. The implication of this finding is that gaze reports that are valid for only one eye should be included only if the per-eye slope can be computed and factored out (such as in off-line processing). In an interactive application, it may be necessary to treat such gaze reports as invalid for both eyes.
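A minimal sketch of this slope check, assuming the gaze reports are available as time stamps and per-eye y coordinates with invalid samples marked as NaN (the names below are ours):

import numpy as np

def y_slopes(t_ms, y_left, y_right):
    """Least-squares slope of the y gaze coordinate over time for each eye and
    for the binocular average; for this stimulus all three should be near zero."""
    def slope(y):
        ok = ~np.isnan(y)
        return np.polyfit(t_ms[ok], y[ok], 1)[0]   # degree-1 fit returns [slope, intercept]
    binocular = (y_left + y_right) / 2.0           # NaN wherever either eye is invalid
    return slope(y_left), slope(y_right), slope(binocular)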
5.2 Gaze-Report Latency
Before we can consider computing the pixel distance between the target motion path and the gaze reports, we must determine what the apparent latency is between gaze reports and the animation. To estimate the lag between the display of
a video frame and the corresponding gaze report, we compute the cross-correlation between the coordinates of the target
(bouncing ball) positions and gaze reports.
The target positions are resampled to 1ms intervals, assuming the target is stationary between frames. The gaze reports
are also resampled to 1ms intervals using linear interpolation. Resampling is necessary because the two time-series are
originally at different rates (30Hz for the animation, 60Hz for the gaze reports), invalid gaze reports are not included in the
gaze series, and the display of the animation is not synchronized to the eye-tracker reports (in the data stream, the event marking the beginning of the animation falls arbitrarily between tracker reports). Similar calculations were performed for both the x- and y-coordinates.
Figure 7. The binocular gaze path for (a) a single session versus (b) multiple sessions tracking the bottom edge of a bouncing ball. Raw data is in green, average path in blue, and 95% confidence intervals in red.
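A rough reconstruction of the lag estimate described above is sketched below; the array names and the ±200ms search window are our assumptions, and a positive result means the gaze series trails the target.

import numpy as np

def estimate_lag_ms(t_target_ms, target, t_gaze_ms, gaze, max_lag_ms=200):
    """Cross-correlate a target coordinate series against the gaze reports after
    resampling both to 1ms intervals; returns the lag (ms) at peak correlation."""
    t_target_ms, target = np.asarray(t_target_ms), np.asarray(target, float)
    t_gaze_ms, gaze = np.asarray(t_gaze_ms), np.asarray(gaze, float)
    t = np.arange(max(t_target_ms[0], t_gaze_ms[0]),
                  min(t_target_ms[-1], t_gaze_ms[-1]))            # common 1ms grid
    # Target held constant between frames (zero-order hold); gaze linearly interpolated,
    # with invalid gaze reports already removed from t_gaze_ms/gaze.
    tgt = target[np.searchsorted(t_target_ms, t, side='right') - 1]
    gz = np.interp(t, t_gaze_ms, gaze)
    tgt, gz = tgt - tgt.mean(), gz - gz.mean()
    lags = np.arange(-max_lag_ms, max_lag_ms + 1)
    corr = [np.dot(tgt[max(0, -k):len(tgt) - max(0, k)],
                   gz[max(0, k):len(gz) - max(0, -k)]) for k in lags]
    return int(lags[np.argmax(corr)])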
Figure 8. Gaze paths for individual or both eyes (columns) and single or multiple runs (rows). Note that the individual eyes have
significant slope that is averaged out in the binocular path. In particular, the least-squares fit of the data in (a) has slope 0.14, (b) −0.13,
(c) 0.008, (d) 0.11, (e) −0.11, and (f) 0.0001.
Figure 9. Overlays of the y-coordinate time-series data for the bouncing-ball target and gaze report data (single and multiple sessions,
top and bottom edge tracking). Cross-correlation of each gaze time-series with the target shows less than 1ms lag.
Figure 9 shows these resampled data for the y-coordinates. The x-coordinates of the gaze data are not shown as they do
not deviate significantly from the target motion. The blue line shows the path of the center of the bouncing-ball
stimulus. The red and pink lines show the gaze data for tracking the top edge of the ball (for a single session and all
sessions, respectively). The bright green and dark green lines show the gaze data for tracking the bottom edge of the ball
(again, for single and multiple sessions). Of note is that the red lines (object top) are visibly separated from the blue line
(object center) as they should be, but the green lines (object bottom) are not.
The cross-correlations of the target with each tracking time-series showed a peak at 0ms lag (for both x and y, for each
of the 4 gaze report time series). Strictly, this means the lag, if any, between the reported gaze positions and the animated
target is less than 1ms.
Note that this is not to suggest that there is no more than 1ms latency in receiving the gaze report at the computer.
Computation and network transmission necessarily inject latency between frame display and receipt of the corresponding
gaze report, and the device specification puts this latency below 32ms. The latency measured here is between the display
of a frame of animation and the corresponding image capture of the pupils by the tracking device.
5.3 Accuracy of Tracking
We average (100ms intervals) all sessions where the subject was tracking the bottom edge of the ball, and compare this
to the target position determined from the animation. We compute a difference in the x coordinate of µx = −4.41 pixels
(σx = 27.46). We compute a difference in the y coordinate of µy = −0.68 pixels (σy = 14.13). For a 17in LCD at
1024 × 768 resolution, with the user 60cm from the display, we would expect position error on the order of 1.5 pixels.
We find neither µx nor µy to be within this tolerance of their expected values. In fact, a Student’s T-test shows that the displacement of the y coordinate (from the target position) is not equal to the target radius of 25 pixels (p < 0.0001), nor is it within the margin of error of this expected displacement.
Next, we average (100ms intervals) all sessions where the subject was tracking the top edge of the ball, and compare
this to the target position determined from the animation. We compute a difference in the x coordinate of µx = 3.67 pixels
(σx = 26.31). We compute a difference in the y coordinate of µy = 34.22 pixels (σy = 16.24). Again, we would expect
position error on the order of 1.5 pixels. And again, we find neither µx nor µy to be within this tolerance of their expected values. In fact, a Student’s T-test shows that the displacement of the y coordinate (from the target position) is neither equal to 25 pixels (p < 0.0001) nor within the margin of error of this expected displacement.
A paired Student’s T-test also shows that the difference between the sessions tracking the top edge of the ball and the
sessions tracking the bottom edge of the ball is not 50 pixels (p < 0.0001). Nor does the gaze position difference fall in
the 15 pixel error range.
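The hypothesis tests above can be reproduced along the following lines, assuming the 100ms-averaged y displacements (gaze minus target center) are available and paired by time bin; the function and argument names, and the use of SciPy, are our own.

import numpy as np
from scipy import stats

def displacement_tests(dy_bottom, dy_top, expected_bottom, expected_top):
    """One-sample t-tests of the y displacements against their expected values
    (plus or minus the 25-pixel ball radius, depending on the screen's y-axis
    convention), and a paired test of the top-bottom difference against the
    50-pixel diameter."""
    dy_bottom, dy_top = np.asarray(dy_bottom), np.asarray(dy_top)
    t_bottom = stats.ttest_1samp(dy_bottom, expected_bottom)
    t_top = stats.ttest_1samp(dy_top, expected_top)
    # Equivalent to a paired t-test of H0: mean(dy_top - dy_bottom) == 50 px.
    t_diff = stats.ttest_1samp(dy_top - dy_bottom, expected_top - expected_bottom)
    return t_bottom, t_top, t_diff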
The eye tracker appears to follow the motion of the eyes precisely, but it does not appear to report the location of the
eye-gaze position accurately. This may be due to user error (e.g. abnormal visuomotor response to the target), or problems
with calibration (e.g. eyeglasses).
5.4 Validity
Tobii’s TET server automatically flags gaze reports as valid or invalid depending on whether the eyes are successfully
captured and located by the imaging processes. Flags are applied per eye, so a report may be valid for one eye but invalid for the other. The data from the full mock user study contains 28243 individual gaze reports. The TET
server marks 1.65 percent of the left eye reports and 1.39 percent of the right eye reports as invalid.
An invalid report is one for which the TET server could not detect the presence of the eye. Unfortunately, our analysis
has already shown that reports for a single eye contain significant position dependent error in the y gaze coordinate (Figure
8) and are not reliable measures of gaze position on their own. Although it is possible to post-process the data to remove
the error, if using the data interactively it appears that gaze positions reported valid for only one eye should be considered
invalid for either eye. Under the assumption that gaze coordinates marked invalid for one eye should be treated as invalid
for either eye, 1.89 percent of the 28243 reports are invalid.
5.4.1 Pupil Size
Tobii’s TET server reports the diameter in millimeters of each pupil it detects. Typical radii for the human pupil are 1.5mm to 2mm, but may be as small as 0.75mm in bright light and as large as 4mm in low light.36 However, the maximum radius
to which the pupil may open decreases with age. Although they are not depicted in the accompanying figures, some gaze
reports contain pupil diameters around 150mm. Such reports are few, but the gaze report can not be accepted as valid in
such cases.
The human pupil does not dilate and constrict voluntarily. Further, the two eyes normally respond identically to stimuli
(bright lights, nearby objects) directly affecting only one eye. In healthy, unmedicated individuals the left and right pupils
do not significantly differ in size. The subject was not exhibiting signs of any relevant physiological conditions, so we
expected to see device-reported pupil sizes be very consistent between the two eyes.
Figure 10 shows quantile-quantile plots† (qqplots) of pupil size (left versus right) for a single session. The display area
has been divided into a 4x3 grid of sectors, and a separate qqplot computed for each sector. The plots allow a quick visual
check for any bias in the pupil size distributions based on gaze location. Only the two central plots are shown in Figure
10 because they are the only sectors with sufficient data from both eyes. The left-central sector shows a significant
difference between the two reported pupil size distributions, while the right-central sector does not. This suggests that
significant differences between left and right pupil sizes are more likely to be reported in the left-central sector.
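The per-sector qqplot data can be computed along these lines, assuming each report carries a binocular gaze coordinate and per-eye pupil diameters; the display resolution, the 4x3 sectoring from the text, and the number of quantiles are parameters, and the names are ours.

import numpy as np

def sector_quantiles(gaze_xy, left_mm, right_mm, width=1024, height=768,
                     nx=4, ny=3, n_q=50, min_count=50):
    """Quantiles of left vs. right pupil diameter for each display sector.
    Returns {(col, row): (left_quantiles, right_quantiles)}; plotting the pair
    against each other gives the qqplot for that sector."""
    gaze_xy = np.asarray(gaze_xy, float)
    left_mm, right_mm = np.asarray(left_mm, float), np.asarray(right_mm, float)
    col = np.clip((gaze_xy[:, 0] * nx / width).astype(int), 0, nx - 1)
    row = np.clip((gaze_xy[:, 1] * ny / height).astype(int), 0, ny - 1)
    q = np.linspace(0.0, 1.0, n_q)
    out = {}
    for c in range(nx):
        for r in range(ny):
            m = (col == c) & (row == r)
            if m.sum() < min_count:              # skip sectors with too little data
                continue
            out[(c, r)] = (np.quantile(left_mm[m], q), np.quantile(right_mm[m], q))
    return out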
Figure 10. Pupil size qqplots (left vs. right) for the two central sectors of the display (divided into 4x3 sectors). The plots suggest
that reported pupil sizes are more likely to differ significantly between the two eyes in the left-central sector of the display.
5.4.2 Pupil Distance
The Tobii TET server also reports distance in millimeters to each detected pupil. We again divided the display area into a
4x3 grid of sectors, and computed a left versus right qqplot for each sector. As with pupil size, the majority of the sectors
contain little or no data. Figure 11 shows that the right-central sector exhibits a relatively constant difference of about 2mm
between the two pupils. Such a difference is completely within expectation, and could be due to nothing more than the
subject sitting slightly off center of the camera. The left-central sector shows a significant difference between the largest distances
in each distribution. In particular, the distance reports for one pupil show a bimodal distribution with one mode near 66cm
and the other near 69.5cm. The presence of a bimodal distribution is not itself unexpected, but that both eyes do not reflect
the same bimodality is perhaps more surprising. It should be noted that this is present in the same sector that shows the
most difference in pupil size distribution.
The significant differences between left and right pupil distance may come from significant head movement, tracking
issues due to interference, or erroneous tracking. No data exists to rule out head movement, but the authors do not recall
the subject moving significantly during sessions (not enough to explain the reported differences of 10–15cm found in the full data). It is possible that the subject’s eyeglasses provided a secondary glint that in some cases was captured instead of the corneal reflection. This may in fact explain the 3.5cm jump exhibited in the left-central qqplot, but eyeglasses would not be expected to explain the 10cm differences.
† Quantile-quantile plots compare two distributions by plotting the values of the nth quantiles from each as (x, y) coordinates. When the two distributions differ only by a shift, the plot forms a line parallel to y = x, offset by the difference in means; when they are identical, the line is y = x itself.
Figure 11. Pupil distance qqplots (left vs. right) for the two central sectors of the display (divided into 4x3 sectors). The plot suggests that the reported pupil distances in the right-central sector were very similar in distribution, with means that differed by only a few millimeters. The left-central sector, however, shows a bimodal distribution of pupil distance for one eye and not the other. Such a distribution is unlikely to reflect the true pupil distances of the subject.
5.4.3 Aggregate Validity
In some cases, the TET server marks gaze reports as valid when hidden parameters (such as pupil size and distance) are
out of normal range. These out-of-range reports factor in to the computation of the gaze coordinate, and can produce
obviously invalid data that is marked valid. Some easily detectable cases result in gaze coordinates that are out of range
for the display. Treating these cases as invalid as well, the percentage of invalid reports increases to 1.69 for the left eye, 1.41 for the right eye, and 1.93 for both eyes together.
For normal, healthy individuals, the pupils dilate and contract by involuntary reflex and do so in unison. Therefore
the pupils can be expected to always be approximately the same size. Given the range of normal pupil size, it is reasonable
to assume that pupil sizes less than 0.25mm or greater than 10mm are invalid, as are reports of pupil sizes that differ by
1mm or more. Rejecting such reports raises the percentage of invalid reports to 1.95 for both eyes together.
Valid pupil distance is more difficult to characterize than pupil size. The Tobii specifications for each display give the
working volume for head movement that provides gaze tracking with less than 0.5◦ error (44x22x30cm centered 70cm
from the camera). It is reasonable then to consider pupil distances outside of 40–100cm as out of the desirable range. No
gaze reports yielding potentially valid gaze coordinates also report distances outside this range. The difference between
the left and right pupil distance can also be used to determine invalid reports. To rigorously model the range of valid differences in pupil distance, the model should consider the following:
• the range and position of the low-error working volume,
• the range of normal human inter-pupillary distance,
• the range of head sizes (for the target age demographic), and
• the range of head rotations that reasonably allow gaze to fall on the display.
A simpler approach is to choose a threshold. Rejecting gaze reports with left and right pupil distance differing by 8cm or more increases the number of invalid reports to 2.37 percent for both eyes together. This is a little more than a 25% increase in invalid gaze reports compared to those automatically flagged by the TET server. It is, however, still a reasonably small number of invalid reports over the full collection of data.
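Collecting the checks from Sections 5.4 through 5.4.3 into one filter gives something like the sketch below; the field names and the dictionary-of-arrays layout are our own assumptions about how the exported data might be organized, while the thresholds are the ones discussed above.

import numpy as np

def flag_invalid(r, width=1024, height=768):
    """Combine the validity checks from Sections 5.4-5.4.3.
    r is a dict of equal-length numpy arrays; the key names below are our own
    invention for illustration, not the TET server's field names.
    Returns a boolean mask, True where a report should be treated as invalid."""
    invalid = ~r['left_valid'] | ~r['right_valid']                 # either eye untracked
    # Gaze coordinate outside the display area.
    invalid |= (r['gaze_x'] < 0) | (r['gaze_x'] >= width)
    invalid |= (r['gaze_y'] < 0) | (r['gaze_y'] >= height)
    # Pupil sizes outside a plausible range, or differing between eyes by 1 mm or more.
    for k in ('left_pupil_mm', 'right_pupil_mm'):
        invalid |= (r[k] < 0.25) | (r[k] > 10.0)
    invalid |= np.abs(r['left_pupil_mm'] - r['right_pupil_mm']) >= 1.0
    # Pupil distances outside 40-100 cm, or differing between eyes by 8 cm or more.
    for k in ('left_dist_mm', 'right_dist_mm'):
        invalid |= (r[k] < 400.0) | (r[k] > 1000.0)
    invalid |= np.abs(r['left_dist_mm'] - r['right_dist_mm']) >= 80.0
    return invalid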
6. CONCLUSIONS
We have presented a sample protocol for confirming whether a gaze-tracking device accurately reports the location of
the user’s gaze where the user was instructed to look at a moving target. The protocol moves beyond vendor reported
characteristics for attending to fixed points on a display, and measures dynamic characteristics of the tracking device.
Although commercial eye trackers self-report information about internal state and data validity during tracking, flagging
data known to be invalid as such, we showed that there may be underlying characteristics of the tracker data that require
additional scrutiny. Our representative tracker exhibited significant position dependent error in the y coordinate that would
require post-processing to remove or exclusion of data during interactive use of the tracker. Our representative tracker also
reported data as valid where hidden physical parameters exceeded likely values (such as extreme differences between pupil
sizes). In total, the device flagged less than 80% of invalid gaze reports as such.
We were unable to match the static error specifications of the device in our dynamic study. Although we could distinguish between motion paths that were 50 pixels apart (recall that half the motion paths were intended to track the top edge of a 50-pixel ball, and half were intended to track the bottom edge), the absolute positions of the paths were not accurately captured. In fact, if we did not know the intended motion paths in advance, we would not be able to deduce that the path following the bottom edge of the bouncing ball was not intended to track the center of the ball.
Without further access to the representative model of eye tracker, we can not perform the tests again on a subject with
better calibration. This is unfortunate in terms of drawing conclusions about the representative eye tracker itself. However,
if indeed the calibration difficulties are the source of the idiosyncrasies reported in this paper, then without them there may have been few findings of note.
We can say that the representative gaze tracker provides an excellent tool for determining relative positions and trajectories. The reported tracking data reliably captured the shape of the target’s motion. The two sets of data, each tracking a different feature of the target object, are distinguishable from each other.
REFERENCES
1. L. Young and D. Sheena, “Methods & designs: survey of eye movement recording methods,” Behav. Res. Methods
Instrum. 7(5), pp. 397–429, 1975.
2. A. Duchowski, Eye Tracking Methodology: Theory and Practice, 2003.
3. D. Robinson, “A method of measuring eye movements using a scleral search coil in a magnetic field,” IEEE Transactions on Biomedical Engineering 10, pp. 137–145, 1963.
4. A. Kaufman, A. Bandopadhay, and B. Shaviv, “An eye tracking computer user interface,” in Proceedings of the
Research Frontier in Virtual Reality Workshop, pp. 78–84, IEEE Computer Society Press, 1993.
5. J. Reulen, J. Marcus, D. Koops, F. de Vries, G. Tiesinga, K. Boshuizen, and J. Bos, “Precise recording of eye movement: the iris technique, part 1,” Medical and Biological Engineering and Computing 26(1), pp. 20–26, 1988.
6. Y. Ebisawa and S. Satoh, “Effectiveness of pupil area detection technique using two light sources and image difference
method,” in Proceedings of the 15th Annual International Conference of the IEEE Engineering in Medicine and
Biology Society, A. Szeto and R. Rangayan, eds., pp. 1268–1269, (San Diego, CA), 1993.
7. A. Tomono, M. Iida, and Y. Kobayashi, “A TV camera system which extracts feature points for non-contact eye
movement detection,” in Proceedings of the SPIE Optics, Illumination, and Image Sensing for Machine Vision IV,
1194, pp. 2–12, 1989.
8. K. Kim and R. Ramakrishna, “Vision based eye gaze tracking for human computer interface,” in Proceedings of the
IEEE International Conference on Systems, Man and Cybernetics, (Tokyo, Japan), 1999.
9. Y. Ebisawa, “Unconstrained pupil detection technique using two light sources and the image difference method,”
Visualization and Intelligent Design in Engineering and Architect II 15, pp. 79–89, 1995.
10. Y. Ebisawa, M. Ohtani, and A. Sugioka, “Proposal of a zoom and focus control method using an ultrasonic distance meter for video-based eye-gaze detection under free-hand condition,” in Proceedings of the 18th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 1996.
11. D. Yoo, J. Kim, B. Lee, and M. Chung, “Non contact eye gaze tracking system by mapping of corneal reflections,” in
Proceedings of the International Conference on Automatic Face and Gesture Recognition, pp. 94–99, (Washington,
DC), 2002.
12. C. Morimoto, D. Koons, A. Amir, and M. Flickner, “Pupil detection and tracking using multiple light sources,” Image,
Vision and Computers 18(4), pp. 331–336, 2000.
13. C. Morimoto, A. Amir, and M. Flickner, “Detecting eye position and gaze from a single camera and 2 light sources,”
in Proceedings of the International Conference on Pattern Recognition, (Quebec, Canada), 2002.
14. C. H. Morimoto and M. R. M. Mimica, “Eye gaze tracking techniques for interactive applications,” Comput. Vis.
Image Underst. 98, pp. 4–24, April 2005.
15. T. Cornsweet and H. Crane, “Accurate two-dimensional eye tracker using the first and fourth Purkinje images,”
Journal of the Optical Society of America 63(8), pp. 921–928, 1973.
16. Seeing Machines. http://www.seeingmachines.com.
17. Arrington Research. http://www.arringtonresearch.com.
18. SR Research. http://www.sr-research.com.
19. Cambridge Research Systems. http://www.crsltd.com.
20. Tobii Technology. http://www.tobii.com.
21. Applied Science Laboratories. http://www.a-s-l.com.
22. L C Technologies. http://www.eyegaze.com.
23. SensoMotoric Instruments. http://www.smi.de.
24. A. Poole and L. Ball, Eye tracking in HCI and usability research. Idea Group, Inc, Pennsylvania, 2006.
25. K. Rayner and A. Pollatsek, The psychology of reading, Prentice Hall, Englewood Cliffs, NJ, 1989.
26. R. Jacob and K. Karn, “Eye tracking in human-computer interaction and usability research: Ready to deliver the promises,” in The mind’s eye: Cognitive and applied aspects of eye movement research, J. Hyönä, R. Radach, and H. Deubel, eds., pp. 573–605, Elsevier, Amsterdam, 2003.
27. S. K. Schnipke and M. W. Todd, “Trials and tribulations of using an eye-tracking system,” in CHI ’00: CHI ’00
extended abstracts on Human factors in computing systems, pp. 273–274, ACM Press, (New York, NY, USA), 2000.
28. K. White, Jr., T. Hutchinson, and J. Carley, “Spatially dynamic calibration of an eye-tracking system,” IEEE Transactions on Systems, Man, and Cybernetics 23(4), pp. 1162–1168, 1993.
29. Tobii Technology Inc., Falls Church, VA, USA, Product description: Tobii T/X Series eye trackers, May 2007.
30. Wikipedia, The free encyclopedia. http://en.wikipedia.org/wiki/Image:Schematic_diagram_of_the_human_eye_en.svg.
31. S. Milekic, “The more you look the more you get: intention-based interface using gaze-tracking,” in Museums and the Web 2003, D. Bearman and J. Trant, eds., Archives and Museum Informatics, (Pittsburgh), 2003.
32. T. Hutchinson, K. White, Jr., K. Reichert, and L. Frey, “Human-computer interaction using eye-gaze input,” IEEE
Transactions on Systems, Man, and Cybernetics 19, pp. 1527–1533, 1989.
33. H. Crane and C. Steele, “Accurate three-dimensional eyetracker,” Applied Optics 17(5), pp. 691–705, 1978.
34. Tobii Technology Inc., Falls Church, VA, USA, Product description: ClearView 2.7 eye gaze analysis software, 2006.
35. R. J. Krauzlis and S. G. Lisberger, “Temporal properties of visual motion signals for the initiation of smooth pursuit
eye movements in monkeys.,” J Neurophysiol 72, pp. 150–162, July 1994.
36. Wikipedia, The free encyclopedia. http://en.wikipedia.org/wiki/Pupil.