Evaluating Scientific Visualization Using Cognitive Measures
Erik W. Anderson
Scientific Computing and Imaging Institute
University of Utah
72 S. Central Campus Drive
Salt Lake City, UT, USA
[email protected]
ABSTRACT
In this position paper, we discuss the problems and advantages of using physiological measurements to estimate
cognitive load in order to evaluate scientific visualization
methods. We will present various techniques and technologies designed to measure cognitive load and how they may
be leveraged in the context of user evaluation studies for scientific visualization. We also discuss the challenges of experiments designed to use these physiological measurements.
Categories and Subject Descriptors
H.5.m [Information interfaces and presentation]: Miscellaneous; D.2.8 [Software Engineering]: Metrics—performance measures
Keywords
Scientific Visualization, Evaluation, Human-Computer Interfaces
1. INTRODUCTION
The inherent complexity of scientific data, coupled with the variety of applicable rendering methods, motivates robust evaluation of newly developed techniques. Evaluating scientific visualization techniques is a longstanding challenge [1, 2, 15]. Similarly, the field of information visualization has a strong tradition of pioneering research in evaluation techniques [27, 23, 7].
User studies often rely on timing and accuracy information collected during the study coupled with subjective user
surveys given after the experiment is completed. This combination of empirical measurement with subjective questionnaire is designed to assess the efficacy of a visualization
technique with respect to related methods. However, the
analysis of user evaluation studies remains difficult. These
challenges are often compounded by the limited empirical
data acquired during the study.
Beyond their specific details, user study experiments all share a common goal: to assess the strengths
and weaknesses inherent to a visualization technique or system. Incorporating as many objective measures as possible
into the experiment not only provides a more robust analysis, but also mitigates subjectivity often introduced by users’
preferences, biases, and retrospection.
In this position paper, we review traditional evaluation
techniques that consist of data gleaned from system logging.
We then outline evaluation methods using physiological measures for the assessment of scientific visualization
efficacy. Finally, we outline a potential user study using
physiological data to determine how well these data capture moments of insight. The discussion of this hypothetical
user study addresses the added difficulties associated with
more open-ended tasks used in conjunction with physiological data streams.
2. TRADITIONAL EVALUATION METHODS
Due to the nature of today’s complex scientific data, simply displaying all available information does not adequately
meet the demands of domain scientists. Determining the
best use of visualization techniques is one of the goals of
scientific visualization evaluations. Evaluation methods are
dictated by the types of improvements offered by the method
being studied. Some evaluations are concerned primarily
with technological improvements such as rendering speed or
the management of large data.
User studies have been used to evaluate everything from
aircraft cockpits [25] and surgical environments [22] to visualization methods [17]. Evaluations of visualization methods that focus on human factors often employ user studies or expert reviews to determine the methods’ effects on interpretation and usability. Although expert assessment takes advantage of knowledgeable users to enable more pointed analysis of use cases, these experts also bring with them their own preconceptions and preferences that can skew studies.
Traditional evaluation methods provide mechanisms to
gauge aspects of visualizations or their environments. Unfortunately, experiments using surveys to measure user experience introduce subjectivity and bias from the users. Subjectivity in user responses may be partially mitigated using
questionnaires developed with the Likert Scale [18]. User
feedback in evaluation may provide important insights into
how users interact with the system being studied. However,
these measures do not help answer questions regarding how
effective a method is at eliciting insight from a dataset, a primary purpose of visualization.
3. COGNITIVE LOAD IN VISUALIZATION EVALUATION
Since cognition is defined as the process of knowledge acquisition and reasoning, it is a reasonable goal of visualization to place as small a burden on a person’s cognitive
resources as possible. By limiting the cognitive load associated with interpreting a visualization, additional resources
may be employed to reason about the salient aspects presented by the imagery. Physiological measures can be a way
of quantitatively assessing the cognitive load imposed on a
user while interpreting a visualization.
The Information Visualization (InfoVis) community has
adopted the use of physiological measures for the evaluation
of new and existing data presentation techniques [11]. There
are many different human physiological responses that may
be measured during an evaluation experiment: gaze location, pupil dilation, heart rate, respiration, skin conductivity changes, muscle activity, brain activity, and metabolic
activity, to name a few. For a summary of these measurements and their uses, we direct the reader to Nathalie Riche’s
work [24].
Although many measurement techniques exist to gauge
physiological responses to a stimulus, not all of them express
the ease (or difficulty) of completing a specific task. Instead,
many physiological measures are indicative of the affective
response: the emotional interaction with stimulation. While
it is clear that affective responses are intrinsically linked with
insight generation [14], their link to cognitive load is less well
explored.
In 1991, Chandler and Sweller introduced Cognitive Load
Theory, in which the load placed upon the cognitive system is
categorized into three distinct sources [8]. Germane load
is the load imposed by learning a new task, Intrinsic load
represents the inherent difficulty of the problem at hand,
and Extraneous load is generated by the representation of
the data presented to the user for interpretation and action.
In terms of visualization evaluation, it is extraneous load
that we are most interested in as it aligns most closely with
the task of assessing the effectiveness of a given visualization.
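A convenient way to summarize this decomposition is an additive model; note that strict additivity and a shared capacity bound are common simplifying assumptions rather than claims from the original formulation [8]:

$L_{\text{total}} = L_{\text{germane}} + L_{\text{intrinsic}} + L_{\text{extraneous}} \leq C,$

where $C$ denotes the user’s working-memory capacity. Under this reading, an evaluation that holds germane and intrinsic load constant across conditions can attribute any measured difference in total load to the extraneous term, which is precisely the quantity a visualization comparison targets.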
3.1 Indirect Cognitive Measures
Several accepted methods have been developed to capture
cognitive load using subjective data collected after a task
is completed. The NASA-TLX index uses perceived mental
effort as a way to assess the task’s workload [10]. These self-reported mental-effort ratings have been shown to correlate
with mental workload measured by direct inspection of brain
activity through EEG [3]. Recently, it was shown that it is
possible to extract individual measures of germane, intrinsic,
and extraneous cognitive load using these subjective mental-effort ratings [9].
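As an illustration of how NASA-TLX responses are commonly scored, the short Python sketch below computes the weighted workload score from the six subscale ratings (0–100) and the weights obtained from fifteen pairwise comparisons; the participant data shown are hypothetical.

# A minimal sketch of the standard NASA-TLX weighted scoring scheme.
# Weights come from 15 pairwise comparisons, so they sum to 15 and the
# overall score stays on the same 0-100 scale as the ratings.

SUBSCALES = ("mental", "physical", "temporal",
             "performance", "effort", "frustration")

def tlx_score(ratings, weights):
    """Weighted NASA-TLX workload score (0-100)."""
    assert sum(weights.values()) == 15
    return sum(ratings[s] * weights[s] for s in SUBSCALES) / 15.0

# Hypothetical participant data, for illustration only.
ratings = {"mental": 70, "physical": 10, "temporal": 40,
           "performance": 30, "effort": 65, "frustration": 25}
weights = {"mental": 5, "physical": 0, "temporal": 2,
           "performance": 3, "effort": 4, "frustration": 1}
print(tlx_score(ratings, weights))  # -> approximately 53.7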
Another way to indirectly measure cognitive load uses timing and accuracy data while a user is performing the evaluation task concurrently with a controlled working memory task [6]. This artificial loading of working memory, and
thus the cognitive system, is designed to overload a user’s
limited resource pool, resulting in decreased performance.
The degree to which performance is reduced correlates with
cognitive load imposed during the task.
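A sketch of this dual-task logic follows; all timing and accuracy values are hypothetical. Load imposed by the primary (visualization) task is inferred from how much performance on the concurrent working-memory task degrades relative to its single-task baseline.

def dual_task_load(baseline_rt_ms, dual_rt_ms, baseline_acc, dual_acc):
    """Relative secondary-task degradation; larger values suggest more load."""
    slowdown = (dual_rt_ms - baseline_rt_ms) / baseline_rt_ms
    accuracy_drop = (baseline_acc - dual_acc) / baseline_acc
    return slowdown, accuracy_drop

# Hypothetical data: the memory task performed alone vs. performed
# concurrently with the visualization task.
print(dual_task_load(620.0, 780.0, 0.95, 0.82))
# -> (0.258, 0.137): about 26% slower and 14% less accurate under load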
3.2 Eye Tracking
It is possible to use direct methods to determine cognitive load imposed on a user. One such method is to employ
eye tracking equipment. Although the expense of this hardware is a substantial barrier to widespread adoption of the
technology, it provides insight into how cognitive load varies over the course of task performance.
Eye tracking data provides two distinct methods to measure cognitive load during a task: saccadic eye movement [28],
and pupillary response [13]. While eye tracking data can
be collected and analyzed to provide a real-time estimate
of cognitive load, additional care must be taken to design
and analyze the experiment appropriately. For instance, the
introduction of rapid changes of brightness may artificially
change pupil dilation, while interface elements designed to attract the user’s gaze may confound the analysis of saccadic
movements.
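As a concrete example of the pupillary measure, the Python sketch below computes a baseline-corrected task-evoked pupillary response (TEPR). The sampling rate, window lengths, and synthetic trace are illustrative assumptions, and a real analysis would also need to control for the luminance confound noted above.

import numpy as np

def tepr(pupil_mm, fs, onset_s, baseline_s=0.5):
    """Percent change in pupil diameter relative to a pre-stimulus baseline."""
    onset = int(onset_s * fs)
    base = pupil_mm[onset - int(baseline_s * fs):onset].mean()
    return 100.0 * (pupil_mm[onset:] - base) / base

fs = 60.0                                        # assumed 60 Hz eye tracker
t = np.arange(0.0, 5.0, 1.0 / fs)
trace = 3.2 + 0.15 * np.clip(t - 2.0, 0.0, 1.0)  # synthetic dilation at t = 2 s
print(tepr(trace, fs, onset_s=2.0).max())        # -> ~4.7% peak dilation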
3.3 Brain Activity
The direct inspection of the brain’s cognitive centers provides activity data without the influence of peripheral systems. For example, pupils dilate and contract with respect
to both ambient lighting conditions as well as cognitive load.
However, non-invasive measurements in the form of electroencephalography (EEG) or magnetoencephalography (MEG)
are acquired through expensive, specialized hardware and
represent a weighted average of oscillators distributed throughout the brain [20]. Fortunately, these barriers are not insurmountable.
Brain activity as measured by EEG systems has already
been used to evaluate operational environments [4]. Anderson et al. extended the use of EEG-based analysis to examine the cognitive load imposed on users interpreting different renderings of one-dimensional distribution data. By carefully controlling the experimental conditions, they were able to separate extraneous cognitive load from the other load sub-types.
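To make the EEG-based approach concrete, the sketch below estimates one commonly reported workload proxy, the ratio of frontal theta (4–8 Hz) to parietal alpha (8–12 Hz) band power, from Welch power spectral densities. The channel choice, frequency bands, and random placeholder signals are assumptions; this is not the specific pipeline of the studies cited above.

import numpy as np
from scipy.signal import welch

def band_power(x, fs, lo, hi):
    """Integrate a Welch power spectral density over [lo, hi) Hz."""
    f, pxx = welch(x, fs=fs, nperseg=int(2 * fs))  # 2-second windows
    mask = (f >= lo) & (f < hi)
    return pxx[mask].sum() * (f[1] - f[0])

fs = 256.0                                    # typical EEG sampling rate
rng = np.random.default_rng(0)
frontal = rng.standard_normal(int(30 * fs))   # placeholder for, e.g., Fz
parietal = rng.standard_normal(int(30 * fs))  # placeholder for, e.g., Pz

theta = band_power(frontal, fs, 4.0, 8.0)
alpha = band_power(parietal, fs, 8.0, 12.0)
print("theta/alpha workload index:", theta / alpha)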
4. USER STUDY DESIGN
Designing user studies to incorporate physiological measurements, whether to investigate cognitive or affective responses, is difficult. Studies striving to measure cognitive
load must take special care to ensure that only a single task
is presented to the user at a time. Due to the complex interactions between the brain and the environment, analysis
of cognitive processes is confounded when this requirement
is not met. If, during analysis, the question ‘Is the brain responding to condition A or condition B?’ is asked, the task
must be simplified.
The method by which cognitive processes are measured
must also be matched to the requirements of the task being
performed. The physical constraints of specific technologies
must be respected during an evaluation experiment. For example, for the highest EEG data quality, movement must be
kept to a minimum, whereas eye tracking data is best collected in environments where ambient illumination is held
constant. This principle should influence the choice of data
acquisition modality, where interaction studies will likely be
better suited to eye tracking and interpretive studies more amenable to EEG.
Task difficulty and fatigue also play large roles in the efficacy of an experiment. If the number of trials is too large,
or the task difficulty is too high, the number of effective trials, or statistically viable samples, is reduced. Problems of
this nature impact both affective and cognitive reactions in
the form of undue stress and relief or mental fatigue, respectively.
The number of effective trials is also impacted by the practice effect. Due to the brain’s ability to learn and adapt,
cognitive function will change after many instances of the
task. To mitigate this undesirable property, particular attention must be paid to decrease the repetitiveness of the
task. Reducing the effects of practice in this way will often
yield more effective trials in the experiment.
5. MEASURING INSIGHT
The purpose of scientific visualization is to display data to
enable scientists to understand and gain insight from their
data. To maximize the effectiveness of visualization, rendering and interaction techniques should be chosen that best
elicit insights about the phenomena studied. This makes the
concept of insight generation central to the goals of evaluating visualization methods to determine their effectiveness.
Chris North took some of the first steps towards measuring
insights generated by visualization through characterizing
the different aspects of insight and proposing mechanisms
to explore them [19]. However, neuroscience studies have
further refined how we can measure insights gained from different problem solving strategies [12, 16, 26]. Without the
use of physiological measurement during evaluation studies,
it is easy to mistake a user’s educated guess for genuine insight.
The instant when a visualization sparks a moment of insight is important. This “Aha!” moment is expressed throughout the brain, causing both cognitive and affective responses [5].
Marketing research studies have explored some of the cognitive and affective measurements of satisfaction responses [21].
This type of experiment was extended to the study of satisfaction and insight measurement during verbal problems
using a combination of fMRI and EEG [12]. Work is currently underway investigating gamma-band (25–40 Hz) activity and pupillometry to assess the cognitive and affective
impacts of insight expression. Monitoring physiological data
streams enables evaluations to take advantage of this important biological signal, allowing a more robust understanding
of how data representation affects our understanding of scientific data.
5.1 A User Study to Measure Insight
To determine the potential of EEG-based insight measures
in visualization evaluation, a user study must be designed
and conducted. As Chris North pointed out, measuring insight is difficult and is typically best elicited using open-ended tasks [19]. Unfortunately, this evaluation methodology makes analysis of physiological measures like EEG and
pupillometry prohibitively difficult. This challenge is compounded by the realization that synthetic problems may not
evoke the same type of insight during their interpretation as
a real-world exploration would.
The challenges of creating a user study to measure the
physiological manifestations of insight are not insurmountable. An appropriate user study would allow free, openended, exploration of real scientific data, addressing the potential differences associated with synthetic datasets. For
example, a participant may examine meteorological data
with the overall goal of determining the differences between
low and high pressure systems. The participant will then indicate they have come to a realization by pressing a button
on the keyboard. This action will allow the physiological
records to be appropriately segmented for analysis. At this
time, the participant and examiner will discuss the insight
made about the data. This period of discussion allows a
survey to be completed to help gauge the various characteristics of insight and allows the experiment to progress until
all aspects of the data are adequately explored.
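The segmentation step described above is straightforward to implement. The Python sketch below cuts a continuous physiological record into fixed windows around each button press so that pre- and post-insight activity can be compared; the event times and window lengths are illustrative assumptions.

import numpy as np

def epochs_around_events(signal, fs, event_times_s, pre_s=2.0, post_s=2.0):
    """Return one (pre + post) window of samples per event marker."""
    pre, post = int(pre_s * fs), int(post_s * fs)
    windows = []
    for t in event_times_s:
        i = int(t * fs)
        if i - pre >= 0 and i + post <= len(signal):  # skip truncated epochs
            windows.append(signal[i - pre:i + post])
    return np.array(windows)

fs = 256.0
record = np.random.default_rng(1).standard_normal(int(600 * fs))  # 10 minutes
presses = [42.5, 180.0, 401.3]      # hypothetical insight reports (seconds)
print(epochs_around_events(record, fs, presses).shape)  # -> (3, 1024)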
Acknowledgments
We thank Greg Jones, Cat Chong, and Kristi Potter for
their continued support and insightful discussions on user
evaluations, their cognitive consequences, and experimental
design.
6. REFERENCES
[1] D. Acevedo, C. Jackson, F. Drury, and D. Laidlaw.
Using visual design experts in critique-based
evaluation of 2d vector visualization methods. IEEE
Transactions on Visualization and Computer
Graphics, 14(4):877–884, 2008.
[2] D. Acevedo and D. Laidlaw. Subjective quantification
of perceptual interactions among some 2d scientific
visualization methods. IEEE Transactions on
Visualization and Computer Graphics,
12(5):1133–1140, 2006.
[3] C. Berka, D. J. Levendowski, M. N. Lumicao, A. Yau,
G. Davis, V. T. Zivkovic, R. E. Olmstead, P. D.
Tremoulet, and P. L. Craven. EEG correlates of task
engagement and mental workload in vigilance,
learning, and memory tasks. Aviation, Space, and
Environmental Medicine, 78(Supplement
1):B231–B244, May 2007.
[4] C. Berka, D. J. Levendowski, C. K. Ramsey, G. Davis,
M. N. Lumicao, K. Stanney, L. Reeves, S. H. Regli,
P. D. Tremoulet, and K. Stibler. Evaluation of an
EEG-workload model in the Aegis simulation
environment. Proceedings of SPIE, pages 90–99, 2005.
[5] E. Bowden and M. Jung-Beeman. Aha! insight
experience correlates with solution activation in the
right hemisphere. Psychonomic Bulletin and Review,
10:730–737, 2003.
[6] R. Brunken, S. Steinbacher, J. Plass, and D. Leutner.
Assessment of cognitive load in multimedia learning
using dual-task methodology. Experimental
Psychology, 49(2):109–119, 2002.
[7] S. Carpendale. Evaluating information visualizations.
In A. Kerren, J. Stasko, J.-D. Fekete, and C. North,
editors, Information Visualization, volume 4950 of
Lecture Notes in Computer Science, pages 19–45.
Springer Berlin / Heidelberg, 2008.
[8] P. Chandler and J. Sweller. Cognitive load theory and
the format of instruction. Cognition and Instruction,
8:293–332, 1991.
[9] K. DeLeeuw and R. Mayer. A comparison of three
measures of cognitive load: Evidence for separable
measures of intrinsic, extraneous and germane load.
Journal of Educational Psychology, 100(1):223–234,
2008.
[10] J. M. Hitt, J. P. Kring, E. Daskarolis, C. Morris, and
M. Mouloua. Assessing mental workload with
subjective measures: An analytical review of the
nasa-tlx index since its inception. Human Factors and
Ergonomics Society Annual Meeting, 43:1404–1404,
1999.
[11] W. Huang, P. Eades, and S.-H. Hong. Beyond time
and error: a cognitive approach to the evaluation of
graph drawings. In Proceedings of the 2008 Workshop
on BEyond time and errors: novel evaLuation
methods for Information Visualization, BELIV ’08,
pages 3:1–3:8, New York, NY, USA, 2008. ACM.
[12] M. Jung-Beeman, E. M. Bowden, J. Haberman, J. L.
Frymiare, S. Arambel-Liu, R. Greenblatt, P. J. Reber,
and J. Kounios. Neural activity when people solve
verbal problems with insight. PLoS Biol, 2(4):e97, 04
2004.
[13] J. Klingner, R. Kumar, and P. Hanrahan. Measuring
the task-evoked pupillary response with a remote eye
tracker. In Proceedings of the 2008 symposium on Eye
tracking research and applications, ETRA ’08, pages
69–72, New York, NY, USA, 2008. ACM.
[14] V. Konovalov and I. Serikov. Characteristics of the
galvanic skin response and electrocardiogram in active
and passive subjects under test conditions. Human
Physiology, 32:578–583, 2006. doi:10.1134/S0362119706050124.
[15] R. Kosara, C. G. Healey, V. Interrante, D. H. Laidlaw,
and C. Ware. Thoughts on user studies: Why, how
and when. IEEE Computer Graphics and
Applications, 23(4):20–25, 2003.
[16] J. Kounios, J. L. Frymiare, E. M. Bowden, J. I. Fleck,
K. Subramaniam, T. B. Parrish, and M. Jung-Beeman.
The prepared mind: Neural activity prior to problem
presentation predicts subsequent solution by sudden
insight. Psychological Science, 17(10):882–890, 2006.
[17] D. H. Laidlaw, R. M. Kirby, C. D. Jackson, J. S.
Davidson, T. S. Miller, M. da Silva, W. H. Warren,
and M. J. Tarr. Comparing 2d vector field
visualization methods: A user study. IEEE
Transactions on Visualization and Computer
Graphics, 11(2):59–70, 2005.
[18] R. Likert. A technique for the measurement of
attitudes. Archives of Psychology, 140:1–55, 1932.
[19] C. North. Toward measuring visualization insight.
IEEE Computer Graphics and Applications, 26(3):6–9,
2006.
[20] P. Nunez and R. Srinivasan. Electric Fields of the
Brain: The Neurophysics of EEG. New York: Oxford
University Press, 1981.
[21] R. L. Oliver. Cognitive, affective, and attribute bases
of the satisfaction response. Journal of Consumer
Research, 20(3):418–430, 1993.
[22] B. Reitinger, A. Bornik, R. Beichel, and
D. Schmalstieg. Liver surgery planning using virtual
reality. IEEE Computer Graphics and Applications,
26:36–47, 2006.
[23] N. Riche. Beyond system logging: Human logging for
evaluating information visualization. In Proc. of the
ACM SIGCHI Workshop BELIV 2010, 2010.
[24] N. Riche. Beyond system logging: Human logging for
evaluating information visualization. In Proc. of the
ACM SIGCHI Workshop BELIV 2010, 2010.
[25] N. B. Sarter and D. D. Woods. Pilot interaction with
cockpit automation II: An experimental study of
pilots’ model and awareness of the flight management
system. Int’l J. of Aviation Psychology, 4(1):1–28,
1994.
[26] B. R. Sheth, S. Sandkühler, and J. Bhattacharya.
Posterior beta and anterior gamma oscillations predict
cognitive insight. Journal of Cognitive Neuroscience,
21(7):1269–1279, 2009.
[27] B. Shneiderman and C. Plaisant. Strategies for
evaluating information visualization tools: MILCS.
Proc. of AVI Workshop BELIV 2006, pages 1–7, 2006.
[28] E. Stuyven, K. Van der Goten, A. Vandierendonck,
K. Claeys, and L. Crevits. The effect of cognitive load
on saccadic eye movements. Acta Psychologica,
104(1):69–85, 2000.