Evaluating Scientific Visualization Using Cognitive Measures

Erik W. Anderson
Scientific Computing and Imaging Institute, University of Utah
72 S. Central Campus Drive, Salt Lake City, UT, USA
[email protected]

ABSTRACT
In this position paper, we discuss the problems and advantages of using physiological measurements to estimate cognitive load in order to evaluate scientific visualization methods. We present various techniques and technologies designed to measure cognitive load and describe how they may be leveraged in the context of user evaluation studies for scientific visualization. We also discuss the challenges of designing experiments around these physiological measurements.

Categories and Subject Descriptors
H.5.m [Information Interfaces and Presentation]: Miscellaneous; D.2.8 [Software Engineering]: Metrics—performance measures

Keywords
Scientific Visualization, Evaluation, Human-Computer Interfaces

1. INTRODUCTION
The inherent complexity of scientific data, coupled with the variety of applicable rendering methods, motivates robust evaluation of newly developed techniques. Evaluating scientific visualization techniques is a longstanding challenge [1, 2, 15]. Similarly, the field of information visualization has a strong tradition of pioneering research in evaluation techniques [27, 23, 7]. User studies often rely on timing and accuracy information collected during the study, coupled with subjective user surveys given after the experiment is completed. This combination of empirical measurement and subjective questionnaire is designed to assess the efficacy of a visualization technique with respect to related methods. However, the analysis of user evaluation studies remains difficult, and these challenges are often compounded by the limited empirical data acquired during the study.

Beyond the specific details of the many user study experiments, they all share a common goal: to assess the strengths and weaknesses inherent to a visualization technique or system. Incorporating as many objective measures as possible into the experiment not only provides a more robust analysis, but also mitigates the subjectivity often introduced by users' preferences, biases, and retrospection.

In this position paper, we review traditional evaluation techniques that rely on data gleaned from system logging. We then outline evaluation methods that use physiological measures to assess the efficacy of scientific visualization. Finally, we outline a potential user study using physiological data to determine how well these data capture moments of insight. The discussion of this hypothetical user study addresses the added difficulties associated with the more open-ended tasks used in conjunction with physiological data streams.

2. TRADITIONAL EVALUATION METHODS
Due to the nature of today's complex scientific data, simply displaying all available information does not adequately meet the demands of domain scientists.
Determining the best use of visualization techniques is one of the goals of scientific visualization evaluation. Evaluation methods are dictated by the types of improvements offered by the method being studied. Some evaluations are concerned primarily with technological improvements such as rendering speed or the management of large data. User studies have been used to evaluate everything from aircraft cockpits [25] and surgical environments [22] to visualization methods [17]. Evaluations of visualization methods that focus on human factors often employ user studies or expert reviews to determine the methods' effects on interpretation and usability. Although expert assessment takes advantage of knowledgeable users to enable a more incisive analysis of use cases, these experts also bring with them their own preconceptions and preferences that can skew studies.

Traditional evaluation methods provide mechanisms to gauge aspects of visualizations or their surrounding environments. Unfortunately, experiments that use surveys to measure user experience introduce subjectivity and bias from the users. Subjectivity in user responses may be partially mitigated using questionnaires developed with the Likert scale [18]. User feedback may provide important insights into how users interact with the system being studied. However, these measures do not help answer questions regarding how effective a method is at eliciting insight from a dataset, which is a primary purpose of visualization.

3. COGNITIVE LOAD IN VISUALIZATION EVALUATION
Since cognition is defined as the process of knowledge acquisition and reasoning, it is a reasonable goal of visualization to place as small a burden on a person's cognitive resources as possible. By limiting the cognitive load associated with interpreting a visualization, additional resources may be employed to reason about the salient aspects presented by the imagery. Physiological measures offer a way of quantitatively assessing the cognitive load imposed on a user while interpreting a visualization. The information visualization (InfoVis) community has adopted the use of physiological measures for the evaluation of new and existing data presentation techniques [11]. Many different human physiological responses may be measured during an evaluation experiment: gaze location, pupil dilation, heart rate, respiration, skin conductivity changes, muscle activity, brain activity, and metabolic activity, to name a few. For a summary of these measurements and their uses, we direct the reader to the work of Nathalie Riche [24]. Although many measurement techniques exist to gauge physiological responses to a stimulus, not all of them express the ease (or difficulty) of completing a specific task. Instead, many physiological measures are indicative of the affective response: the emotional interaction with stimulation. While it is clear that affective responses are intrinsically linked with insight generation [14], their link to cognitive load is less well explored.

In 1991, Chandler and Sweller introduced Cognitive Load Theory, in which the load placed upon the cognitive system is divided into three distinct sources [8]: germane load is imposed by learning a new task, intrinsic load represents the inherent difficulty of the problem at hand, and extraneous load is generated by the representation of the data presented to the user for interpretation and action.
In terms of visualization evaluation, extraneous load is of most interest, as it aligns most closely with the task of assessing the effectiveness of a given visualization.

3.1 Indirect Cognitive Measures
Several accepted methods have been developed to capture cognitive load using subjective data collected after a task is completed. The NASA-TLX index uses perceived mental effort as a way to assess a task's workload [10]. These self-reported mental-effort ratings have been shown to correlate with mental workload measured by direct inspection of brain activity through EEG [3]. Recently, it was shown that individual measures of germane, intrinsic, and extraneous cognitive load can be extracted from these subjective mental-effort ratings [9]. Another way to measure cognitive load indirectly uses timing and accuracy data collected while a user performs the evaluation task concurrently with a controlled working memory task [6]. This artificial loading of working memory, and thus of the cognitive system, is designed to overload a user's limited resource pool, resulting in decreased performance. The degree to which performance is reduced correlates with the cognitive load imposed during the task.

3.2 Eye Tracking
It is also possible to use direct methods to determine the cognitive load imposed on a user. One such method is to employ eye tracking equipment. Although the expense of this hardware is a substantial barrier to widespread adoption, it provides insight into how cognitive load varies over time with task performance. Eye tracking data provides two distinct ways to measure cognitive load during a task: saccadic eye movement [28] and pupillary response [13]. While eye tracking data can be collected and analyzed to provide a real-time estimate of cognitive load, additional care must be taken to design and analyze the experiment appropriately. For instance, rapid changes in brightness may artificially change pupil dilation, while interface elements designed to attract gaze may confound the analysis of saccadic movements.

3.3 Brain Activity
Directly inspecting the brain's cognitive centers provides activity data without the influence of peripheral systems. Pupils, for example, dilate and contract in response to ambient lighting conditions as well as to cognitive load. However, non-invasive measurements in the form of electroencephalography (EEG) or magnetoencephalography (MEG) are acquired with expensive, specialized hardware and represent a weighted average of oscillators distributed throughout the brain [20]. Fortunately, these barriers are not insurmountable. Brain activity measured by EEG systems has already been used to evaluate operational environments [4]. Anderson et al. extended EEG-based analysis to examine the cognitive load imposed on users interpreting different renderings of one-dimensional distribution data. By carefully controlling the experimental conditions, extraneous cognitive load could be separated from the other load sub-types.
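To make this concrete, the following minimal sketch shows one way EEG recordings of the kind described above might be reduced to a scalar cognitive-load estimate: band power is computed with Welch's method, and a frontal-theta to parietal-alpha ratio serves as a rough workload proxy. The channel assignments, sampling rate, and band limits are illustrative assumptions, not the parameters used in the studies cited above.

```python
# Minimal sketch: a band-power workload proxy from EEG, assuming one
# frontal and one parietal channel sampled at a known rate. The
# theta/alpha ratio is a common heuristic, not the measure used in the
# cited studies.
import numpy as np
from scipy.signal import welch

def band_power(x, fs, lo, hi):
    """Integrate the Welch power spectral density of x over [lo, hi] Hz."""
    freqs, psd = welch(x, fs=fs, nperseg=int(2 * fs))
    band = (freqs >= lo) & (freqs <= hi)
    return np.trapz(psd[band], freqs[band])

def workload_index(frontal, parietal, fs=256.0):
    """Frontal theta power divided by parietal alpha power.
    Theta (4-8 Hz) tends to rise and alpha (8-12 Hz) tends to fall
    as cognitive load increases."""
    theta = band_power(frontal, fs, 4.0, 8.0)
    alpha = band_power(parietal, fs, 8.0, 12.0)
    return theta / alpha

if __name__ == "__main__":
    # Synthetic one-minute recordings stand in for real EEG here.
    rng = np.random.default_rng(0)
    frontal = rng.standard_normal(60 * 256)
    parietal = rng.standard_normal(60 * 256)
    print(workload_index(frontal, parietal, fs=256.0))
```

In practice, such an index would be computed per trial or per time window and compared across the visualization conditions under evaluation.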
4. USER STUDY DESIGN
Designing user studies that incorporate physiological measurements, whether to investigate cognitive or affective responses, is difficult. Studies striving to measure cognitive load must take special care to ensure that only a single task is presented to the user at a time. Because of the complex interactions between the brain and the environment, the analysis of cognitive processes is confounded when this requirement is not met. If, during analysis, one must ask whether the brain is responding to condition A or to condition B, the task must be simplified.

The method by which cognitive processes are measured must also be matched to the requirements of the task being performed. The physical constraints of specific technologies must be respected during an evaluation experiment. For example, EEG data quality is highest when movement is kept to a minimum, whereas eye tracking data is best collected in environments where ambient illumination is held constant. This principle should influence the choice of data acquisition modality: interaction studies are likely better suited to eye tracking, while interpretive studies are more amenable to EEG.

Task difficulty and fatigue also play large roles in the efficacy of an experiment. If the number of trials is too large, or the task difficulty is too high, the number of effective trials, or statistically viable samples, is reduced. Problems of this nature impact both affective and cognitive reactions, in the form of undue stress and relief or mental fatigue, respectively. The number of effective trials is also impacted by the practice effect: because the brain learns and adapts, cognitive function will change after many repetitions of the task. To mitigate this undesirable property, particular attention must be paid to reducing the repetitiveness of the task. Reducing the effects of practice in this way will often yield more effective trials in the experiment.

5. MEASURING INSIGHT
The purpose of scientific visualization is to display data in a way that enables scientists to understand it and gain insight from it. To maximize the effectiveness of visualization, rendering and interaction techniques should be chosen that best elicit insights about the phenomena studied. This makes insight generation central to evaluating visualization methods and determining their effectiveness. Chris North took some of the first steps toward measuring insights generated by visualization by characterizing the different aspects of insight and proposing mechanisms to explore them [19]. Neuroscience studies have since refined how we can measure insights gained from different problem-solving strategies [12, 16, 26]. Without physiological measurement during evaluation studies, it is easy to mistake a response based on an educated guess for one reflecting genuine insight.

The instant when a visualization sparks a moment of insight is important. This "Aha!" moment is expressed throughout the brain, causing both cognitive and affective responses [5]. Marketing research studies have explored some of the cognitive and affective measurements of satisfaction responses [21]. This type of experiment was extended to the study of satisfaction and insight during verbal problem solving using a combination of fMRI and EEG [12]. Work is currently underway to investigate gamma-band (25–40 Hz) activity and pupillometry as means to assess the cognitive and affective signatures of insight. Monitoring physiological data streams enables evaluations to take advantage of this important biological signal, allowing a more robust understanding of how data representation affects our understanding of scientific data.
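As a companion illustration on the pupillometry side, the sketch below computes a baseline-corrected, task-evoked pupil dilation of the kind discussed in Section 3.2. The sampling rate, window lengths, and blink handling are assumptions made for the example, not details taken from the cited work.

```python
# Minimal sketch: task-evoked pupillary response (TEPR) as the mean
# pupil diameter after an onset minus a pre-onset baseline. Sampling
# rate, window lengths, and blink handling (linear interpolation over
# zero-valued samples) are illustrative assumptions.
import numpy as np

def interpolate_blinks(diameter):
    """Replace blink samples (recorded as zeros) by linear interpolation."""
    d = diameter.astype(float).copy()
    valid = d > 0
    idx = np.arange(d.size)
    d[~valid] = np.interp(idx[~valid], idx[valid], d[valid])
    return d

def task_evoked_response(diameter, onset_idx, fs=60.0,
                         baseline_s=0.5, window_s=2.0):
    """Mean dilation relative to the pre-onset baseline, in tracker units."""
    d = interpolate_blinks(diameter)
    b0 = int(onset_idx - baseline_s * fs)
    w1 = int(onset_idx + window_s * fs)
    baseline = d[b0:onset_idx].mean()
    return d[onset_idx:w1].mean() - baseline
```

Because pupil diameter also follows luminance, a measure like this is only meaningful when display brightness is held roughly constant, as noted in Section 3.2.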
5.1 A User Study to Measure Insight
To determine the potential of EEG-based insight measures in visualization evaluation, a user study must be designed and conducted. As Chris North pointed out, measuring insight is difficult and is typically best elicited using open-ended tasks [19]. Unfortunately, this evaluation methodology makes the analysis of physiological measures such as EEG and pupillometry far more difficult. The challenge is compounded by the realization that synthetic problems may not evoke the same type of insight during their interpretation as a real-world exploration would.

The challenges of creating a user study to measure the physiological manifestations of insight are not insurmountable. An appropriate user study would allow free, open-ended exploration of real scientific data, addressing the potential differences associated with synthetic datasets. For example, a participant may examine meteorological data with the overall goal of determining the differences between low- and high-pressure systems. The participant then indicates that they have come to a realization by pressing a button on the keyboard. This action allows the physiological records to be appropriately segmented for analysis. At this point, the participant and the examiner discuss the insight gained about the data. This period of discussion allows a survey to be completed to help gauge the various characteristics of the insight, and it allows the experiment to progress until all aspects of the data have been adequately explored.
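A minimal sketch of the segmentation step described above: given the timestamps of the participant's button presses, fixed-length epochs are cut from the continuous recordings for later analysis. The sampling rates, window lengths, and variable names are illustrative assumptions.

```python
# Minimal sketch: cut continuous recordings into epochs around the
# button presses that mark reported insights. Window lengths and
# sampling rates are illustrative assumptions.
import numpy as np

def extract_epochs(signal, fs, event_times_s, pre_s=2.0, post_s=1.0):
    """Return an (n_events, n_samples) array of windows around each event.
    Events too close to the start or end of the recording are skipped."""
    pre, post = int(pre_s * fs), int(post_s * fs)
    epochs = []
    for t in event_times_s:
        center = int(round(t * fs))
        if center - pre < 0 or center + post > signal.shape[-1]:
            continue
        epochs.append(signal[..., center - pre:center + post])
    return np.stack(epochs)

# Hypothetical usage: the same press timestamps segment both data
# streams, so EEG gamma-band activity and pupil dilation can be
# examined for each reported insight.
# eeg_epochs = extract_epochs(eeg_channel, fs=256, event_times_s=presses)
# pupil_epochs = extract_epochs(pupil_trace, fs=60, event_times_s=presses)
```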
Acknowledgments
We thank Greg Jones, Cat Chong, and Kristi Potter for their continued support and insightful discussions on user evaluations, their cognitive consequences, and experimental design.

6. REFERENCES
[1] D. Acevedo, C. Jackson, F. Drury, and D. Laidlaw. Using visual design experts in critique-based evaluation of 2D vector visualization methods. IEEE Transactions on Visualization and Computer Graphics, 14(4):877–884, 2008.
[2] D. Acevedo and D. Laidlaw. Subjective quantification of perceptual interactions among some 2D scientific visualization methods. IEEE Transactions on Visualization and Computer Graphics, 12(5):1133–1140, 2006.
[3] C. Berka, D. J. Levendowski, M. N. Lumicao, A. Yau, G. Davis, V. T. Zivkovic, R. E. Olmstead, P. D. Tremoulet, and P. L. Craven. EEG correlates of task engagement and mental workload in vigilance, learning, and memory tasks. Aviation, Space, and Environmental Medicine, 78(Supplement 1):B231–B244, May 2007.
[4] C. Berka, D. J. Levendowski, C. K. Ramsey, G. Davis, M. N. Lumicao, K. Stanney, L. Reeves, S. H. Regli, P. D. Tremoulet, and K. Stibler. Evaluation of an EEG-workload model in the Aegis simulation environment. Proceedings of SPIE, pages 90–99, 2005.
[5] E. Bowden and M. Jung-Beeman. Aha! Insight experience correlates with solution activation in the right hemisphere. Psychonomic Bulletin and Review, 10:730–737, 2003.
[6] R. Brunken, S. Steinbacher, J. Plass, and D. Leutner. Assessment of cognitive load in multimedia learning using dual-task methodology. Experimental Psychology, 49(2):109–119, 2002.
[7] S. Carpendale. Evaluating information visualizations. In A. Kerren, J. Stasko, J.-D. Fekete, and C. North, editors, Information Visualization, volume 4950 of Lecture Notes in Computer Science, pages 19–45. Springer Berlin / Heidelberg, 2008.
[8] P. Chandler and J. Sweller. Cognitive load theory and the format of instruction. Cognition and Instruction, 8:293–332, 1991.
[9] K. DeLeeuw and R. Mayer. A comparison of three measures of cognitive load: Evidence for separable measures of intrinsic, extraneous and germane load. Journal of Educational Psychology, 100(1):223–234, 2008.
[10] J. M. Hitt, J. P. Kring, E. Daskarolis, C. Morris, and M. Mouloua. Assessing mental workload with subjective measures: An analytical review of the NASA-TLX index since its inception. Human Factors and Ergonomics Society Annual Meeting, 43:1404–1404, 1999.
[11] W. Huang, P. Eades, and S.-H. Hong. Beyond time and error: A cognitive approach to the evaluation of graph drawings. In Proceedings of the 2008 Workshop on BEyond time and errors: novel evaLuation methods for Information Visualization, BELIV '08, pages 3:1–3:8, New York, NY, USA, 2008. ACM.
[12] M. Jung-Beeman, E. M. Bowden, J. Haberman, J. L. Frymiare, S. Arambel-Liu, R. Greenblatt, P. J. Reber, and J. Kounios. Neural activity when people solve verbal problems with insight. PLoS Biology, 2(4):e97, 2004.
[13] J. Klingner, R. Kumar, and P. Hanrahan. Measuring the task-evoked pupillary response with a remote eye tracker. In Proceedings of the 2008 Symposium on Eye Tracking Research and Applications, ETRA '08, pages 69–72, New York, NY, USA, 2008. ACM.
[14] V. Konovalov and I. Serikov. Characteristics of the galvanic skin response and electrocardiogram in active and passive subjects under test conditions. Human Physiology, 32:578–583, 2006. doi:10.1134/S0362119706050124.
[15] R. Kosara, C. G. Healey, V. Interrante, D. H. Laidlaw, and C. Ware. Thoughts on user studies: Why, how and when. IEEE Computer Graphics and Applications, 23(4):20–25, 2003.
[16] J. Kounios, J. L. Frymiare, E. M. Bowden, J. I. Fleck, K. Subramaniam, T. B. Parrish, and M. Jung-Beeman. The prepared mind: Neural activity prior to problem presentation predicts subsequent solution by sudden insight. Psychological Science, 17(10):882–890, 2006.
[17] D. H. Laidlaw, R. M. Kirby, C. D. Jackson, J. S. Davidson, T. S. Miller, M. da Silva, W. H. Warren, and M. J. Tarr. Comparing 2D vector field visualization methods: A user study. IEEE Transactions on Visualization and Computer Graphics, 11(2):59–70, 2005.
[18] R. Likert. A technique for the measurement of attitudes. Archives of Psychology, 140:1–55, 1932.
[19] C. North. Toward measuring visualization insight. IEEE Computer Graphics and Applications, 26(3):6–9, 2006.
[20] P. Nunez and R. Srinivasan. Electric Fields of the Brain: The Neurophysics of EEG. Oxford University Press, New York, 1981.
[21] R. L. Oliver. Cognitive, affective, and attribute bases of the satisfaction response. Journal of Consumer Research, 20(3):418–430, 1993.
[22] B. Reitinger, A. Bornik, R. Beichel, and D. Schmalstieg. Liver surgery planning using virtual reality. IEEE Computer Graphics and Applications, 26:36–47, 2006.
[23] N. Riche. Beyond system logging: Human logging for evaluating information visualization. Proc. of SIGCHI Workshop BELIV 2010, 2010.
[24] N. Riche. Beyond system logging: Human logging for evaluating information visualization. In BELIV '10 Workshop at ACM SIGCHI 2010, 2010.
[25] N. B. Sarter and D. D. Woods. Pilot interaction with cockpit automation II: An experimental study of pilots' model and awareness of the flight management system. International Journal of Aviation Psychology, 4(1):1–28, 1994.
[26] B. R. Sheth, S. Sandkühler, and J. Bhattacharya. Posterior beta and anterior gamma oscillations predict cognitive insight. Journal of Cognitive Neuroscience, 21(7):1269–1279, 2009.
[27] B. Shneiderman and C. Plaisant. Strategies for evaluating information visualization tools: MILCs. Proc. of AVI Workshop BELIV 2006, pages 1–7, 2006.
[28] E. Stuyven, K. Van der Goten, A. Vandierendonck, K. Claeys, and L. Crevits. The effect of cognitive load on saccadic eye movements. Acta Psychologica, 104(1):69–85, 2000.