Measuring Cognitive Load in Multimedia Instruction: A Comparison of Two Instruments

David Windell ([email protected])
Eric N. Wiebe ([email protected])
North Carolina State University

Presented in AERA Division C - Learning and Instruction, Section 6 - Cognitive, Social and Motivational Processes, Session 5, March 2007

Abstract

The current study reviews two popular self-report measures of cognitive load (the NASA Task Load indeX and Paas' self-report instrument) to determine not only whether they are consistent with one another, but also whether both are equally sensitive to changes in levels of cognitive load. The two subclasses examined in this study are intrinsic load, which is related to element interactivity, and extraneous load, which is influenced by the instructional design itself. Results from this study indicate that the NASA Task Load indeX, as a weighted, multi-dimensional rating scale, measures the demands faced by learners in a PC-based multimedia learning environment differently than the more traditional single-question short subjective instrument.

Windell & Wiebe (2007) **Do not Redistribute**

Introduction

Recent improvements in multimedia technologies have made the inclusion and distribution of audio narration overlays, as well as complex visual representations of information, in online instructional materials affordable and widespread. Unfortunately for designers of distance education, this considerable design space can lead to the creation of ineffective instructional material: the use of multimedia materials does not by itself lead to a deep understanding of the material to be learned. Cognitive load theory can be used as a framework for understanding the factors that contribute to less than optimal instructional design environments.
Cognitive load can be defined as a multidimensional construct representing the load that performing a particular task imposes on the learner's cognitive system (Paas & van Merriënboer, 1994; Paas, Renkl, & Sweller, 2003; Sweller, van Merriënboer, & Paas, 1998). As such, the amount of cognitive load, measured at a given time, is a way of assessing the amount of information being manipulated in working memory. Effectively understanding the level of cognitive load, or stress on working memory, can help gauge the cognitive capacity available for learning. This study reviews two self-report instruments as to their efficacy in measuring cognitive load.

Theoretical Framework

Cognitive Load Theory (CLT) defines three subclasses of cognitive load that additively contribute to the accumulated cognitive load at a given point during learning. These three subclasses interact and fluctuate throughout the task, and at any instant will have a differing impact on the limited capacity available to manage overall load. As any one type of load fluctuates, the other two may rise or fall in their contributions to overall load, as mental resources are reallocated to manage them. The three types of load identified by CLT are intrinsic load, germane load, and extraneous load (Paas, Tuovinen, Tabbers, & Van Gerven, 2003). Intrinsic load is the effort that results from the nature of the learning task and its interaction with the individual's abilities and experiences. Germane load is the load created in constructing schemas during learning. Extraneous load is cognitive load that is not necessary for learning and is under the control of the designer. Factors determining extraneous load include presentation format and the use of graphics or animations (Paas et al., 2003).
In an effort to understand the mental workload imposed on learners, researchers have developed and tested methods for assessing cognitive load across a variety of tasks and situations. Theory predicts what factors will contribute to each part of the load, but measures of load by and large capture only the composite of all three parts, either directly, usually using self-report, or indirectly, using methods such as dual-task techniques or measures of learning outcomes. Intrinsic, extraneous, and germane load, which are sometimes difficult to distinguish post hoc, are taken as a whole by the overall measurement.

The current research uses two rating scale assessment techniques to assess levels of cognitive load across a variety of situations in which intrinsic and extraneous load are manipulated (see Appendix 1): a short self-report instrument (SSI), a single question on the perception of overall mental load developed and refined by Paas, Tuovinen, et al. (2003), and the NASA Task Load indeX (TLX) (Moroney, Biers, Eggemeier, & Mitchell, 1992). Where the SSI provides only a measure of overall load via a single question, the multi-dimensional NASA-TLX measures workload via six subscales, each associated with a different source of workload. These subscales can also be combined into an overall weighted workload score (WWL). The WWL is derived by having participants rate the relative importance of each of the six sources of workload, and then using this result to weight each subscale in a combined score. If the NASA-TLX is shown to be more sensitive to variations across combinations of extraneous and intrinsic load, then more research should be devoted to using this measure to understand the three types of cognitive load and to applying the findings in the design of multimedia instructional materials.
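The WWL scoring just described can be illustrated with a short sketch. This is a minimal, hypothetical illustration of the standard NASA-TLX weighting scheme, in which each subscale's weight is the number of times it is chosen across the 15 pairwise comparisons of Part 2 of the instrument; it is not code from the study, and all ratings and tallies below are invented. Ratings here use the conventional 0-100 TLX range, which may differ from the scale range used in this study.

```python
# Sketch of the NASA-TLX weighted workload (WWL) computation.
# Weights come from 15 pairwise comparisons: each subscale's weight is
# the number of times the participant chose it, so the weights sum to 15.
# All numbers below are illustrative, not data from the study.

SUBSCALES = ["Mental Demand", "Physical Demand", "Temporal Demand",
             "Performance", "Effort", "Frustration"]

def weighted_workload(ratings, weights):
    """Combine six subscale ratings with pairwise-comparison weights
    (counts summing to 15) into a single WWL score on the rating scale."""
    assert sum(weights.values()) == 15, "expected 15 pairwise comparisons"
    total = sum(ratings[s] * weights[s] for s in SUBSCALES)
    return total / 15  # weighted average, still on the rating scale

# Hypothetical participant data:
ratings = {"Mental Demand": 70, "Physical Demand": 10, "Temporal Demand": 40,
           "Performance": 55, "Effort": 60, "Frustration": 30}
weights = {"Mental Demand": 5, "Physical Demand": 0, "Temporal Demand": 2,
           "Performance": 3, "Effort": 4, "Frustration": 1}

print(weighted_workload(ratings, weights))
```

Because Mental Demand and Effort receive the largest weights in this invented tally, they dominate the combined score, which is how the WWL lets different sources of workload contribute unequally.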
We hypothesize that the Mental Demand subscale of the TLX correlates highly with the SSI across levels of both extraneous and intrinsic load because of the similar nature of the assessment questions (Hypothesis 1). However, because of the multi-dimensional nature of the NASA-TLX, we expect a lower correlation between the TLX Weighted Work Load score (WWL) and the SSI across module design conditions (which manipulated extraneous load) and learning modules (which manipulated intrinsic load) (Hypothesis 2). This decreased correlation is expected because the subscales of the TLX that contribute to the WWL are expected to have variable sensitivity to extraneous and intrinsic load, resulting in the WWL being more sensitive to variations in these two types of load. The NASA-TLX posits multiple sources of workload and uses a set of six subscales to track these sources individually. Similarly, cognitive load theory also assumes multiple sources of mental load (i.e., extraneous, intrinsic, and germane). There is reason to believe that some of the NASA-TLX subscales may differentially track these different sources of cognitive load.

Method

Participants

Participants in this study were forty-eight students enrolled in Introductory Psychology at North Carolina State University. Students were screened prior to exposure to the experimental conditions for past experience in meteorology and earth science.

Materials

Learning modules on what influences the direction and strength of the wind were presented to students using timed Microsoft PowerPoint presentations (Appendix 2). Graphics on the PowerPoint slides were a mixture of static and animated graphics, identical across conditions. Participants in narration conditions used stereo headphones.

Independent Variables

Two independent variables were manipulated in the current study: level of extraneous load and level of intrinsic load.
Extraneous load was manipulated by placing participants in one of three Design Conditions, motivated by the split-attention and modality effects (cf. Mayer & Moreno, 1998; Tindall-Ford, Chandler, & Sweller, 1997):

Design Condition A - Learning materials presented as a combination of text, static graphical displays, and animated graphical displays. Text and graphics were displayed sequentially, with informational text followed by graphical representations of the material to be learned. Animation was used in instances where motion was necessary for learning.

Design Condition B - Learning materials presented as a combination of narration with the same static and animated graphics as Design Condition A. The audio narration duplicated the information presented as text in Design Condition A and was presented serially with respect to the graphical information. During the audio narration, learners viewed a blank slide.

Design Condition C - Learning materials presented as a combination of narration with the same static and animated graphics as Design Conditions A and B. The audio narration was presented synchronously with the graphics.

Intrinsic load was manipulated by varying the complexity of the learning materials through increasing element interactivity (Sweller et al., 1998). Each of the above design conditions was replicated in three distinct learning modules (Modules 1-3). Modules 1-3 were presented in the same sequence in all conditions, with the information becoming more complex with each module. This added complexity was achieved by asking participants to consider additional forces acting on air parcels with each subsequent module: Module 1 considered a single force, Module 2 two forces, and Module 3 three forces. Each additional force interacts with the forces introduced in earlier modules.
Dependent Variables

The dependent variables in this research were cognitive load level, as measured by the SSI and the NASA-TLX. In addition, participants answered recall and transfer test questions related to the content of the material.

Procedure

Prior to entering the experiment, participants were assigned to one of the three Design Conditions outlined above. The order of the cognitive load instruments was randomized, with some learners filling out the Short Subjective Instrument followed by the NASA-TLX, and vice versa for the remaining participants. Upon entering the lab, students were greeted and asked to fill out an informed consent form. The experimenter then asked the students to take the Module 1 pre-test. Following this, students participated in Module 1 (low task difficulty). At the completion of this module, students answered post-test questions. Immediately following, participants filled out both the Short Subjective Instrument and the NASA-TLX. Students were then offered a short (three-minute) break to stretch or get a drink before repeating this process for Modules 2 and 3 (medium and high task difficulty, respectively). After completing Module 3, students filled out a short background questionnaire to obtain demographic information, which served as a distracter before they completed the NASA-TLX Part 2, the pairwise comparison, immediately afterward.

Results

Pearson correlations were run for all three Design Conditions (A-C) comparing the NASA-TLX Mental Demands subscale with the SSI. In Design Condition A, Pearson correlations for Modules 1-3 were .76, .72, and .98, respectively. In Design Condition B, Pearson correlations for Modules 1-3 were .84, .93, and .56, respectively. In Design Condition C, Pearson correlations for Modules 1-3 were .76, .89, and .96, respectively.
In contrast, Pearson correlations comparing the NASA-TLX Weighted Work Load and the SSI were noticeably lower across all Design Conditions and Modules. In Design Condition A, Pearson correlations for Modules 1-3 were -.38, -.37, and -.007, respectively. In Design Condition B, Pearson correlations for Modules 1-3 were .22, -.29, and -.51, respectively. In Design Condition C, Pearson correlations for Modules 1-3 were .36, .67, and .71, respectively.

A two-way repeated measures analysis of variance (ANOVA) was conducted to investigate the effect of Design Condition and Learning Module on the NASA-TLX Mental Demands subscale. The interaction between Design Condition and Learning Module was not significant (F(4,90)=.481, p=.749). There was a significant main effect for Learning Module (F(2,90)=6.127, p=.003) (Figure 1). Post hoc contrasts showed significant differences between Learning Modules 1 and 2 (p=.047) and Learning Modules 1 and 3 (p=.005), but not between Learning Modules 2 and 3 (p=.078).

A two-way repeated measures ANOVA was also conducted to investigate the effect of Design Condition and Learning Module on the SSI. Results indicated a significant main effect for Learning Module (F(2,90)=23.608, p<.001), using the Huynh-Feldt correction for non-sphericity (Figure 2). The interaction between factors was not significant (F(4,90)=.621, p=.649). Post hoc contrasts showed significant differences for the SSI between Learning Modules 1 and 2 (p<.001), Learning Modules 2 and 3 (p=.008), and Learning Modules 1 and 3 (p<.001).

The two-way ANOVA results for the WWL indicated significant main effects for Learning Module (F(2,90)=12.667, p<.001) and Design Condition (F(2,45)=3.98, p=.026), using the Huynh-Feldt correction for non-sphericity (Figure 3). The interaction between factors was not significant (F(4,90)=.83, p=.510).
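For readers who want to replicate the correlational analysis, the Pearson product-moment coefficient used throughout the Results can be computed directly from two columns of ratings. The sketch below uses invented ratings, not the study's data:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length
    lists of scores, e.g. TLX Mental Demand ratings vs. SSI ratings."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ratings for six participants in one module:
mental_demand = [55, 70, 40, 80, 65, 50]
ssi = [4, 5, 3, 6, 5, 4]
print(round(pearson_r(mental_demand, ssi), 3))
```

A coefficient near 1 for a given Design Condition and Module, as in the Mental Demands comparisons above, indicates that participants who rated the task high on one instrument also rated it high on the other.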
Post hoc contrasts also showed significant differences between Learning Modules 1 and 2 (p=.002), Learning Modules 2 and 3 (p=.022), and Learning Modules 1 and 3 (p<.001). Post hoc Bonferroni tests for the factor Design Condition showed a significant difference between Design Conditions A and C for the dependent variable WWL (p=.022).

Figure 1. Module x Design Condition Results - NASA-TLX Mental Demands

Figure 2. Module x Design Condition Results - Short Subjective Instrument

Figure 3. Module x Design Condition Results - Weighted Work Load

Learning gains showed non-significant differences across the three Modules. A non-significant trend was also seen across Design Condition, with Condition C showing the highest scores.

Discussion and Educational Importance

Results from this study indicate that the NASA Task Load indeX, a weighted, multi-dimensional rating scale, measures the demands faced by learners in a PC-based multimedia learning environment differently than Paas' single-question measure. This difference is likely due to responses on the TLX subscales across varying levels and types of load, as manipulated in this study by presentation format and content difficulty. Hypothesis 1, predicting similar subjective ratings for the SSI and the NASA-TLX Mental Demands subscale, was supported by the high correlations observed between the two. These high correlations held across all combinations of Design Condition and Module. ANOVA results showed similar responses to Design Condition and Module.
Hypothesis 2, predicting a difference between the NASA-TLX Weighted Work Load and the SSI, was also somewhat supported by the correlation and ANOVA results. In each combination of Design Condition and Module, the correlation coefficients were noticeably lower than those between the NASA-TLX Mental Demands subscale and the SSI. The ANOVA results, however, showed similar directionality in the effects of Module and Design Condition. It is important to note that the SSI did not show a significant effect of Design Condition, while the same test did show significant effects for the NASA-TLX WWL. This possibly indicates that the WWL is more sensitive to changes in extraneous load than the SSI.

References

Mayer, R. E., & Moreno, R. (1998). A split-attention effect in multimedia learning: Evidence for dual processing systems in working memory. Journal of Educational Psychology, 90(2), 312-320.

Moroney, W. F., Biers, D. W., Eggemeier, F. T., & Mitchell, J. A. (1992). A comparison of two scoring procedures with the NASA Task Load Index in a simulated flight task. Proceedings of the IEEE NAECON 1992 National Aerospace and Electronics Conference, 2, 734-740.

Paas, F., Tuovinen, J. E., Tabbers, H., & Van Gerven, P. W. M. (2003). Cognitive load measurement as a means to advance cognitive load theory. Educational Psychologist, 38(1), 63-71.

Paas, F., Renkl, A., & Sweller, J. (2003). Cognitive load theory and instructional design: Recent developments. Educational Psychologist, 38(1), 1-4.

Paas, F., van Merriënboer, J. J. G., & Adam, J. J. (1994). Measurement of cognitive load in instructional research. Perceptual and Motor Skills, 79, 419-430.

Sweller, J., van Merriënboer, J. J. G., & Paas, F. G. W. C. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10, 251-296.

Tindall-Ford, S., Chandler, P., & Sweller, J. (1997). When two sensory modes are better than one.
Journal of Experimental Psychology: Applied, 3(4), 257-287.

Appendix 1

Short Subjective Instrument

Circle the number that best describes your experience today.

How difficult was it for you to understand this learning module and correctly answer the questions that followed?

Extremely Easy                                                Extremely Difficult
1------------2------------3------------4------------5------------6------------7

NASA Task Load indeX (TLX)

NASA Task Load indeX - Part 2

Instructions: Select the member of each pair that provided the most significant source of work to you in today's tasks (circle your answer).

Physical Demand or Mental Demand
Temporal Demand or Mental Demand
Performance or Mental Demand
Frustration or Mental Demand
Effort or Mental Demand
Temporal Demand or Physical Demand
Performance or Physical Demand
Frustration or Physical Demand
Effort or Physical Demand
Temporal Demand or Performance
Temporal Demand or Frustration
Temporal Demand or Effort
Performance or Frustration
Performance or Effort
Effort or Frustration

Appendix 2

Example PowerPoint Slides

Note: All three graphics are frame captures of animations.

Module 1
Module 2
Module 3