
Measuring Cognitive Load in Multimedia Instruction: A Comparison of Two
Instruments
David Windell ([email protected])
Eric N. Wiebe ([email protected])
North Carolina State University
Presented in AERA Division C-Learning and Instruction, Section 6-Cognitive, Social and
Motivational Processes, Session 5
March, 2007
Abstract
The current study reviews two popular self-report measures (the NASA Task Load indeX and Paas' self-report instrument) to identify not only whether they are consistent with one another, but also whether both are equally sensitive to changes in levels of cognitive load. The two subclasses examined in this study are intrinsic load, which is related to element interactivity, and extraneous load, which is influenced by the instructional design itself.
Results from this study indicate that the NASA-Task Load indeX, as a weighted, multi-dimensional rating scale, differs in its measurement of the demands faced by learners in a PC-based, multimedia learning environment from the more traditional, single-question short subjective instrument.
Introduction
Recent improvements in multimedia technologies have made the inclusion and
distribution of audio narration overlays, as well as complex visual representations of
information in online instructional materials affordable and widespread. Unfortunately
for designers of distance education, this considerable amount of design space can lead to
the creation of ineffective instructional material. A major source of this ineffectiveness is that the use of multimedia materials does not, by itself, lead to a deep understanding of the material to be learned.
Cognitive load theory can be used as a framework for understanding factors that arise from less than optimal instructional design environments. Cognitive load can be defined as a multidimensional construct representing the load that performing a particular task imposes on the learner's cognitive system (Paas, van Merriënboer, & Adam, 1994; Paas, Renkl, & Sweller, 2003; Sweller, van Merriënboer, & Paas, 1998). As such, the amount
of cognitive load, measured at a given time, is a way of assessing the level of information
being manipulated in working memory. Effectively understanding the level of cognitive
load or stress on working memory can help gauge the cognitive capacity for learning.
This study reviews two self-report instruments as to their efficacy in measuring cognitive
load.
Theoretical Framework
Cognitive Load Theory (CLT) defines three subclasses of cognitive load that additively contribute to the accumulated cognitive load at any given point during learning. These three subclasses interact and fluctuate throughout the task, and at any instant each will have a differing impact on the limited capacity available to manage overall load. As any one type of load fluctuates, the other two may rise or fall in their contributions to overall load, as mental resources are reallocated to manage them. The three types of load identified by CLT are intrinsic load, germane load, and extraneous load (Paas, Tuovinen, Tabbers, & Van Gerven, 2003).
Intrinsic load is that effort which results from the nature of the learning task and
its interaction with the individual’s abilities and experiences. Germane load is that load
created in construction of schemas during learning. Extraneous load is that cognitive load
which is not necessary for learning, and is under the control of the designer. Factors
determining extraneous load include presentation format and use of graphics or
animations (Paas et al., 2003).
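Stated schematically (our notation; the inequality is an interpretive gloss on CLT's limited-capacity assumption rather than a formula from the cited papers), the additivity assumption can be written as:

    $L_{\text{total}} = L_{\text{intrinsic}} + L_{\text{germane}} + L_{\text{extraneous}} \leq C_{\text{WM}}$

where $C_{\text{WM}}$ denotes the learner's available working memory capacity. On this reading, reducing extraneous load through better design frees capacity for the germane load that supports schema construction.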
In an effort to understand the amount of mental workload imposed on learners, researchers have developed and tested methods for assessing cognitive load across a variety of tasks and situations. Theory predicts what factors will contribute to each part of the load, but measures of load by and large capture only the composite of all three of these parts, either directly, usually using self-report, or indirectly, using methods such as dual-task techniques or measures of learning outcomes. Intrinsic, extraneous, and germane load, which are sometimes difficult to distinguish post hoc, are taken as a whole by the overall measurement.
The current research uses two rating scale assessment techniques, a short self-report instrument (SSI), which is a single question on the perception of overall mental load developed and refined by Paas, Tuovinen, et al. (2003), and the NASA-Task Load indeX (TLX) (Moroney, Biers, Eggemeier, & Mitchell, 1992), to assess levels of cognitive load across a variety of load situations where intrinsic and extraneous load are manipulated (see Appendix 1). Where the SSI provides only a measure of overall load via
a single question, the multi-dimensional NASA-TLX measures workload via six
subscales, each associated with a different source of workload. These subscales can also
be combined into an overall weighted workload score (WWL). The WWL is derived by
having the participants rate the relative importance of each of the six sources of
workload, and then using this result to weight each scale in a combined score. If the
NASA-TLX is shown to be more sensitive to variations in load across combinations of
extraneous load and intrinsic load, then more research should be devoted to using this
measure to understand the three types of cognitive load and use the findings in the design
of multimedia instructional materials.
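To make the WWL derivation concrete, the sketch below (our illustration in Python, not code from this study or the TLX manual; the subscale labels, variable names, and the conventional 0-100 rating scale are assumptions) tallies the 15 pairwise choices into weights and combines them with the subscale ratings:

    # Minimal sketch of the NASA-TLX weighted workload (WWL) computation.
    # Each of the 15 pairwise comparisons names a "winner" subscale; a
    # subscale's weight is the number of times it was chosen (0-5).
    SUBSCALES = ["mental", "physical", "temporal",
                 "performance", "effort", "frustration"]

    def weighted_workload(ratings, pair_winners):
        """ratings: dict mapping subscale -> rating (assumed 0-100 scale).
        pair_winners: list of the 15 subscale names chosen in Part 2."""
        weights = {s: pair_winners.count(s) for s in SUBSCALES}
        assert sum(weights.values()) == 15, "expected 15 pairwise choices"
        # Weight each rating by its importance tally, then average.
        return sum(weights[s] * ratings[s] for s in SUBSCALES) / 15.0

    # Example: a learner who weighted Mental Demand most heavily.
    ratings = {"mental": 70, "physical": 10, "temporal": 40,
               "performance": 55, "effort": 60, "frustration": 30}
    winners = (["mental"] * 5 + ["effort"] * 4 + ["temporal"] * 3 +
               ["performance"] * 2 + ["frustration"] * 1)
    print(weighted_workload(ratings, winners))  # ~56.7

Because the weights always sum to 15, the resulting WWL stays on the same scale as the individual subscale ratings.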
We hypothesize that the mental demand subscale of the TLX rating correlates
highly with the SSI across levels of both extraneous and intrinsic load because of the
similar nature of the assessment questions (Hypothesis 1). However, because of the
multi-dimensional nature of the NASA-TLX, we expect a lower correlation between the
TLX Weighted Work Load score (WWL) and the SSI across module design conditions
(which manipulated extraneous load) and learning modules (which manipulated intrinsic
load) (Hypothesis 2). This decreased correlation is expected because subscales of the
TLX that contribute to the WWL are expected to have variable sensitivity to extraneous
and intrinsic load, thus resulting in the WWL being more sensitive to variations in these
two types of load. The NASA-TLX posits multiple sources of workload, and its developers designed a set of six subscales to track these different sources individually. Similarly, cognitive load theory also assumes multiple sources of mental load (i.e., extraneous, intrinsic, and germane). There is reason to believe that some of the NASA-TLX subscales may differentially track these different sources of cognitive load.
Method
Participants
Participants in this study were forty-eight students enrolled in Introductory Psychology at North Carolina State University. Students were screened for past experience in meteorology and earth science prior to exposure to the experimental conditions.
Materials
Learning modules on the weather topic of what influences the direction and strength of wind were presented to students using timed Microsoft PowerPoint presentations (Appendix 2). The PowerPoint slides contained a mixture of static and animated graphics that was identical across conditions. Participants in narration conditions used stereo headphones.
Independent Variables
Two independent variables were manipulated in the current study: level of extraneous load and level of intrinsic load. Extraneous load was manipulated by placing participants in one of three Design Conditions driven by both the split-attention and modality effects (cf. Mayer & Moreno, 1998; Tindall-Ford, Chandler, & Sweller, 1997):
Design Condition A - learning materials presented with a combination of text, static graphical displays, and animated graphical displays. Text and graphics were displayed sequentially, with informational text followed by graphical representations of the material to be learned. Animation was utilized in instances where motion was necessary for learning.
Design Condition B- learning materials presented with a combination of
narration with the same static and animated graphics as Design Condition A. Audio
narration duplicated information presented in text in Design Condition A, and was
presented serially with respect to graphical information. During the presentation of audio
narration, learners viewed a blank slide.
Design Condition C - learning materials presented with a combination of narration with the same static and animated graphics as Design Conditions A-B. Audio narration was presented synchronously with graphics.
Intrinsic load was manipulated by varying the level of complexity of learning
materials through increasing element interactivity (Sweller, et al., 1998). Each of the
above design conditions was replicated in three distinct learning modules (Modules 1-3).
Modules 1-3 were presented in the same sequence in all conditions and across participants, with the information becoming more complex with each module. This added complexity was created by asking participants to consider additional forces acting on air parcels with each subsequent module, with Module 1 considering a single force; Module 2, two forces; and Module 3, three forces. Each additional force interacts with the forces introduced in earlier modules.
Dependent Variables
The primary dependent variable associated with this research was cognitive load level, as measured by the SSI and the NASA-TLX. In addition, participants answered recall and transfer test questions related to the content of the material.
Procedure
Prior to entering the experiment, participants were assigned to one of the three
Design Conditions outlined above. Order of the cognitive load instruments was
randomized, with some learners filling out the Short Subjective Instrument followed by
the NASA-TLX, and vice versa for the remaining participants.
Upon entering the lab, students were greeted and asked to fill out an informed
consent form. The experimenter then asked the students to take the Module 1 pre-test.
Following this, students participated in Module 1 (low task difficulty).
At the completion of this module, students answered post-test questions.
Immediately following, participants filled out both the Short Subjective Instrument and
the NASA-TLX. Students were then offered a short (three minute) break to stretch or get
a drink before repeating this process for Modules 2 and 3 (medium and high task difficulty, respectively). After completion of Module 3, students completed a short background questionnaire to obtain demographic information; this served as a distracter before they completed the NASA-TLX Part 2, the pairwise comparison.
Results
Pearson correlations were run for all three Design Conditions (A-C) comparing
the NASA-TLX Mental Demands subscale with the SSI. In Design Condition A, Pearson
correlations for Modules 1-3 were .76, .72, and .98, respectively. In Design Condition B,
Pearson correlations for Modules 1-3 were .84, .93, and .56, respectively. In Design Condition C, Pearson correlations for Modules 1-3 were .76, .89, and .96, respectively.
Contrasting this, Pearson correlations comparing the NASA-TLX Weighted Work Load
and SSI were noticeably lower across all Design Conditions and Modules. In Design
Condition A, Pearson correlations for Modules 1-3 were -.38, -.37, and -.007, respectively. In Design Condition B, Pearson correlations for Modules 1-3 were .22, -.29, and -.51, respectively. In Design Condition C, Pearson correlations for Modules 1-3 were .36, .67, and .71, respectively.
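As an illustration of this computation (a sketch, not the authors' analysis code; the data file and column names are hypothetical), per-cell Pearson correlations of this kind can be obtained with scipy:

    # Sketch: Pearson correlations between TLX Mental Demands and the SSI,
    # computed separately for each Design Condition x Module cell.
    # The file name and column layout are assumptions for illustration.
    import pandas as pd
    from scipy.stats import pearsonr

    df = pd.read_csv("cognitive_load_scores.csv")  # hypothetical data file

    for (condition, module), cell in df.groupby(["condition", "module"]):
        r, p = pearsonr(cell["tlx_mental_demand"], cell["ssi"])
        print(f"Condition {condition}, Module {module}: "
              f"r = {r:.2f} (p = {p:.3f})")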
A two-way repeated measures analysis of variance (ANOVA) was conducted to
investigate the effect of Design Condition and Learning Module on the NASA-TLX
Mental Demands subscale category among participants. Interaction between Design
Condition and Learning Module was not significant (F(4,90)=.481, p=.749). There was a
significant main effect for Learning Module (F(2,90)=6.127, p=.003) (Figure 1). Post hoc contrasts showed significance between Learning Modules 1 and 2 (p=.047) and Learning Modules 1 and 3 (p=.005), but not for Learning Modules 2 and 3 (p=.078).
A two-way repeated measures analysis of variance (ANOVA) was conducted to investigate the effect of Design Condition and Learning Module on the SSI among participants. ANOVA results indicated a significant main effect for Learning Module (F(2,90)=23.608, p<.001), using the Huynh-Feldt correction for non-sphericity (Figure 2). The interaction between factors was not significant (F(4,90)=.621, p=.649). Post hoc contrasts showed significance for the SSI between Learning Modules 1 and 2 (p<.001), Learning Modules 2 and 3 (p=.008), and Learning Modules 1 and 3 (p<.001).
The two-way ANOVA results for WWL also indicated significant main effects for Learning Module (F(2,90)=12.667, p<.001) and Design Condition (F(2,45)=3.98, p=.026), using the Huynh-Feldt correction for non-sphericity (Figure 3). The interaction between factors was not significant (F(4,90)=.83, p=.510). Post hoc contrasts also showed significance between Learning Modules 1 and 2 (p=.002), Learning Modules 2 and 3 (p=.022), and Learning Modules 1 and 3 (p<.001). Post hoc Bonferroni tests for the factor Design Condition showed significance between Design Conditions A and C for the dependent variable WWL (p=.022).
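The analysis software used in the study is not reported. As a hedged sketch of how this mixed design (Learning Module within-subjects, Design Condition between-subjects) could be reproduced, the pingouin package offers a mixed-design ANOVA; the data file and column names below are hypothetical:

    # Sketch of the mixed-design ANOVA: Module is within-subjects,
    # Design Condition is between-subjects. Long-format data assumed.
    import pandas as pd
    import pingouin as pg

    df = pd.read_csv("cognitive_load_scores.csv")  # hypothetical data file

    aov = pg.mixed_anova(data=df, dv="wwl", within="module",
                         subject="participant", between="condition")
    print(aov)

    # Module-by-module follow-up contrasts with Bonferroni adjustment,
    # analogous to the post hoc tests reported above.
    post = pg.pairwise_tests(data=df, dv="wwl", within="module",
                             subject="participant", padjust="bonf")
    print(post)

Note that pairwise_tests is named pairwise_ttests in older pingouin releases, and the sphericity correction reported in the ANOVA output should be checked against the Huynh-Feldt adjustment used above.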
[Figure: mean NASA-TLX Mental Demands score (y-axis, 0-14) by Design Condition (A-C), plotted separately for Modules 1-3.]
Figure 1. Module x Design Condition Results - NASA-TLX Mental Demands
[Figure: SSI mean score (y-axis, 2-5) by Design Condition (A-C), plotted separately for Modules 1-3.]
Figure 2. Module x Design Condition Results - Short Subjective Instrument
[Figure: WWL mean score (y-axis, 5-13) by Design Condition (A-C), plotted separately for Modules 1-3.]
Figure 3. Module x Design Condition Results - Weighted Work Load
The learning gains showed non-significant differences across the three Modules. A non-significant trend was also seen across Design Conditions, with Condition C showing the highest scores.
Discussion and Educational Importance
Results from this study indicate that the NASA-Task Load indeX, a weighted, multi-dimensional rating scale, differs in its measurement of the demands faced by learners in a PC-based, multimedia learning environment from Paas' single-question measure. This difference is likely due to differing responses on the TLX subscales across varying levels and types of load, as manipulated in this study by presentation format and content difficulty.
Hypothesis 1, referencing similar subjective ratings for the SSI and the NASA-TLX Mental Demands subscale, was supported by the high correlations observed between the two. These high correlations held across all combinations of Design Condition and Module. ANOVA results showed similar responses to Design Condition and Module.
Hypothesis 2, referencing a difference between the NASA-TLX Weighted Work Load and the SSI, was also somewhat supported by the correlation and ANOVA results. In each combination of Design Condition and Module, correlation coefficients were noticeably lower than those between the NASA-TLX Mental Demands subscale and the SSI. The ANOVA results, however, showed similar directionality in terms of the effects of Module and Design Condition. It is important to note, however, that the SSI did not show a significant effect of Design Condition, while the same test did show significant effects for the NASA-TLX WWL. This possibly indicates that the WWL is more sensitive to changes in extraneous load than the SSI.
References
Mayer, R. E., & Moreno, R. (1998). A split-attention effect in multimedia learning:
Evidence for dual processing systems in working memory. Journal of
Educational Psychology, 90(2), 312-320.
Moroney, W. F., Biers, D. W., Eggemeier, F. T., & Mitchell, J. A. (1992). A comparison of two scoring procedures with the NASA Task Load Index in a simulated flight task. Proceedings of the IEEE NAECON 1992 National Aerospace and Electronics Conference, 2, 734-740.
Paas, F. G. W. C., Tuovinen, J. E., Tabbers, H., & Van Gerven, P. W. M. (2003). Cognitive load measurement as a means to advance cognitive load theory. Educational Psychologist, 38(1), 63-71.
Paas, F., Renkl, A., & Sweller, J. (2003). Cognitive load theory and instructional design:
Recent developments. Educational Psychologist, 38(1), 1-4.
Paas, F., van Merriënboer, J. J. G., & Adam, J. J. (1994). Measurement of cognitive load
in instructional research. Perceptual and Motor Skills, 79, 419-430.
Sweller, J., van Merriënboer, J. J. G., & Paas, F. G. W. C. (1998). Cognitive architecture and instructional design. Educational Psychology Review, 10, 251-296.
Tindall-Ford, S., Chandler, P., & Sweller, J. (1997). When two sensory modes are better
than one. Journal of Experimental Psychology-Applied, 3(4), 257-287.
Appendix 1
Short Subjective Instrument
Circle the number that best describes your experience today.
How difficult was it for you to understand this learning module and correctly answer the
questions that followed?
Extremely Easy
Extremely Difficult
1------------2------------3------------4------------5------------6------------7
NASA Task Load indeX (TLX)
[TLX rating scales (Part 1) not reproduced in this text version]
NASA Task Load indeX - Part 2
Instructions: Select the member of each pair that provided the most significant source of work to you in today's tasks (circle your answer).
Physical Demand    or    Mental Demand
Temporal Demand    or    Mental Demand
Performance    or    Mental Demand
Frustration    or    Mental Demand
Effort    or    Mental Demand
Temporal Demand    or    Physical Demand
Performance    or    Physical Demand
Frustration    or    Physical Demand
Effort    or    Physical Demand
Temporal Demand    or    Performance
Temporal Demand    or    Frustration
Temporal Demand    or    Effort
Performance    or    Frustration
Performance    or    Effort
Effort    or    Frustration
Appendix 2
Example PowerPoint Slides
Note: All three graphics are frame captures of animations.
[Example slide frame captures for Module 1, Module 2, and Module 3 not reproduced]