Cognitive Load Measurement using Speech/Linguistic Features

From imagination to impact
Using Information to Drive Decisions
Dr. Fang Chen
[email protected]
NICTA Copyright
2010
1
Outline
• Background
• Research Applications
• Speech and Language Analyses
• Data Sets:
– Reading Experiment
– Touch-table Collaborative Experiment
– Bushfire Study
– Driving Experiment
2
Background
• Cognitive load (CL): refers to the mental demand imposed on
working memory by a particular task.
• Working Memory: limited capacity for holding information in
mind in the context of cognitive activity.
• Cognitive Load Theory: development of the instructional
methods for effective use of people's limited cognitive
processing capacity.
3
Research Aims
• Overall:
– Identification of potential indices of cognitive load for
• real-time,
• objective,
• non-intrusive
• measurement of cognitive load.
• Specific to this research:
– Identification of potential linguistic and grammatical features of
cognitive load.
4
Need for CL Measurement
• Overloading or underloading of cognitive
processing:
– degradation of performance, and/or
– failures of learning and performing, and/or
– source of performance errors.
• CL measurement is crucial for:
– minimising the amount of cognitive effort required,
– maintaining the right level of CL,
– achieving adaptive system response,
– improving user performance.
5
Cognitive Load Measures
• Subjective measures
– e.g. self-reporting – manual, post-task, time-consuming, intrusive.
• Physiological measures
– e.g. eyes, brain, skin biosensors – sensitive, signal noise, intrusive, lot of
complex equipment
• Performance measures
– e.g. error rate, task performance – dual tasks
• Behavioral measures
– e.g. speech, pressure mouse – can be automatic, non-intrusive
6
Research Applications
• Designing intelligent adaptive user interfaces for intensive
working/interaction environments.
– Emergency services e.g. Bushfire Cooperative Research Centre
– Road traffic control services e.g. Roads and Traffic Authority (RTA)
• Other potential areas:
– Call centers
– Air traffic control rooms
– Pilot cockpits
– Online education / e-learning
– … and so on.
7
Speech and Linguistic Measures
• Why Speech?
– Sensitivity in the speech modality shown by prior art.
– Non-intrusive, easy to collect e.g. phone calls, conversations
– Objective measure, not easily manipulated by the user
– Real-time analysis is possible (for some speech signal features)
– Widely available in a number of application scenarios
• What measures?
– Pauses and response latency
• Pausing differently under different conditions.
– Language and word usage
• Using particular words and/or phrases at specific sentence and/or paragraph
positions;
– Grammar features and structures
• Using particular types of linguistic/grammatical categories;
• Using a particular type of syntax or grammatical structure i.e. usage of parts of speech
and their forms;
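The pause and response-latency measures listed above can be sketched in a few lines of Python. This is only an illustration, assuming each utterance has already been annotated with start and end times in seconds; the `Utterance` class, the function names and the 0.25 s threshold are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Utterance:
    start: float  # utterance onset, in seconds
    end: float    # utterance offset, in seconds


def pause_durations(utterances, min_pause=0.25):
    """Return the silent gaps (seconds) between consecutive utterances.

    Gaps no longer than min_pause are ignored; 0.25 s is an arbitrary
    illustrative threshold, not a value taken from the study.
    """
    ordered = sorted(utterances, key=lambda u: u.start)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        gap = current.start - previous.end
        if gap > min_pause:
            gaps.append(gap)
    return gaps


def pause_summary(utterances, min_pause=0.25):
    """Summarise pausing behaviour for one speaker in one condition."""
    gaps = pause_durations(utterances, min_pause)
    return {
        "count": len(gaps),
        "mean": mean(gaps) if gaps else 0.0,
        "total": sum(gaps),
    }


def response_latency(prompt_end, answer_start):
    """Delay (seconds) between the end of a question and the start of the answer."""
    return max(0.0, answer_start - prompt_end)
```

Comparing `pause_summary` results for the same speaker under low-load and high-load conditions is one way the "pausing differently under different conditions" idea could be examined.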
8
Experiment Setup
• A user study with two controlled levels of cognitive load
– Elicit natural speech from users
• A reading and comprehension task
– General knowledge (avoid the
expertise effect)
– Reading the extract
– Answer open-ended questions
• Give a short summary of the story
in at least five whole sentences.
• What was the most interesting
point in this story?
• Describe at least two other points
highlighted in this story.
The Sun
The Sun has "burned" for more than 4.5
billion years and will continue to do so
for several billion more. It is a massive
collection of gas, mostly hydrogen and
helium. Because it is so massive, it has
immense gravity, enough gravitational
force to hold all of hydrogen and helium
together (and to hold all of the planets
in their orbits around the Sun!). The
Sun does not "burn" like wood burns – it
is a gigantic nuclear reactor….
9
Story-reading Experiment
• Experimental setup
– Story reading followed by Q&A
– 3 different levels of text difficulty (Lexile Framework for Reading,
www.lexile.com)
– 3 stories in each of the 2 sessions (fixed order)
• 1st session:
– “Sleep” (900L),
– “History of Zero” (1350L) &
– “Milky Way Galaxy” (1400L)
• 2nd session:
– “Smoke Detectors” (950L),
– “Hurricanes” (1250L) &
– “The Sun” (1300L)
• 5 minutes break between sessions
– Dual-task for “Milky Way Galaxy” & “The Sun”
• Counting of background spoken numbers while reading the stories and
answering the questions
10
Experiment Setup
• Cognitive load level design
– Lexile Framework for Reading (from 200L at 1st-grade level to 1700L at graduate level)
• Syntactic and semantic complexity, vocabulary
– Text with same difficulty for both conditions
– Aural dual task, counting numbers during reading and answering
Task Load Level   Lexile Rating   Dual Task
Low               1300L           No
High              1300L           Yes
• Participants
– 15 native English speakers as subjects (8 females and 7 males)
11
Reading Experiment Data – Pause Analysis
12
Pause Analysis – Results Summary
*p<0.05, n=24.
13
Touch-table Collaboration Study - Lab Data
• Collaborative tasks using multi-touch tabletop screen.
• Interactive Firefighting tasks.
• 10 groups x 4 members = 40 subjects + (1 Pilot group)
– 30 Commanders + 10 Leaders
– 39 subjects data available (1 leader’s data missing)
• Speech Transcriptions using ELAN.
• Extracted and cleaned for LIWC and other analysis tools.
• Analysis completed:
– Subjective Ratings
– Grammar features - Pronouns
– Word Category Features
– Language Complexity Features
14
15
Touch-table Study Design
16
Lab Data – Some Hypotheses
• Higher subjective ratings under high load task.
• More speech and longer sentences.
• More and longer pauses under high load task.
• More use of:
– Negative emotion words, inclusive words, swear words, cognitive and
perceptive phrases, disagreement words etc.
• Less use of:
– Positive emotion words, agreement, certainty, achievement words
• More hesitations and incomplete sentences
• More use of plural pronouns and less use of singular ones.
• More complex sentences under high load task.
17
Lab Data – Subjective Ratings
18
Lab Data – Linguistic Analysis (Words)
19
Lab Data – Linguistic Analysis (Pronouns)
• Singular pronouns decrease
• Plural pronouns increase
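A minimal sketch of how singular and plural pronoun rates could be computed from a transcript follows. The word lists cover first-person pronouns only and are an assumption made for illustration; they are not the LIWC categories used in the study, and the function name is hypothetical.

```python
import re

# Illustrative first-person pronoun lists (an assumption, not the LIWC dictionary).
SINGULAR = {"i", "me", "my", "mine", "myself"}
PLURAL = {"we", "us", "our", "ours", "ourselves"}


def pronoun_rates(transcript):
    """Return singular and plural pronoun use as a percentage of all tokens."""
    tokens = re.findall(r"[a-z']+", transcript.lower())
    total = len(tokens)
    if total == 0:
        return {"singular": 0.0, "plural": 0.0}
    singular = sum(token in SINGULAR for token in tokens)
    plural = sum(token in PLURAL for token in tokens)
    return {
        "singular": 100.0 * singular / total,
        "plural": 100.0 * plural / total,
    }
```

Contractions such as "I'm" are kept as single tokens here and so are not counted; a fuller tokeniser would be needed for real transcripts.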
20
Lab Data – Linguistic Analysis (Pronouns)
• Interaction between Singular and Plural Personal Pronouns
21
Lab Data – Language Complexity Analysis
• Language complexity measures
• Measured by two major factors:
– Semantic difficulty: observes the use of words, their frequencies, and
their lengths (both in syllables and in letters/characters).
– Syntactic complexity: observes primarily the sentence length, which
is considered the best indicator of text or language complexity.
• Hypotheses
– Language Complexity increases
– Lexical Density decreases
22
Lab Data – Language Complexity Analysis
Lexical Density is the estimated measure of content per
functional and lexical units or lexemes in total text. In simple
words, it is a measure of the ratio of unique words to the total
number of words.
Lexical Density = (different words / total words) x 100

Complex Word Ratio is the measure of the ratio of complex words to
the total number of words. A word is considered complex or hard if it
has three or more syllables and does not contain a hyphen (-). For
example, the word ‘density’ has three syllables.

Gunning Fog Index calculates the syntactic complexity of
language using sentence lengths and complex words and
implies that short and simple sentences in plain English
achieve a better score (lower value) than long sentences in
complicated language.
Gunning Fog Index = 0.4 x (ASL + ((SYW / words) x 100))
Where:
ASL = Average sentence length (the number of words divided by the number of sentences)
SYW = Number of words with three or more syllables

Flesch-Kincaid Grade calculates the language difficulty using
average sentence lengths and average syllables per word. It
estimates the number of years of education required to
understand the written or transcribed text.
Flesch-Kincaid Grade = (0.39 x ASL) + (11.8 x ASW) – 15.59
Where:
ASL = Average sentence length (the number of words divided by the number of sentences)
ASW = Average number of syllables per word (the number of syllables divided by the number of words)

The SMOG Grade also estimates the number of education years
needed to fully comprehend the text. It uses sentences and
complex words to calculate it. The emphasis on full
comprehension distinguishes this measurement from other
complexity measures.
SMOG Grade = square root of ((SYW / sentences) x 30) + 3
Where:
SYW = Number of words with three or more syllables

Lexile Level also measures the comprehension complexity
of any text. A Lexile measure is the numeric representation
of a text’s difficulty ranging from 200L for easy to above
1700L for complicated texts. It uses mean sentence
lengths and mean log word frequency to calculate it.
• Lexical Density (Vocabulary Richness)
– expected to decrease
• Hard Word Ratio
– expected to decrease
• Gunning Fog Index
– expected to increase
• Flesch-Kincaid Grade
– expected to increase
• SMOG Grade
– expected to increase
• Lexile Level
– expected to increase
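The formulas for Lexical Density, Complex Word Ratio, Gunning Fog Index, Flesch-Kincaid Grade and SMOG Grade can be sketched as below. This is an illustration under stated assumptions: syllables are estimated by counting vowel groups (a rough heuristic that over-counts words with a silent "e"), sentences are split on ".", "!" and "?", and the function names are hypothetical. Lexile Level is not computed because the slides do not give its formula.

```python
import math
import re


def count_syllables(word):
    """Estimate syllables as the number of vowel groups (rough heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def complexity_metrics(text):
    """Compute the complexity measures using the formulas given on the slide."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    n_sentences, n_words = len(sentences), len(words)
    syllables = [count_syllables(w) for w in words]
    # SYW: words with three or more syllables and no hyphen.
    syw = sum(1 for w, s in zip(words, syllables) if s >= 3 and "-" not in w)
    asl = n_words / n_sentences      # average sentence length
    asw = sum(syllables) / n_words   # average syllables per word

    return {
        "lexical_density": len({w.lower() for w in words}) / n_words * 100,
        "complex_word_ratio": syw / n_words,
        "gunning_fog": 0.4 * (asl + (syw / n_words) * 100),
        "flesch_kincaid": 0.39 * asl + 11.8 * asw - 15.59,
        "smog": math.sqrt(syw / n_sentences * 30) + 3,
    }
```

Calling `complexity_metrics` on a participant's transcript per load level would give one value of each measure per condition, which is the form the hypotheses above are stated in.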
23
Lab Data – Language Complexity Analysis
24
Lab Data – Language Complexity Analysis
25
Bushfire Data - Introduction
• Speech and transcription data from Bushfire CRC.
• Training exercises – four states (TAS, VIC, NSW, and
QLD).
• Three roles: Incident Controller (IC), Planning,
Operations.
• 11 exercises, 33 subjects
• All exercises monitored by bushfire management
experts.
• Operators co-located in a control room and trained for
roles.
• Data collection, transcription, coding, cleaning,
analyses.
• Four different load levels:
– (1) ‘low’: casual conversation, no time pressure;
– (2) ‘medium’: routine tasks;
– (3) ‘high’: challenging tasks, time constraints; and
– (4) ‘very high’: very challenging, a lot of unexpected events
and breakdowns.
• Combined into low and high.
26
Bushfire Data – Same Hypotheses
• Higher subjective ratings under high load task.
• More speech and longer sentences.
• More and longer pauses under high load task.
• More use of:
– Negative emotion words, inclusive words, swear words, cognitive and
perceptive phrases, disagreement words etc.
• Less use of:
– Positive emotion words, agreement, certainty, achievement words
• More hesitations and incomplete sentences
• More use of plural pronouns and less use of singular ones.
• More complex sentences under high load task.
27
Bushfire Data – Linguistic Analysis (Words)
28
Bushfire Data – Linguistic Analysis (Pronouns)
• Singular pronouns decrease
• Plural pronouns increase
29
Bushfire Data – Linguistic Analysis (Pronouns)
• Interaction between Singular and Plural Personal Pronouns
30
Bushfire Data – Language Complexity Analysis
31
Other Linguistic Analysis Possibilities
• N-gram Analysis
– Bi-gram Ratio
– Others:
• Most common N-grams
(Bigrams, Trigrams, 4-grams)
• Most common words (Unigrams)
• Most frequent or least frequent N-grams
• More…
• Parse Tree Analysis
– Order of n-grams
• For both – words and parts of speech.

[Chart: Bi-gram Ratio (percent, 50% to 100%) by Load Level]
Load Level      L1      L2      L3      L4
Bi-gram Ratio   93.5%   80.9%   79.4%   72.6%
p = 0.0002
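The n-gram counts mentioned here can be sketched as follows. The slides do not define "Bi-gram Ratio", so the ratio below (distinct n-grams as a percentage of all n-grams) is only one plausible reading, labelled as an assumption; the function names are hypothetical. The same functions work on a list of part-of-speech tags instead of words.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all consecutive n-token tuples in order of occurrence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def most_common_ngrams(tokens, n, k=3):
    """Return the k most frequent n-grams with their counts."""
    return Counter(ngrams(tokens, n)).most_common(k)


def distinct_ngram_ratio(tokens, n=2):
    """Distinct n-grams as a percentage of all n-grams.

    Assumed definition for illustration only; the slides do not say
    how the Bi-gram Ratio was computed.
    """
    grams = ngrams(tokens, n)
    if not grams:
        return 0.0
    return 100.0 * len(set(grams)) / len(grams)
```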
32
An Abstract CLM Model
• Automatic, Real-time, Non-intrusive
33
Looking at Data Sets
• Reading Experiment
• Touch-table Collaborative Experiment
• Bushfire Study
• Driving Study
34
Driving Study Data - Introduction
• Simulated Driving Experiment
• Investigate how the distractions can affect the performance
of the user
• Identification of features to measure users’ cognitive load.
• 18 participants (8 females and 10 males)
• Data collected:
– Video (2 cameras, front and rear view)
• Eye gaze movement
– Audio
– Galvanic Skin Response (GSR) or skin resistance
35
Driving Study Data – Experiment Setup
• Big screen for game
• Front camera
• Simulator frame
• Wireless headset
• Bio-sensor (GSR)
• Speakers at back
• Rear Camera
36
Future Challenges
• Areas for future work
– Development of larger databases
– Task-dependent and task-independent features
• Need to take lab experiments ‘into the wild’
– Defining, researching and standardising tasks of interest
– Joint modeling of linguistic, speaker and cognitive load/emotion
information
37
Exploring Multimodalities
38
Exploring Multimodality
• Hypothesis:
– Users are more likely to use complementary multimodal productions
as cognitive load increases
– Users will tend to rely on one modality more as cognitive load
increases
• Method:
– Wizard of Oz scenario: speech and gesture interface for a series of
map-based tasks; tasks increase in difficulty by varying quantity of
content and time-pressure
– Conditions for Speech Only interaction, Gesture Only interaction and
Multimodal
– Videotape participants, record audio, record answers, post-hoc
introspection questionnaire
39
Multimodality and Cognitive Load
• Exploring Multimodal Interface
Scenarios
– The recognisers in the interface
will capture the user’s input,
interpret the information and
choose an appropriate response
– Opportunity to capture interaction
data implicitly
[Diagram: Cognitive Load Analysis drawing on Visual Data, Audio Data, Physiological Data, Environmental Data, Other Modalities, User Characteristics and Task Characteristics]
40
Experiment Design
• Task:
– Incident Management Response
E.g. A major accident on corner of X and Y.
– Operators are required to deploy necessary crews and implement policies
and procedures
• Method:
– Elicit speech and free-hand gesture interface for a series of map based
tasks;
– Wizard of Oz scenario
– Videotape participants, record audio, record answers, post-hoc
introspection questionnaire
• Dependent Variables:
– Biosensor input: GSR and BVP
– Gesture: video footage
– Speech: transcribed manually
– Performance: latency, completion time & error-rates
– Multimodal productions: manual annotation
41
Examining Multimodal Input Structures
42
The Task
• There are 36 small tasks, divided into 3 groups of 12.
• Each group of 12 will consist of maps from 4 different cities:
• Each new task will be given to you at the top of the screen:
– e.g. There has been an accident on the corner of Victoria and Liverpool Street.
• The tasks will be carried out using different modes:
– speech + gesture together,
– speech-only and
– gesture-only
The experimenter will tell you which mode you should be using for each task.
• The task will first require some visual search for information.
• There are only three things the system can do:
1. Zooming in and out of maps
2. Selecting map elements
3. Tagging map elements
43
The Task
[Screenshot: task interface showing the Toolbox, Task Description, Map and Information/Feedback Area]
44
Zooming Map Levels
• Top-level map: no selectable elements;
divided into four quadrants by a dotted black line
• Lower-level map: contains selectable elements;
can zoom out to higher level map
45
Selectable Elements
• Selected elements will be shown
with a blue border.
• Element types:
– School
– Petrol Station
– Fire Station
– Library
– Hospital
– Shopping Centre
– RTA Branch
– Parking Station
– Church
– Intersection
46
Tagging Map Elements
Tagging is a two-step process:
1. Select map element
2. Tag as Accident, Incident or Event
Accident: e.g. car accident, fire, flooding
Green border
Incident: occurrence that might cause a disruption to the traffic, e.g.
broken-down car, or a traffic jam in peak hour
Yellow border
Event: e.g. concert, protest march, fun run
Red border
Clear: Clears all tags for selected element
Info: Information area beneath the map
47
Special Tag: Notifying
Two parts: the element and the recipient both need to be specified.
• Select map element (e.g. Intersection, marked as accident)
• Select NOTIFY action: PINK tag appears
• Select the recipient map element (RTA Branch, Fire Station…): AQUA tag appears
48
Zooming
• 2 zoom levels
• Lower level maps have
selectable elements
• Zoom in: 4 quadrants
• Zoom out
[Figure: top-level zoomable map (no selectable elements) and lower-level map with selectable elements]
49
The Modalities
• Speech
– Short and sweet
– No specific words, no specific word order
(we only give some suggestions)
– Speak clearly and loudly
• Suggested phrases for zooming:
– Zoom into the top right quadrant
– Top right quadrant
– Zoom in to top right
– Zoom out please
• Suggested phrases for selecting:
– Select the Church on Liverpool Street
– Church on Liverpool
– Please highlight the Church
• Suggested phrases for tagging:
– Make selected Church an accident (or incident or event) zone
– Selected Church. Accident.
– Accident.
50
The Modalities (2)
• Hand Gestures
– Pointing
– Hand shapes
Zooming
Point to quadrant and pause to select and zoom in.
Point to diagonal opposite ends of map, pause to zoom out.
Selecting
Point to the element, pause until beep
Tagging
Very clear hand shape (fist, flat palm, scissors, thumbs-up)
OR
Point to button in toolbox, pause to select
51
The Modalities (3)
• Multimodal
– Speech + gesture
– Any order or combination
– Speech only or gesture only are OK
– Examples:
• “Make this into an accident” + pointing at element
• “Zoom into this quadrant” + pointing at quadrant
• “Zoom out again”
52
Research Design
Balancing Available Modalities
• The traffic incident management (TIM) domain was used, and subjects
were required to update a geographical map with traffic conditions
information. Following our requirement, tasks were achievable using the
following modalities:
– Gesture:
• Deictic pointing to map locations, items, and function buttons;
• Circling gestures for zoom functions.
– Hand Shapes: predefined hand shapes for item tagging: fist, open
palm, thumbs up etc.
– Speech: street names, actions etc.
• A large overlap was introduced across modal ways of performing
actions. However, some tasks required the combination of modalities.
53
Task Design
• Task Specification
– Task was given in written mode
– Users had freedom of inspection
– The task described a situation, but did not specify activities, e.g.
“An incident has occurred: a truck has lost some of its load at Walter
Avenue and Lytton Road, near Mowbray Park”
• Task Activities
– Locate point of interest on the map
– Mark with one of 3 tags: accident, incident or event
– Notify relevant authorities, e.g. if casualties exist, notify a hospital.
– 11 different kinds of functionality available
54
Task Difficulty Level Design
• There were four levels of cognitive load, and three tasks were completed for
each level.
• The same visual was used for each level to avoid differences in visual
complexity.
• The tasks varied in load through:
– The number of distinct entities in the task description;
– The number of distractors (items not needed for the task);
– The minimum number of actions required for the task.
– Further load was achieved in Level 4 by introducing a time limit.

Level   Entities   Actions   Distractors   Time
1       6          3         2             ∞
2       10         8         2             ∞
3       12         13        4             ∞
4       12         13        4             90 sec.
55
Available Modalities
• The Modalities
– Aimed to capture natural patterns of
speech and gesture combinations
– Speech: natural spoken language
‘recognised’ by an operator
• Avoids bias injected by errors in
recognisers
– Gesture: automated hand tracking
• Untethered: no equipment used on
the person
• Both tracking of the hand and hand
shapes used
• Buttons added to reduce
expressivity gap between gesture
and speech
Input          Speech       Gesture
Select         “Select”     Point
Zoom           “Zoom”       Circling
Notify         “Notify”     Thumbs up
Tag Accident   “Accident”   Fist
Tag Incident   “Incident”   Open Palm
Tag Event      “Event”      Scissors
– Either or both could be used for
each command
56
Example of Interaction
System Functionality / Example of Interaction
• Zooming in or out of a map:
<Point at quadrant>; or “Zoom in to the top right quadrant”
• Selecting a location/item of interest:
<Point at location>; or “St Mary’s Church”
• Tagging a location of interest with an ‘accident’, ‘incident’ or ‘event’ marker:
<Select location> and: “Incident”; or Scissors shape
• Notifying a recipient (item) of an accident, incident or an event:
<Select accident> and “notify”; or fist shape and <Select recipient>
• Starting or ending a task:
“End task”; or <Point at End task button>
57
Wizard of Oz
[Diagram: Wizard of Oz setup showing the main computer, the wizard, a camcorder, a Firewire camera and the AGR]
58
Data Captured
• The study generated various streams of data that were captured as
follows:
– Speech was orthographically transcribed, including specific tags for
disfluencies such as false starts, hesitations. Start and end time were
annotated for each utterance;
– Hand motion was captured by the automatic gesture recogniser at the rate
of 20 frames per second. Positions are relative to the camera view angle;
– Deictic pointing (pause while pointing, or circling) and hand shapes were
annotated at two levels: the video was annotated to mark the start and end
time of the overall motion leading to the gesture.
– System feedback to the user such as task change (marked by a beep), item
information, or error message were recorded with their time of occurrence;
– Bio-sensor data was recorded at the rate of 100 points per second. Skin
conductance is measured in microsiemens (µS) while blood volume pulse
only provides relative measures expressed as a percentage.
59
Sample of Annotation
Turn                   Construction   Modality   Content
(A) Mark an Incident   Select (a)     Gesture    [point to St Mary’s Church]
                                      Speech     “Select St.Mary’s Church”
                       Tag (a)        Shape      [scissors=Incident]
                                      Speech     “Incident”
(C) Mark an Accident   Select (c)     Speech     “Select Crown Street Library”
                       Tag (c)        Shape      [fist=Accident]
(B) Mark an Event      Select (b)     Speech     “Select”
                                      Gesture    [point to Collingwood School]
                       Tag (b)        Shape      [open_palm=Event]
60
Results and Analysis
• Users: 15 available
• Total inputs: 1119
• Total turns: 394 (206 multimodal)
• Total constructions: 644
• Average difficulty rating for levels (subjective)
– Level 1 (easiest): 2/10
– Level 2: 4.2/10
– Level 4 (hardest): 5/10
• Redundancy and Complementarity:
– Each user command in the system requires an action and an object
• Speech and/or
• Gesture-HandShape
• Redundancy
– Doubling up of either action or object information or both

          Action   Object
Speech    √        √
Gesture   √        √

• Complementarity
– Action and object come through different modalities
[Table: Action and Object columns against Speech and Gesture rows, with one √ per modality]
61
Rates of Redundancy
• Redundancy:
– Conveying the same information over
more than one modality,
– Either would be sufficient on its own

Turn             Const    Modality     Content
Pure Redundant   Select   Gesture      [point to St Mary’s Church]
                          Speech       “Select St.Mary’s Church”
                 Tag      Hand_Shape   [scissors=Incident]
                          Speech       “Incident”

• We found a statistically significant
decrease in the number of purely
redundant turns from
– 62.91% in Level 1 to
– 29.9% in Level 4 of all multimodal turns.
[Chart: Proportion of Purely Redundant turns by Level (Level1, Level2, Level4), showing Min, Q1, Mean, Q3 and Max on a 0 to 90 scale]
62
Redundancy
[Chart: purely redundant, partially redundant and purely complementary turns by level (Level1, Level2, Level4), on a 0 to 70 scale]
We observed a steady decrease in redundancy as task difficulty
increased. A between-users ANOVA across levels shows
significant differences between the means (F = 3.88, df = 2; p < 0.05).
63
Rates of Complementarity
• Complementarity:
– Conveying different information over different modalities
e.g.

Turn              Action   Modality     Content
Pure Complement   Select   Speech       “Select St Mary’s Church”
                  Tag      Hand_Shape   [scissors=Incident]
• We also found trends of increased multimodal
complementarity across levels:
– 12.86% in Level 1
– 45.53% in Level 2, and
– 36.02% in Level 4
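One way the redundancy and complementarity categories could be assigned automatically is sketched below. It assumes each annotated input has already been reduced to a modality name plus the set of semantic items it conveys; that encoding, the item labels and the function names are hypothetical, and the actual annotation scheme used in the study may differ.

```python
def classify_turn(inputs):
    """Label a turn from (modality, set_of_semantic_items) pairs.

    A turn is 'purely redundant' when every item is conveyed by more than
    one modality, 'purely complementary' when no item is repeated across
    modalities, and 'partially redundant' otherwise. Turns using a single
    modality are labelled 'unimodal'.
    """
    modalities = {modality for modality, _ in inputs}
    if len(modalities) < 2:
        return "unimodal"

    # For every semantic item, collect the modalities that conveyed it.
    carriers = {}
    for modality, items in inputs:
        for item in items:
            carriers.setdefault(item, set()).add(modality)

    repeated = [item for item, mods in carriers.items() if len(mods) > 1]
    if not repeated:
        return "purely complementary"
    if len(repeated) == len(carriers):
        return "purely redundant"
    return "partially redundant"


def proportion(labels, target):
    """Percentage of multimodal turns carrying the target label."""
    multimodal = [label for label in labels if label != "unimodal"]
    if not multimodal:
        return 0.0
    return 100.0 * multimodal.count(target) / len(multimodal)
```

For example, a hand shape and a spoken "Incident" that both encode the same tag would come out as purely redundant, while a spoken "Select" combined with a pointing gesture that supplies only the object would come out as purely complementary.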
64
Cognitive and Working Memory Theories
• Why?
Reduced level of redundancy + increased level of
complementarity suggests a specific working
memory strategy
• Modal Model of Working Memory [Baddeley, 92]
[Diagram: Central Executive, Phonological Loop and Visual-Spatial Sketchpad]
• Working Memory Strategies:
– Activity is shifted to areas marked exclusively for
modal use
– At high load, users try to maximise the usage of
modal working memory
– Users channel the required semantic chunks to
different modalities, with the least amount of
replication possible
65
Discussion and Challenges
• Results:
– The results of this study give initial evidence that shifts in
redundancy/complementarity are a behavioural symptom of the cognitive load
management employed by users
• Sensitivity and Diagnosticity:
– ‘Ceiling’ values for rates of redundancy or complementarity
– Clearly not suitable for all users
• Automatic cognitive load estimation:
– A compound measure
– Various individual modal measurements for robustness
– Weighting of features on a per-user basis
• More reliable indices will influence a combined measure more strongly
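The compound measure with per-user weighting could take the form of a weighted mean, sketched below. It assumes each modal index has already been normalised to a 0 to 1 scale and that a reliability weight per index is available for the user; the feature names, the weights and the function name are hypothetical.

```python
def combined_load_index(features, weights):
    """Weighted mean of normalised load indices for one user.

    features: index name -> value in [0, 1] (assumed already normalised)
    weights:  index name -> non-negative reliability weight for this user
    Indices without a positive weight are ignored, so more reliable
    indices influence the combined measure more strongly.
    """
    usable = [name for name in features if weights.get(name, 0.0) > 0.0]
    total_weight = sum(weights[name] for name in usable)
    if total_weight == 0:
        raise ValueError("no feature has a positive weight")
    weighted_sum = sum(features[name] * weights[name] for name in usable)
    return weighted_sum / total_weight
```

With this form, an index that is at its ceiling or unreliable for a particular user can simply be given a zero weight for that user.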
66