From imagination to impact
Using Information to Drive Decisions
Cognitive Load Measurement using Speech/Linguistic Features
Dr. Fang Chen [email protected]
NICTA Copyright 2010

Outline
• Background
• Research Applications
• Speech and Language Analyses
• Data Sets:
  – Reading Experiment
  – Touch-table Collaborative Experiment
  – Bushfire Study
  – Driving Experiment

Background
• Cognitive load (CL): refers to the mental demand imposed on working memory by a particular task.
• Working Memory: limited capacity for holding information in mind in the context of cognitive activity.
• Cognitive Load Theory: development of the instructional methods for effective use of people's limited cognitive processing capacity.

Research Aims
• Overall:
  – Identification of potential indices of cognitive load for
    • real-time,
    • objective,
    • non-intrusive
    measurement of cognitive load.
• Specific to this research:
  – Identification of potential linguistic and grammatical features of cognitive load.

Need for CL Measurement
• Overloading or underloading of cognitive processing:
  – degradation of performance, and/or
  – failures of learning and performing, and/or
  – source of performance errors.
• CL measurement is crucial for:
  – minimising the amount of cognitive effort required,
  – maintaining the right level of CL,
  – achieving adaptive system response,
  – improving user performance.

Cognitive Load Measures
• Subjective measures
  – e.g. self-reporting
  – manual, post-task, time-consuming, intrusive.
• Physiological measures
  – e.g. eyes, brain, skin biosensors
  – sensitive, signal noise, intrusive, lot of complex equipment
• Performance measures
  – e.g. error rate, task performance
  – dual tasks
• Behavioral measures
  – e.g. speech, pressure mouse
  – can be automatic, non-intrusive

Research Applications
• Designing intelligent adaptive user interfaces for intensive working/interaction environments.
  – Emergency services e.g. Bushfire Cooperative Research Centre
  – Road traffic control services e.g. Roads and Traffic Authority (RTA)
• Other potential areas:
  – Call centers
  – Air traffic control rooms
  – Pilot cockpits
  – Online education / e-learning
  – … and so on.

Speech and Linguistic Measures
• Why Speech?
  – Sensitivity in the speech modality shown by prior art.
  – Non-intrusive, easy to collect e.g. phone calls, conversations
  – Objective measure, not easily manipulated by the user
  – Real-time analysis is possible (for some speech signal features)
  – Widely available in a number of application scenarios
• What measures?
  – Pauses and response latency
    • Pausing differently under different conditions.
  – Language and word usage
    • Using particular words and/or phrases at specific sentence and/or paragraph positions;
  – Grammar features and structures
    • Using particular types of linguistic/grammatical categories;
    • Using a particular type of syntax or grammatical structure i.e. usage of parts of speech and their forms;

Experiment Setup
• A user study with two controlled levels of cognitive load
  – Elicit natural speech from users
• A reading and comprehension task
  – General knowledge (avoid the expertise effect)
  – Reading the extract
  – Answer open-ended questions
    • Give a short summary of the story in at least five whole sentences.
    • What was the most interesting point in this story?
    • Describe at least two other points highlighted in this story.

The Sun
The Sun has "burned" for more than 4.5 billion years and will continue to do so for several billion more. It is a massive collection of gas, mostly hydrogen and helium. Because it is so massive, it has immense gravity, enough gravitational force to hold all of hydrogen and helium together (and to hold all of the planets in their orbits around the Sun!). The Sun does not "burn" like wood burns – it is a gigantic nuclear reactor….

Story-reading Experiment
• Experimental setup
  – Story reading followed by Q&A
  – 3 different levels of text difficulty (Lexile Framework for Reading, www.lexile.com)
  – 3 stories in each of the 2 sessions (fixed order)
    • 1st session: “Sleep” (900L), “History of Zero” (1350L) & “Milky Way Galaxy” (1400L)
    • 2nd session: “Smoke Detectors” (950L), “Hurricanes” (1250L) & “The Sun” (1300L)
    • 5 minutes break between sessions
  – Dual-task for “Milky Way Galaxy” & “The Sun”
    • Counting of background spoken numbers while reading the stories and answering the questions

Experiment Setup
• Cognitive load level design
  – Lexile Framework for Reading (200L 1st grade, 1700L grad)
    • Syntactic and semantic complexity, vocabulary
  – Text with same difficulty for both conditions
  – Aural dual task, counting numbers during reading and answering

  Task Load Level   Lexile Rating   Dual Task
  Low               1300L           No
  High              1300L           Yes

• Participants
  – 15 native English speakers as subjects (8 females and 7 males)

Reading Experiment Data – Pause Analysis

Pause Analysis – Results Summary
*p<0.05, n=24.

Touch-table Collaboration Study - Lab Data
• Collaborative tasks using multi-touch tabletop screen.
• Interactive Firefighting tasks.
• 10 groups x 4 members = 40 subjects + (1 Pilot group)
  – 30 Commanders + 10 Leaders
  – 39 subjects data available (1 leader’s data missing)
• Speech Transcriptions using ELAN.
• Extracted and cleaned for LIWC and other analysis tools.
• Analysis completed:
  – Subjective Ratings
  – Grammar features - Pronouns
  – Word Category Features
  – Language Complexity Features

Touch-table Study Design

Lab Data – Some Hypotheses
• Higher subjective ratings under high load task.
• More speech and longer sentences.
• More and longer pauses under high load task.
• More use of:
  – Negative emotion words, inclusive words, swear words, cognitive and perceptive phrases, disagreement words etc.
• Less use of:
  – Positive emotion words, agreement, certainty, achievement words
• More hesitations and incomplete sentences
• More use of plural pronouns and less use of singular ones.
• More complex sentences under high load task.

Lab Data – Subjective Ratings

Lab Data – Linguistic Analysis (Words)

Lab Data – Linguistic Analysis (Pronouns)
• Singular pronouns decrease
• Plural pronouns increase

Lab Data – Linguistic Analysis (Pronouns)
• Interaction between Singular and Plural Personal Pronouns

Lab Data – Language Complexity Analysis
• Language complexity measures
• Measured by two major factors:
  – Semantic difficulty: observes the use of words, their frequencies, and their lengths (both in syllables as well as alphabets/characters).
  – Syntactic complexity: observes primarily the sentence length, which is considered as the best indicator of text or language complexity.
• Hypotheses
  – Language Complexity increases
  – Lexical Density decreases

Lab Data – Language Complexity Analysis

Lexical Density is the estimated measure of content per functional and lexical units or lexemes in total text. In simple words, it is a measure of the ratio of unique words to the total number of words.

  Lexical Density = (different words / total words) x 100

A word is considered complex or hard if it has three or more syllables and does not contain a hyphen (-). For example, the word ‘density’ has three syllables. Complex Word Ratio is the measure of the ratio of complex words to the total number of words.

Gunning Fog Index calculates the syntactic complexity of language using sentence lengths and complex words and implies that short and simple sentences in plain English achieve a better score (lower value) than long sentences in complicated language.

  Gunning Fog Index = 0.4 x (ASL + ((SYW / words) x 100))
  Where:
  ASL = Average sentence length (the number of words divided by the number of sentences)
  SYW = Number of words with three or more syllables

Flesch-Kincaid Grade calculates the language difficulty using average sentence lengths and average syllables per word. It estimates the number of years of education required to understand the written or transcribed text.

  Flesch-Kincaid Grade = (0.39 x ASL) + (11.8 x ASW) – 15.59
  Where:
  ASL = Average sentence length (the number of words divided by the number of sentences)
  ASW = Average number of syllables per word (the number of syllables divided by the number of words)

The SMOG Grade also estimates the number of education years needed to fully comprehend the text. It uses sentences and complex words to calculate it. The emphasis on full comprehension distinguishes this measurement from other complexity measures.

  SMOG Grade = square root ((SYW / sentences) x 30) + 3
  Where:
  SYW = Number of words with three or more syllables

Lexile Level also measures the comprehension complexity of any text. A Lexile measure is the numeric representation of a text’s difficulty ranging from 200L for easy to above 1700L for complicated texts. It uses mean sentence lengths and mean log word frequency to calculate it.

• Lexical Density (Vocabulary Richness) – expected to decrease
• Hard Word Ratio – expected to decrease
• Gunning Fog Index – expected to increase
• Flesch-Kincaid Grade – expected to increase
• SMOG Grade – expected to increase
• Lexile Level – expected to increase

Lab Data – Language Complexity Analysis

Bushfire Data - Introduction
• Speech and transcription data from Bushfire CRC.
• Training exercises – four states (TAS, VIC, NSW, and QLD).
• Three roles: Incident Controller (IC), Planning, Operations.
• 11 exercises, 33 subjects
• All exercises monitored by bushfire management experts.
• Operators co-located in a control room and trained for roles.
• Data collection, transcription, coding, cleaning, analyses.
• Four different load levels:
  – (1) ‘low’: casual conversation, no time pressure;
  – (2) ‘medium’: routine tasks;
  – (3) ‘high’: challenging tasks, time constraints; and
  – (4) ‘very high’: very challenging, lot of unexpected events and breakdowns.
• Combined into low and high.

Bushfire Data – Same Hypotheses
• Higher subjective ratings under high load task.
• More speech and longer sentences.
• More and longer pauses under high load task.
• More use of:
  – Negative emotion words, inclusive words, swear words, cognitive and perceptive phrases, disagreement words etc.
• Less use of:
  – Positive emotion words, agreement, certainty, achievement words
• More hesitations and incomplete sentences
• More use of plural pronouns and less use of singular ones.
• More complex sentences under high load task.

Bushfire Data – Linguistic Analysis (Words)

Bushfire Data – Linguistic Analysis (Pronouns)
• Singular pronouns decrease
• Plural pronouns increase

Bushfire Data – Linguistic Analysis (Pronouns)
• Interaction between Singular and Plural Personal Pronouns

Bushfire Data – Language Complexity Analysis

Other Linguistic Analysis Possibilities
• N-gram Analysis
  – Most common N-grams (Bigrams, Trigrams, 4-grams)
  – Most common words (Unigrams)
  – Most frequent or least frequent N-grams
  – More…
• Parse Tree Analysis
• Others:
  – Order of n-grams
• For both words and parts of speech.

  [Figure: Bi-gram Ratio (percent) by Load Level]
  Load Level      L1      L2      L3      L4
  Bi-gram Ratio   93.5%   80.9%   79.4%   72.6%
  p 0.0002

An Abstract CLM Model
• Automatic, Real-time, Non-intrusive

Looking at Data Sets
• Reading Experiment
• Touch-table Collaborative Experiment
• Bushfire Study
• Driving Study

Driving Study Data - Introduction
• Simulated Driving Experiment
• Investigate how the distractions can affect the performance of the user
• Identification of features to measure users’ cognitive load.
• 18 participants (8 females and 10 males)
• Data collected:
  – Video (2 cameras, front and rear view)
    • Eye gaze movement
  – Audio
  – Galvanic Skin Response (GSR) or skin resistance

Driving Study Data – Experiment Setup
• Big screen for game
• Front camera
• Simulator frame
• Wireless headset
• Bio-sensor (GSR)
• Speakers at back
• Rear camera

Future Challenges
• Areas for future work
  – Development of larger databases
  – Task-dependent and task-independent features
• Need to take lab experiments ‘into the wild’
  – Defining, researching and standardising tasks of interest
  – Joint modeling of linguistic, speaker and cognitive load/emotion information

Exploring Multimodalities

Exploring Multimodality
• Hypothesis:
  – Users are more likely to use complementary multimodal productions as cognitive load increases
  – Users will tend to rely on one modality more as cognitive load increases
• Method:
  – Wizard of Oz scenario: speech and gesture interface for a series of map-based tasks; tasks increase in difficulty by varying the quantity of content and the time pressure
  – Conditions for Speech Only interaction, Gesture Only interaction and Multimodal
  – Videotape participants, record audio, record answers, post-hoc introspection questionnaire

Multimodality and Cognitive Load
• Exploring Multimodal Interface Scenarios
  – The recognisers in the interface will capture the user’s input, interpret the information and choose an appropriate response
  – Opportunity to capture interaction data implicitly

  [Diagram labels: Visual Data, Audio Data, Physiological Data, Environmental Data, Other Modalities, User Characteristics, Task Characteristics, Cognitive Load Analysis]

Experiment Design
• Task:
  – Incident Management Response, e.g. a major accident on the corner of X and Y.
  – Operators are required to deploy necessary crews and implement policies and procedures
• Method:
  – Elicit speech and free-hand gesture interface for a series of map-based tasks;
  – Wizard of Oz scenario
  – Videotape participants, record audio, record answers, post-hoc introspection questionnaire
• Dependent Variables:
  – Biosensor input: GSR and BVP
  – Gesture: video footage
  – Speech: transcribed manually
  – Performance: latency, completion time & error-rates
  – Multimodal productions: manual annotation

Examining Multimodal Input Structures

The Task
• There are 36 small tasks, divided into 3 groups of 12.
• Each group of 12 will consist of maps from 4 different cities:
• Each new task will be given to you at the top of the screen:
  – e.g. There has been an accident on the corner of Victoria and Liverpool Street.
• The tasks will be carried out using different modes:
  – speech + gesture together,
  – speech-only and
  – gesture-only
  The experimenter will tell you which mode you should be using for each task.
• The task will first require some visual search for information.
• There are only three things the system can do:
  1. Zooming in and out of maps
  2. Selecting map elements
  3. Tagging map elements

The Task
  [Screenshot labels: Toolbox, Task Description, Map, Information/Feedback Area]

Zooming Map Levels
• Lower-level map: contains selectable elements; can zoom out to higher-level map
• Top-level map: no selectable elements; divided into four quadrants by a dotted black line

Selectable Elements
• Selected elements will be shown with a blue border.
  School ==>
  Petrol Station, Fire Station, Library, Hospital, Shopping Centre, RTA Branch, Parking Station, Church, Intersection

Tagging Map Elements
Tagging is a two-step process:
  1. Select map element ->
  2. Tag as Accident, Incident or Event ->

  Accident: e.g. car accident, fire, flooding
  Green border
  Incident: occurrence that might cause a disruption to the traffic, e.g. broken-down car, or a traffic jam in peak hour
  Yellow border
  Event: e.g. concert, protest march, fun run
  Red border
  Clear: Clears all tags for selected element
  Info: Information area beneath the map ->

Special Tag: Notifying
Two parts: The element and the recipient need to be specified.
• Select map element (e.g. Intersection, marked as accident) ->
• Select NOTIFY action -> PINK tag appears
• Select the recipient map element (RTA Branch, Fire Station…) -> AQUA tag appears

Zooming
• 2 zoom levels
• Lower level maps have selectable elements
• Zoom in: 4 quadrants
• Zoom out
  [Images: Top-level zoomable Map (no selectable elements); Lower-level Map with selectable elements]

The Modalities
• Speech
  – Short and sweet
  – No specific words, no specific word order
    We only give some suggestions
  – Speak clearly and loudly

  Zooming
    Zoom into the top right quadrant
    Top right quadrant
    Zoom in to top right
    Zoom out please
  Selecting
    Select the Church on Liverpool Street
    Church on Liverpool
    Please highlight the Church
  Tagging
    Make selected Church an accident (or incident or event) zone
    Selected Church. Accident.
    Accident.

The Modalities (2)
• Hand Gestures
  – Pointing
  – Hand shapes

  Zooming: Point to quadrant and pause to select and zoom in. Point to diagonally opposite ends of map, pause to zoom out.
  Selecting: Point to the element, pause until beep
  Tagging: Very clear hand shape (fist, flat palm, scissors, thumbs-up) OR Point to button in toolbox, pause to select

The Modalities (3)
• Multimodal
  – Speech + gesture
  – Any order or combination
  – Speech only or gesture only are OK
  – Examples:
    • “Make this into an accident” + pointing at element
    • “Zoom into this quadrant” + pointing at quadrant
    • “Zoom out again”

Research Design
Balancing Available Modalities
• The traffic incident management (TIM) domain was used, and subjects were required to update a geographical map with traffic conditions information. Following our requirement, tasks were achievable using the following modalities:
  – Gesture:
    • Deictic pointing to map locations, items, and function buttons;
    • Circling gestures for zoom functions.
  – Hand Shapes: Predefined hand shapes for item tagging: fist, open palm, thumbs up etc
  – Speech: street names, actions etc
• A large overlap was introduced across modal ways of performing actions. However, some tasks required the combination of modalities.

Task Design
• Task Specification
  – Task was given in written mode
  – Users had freedom of inspection
  – The task described a situation, but did not specify activities, e.g. “An incident has occurred: a truck has lost some of its load at Walter Avenue and Lytton Road, near Mowbray Park”
• Task Activities
  – Locate point of interest on the map
  – Mark with one of 3 tags: accident, incident or event
  – Notify relevant authorities, e.g. if casualties exist, notify a hospital.
  – 11 different kinds of functionality available

Task Difficulty Level Design
• There were four levels of cognitive load, and three tasks were completed for each level.
• The same visual was used for each level to avoid differences in visual complexity.
• The tasks varied in load through:
  – The number of distinct entities in the task description;
  – The number of distractors (items not needed for the task);
  – The minimum number of actions required for the task.
  – Further load was achieved in Level 4 by introducing a time limit.

  Level   Entities   Actions   Distractors   Time
  1       6          3         2             ∞
  2       10         8         2             ∞
  3       12         13        4             ∞
  4       12         13        4             90 sec.

Available Modalities
• The Modalities
  – Aimed to capture natural patterns of speech and gesture combinations
  – Speech: natural spoken language ‘recognised’ by an operator
    • Avoids bias injected by errors in recognisers
  – Gesture: automated hand tracking
    • Untethered: no equipment used on the person
    • Both tracking of the hand and hand shapes used
    • Buttons added to reduce expressivity gap between gesture and speech
  – Either or both could be used for each command

  Input          Speech       Gesture
  Select         “Select”     Point
  Zoom           “Zoom”       Circling
  Notify         “Notify”     Thumbs up
  Tag Accident   “Accident”   Fist
  Tag Incident   “Incident”   Open Palm
  Tag Event      “Event”      Scissors

Example of Interaction
System Functionality: Example of Interaction
• Zooming in or out of a map: <Point at quadrant>; or “Zoom in to the top right quadrant”
• Selecting a location/item of interest: <Point at location>; or “St Mary’s Church”
• Tagging a location of interest with an ‘accident’, ‘incident’ or ‘event’ marker: <Select location> and: “Incident”; or Scissors shape
• Notifying a recipient (item) of an accident, incident or an event: <Select accident> and “notify”; or fist shape and <Select recipient>
• Starting or ending a task: “End task”; or <Point at End task button>

Wizard of Oz
  [Diagram labels: Main computer, Wizard, Camcorder, Firewire camera, AGR]

Data Captured
• The study generated various streams of data that were captured as follows:
  – Speech was orthographically transcribed, including specific tags for disfluencies such as false starts, hesitations.
    Start and end times were annotated for each utterance;
  – Hand motion was captured by the automatic gesture recogniser at the rate of 20 frames per second. Positions are relative to the camera view angle;
  – Deictic pointing (pause while pointing, or circling) and hand shapes were annotated at two levels: the video was annotated to mark the start and end time of the overall motion leading to the gesture.
  – System feedback to the user such as task change (marked by a beep), item information, or error message was recorded with its time of occurrence;
  – Bio-sensor data was recorded at the rate of 100 points per second. Skin conductance is measured in microsiemens (µS) while blood volume pulse only provides relative measures expressed in percentage.

Sample of Annotation

  Turn                   Construction   Modality   Content
  (A) Mark an Incident   Select (a)     Gesture    [point to St Mary’s Church]
                                        Speech     “Select St.Mary’s Church”
                         Tag (a)        Shape      [scissors=Incident]
                                        Speech     “Incident”
  (C) Mark an Accident   Select (c)     Speech     “Select Crown Street Library”
                         Tag (c)        Shape      [fist=Accident]
  (B) Mark an Event      Select (b)     Speech     “Select”
                                        Gesture    [point to Collingwood School]
                         Tag (b)        Shape      [open_palm=Event]

Results and Analysis
• Users: 15 available
• Total inputs: 1119
• Total turns: 394 (206 MM)
• Total constructions: 644
• Average difficulty rating for levels (subjective)
  – Level 1 (easiest): 2/10
  – Level 2: 4.2/10
  – Level 4 (hardest): 5/10
• Redundancy and Complementarity:
  – Each user command in the system requires an action and an object
    • Speech and/or
    • Gesture-HandShape
• Redundancy
  – Doubling up of either action or object information or both

            Action   Object
  Speech    √        √
  Gesture   √        √

• Complementarity
  – Action and object come through different modalities
  [Table: Action and Object columns, Speech and Gesture rows, one √ per row; cell positions not recoverable]

Rates of Redundancy
• Redundancy:
  – Conveying the same information over more than one modality,
  – Either would be sufficient on its own

  Turn             Const.   Modality     Content
  Pure Redundant   Select   Gesture      [point to St Mary’s Church]
                            Speech       “Select St.Mary’s Church”
                   Tag      Hand_Shape   [scissors=Incident]
                            Speech       “Incident”

  [Figure: Proportion of Purely Redundant turns by Level (Min, Q1, Mean, Q3, Max for Level1, Level2, Level4)]

• We found a statistically significant decrease in the number of purely redundant turns, from 62.91% in Level 1 to 29.9% in Level 4 of all multimodal turns.

Redundancy
  [Figure: Purely redundant, Partially redundant and Purely complementary turns for Level1, Level2, Level4]

We observed a steady decrease in redundancy as task difficulty increased. An ANOVA test between-users, across levels, shows there are significant differences between the means (F = 3.88 (df=2); p<0.05).

Rates of Complementarity
• Complementarity:
  – Conveying different information over different modalities, e.g.

  Turn              Action   Modality     Content
  Pure Complement   Select   Speech       “Select St Mary’s Church”
                    Tag      Hand_Shape   [scissors=Incident]

• We also found trends of increased multimodal complementarity across levels:
  – 12.86% in Level 1
  – 45.53% in Level 2, and
  – 36.02% in Level 4

Cognitive and Working Memory Theories
• Why?
  – A reduced level of redundancy combined with an increased level of complementarity suggests a specific working memory strategy
• Modal Model of Working Memory [Baddeley, 92]
  [Diagram labels: Central Executive, Phonological Loop, Visual-Spatial Sketchpad]
• Working Memory Strategies:
  – Activity is shifted to areas marked exclusively for modal use
  – At high load, users try to maximise the usage of modal working memory
  – Users channel the required semantic chunks to different modalities, with the least amount of replication possible

Discussion and Challenges
• Results:
  – The results of this study give initial evidence that redundancy/complementarity is a behavioural symptom of the cognitive load management employed by users
• Sensitivity and Diagnosticity:
  – ‘Ceiling’ values for rates of redundancy or complementarity
  – Clearly not suitable for all users
• Automatic cognitive load estimation:
  – A compound measure
  – Various individual modal measurements for robustness
  – Weighting of features on a per-user basis
    • more reliable indices will influence a combined measure more strongly
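
Sketch: a per-user weighted compound measure
The last bullet above (more reliable indices influence the combined measure more strongly) can be illustrated with a reliability-weighted mean. This is only a minimal sketch under stated assumptions, not the method used in the study: the function name `combined_load_index`, the feature names, the scaling of each score to [0, 1] and all numbers are hypothetical.

```python
from typing import Dict


def combined_load_index(features: Dict[str, float],
                        reliability: Dict[str, float]) -> float:
    """Reliability-weighted mean of per-modality load scores.

    features:    hypothetical per-modality scores, assumed pre-scaled to [0, 1]
    reliability: hypothetical per-user weights (>= 0); higher means more trusted
    """
    # Keep only the features that have a positive weight for this user.
    used = [name for name in features if reliability.get(name, 0.0) > 0.0]
    if not used:
        raise ValueError("no feature has a positive reliability weight")
    total_weight = sum(reliability[name] for name in used)
    weighted_sum = sum(reliability[name] * features[name] for name in used)
    return weighted_sum / total_weight


# Illustrative values only; none of these numbers come from the study.
scores = {"pause_rate": 0.8, "redundancy_drop": 0.6, "gsr": 0.2}
weights = {"pause_rate": 3.0, "redundancy_drop": 1.0, "gsr": 0.0}
index = combined_load_index(scores, weights)
print(round(index, 3))
```

With these weights the pause-based score dominates and the zero-weighted GSR score is ignored, which is one simple way to let an index that is unreliable for a given user drop out of the combined measure.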