Abstracting and Composing High-Fidelity Cognitive Models of Multi-Agent Interaction
MURI Kick-Off Meeting, August 2008
Christian Lebiere, David Reitter
Psychology Department, Carnegie Mellon University

Main Issues
• Understand scaling properties of cognitive performance
  – Most experiments look at a single performance point rather than at performance as a function of problem complexity, time pressure, etc.
  – Key component in abstracting performance at higher levels
• Understand interaction between humans and machines
  – Most experiments study and model human performance under a fixed scenario that misses key dynamics of interaction
  – Key aspect of both system robustness and vulnerabilities
• Understand generality and composability of behavior
  – Almost all models are developed for specific tasks rather than assembled into larger pieces of functionality from basic pieces
  – Key enabler of scaling models and abstracting their properties

Cognitive Architectures
• What is a cognitive architecture?
  – Invariant mechanisms that capture the generality of cognition (Newell)
  – Aims for both breadth (Newell Test) and depth (quantitative data)
• How are they used?
  – Develop a model of a task (declarative knowledge, procedural strategies, architectural parameters)
  – Limits of model fitting (learning mechanisms, architectural constraints, reuse of model and parameters)
• ACT-R
  – Modular organization, communication bottlenecks, mapping to brain regions
  – Mix of symbolic production system and subsymbolic statistical mechanisms

ACT-R Cognitive Architecture
[Architecture diagram: Intentions/Goal, Declarative Memory/Retrieval, Vision/Visual, and Motor/Manual modules communicating with central Productions through buffers, annotated with the subsymbolic equations for activation learning (Ai, Bi, Sji), retrieval latency (F e^-Ai), and utility learning (Ui = Pi G - Ci, Pi = Successes / (Successes + Failures))]
Sample production:
  IF the goal is to categorize a new stimulus and the visual buffer holds stimulus info S, F, T
  THEN start retrieval of chunk S, F, T and start a manual mouse movement
[Example chunks encode stimulus attributes such as Size, Fuel, and Turbulence together with the Decision]

Sample Task: AMBR Synthetic ATC

Model - Methodology
• Model designed to solve the task simply and effectively
  – Not engineered to reproduce any specific effects
• Reuse of common design patterns
  – Makes modeling easier and faster
  – Reduces degrees of freedom
• No fine-tuning of parameters
  – Left at default values or roughly estimated from data (2)
• Architecture provides automatic learning of the situation
  – Position & status of aircraft naturally learned from interaction

Model - Methodology II
• As many model runs as subject runs
  – Performance variability is an essential part of the task!
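The subsymbolic equations referenced in the architecture diagram (base-level learning, retrieval latency, and utility learning) can be sketched in Python. This is a minimal illustration of the standard ACT-R equations, not the architecture's actual code; the parameter defaults below are assumptions for illustration.

```python
import math

# Base-level learning: B_i = ln(sum over presentations of t_k^-d),
# where t_k is the time since the k-th presentation and d is the decay
# parameter (ACT-R's conventional default is 0.5).
def base_level(presentation_times, now, d=0.5):
    return math.log(sum((now - t) ** -d for t in presentation_times))

# Retrieval latency: T_i = F * e^-A_i, where F is the latency factor.
def retrieval_latency(activation, F=1.0):
    return F * math.exp(-activation)

# Utility learning: P_i = Successes / (Successes + Failures),
# U_i = P_i * G - C_i (G is the value of the goal, C_i the cost).
def utility(successes, failures, G=20.0, cost=1.0):
    p = successes / (successes + failures)
    return p * G - cost

# Example: a chunk presented 2 and 10 seconds into the run, queried at t=12.
b = base_level([2.0, 10.0], now=12.0)
lat = retrieval_latency(b)
u = utility(successes=8, failures=2)
```

Recency and frequency of use thus jointly determine how fast (and whether) a chunk is retrieved, which is what drives the learning and latency effects in the slides that follow.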
  – Model speed is essential (5 times real-time in this case)
  – Stochasticity is a fundamental feature of the architecture
    • Production selection
    • Declarative retrieval
    • Perception and actions
  – Stochasticity is amplified by interaction with the environment
  – Model captures most of the variance of human performance
  – No individual variations factored into the model (W, efforts)

Model - Overview
• 5 (simple) declarative chunks encoding instructions
  – Associate color to action and penalty
• 36 (simple) productions organized in 5 unit tasks
  – Color-Goal (5): top-level goal to pick next color target
  – Text-Goal (4): top-level goal to pick next area to scan
  – Scan-Text (7): goal to scan text window for new messages
  – Scan-Screen (8): goal to scan screen area for exiting aircraft
  – Process (12): processes a target with 3 or 4 mouse clicks
• Unit tasks map naturally to ACT-R goal types and production matching - a natural design pattern

Flyoff - Performance
[Bar chart: mean penalty points (0-400), subjects vs. model, across six conditions: Color - Low/Mid/High, Text - Low/Mid/High]
• Performance is much better in the color than in the text condition
• Performance degrades sharply with time pressure for text
• Good fit except for text-high: huge variation, with tuneup too

Flyoff - Distribution
[Box plot: penalty points (0-700) for Tuneup, Flyoff, and Model in the Mid and High conditions]
• The model can yield a wide range of performances through retrieval and effort stochasticity and dynamic interaction
• Model variability always tends to be lower than the subjects'

Flyoff - Penalty Profile
[Bar chart: penalty points (0-150) by error/delay type (T, H, HD, SE, SD, WD, DM, CE, IM), subjects vs. model, in the Mid and High conditions]
• Errors: no speed-change or click errors, but incorrect and duplicated messages occurring during the handling of holds
• Delays: more holds for high but fewer welcome and speed

Flyoff - Latency
[Log-scale plots: RT (sec) vs. number of intervening events (0-6), subjects vs. model, for each text and color condition]
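The unit-task design pattern, in which each goal type licenses only its own small set of productions, can be illustrated with a toy dispatch table (all goal and production names below are hypothetical; this is a sketch of the pattern, not the model's actual productions):

```python
# Design-pattern sketch: each unit task corresponds to a goal type, and
# production matching is restricted to the productions associated with
# the current goal type. Names here are hypothetical illustrations.
def pick_color_target(state):
    return {"action": "select-target", "kind": "color"}

def scan_text_window(state):
    return {"action": "scan", "area": "text-window"}

UNIT_TASKS = {
    "color-goal": pick_color_target,  # top-level goal: pick next color target
    "scan-text": scan_text_window,    # goal: scan text window for new messages
}

def fire(goal_type, state):
    # Only productions matching the current goal type can fire.
    return UNIT_TASKS[goal_type](state)
```

Because each unit task is self-contained behind its goal type, unit tasks can be reused and recombined across models, which is what makes this a natural composition pattern.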
• Response times increase exponentially with the number of intervening events, and faster for the text than the color condition
• Model is slightly faster in the color condition but slower in the text condition

Flyoff - Selection
[Line plots: number of selections (0-300) vs. number of intervening events (0-6), subjects vs. model, for each text and color condition]
• The number of selections decreases roughly exponentially, with text starting lower but trailing off longer with a final spike
• Ceiling effect in the color condition (mid & high): see workload

Flyoff - Workload
[Bar chart: workload rating (1-10), subjects vs. model, across the six Color/Text x Low/Mid/High conditions]
• Workload is higher in the text condition and increases faster
• Model reproduces both effects but misses the ceiling effect in the color condition even though it gets it for the selection measure!

Learning Categories
[Line plot: percent correct over trials 1-8 for humans and model on categories 1, 3, and 6]
• Model learns responses through instance-based categorization
• Learning curve and level of performance reflect the degree of complexity of the function mapping aircraft characteristics to response

Transfer Errors
• Transfer performance is defined by (linear) similarities between stimuli values along each dimension (size, fuel, turb.)
• Excellent match to trained instances (better than trial 8!)
• Extrapolated: syntactic priming or non-linear similarities?

Individual Stimuli Predictions
• Good match to probability of accepting individual stimuli for each category
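Instance-based categorization with linear similarities along each stimulus dimension can be sketched as follows. This is a simplified illustration of the idea (retrieve the best-matching stored instance, penalizing mismatches in proportion to linear similarity); the attribute scales and mismatch weight are assumptions, and the real model uses ACT-R's activation-based partial matching rather than raw scores.

```python
# Each stored instance pairs stimulus attributes (size, fuel, turbulence)
# with the response it received. A new stimulus retrieves the instance
# with the highest match score, where each mismatching attribute is
# penalized in proportion to a linear similarity along that dimension.
def linear_similarity(a, b, scale):
    # 1.0 for identical values, falling off linearly with distance.
    return 1.0 - abs(a - b) / scale

def categorize(stimulus, instances, scales, mismatch_penalty=1.0):
    def match(instance):
        attrs, _response = instance
        return sum(
            mismatch_penalty * (linear_similarity(s, a, sc) - 1.0)
            for s, a, sc in zip(stimulus, attrs, scales)
        )
    best = max(instances, key=match)
    return best[1]

# Toy instances: ((size, fuel, turbulence), response); values are made up.
instances = [((20, 1, 1), "accept"), ((40, 3, 3), "reject")]
scales = (40, 4, 4)  # assumed ranges of each dimension
print(categorize((22, 1, 1), instances, scales))  # → accept
```

Transfer to untrained stimuli then falls out of the same mechanism: a novel stimulus is pulled toward the response of whichever trained instances it most resembles along each dimension.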
[Scatter plot: model vs. human probability of accepting each stimulus (0-1), with per-category regressions y = 0.0523 + 0.809x (R^2 = 0.890), y = 0.0247 + 0.925x (R^2 = 0.750), y = -0.0177 + 1.017x (R^2 = 0.485)]
• RMSE: Cat. 1 = 14.1%, Cat. 3 = 13.4%, Cat. 6 = 12.5%

Task Approach
• Use a task similar to AMBR - an AMBR variant, Team Argus, CMU-ASP (Aegis) - for exploration
• Introduce the team aspect that is implicit in the task by interchangeably replacing controllers with humans, models, or agents
• Right properties: tractable and scalable, even though somewhat abstract
• Scale the model to other domains (UAV control, Urban Search and Rescue) and environments (DDD, NeoCities)
• Force model generalization across environments
• Explore fidelity/tractability tradeoffs

Issue 1: Scaling Properties
• Cognitive science is usually concerned with absolute performance (e.g. latency) at fixed complexity points
  – Often less discriminative than scaling properties
• Study human performance at multiple complexity points to understand scaling and robustness issues
  – Scaling provides strong constraints on algorithms and representations
  – Robustness is a key issue in extrapolating individual performance to multi-agent interaction and overall network performance, reliability, and fault tolerance
• Quantify impact on all measures of performance
  – Converging measures of performance provide stronger evidence than separate measures susceptible to parametric manipulation
• Understanding of scaling is key to enabling abstraction

Constraints and Analyses
• AMBR illustrated the strong cognitive constraints put on the scaling of performance as a function of task complexity
• Past analyses have shown the impact of:
  – Architectural component interactions (Wray et al., 2007)
  – Representational choices (Lebiere & Wallach, 2001)
  – Parameter settings on dynamic processes (Lebiere, 1998)
[Plots: chunk retrievals ("Focus Slope = 0.1") and total retrievals ("Matching Retrievals by Focus") vs. log chunks, with polynomial, linear, logarithmic, and exponential fits to multiple series]
Scaling Experiments
• Study human performance at multiple complexity points to understand scaling and robustness issues
  – Vary task complexity (e.g. level of aircraft autonomy)
  – Vary problem complexity (e.g. number of aircraft)
  – Vary information complexity (e.g. aircraft characteristics)
  – Vary network topology (e.g. number of controllers)
  – Vary rate of change of the environment (e.g. appearance or disappearance of aircraft, weather, network topology)
• Quantify impact on all measures of performance
  – Direct performance (number of targets handled, etc.)
  – Situation awareness (levels, memory-based measures)
  – Workload (both self-reporting and physiological measures)

Issue 2: Dynamic Interaction
• The main problem in developing high-fidelity cognitive models of multi-agent interaction is the increased degrees of freedom of open-ended agent interaction
• A methodology has been developed to model multi-agent interactions in games and logistics (supply chain) problems (West & Lebiere, 2001; Martin et al., 2004)
  – Develop a baseline model to capture first-order dynamics
  – Replace most HITL with baseline model(s) to reduce DOF
  – Refine the model based on greater data accuracy and revalidate
• The methodology can be extended to multiple levels of our hierarchy, each time abstracting to the next level
• Also extends to heterogeneous simulations with mixed levels including HITL, models, and agents

Results: Model against Model
[Line plots: score differential (Lag2 - Lag1) over 100 plays, for the Lag2 model against itself and against the Lag1 model]
• Performance resembles a random walk with widely varying outcomes
• Distribution of streaks hints at fractal properties
• The model with the larger lag will always win in the long run

Results: Model against Human
[Line plots: score differential over plays for a human against the Lag1 model and against the Lag1 and Lag2 models]
• Performance of a human against the lag1 model is similar to the lag2 model
• The lag2 model takes time to get started because of its longer chunks, whereas the lag1 model starts faster because it uses fewer, shorter chunks

Results: Effects of Noise
[Line plots: score differences vs. noise level (0.0-1.0), for Lag2 against Lag2 (high vs. low noise) and for Lag2 against Lag1 at noise levels 0, 0.1, and 0.25]
• Performance improves sharply with noise, then gradually decreases
• Noise fundamentally alters the dynamic interaction between players
• Noise is essential to adaptation in changing real-world environments

Interactive Alignment
• Tendency of interacting agents to align communicative means at different levels (Pickering & Garrod, 2004)
• Task success is correlated with alignment (Reitter & Moore, 2007)
• More alignment if interlocutors are perceived to be non-human (Branigan et al., 2003)

Micro-Evolution
• Communities will evolve communicative standards
  – e.g. reference to landmarks, identification strategies for locations (e.g. Garrod & Doherty, 1994; Fay et al., in press)
  – Garrod & Doherty (1994): location identification strategy - counting boxes vs. connections

Micro-Evolution
• Evolutionary dynamics apply
• How do cognitive agents enable and influence evolution? (Pressure? Heat?)

Autonomous agents
• Can autonomous agents support alignment and communicative evolution?
• Interaction of humanoid cognitive models with autonomous agents
  – as a testbed before testing with humans
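The lag-1/lag-2 sequence models above can be sketched as follows. This is a simplified, frequency-based illustration of the core idea (store the opponent's recent move sequences as chunks, predict the most likely continuation, play the counter-move); the actual models rely on ACT-R activation and retrieval noise rather than raw counts, and the noise mechanism here is an assumption for illustration.

```python
import random
from collections import Counter, deque

# Rock-paper-scissors style counter-moves.
COUNTER = {"rock": "paper", "paper": "scissors", "scissors": "rock"}
MOVES = list(COUNTER)

class LagPlayer:
    """Remembers the opponent's last `lag` moves as a context chunk,
    predicts the most frequent continuation seen after that context,
    and plays the counter-move. With probability `noise`, retrieval
    returns a random prediction instead."""
    def __init__(self, lag, noise=0.0, seed=None):
        self.lag = lag
        self.noise = noise
        self.rng = random.Random(seed)
        self.context = deque(maxlen=lag)
        self.memory = {}  # context tuple -> Counter of observed next moves

    def predict(self):
        key = tuple(self.context)
        if self.rng.random() < self.noise or key not in self.memory:
            return self.rng.choice(MOVES)
        return self.memory[key].most_common(1)[0][0]

    def play(self):
        return COUNTER[self.predict()]

    def observe(self, opponent_move):
        key = tuple(self.context)
        if len(key) == self.lag:
            self.memory.setdefault(key, Counter())[opponent_move] += 1
        self.context.append(opponent_move)
```

A lag-2 player conditions on a longer history, so it needs more observations before its chunks start paying off but can exploit patterns a lag-1 player cannot, which matches the slow start of the lag-2 model noted above.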
  – How can the communicative behavior of UAVs be adapted to take the limitations of human cognition into account?

Interaction Experiments
• Impact of evolving, interactive communication
  – Vary constraints on the evolution of communication (e.g. fixed vs. adaptive communication channel)
  – Vary constraints on the sharing of communication (e.g. pairwise vs. community communication development)
• Impact of fixed, flexible, or emergent network organization
  – Vary network flexibility (e.g. communication beyond grid)
  – Vary level of information sharing (e.g. information filters)
• Accurate cognitive models for human-machine interaction
  – Adaptive interfaces (e.g. to predicted model workload)
  – Model-based autonomy (e.g. handle monitoring, routine decision-making)

Issue 3: Behavior Abstraction
• First two issues build solutions toward this one
• Study of scaling properties helps capture the response function for all aspects of target behavior
• Abstraction methodology helps iterate and test models at various levels of abstraction to maximize retention
• Issues:
  – Grain scale of components (generic tasks, unit tasks?)
  – Attainable degree of fidelity at each level?
  – Capture individual differences or average, normative behavior?
    • The latter may miss key interaction aspects (outliers)
    • Individual differences as architectural parameters (WM, speed)
    • Use cognitive model to generate data to train a machine learning agent tailored to individual decision makers

ACT-R vs. Neural Network Model
• Neural network model based on the same principles (West, 1998; 1999)
  – Simple 2-layer neural network
  – Localist representation
  – Linear output units
  – Fixed lag of 1 or 2
  – Dynamics arise from the interaction of the two networks
• Network structure (fields) can be mapped to chunk structure (slots)
• ACT-R and network both store game instances (move sequences)
• ACT-R and network are similarly sensitive to game statistics
• Noise plays a more deliberate role in ACT-R than in the neural network

Individual vs. Group Models
• Model of sequence expectation applied to baseball batting
• Key representation and procedures are general, not domain-specific
• Cognitive architecture constrains performance to reproduce all main effects: recency, length of sequence, and sequence ordering
• Variation in performance between subjects can be captured using a straightforward parameterization of perceptual-motor skills
[Bar charts: mean temporal error (msec) by pitch sequence (F, S, F-F, ..., S-S-S) for Subject 1 vs. model and for all subjects vs. scaled model]

Markov Model (Gray, 2001)
• Basic Markov assumption: the current state determines the future
• 2 states: expecting a fast or a slow pitch
• Probabilities of switching state (as, af) and temporal errors when expecting a fast or slow pitch (Tf, Ts) need to be estimated
• 2 more transition rules and associated parameters (ak, ab) to handle pitch count

Markov vs. ACT-R
• State representation
  – Markov has discrete states that represent decisions
  – ACT-R has graded states that reflect the state of memory
• Transition probabilities
  – Markov needs to estimate state transition probabilities
  – ACT-R predicts state change based on a theory of memory
• Pitch count
  – Markov has to adopt additional rules and parameters
  – ACT-R generalizes using its established representation
• ACT-R is more constrained than the Markov model
• Similar results for the backgammon domain:
  – Comparable results to NN and TD-learning with orders of magnitude fewer training instances

Abstraction Experiments
• Impact of representation fidelity
  – Vary degree of model fidelity to determine impact on network dynamics (e.g. high- vs. low-fidelity nodes for specialists vs. generalists)
  – Determine which model aspects are critical to performance
• Impact of skill compositionality
  – Enforce skill composition through a standard, common interface and determine impact on performance
  – Evaluate impact of architectural constructs, including working memory support for multi-tasking
• Relevant computer science concepts
  – Abstract Behavior Types: generalization of abstract data types to temporal streams
  – Aspect-Oriented Programming: generalization to allow more complex procedural interaction
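The two-state Markov model of pitch expectation described above can be sketched as follows. The parameter values are placeholders: as the slide notes, the switching probabilities and temporal errors must be estimated from data, and this sketch omits the extra pitch-count rules (ak, ab).

```python
import random

# Two states: expecting a "fast" or a "slow" pitch. After each pitch the
# batter may switch expectation with a pitch-dependent probability
# (a_f after a fast pitch, a_s after a slow one). The temporal error on a
# pitch depends on the current expectation: T_f when expecting fast,
# T_s when expecting slow. All numeric defaults are illustrative only.
def simulate(pitches, a_f=0.3, a_s=0.4, T_f=20.0, T_s=60.0, seed=0):
    rng = random.Random(seed)
    state = "fast"  # current expectation
    errors = []
    for pitch in pitches:
        # Temporal error determined by the current expectation.
        errors.append(T_f if state == "fast" else T_s)
        # State transition: switch expectation with pitch-dependent probability.
        switch_p = a_f if pitch == "fast" else a_s
        if rng.random() < switch_p:
            state = "slow" if state == "fast" else "fast"
    return errors
```

The contrast with ACT-R is visible even in this sketch: the transition probabilities and error magnitudes are free parameters to be fit, whereas ACT-R's graded memory states make the corresponding predictions fall out of its theory of memory.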