Mapping Descriptive Models of Graph Comprehension
into Requirements for a Computational Architecture:
Need for Supporting Imagery Operations
B. Chandrasekaran and Omkar Lele
Department of Computer Science and Engineering
The Ohio State University
Columbus, OH 43210 USA
[email protected], [email protected]
Abstract. Psychologists have developed many models of graph comprehension,
most of them descriptive, some computational. We map the descriptive models
into requirements for a cognitive architecture that can be used to build predictive computational models. General symbolic architectures such as Act-R and
Soar satisfy these requirements, except for support of the mental imagery operations required by many graph comprehension tasks. We show how Soar
augmented with DRS, our earlier proposal for diagrammatic representation, satisfies many of the requirements, and can be used for modeling the comprehension and use of a graph requiring imagery operations. We identify the need for
better computational models of the perception operations and empirical data on
their timing and error rates before predictive computational models can become
a reality.
Keywords: graph comprehension, cognitive architectures, diagrammatic reasoning, mental imagery.
1 Graph Comprehension Models
In this paper, we investigate the requirements for a cognitive architecture that can
support building computational models of graph comprehension. We start by reviewing research on graph comprehension models that psychologists and cognitive scientists have built over the last three decades.
High-Level Information Processing Accounts. Bertin [1] proposed a semiotics-based
task decomposition that anticipated the later information-processing accounts of
Kosslyn [2] and Pinker [3]. These accounts provide a framework to place the rest of
the research. Because of their information-processing emphasis and consequent
greater relevance to computational modeling, we focus on [2] and [3]. Their proposals
have much in common, though they differ in their emphases and details. They envision a process that produces a representation (“perceptual image” for Kosslyn, and
"visual description" for Pinker) in visual working memory (WM), respecting Gestalt
Laws and constraints such as distortions and discriminability. In [3], the visual
description is an organization of the image into visual objects and groups of objects –
lines, points, regions, and abstract groupings corresponding to clusters. It is not clear
if this visual description is purely symbolic or if it retains shape information, as Kosslyn’s images do, but keeping the shape information is essential, as we shall see.
In both models, the construction of this internal visual representation is initially a
bottom-up process, but soon after, it is a sequential process in which top-down and
bottom-up processes are opportunistically combined: the state of the partial visual
representation at any stage and the agent's goals trigger the retrieval of relevant
knowledge from Long-Term Memory (LTM), which in turn directs further processes.
Pinker’s account of retrieval of goal- and state-relevant information from LTM
uses the idea of "schemas" (or “frames” as they are often called in AI), organized collections of knowledge about the structure of graphs, in general as well as for various
graph types. Comprehension of the specific graph proceeds by filling in the “slots” –
the graph type, the axes, the scales, quantities represented, etc. – in the schemas. The
schema knowledge guides the agent's perception in seeking the information needed to
perform this slot-filling.
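As a concrete illustration of this slot-filling process, the following is a minimal sketch in Python; the class, slot names, and control method are hypothetical constructions of ours, not taken from Pinker's model.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LineGraphSchema:
    """Hypothetical schema for a line graph; comprehension fills the
    slots from perception, and the schema directs perception toward
    whatever is still missing (slot names are our inventions)."""
    graph_type: Optional[str] = None   # e.g., "line", "bar", "pie"
    x_variable: Optional[str] = None   # quantity encoded on the x-axis
    y_variable: Optional[str] = None   # quantity encoded on the y-axis
    x_scale: Optional[tuple] = None    # (min, max) read off the axis
    y_scale: Optional[tuple] = None

    def next_perceptual_goal(self) -> Optional[str]:
        """Top-down control: the next empty slot becomes the goal
        that a perception operation is deployed to satisfy."""
        for slot in ("graph_type", "x_variable", "y_variable",
                     "x_scale", "y_scale"):
            if getattr(self, slot) is None:
                return slot
        return None   # schema fully instantiated
```

A comprehension loop would repeatedly ask the schema for its next empty slot and deploy a perception to fill it, which is the top-down half of the process described above.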
Shah et al. [4]¹ propose that graph comprehension can be divided, at a high level,
into two primary processes – a pattern recognition process followed by a bottom-up
and top-down integrative cognitive process, a division consistent with the
Pinker/Kosslyn accounts. Their research on attention during this process highlights
the role of domain-specific background knowledge in guiding the comprehension
process. Their model includes an account of learning: novice graph users, in contrast
to experts, tend to spend more time understanding the graph structure (how the graph
represents information). In the schema language, learning results in the details of the
general schema being filled in so that only the problem-specific instantiation needs to
take place, considerably speeding up the process.
¹ Due to space limitations, we only cite a subset of relevant papers by many authors. The bibliographies of the cited references contain pointers to other relevant work by the same authors.
Perception. Graphs contain information that is encoded using graphical properties
such as position, size, and angle. The accuracy of the information obtained from the
graph depends on how well humans can decode the encoded information. Cleveland and McGill (see, e.g., [5]) identify a set of “elementary graphical perception
tasks,” that is, tasks which they propose are performed instantaneously with no apparent mental effort. They order them in terms of accuracy: Position along common
scale, Position on identical but non-aligned scales, Length, Angle or Slope (with angle
not close to 0°, 90°, or 180°), Area, Volume, Density, and Color Saturation. Simkin
and Hastie [6] claim that the ordering is not absolute, but changes as the graphs and
conditions change. They further argued that these judgment tasks were not always
instantaneous, but often required a sequence of mental imagery operations such as
anchoring, scanning, projection and superimposition.
Human perception is good at instantly perceiving certain properties, e.g., a 90° angle or
the midpoint of a line segment. Anchoring is the use of such standard perceptions to
estimate properties that are not standard. For example, estimating the distance of a
point p on a line segment from one end of the line can be done by a series of midpoint
perceptions of segments containing p. Relative lengths of segments from p to the two
ends of the line can be estimated by the relative scanning durations. Projection is when a ray is sent out from one point in the image to another point for the purpose of securing an anchor. Superimposition is when one object is mentally moved onto another object, such as when a smaller bar might be mentally moved onto a longer one so that their bottoms align, as part of estimating the ratio of their lengths.
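To make these operations concrete, the following is a minimal sketch of how a model might realize them; the function names and geometric conventions are our illustrative assumptions, not part of the account in [6]. Points are (x, y) pairs.

```python
def anchor(segment, fraction=0.5):
    """Create a mental reference point at a 'standard' position on a
    segment, e.g., fraction=0.5 for the instantly perceived midpoint."""
    (x1, y1), (x2, y2) = segment
    return (x1 + fraction * (x2 - x1), y1 + fraction * (y2 - y1))

def scan(p, q):
    """Mentally sweep from p to q; scanning duration serves as a proxy
    for distance, so return the Euclidean length."""
    return ((q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) ** 0.5

def project_to_y_axis(p):
    """Send a horizontal ray from p to the Y-axis (x = 0) to secure an
    anchor at the imagined intersection."""
    return (0.0, p[1])

def superimpose(obj, target):
    """Mentally translate obj (a list of points) so its first point
    coincides with target, e.g., aligning two bars' bottoms in order
    to compare their lengths."""
    dx, dy = target[0] - obj[0][0], target[1] - obj[0][1]
    return [(x + dx, y + dy) for (x, y) in obj]
```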
Fig. 1. Example from [10]: the user needs to imagine the line extended in order to answer the question
Gillan et al. (see, e.g., [7]) describe a model for graph comprehension in which, in addition to external perception, imagery operations are used, such as rotating and moving image objects and mentally comparing the lengths of lines. Trafton et al. (see, e.g., [8])
draw attention to how the individual steps of information extraction from a representation are integrated to answer a question. Their cognitive integration is functionally
akin to the schema-based comprehension account of Pinker, while their visual integration includes spatial transformations and imagination. Fig. 1 is an example, where the
user has to extend a line in their imagination to answer a question about a value in the
future.
Computational Models. The models discussed so far have been qualitative and descriptive. The two models we discuss in this section, Lohse’s [9] and Peebles and
Cheng’s [10], are both computational and share many features. Both models use a
goal-seeking symbolic architecture with working and long-term memories. Both models assume an agent who already knows how to use the graph, so the early general schema instantiation steps do not need to be modeled. That still leaves many visual search, perception, and information integration tasks that the agent needs to perform. The models share other features as well: neither model has an internal representation of the spatiality of the graphical objects, nor does either use visual imagination, their examples
being simple enough not to need it; and neither of them has a computational version
of external perception, whose results are instead made available to the models as
needed for simulation. Lohse's model could predict the times taken to answer specific
questions based on empirical data on timings for the various operations, and he reported agreement between the predictions and the results, but Foster [11] did not find
such an agreement in his experiments using Lohse’s model.
The research in [10] used ACT-R/PM, a version of ACT-R that has a visual buffer
that can contain the location, but not the full spatiality, of the diagrammatic objects
in the external representation. Procedural and declarative knowledge about using the
graphs is represented in ACT-R’s production rules and propositions. The research
also modeled learning the symbol-variable associations and the location information,
reducing visual search. Predictions from the model of changes in the reaction
times as questions changed matched the human performance qualitatively, but not
quantitatively.
Putting the Models Together. The process that takes place when a user is presented
with a graph along with some questions to answer can be usefully divided into two
stages: one of general comprehension in which the graph type is understood and some
details instantiated, and a second stage in which the specific information acquisition
goals are achieved. Initial perception organizes the image array into visual elements
(Kosslyn, Shah, Trafton), at the end of which working memory contains both symbolic and diagrammatic information, the latter supporting the user's perceptual experience of seeing the graph as a collection of spatial objects in a spatial configuration
(Kosslyn). After this, the two stages share certain things in common: a knowledge-guided process involving both bottom-up and top-down operations performing visual
and cognitive integrations (Pinker, Shah, Trafton). Perceptions invoke (bottom-up)
schema knowledge (Pinker) about graphs and graph types, which in turn guides (top-down) perceptions in seeking information to instantiate the schemas, and the processes are deployed opportunistically. Comprehension initially focuses on identifying
the type of graph and then on instantiating the type: the domain, the variables and
what they represent, the scales, etc. For information extraction, in simple cases, the
user may apply a pre-specified sequence of operations, as in the methods of Lohse
and Peebles. In more complex cases, increasingly open-ended problem solving may
be necessary for visual and cognitive integration, driven by knowledge in LTM.
While some of the perceptions simply require access to the external image and deliver symbolic information to cognition, others require mental imagery operations
(Simkin, Gillan, Trafton), such as imagining one or more of the image objects as
moved, rotated, or extended, and elements combined or abstracted. Relational perception may be applied to objects some of which are the results of imagery operations. Perception may be instantaneous, or involve visual problem solving.
2 Requirements on a Cognitive Architecture to Support Modeling
Graph Comprehension and Use
The family of symbolic general architectures, with Soar and Act-R as the two best-known
members, has the architectural features required for building computational models of
graph comprehension, with the important exception of supporting mental imagery
operations.
Supporting goal-directed problem solving. The requirements for cognitive integration
– combining bottom-up and top-down activities, representation of schema knowledge,
instantiating the schema to build comprehension models, and using schema-encoded
information acquisition strategies by deploying appropriate perceptions – can be handled by the architectural features of Act-R and Soar. Schemas are just a higher level
of knowledge organization than rules and declarative sentences, and knowledge representation formalisms in Act-R and Soar can be used to implement the schemas. Both
architectures have control structures that can produce a combination of bottom-up
(information from perception triggering retrieval of new knowledge) and top-down
(newly retrieved knowledge creating new perception goals) behaviors. Appropriate
knowledge can produce needed cognitive integration. Act-R and Soar also support
certain types of learning, with Act-R providing more learning mechanisms than Soar.
The available mechanisms can be used to capture some of the observed learning phenomena [4], as demonstrated in [10].
Supporting Imagery Operations. For imagery operations, the architectures need a
working memory component whose representation functionally organizes the external or imagined scene as a spatial configuration of spatial objects, tagging which objects come from the external representation and which belong to imagery.
Operations to create imagery objects and configurations should be available: imagery
elements may be added afresh, or may be a result of operations such as moving,
rotating, or modifying existing objects so that they satisfy certain spatial constraints. Diagrammatic perception operations, by which we mean relational and property perceptions after figure-ground separation, are applied to configurations of diagrammatic objects,
whether the objects correspond to external objects, imagined objects or a combination. There may also be benefits to having some of the perception operators be always
active without the need for cognition to specifically deploy them, so that a certain
amount of bottom-up perception is always available. Such bottom-up perceptions can
be especially useful in early stages.
Treating Perceptions as Primitives vs. Modeling the Computational Details. The cognitive mechanisms of Act-R and Soar, especially the former, derive validation to a greater or lesser degree from both human problem-solving experiments and neuroimaging studies. However, there is not much in the way of detailed computational models for perception and mental imagery operations, especially ones that
would replicate timing and error data. Such computational models would have a role
for pre-attentive perception as well, e.g., to explain when certain perceptions are instantaneous and when they require extended visual problem solving. Because of the
lack of computational models, one approach is to treat the internal perception and
imagery operators as primitives and simply program them without concern about the
fidelity of their implementation with respect to human abilities. Models built in this
way will be good for certain purposes, e.g., studying the effect on the agent's behavior of the availability or absence of specific pieces of knowledge and strategies, but not for others, e.g., predicting timing data or modeling perceptual learning. It should be a goal of the
modeling community to develop computational models of perception and imagery
operations that account for human performance, including pre-attentive phenomena.
Augmenting Architectures with DRS – A Diagrammatic Representation System. We
propose that the DRS representation and the associated perception and action routines
reported in [12] provide the basis for augmenting the architectures in the symbolic
family with an imagery capability. The DRS system as it exists only supports diagrams composed of points, curves and regions as elements, which happens to cover
most graph types. DRS is a framework for representing the spatiality of the diagrammatic objects that result after the early stage of perception has performed a
figure-ground separation. DRS is a list of internal symbols for these objects, along
with a specification of their spatiality, the intended representation as a point, curve or
region, and any explicit labeling of the objects in the external diagram. DRS can also
be used to represent diagrammatic objects in mental images, or a combination from
external representation and internal images, while also keeping track of each object's
provenance. A DRS representation may have a hierarchical structure, to represent any
component objects of an object.
The DRS system comes with an open-ended set of imagery and perception operations. The imagery operations can move, rotate or delete elements of a DRS representation, and add DRS elements satisfying certain constraints, to produce new DRS
structures. Relational perception operations can be defined between elements of a
DRS representation: e.g., Longer(C1, C2), Inside(R1, R2). Operators are also available
to detect emergent objects when diagrammatic objects overlap or intersect. Kurup
[13] has built biSoar, a bi-modal architecture, in which DRS is integrated with Soar.
Matessa [14] has built Act-R models that perform diagrammatic reasoning by integrating DRS with Act-R.
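For concreteness, here is a minimal sketch of a DRS-like structure and two relational perceptions, written from the description in [12]; the names and the geometric approximations (polyline arc length, bounding boxes) are ours, not the actual DRS implementation.

```python
from dataclasses import dataclass, field

@dataclass
class DRSObject:
    """A DRS-like element: an internal symbol with its spatiality,
    intended kind, provenance, and optional component objects."""
    symbol: str                  # internal symbol, e.g., "C1"
    kind: str                    # "point", "curve", or "region"
    spatiality: list             # e.g., (x, y) samples along a curve
    external: bool = True        # False for objects created by imagery
    label: str = ""              # explicit label in the external diagram
    parts: list = field(default_factory=list)  # hierarchical components

def longer(c1: DRSObject, c2: DRSObject) -> bool:
    """Relational perception Longer(C1, C2), approximated here by
    comparing polyline arc lengths."""
    def arc_length(c):
        pts = c.spatiality
        return sum(((bx - ax) ** 2 + (by - ay) ** 2) ** 0.5
                   for (ax, ay), (bx, by) in zip(pts, pts[1:]))
    return arc_length(c1) > arc_length(c2)

def inside(r1: DRSObject, r2: DRSObject) -> bool:
    """Relational perception Inside(R1, R2), crudely approximated by
    bounding-box containment over the regions' boundary samples."""
    def bbox(r):
        xs = [x for x, _ in r.spatiality]
        ys = [y for _, y in r.spatiality]
        return min(xs), min(ys), max(xs), max(ys)
    ax1, ay1, ax2, ay2 = bbox(r1)
    bx1, by1, bx2, by2 = bbox(r2)
    return bx1 <= ax1 and by1 <= ay1 and ax2 <= bx2 and ay2 <= by2
```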
3 Building a Computational Model Using DRS
In this section, we will show the functional adequacy of a general symbolic cognitive
architecture augmented with DRS to build a computational model for graph comprehension. We use biSoar, but we could have used Act-R plus DRS as well. We implemented the scanning, projection, anchoring and superimposition operators [6], the last
three being imagery operations (that is, they create objects in WM corresponding to
imaged objects). We treat scanning as instantaneous only for obvious judgments such
as 50%. If the proportion is, say, 70%, the agent we model would perform a recursive mid-point algorithm until an estimate within a specified tolerance is reached.
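A minimal sketch of this recursive mid-point estimation, assuming each midpoint perception is instantaneous and counts as one primitive step (all names are ours):

```python
def estimate_fraction(lo, hi, target, tol=0.05):
    """Estimate where target falls on the segment [lo, hi] as a
    fraction of its length, using only 'instantaneous' midpoint
    perceptions. Returns (fraction, perception_count) so a model
    could charge time per primitive perception."""
    left, right = 0.0, 1.0        # current interval, as fractions
    perceptions = 0
    while (right - left) > tol:
        mid = (left + right) / 2
        mid_point = lo + mid * (hi - lo)   # one midpoint perception
        perceptions += 1
        if target < mid_point:
            right = mid
        else:
            left = mid
    return (left + right) / 2, perceptions

# e.g., a point 70% of the way along a unit segment:
# estimate_fraction(0.0, 1.0, 0.7) -> (0.703125, 5)
```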
While we modeled several graph usage tasks, we will use the model for the graph
in Fig. 1, where a line needs to be mentally extended to answer a question about the future: “What might the Y value be for x = 4?”. The model starts with a DRS representation corresponding to the figure-ground separated version of the external representation. This is what perception would deliver to cognition. In this example, the
DRS consisted of the curves for the axes, the curve for the x-y function, the scale
points, and the point for the origin. For convenience in simulation, the entire DRS
representation is not in WM; rather, it is kept separately as a proxy for the intensity
array version of the external representation. Depending on attention, parts of this DRS
will be in WM, along with diagrammatic objects resulting from imagery operations.
Certain initial perceptions are automatically deployed at the beginning, that is, without any specific problem solving goal in mind. These initial perceptions are intended
to model what a person familiar with the graph domain might notice when first looking at a graph, such as intersecting horizontal and vertical lines. These will serve as
cues to retrieve the appropriate schemas from LTM, in this case the schema for a
graph of a function in Cartesian coordinates. This schema sets up perceptions to identify the axes, the scale markers, the origin, the functional curve, and the variables.
The schema also has procedures to answer classes of questions, such as finding the Y value for a given X, performing trend estimates, and using those trends to infer Y values for ranges of X not covered by the existing graph. The extension of the graph is now
imagined and added to the DRS. The general procedure is instantiated to call for the
perception of the point on the extension that corresponds to x = 4, which in turn calls for a vertical line at x = 4 to be imagined and an anchor point mentally created, then a projection to be drawn to the Y-axis and another anchor created on the Y-axis, and finally the value to be determined. The representational capabilities of
biSoar and the associated perceptions and imagery operations were adequate for all
steps of the process.
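The sequence just described can be summarized as straight-line code; in this sketch the arithmetic stands in for biSoar's perception and imagery operators, whose actual names and interfaces differ.

```python
def answer_y_for_future_x(curve_points, x_query=4.0):
    """Sketch of the imagery/perception sequence for the question in
    Fig. 1, 'What might the Y value be for x = 4?'. curve_points is
    the DRS spatiality of the function curve; the helper arithmetic
    is a stand-in for the biSoar operators."""
    # Imagery: extend the curve linearly past its last point; the
    # imagined extension is added to the DRS in working memory.
    (x1, y1), (x2, y2) = curve_points[-2], curve_points[-1]
    slope = (y2 - y1) / (x2 - x1)

    # Imagery + perception: imagine a vertical line at x = x_query and
    # perceive its intersection with the extension (an anchor point).
    anchor_point = (x_query, y2 + slope * (x_query - x2))

    # Imagery: project a horizontal ray from the anchor to the Y-axis;
    # perception: read the value at the new Y-axis anchor.
    return anchor_point[1]

# e.g., answer_y_for_future_x([(1, 2.0), (2, 3.0), (3, 4.1)]) -> ~5.2
```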
The above model, and others we have built, display all the phenomena identified in
our review: visual image in WM, visual descriptions built guided by graph schema
knowledge in LTM, bottom-up and top-down processing for cognitive integration
(Shah, Trafton), goal-driven problem solving, and the use of imagery operations in
cognition (Simkin, Gillan, Trafton). While the level of modeling we described can be
useful to investigate the role of different pieces of knowledge and certain types of
learning, the true usefulness of such models is the potential to predict timing and error
rates in the use of graphs, so that proposed graph designs can be evaluated. For this
we need human performance data, and, even better, computational models that
reproduce human performance, on a variety of perceptions and imagery operations
required for graph use. Empirical research of this sort, and a deeper understanding of the underlying perceptual mechanisms, are needed.
Acknowledgments. This research was supported by the Advanced Decision Architectures Collaborative Technology Alliance sponsored by the U.S. Army Research
Laboratory under Cooperative Agreement DAAD19-01-2-0009.
References
[1] Bertin, J.: Semiology of Graphics. University of Wisconsin Press, Madison (1983)
[2] Kosslyn, S.: Understanding charts and graphs. Applied Cognitive Psychology 3, 185–226
(1989)
[3] Pinker, S.: A theory of graph comprehension. In: Freedle, R. (ed.) Artificial Intelligence and the Future of Testing, pp. 73–126. Erlbaum, Hillsdale (1990)
[4] Shah, P., Freedman, E.: Toward a model of knowledge-based graph comprehension. In:
Hegarty, M., Meyer, B., Narayanan, H. (eds.) Diagrammatic Representation and Inference,
pp. 18–31. Springer, Berlin (2002)
[5] Cleveland, W.S., McGill, R.: Graphical perception and graphical methods for analyzing
scientific data. Science 229(4716), 828–833 (1985)
[6] Simkin, D., Hastie, R.: An information-processing analysis of graph perception. Journal
of the American Statistical Association 82(398), 454–465 (1987)
[7] Gillan, D.J.: A Componential model of human interaction with graphs: VII. A Review of
the Mixed Arithmetic-Perceptual Model. In: Proceedings of the Human Factors and Ergonomics Society 52nd Annual Meeting, pp. 829–833 (2009)
[8] Trafton, J.G., Ratwani, R.M., Boehm-Davis, D.A.: Thinking graphically: extracting local
and global information. Journal of Experimental Psychology: Applied 14(1), 36–49 (2008)
[9] Lohse, J.: A cognitive model for the perception and understanding of graphs. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems: Reaching
Through Technology, New Orleans, Louisiana, pp. 137–144 (1991)
[10] Peebles, D., Cheng, P.C.: Modeling the effect of task and graphical representation on response latency in a graph reading task. Human Factors 45(1), 28–46 (2003)
[11] Foster, M.E.: Evaluating models of visual comprehension. In: Proceedings of Eurocogsci
2003: The European Cognitive Science Conference 2003. Erlbaum, Mahwah (2003)
[12] Chandrasekaran, B., Kurup, U., Banerjee, B., Josephson, J.R., Winkler, R.: An architecture for problem solving with diagrams. In: Blackwell, A.F., Marriott, K., Shimojima, A.
(eds.) Diagrams 2004. LNCS (LNAI), vol. 2980, pp. 151–165. Springer, Heidelberg
(2004)
[13] Kurup, U.: Design and Use of a Bimodal Cognitive Architecture for Diagrammatic Reasoning and Cognitive Modeling. Ph.D. Thesis, The Ohio State University, Columbus, OH (2007)
[14] Matessa, M., Archer, R., Mui, R.: Dynamic Spatial Reasoning Capability in a Graphical
Interface Evaluation Tool. In: Proc. 8th International Conference on Cognitive Modeling,
Ann Arbor, MI, USA, pp. 55–59 (2007)