Running Head: Conceptual Flexibility

The Costs of Supervised Classification: The Effect of Learning Task on Conceptual Flexibility

Aaron B. Hoffman
Department of Psychology
University of Texas at Austin
Austin, TX

Bob Rehder
Department of Psychology
New York University
New York, NY

Send all correspondence to:
Aaron Hoffman
The University of Texas at Austin
1 University Station A8000
Austin, Texas 78712-0187
Email: [email protected]

Abstract

Research has shown that learning a concept via standard supervised classification leads to a focus on diagnostic features, whereas learning by inferring missing features promotes the acquisition of within-category information. Accordingly, we predicted that classification learning would produce a deficit in people’s ability to draw novel contrasts (distinctions that were not part of training) as compared to feature inference learning. Two experiments confirmed that classification learners were at a disadvantage at making novel distinctions. Eye movement data indicated that this conceptual inflexibility arose due to (a) a narrower attention profile that reduces the encoding of many category features and (b) learned inattention that inhibits the reallocation of attention to newly relevant information. Implications of these costs of supervised classification learning for views of conceptual structure are discussed.

The Costs of Supervised Classification: The Effect of Learning Task on Conceptual Flexibility

Two assumptions have guided the study of concept learning ever since Hull (1920). The first is that category learning amounts to learning to provide a common label for sets of objects. This assumption is explicit in the field’s standard supervised classification task, in which people label a series of visually presented stimuli. Learning is supervised, as experimenters wait for a response and correct the learner after every trial. Additional stimuli are often tested to assess generalization patterns.
Experimenters vary the structure of the categories being learned (i.e., the distribution of features within and between categories) to establish the success of one model over another (e.g., that a model that abstracts the category prototype is superior to one that memorizes the category exemplars or vice versa). One influential example of this kind of work is Shepard, Hovland, and Jenkins (1961), who studied the difficulty of learning to classify eight exemplars into two categories for different category structures. Over the years, research has followed Shepard et al.'s lead by teaching people to group objects into a small number of arbitrary sets (usually two) and asking what kinds of representations are formed.

However, there are many other tasks besides supervised classification that require concepts. Inferring hidden properties of a plant (e.g., edible or not), using everyday objects (e.g., setting an alarm clock), and learning without an explicit teacher (unsupervised learning) are a few examples of tasks that likely contribute to what we know about concepts. Even so, in spite of the conceptual nature of most real-world tasks, category learning research has not (yet) accounted for the effects of those tasks on concept formation. In fact, there have been very few studies examining how concepts arise from different kinds of learning tasks (but see Inference versus Classification below), and it is important to know whether representations formed in one learning task generalize to others.

A second assumption in category learning research is that categories learned in one context transfer to another. Consider the goal of distinguishing rose bushes from raspberry bushes. If the most diagnostic feature is the presence of berries, then the berry feature should receive the most attention, since both plants have thorns (Nosofsky, 1984; Rehder & Hoffman, 2005a; Shepard et al., 1961).
However, when one must later distinguish raspberry from cranberry bushes, thorns are suddenly diagnostic, because while both have red berries, only the raspberry bush has thorns. In general, when different categories of objects are compared, which sets of features are diagnostic and to what degree depends on the specific categories being contrasted. As we will show in the section below, The Attention Optimization-Flexibility Tradeoff, people’s ability to transfer knowledge across contexts requires a level of flexibility that seems to be at odds with findings from the supervised classification task and the models that have been developed to account for those findings.

The central reason that current learning models may fail to yield flexible concepts lies in the sort of attentional learning mechanisms they assume. The mechanisms of attention in many computational models of category learning (Kruschke, 1992; Erickson & Kruschke, 1998; Kruschke & Johansen, 1999) suggest that people learn to attend to only the information needed to distinguish the (usually two) categories being acquired. Although this leads to efficient learning, after learning one classification in which, say, cue A is most diagnostic, people should have trouble learning a second classification in which B is the good cue, because prior classifications have taught people to ignore it (Kruschke, 1996a; Kruschke, Kappenman, & Hetrick, 2005). Thus, while people don’t seem to have trouble classifying across different real-life contexts, our models suggest that they should.

How do we resolve the mismatch between the flexibility of real-life categorization and the rigidity of the categories acquired via experimental classification tasks? The hypothesis we explore here is that people may acquire many categories in ways other than supervised classification, and that doing so provides them with the flexibility needed to use their conceptual knowledge in different situations.
Our goal, then, is to explore the relationship between learning task and the flexibility of conceptual representations. We will accomplish this by comparing the standard classification task with a feature inference task (Yamauchi & Markman, 1998) along two points of comparison. The first is learners’ flexibility in applying knowledge about previously learned categories to novel contrasts. The second is how different tasks lead to different patterns of attention allocation throughout learning. In the following section we review evidence that human learners produce what we consider an optimal, but less flexible, attention profile in the standard classification task. We then review findings from inference learning studies and suggest that inference learning may lead to less optimal but more flexible category representations. Because our arguments hinge on the idea that classification and inference will produce different attention profiles, subjects’ eye movements were tracked while they learned categories to obtain an online measure of attention. Because the use of eye tracking in concept learning research is still relatively new, the final section preceding our two experiments reviews relevant work.

The Attention Optimization-Flexibility Tradeoff

That people can learn to attend optimally to stimulus dimensions underscores how classification performance and attention are linked: By attending optimally, a learner maximizes performance. It is generally accepted that people will attend to the most relevant dimensions when learning categories via supervised classification (Anderson, Ross, & Chin-Parker, 2002; Nosofsky, 1984). Selective attention during category learning was first proposed by Shepard et al. (1961), who discovered that standard principles of stimulus generalization alone could not account for the relative difficulty of learning categories.
They tested the relative difficulty of learning six category structures with one, two, or three dimensions required to classify the stimuli and found that learning difficulty was correlated with the number of relevant dimensions: Discriminations that required the use of one dimension were easier to learn than those requiring two, which were in turn easier than those requiring three. These results were explained by assuming that people were attending to the relevant dimension(s) and ignoring the rest. Other studies (e.g., Kruschke, 1993) have shown that tasks with opportunities to use fewer dimensions (filtration tasks) are easier than tasks requiring more dimensions (condensation tasks). The intuition is simple: less to process, less to learn. According to this view, selective attention is a simplifying mechanism, which people use to reduce the difficulty of learning (also see Haider & Frensch, 1999; Lee & Anderson, 2001).

In response to such results, categorization models have incorporated the assumption that people will eventually optimize their attention to the most relevant dimensions. For example, the well-known exemplar-based classification model, the generalized context model (GCM), assumes “…that people will distribute attention among the component dimensions so as to optimize performance in a given categorization paradigm. That is, it is assumed that the [attention] parameters will tend toward those values that maximize the average percentage of correct categorizations.” (Nosofsky, 1986, p. 41)

In the GCM and related models (e.g., ALCOVE; Kruschke, 1992), as well as in prototype variants (Nosofsky, 1992; Smith & Minda, 1998), attention changes the influence of dimensions. Increasing attention to relevant dimensions that predict the category label, and decreasing attention to irrelevant dimensions that do not, improves classification performance by exaggerating differences between items of opposite categories.
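As an illustration of how attention weights exaggerate between-category differences, the following is a minimal sketch of attention-weighted similarity in the form usually given for the GCM (weighted distance, exponentially decaying similarity, and a summed-similarity choice rule). The stimulus codes, the sensitivity value c = 2.0, and all function names are assumptions chosen for illustration; they are not taken from the studies cited above.

```python
import math

# Hypothetical two-category structure: only the first dimension is diagnostic.
EXEMPLARS = {
    "A": [(1, 0, 1), (1, 1, 1)],
    "B": [(0, 0, 1), (0, 1, 1)],
}


def weighted_distance(x, y, weights, r=1):
    """Attention-weighted Minkowski distance (r=1 gives city-block distance)."""
    total = sum(w * abs(a - b) ** r for w, a, b in zip(weights, x, y))
    return total ** (1 / r)


def similarity(x, y, weights, c=2.0):
    """Similarity decays exponentially with attention-weighted distance."""
    return math.exp(-c * weighted_distance(x, y, weights))


def prob_category(probe, target, weights, c=2.0):
    """Summed-similarity choice rule: evidence for target over total evidence."""
    evidence = {
        label: sum(similarity(probe, item, weights, c) for item in items)
        for label, items in EXEMPLARS.items()
    }
    return evidence[target] / sum(evidence.values())


equal_attention = (1 / 3, 1 / 3, 1 / 3)
optimized_attention = (1.0, 0.0, 0.0)  # all attention on the diagnostic dimension

probe = (1, 0, 1)  # a member of category A
p_equal = prob_category(probe, "A", equal_attention)
p_optimized = prob_category(probe, "A", optimized_attention)
print(f"P(A | equal attention)     = {p_equal:.3f}")
print(f"P(A | optimized attention) = {p_optimized:.3f}")
```

Under these assumptions, shifting all attention to the diagnostic dimension raises the probability of a correct response. The same weights, however, leave the third dimension with no influence on similarity at all.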
Whether people will eventually learn to attend optimally when learning concepts via classification training is not in question. But does the classification task on its own result in representations that are similar to those of real-world concepts? This question arises from a problem we have observed with the notion of attention optimization. To continue the above quote, “The distribution of attention weights that optimizes performance will depend on the particular category structure under investigation.” (Nosofsky, 1986, p. 41) In other words, optimizing attention for one category contrast is not always optimal for another. This implies that the attention profile learned to improve classification performance in one context might actually reduce performance in another. For example, when people learn to distinguish between possums and raccoons in one context, and between cats and dogs in another, the attention profiles acquired may reduce their ability to distinguish between possums and cats. Because irrelevant dimensions are ignored for one set of category contrasts, the learner has to re-attend to (and learn about) those dimensions when familiar categories are contrasted in novel ways. In this manner, the heralded powers of selective attention assumed by present theories may have a hidden cost when previously irrelevant dimensions become relevant.

Inference versus Classification

The question of how attention profiles transfer to novel category contrasts may be less of a concern with other learning modes. Other tasks, where the learner is more focused on the properties of categories, may yield more flexible representations that readily support novel contrasts. Research that has expanded the array of tasks that test concept acquisition theories (Markman & Ross, 2003; Markman, Yamauchi, & Makin, 1997) led us to consider a task that may produce flexible conceptual representations.
Whereas classification involves predicting the category label from features, feature inference learning involves predicting a missing feature from other features and the category label. So rather than determining that a plant is a raspberry bush because it has thorns and berries, the inference task asks learners to determine whether a raspberry bush has thorns, or some other property.

The feature inference task has received a substantial amount of recent attention because there is evidence that classification and inference learners process and represent conceptual information differently. For example, Chin-Parker and Ross (2002) found that classification learners acquired the expected sensitivity to the features that distinguish between categories. When categories were learned via inference, however, subjects became especially sensitive to within-category correlations of features. That is, when learning via inference, people were not only learning the relationship between berries and the label “raspberry bush,” but they also noticed the correlation between the types of leaves and the presence of berries, or between the size of the thorns and the shape of the leaves (also see Anderson & Fincham, 1996). Similarly, Chin-Parker and Ross (2004) manipulated the diagnosticity and prototypicality of feature dimensions and found that classification tasks caused learners to become sensitive to just diagnostic features, whereas inference learners were also sensitive to nondiagnostic prototypical features (also see Anderson, Ross, & Chin-Parker, 2002; Sakamoto & Love, 2006; Yamauchi & Markman, 2000a, 2000b). Finally, Yamauchi and Markman (1998) found that inference learners were more likely than classification learners to infer prototypical category features, and they were less likely to infer features associated with actual training items. Moreover, inference learners were able to learn linearly separable category structures in fewer trials than classification learners.
Thus, although the inference and classification tasks have been described as formally identical (Anderson, 1991), the nature of category representations may differ depending on the nature of the learning task.

This evidence suggests that whereas classification learning may foster attention to the diagnostic dimensions that serve to distinguish categories, inference learning may focus categorizers on within-category information. Our hypothesis is that because the within-category information acquired by inference learners is not tied to any particular set of contrast categories, such knowledge yields a more general and flexible representation. As a consequence, with respect to novel contrasts, inference learners may be at an advantage over classification learners.

To test this hypothesis, we asked people to learn the four categories A, B, C, and D in Table 1. Each category possesses three binary dimensions whose values are designated as 0 or 1. These categories were acquired in two learning phases. Participants were first presented with examples from categories A and B and later with examples from categories C and D. We manipulated whether those categories were learned via supervised classification (e.g., an exemplar from either category A or B is presented and the subject must respond with “A” or “B”) or feature inference (e.g., an exemplar labeled an A or B is presented and the subject predicts a missing feature). Following the two learning phases, participants were presented with test items requiring them to make a novel contrast, that is, to designate an item as either A or C, A or D, B or C, or B or D. We predict that, because they will attend selectively, classification learners will perform poorly on these novel contrasts. Because they will attend to dimension 1 while learning As and Bs, and to dimension 2 while learning Cs and Ds, they will learn to ignore dimension 3, which is required to distinguish As and Cs.
In contrast, because inference learners will acquire within-category information about all three dimensions, they should be successful on the novel contrasts. Performance on the novel contrasts will speak to this article’s central question: How do people acquire concepts so they can be used flexibly in many category contrasts? Results showing that supervised classification learning produces relatively inflexible representations will highlight the need to augment that sort of learning with other kinds of learning, like inference.

An additional goal of this paper is to uncover why inference training produces a different conceptual representation. To answer the question of whether people attend differently in inference and classification tasks, we used eye tracking as a measure of category learners’ online attention allocation.

Using Eye Fixations to Measure Attention

The use of eye fixations as a measure of attention has enjoyed success in numerous research areas. There are many sources of evidence linking attention with eye movements, each underscoring how hard it is to fixate one part of our environment while something significant is occurring elsewhere. From the neuroscience literature, Moore and Armstrong (2003) demonstrated that the strength of the percept-enhancing effects of attention in perceptual brain areas (V4) is linked to retinotopic stimulation of frontal eye fields. Shepard, Findlay, and Hockey (1986) have demonstrated that although attending without making corresponding eye movements is possible, it is not possible to make an eye movement without shifting attention. More generally, the close link between attention and eye movements has been shown across a variety of cognitive tasks (see Liversedge & Findlay, 2000, and Rayner, 1998, for reviews). Mechanisms of attention are neurally and behaviorally coupled with eye movements.

There have also been demonstrations that eye movements can be a good measure of attention in learning experiments.
For example, Kruschke et al. (2005) used eye tracking to contrast Kruschke’s (2003) theory of attentional learning with the Rescorla-Wagner model, two learning theories with different explanations of the well-known blocking phenomenon. Whereas Rescorla-Wagner proposes that nothing is learned about a blocked cue, attentional learning theory proposes that people indeed learn something: that the cue should be ignored. Using eye tracking, Kruschke et al. found that people fixated the blocked cue less than nonblocked cues, supporting the idea that learned inattention is the superior explanation for blocking. This finding corroborated Kruschke and Blair’s (2000) discovery that people are slower to associate the blocked cue to a new outcome, presumably because they have stopped attending to it.

Finally, eye tracking has also been applied to the specific domain of category learning. For example, Rehder and Hoffman (2005a) used eye tracking to test the attentional assumptions of Nosofsky, Palmeri, and McKinley’s (1994) rule-plus-exception (RULEX) model against those of Kruschke’s (1992) exemplar-based ALCOVE model. Testing the Shepard et al. (1961) category structures, Rehder and Hoffman found that most people fixated all dimensions at the onset and then decreased the number of dimensions fixated to only the relevant ones, a result predicted by ALCOVE but not RULEX (and directly confirming Shepard et al.'s conjecture that learners optimize their attention as part of concept acquisition). In another study, Rehder and Hoffman (2005b) replicated Medin and Schaffer’s (1978) 5-4 category structure with an eye tracker and found that fixation times to stimulus dimensions matched the decision weights estimated from behavioral responses by an exemplar model (Nosofsky’s, 1984, 1986, generalized context model) but not by a prototype model (Reed, 1972).

As illustrated by this review, eye tracking is considered a good measure of learners’ selective attention in a variety of cognitive tasks.
The current study follows this work by measuring eye movements as people learn novel categories through classification or feature inference. We use these data both to investigate how inference learners acquire more within-category information and to assess learners’ attention profiles as they make novel category contrasts (an entirely new test of learners’ conceptual knowledge). Such attention profiles will inform when and how learners apply category knowledge in a flexible manner.

Experiment 1

Across two training phases participants learned about the categories A, B, C, and D in Table 1 via inference or classification while we monitored their eye fixations to the three feature dimensions and the category label. A test phase examined classification performance and attention profiles as people made novel category contrasts. Following prior research, we expected classification subjects to learn to ignore the irrelevant dimensions during training and that such attention optimization should correspond to good classification performance on the trained contrasts, but a difficulty in making novel classifications. In contrast, prior research has demonstrated a tendency for inference learners to acquire within-category information, suggesting a general motivation to learn about all the dimensions in the inference task. Such motivation can potentially produce flexible category representations, that is, ones that support novel contrasts. Measuring eye movements during training will help explain differences between learning conditions.

Method

Participants. Twenty-four New York University students participated for course credit. They were tested individually and assigned randomly, in equal numbers, to standard classification or to an inference task. Subjects were also assigned to one of six counterbalancing conditions corresponding to different ways of distributing the three dimensions to screen locations.

Materials.
Subjects learned categories of “ceremonial symbols.” Stimuli were generated from pairs of dimensions that had been pretested for approximately equal discrimination times (Hoffman, 2008). The features of the symbols were 2 degrees of visual angle in diameter. Each feature was separated from the center of the CRT by 4 degrees. Examples are shown in Figure 1. The top left of each symbol contained the category label. The other locations contained features.

The eye tracker was an SMI Eyelink I with a 1/4 to 1/2 degree of visual angle margin of error and 250 Hz temporal resolution. Participants had normal or corrected-to-normal vision. The eye tracker produced a rectangular gaze-contingent (GC) window of 4 x 4 degrees of visual angle centered on subjects’ gaze, so that a feature in the display was visible only while it fell within this window; otherwise the feature was jumbled. Gaze contingency ensured that subjects could only extract feature information by fixating its location. Figure 1D shows a jumbled stimulus that was presented to the subjects when they weren’t looking at a stimulus feature. The jumbling was intended to prevent any use of peripheral vision to extract stimulus information, ensuring that when subjects were not fixating a location, they could not use the information in that location.

Classification Task 1: A versus B. Table 1 presents a three-dimensional structure with categories A, B, C, and D. Subjects were trained on these categories using different contrasts. First, categorizers learned to contrast As versus Bs. To classify As and Bs, they needed to use dimension 1, in which feature-value 1 predicted category A, and 0 predicted category B. Dimension 2 was irrelevant, with 1s and 0s occurring in each category equally. Dimension 3 contained a 1 for all category A and B members, so it also could not be used to discriminate A from B. Figure 2 shows the experimental procedure.
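The gaze-contingent display described in the Materials can be sketched as a simple visibility test. This is a minimal illustration, not the eye tracker's actual software: the screen coordinates, the location names, and the choice to place the three features on the diagonals are assumptions; only the 4 x 4 degree window and the 4 degree eccentricity come from the text. Testing whether a feature's center falls inside a window centered on the gaze is equivalent to testing whether the gaze falls inside a same-sized window centered on the feature.

```python
# Hypothetical layout in degrees of visual angle, origin at screen center.
# Each feature center is 4 degrees from the center; placing the features on
# the diagonals is an assumption. The category label location (top left) is
# omitted because the text does not say whether it was masked.
OFFSET = 4 / 2 ** 0.5
FEATURE_LOCATIONS = {
    "top_right": (OFFSET, OFFSET),
    "bottom_left": (-OFFSET, -OFFSET),
    "bottom_right": (OFFSET, -OFFSET),
}
WINDOW_DEG = 4.0  # width and height of the gaze-contingent window


def is_visible(gaze, location, window=WINDOW_DEG):
    """True when the location's center lies inside the window around the gaze."""
    half = window / 2
    return (abs(gaze[0] - location[0]) <= half
            and abs(gaze[1] - location[1]) <= half)


def display_state(gaze):
    """Map each feature location to 'visible' or 'jumbled' for a gaze sample."""
    return {
        name: "visible" if is_visible(gaze, pos) else "jumbled"
        for name, pos in FEATURE_LOCATIONS.items()
    }


# Fixating one feature reveals only that feature.
print(display_state(FEATURE_LOCATIONS["top_right"]))
# Fixating the screen center reveals nothing under this layout.
print(display_state((0.0, 0.0)))
```

Under this assumed layout no two features can be visible at once, which matches the stated purpose of the jumbling: information at a location is available only while that location is fixated.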
Before each trial, we presented a drift correction, in which the subject fixated the point in the center to display the stimulus. In the classification trials, the top left of the stimulus contained category labels. For example, “A B” meant that the subject must decide whether the item is an A or a B, the order indicating that the left button corresponds to A and the right button to B. The button assignments varied randomly from trial to trial. There was no response deadline for classifying. However, immediately after a response, the category location was replaced with the correct category label, accompanied by a chime for a correct response or a buzzer for an incorrect one. The complete stimulus with the correct category remained on screen for 4 s. Each subject made classifications with both label orderings, which dissociated buttons and labels. The two unique category A items and two unique category B items were presented six times each, in random order (24 trials), with half of the items having the A B label order and the other half the B A label order. Training continued for five blocks.

Classification Task 2: C versus D. After classifying As and Bs, subjects in the classification condition learned a second contrast, between Cs and Ds. As Table 1 shows, this contrast required use of dimension 2, with 1s predicting Cs, and 0s Ds. Dimensions 1 and 3 are irrelevant. Thus, the task was identical to the A versus B task, but with the relevant dimension switched. Note that whereas Classification Task 1 had five blocks, Task 2 had four. The additional block in the first task was intended to allow learners to acclimate to the procedure.

Inference Task 1: Category A and B. The inference condition was nearly identical to the classification condition. Like classification learners, inference learners first learned about categories A and B. However, instead of predicting the category label (i.e., classifying), inference learners predicted missing features.
Figure 2 also provides an example inference trial. The bottom left of the stimulus contained a feature option, indicating that the subject must decide which feature belongs there. The relative position of the feature options indicated which button to press for each option. The left button selected the feature on the left, and the right button selected the feature on the right. Unlike the classification task, perfect performance on the inference task was unattainable. As Table 1 shows, when predicting missing features in As and Bs one should infer a 1 on dimension 1 for an A and a 0 for a B; for dimension 3 one should always predict a 1. But predictions on dimension 2 must be at chance because neither the category label nor the other feature dimensions provide any information about the correct value on that dimension. Inference learning on categories A and B lasted for five blocks. In each block, each of the four category exemplars was presented six times. Every exemplar was presented with each of the three dimensions queried twice (once for each feature order). Exemplars were presented in random order, for a total of 24 trials per block.

Inference Task 2: Category C and D. Inference learning continued with the second set of categories in Table 1 for four blocks. As for the first inference task, perfect performance is attainable on only two of the three dimensions.

Switch Task. After the first two tasks, both classification and inference subjects were presented with classification trials involving contrasts between categories that were unpaired during training. For example, they were presented with a member of category A or C and asked to classify it into the correct category. Other novel contrasts involved category B versus C, B versus D, and A versus D. Importantly, correct responding on these novel contrasts required the contrast dimension (dimension 3), which had been irrelevant during training.
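The logic of the switch task can be checked with a short sketch. The item codes below are reconstructed from the verbal description of the design (dimension 1 separates A from B, dimension 2 separates C from D, and dimension 3 is constant within each training pair); they are an assumption, not a copy of Table 1, and the function name is hypothetical. The function computes the best accuracy attainable when a learner uses only a given subset of dimensions.

```python
from collections import Counter

# Item codes (dimension 1, dimension 2, dimension 3), reconstructed from the
# verbal description of the design; an assumption, not a copy of Table 1.
CATEGORIES = {
    "A": [(1, 0, 1), (1, 1, 1)],
    "B": [(0, 0, 1), (0, 1, 1)],
    "C": [(0, 1, 0), (1, 1, 0)],
    "D": [(0, 0, 0), (1, 0, 0)],
}
TRAINED = [("A", "B"), ("C", "D")]
SWITCH = [("A", "C"), ("A", "D"), ("B", "C"), ("B", "D")]


def max_accuracy(cat1, cat2, dims):
    """Best attainable accuracy when only the given dimensions are used.

    Items that share values on those dimensions must receive the same
    response, so the best strategy answers with the majority category.
    """
    by_pattern = {}
    for label in (cat1, cat2):
        for item in CATEGORIES[label]:
            pattern = tuple(item[d] for d in dims)
            by_pattern.setdefault(pattern, Counter())[label] += 1
    n_items = len(CATEGORIES[cat1]) + len(CATEGORIES[cat2])
    return sum(max(c.values()) for c in by_pattern.values()) / n_items


for cat1, cat2 in TRAINED + SWITCH:
    dims_1_2 = max_accuracy(cat1, cat2, dims=(0, 1))
    dim_3 = max_accuracy(cat1, cat2, dims=(2,))
    print(f"{cat1} vs {cat2}: dims 1+2 -> {dims_1_2:.2f}, dim 3 -> {dim_3:.2f}")

# Every item of both categories in each novel contrast.
switch_trials = [
    (c1, c2, item)
    for c1, c2 in SWITCH
    for item in CATEGORIES[c1] + CATEGORIES[c2]
]
print(len(switch_trials), "unique switch trials")
```

With these reconstructed codes, dimensions 1 and 2 support perfect accuracy on the trained contrasts but at most 75% on every novel contrast, whereas dimension 3 is at chance during training and perfectly diagnostic at test. The count of item-by-contrast combinations is also consistent with the 16 unique switch trials reported for the design.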
Dimensions 1 and 2 yield a maximum accuracy of only 75% and thus cannot be used to attain perfect performance on these classification trials. Whereas the subjects in the classification condition continued with the same task that they had been engaged in all along, the inference subjects were now asked to classify for the first time in the experiment. Accordingly, additional instructions were provided to the inference group before engaging in the switch classification task. As in the classification training tasks, feedback was provided for all subjects. Subjects completed four blocks of 24 randomly ordered switch-contrast trials. Each block was constructed by sampling with replacement from the 16 unique switch trials.

Results

Learning. Figure 4 shows the average proportion correct for classification and inference training conditions, with the inference condition subcategorized into three trial types depending on the dimension queried. The valid cue inference trials test the dimension that covaries with the category label: dimension 1 during A-B inference training and 2 during C-D training. The contrast cue inference trials test the dimension needed during the switch task, namely dimension 3. Finally, the invalid cue inference trials test the remaining dimension: dimension 2 during A-B training and 1 during C-D training.

We first consider performance during AB training, blocks 1-5. Figure 4 indicates that performance of the classification learners improved rapidly and ended up nearly perfect (.98) in the last block. This was expected given that a single dimension could be used to predict category membership. Inference learners also showed improvement on both the valid and contrast cue trials, reaching accuracy of .86 and .98 in block 5, respectively. The more accurate performance on the contrast cue trials is likely because the same value was always the correct answer, whereas the correct answer on valid cue trials depended on the category label.
Interestingly, accuracy on valid cue trials was also worse than classification performance despite the fact that those tasks are formally identical (classifiers used dimension 1 to predict the category label whereas inference subjects used the category label to predict dimension 1). On these trials, the accuracy of inference learners was marginally lower than that of the classification group in the last block, t(21) = 1.74, p < .10. There are a couple of differences between the tasks that may explain these effects: the inference task required predicting values on three dimensions as compared to just one for the classifiers (the category label itself), and the inference learners also had to contend with invalid cue trials on which it was impossible to exceed chance performance. Indeed, Figure 4 confirms that invalid cue trial accuracy remained at chance throughout training.

A similar pattern of performance was observed during CD training. By the last block, classification accuracy reached .99 and inference accuracy on the valid and contrast cue trials reached .81 and .95, respectively. Note that once again valid cue inference trials were less accurate than classification trials, t(21) = 2.66, p < .05, and that subjects were at chance on the invalid cue trials.

Fixations: AB and CD. We next examined how learners’ overall attention profiles differed between the two tasks. Figure 5 shows the proportion of fixations to the category label and feature dimensions over AB and CD blocks, as a function of task. From prior work (Rehder & Hoffman, 2005a, 2005b) we predicted that at the beginning of learning, the typical classification learner would fixate dimensions about equally, and this is what we found: In block 1 of Figure 5 (top), the proportion of time that classification learners fixated each feature location did not differ, F < 1. In the first trial, classification subjects fixated the category location more than the average of the other locations (M = .40 vs. M = .20), p < .01.
Presumably this effect is a result of classification subjects having to refer to the category location to make their classification response (e.g., to determine whether, on this trial, the left or right button was associated with category “A”).

Our prior work predicted a shift towards an optimal attention profile during the course of classification training, and Figure 5 confirms that the present classification learners moved their fixations from irrelevant to relevant dimensions, until irrelevant dimensions were fixated rarely or not at all. Over blocks, fixations shifted away from dimensions 2 and 3, and towards dimension 1, until by the end of AB training, people fixated only the category label and the valid cue.

Attention optimization also occurred in the CD training blocks. Classification subjects learned to fixate the category label and valid dimension 2 and ignore dimensions 1 and 3. On the last block of CD training, subjects were splitting their fixations between dimension 2 and the category label. Eleven of the 12 subjects failed to fixate dimensions 1 and 3 even once during the last block of CD training. Including the present experiment, subjects’ tendency to optimize their attention in classification has been replicated several times, as subjects generally fixated only those locations necessary for the task (Rehder & Hoffman, 2005a).

A difference between AB and CD training blocks is that fixations were not as evenly distributed at the onset of CD training as they were for AB training. The average proportion of time fixating dimension 2 on the first trial of CD training was between -0.01 and 0.14. (All reported intervals reflect 95% confidence.) Moreover, 10 of 12 subjects did not fixate this dimension at all. The average proportion of time fixating dimension 3 on the first trial of CD training was between -0.02 and 0.09, and 10 of 12 classification subjects never fixated this dimension.
This initial pattern of CD attention occurred of course because dimensions 2 and 3 had been irrelevant during AB training, and this learned pattern of attention transferred to the CD task. Nevertheless, learners quickly adjusted their allocation of attention in the manner appropriate for the CD contrast.

Not surprisingly, the fixations of the inference learners (Figure 5, bottom) exhibit a very different pattern of attention. As in the classification condition, all feature locations were initially fixated about equally. But unlike the classification condition, there was little if any change in those fixations during the course of training. However, recall that these subjects were making inferences on all three feature dimensions, and so it is unsurprising that, when fixations are averaged over the three different types of inference trials, subjects exhibit attention to each of those dimensions. Also unlike the classification learners, inference subjects spent less time attending the category label compared to the average feature location (M = .12 vs. M = .29), p < .01. Much of this effect (but not all, see below) was due to the inference learners’ need to attend to the queried feature location to determine which button (left or right) was associated with which feature value.

The different pattern of fixations between the two types of training was confirmed by a 2 x 3 x 9 ANOVA applied to the feature location fixations in Figure 5 with training condition (classification vs. inference), dimension (1–3), and block (1–9) as factors. The three-way interaction was both large, η2p = 0.62, and reliable, F(16, 336) = 34.306, p < .01, indicating that the effect of block on fixation patterns depends on training condition.
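The effect size and the F statistic reported for this interaction are linked by a standard identity: since partial eta squared is SS_effect / (SS_effect + SS_error) and F is (SS_effect / df_effect) / (SS_error / df_error), it follows that η2p = F · df_effect / (F · df_effect + df_error). The short sketch below applies that identity to the reported values as a consistency check; the function name is ours and is purely illustrative.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = F * df_effect / (F * df_effect + df_error).
    """
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Three-way interaction reported in the text: F(16, 336) = 34.306.
eta = partial_eta_squared(34.306, 16, 336)
print(round(eta, 2))  # 0.62, matching the reported effect size
```

The same function can be applied to any other F test in the paper for which both degrees of freedom are reported.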
A separate 3 x 5 ANOVA for AB training of classification learners yielded a large two-way interaction, F(8,88) = 12.06, MSE = 0.006, p < .01, η2p= 0.52, as did the 3 x 4 ANOVA for CD training of classification learners, F(6,66) = 8.59, MSE = 0.005, p < .01, η2p= 0.43, confirming that fixations changed over the course of both AB and CD classification training towards an optimal attention profile. There were no reliable changes for the inference condition in fixation patterns over training. A more revealing analysis of the inference condition is shown in Figure 6, which presents average proportion fixations broken down by inference trial type. The left panel shows proportions of fixations when the valid dimension was queried. Importantly, the panel reveals that the inference subjects are devoting substantial time fixating the non-queried invalid and contrast dimensions; indeed, they are fixating those dimensions about as often as the category label itself. The remaining panels tell a similar story. Regardless of whether it was the invalid dimension (middle panel) or contrast dimension (right panel) being queried, subjects fixated the other, non-queried feature dimensions. This is in stark contrast to the nearly zero fixations allocated to irrelevant dimensions by classification subjects in Figure 5. Whereas it was expected that people would fixate the category label and the to-be-predicted feature dimension, it was unnecessary for subjects to fixate the two non-queried feature dimensions because those dimensions were uninformative regarding the value of the missing feature. But rather than optimizing attention and focusing on the category label alone, Figure 6 indicates that subjects continued to fixate those other dimensions up through the final training block. Of course, attending to the contrast dimension 3 may enable the inference learners to perform well on the final switch trials, a question we now turn to. Switch-trial performance. 
Learning a concept is often thought of as classifying objects into contrasting categories. As people improve their classification performance, they optimize attention to the most relevant dimensions, as observed in both AB and CD classification training in the current experiment and in multiple prior experiments. In contrast, the inference condition saw relatively little change in how people attended to the stimulus dimensions during training as learners continued to fixate all feature dimensions on each inference trial, including those not being queried. As a result of this difference in how people allocate attention in the two tasks, we predict that inference training yields more flexible category representations than classification training does.

To test this prediction, we examined both the inference and classification groups’ classification performance on switch trials, in blocks 10 through 13. Recall that in the switch trials, instead of As versus Bs, and Cs versus Ds, the contrasts switched, so on one trial a subject might classify A versus C, or B versus D. These contrasts required dimension 3 to accurately classify all exemplars. Because the classification group learned to ignore dimension 3, they are predicted to perform worse in the switch trials compared to the inference learners, who fixated that dimension throughout training.

The following analysis is a strict test of the hypothesis that inference learning leads to flexible category representations, because it does not credit inference learners for also engaging in a novel task. For the previous 9 blocks, inference learners had not been (explicitly) classifying As versus Bs and Cs versus Ds, but were instead predicting features associated with those categories. It will be noteworthy if inference learners outperform classification learners in this new classification task. This in fact is what we found.
Figure 4, blocks 10-13, shows superior performance of the inference learners over classification learners on the switch trials. Whereas the classification group performed as well in the first block of switch trials (0.76) as they did in the first block of CD training (0.78), the inference group was near perfect (0.97). The effect of training task on performance in the first switch block was significant, t(21) = 5.30, p < .01, and large, η2p = 0.57. In other words, despite having to engage in a new task, inference learners were much more effective at applying their knowledge of As, Bs, Cs, and Ds, to the new category contrasts. Figure 4 also shows that inference learners’ accuracy remained at ceiling during the four switch blocks, F(1, 10) = 1.96, MSE = 0.000, p > .05, η2p = 0.16. In contrast, classification learners improved their performance substantially from their first switch block, M = 0.76, to their last, M = 0.93, F(1, 11) = 20.21, MSE = 0.009, p < .01, η2p = 0.65.

First-block test performance. We next set out to discover why the inference group was more accurate than the classification group. Was switch performance harmed by classification training or helped by inference training? To answer this question, Figure 7 (left panel) presents accuracy in the first block of switch trials for both training groups divided into quarter blocks, that is, averaging over trials 1-6, 7-12, 13-18, and 19-24. The figure also includes performance on the first block of classification trials during AB and CD training for comparison. In the first quarter block classification subjects performed near chance during both AB and CD training, an expected result given that subjects were responding to categories for the first time.
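The quarter-block breakdown used here amounts to splitting a 24-trial block into four consecutive groups of six trials (1-6, 7-12, 13-18, 19-24) and averaging accuracy within each group. A minimal sketch of that computation follows; the function name and the trial outcomes are hypothetical and serve only to illustrate the bookkeeping, not the authors' analysis code or data.

```python
def quarter_block_means(trial_outcomes, n_quarters=4):
    """Average accuracy within consecutive, equally sized groups of trials.

    trial_outcomes is a list of 1 (correct) or 0 (incorrect), one per trial,
    in presentation order.
    """
    n_trials = len(trial_outcomes)
    if n_trials % n_quarters != 0:
        raise ValueError("block length must divide evenly into quarter blocks")
    size = n_trials // n_quarters
    return [
        sum(trial_outcomes[start:start + size]) / size
        for start in range(0, n_trials, size)
    ]


# HYPOTHETICAL outcomes for one subject's 24-trial block (not real data).
outcomes = [1, 0, 1, 1, 0, 0] + [1, 1, 0, 1, 1, 1] + [1] * 6 + [1, 1, 1, 0, 1, 1]
print(quarter_block_means(outcomes))
```

Averaging each subject's quarter-block means across subjects would then give the kind of per-quarter group values plotted in a figure such as Figure 7.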
In contrast, Figure 7 shows that the classification and inference groups in fact both performed above chance levels during the first quarter block of switch trials: classification subjects had a proportion correct between 0.68 and 0.87, and inference subjects between 0.82 and 1.00. However, while both groups benefited from prior training, inference subjects benefited the most. A 4 x 2 ANOVA between inference and classification learners over the four quarter blocks of switch trials, yielded a large and reliable effect of training condition, F (1, 21) = 28.01, MSE = 0.035, p < .01, η2p = 0.57, reflecting that inference subjects outperformed the classification subjects; a separate analysis of the first quarter block revealed that this advantage also obtained in the first quarter block, t(21) = 2.21, p < .05. In the 4 x 2 analysis, the effect of quarter block was not significant, F < 1, because the inference group was at ceiling and the classification group was frozen at an intermediate level of performance. There was no interaction, F < 1. The above analysis suggests that there was an initial benefit from classification training in the first quarter block of switch trials. However, there was also evidence that the classification group’s performance was harmed by prior training. Examining the classification groups’ learning curves over the entire first block of switch trials in Figure 4 we observed that whereas for both AB and CD classification, accuracy improved rapidly over the first training blocks, classification learners did not improve during the first block of switch trials. A 4 x 3 ANOVA analyzing the effect of quarter block and the three classification learning phases performed by the classification group revealed that phase interacted with quarter block, F(6,66) = 2.53, MSE = 0.04, p < .05, η2p = 0.19, confirming that some contrasts improved faster than others. 
A 4 x 2 ANOVA tested the interaction between two levels of the contrast factor (CD and switch) and quarter blocks, yielding F(3,33) = 4.12, MSE = 0.041, p < .05, η2p = 0.27, reflecting that performance showed improvement over time on the CD trials but not the switch trials.

First-block fixations. There are two critical observations in the above analyses: (1) classification learners optimized attention and (2) classification learners did not respond as accurately to new category contrasts as the inference learners. One explanation for the two findings is that optimizing attention for the classification task trades off with classification flexibility. In contrast, inference learners did not optimize attention but made novel contrasts easily. To examine the relationship between attention and novel contrasts more directly, Figure 7 (right panel) compares fixations to the contrast dimension 3 during the first block of switch trials for the classification and inference learners. (For comparison, we include fixations to dimension 1 during the first block of AB classification training and to dimension 2 during the first block of CD classification training. Thus, the figure presents fixations to the “relevant” dimension during each classification phase.) If classification training harmed learners’ performance by preventing their redirection of attention to dimension 3, fixations to dimension 3 should be suppressed relative to AB and CD training. In fact, Figure 7 shows fewer fixations to the relevant dimension in the first block of switch trials relative to AB and CD training. The fixation data in Figure 7 were submitted to a 4 x 3 ANOVA (quarter block by category contrast). This yielded a main effect of category contrast, F(2,22) = 5.24, MSE = .079, p < .05.
A planned comparison showed that whereas there was no reliable difference between overall fixations to the relevant dimensions during AB and CD training, F < 1, the switch trials exhibited reliably fewer fixations overall (M = 0.11, SD = 0.12), compared to the average of AB and CD (M = 0.27, SD = 0.11), p < .05, η2p = 0.44, confirming that fixations to the relevant dimension were suppressed during the switch trials.

Discussion

The purpose of Experiment 1 was to test the hypothesis that classification training produces less flexible category representations because learners optimize their attention profile for a particular contrast, thus yielding poorer performance on novel category contrasts. Consistent with this hypothesis, we found that subjects trained to infer features were more successful on the novel contrasts compared to those trained via classification. This difference obtained even though both groups of subjects were exposed to the same data, that is, the same exemplars from category A, B, C, and D; what differed was that one group predicted missing features whereas the other predicted missing category labels. Apparently, the supervised classification task yields representations that generalize less well to novel contrasts. Notably, inference learners were not just better than the classification group, but virtually perfect on the first six switch trials, a remarkable result given that these subjects were classifying for the first time in the experiment. Depending on the learner’s goals, the flexibility resulting from inference training could reflect a more useful category representation than classification training.

The eye tracking results indicated that there were two sources of the greater flexibility found in the inference condition. First, the different attention profiles induced by the two tasks are an important variable mediating what category information is encoded.
Whereas classifiers optimized attention to the single dimension that predicted category membership, inference learners fixated all feature dimensions throughout training. Indeed, they fixated not only the dimension being queried on the current trial, they fixated the other, nonqueried dimensions, apparently because they knew those dimensions would be queried on future trials (Rehder, Colner, & Hoffman, 2009). This greater attention to (and presumably encoding of) all feature dimensions served the inference learners well, as knowledge of the contrast dimension 3 thus acquired allowed them to perform almost perfectly on the subsequent switch trials.

Second, the attention profile induced by classification training also retarded learning during the switch phase. The learned inattention to dimension 3 resulted in the classification group exhibiting slower learning during the switch trials as compared to during their AB and CD learning. Note that the switch and AB/CD tasks aren't perfectly comparable, because the former requires distinguishing four categories as compared to two in the latter. However, the classification learners’ few fixations to contrast dimension 3 during the early switch trials (and their absence of improvement during those trials) is suggestive of negative transfer from their initial classification training.

Experiment 2

In Experiment 1, the feature inference task resulted in better performance on novel contrasts as compared to classification training. We attributed this result to the inference learners’ sustained attention to the critical contrast dimension during training and to the classification group’s learned inattention to that dimension during the switch trials. In Experiment 2 we conducted another test of the learned inattention hypothesis by equating the classification and inference groups on their need to attend to the contrast dimension.
We accomplished this by changing the inference task so that the contrast dimension was never queried. Not querying the contrast dimension should reduce learning of that dimension of course, and this in turn should eliminate part of the inference group’s advantage on the switch trials relative to the classification group. However, if the classification group but not the inference group also experiences learned inattention as we claim, then the classification group should continue to exhibit relatively worse switch performance. We introduced one other change to address a potential concern regarding our interpretation of eye movements during inference learning. As mentioned, Experiment 1’s inference learners attended to the nonqueried feature dimensions on each inference trial, a finding we took as reflecting their desire to learn about those dimensions (because they would be queried on future trials). In fact, the amount of time spent fixating the nonqueried dimensions was similar to the amount of time subjects spent fixating the category label itself. Nevertheless, a more mundane explanation is that those fixations merely reflected a need to search the display to determine which dimension was being queried. Recall that inference training consisted of three types of randomly-interleaved trials, one for each dimension. As a result, at the start of an inference trial the subject’s first job was to determine which dimension was being queried. Thus, fixations to nonqueried dimensions might merely reflect this display search rather than (as we have claimed) the inference learners anticipating which dimensions would be queried in the future. Accordingly, a second change in Experiment 2 was to signal to inference learners which dimension was being queried by dashing the line leading from the fixation point to the queried dimension, eliminating their need to search the display. 
As a result of this change, fixations to nonqueried dimensions will provide more unambiguous evidence that subjects are trying to learn those dimensions.

Method

Twenty-four New York University students participated for course credit. They were tested individually and assigned randomly, in equal numbers, to standard classification and inference tasks. Methods were identical to Experiment 1 with the following changes:

Dimension 3 was never queried. This reduced the number of trials per block to 16 (from 24). To equate the two conditions, the number of trials per block in the classification condition was also reduced to 16 (i.e., each exemplar was presented four times, twice per label-presentation order).

To remove learners’ uncertainty about the location of the queried dimension, a dashed line was used to indicate the location of the queried dimension. Figure 3 provides examples. Note that a dashed line was also added to the classification condition (and to the switch trials). Subjects were not told ahead of time exactly which dimensions would be queried.

Switch-classification blocks now contained 32 trials, with every possible novel contrast tested twice (rather than a random selection of 24 as in Experiment 1). Since the most important contrast is how people attend in the very first block of these switch trials, the number of switch blocks was reduced to two, from four.

Finally, to get a better sense for how people attend in different category contrasts, in a final block of 48 trials, subjects had an opportunity to classify every possible contrast, including the original A versus B, and C versus D contrasts.

Results

Learning. In Experiment 1, we observed an advantage in classification over the inference condition in both AB and CD training. In Experiment 2 the number of queried dimensions was reduced from three to two, since the contrast dimension was never queried. Did the previously observed learning advantage in the classification condition replicate?
AB and CD training performance (blocks 1-5, and 6-9, respectively), are presented in Figure 8. Classification performance and inference performance on the valid cue trials both improved over training blocks. As in Experiment 1, however, classification subjects outperformed inference subjects in their respective tasks, even though in Experiment 2 inference learners predicted only two of the three feature dimensions. On their last AB training block, the classification learners scored an average proportion correct of 0.98 (SD = 0.03) as compared to 0.84 (SD = 0.27) for the inference learners. The inference learners were performing above chance levels in predicting the valid cue, t(11) = 4.46, p < .01, but scored marginally lower than the classification group, t(22) = 1.81, p < .10. Inference learners were at chance on the invalid cue trials of course, a result necessitated by the category structure in Table 1. The CD training blocks exhibited the same pattern. The average classification subject scored around 0.98 (SD = 0.04) on the last CD training block as compared to 0.77 (SD = 0.30) for the inference learners. Once again, the inference learners scored above chance, t(11) = 3.17, p < .01, but performed worse than the classification learners, t(11) = 2.42, p < .05. In spite of reducing the number of queried dimensions from three to two, the inference task remained harder than the classification task.

Fixations: AB and CD. Figure 9 shows the proportion of fixations to category label and dimensions over AB and CD blocks, as a function of task. In the classification condition, we expected the same pattern of attention as observed in Experiment 1, because, with the exception of having fewer training trials per block, the classification condition was identical across experiments.
The results replicated the findings in Experiment 1 and earlier work: At the beginning of learning, the classification learners fixated dimensions about equally and then shifted fixations from the irrelevant dimensions 2 and 3 to the relevant dimension 1, until by the end of AB learning the irrelevant dimensions were fixated rarely or not at all. This pattern of attention transferred to initial CD training in block six, but subjects then redirected attention to the valid dimension 2 (and away from the now invalid dimension 1). All of the basic findings for the classification condition observed in Experiment 1 were replicated in Experiment 2.

We next examined eye movements in the inference condition. Recall that whereas all dimensions were queried in Experiment 1, in Experiment 2 the contrast dimension was never queried. As expected, throughout learning, inference learners largely ignored the contrast dimension (see Figure 9). Even in the first block, inference learners fixated the contrast dimension significantly less often than the other two feature dimensions and the category label (all ps < .01).

Following Experiment 1, we examined eye fixations as a function of queried dimension and training block (Figure 10). Recall that in Experiment 1, on each inference trial subjects fixated all feature dimensions, even those not being queried on that trial. However, those fixations might have been due to subjects’ need to search the display to determine the queried dimension, a need eliminated in Experiment 2 by a dashed line from the fixation point to that dimension. Unsurprisingly, Figure 10 indicates that inference subjects were fixating the queried dimension between 56% and 68% of the time, a result also seen in Experiment 1 (Figure 6).
However, it also reveals many fixations (between 12% and 20% of total fixation time) to the feature dimension not being queried on that trial, that is, to the valid dimension when the invalid dimension was queried and to the invalid dimension when the valid dimension was queried. Indeed, the other feature dimension was fixated about as long as the category label itself. Apparently, inference learners fixate the other dimension in order to learn about it, because they know it will be queried on a future trial, a point we return to in the General Discussion. In contrast, learners rarely fixated the never-queried dimension 3 (between 3% and 10%).

Switch-trial performance. At the end of Experiment 1 we examined inference and classification learners’ performance on switch trials to assess how well their training transferred. We found that the inference group was better able to apply their category knowledge to new contrasts (and to a new task), a result we attributed to inference learners’ attention to the contrast dimension throughout learning and classification learners’ attention optimization that prevented reallocation of attention to a previously irrelevant dimension. By not querying the contrast dimension in Experiment 2’s inference condition, we made that dimension as irrelevant to inference training as it was to classification, and indeed in this experiment fixations to the contrast dimension were rare in both groups. As a result, inference and classification learners may now both struggle to switch attention to the contrast dimension. On the other hand, because the inference learners never directed their attention to the contrast dimension (because it was never queried), they never had to learn to direct their attention away from it either. In the absence of such learned inattention, the inference group may redirect attention to the contrast dimension more readily. Figure 8 presents accuracy on the switch-classification trials (blocks 10 and 11).
In spite of the absence of queries on the contrast dimension during training, the inference learners achieved better switch-trial performance than classification learners. The inference group was more accurate than the classification group during both the first block (M = 0.93, SD = 0.09 vs. M = 0.83, SD = 0.16), t(22) = 1.95, p = .064, and the second (M = 0.99, SD = 0.02 vs. M = 0.89, SD = 0.15), t(22) = 2.26, p < .05. This result is remarkable given that the inference learners ignored the contrast dimension during training every bit as much as the classification learners. Apparently, simply ignoring a dimension during learning does not entail difficulty in using that dimension in the future. Instead, inference learners quickly shifted attention to newly relevant information.

We next explored further why inference learners were able to perform better than the classification group on switch trials. Figure 11 presents classification performance (left panel) and eye movement data (right panel) during the first block of switch trials, broken down by quarter blocks, with four trials in each. Figure 11 also includes first block AB and CD classification performance for comparison. As expected, subjects were performing at chance at the beginning of both AB and CD training. In contrast, on the switch trials classification subjects were above chance, between 0.68 and 0.87, reflecting the benefit of prior training. Nevertheless, inference learners’ performance, between 0.82 and 1.00, was significantly better than classification learners, t(21) = 2.21, p < .05. Note that, unlike Experiment 1, improvement among the classification subjects on the switch trials seemed to occur as rapidly as it did for the inference subjects, in the first switch block. A test for an interaction revealed no significant difference between training type and quarter block, F < 1.
An analysis of different switch trials confirms that one reason classification subjects were performing worse than inference subjects is their failure to redirect attention to the contrast dimension. On items that didn’t require the contrast dimension for correct classification, the classification group’s accuracy was between 0.81 and 0.95. However, when correct classification required the contrast dimension, performance dropped to between 0.68 and 0.87; performance on the two item types was reliably different, t(11) = 3.09, p < .01. Moreover, classification learners’ failure to use the contrast dimension is revealed directly in the eye movement data in Figure 11: Throughout the first switch block the classification learners fixated the critical contrast dimension about half as often as the inference learners. Indeed, fixations to the contrast dimension were suppressed relative to those to the relevant dimensions in the first block of AB and CD training. As in Experiment 1, the classification group is reluctant to attend to information needed to obtain accurate switch trial performance.

What is the underlying cause of the inference group’s greater willingness to redirect attention to newly relevant information? To help answer this question, Figure 12 presents learners’ attention allocation to the contrast dimension as a function of trial for the first block of AB (trials 1-16) and CD training (trials 81-96). The figure shows that the largest difference between the two conditions was that the classification learners attended more to the contrast dimension early in learning.
In the first 16 trials of AB training, the classification condition allocated about twice as much fixation time to the contrast dimension (M = 0.16, SD = 0.07) as the inference condition did (M = 0.09, SD = 0.06), F(1, 18) = 4.87, MSE = 0.073, p < .05, η2p = 0.21, and was slower to ignore the contrast dimension, as indicated by a trial by task interaction, F(15, 270) = 2.06, MSE = 0.008, η2p = 0.10, p < .05. A similar pattern obtained in CD training. The classification condition allocated more fixations (M = 0.08, SD = 0.05) to the contrast dimension than the inference condition did (M = 0.04, SD = 0.04), F(1, 18) = 2.92, MSE = .034, η2p = 0.14, p = .104, and was slower to ignore the contrast dimension, F(15, 270) = 2.67, MSE = 0.008, p < .01. (More accurately, this interaction reflects an inverted u-shaped pattern, in which the classification condition first increased fixations and then decreased fixations to the contrast dimension.) The different patterns of attention reflect the different reasons the two groups ignored the contrast dimension. Inference subjects ignored it because they were never explicitly queried on that dimension, and they realized that it was not part of the task. Classification subjects learned to ignore the contrast dimension, as they gradually discovered that the contrast dimension didn’t help them classify As from Bs or Cs from Ds. This is why there is an initial increase in fixations to the contrast dimension in the first CD block, because classification learners reattended it, and then learned that it was useless in classifying Cs and Ds. Classification learners’ fixation results reflected a learned inattention to the contrast dimension, which probably caused their difficulty in attending to the contrast dimension during the switch trials. Old versus New Contrasts Finally, Experiment 2 also included a block 12 in which learners in both conditions classified every contrast within the same block, in random order. 
They classified both old contrasts (AB and CD) and the switch contrasts (e.g., AD). We can now assess how well inference and classification learners performed on the original training items, and how it is that learners are capable of switching their attention between multiple contrasts. If inference learners represented the categories in a flexible manner, they should perform well on the original category contrasts, and they should perform well in switching between contrasts.

Figure 13 shows subjects’ performance during Block 12. The figure reveals that subjects in general did well on all three types of contrasts, the two original AB and CD contrasts, and the third set of switch contrasts. All groups were performing well above chance in all category contrasts, ps < .01. The two tasks were performing at about the same levels overall, F(1,22) = 1.82, MSE = 0.030, p = .191. Examining individual contrasts we can see that perhaps the classification group outperformed inference learners on AB and CD contrasts, whereas the inference group outperformed the classification group on switch contrasts. This difference is not very surprising given that the classification group received four or five blocks of practice in classifying AB and CD, whereas the inference group had none.

Finally, we examined how attention switched between contrasts across trials. Did people exhibit contrast-specific attention? Figure 14 shows subjects’ average proportional fixations for the three contrast types, AB, CD and switch. The figure shows that the majority of attention was allocated to the category location. This makes sense because the learner needs to look there first to determine which categories were under consideration (i.e., which category contrast). The figure also shows that for both groups, fixations shifted to reflect contrast-specific attention. For AB contrasts the majority of fixations were on dimension 1, for CD on dimension 2, and for the switch contrasts on dimension 3.
This is the first demonstration that we know of that people switch their attention profiles to fit which categories they are considering. The figure shows that the two groups were optimized for different contrasts. For example, the inference group was very good at ignoring dimensions 1 and 2 for the switch contrasts, but was not so good at ignoring the contrast dimension for AB and CD contrasts. By comparison, the classification group seemed somewhat better at ignoring the contrast dimension for AB and CD contrasts, but did not tend to ignore them as much for the switch contrasts. It is interesting to note that these different levels of optimization for the different contrasts mirror the different levels of performance across the two groups in Figure 13. That is, proportion correct seemed generally correlated with superior optimization patterns.

The 3 x 3 x 2 interaction (dimension by contrast by task) indicated that this observation was reliable, F(4,88) = 3.841, MSE = 0.006, p < .01, η2p = 0.15. To simplify the follow-up analysis we averaged the amount of time spent on the irrelevant dimensions for the original contrasts and for the switch contrasts. Consistent with Figure 14, inference subjects spent more time fixating the irrelevant dimensions for the original contrasts (M = 0.14, SD = 0.05) than did the classification subjects (M = 0.10, SD = 0.04), t(22) = 2.73, p < .01. Conversely, classification subjects spent more time fixating the irrelevant dimensions for the switch contrasts (M = 0.08, SD = 0.05) than did the inference subjects (M = 0.02, SD = 0.04), t(22) = 2.76, p < .01.

Discussion

The purpose of Experiment 2 was to further test our claim that learned inattention contributes to classification learners’ poor performance on novel contrasts relative to inference learners.
In Experiment 1, we attributed that result not only to learned inattention but also to the fact that the inference task induces a broad attention profile, one that naturally enables the encoding of the category information required to make novel contrasts, whereas the classification task induces a narrower one. In Experiment 2, we equated the two learning conditions on their need to attend to the contrast dimension (dimension 3) during training by never asking inference learners to infer a feature on that dimension. That inference learners rarely fixated the contrast dimension during training indicated that this manipulation was effective; indeed, the inference group was less likely than the classification group to fixate that dimension. But despite this difference, the inference learners still exhibited better performance on the switch trials than did the classification learners, just as they did in Experiment 1. Analyses of individual switch trials indicated that this difference arose because of classification learners’ poor performance on just those switch trials that required the critical contrast dimension. And eye tracking data directly confirmed the classification learners’ reluctance to attend to that dimension during the switch trials. In the General Discussion we discuss reasons for learned inattention in supervised classification learning in more detail.

General Discussion

Flexible representations, those that support classification in novel contexts, are a necessary property of any intelligent system. On the one hand, there are undoubtedly contexts in which people repeatedly classify objects into a relatively small number of categories (see below for examples), and it is of course important that in those situations our cognitive system performs accurately and efficiently. But there are at least as many other cases that require unique discriminations, ones in which an object’s potential categories have never previously appeared together.
For example, if while driving on a dark and windy night one encounters an ambiguous object in the road, it might be an injured dog, a sleeping drunk, or an errant trashcan; if one looks through the front door’s peephole in response to a knock, the distorted figure on the other side might be a police officer or a robber (or a to-be-avoided court officer trying to serve papers). Choosing effective courses of action requires that we be as accurate and efficient on these novel contrasts as on those with which we have repeated experience.

In this research we investigated the effect of learning task on the resulting flexibility of conceptual representations. Our focus was on the central task in laboratory studies of category acquisition, namely, supervised classification learning. Experimental studies of this task, which run into the hundreds if not thousands, have had by far the most influence on current theories of how categories are learned and represented, and dozens of computational models have been proposed to account for the large database of empirical findings. And yet it is not always clear how the category representations observed in such studies map onto the kinds of categories that people learn and use every day (Murphy, 2005). In fact, the very results of these hundreds of studies suggested to us that supervised classification learning is likely to produce inflexible representations, ones that might be effective at the small set of discriminations on which people are trained but which generalize poorly to novel contrasts. To assess this hypothesis directly, we compared classification learning with feature inference learning, which previous research suggested might lead to the acquisition of more complete, and hence more flexible, representations. As expected, two experiments found that those subjects trained via supervised classification learning were less accurate on new discriminations.
Whatever may be their other virtues, the representations produced by supervised classification learning alone are insufficient to support the important novel contrasts that our cognitive systems are required to carry out every day.

In the remainder of this article we consider, first, the reasons for the relatively poor transfer performance that results from supervised classification learning. Our use of eye tracking allowed us to identify two separate costs of classification learning. We then discuss the kinds of real-world tasks for which supervised classification training either is or isn’t likely to produce advantageous representations. As inference training is an example of how a task can yield differences in what people represent conceptually, we devote some discussion to how inference and classification differ, how attention allocation can explain these differences, and more generally how attention allocation can serve as the explanatory variable that mediates what is eventually learned in a task.

Cost #1: Limited knowledge from narrow focus

The first way that classification learning reduces conceptual flexibility is by restricting learners’ attention to a small subset of diagnostic dimensions. Recall that category flexibility was assessed by having learners make novel category contrasts that were not part of the original training. Because the category structures were arranged so that the relevant dimensions switched, in phases, across three dimensions, we predicted, on the basis of a large body of previous research (e.g., Shepard et al., 1961; Nosofsky, 1984; Rehder & Hoffman, 2005a), that classification learners would learn to attend to diagnostic dimensions and ignore irrelevant ones. As predicted, in both Experiments 1 and 2, classification learners quickly allocated eye movements almost exclusively to the diagnostic dimensions.
But because the (so-called) irrelevant dimensions were required for the novel contrasts we also predicted that classification learners would struggle on the switch trials, and in fact in both Experiments 1 and 2 classification learners’ performance on novel discriminations lagged behind the performance of those whose training task (feature inference learning) induced a broad attention profile instead. This result obtained despite the fact that (a) both groups were exposed to exactly the same training exemplars, and (b) the feature inference learners were switching to a new task (classification). In other words, the much-heralded powers of selective attention to rapidly produce accurate and efficient classification performance have a dark side, namely, representations that fail to generalize well to new situations.

It is important to note some potential limitations of the category structures we tested. For example, a critic might note that we produced inflexible representations only by initially training classification learners on discriminations that required the smallest possible number of dimensions, namely one. In contrast, most real-world categories exhibit a family resemblance structure in which multiple features are required to establish category membership, suggesting that the narrow attention profile observed in the present experiments might not have obtained with more realistic categories. But in response, we would argue that most pairs of family resemblance categories can be distinguished on the basis of a relatively small number of features, and that learners are likely to focus on those dimensions at the expense of others. For example, we suspect that a child being pressed by parents to learn the difference between cats and dogs will quickly discover that only a few cues are needed to do the job (e.g., body size, ear shape, and “meow” vs. “woof”).
Indeed, Hoffman (2008) found that learners shift eye fixations toward those family-resemblance dimensions needed to support a particular discrimination and away from those irrelevant to that distinction (also see Rehder & Hoffman, 2005a, 2005b). But although this “training” may placate the adults, it is unlikely to help the child learn that dogs and cats have paws and tails, eat, like affection, breathe, usually don’t bite, and the dozens of other properties that, while they fail to discriminate cats from dogs, will be needed to discriminate cats and dogs from members of other categories. Clearly, some other sort of experience with category members will be needed to fill the large gaps in the child’s knowledge about cats and dogs that arise from supervised classification learning alone (see below for ideas).

Another caveat concerns the stimuli we tested. Not all objects are composed of dimensions that are as spatially separated as they were in our “ceremonial symbols,” and it is possible that the learning deficit experienced by classification learners might have been less pronounced had we used stimuli with spatially integrated features. Consistent with this conjecture, Allen and Brooks (1991) found that subjects learned information about the environment in which category exemplars (a type of schematic animal) appeared even when they were told the correct classification rule beforehand. It is likely that this result obtained because learners had to search for the item in the display, and attention thus devoted to the environmental context was sufficient for learners to incidentally encode that information (also see Brooks, Squire-Graydon, & Wood, 2007; Thibaut & Gelaes, 2006). On the other hand, there are many sorts of stimuli composed of “parts” that appear in predictable locations, providing learners ample opportunity to attend to some parts at the expense of others.
(Indeed, in a subsequent experiment, Allen and Brooks found that context learning was eliminated when items were presented at a fixed position on the computer screen.) And, of course, it is well known that people can selectively focus on “separable” dimensions even when those dimensions are spatially integrated in the same object (Garner, 1978).

Although in this article we emphasize the incomplete representations formed by supervised classification learning, it would be a mistake to conclude that classification learners learned nothing about the categories besides diagnostic information. Indeed, although they lagged behind the feature inference learners, in both our experiments the classification groups performed above chance when first confronted with novel contrasts, indicating that they acquired some category information that was irrelevant to the contrasts on which they were trained. Moreover, the supervised learning of categories has been shown to yield more information than some other learning procedures. For example, Bott, Hoffman, and Murphy (2007) found that learners acquired more information when predicting an outcome they knew to be a category label as compared to predicting a meaningless outcome (a low or high tone). This occurred despite the use of a “blocking” paradigm in which both groups of subjects were first trained on a single cue that predicted the outcome perfectly (also see Hoffman & Murphy, 2006). But although this result indicates that people might be intrinsically motivated to learn about categories as compared to meaningless outcomes even during supervised classification, note that even these subjects learned less than half of the category’s features. Supervised classification learning appears to be a poor means for producing complete and flexible category representations.
In this study we have focused on how the failure to attend to, and thus encode, category information prevents novel discriminations, but of course it prevents other sorts of important inferences as well. For example, discriminations will be required at higher or lower levels of abstraction. When learning about roses versus raspberry bushes, the feature of having thorns occurs in both cases, and is thus not diagnostic. However, thorns reflect a superordinate category of “thorned plants.” Narrowly focusing on only diagnostic features (e.g., berries) prevents the learner from acquiring this superordinate category information, which would allow the classifier to readily distinguish between thorned and not-thorned plants in the future. In addition, the absence of category information will prevent the category-based inductions that are the reason for learning categories in the first place. For example, below we review evidence that, as compared to supervised classification learning, feature inference learning promotes the learning of internal category information (e.g., prototypical features and interfeature correlations) that may not be useful for distinguishing two particular categories, but nevertheless allows one to predict missing features in objects once they have been classified.

Cost #2: Learned Inattention

Whereas the first cost concerns what is learned about features, the second concerns what is learned about attention and how learned attention transfers to subsequent learning events. In Experiment 1 feature inference learners performed far better on the initial novel contrasts than classification learners, a result we have attributed to the feature inference group having learned more about the category. However, we also found that the deficit experienced by the classification group persisted during all four blocks of switch trials, and that their rate of improvement during those trials lagged behind what was observed during their initial training.
Eye tracking data provided a ready explanation: Classification learners were slow to attend to just that information required for acquiring good performance on the novel discriminations; one can hardly learn a new feature-to-category association if one is ignoring the feature. To demonstrate directly that classification learners were experiencing a deficit of not only knowledge (about the category) but also of attention (that transferred to subsequent learning), in Experiment 2 we equated the two tasks so that neither required knowledge of the critical contrast dimension, and once again found that classification learners exhibited a reluctance to attend to that information and thus, not surprisingly, slow learning of the novel contrasts. In other words, the second cost of supervised classification learning is that it trains classifiers to ignore information that might become relevant in the future.

The phenomenon of learned inattention is not new, of course; it relates to classic learning phenomena such as latent inhibition (Lubow, 1989), blocking (Kamin, 1969), and highlighting (Kruschke, 1996b; Medin & Edelson, 1988), and surrounding theory (Kruschke, 2001; Kruschke, 2003; Kruschke & Blair, 2000; Mackintosh, 1975; Sutherland & Mackintosh, 1971). All three phenomena occur when cues are paired with other cues that are more predictive or salient, with the result that learning about the less predictive or less salient cues is reduced. In fact, latent inhibition corresponds exactly to the classification condition in Experiments 1 and 2, in which once-irrelevant cues become relevant. Latent inhibition predicts that learning about previously irrelevant cues is hard, a fact we verified in our experiments.

Attention learning theory (Mackintosh, 1975; Sutherland & Mackintosh, 1971), so far as we know, does the best job in accounting for latent inhibition, standard and backward blocking, and highlighting.
On this account, people shift attention towards those dimensions that reduce error and away from those dimensions that increase error. Kruschke implemented formal descriptions of attention learning theory, combining them with models of category learning (EXIT, Kruschke & Blair, 2000; RASHNL, Kruschke & Johansen, 1999), and as a result was able to account for these phenomena and the benchmark classification data that all formal theories must account for. Thus, classification tasks that contain irrelevant dimensions produce rigid category representations because the attentional demands of the task allow the learner to reduce the number of attended dimensions to the minimum, which in turn makes learning about previously irrelevant dimensions hard.

Experiment 2 also revealed that while learned inattention harmed the transfer performance of the classification learners, it did not harm performance of the inference learners. Although our eye tracking data showed that inference subjects largely ignored the never-queried contrast dimension during training (in fact, they fixated that dimension even less than the classification learners), they, unlike the classification learners, quickly reallocated attention to this newly relevant information when required to make novel contrasts. Apparently, learned inattention is not the same thing as simply failing to attend. Rather, what distinguishes the groups is that inference learners never had to learn to ignore the contrast dimension as the classification subjects did. Because inference learners were shown where to look (the dashed line) they never had to learn the hard lesson that a dimension was irrelevant to their task; as a result they didn’t have to learn an even harder lesson when it became relevant later on. Consequently, inference learners were capable of quickly reallocating attention to newly relevant information when it was appropriate to do so.
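As a rough illustration of the error-driven attention shifting described above, the following Python sketch trains a toy learner on a one-dimensional classification rule. It is not EXIT or RASHNL, and it is not a model fitted to the present data; it is a minimal sketch under stated assumptions. The function name `train_attention`, the learning rate, the +1/-1 feature coding, the three-dimension stimulus set, and the rule that attention is renormalized to a fixed total (a stand-in for limited capacity) are all hypothetical choices made for this example.

```python
import itertools

import numpy as np


def train_attention(stimuli, targets, epochs=200, lr=0.1):
    """Toy error-driven learner with one attention weight per dimension.

    Hypothetical illustration only: association weights and attention
    weights both follow the gradient of squared prediction error, and
    attention is renormalized to sum to 1 after every update.
    """
    n_dims = stimuli.shape[1]
    weights = np.zeros(n_dims)                 # feature-to-label associations
    attention = np.full(n_dims, 1.0 / n_dims)  # start with evenly spread attention

    for _ in range(epochs):
        output = stimuli @ (attention * weights)
        error = targets - output
        # Per-dimension average of (error * feature value) across stimuli.
        credit = stimuli.T @ error / len(targets)

        new_weights = weights + lr * attention * credit
        new_attention = attention + lr * weights * credit

        # Keep attention non-negative and of fixed total, so that a gain
        # for one dimension is a loss for the others.
        new_attention = np.clip(new_attention, 0.0, None)
        attention = new_attention / new_attention.sum()
        weights = new_weights

    return weights, attention


# Hypothetical stimuli: every combination of three binary (+1/-1) dimensions.
stimuli = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
# Only the first dimension is diagnostic of the (hypothetical) category label.
targets = stimuli[:, 0]

weights, attention = train_attention(stimuli, targets)
print("attention per dimension:", np.round(attention, 2))
```

In this sketch the attention weight on the diagnostic dimension grows while the weights on the two irrelevant dimensions shrink, because the renormalization makes any gain for one dimension a loss for the others. If a previously irrelevant dimension later became diagnostic, its first association updates would be scaled by its small attention weight, which is one simple way to picture why learning about previously ignored dimensions is slow.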
These findings from Experiment 2 underscore an important difference between the attention profiles acquired through trial and error learning and those that arise out of the direction of the task or experimenter. It seems that ignoring features as a result of discovering that they are irrelevant over numerous trials is qualitatively different from having a visual indicator showing where to look. So the history of how the learner acquired an attention profile is as important as the eventual attention profile itself.

The Benefits of Supervised Classification: Categorization as Cognitive Skill

In summary then, the attention profile induced by traditional supervised classification learning tends to limit both what one learns about a category and what one attends to in the future, factors that work against novel discriminations. However, this is not to say that this sort of (overly) optimized attention is always bad. Indeed, the expertise literature is rife with classification scenarios for which supervised learning seems especially apt, namely, those that involve repeatedly categorizing items into a small number of categories. For example, expert chick sexers must examine the ‘vent’ region of 800 to 1200 one-day-old chickens per hour and accurately identify their sex. Chick sexing is a highly specialized task, in which category knowledge is used in a limited way, revealing task demands similar to supervised classification learning. As in most laboratory studies, in chick sexing there are only two categories, and the only goal in learning the chick-sex category is sorting. And the task is considered extremely difficult and requires years of training (Horsey, 2002). Examples of this kind of expert classification are commonplace (e.g., wine and coffee tasters who discriminate among varietals, region, and year, etc.). Moreover, there is ample evidence that expert classification involves the same sort of attention optimization that characterizes supervised classification learning.
For example, in the well-known study of Chi, Feltovich, and Glaser (1981), expert but not novice physics problem solvers ignored problems’ superficial features and grouped them according to the underlying principles. Conversely, Lesgold, Rubinson, Feltovich, Glaser, Klopfer, and Wang (1988) found that experts were able to make more accurate diagnoses of x-rays by attending to more relevant cues than novices.

What these examples make clear is that the expert classification that results from extended supervised learning is an example of a cognitive skill, that is, an ability that emerges after long practice to perform a particular task efficiently and accurately. In this light, it is easy to see how the costs of supervised classification become benefits instead. Attention optimization allows the expert to attain high levels of performance by avoiding the allocation of processing resources to information known to be irrelevant, and indeed there are numerous studies in the skill acquisition literature demonstrating how skilled performance involves learning to ignore irrelevant cues (even some using eye tracking, e.g., Haider & Frensch, 1999; Lee & Anderson, 2001). Learned inattention is also unproblematic here because extended training exposes experts to a wide range of problems, which allows them to determine all sources of information needed to carry out their task. After years of training, there is little chance that a chick-sexer will suddenly need access to new information to accurately sex chicks. Learned inattention might even be beneficial, preventing the expert from being distracted by features that are novel but still irrelevant to their task.

But despite the obvious importance of expert classification, it is equally clear that much of our life does not involve repeated classification into a small number of categories. Each day we carry out perhaps thousands of acts of classification involving perhaps hundreds of categories.
And while many of these are routine (the toothpaste appears in the same place every morning), as our examples above indicate, many others involve sets of candidate categories that the classifier may have never encountered before or ever again. Rather than relying on highly practiced and thus rigid attention profiles, in these situations the categorizer must, on the fly, allocate attention to the most diagnostic features for the categories in the contrast set. In fact, our Experiment 2 showed this kind of dynamic attention allocation in action, as learners allocated attention in a manner appropriate for the type of distinction they were required to make. Of course, one would expect that the need to adjust attention depending on context is common in many cognitive tasks, and this is just what has been found. For example, when items are compared (e.g., to assess their similarity), which features are attended to will vary dynamically depending on how they correspond to the features of the contrasted items (Tversky, 1977; Markman & Gentner, 1993; 1996; Medin, Coley, Storms, & Hayes, 2003). But what dynamic attention requires to function of course are representations that are robust— that include information on many feature dimensions. In the case of classification, this information must be available to sort objects into the large number of different sets of candidate categories with which a classifier might be confronted. One needs to know that raspberry bushes have berries to distinguish them from rose bushes today, but that they have thorns to distinguish them from cranberry bushes tomorrow. But these are exactly the representations that supervised classification alone appears ill suited to provide. 
Feature Inference Learning

The possibility that supervised classification learning would lead to relatively impoverished category representations led us to consider whether feature inference learning had the potential to produce the representations needed for novel contrasts. Consistent with this idea, feature inference learning resulted in better performance on novel discriminations as compared to supervised classification. Our use of eye tracking allowed us to attribute this flexibility to the broader spread of attention that the inference task induces (and that promotes the sort of robust and complete representations we have argued are needed for novel contrasts) and to the absence of learned inattention to irrelevant dimensions that later become relevant.

Note that the inference task’s broader spread of attention manifested itself in two ways. First (and not surprisingly), on each feature inference trial, subjects fixated the queried dimension. Second, on each trial subjects also fixated the feature dimensions that they expected would be queried on future trials (a conclusion supported by our finding that Experiment 2 inference learners quickly stopped fixating a dimension that wasn’t queried). This sort of anticipatory learning, in which subjects acquire category knowledge on which they are being directly queried and receive feedback (supervised learning) and also knowledge on which they expect to be queried but do not yet receive feedback (unsupervised learning), has been seen in other studies. For example, in an eye tracking study Rehder et al. (2009) compared two accounts of the feature inference task.
On one hand, researchers have argued that inference learning produces motivation for the classifier to learn what the categories are like, with attention to within-category feature correlations and prototypical information (Chin-Parker & Ross, 2002; 2004; Markman & Ross, 2003; Yamauchi & Markman, 1998), an account that predicts that learners will fixate most features on most trials, regardless of whether they are ever queried. On the other hand, others have argued that, rather than enhancing one’s motivation to learn, inference learning simply involves acquiring rule-like associations between the category label and the features (Johansen & Kruschke, 2005; also see Nilsson & Ohlsson, 2005; Sweller, Hayes, & Newell, 2006), an account that predicts that only the category label and the to-be-predicted dimension are fixated. But contra both of these accounts, and as in the present study, Rehder et al. found that inference learners fixated the currently queried dimension and dimensions they expected would be queried on future trials (but not those they knew would never be queried) (also see Anderson et al., 2002; Sakamoto & Love, 2006).

Note that whereas anticipatory learning would appear to emerge naturally from the requirements of the feature inference learning task, there appears to exist no comparable analog in classification. Because their task is so constrained, classification learners have no indication that additional knowledge may be required; each classification trial is essentially identical: predict the category label, receive feedback, repeat. Because only the category location is ever queried, there is no anticipation driving the learning of other information. As a result, they narrow their focus exclusively to diagnostic features.
Although we believe that anticipatory learning is one of the key properties of feature inference learning that helps it produce the robust (i.e., complete) representations, it is also important to note that this knowledge was stored flexibly, that is, in a manner that allowed it to be used for a new type of task. Note that during training, inference learners were predicting the features on the basis of the category label, but that during the switch trials they had to go in the opposite direction, from features to the category label. This flexible use of knowledge hints at the possibility that extended supervised classification and inference training might also tend to recruit different memory systems. Anderson, Fincham, and Douglass (1997) have argued that one characteristic that distinguishes declarative and procedural memory systems is that the former but not the latter supports symmetrical access. For example, whereas verbal learning researchers studying declarative memory found that a stimulus was as easily retrieved given the response as the response from the stimulus (Ekstrand, 1966; Horowitz, Norman, & Day, 1966; Paivio, 1971), procedural memories have been shown to exhibit asymmetry of access because a stimulus retrieves its response but not vice versa (Anderson, 1987; Anderson & Fincham, 1994; Rehder, 2001). We note that the ease with which the present inference learners used knowledge in the reverse direction suggests knowledge stored in a more declarative form. 
In contrast, because procedural memory is thought to support the acquisition of novel associations slowly, incrementally, and only after multiple repetitions (Cohen & Eichenbaum, 1993; Dienes & Berry, 1997; Glisky, Schacter, & Tulving, 1986; Musen & Squire, 1993; Nissen, 1992; Squire, 1992; Tulving, Hayman, & Macdonald, 1991), it might be more involved in the cases of expert classification that we have argued extended supervised classification learning is likely to engender (Ashby, Alfonso-Reese, Turken, & Waldron, 1998).

Of course, one might expect that the more complete representations that result from feature inference learning would have other benefits as well, and research reviewed earlier confirms that this is the case. As mentioned, Chin-Parker and Ross (2004) found that inference learners learned prototypical but undiagnostic features better than classification learners. And Chin-Parker and Ross (2002) found that inference but not classification learners learned within-category feature correlations. This latter result is especially notable given that knowledge of the correlations was not an explicit part of feature inference training: subjects could simply predict missing features on the basis of a category label and ignore the other features. Rather, we suspect that anticipatory learning induced a broad attention profile in which most features were fixated on most inference trials, which in turn allowed subjects to incidentally encode knowledge of the correlations (and then express that knowledge on a subsequent transfer test). That is, as in the current experiments in which feature inference training later resulted in better classification performance, feature inference learning appears to promote robust representations that are flexible enough to be useful for tasks that were not part of initial training (also see Ross, 1999).
This study adds yet another effect of flexible concept representation: Learners are able to use previously nondiagnostic information to make novel category contrasts.

We close by noting that it is likely that people’s complete and flexible category representations are formed by means other than just feature inference learning. It is clear that children do not engage in long, uninterrupted sequences of feature prediction trials any more than they engage in long, uninterrupted sequences of supervised classification trials. For example, unsupervised classification learning may involve less rule-based processing and thus a broader attention profile, promoting the encoding of more category information (at least when learning is incidental, Love, 2002). Semisupervised learning involving a mixture of trials with and without feedback may also promote a broader attention profile. Children also learn categories by spontaneously naming objects and then receiving feedback. Although naming is a form of supervised classification learning, it does not involve choosing between a small number of categories; because a new object can instead belong to any category, many features will need to be inspected to label it correctly. Finally, supervised classification learning as we have studied it here may eventually lead to flexible representations if objects are learned in a variety of contexts and contrast sets. If different contrasts are presented to the classification learner, the learner will be forced to attend more broadly, mitigating the costs we have documented. But note that although this sort of supervised classification can contribute to the acquisition of flexible categories, it differs from the learning that is typically studied in the laboratory, in which subjects only learn to classify items into a small number of categories. In summary, these considerations indicate that different tasks will direct attention to different aspects of a category’s structure.
There is a growing interest in understanding the exact relationship between task and what is learned; this study provides a step in this direction and highlights the role that attention allocation can play as a useful explanatory variable for how task influences the acquired concept.

Conclusion

Two experiments confirmed that supervised classification results in poorer performance on novel distinctions as compared to feature inference learning. Eye movements indicated that inflexible representations arose from an optimized attention profile that maximized classifiers’ immediate performance at the cost of inhibiting the acquisition of category information and preventing reallocation of attention to newly relevant information. In contrast, feature inference learning induces anticipatory learning, broader attention, and thus more flexible category representations. We noted that the costs of supervised classification learning might be mitigated with stimuli with non-separable stimulus dimensions; they might be avoided if supervised classification learning is combined with other sorts of learning tasks that induce a broad attention profile. We suggest that how people generalize to novel contrasts provides an important new assessment of conceptual representations, one that helps identify how different representations arise from different learning tasks.

References

Ashby, F. G., Alfonso-Reese, L. A., Turken, A. U., & Waldron, E. M. (1998). A neuropsychological model of multiple systems in category learning. Psychological Review, 105, 442-481.
Anderson, A. L., Ross, B. H., & Chin-Parker, S. (2002). A further investigation of category learning by inference. Memory & Cognition, 30, 119-28.
Anderson, J. R. (1987). Skill acquisition: Compilation of weak-method problem solutions. Psychological Review, 94, 192-210.
Anderson, J. R., & Fincham, J. M. (1994). Acquisition of procedural skills from examples.
Journal of Experimental Psychology: Learning, Memory, and Cognition, 20, 1322-1340.
Anderson, J. R., Fincham, J. M., & Douglass, S. (1997). The role of examples and rules in the acquisition of a cognitive skill. Journal of Experimental Psychology: Learning, Memory, and Cognition, 23, 932-945.
Anderson, J. R. (1991). The adaptive nature of human categorization. Psychological Review, 98, 409-429.
Bott, L., Hoffman, A., & Murphy, G. L. (2007). Blocking in category learning. Journal of Experimental Psychology: General, 136, 685-699.
Brooks, L. (1978). Non-analytic concept formation and memory for instances. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization (pp. 169-211). Hillsdale, NJ: Erlbaum.
Brooks, L. R., Squire-Graydon, R., & Wood, T. J. (2007). Diversion of attention in everyday concept learning: Identification in the service of use. Memory & Cognition, 35, 1-14.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. (1981). Categorization and representation of physics problems by experts and novices. Cognitive Science, 5, 121-152.
Chin-Parker, S., & Ross, B. H. (2002). The effect of category learning on sensitivity to within category correlations. Memory & Cognition, 3, 353-62.
Chin-Parker, S., & Ross, B. H. (2004). Diagnosticity and prototypicality in category learning: A comparison of inference learning and classification learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 30, 216-26.
Cohen, N., & Eichenbaum, H. (1993). Memory, amnesia, and the hippocampal system. Cambridge, MA: MIT Press.
Dienes, Z., & Berry, D. (1997). Implicit learning below the subjective level. Psychonomic Bulletin & Review, 4, 3-23.
Ekstrand, B. R. (1966). Backward associations. Psychological Review, 65, 50-64.
Erickson, M. A., & Kruschke, J. K. (1998). Rules and exemplars in category learning. Journal of Experimental Psychology: General, 127, 107-140.
Johansen, M. K., & Kruschke, J. K. (2005).
Category representation for classification and feature inference. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1433-58.
Garner, W. R. (1978). Aspects of a stimulus: Features, dimensions, and configurations. In E. Rosch & B. B. Lloyd (Eds.), Cognition and categorization. Hillsdale, NJ: Erlbaum.
Glisky, E. L., Schacter, D. L., & Tulving, E. (1986a). Computer learning by memory-impaired patients: Acquisition and retention of complex knowledge. Neuropsychologia, 24, 313-328.
Haider, H., & Frensch, P. A. (1999). Eye movement during skill acquisition: More evidence for the information-reduction hypothesis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 172-190.
Hoffman, A. B., Harris, H. D., & Murphy, G. L. (2008). Prior knowledge enhances the category dimensionality effect. Memory & Cognition, 36.
Hoffman, A. B., & Murphy, G. L. (2006). Category dimensionality and feature knowledge: When more features are learned as easily as fewer. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 301-315.
Hoffman, A. B. (2008). Attention dynamics in category learning. (Doctoral Dissertation, New York University, 2008). Dissertation Abstracts International-B, 69 (03), (UMI No. 3296887).
Horowitz, L. M., Norman, S. A., & Day, R. S. (1966). Availability and associative symmetry. Psychological Review, 65, 50-64.
Horsey, R. (2002). The art of chicken sexing. UCL Working Papers in Linguistics, 14, 107-117.
Hull, C. L. (1920). Quantitative aspects of the evolution of concepts. Psychological Monographs, XXVIII.
Kamin, L. J. (1969). Predictability, surprise, attention, and conditioning. In R. M. Church & B. A. Campbell (Eds.), Punishment and aversive behavior. New York: Appleton-Century-Crofts.
Kruschke, J. K. (1992). ALCOVE: An exemplar-based connectionist model of category learning. Psychological Review, 99, 22-44.
Kruschke, J. K.
(1993). Human category learning: Implications for back propagation models. Connection Science, 5, 3-36.
Kruschke, J. K. (1996a). Base rates in category learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 22, 3-26.
Kruschke, J. K. (1996b). Dimensional relevance shifts in category learning. Connection Science, 8, 225-247.
Kruschke, J. K. (2001). Toward a unified model of attention in associative learning. Journal of Mathematical Psychology, 45, 812-863.
Kruschke, J. K. (2003a). Attention in learning. Current Directions in Psychological Science, 12, 171-175.
Kruschke, J. K., & Blair, N. J. (2000). Blocking and backward blocking involve learned inattention. Psychonomic Bulletin & Review, 7, 636-645.
Kruschke, J. K., & Johansen, M. K. (1999). A model of probabilistic category learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 25, 1083-1119.
Kruschke, J. K., Kappenman, E. S., & Hetrick, W. P. (2005). Eye gaze and individual differences consistent with learned inattention in associative blocking and highlighting. Journal of Experimental Psychology: Learning, Memory, & Cognition, 31, 830-845.
Lee, F. J., & Anderson, J. R. (2001). Does learning a complex task have to be complex?: A study in learning decomposition. Cognitive Psychology, 42, 267-316.
Lesgold, A., Rubinson, H., Feltovich, P., Glaser, R., Klopfer, D., & Wang, Y. (1988). Expertise in a complex skill: Diagnosing x-ray pictures. In M. T. H. Chi, R. Glaser & M. J. Farr (Eds.), The nature of expertise (pp. 311-342). Hillsdale, NJ: Erlbaum.
Liversedge, S. P., & Findlay, J. M. (2000). Saccadic eye movements and cognition. Trends in Cognitive Sciences, 4, 6-14.
Love, B. C. (2002). Comparing supervised and unsupervised category learning. Psychonomic Bulletin & Review, 9, 829-835.
Lubow, R. E. (1989). Latent inhibition and conditioned attention theory. Cambridge, UK: Cambridge University Press.
Mackintosh, N. J.
(1975). A theory of attention: Variations in the associability of stimuli with reinforcement. Psychological Review, 82, 276-298.
Markman, A. B., & Gentner, D. (1993). Structural alignment during similarity comparisons. Cognitive Psychology, 25, 431-467.
Markman, A. B., & Gentner, D. (1996). Commonalities and differences in similarity comparisons. Memory & Cognition, 24, 235-249.
Markman, A. B., & Ross, B. H. (2003). Category use and category learning. Psychological Bulletin, 4, 592-613.
Markman, A. B., Yamauchi, T., & Makin, V. S. (1997). The creation of new concepts: A multifaceted approach to category learning. In T. B. Ward, S. M. Smith, & J. Vaid (Eds.), Conceptual structures and processes: Emergence, discovery, and change (pp. 179-208). Washington, DC: American Psychological Association.
Medin, D. L., & Edelson, S. M. (1988). Problem structure and the use of base-rate information from experience. Journal of Experimental Psychology: General, 117, 68-85.
Medin, D. L., & Schaffer, M. M. (1978). Context theory of classification learning. Psychological Review, 85, 207-238.
Medin, D. L., Coley, J. D., Storms, G., & Hayes, B. K. (2003). A relevance theory of induction. Psychonomic Bulletin & Review, 10, 517-532.
Moore, T., & Armstrong, K. M. (2003). Selective gating of visual signals by microstimulation of frontal cortex. Nature, 421, 370-373.
Murphy, G. L. (2005). The study of concepts inside and outside the laboratory: Medin versus Medin. In W. Ahn, R. Goldstone, B. Love, A. Markman, & P. Wolff (Eds.), Categorization inside and outside the lab: Essays in honor of Douglas L. Medin (pp. 179-195). Washington, DC: American Psychological Association.
Musen, G., & Squire, L. R. (1993). On the implicit learning of novel associations by amnesic patients and normal subjects. Neuropsychology, 7, 119-135.
Nilsson, H., & Olsson, H. (2005). Categorization vs. inference: Shift in attention or in representation? In B. G. Bara, L. Barsalou, & M. Bucciarelli
(Eds.), Proceedings of the 27th Annual Conference of the Cognitive Science Society (pp. 1642-1647). Stresa, Italy: Cognitive Science Society.
Nissen, M. J. (1992). Procedural and declarative learning: Distinctions and interactions. In L. R. Squire & N. Butters (Eds.), Neuropsychology of memory, 2nd edition (pp. 203-210). New York: Guilford Press.
Nosofsky, R. M. (1984). Choice, similarity, and the context theory of classification. Journal of Experimental Psychology: Learning, Memory, & Cognition, 10, 104-114.
Nosofsky, R. M. (1986). Attention, similarity and the identification-categorization relationship. Journal of Experimental Psychology: General, 115, 39-57.
Nosofsky, R. M. (1992). Similarity scaling and cognitive process models. Annual Review of Psychology, 43, 25-53.
Nosofsky, R. M., Palmeri, T. J., & McKinley, S. C. (1994). Rule-plus-exception model of classification learning. Psychological Review, 101, 53-79.
Paivio, A. (1971). Imagery and verbal processes. Hillsdale, NJ: Erlbaum.
Rayner, K. (1998). Eye movements in reading and information processing: 20 years of research. Psychological Bulletin, 124, 372-422.
Reed, S. K. (1972). Pattern recognition and categorization. Cognitive Psychology, 3, 382-407.
Rehder, B. (2001). Interference between cognitive skills. Journal of Experimental Psychology: Learning, Memory, and Cognition, 27, 451-469.
Rehder, B., & Hoffman, A. B. (2005a). Eyetracking and selective attention in category learning. Cognitive Psychology, 51, 1-41.
Rehder, B., & Hoffman, A. B. (2005b). Thirty-something categorization results explained: Selective attention, eyetracking, and models of category learning. Journal of Experimental Psychology: Learning, Memory, & Cognition, 31, 811-829.
Rehder, B., Colner, R. M., & Hoffman, A. B. (2009). Feature inference learning and eye tracking. Journal of Memory and Language, 60, 393-419.
Ross, B. H. (1999).
Postclassification category use: The effects of learning to use categories after learning to classify. Journal of Experimental Psychology: Learning, Memory, and Cognition, 25, 743-757.
Sakamoto, Y., & Love, B. C. (2006). Sizable sharks swim swiftly: Learning correlations through inference in a classroom setting. Proceedings of the Cognitive Science Society. Mahwah, NJ: Lawrence Erlbaum Associates.
Shepard, R. N., Hovland, C. I., & Jenkins, H. M. (1961). Learning and memorization of classifications. Psychological Monographs, 75, Whole No. 517.
Shepherd, M., Findlay, J. M., & Hockey, R. J. (1986). The relationship between eye movements and spatial attention. The Quarterly Journal of Experimental Psychology, 38, 475-491.
Smith, J. D., & Minda, J. P. (1998). Prototypes in the mist: The early epochs of category learning. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24, 1411-1436.
Squire, L. R. (1992). Memory and the hippocampus: A synthesis from findings with rats, monkeys, and humans. Psychological Review, 99, 195-231.
Sutherland, N. S., & Mackintosh, N. J. (1971). Mechanisms of animal discrimination learning. New York: Academic Press.
Sweller, N., Hayes, B. K., & Newell, B. R. (2006). Category learning through inference and classification: Attentional allocation causes differences in mental representation. Poster presented at the 47th Annual Meeting of the Psychonomic Society, November 16-19, Houston, TX.
Thibaut, J., & Geisler, W. S. (2006). Exemplar effects in the context of a categorization rule: Featural and holistic influences. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 1403-1415.
Tulving, E., Hayman, C. A. G., & Macdonald, C. A. (1991). Long-lasting perceptual priming and semantic learning in amnesia: A case experiment. Journal of Experimental Psychology: Learning, Memory, and Cognition, 17, 595-617.
Tversky, A. (1977). Features of similarity. Psychological Review, 84, 327-352.
Yamauchi, T., & Markman, A. B.
(1998). Category learning by inference and classification. Journal of Memory and Language, 39, 124-48.
Yamauchi, T., & Markman, A. B. (2000a). Inference using categories. Journal of Experimental Psychology: Learning, Memory, and Cognition, 3, 776-95.
Yamauchi, T., & Markman, A. B. (2000b). Learning categories composed of varying instances: The effect of classification, inference, and structural alignment. Memory & Cognition, 28, 64-78.
Author Note
Aaron B. Hoffman, Department of Psychology, University of Texas at Austin. Bob Rehder, Department of Psychology, New York University. Correspondence concerning this article should be sent to Aaron B. Hoffman, Department of Psychology, 1 University Station A8000, Austin, Texas 78712-0187 (E-mail: [email protected]). This research is supported by the National Science Foundation under Grant No. 0545298 to Bob Rehder. We thank Todd Gureckis, Kenneth Kurtz, and Gregory L. Murphy for comments on a previous version of this article.
Table 1
Category structure used in Experiments 1 and 2
Category    Dimension 1    Dimension 2    Dimension 3
A           1              1              1
A           1              0              1
B           0              1              1
B           0              0              1
C           1              1              0
C           0              1              0
D           1              0              0
D           0              0              0
Figure Captions
Figure 1. Panel A, example A versus B classification trial: left button for category A and right button for category B. Panel B, B versus A classification trial; button assignments reversed. Panel C, inference trial, left button indicates double circle, right button single circle. Panel D, ambiguous features appear when subject doesn’t fixate them. Panel E, example C versus D trial.
Figure 2. Trial sequence for classification and inference trials.
Figure 3. Experiment 2 example stimuli for A versus B, B versus A, Inference, Peripheral stimulus, C versus D, and an inference peripheral stimulus. Dashed line indicates what dimension is queried.
Figure 4. Experiment 1, proportion correct as a function of task, cue, and block.
Blocks 1-5 are A versus B trials, 6-9 are C versus D trials, and 10-13 are switch classification trials for both groups.
Figure 5. Experiment 1, proportion fixation time as a function of training condition, location, and block.
Figure 6. Experiment 1, proportion fixation time for the inference group, as a function of location (cues and category label), queried dimension, and training block.
Figure 7. Experiment 1, first block proportion correct (left panel) and the proportion of fixation time to the relevant dimension (right panel) for AB classification training, CD classification training, and switch trials (inference and classification conditions). The first block is divided into quarters of six trials. The relevant dimension was 1, 2, and 3 during AB training, CD training, and switch trials, respectively.
Figure 8. Experiment 2, proportion correct as a function of task, cue, and block. Blocks 1-5 are A versus B trials, 6-9 are C versus D trials, 10-11 are novel switch classification trials for both groups, and block 12 is switch classification for all possible contrasts, including old contrasts.
Figure 9. Experiment 2, proportion fixation time as a function of training condition, location, and block.
Figure 10. Experiment 2, proportion fixation time for the inference group, as a function of location (cues and category label), queried dimension, and training block.
Figure 11. Experiment 2, first block proportion correct (left panel) and the proportion of fixation time to the relevant dimension (right panel) for AB classification training, CD classification training, and switch trials (inference and classification conditions). The first block of AB and CD classification trials is divided into quarters of four trials. The first switch block is divided into quarters of eight trials. The relevant dimension was 1, 2, and 3 during AB training, CD training, and switch trials, respectively.
Figure 12.
Experiment 2, first block fixations to contrast dimension as a function of trial and training condition.
Figure 13. Proportion correct during Block 12 switch trials by training condition.
Figure 14. Proportion fixating each location during Block 12 switch trials by training condition and contrast discrimination.
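The captions for Figures 7 and 11 state that the relevant dimension was 1, 2, and 3 during AB training, CD training, and switch trials, respectively. As an illustrative check only, and not part of the original materials or analyses, the following Python sketch encodes the Table 1 category structure and lists, for each pair of categories, the dimensions whose values never overlap between the two categories. The names `CATEGORIES` and `diagnostic_dimensions` are hypothetical, and "perfectly diagnostic" is taken here to mean that no value on a dimension occurs in both categories.

```python
from itertools import combinations

# Table 1: two exemplars per category, three binary dimensions each.
CATEGORIES = {
    "A": [(1, 1, 1), (1, 0, 1)],
    "B": [(0, 1, 1), (0, 0, 1)],
    "C": [(1, 1, 0), (0, 1, 0)],
    "D": [(1, 0, 0), (0, 0, 0)],
}


def diagnostic_dimensions(cat_x, cat_y):
    """Return the 1-based dimensions that perfectly separate two categories."""
    dims = []
    for d in range(3):
        values_x = {item[d] for item in CATEGORIES[cat_x]}
        values_y = {item[d] for item in CATEGORIES[cat_y]}
        # Perfectly diagnostic: no value on this dimension occurs in both categories.
        if values_x.isdisjoint(values_y):
            dims.append(d + 1)
    return dims


for x, y in combinations("ABCD", 2):
    print(f"{x} vs {y}: diagnostic dimension(s) {diagnostic_dimensions(x, y)}")
```

Under this reading of Table 1, the trained contrasts (A versus B, C versus D) are separated by dimensions 1 and 2, whereas every novel contrast between an A or B item and a C or D item is separated only by dimension 3, the dimension that is nondiagnostic during training.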