COGNITIVE MECHANISMS AT PLAY: THE ROLE OF SUPRAMODAL REPRESENTATIONS AND WORKING MEMORY IN MELODIC PERCEPTION

A DISSERTATION SUBMITTED TO THE GRADUATE DIVISION OF THE UNIVERSITY OF HAWAI‘I AT MĀNOA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY IN PSYCHOLOGY

December 2014

By Ahnate James Lim

Dissertation Committee:
Scott Sinnett, Chairperson
Patricia Couvillon
Alex Doumas
Brad Nakamura
George King

Keywords: Melodic perception, music cognition, cognitive psychology

DEDICATION

For my parents,
For if cognition arises from either nature or nurture
And they provided both nature and nurture
Then assuredly they are accountable for cognition

ACKNOWLEDGEMENTS

Above all, I would like to express my gratitude to my advisor, Scott, for his enthusiasm, his dedication, his kind advice, his assistance in all aspects of graduate school, and his countless revisions of my writing. I am also much indebted to my committee members, Dr. Doumas, Dr. Couvillon, Dr. Nakamura, and Dr. King, for their continued guidance, criticism, insight, and patience with regard to my research endeavors.

ABSTRACT

Music is highly relational and in this manner shares much in common with other human behavior. While this may suggest that the processes used in music perception are domain general, the characteristics and flexibility of the underlying representations remain poorly understood. If the representations required for perceiving music are shared with those in other domains, it should be possible to map such representations across domains. This hypothesis was partially supported through novel explicit and implicit learning tasks (Experiments 1 and 2) with melodic stimuli in the auditory modality and analogous stimuli (Gabor sequences) in the visual modality. Transfer of representations was defined as the successful mapping of categories from one modality to another. Two classes of representations (contour and intervals) were used in an explicit learning paradigm in Experiment 1. Crossmodal mapping was evident in three out of the four conditions, implying domain-generality and flexibility of representational transfer. Experiment 2 extended this to an implicit learning paradigm using both unimodal and crossmodal testing conditions. Under unimodal conditions participants were able to discriminate novel instances based on learned exemplars; however, there was considerably less evidence of crossmodal discrimination for the same representations. These results provide novel insights into the differential flexibility of melodic representations under explicit and implicit learning conditions, and into how explicit learning may play a role in creating domain-general representations.

The final experiment explored the extent to which cognitive resources such as working memory underpin the processing of relational information in melodies. Participants in Experiment 3 listened to a melodic stream while performing concurrent n-back tasks with different gradients of working memory taxation. Following each task, participants were given a recognition test between two stimuli that matched the previously heard melodic stream on either relational or featural properties. A weak trend between memory taxation and rate of relational responses was observed, providing insights for future research.
Combined, the research presented here extends knowledge of explicit and implicit learning and their interactions with domain-general and domain-specific representations, providing a novel way of exploring not only music perception, but also relational processing and its underlying mechanisms.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER 1: INTRODUCTION
  1.1 Statement of general aims and goals
  1.2 Introduction
  1.3 The Melody
  1.4 The Role of Relations in General Cognition
  1.5 Domain General or Domain Specific?
CHAPTER 2: EXPERIMENT 1
  2.1. Crossmodal mapping under explicit learning
  2.2. Methods
    2.2.1. Participants
    2.2.2. Stimuli and apparatus
      Contour discrimination
      Interval discrimination
    2.2.3. Procedure
  2.3. Results
  2.4. Discussion
CHAPTER 3: EXPERIMENT 2
  3.1. Unimodal and crossmodal mapping under implicit learning
  3.2. Methods
    3.2.1. Participants
    3.2.2. Stimuli and apparatus
    3.2.3. Procedure
  3.3. Results
  3.4. Discussion
CHAPTER 4: EXPERIMENT 3
  4.1 Working memory and melodic perception
  4.2 Baseline, 1-back and 2-back conditions
  4.3 Methods
    4.3.1 Participants
    4.3.2 Stimuli and apparatus
    4.3.3 Procedure
  4.4 Results
  4.5 Discussion
CHAPTER 5: GENERAL DISCUSSION
  5.1 Domain-general or Domain-specific
  5.2 Asymmetrical crossmodal findings
  5.3 Effect of musical experience
  5.4 Significance and future research
REFERENCES

LIST OF TABLES

1: Experiment 1 and 2 conditions

LIST OF FIGURES

1: First four notes of Happy Birthday
2: Gabor patches at various frequencies
3: Contour stimuli for the contour discrimination condition
4: Interval stimuli for the interval discrimination condition
5: Explicit learning paradigm used in Experiment 1
6: Percent of participants reaching learning criterion across the different conditions in Experiment 1
7: Number of learning trials until learning criterion was reached across the different conditions in Experiment 1
8: Mapping accuracy across conditions in Experiment 1
9: Individual plots for mapping accuracy across the conditions in Experiment 1
10: Implicit learning paradigm used in Experiment 2
11: Mapping performance across all conditions in Experiment 2
12: Mapping accuracy in Experiment 2 across learned modalities and test types
13: Mapping accuracy in Experiment 2 across learned modalities and relation types, paneled by test type
14: Auditory stream with interleaved melody used in Experiment 3
15: Test stimuli used in Experiment 3
16: Exposure phase for Experiment 3
17: A) Relational response averages across the three n-back conditions. B) Responses within the 2-back condition
18: N-back accuracy scores across the two n-back conditions
19: Ratio of relational responses across the four test trials in Experiment 3
20: A) Relational responses on first trials across the three n-back conditions. B) Responses on first trials within the 2-back condition
21: Experiment 1 scatterplot of A) individual mapping accuracy scores and B) number of learning trials until criterion was reached, as a function of musical experience
22: Scatterplot of individual mapping accuracy scores in Experiment 1 as a function of musical experience, grouped by relation type and testing modality
23: Scatterplot of A) individual mapping scores in Experiment 2 and B) ratio of relational responses in Experiment 3, as a function of musical experience

LIST OF ABBREVIATIONS

2AFC: Two Alternative Forced Choice
ANOVA: Analysis of Variance
CI: Confidence Interval
DORA: Discovery of Relations by Analogy
FLSD: Fisher's Least Significant Difference
fMRI: Functional Magnetic Resonance Imaging
PFC: Prefrontal Cortex

CHAPTER 1: INTRODUCTION

1.1 Statement of general aims and goals

The core strategy humans use to process and store familiar melodies is a relative pitch code (Attneave & Olson, 1971; Page, 1994). This relative pitch code stores the pitch sequence of a melody in terms of the relations, or intervals (specific frequency differences), between each note. There is considerable behavioral (Dowling, 1978, 1984, 1988) and neurological (Fujioka, Trainor, Ross, Kakigi, & Pantev, 2004; Trainor, McDonald, & Alain, 2002) evidence indicating that humans process music using relational properties present in the auditory music signal. For example, humans can generally recognize a song regardless of whether it starts on a low or high note, due to the unique intervallic pattern between all subsequent notes. The contour (the general shape, or sequence of up and down movements in frequency from note to note) is another characteristic upon which melodies can be categorized. Given the existence of these relational characteristics, the question remains as to how they contribute to a listener's mental representation of a melody, whether such content is supramodal in nature, and whether these representations can be transferred to other modalities. Although there is some research on the crossmodal correspondence of pitch and visual frequency, there is a paucity of research on crossmodal correspondence for actual melodic content (i.e., sequences of musical notes) and the underlying relations. Directly related to these issues, this dissertation addresses whether relational representations of different classes (direction and magnitude) acquired within a particular modality (melodic or visual grating sequences) can be mapped to a separate modality, and whether the manner in which the representations were acquired (explicitly or implicitly) may affect the mapping (Chapters 2 and 3). Given that much learning occurs implicitly, this dissertation will also examine, in an exploratory manner, the extent to which the availability of working memory resources dictates the perception of relational melodic content (Chapter 4).

Thus, the primary objective of this dissertation is to determine the extent to which melodic representations acquired both explicitly and implicitly can be mapped to and from a separate modality—in this case the visual modality—by means of analogous Gabor patch sequences. A secondary objective is to explore the possible role of working memory resources in the implicit perception of melodic stimuli. Accordingly, the following three specific goals will be addressed:

1. Assess the extent to which the underlying representations used for processing music are domain-general. This will be accomplished by systematically testing whether representations of contour (direction) and intervals (magnitude) of explicitly learned melodic stimuli can be mapped to the visual modality (Gabor patch sequences), and vice versa.

2. Assess whether implicit learning conditions influence domain-general processes.
This will be accomplished by systematically testing whether representations of contour and intervals from implicitly learned melodic stimuli can be mapped within the same modality as well as to the visual modality (Gabor patch sequences), and vice versa, and by assessing whether mapping rates differ from those found under the explicit learning conditions.

3. Explore the relationship between implicit melodic perception and working memory resources.

1.2 Introduction

As neither the enjoyment nor the capacity of producing musical notes are faculties of the least direct use to man in reference to his ordinary habits of life, they must be ranked amongst the most mysterious with which he is endowed. - Charles Darwin (1873, p. 878)

Music is mysterious. In many ways, music may be characteristic of the human experience. Such is the prevalence of music that, to the best of our knowledge, a society has never existed that did not practice some form of music (Bohlman, 1999; Wallin, 1991). Aside from excavated musical instruments dating as far back as 35,000 years (Conard, Malina, & Münzel, 2009), in the present day music continues to proliferate within even the most remote hunter-gatherer societies (e.g., Rouget, 2004). As research techniques for studying human behavior and the brain have evolved, the underlying processes of music perception have continued to both fascinate and mystify cognitive scientists. Recent advances in neuroscience over the last several decades, for example, have shown that the simple act of perceiving music involves distributed activity throughout the brain, including regions as diverse as Broca's area (Fadiga, Craighero, & D'Ausilio, 2009), the prefrontal cortex (Bengtsson, Csíkszentmihályi, & Ullén, 2007), and the amygdala (Limb, 2006). Indeed, many if not all of these regions are also used in other tasks (e.g., speech processing; see Koelsch et al., 2004). Evidence also indicates that at an early age the brain does not process music and language as different domains, but that language may at least initially be processed as a 'special case' of music (Koelsch, 2011).

Given the integrated and distributed nature of the neurological underpinnings of music, it should come as no surprise that a host of relationships has been established between music and other areas of cognition. Aside from the common comparisons made between music and language (e.g., Patel, 2008), many have suggested that music can help us understand other phenomena, such as domain-general aesthetic preferences (Marcus, 2012) [Footnote 1: In light of claims for musical predisposition in humans, Marcus (2012, p. 508) argues that even if music were not innate in a strong sense, it could still have a strong influence on mental processes due to "reward systems that predate aesthetic pleasure altogether," through "forms of input that yield reward both for novelty and for correct prediction."], implicit and explicit learning of grammatical structures (Rohrmeier & Rebuschat, 2012; Tillmann & Poulin-Charronnat, 2010), and complex event sequencing (Tillmann, 2012), to name but a few. In a similar vein, other researchers have sought to uncover links between musical training and performance in other domains such as mathematics, language, spatial-temporal abilities, and memory (for a summary, see Rauscher, 2003). Thus, gaining a better understanding of the underlying processes in music perception contributes towards understanding underlying processes in other areas of cognition as well, and a precise formulation of these mechanisms could be of great import. Consequently, we will now examine the building blocks fundamental to what we call "music," and more specifically how some of these building blocks are processed and potentially shared in other cognitive domains.

1.3 The Melody

One of the most fundamental and salient aspects of music is the melody. Indeed, simple vocal melodies were likely the earliest form of music to have been produced, and have been (and still are) prevalent in all documented cultures past and present. Simple melodies consist of discrete units or notes [Footnote 2: Throughout this paper we assume that all but the lowest levels of auditory representations are discrete, as has been demonstrated for representations in general (Dietrich & Markman, 2003), as well as for musical representations specifically (Purwins et al., 2008).], with each note characterized by a pitch, or fundamental frequency (e.g., a Hertz value). In addition to pitch, other features of melodies include rhythm and timbre. Pitch is a property of sound related to the rate of vibration that produces the sound, and is characterized by descriptions such as "lowness" or "highness." Rhythm is the specific arrangement of sound according to duration and periodic stress. Timbre is the character or quality of sound, distinct from pitch or loudness, usually associated with the specific source of production, such as the different timbres of the guitar, the violin, and the human voice that make those instruments instantly recognizable when heard. The research presented in this dissertation is concerned only with the dimension of pitch in melodies (although future research will expand the basic approach used here to the domain of temporal perception and rhythm).

Importantly, there are several ways in which the pitch sequence of a melody can be encoded. Two of the most common forms of encoding are absolute and relative pitch. Encoding a melody in terms of absolute pitch involves storing the notes in terms of the fundamental frequencies (i.e., featural aspects) of each pitch, whereas encoding in terms of relative pitch involves storing the melody in terms of the relations or intervals (specific frequency differences) between each note. Relative pitch encoding is considered to be the core strategy humans use to characterize and store familiar melodies (Attneave & Olson, 1971; Page, 1994). For example, the song Happy Birthday is immediately recognizable to us due to the unique intervals between each of the notes.
That is, one is able to immediately recognize this song regardless of whether it starts on a low or high note, due to the unique intervallic pattern between all subsequent notes (see Figure 1, a & b). There is much evidence on the use of relative pitch information in adults from both behavioral (Dowling, 1978, 1984, 1988) and neuroimaging studies (Fujioka et al., 2004; Trainor et al., 2002).

[Figure 1 here: three panels (a, b, c), each plotting frequency over time.]
Figure 1: First four notes of Happy Birthday in low (a) and high (b) versions. Although the individual pitches (solid lines) in a) and b) are different, the distances between the notes within a) and b) are identical. On the other hand, although the contours of a) and c) are similar, the intervals are different; thus c) would not sound like "Happy Birthday".

In addition to relative pitch, the contour (general shape, or sequence of up and down movements in frequency from note to note) is another characteristic upon which melodies can be categorized. Given the existence of these various characteristics (i.e., relative pitch and contour), there has been considerable research and speculation on the extent to which these elements of music contribute to a listener's mental representation of a melody, and how they may interact. It is worth noting that while a melody with a contour identical to Happy Birthday but a different intervallic sequence would be perceived as a completely different song, it would still have the same general "shape" (i.e., contour), or up and down pattern (compare a or b to c in Figure 1). Although the intervallic pattern may be the most overtly salient and representative feature of a melody to humans, studies have shown that human adults are also sensitive to absolute pitch and melodic contour, at least in the short term (Bartlett & Dowling, 1980; Dowling, 1978). Furthermore, there is evidence that sensitivity to intervallic and contour features of melodies is present even in young infants (Trehub, 2001; Trehub, Bull, & Thorpe, 1984; Trehub, Trainor, & Unyk, 1993).

Even though intervallic and contour properties of melodies may characteristically differ in the type of information they carry, what is perhaps more important is that the nature of this information is fundamentally relational. That is, the information depends on the relationship (whether the precise intervallic distance or the general contour shape) between each pitch, and not on the actual pitch frequencies themselves. It is thus within this relational capacity that melodic perception can be said to share a cornerstone property with many other cognitive processes.

1.4 The Role of Relations in General Cognition

The ability to explicitly and implicitly process relational properties in stimuli has been proposed as a fundamental mechanism underlying a wide range of cognitive phenomena. This includes not only higher-level reasoning skills such as analogy-making (Gentner, 1983; Gick & Holyoak, 1980; Holyoak & Thagard, 1995), language (Kim, Pinker, Prince, & Prasada, 1991), and rule-based learning (Lovett & Anderson, 2005), but also extends to perceptual processes such as the detection of similarities (Medin, Goldstone, & Gentner, 1993). There is evidence, for example, that we recognize objects due to the specific relationships that exist between component shapes (i.e., "geons"). Indeed, Biederman's (1987; see also Hummel & Biederman, 1992) theory of object recognition proposes that a simplified mug and a bucket are both composed of the same component geons: a curved cylinder and a vertical cylinder. What makes the two objects distinct, however, is that in the mug the curved cylinder is attached to the side of the vertical cylinder, whereas in the bucket the curved cylinder is attached to the top of the vertical cylinder. It is therefore the unique relationship between the two geons in each object that determines which object is perceived, and not merely the identities of the geons. In this way, the processing of relationships is foundational to visual perception.

Given that melodic processing appears to require extracting relational information from melodies, it is a reasonable and parsimonious starting point to assume that the same mechanisms used in other relational tasks might also operate when processing melodies. That is, common to the approaches (e.g., intervallic and contour) that both adults and infants employ to encode melodic information is how melodies are represented as relations between individual notes. The strength of relational reasoning lies in the ability to reason beyond the specific features of an object; it is the ability to extract the relationships that an object has with others. Similarly, the ability to recognize a melody (or its shape) rests on appreciating the relationships between the pitches, and not just the specific frequencies of each note.

1.5 Domain General or Domain Specific?

When speaking of the mechanisms that enable the perception and processing of music, a central question is whether these mechanisms are specific to music, or whether they underpin other cognitive functions as well (i.e., domain-specific or domain-general). One basic approach to answering this question has been to look at musical behavior in infants, since any ability present in infancy is less likely to have been acquired through experience or specialization, and more likely to be attributable to innate abilities and basic core cognitive functions. The brunt of the work on infant music perception suggests that many of the cognitive processes used for processing music are indeed domain-general mechanisms (Hannon & Trainor, 2007; Patel, 2008; Trehub & Hannon, 2006). Although some researchers do note that specialization or modularity may result from increased experience, the existing evidence currently tips the scale in favor of music being domain-general (for a recent review, see Honing & Ploeger, 2012). Further evidence for a common domain-general mechanism that also serves melodic perception can be seen in a developmental phenomenon known as the "relational shift".
Studies throughout developmental psychology have overwhelmingly shown that while children initially attend to, recall, and reason about absolute perceptual properties, around the age of 4-6 they begin to rely on structured relational properties (Allport, 1924; Gentner & Rattermann, 1991; Halford, 2005; Pollack, 1969; Vernon, 1940). That is, as mentioned previously, the ability to reason about relations comes to underpin many cognitive functions as children develop. This shift has been observed in areas such as language (Gentner, 1988), spatial tasks (Case & Khanna, 1981; DeLoache, Sugarman, & Brown, 1985), number comprehension (Gelman & Gallistel, 1978; Michie, 1985), and visual shape perception (Abecassis, Sera, Yonas, & Schwade, 2001), to name but a few. The phenomenon has been termed the "relational shift," as the characteristic trend is towards greater reliance on relational attributes as children mature.

Importantly, extensive evidence indicates that this prepotency for relational processing is also observed in how children process melodies, but only after a certain age (i.e., the relational shift). Specifically, when recalling melodies, younger children typically recall more absolute pitch properties than relational properties, while the exact opposite occurs in older children. This perceptual shift and exchange in recall of absolute and relational aspects in younger and older children has been replicated in many studies (Saffran, 2003; Saffran & Griepentrog, 2001; Sergeant, 1969; Sergeant & Roche, 1973; Stalinski & Schellenberg, 2010; Takeuchi & Hulse, 1993). In light of such evidence, it is most reasonable to propose that similar developmental trends in music and other domains may be attributed to similar underlying mechanisms.

Further evidence for a domain-general relational shift comes from a unique series of studies employing computational simulations to explain existing infant melodic perception data. In a study by Lim and colleagues (2012), a domain-general symbolic-connectionist model of relational learning, DORA (Discovery Of Relations by Analogy; Doumas, Hummel, & Sandhofer, 2008), was successfully used to simulate melodic perception and categorization by infants (Chang & Trehub, 1977; Trehub et al., 1984). Given an input of notes from the corresponding melodic stimulus sequences, DORA's performance matched the behavioral data from the infant studies, thereby providing an account of a neurally plausible mechanism for melodic processing that can bootstrap structured representations of relational musical properties (i.e., contour and intervals) from unstructured feature representations (i.e., absolute pitch frequencies). In a follow-up study (Lim, Doumas, & Sinnett, 2013), the same model was used to simulate the relational shift in the melodic perception of children aged 3-6 years, based on an experiment by Sergeant and Roche (1973). DORA's performance also matched the children's in this simulation, suggesting common developmental and perceptual mechanisms between the relational shift in melodic processing and the shift seen across other domains. Notably, these findings dovetail with DORA's ability to successfully simulate the relational shift in visual shape perception (Doumas & Hummel, 2010), analogical problems (Doumas, Morrison, & Richland, 2009; Morrison, Doumas, & Richland, 2011), categorical reasoning, spatial reasoning, general relational reasoning, and progressive alignment (Doumas et al., 2008).
The fact that the same model is able to simulate behavior in such a wide variety of domains lends further credence to the domain-general hypothesis from a computational standpoint.

Indeed, one of the most widely used approaches for evaluating the domain-general nature of music has been to compare music to other domains. Perhaps the most common comparison is between music and language, since the two domains share many commonalities. To begin with, the most obvious similarity is the centrality of the auditory modality to both domains. Given auditory input streams, both language and music also require the integration and updating of individual event sequences to form larger overall representations (Friederici, 2002; Hagoort, 2005; Patel, 2003; Tillmann, 2005). The way in which these sequential events are bound to each other is important (for a discussion of binding, see Experiment 2 below), with Broca's area within the frontal lobe implicated as a neural correlate for this process (Hagoort, 2005). Furthermore, the role of sequential learning is fundamental to both music and language. It has been suggested that the processing of sound is a key opportunity that subsequently aids the brain in learning how to process sequential stimuli in general (Conway, Pisoni, & Kronenberger, 2009). This notion is consistent, for example, with research by Conway and colleagues (2009) that used an implicit learning task to measure visual sequencing abilities in deaf children with cochlear implants. Deaf children exhibited general sequencing deficits that were correlated with deficits in language outcomes, leading the investigators to hypothesize that deprivation of early sequential learning opportunities in deaf children may explain their continued difficulties with language even after receiving cochlear implants (Conway, Pisoni, Anaya, Karpicke, & Henning, 2011). It has been proposed that temporality and sequential processes may also separate music from other art forms: although music may rely on domain-general mechanisms, its unique appeal may lie in its inherently temporal nature, which allows for the close interaction of prediction and novelty (Kivy, 1993; Marcus, 2012).

Conversely, most arguments for the domain specificity of music have centered on the unique attributes of music, as well as on the seemingly precocious abilities of infants to discriminate the auditory properties necessary for processing music (for a review, see Trehub, 2001). It should also be noted that the fact that most of what is acquired musically occurs through implicit, or unsupervised, learning (Rohrmeier & Rebuschat, 2012) makes such abilities appear even more precocious. Evidence that infants prefer songs over speech directed at them (Trainor, Clark, Huntley, & Adams, 1997) may indicate that we are more 'motivated' to process music than other stimuli in some intrinsic manner (Honing & Ploeger, 2012). In general, the fact that infants appear to be well equipped with the cognitive machinery necessary for perceiving music has often been equated with a predisposition for music (Trehub, 2001).

Despite the existing body of evidence pointing towards domain-general mechanisms for music, the true underlying nature of the mechanisms and representations of music perception remains elusive and much less understood. While listening to music, what types of representations are activated? How are they stored and subsequently manipulated?
Evidence from behavioral studies with adults has shown that musical pitch can be mapped to a variety of representations, including vertical space (Melara & O'Brien, 1987; Pratt, 1930; Rusconi, Kwan, Giordano, Umilta, & Butterworth, 2006), luminosity and loudness (Hubbard, 1996; McDermott, Lehr, & Oxenham, 2008), as well as words related to emotion, size, sweetness, texture, and even temperature (Eitan & Timmers, 2010; Nygaard, Herold, & Namy, 2009; Walker & Smith, 1984). For instance, a recent study using video clips of singers performing different types of hand motions (as primes for musical stimuli) demonstrated that pitch processing shares representations with spatial processing (i.e., higher spatial movements in the visual modality primed the perception of higher pitches; see Connell, Cai, & Holler, 2013). Although many of these experiments used musical stimuli that were more complex than simple melodies, these findings do suggest that music—in its wide range of simple to complex manifestations—can be mapped to a variety of representations across different modalities. That these connections can be made is interesting, and could perhaps be explained from an associative learning perspective (where the learning context and the occurrence of environmental regularities may forge such cross-domain associations; see Spence, 2011). Relatedly, a study on individuals with amusia (tone deafness) revealed that impairments in their ability to process musical stimuli are correlated with difficulties on a mental-rotation task (Douglas & Bilkey, 2007), suggesting that these deficits could reflect a more general underlying impairment.

To gain a better understanding of the representational content of melodies, however, a more direct and precise approach may be helpful. We know that pitch-related representations can be encoded in both absolute and relative dimensions. Within the relative dimension, are the representations of contour (up or down direction between notes) and intervals (frequency distance between notes) unique to music, or are they used in other areas of cognition as well? Although crossmodal correspondences between pitch and spatial frequency have been previously demonstrated (Evans & Treisman, 2010; for a review of correspondences in general, see Spence, 2011), the full automaticity of such correspondences is still under debate (Spence & Deroy, 2013). Furthermore, to our knowledge no studies have utilized sequences of stimuli (as direct analogs to melodies) to examine correspondences in representational content [Footnote 3: Although auditory pitch and visual Gabor sequences have previously been used to study asynchrony and temporal recalibration mechanisms (Heron, Roach, Hanson, McGraw, & Whitaker, 2012).], an important point considering the inherently sequential nature of music (all music unfolds through time). To more thoroughly investigate the extent to which musical processing may overlap with underlying domain-general processes, a series of experiments was conducted on the representational characteristics used in melodic perception.
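Before turning to the experiments, it may help to make the two relation types concrete. The sketch below (Python; the function names are mine, not drawn from the dissertation) extracts an interval code and a contour code from any ordered sequence of frequency steps, whether those steps index semitone pitches or Gabor spatial frequencies:

```python
def intervals(steps):
    """Relative code: signed difference between successive elements."""
    return [b - a for a, b in zip(steps, steps[1:])]

def contour(steps):
    """Direction-only code: +1 for up, -1 for down, 0 for a repeat."""
    return [(b > a) - (b < a) for a, b in zip(steps, steps[1:])]

melody = [0, 2, 4]   # an "up-up" pitch sequence (semitone steps)
gabors = [3, 5, 7]   # an "up-up" spatial-frequency sequence
assert contour(melody) == contour(gabors) == [1, 1]

transposed = [n + 5 for n in melody]
assert intervals(transposed) == intervals(melody)  # transposition invariance
```

On this description, two sequences from different modalities that yield the same codes count as instances of the same relational category, which is precisely the kind of equivalence the following experiments test behaviorally.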
Experiments 1 and 2 examined whether the representations used for melodic perception (both intervallic and contour based) can be mapped to visual sequential processing. Although mental representations are constructs that cannot be directly observed, their properties can still be indirectly inferred using tasks that involve specific manipulations targeting these representations. In this sense, domain-generality can be defined as the extent to which melodic representations can be flexibly mapped to other senses.

Although other mapping tasks have been used previously, results from such experiments may be more speculative due to their indirect, priming-type manipulations. For example, a study by Connell et al. (2013) measured participants' perception of pitch after they watched a video of a singer moving their arm either up or down concurrently with pitch stimuli (i.e., upward movement as priming the perception of higher pitches, and vice versa). This approach may be problematic because the manipulations (hand movements) were directly observable to participants. The effects on performance could therefore be caused by overt factors such as participant awareness and reasoning (seeing an arm move in a certain direction can be a cue for the participant to respond accordingly), and not necessarily by direct manipulations of the underlying pitch or spatial representations. With this example being one of the only studies to date to examine the visual aspect of sequential pitch representation, there is a clear lack of research investigating this issue. To date, no study has used visual sequences that are analogous to melodic sequences on basic dimensions (e.g., frequency and duration) in a systematically controlled experimental setting. While Connell et al. (2013) examined how changes in pitch frequency can be influenced by spatial direction (e.g., up and down), what would happen if the compared dimension (e.g., frequency, not spatial height) were in fact used in the visual modality? It is important to note here that a pitch itself can never move up or down in space; a pitch can only increase or decrease its frequency in relation to other pitches. The convention of describing pitch as moving "up" or "down" therefore has no actual physical correlate in the material property of sound; it is merely a commonly used conceptualization or visualization of pitch. To examine the transfer of melodies to visual spatial frequency sequences, as well as to investigate the crossmodal processing of melodic representations, visual stimuli with analogous dimensions (i.e., Gabor patches with different frequencies and presentation durations; see Figure 2) will be used in the studies presented in this dissertation.

[Figure 2 here.]
Figure 2: Gabor patches at various frequencies.

The approach used for exposing participants to melodic stimuli in the studies presented in this dissertation also differed from that used by Connell et al. (2013). Given that most people can perceive music despite having little to no formal musical training (Bigand & Poulin-Charronnat, 2006; Koelsch, Gunter, Friederici, & Schröger, 2000), it has been suggested that most musical experience and knowledge is acquired through implicit learning (for a recent review on the topic as it relates to music, see Rohrmeier & Rebuschat, 2012). That is, similar to language, mere exposure to music is adequate for the development and acquisition of knowledge about fairly complex sets of regularities and relationships. Yet it is also the case that through development virtually all members of society receive guidance and instruction on how to process information to some extent, even in formats as simple as a parent guiding a child's perception through suggestion to compare one object to another (either directly or through referential language). Thus, in the real world representations may be acquired through different means, where learning can occur both explicitly and implicitly—with or without awareness, supervision, or direct knowledge. Although this study is primarily concerned with how the relations in melodies are extracted and represented, and not with implicit or explicit learning per se, it is not unreasonable to speculate that existing differences in melodic representations could in fact result from differences in learning procedure. Therefore, to account for any differences in representations that may arise due to differences in learning approaches, both explicit (Experiment 1) and implicit (Experiment 2) learning paradigms will be used.

In light of the considerable amount of evidence on the domain-general characteristics of music (as well as of other cognitive mechanisms), it is predicted that simple melodic representations should map fairly easily to the visual domain [Footnote 4: Provided that abstract representations such as shape, direction, and magnitude exist within the individual's cognitive repertoire, which should be the case for the adult participants used in this study.]. As for the differences between explicit and implicit learning, it is hypothesized that the extent to which a categorical representation has been processed more explicitly should influence how easily it is mapped to other domains. Thus, it is speculated that after explicit learning participants should be better at performing crossmodal mappings than after implicit learning, although the full extent of this advantage is open to question.

Another important aspect of melodies that will be examined in the first series of experiments is the type of relations that can be extracted from a melody. Recall from the abovementioned properties of melodies (see section 1.3) that the two characteristically relational attributes of melodies are the contour (the up and down shape between each note) and the intervallic sequence (the specific distance between notes; e.g., +4, -2, +1, etc.). Given that humans can process both types of relations, the question remains as to whether any differences exist in how contour and intervals are processed. Notably, there is a scarcity of research on whether melodic contour and intervals are processed differentially. By measuring crossmodal transfer of both types of relations, any differences in processing could lead to differences in how easily the relations are transferred to a separate modality. At first glance it would appear that contour might require fewer processing resources than intervals, since it codes only for direction and not magnitude (magnitude, requiring greater precision, may also require more resources). As this is currently an open question in the field of music perception, the experiments presented in Chapters 2 and 3 will measure the interaction of the different types of relations and how they influence the transfer of melodies to visual sequences (and vice versa), through the use of both contour and intervallic discrimination tasks. [Footnote 5: As a preview, Experiment 3 will examine the role of working memory in melodic perception by manipulating the availability of these resources using a dual-task approach. More background will be provided for Experiment 3 as it is presented.] Additionally, I will examine the effects of different types of learning on crossmodal transfer using both explicit (Experiment 1) and implicit (Experiment 2) learning paradigms (see Table 1).
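For readers unfamiliar with the stimuli in Figure 2, a Gabor patch is simply a sinusoidal grating windowed by a Gaussian envelope. The following NumPy sketch renders one; the size, envelope width, and contrast choices here are mine for illustration and are not taken from the dissertation's methods:

```python
import numpy as np

def gabor_patch(size=256, cycles_per_px=0.03, theta=np.pi / 4, sigma=40.0):
    """Return a (size x size) array in [-1, 1]: grating times Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    x_rot = x * np.cos(theta) + y * np.sin(theta)        # 45-degree orientation
    grating = np.cos(2 * np.pi * cycles_per_px * x_rot)  # sinusoidal carrier
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))   # Gaussian window
    return grating * envelope
```

Raising `cycles_per_px` produces the denser stripes visible in the higher-frequency patches of Figure 2, just as raising a tone's frequency produces a higher pitch.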
Table 1: Experiment 1 and 2 conditions (M = melodic, V = visual)

  Experiment | Learning | Domain transfer | Discrimination
  1          | Explicit | M to V          | Contour; Interval
  1          | Explicit | V to M          | Contour; Interval
  2          | Implicit | M to M          | Contour; Interval
  2          | Implicit | M to V          | Contour; Interval
  2          | Implicit | V to V          | Contour; Interval
  2          | Implicit | V to M          | Contour; Interval

CHAPTER 2: EXPERIMENT 1

2.1. Crossmodal mapping under explicit learning

In Experiment 1, the relational characteristics of melodic perception will be investigated using a novel paradigm consisting of matched sequential auditory and visual stimuli that are analogous on several basic dimensions. A series of Gabor patches (i.e., sinusoidal gratings) will be used as a visual analogue to the auditory pitch stimuli that compose a melody, as both can be defined by frequency and duration (how long a stimulus is presented). Thus, any learning transfer of different relational properties from melodies to visual sequences (and vice versa) can be observed. In theory, if relations learned in one modality transfer to stimuli in a separate modality, this would be evidence for the flexible use (or mapping) of relations in a domain-general manner. Experiment 1 (and Experiment 2 as well; see Chapter 3) will test for the transfer of both contour and intervallic relations, which are important aspects in the processing of melodies. As will be explained in greater detail below, intervallic relations might require greater discrimination and processing (magnitude) than contour relations (direction); it is therefore hypothesized that the transfer of intervallic relations may be harder than that of contour relations.

2.2. Methods

2.2.1. Participants

Participants were recruited until there were at least 15 subjects in each of the four conditions who passed the learning criterion (see section 2.2.3 below). This resulted in a total of 76 participants (21 males; age = 21 ± 5 years). Participants' musical experience (M = 5, SD = 5 years) did not differ across the conditions (p = 0.5), nor did self-reported perfect pitch ability (7 participants; p = 0.8). All participants were recruited from undergraduate psychology courses at the University of Hawaii at Manoa and offered course credit for their participation. Participants were naïve to the purpose of the experiment and had normal or corrected-to-normal vision. Ethical approval was obtained from the University's Committee on Human Subjects.

2.2.2. Stimuli and apparatus

Stimuli were presented on a 21-inch 2.4 GHz Core 2 Duo iMac computer using the Psychophysics Toolbox extension for Matlab (Brainard, 1997). Participants were seated at an eye-to-monitor distance of approximately 60 cm. From this distance, all auditory stimuli were presented at approximately 70 decibels, as measured by a sound meter. Responses were made via key presses to one of two buttons on a keyboard.

Auditory stimuli consisted of sinusoidal sound waves generated by the iMac speakers at frequencies determined by the following equation:

F = f · 2^(n/12)    (Equation 1)

where n could vary from 0 to 11, producing the range of 12 semitone pitches contained within an octave, and f was set to 440 Hz (A4). Visual stimuli consisted of Gabor patches generated with Gaussian envelopes and rotated at a 45° angle. Selection of the spatial frequency followed an analogous process determined by the following equation:

F = s · 2^(n/12)    (Equation 2)

where s represents the base spatial frequency. A value of 2π × 0.015 was used for s, as this gave a good range of Gabor frequencies suitable for the demands of the task.

Each melodic and visual sequence consisted of three items. The relationship between these three items varied depending on whether the participant was placed in the contour or the intervallic discrimination condition (described below). The presentation stream was continuous, with each pitch and Gabor pattern presented for one second (with no pauses between items), so that each melodic or Gabor sequence had a total duration of three seconds.

Contour discrimination

In the contour condition, the property that differentiated the two types of sequences to be categorized was the contour, or general shape, of the sequence. One type of sequence had a steadily increasing contour, where if the first note was n, the second note would be n + 2, and the third would be n + 4 (see Figure 3). The other type of sequence had an up-down frequency relationship, where if the first note was n, then the second note would be n + 2, and the third would be n − 2.
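Equations 1 and 2 apply the same equal-tempered, twelfth-root-of-two scaling to both modalities. A minimal sketch of the two stimulus scales (Python; the names are mine):

```python
import math

BASE_PITCH = 440.0                   # f in Equation 1 (A4, in Hz)
BASE_SPATIAL = 2 * math.pi * 0.015   # s in Equation 2 (base spatial frequency)

def step_freq(n, base):
    """F = base * 2^(n/12); n = 0..11 spans one octave of 12 steps."""
    return base * 2 ** (n / 12)

pitches = [step_freq(n, BASE_PITCH) for n in range(12)]      # 440.0 ... ~830.6 Hz
gabor_freqs = [step_freq(n, BASE_SPATIAL) for n in range(12)]
```

Because both scales share the same ratio structure, a two-step change corresponds to the same proportional increase in either modality, which is what licenses treating the Gabor sequences as visual analogues of melodies.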
[Figure 3 here: two panels (a, b) plotting frequency over time.]
Figure 3: Contour stimuli for the contour discrimination condition. In a) the contour is up-up, and in b) up-down.

Interval discrimination

In the interval discrimination condition, the property that differentiated the two types of sequences was the interval, or magnitude of the distance, between each element in the sequence. One type of sequence had a steadily decreasing contour, where if the first note was n, the second note would be n − 2, and the third would be n − 4. The other type of sequence had a non-steady (exponential) decrease, where if the first note was n, then the second note would be n − 2, and the third would be n − 8 (see Figure 4).
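The four sequence types reduce to fixed offset patterns applied to a starting step n. A compact sketch of the category definitions (Python; the dictionary and function names are mine):

```python
CONTOUR_CATEGORIES = {
    "up-up":   (0, +2, +4),  # steadily increasing contour
    "up-down": (0, +2, -2),  # rises, then falls below the start
}
INTERVAL_CATEGORIES = {
    "steady":      (0, -2, -4),  # linear decrease
    "exponential": (0, -2, -8),  # accelerating decrease (same downward contour)
}

def make_sequence(n, offsets):
    """Step indices for one three-item stimulus sequence starting at step n."""
    return [n + offset for offset in offsets]

make_sequence(9, INTERVAL_CATEGORIES["exponential"])  # [9, 7, 1]
```

Each step index would then be converted to an auditory or visual frequency via Equation 1 or 2. Note that the two interval categories share a falling contour, so only the magnitudes of the steps distinguish them.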
[Figure 4 here: two panels (a, b) plotting frequency over time.]
Figure 4: Interval stimuli for the interval discrimination condition (note the similar contours).

2.2.3. Procedure

Participants were randomly allocated into one of the four conditions until there were at least 15 participants in each condition who reached criterion on the learning task. One group (n = 41) received the melodic training and the other (n = 35) received the visual sequence training. Within the melodic training group, 20 performed the contour discrimination task, while 21 performed the intervallic discrimination task. Within the visual sequence training group, 18 performed the contour discrimination task, while 17 performed the interval discrimination task.

Within these four sub-groups, the procedure was identical and started with an introduction screen on the computer providing participants with instructions for the task. Next, participants either saw or heard a sequence of three Gabor patches or auditory pitches, depending on the condition they had been placed in. Participants were then instructed to categorize each stimulus sequence, using a two-alternative forced choice (2AFC) format, into one of two categories using either the 1 or 2 key on the keyboard (key assignment to each category was counterbalanced and randomized). Stimuli were presented in the same modality throughout the learning phase. Since the purpose was for participants to discover the categories for themselves via feedback provided at the end of each trial, they were instructed to guess on the initial trials and to then use subsequent feedback as needed (see Figure 5). This learning phase continued until the participant responded correctly 12 times in a row, whereupon the learning phase terminated and the experiment proceeded to the testing phase.
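Procedurally, the learning phase is a criterion-driven loop. The sketch below captures its logic; `present_sequence`, `get_response`, and `give_feedback` are hypothetical stand-ins for the Psychophysics Toolbox routines, and the stopping rule for participants who never reach criterion is my assumption, as the text does not specify one:

```python
import random

def learning_phase(exemplars, criterion=12, max_trials=200):
    """Run 2AFC learning trials until `criterion` consecutive correct answers."""
    streak, trials = 0, 0
    while streak < criterion and trials < max_trials:
        label, sequence = random.choice(exemplars)  # (category, 3-item sequence)
        present_sequence(sequence)                  # three 1 s stimuli, no gaps
        response = get_response()                   # '1' or '2', until response
        correct = (response == label)
        give_feedback(correct)                      # 2000 ms feedback display
        streak = streak + 1 if correct else 0
        trials += 1
    return trials, streak >= criterion              # trials used; passed or failed
```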
Visual Condition
Auditory Condition
(same as auditory, except
with visual stimuli)
750 ms
Freq. 1
1000 ms
Freq. 2
1000 ms
Freq. 3
1000 ms
Until response
* Learning trials were
repeated until 12
consecutive correct
answers were obtained
2000 ms
Correct/Incorrect feedback
Figure 5: Explicit learning paradigm used in Experiment 1. The learning phase for the auditory and visual conditions were identical except for the specific stimuli used in the two modalities (compared above with dotted arrows). In the testing phase, participants were presented with 20 exemplars in the opposite sensory modality as to what they had been exposed to in the learning phase. Half of these exemplars were from one category, and the other half from the other category. Note that these categories were previously learned, albeit in a separate modality. Presentation order of the 20 exemplars was randomized. No 38 feedback was provided during the testing phase. Participants were instructed to categorize in the same manner (2AFC) as they had done during the learning phase, but were provided no other instructions. Depending on the time it takes to learn the categories, the experiment took approximately 5-­‐10 minutes. After the experiment, participants were asked to fill out a questionnaire with questions on demographic information, previous musical experience, and any specific strategies they employed during the task. 2.3. Results Of the 76 participants allocated into each of the four conditions, the pass/fail (recall that failing to do the task was an inability to respond correctly 12 times in a row) rate was 17/1 for the visual-­‐to-­‐melody contour group, 16/4 for the melody-­‐to-­‐visual contour group, 17/0 for the visual-­‐to-­‐melody interval group, and 15/6 for the melody-­‐to-­‐visual interval group. Pass/fail rates were marginally different between the groups (χ2 (1, N = 76) = 3.82, p = .05, see Figure 6). The rest of the analyses were conducted only on participants that completed the training section (n = 65). The number of training trials until criterion was reached did not significantly differ across the four conditions (overall M = 33, SD = 24, F(3,61) = 1.04, p = 0.4). 39 % Reach learning criterion
[Figure 6 bar chart: proportion of participants reaching learning criterion (0-1) by modality (visual, melodic) and discrimination type (contour, interval).]
Figure 6: Percent of participants reaching learning criterion across the different conditions in Experiment 1. Only participants who passed criterion went on to the testing section.
[Figure 7 bar chart: number of training trials (0-50) by training stimulus (visual, melodic) and discrimination type (contour, interval).]
Figure 7: Number of learning trials until learning criterion was reached across the different conditions in Experiment 1. Error bars indicate Fisher's LSD (Least Significant Difference).6

6 Unlike standard error of the mean bars, which are a measure of individual dispersion, Fisher's LSD bars are measures of mean separation and hence more useful in facilitating visual post hoc comparisons.

Of crucial interest to Experiment 1 was whether participants could map categorical representations from one modality to another. One-sample t-tests were conducted on mean mapping scores to determine whether accuracy was greater than chance (0.5, see Figure 8).7 In the contour discrimination condition, mapping accuracy was significantly better than chance for transfer from Gabor patches to melodies (M = 0.77, SD = 0.19, t(16) = 6.02, p < .001), as well as for transfer from melodies to Gabor patches (M = 0.72, SD = 0.35, t(15) = 2.45, p = .03). In the intervallic discrimination condition, mapping accuracy was significantly better than chance only for transfer from Gabor patches to melodies (M = 0.70, SD = 0.27, t(16) = 3.06, p < .01), but not for transfer from melodies to Gabor patches (M = 0.63, SD = 0.33, t(14) = 1.50, p = .2). To compare performance across the different conditions, a 2x2 Analysis of Variance (ANOVA) was conducted. No significant main effects or interactions were found for either factor of category type (F(1,61) = 1.23, p = 0.3) or modality (F(1,61) = 0.8, p = 0.4).

7 Note that testing for absolute difference from chance (instead of the greater-than-chance test used here) would produce more significant results, but would not be indicative of mapping accuracy. An absolute difference test can only determine discriminability, not mapping accuracy, since consistently incorrect scores (below chance) would indicate a reversed mapping response.
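Each greater-than-chance comparison here is a one-tailed one-sample t-test against 0.5; a minimal Python sketch follows, with made-up accuracy scores standing in for the actual data.

```python
import numpy as np
from scipy.stats import ttest_1samp

CHANCE = 0.5

# Hypothetical per-participant mapping accuracies for one condition.
scores = np.array([0.85, 0.60, 0.90, 0.75, 0.55, 0.80, 0.95, 0.70])

# One-tailed test: is mean accuracy greater than chance?
t, p = ttest_1samp(scores, popmean=CHANCE, alternative="greater")
print(f"M = {scores.mean():.2f}, SD = {scores.std(ddof=1):.2f}, "
      f"t({len(scores) - 1}) = {t:.2f}, p = {p:.3f}")
```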
[Figure 8 bar chart: mapping accuracy (0-1) by learning transfer direction (V to M, M to V) and discrimination type (contour, interval).]
Figure 8: Mapping accuracy across conditions in Experiment 1. Error bars indicate 95% CI (confidence interval). M = melody, V = visual.

2.4. Discussion

The purpose of Experiment 1 was to 1) determine whether representations for short melodic sequences can be mapped to representations in the visual modality, and 2) examine whether mapping performance differs across different classes of melodic representations (contour and intervals) and their visual counterparts. A novel experimental paradigm involving active categorization with feedback and subsequent crossmodal testing was employed. Overall, tentative evidence for shared representational resources was provided by successful crossmodal mappings in three out of the four conditions. In the contour discrimination condition, crossmodal mapping occurred in both directions (for visual sequences to melodies as well as for melodies to visual sequences). In the intervallic discrimination condition, on the other hand, successful mapping occurred only for transfer from visual sequences to melodies, and not from melodies to visual sequences. This finding, in conjunction with the higher rate of failure to reach training criterion (as well as the larger number of trials needed to reach it, Figures 6 & 7) and the overall lower accuracy scores for the intervallic condition (Figure 8, gray bars), may suggest that intervallic discrimination was harder than contour discrimination. While these findings may also provide tentative support for the notion that discriminating change in frequency direction is easier than discriminating change in frequency magnitude (Lim et al., 2012; Trehub et al., 1984), an examination of individual mapping scores suggests that the lower scores could reflect a lack of mapping rather than a lack of discrimination (see Figure 9). That is, a mapping score closer to zero would indicate that the participant was getting the 2AFC test wrong consistently, thus mapping their responses in the opposite manner. In fact, as can be seen in Figure 9, this trend occurred more often during learning transfer from melodic to visual stimuli.
[Figure 9 scatter plot: individual mapping accuracy scores (0-1) by learning transfer direction (V to M, M to V) and discrimination type (contour, interval).]
Figure 9: Individual plots for mapping accuracy across the conditions in Experiment 1. Circles represent the contour conditions, whereas triangles represent the interval conditions. The solid line indicates group means. V = visual, M = melodic.

The trends from the current data suggest that transfer across sensory modalities may not be bi-directional. That is, mapping transfer was evident when participants were first exposed to visual Gabor patches regardless of relation type (contour or intervallic), but successful transfer from melodies to Gabor patches was observed only for contour relations. It is possible that the task is slightly easier going from Gabor patches to melodies than the opposite. Although not significant, the lower accuracy scores for transfer from melodies to visual sequences, compared to the opposite direction, could suggest a possible bias or difference in representational flexibility between melodies and visual sequences. Whether this possible difference arises during the acquisition or the transfer stage of representations is an open question. The methodology used here could be strengthened, for example, by a modeling approach demonstrating how such domain generalizability is achieved, since the question remains as to what precisely is learned from exposure to such melodic and visual sequences.

The successful crossmodal mappings in Experiment 1 lend partial support to the notion that learned categories can at least be used in a domain-general manner, if the acquired representations are not in fact domain-general by nature. The active categorization task may have been successful in focusing attention on the differences between categories. The question remains, however, as to whether such categorical representations could be similarly obtained from mere passive exposure to such melodies. Given that most people can perceive music despite having little to no formal musical training (Bigand & Poulin-Charronnat, 2006; Koelsch et al., 2000), it has been suggested that musical experience and knowledge for most people are acquired through implicit learning (for a recent review on the topic as it relates to music, see Rohrmeier & Rebuschat, 2012). That is, similar to language, mere exposure to music is adequate for the development and acquisition of knowledge about fairly complex sets of regularities and relationships. In truth, representations may be acquired through different means, where learning could occur either explicitly or implicitly (with or without awareness, supervision, or direct knowledge). Although Experiment 1 used a more explicit learning approach through active categorization and feedback, the question remains as to how implicit learning procedures (i.e., passive observation) affect musical perception. In Experiment 2, an implicit learning procedure was used precisely towards this end—to gain a broader understanding of the domain generalizability of music and the representational content that may arise due to different learning approaches.

CHAPTER 3: EXPERIMENT 2

3.1. Unimodal and crossmodal mapping under implicit learning

In Experiment 2, the same setup and procedure as in Experiment 1 was used, with the main difference being that the initial exposure was an implicit rather than an explicit learning task. The rationale underlying this change was to more closely replicate how melodies might be processed in more natural settings, where they are not being actively categorized.
Thus, any differences between explicit and implicit learning on representational transfer should become evident through a comparison of Experiments 1 and 2. If the process of actively directing attention to differences between stimulus pairs plays a role in the acquisition of supramodal representations, then crossmodal transfer in Experiment 2 would be expected to be lower than in the explicit learning condition of Experiment 1. Additionally, participants were tested in both modalities (counterbalanced) in order to confirm that unimodal discrimination of the categories is indeed possible. Within-modality performance should be considerably better than crossmodal performance, as crossmodal responses require generalizing the representations to a novel context and stimulus.

3.2. Methods

3.2.1. Participants

One hundred twenty naïve participants (36 males) were recruited for Experiment 2 (15 for each of the eight conditions, see Table 1). Age (M = 22, SD = 5) did not differ across the conditions (p = 0.7). Participants' musical experience (M = 4, SD = 4 years) and self-reported perfect pitch ability (12 participants) also did not differ across the conditions (p = 0.6). All participants were recruited from undergraduate psychology courses at the University of Hawaii at Manoa, similar to Experiment 1.

3.2.2. Stimuli and apparatus

Stimuli were constructed and presented using the same parameters and apparatus as in Experiment 1 (see section 2.2.2).

3.2.3. Procedure

The procedure for Experiment 2 was similar to Experiment 1 (see section 2.2.3) except for a few key differences. First, since the learning paradigm was implicit, there was no categorization or feedback during the learning phase. Participants were presented with eight exemplars of a particular category during this initial phase. After each exemplar, the participant pressed the spacebar to continue to the next one (see Figure 10). After all eight exemplars were presented, the testing phase began, which consisted of 20 exemplars. Half of the exemplars were novel instances of the category the participant had been exposed to during the learning phase, and the other half were of a different category. After each exemplar, participants responded as to whether that exemplar was similar to what they had heard during the learning phase using either the N key (for "no") or the Y key (for "yes"). Similar to Experiment 1, no feedback was given during the testing phase. Each participant was presented with two testing phases, one in each modality. Testing participants in the same modality in which they had learned, as well as in the opposite modality, allowed both unimodal learning and crossmodal transfer to be examined. The order in which the test modalities were presented was counterbalanced across participants (see Table 1).
[Figure 10 schematic: auditory and visual conditions (identical except for stimuli); 750 ms lead-in, then Freq. 1-3 at 1000 ms each, followed by a next-trial prompt; 8 exemplars presented per learning section.]
Figure 10: Implicit learning paradigm used in Experiment 2. The time course represents one learning trial, with a total of eight trials presented in each learning section.

3.3. Results

Of crucial interest to Experiment 2 was how participants mapped categorical representations both within and across modalities. One-sample t-tests were again conducted on mean mapping scores to determine whether accuracy differed from chance (0.5). In the contour discrimination condition, unimodal mapping accuracy was significantly better than chance for melodies (M = 0.84, SD = 0.13, t(14) = 10.3, p < .001, see Figure 11) and for visual sequences (M = 0.80, SD = 0.19, t(14) = 6.19, p < .001). Crossmodal mapping accuracy was also significantly better than chance when learned visually and tested in the auditory modality (M = 0.62, SD = 0.17, t(14) = 2.73, p = .016), but was only marginally better than chance when learned melodically and tested in the visual modality (M = 0.61, SD = 0.21, t(14) = 1.95, p = .07).
[Figure 11 bar chart: mapping accuracy (0-1) by learning modality (melodic, visual) and test type (unimodal, crossmodal), paneled by relation (contour, interval).]
Figure 11: Mapping performance across all conditions in Experiment 2. Error bars indicate 95% CI. The dashed line indicates chance performance.

In the intervallic discrimination condition, unimodal mapping accuracy was significantly better than chance for melodies (M = 0.88, SD = 0.14, t(14) = 10.6, p < .001), but not different from chance for visual sequences (M = 0.63, SD = 0.27, t(14) = 1.81, p = .09). In contrast, crossmodal mapping accuracy was not different from chance whether learned melodically (M = 0.52, SD = 0.13, t(14) = 0.48, p = .6) or visually (M = 0.52, SD = 0.18, t(14) = 0.51, p = .6).

To compare performance across the conditions, a 2x2x2 ANOVA was conducted with the factors of learning modality (melodic or visual), test type (unimodal or crossmodal), and relation type (contour or intervallic). There was a main effect of learning modality (F(1, 112) = 4.8, p = 0.03), indicating that mapping accuracy was better for melodies (M = 71%) than for Gabor sequences (M = 63%). There was also a main effect of test type (F(1, 112) = 40.7, p < 0.001), indicating that mapping accuracy was better for unimodal tests (M = 78%) than for crossmodal tests (M = 56%). There was no main effect of relation type (F(1, 112) = 1.7, p = 0.2). There was a significant two-way interaction between learning modality and relation type (F(1, 112) = 4.2, p = .04), indicating a greater disparity in mapping accuracy between contour and interval relations after learning with visual sequences (69% vs 58%, respectively) than after learning with melodies (69% vs 72%). Note also that these trends were in different directions (see Figure 12). Lastly, there was a significant three-way interaction between all factors (F(1, 112) = 4.0, p = .049), indicating that the two-way interaction between learning modality and relation type differed depending on whether participants were tested unimodally or crossmodally (see Figure 13). In essence, the three-way interaction meant that the two-way interaction was driven by the unimodal conditions and disappeared under crossmodal conditions.
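A factorial ANOVA of this form can be fit with statsmodels; the sketch below simulates a between-subjects 2x2x2 data set (the numbers are illustrative only, not the actual data) and runs the omnibus test.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
rows = []
# Simulate 15 participants per cell of the 2x2x2 design (illustrative only).
for modality in ("melodic", "visual"):
    for test in ("unimodal", "crossmodal"):
        for relation in ("contour", "interval"):
            base = 0.78 if test == "unimodal" else 0.56
            acc = np.clip(rng.normal(base, 0.15, size=15), 0.0, 1.0)
            rows += [(a, modality, test, relation) for a in acc]

df = pd.DataFrame(rows, columns=["accuracy", "modality", "test", "relation"])

# 2x2x2 ANOVA with all main effects and interactions.
model = ols("accuracy ~ C(modality) * C(test) * C(relation)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```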
[Figure 12 bar chart: mapping accuracy (0-1) by test type (unimodal, crossmodal) and learned modality (melodic, visual).]
Figure 12: Mapping accuracy in Experiment 2 across learned modalities and test types. Error bars indicate FLSD.
[Figure 13 bar chart: mapping accuracy (0-1) by learned modality (melodic, visual) and relation (contour, interval), paneled by test type (unimodal, crossmodal).]
Figure 13: Mapping accuracy in Experiment 2 across learned modalities and relation types, paneled by test type. Error bars indicate FLSD.

3.4. Discussion

Overall, the results from Experiment 2 indicate that participants were able to generalize the passively learned contour and intervallic categories to novel instances much better when tested under unimodal conditions than under crossmodal conditions. Performance was much closer to chance level for crossmodal transfer of contour relations, while there was no evidence of crossmodal transfer for the more difficult intervallic discrimination conditions.

Examining specific trends within Experiment 2 indicates that while performance across modalities was symmetrical for contour discrimination (Figure 13), there was an advantage for unimodal testing of melodic stimuli over visual sequences for intervallic discrimination. In other words, it would appear that perception of intervallic categories was more accurate for melodies than for visual sequences. This could be because we have more experience and knowledge with melodic stimuli than with visual grating sequences—which, arguably, participants would probably never see outside the context of a cognitive psychology experiment.

Overall, results from Experiment 2 support the notion that representations acquired through implicit learning may be more domain-specific than representations acquired through the explicit learning procedure used in Experiment 1. On the whole, these results provide insights into the flexibility of melodic representations and contribute to the existing body of knowledge on crossmodal implicit learning. Having established that implicit learning—the type of learning through which the majority of people process musical stimuli the majority of the time—is adequate for subsequent unimodal discrimination of the relations in melodic (as well as visual sequence) stimuli, the question remains as to how robust this implicit learning is, and what attentional or perceptual processes are minimally required for such learning. Given that relations are the cornerstone of many core properties of music, Experiment 3 will explore whether working memory—a component of short-term memory used for dynamic information processing—could play a lead role in melodic perception.

CHAPTER 4: EXPERIMENT 3

4.1 Working memory and melodic perception

Earlier in the introduction (see The Role of Relations in General Cognition, section 1.4), evidence was provided as to how melodic perception requires cognitive processes capable of extracting the relationships between the elemental pitches within a melody. These acquired relational representations are key (they essentially define a melody), and Experiments 1 and 2 investigated whether these representations are unique to melodies or can also be used in the visual domain with non-musical stimuli. Another unique and defining aspect of melodies mentioned earlier is their inherently temporal and sequential nature. Given this sequential nature, the question remains: how are the elemental pitches within melodies bound together over time such that relationships can be extracted and processed in a listener's mind? As it happens, this question turns out to be central to how sequential information is processed. How are holistic objects or grouped sequential events such as melodies constructed from their individual events or constituent features?
Different theories have put forward different explanations for such a binding mechanism. Anne Treisman's feature integration theory (Treisman, 1986, 1998) posits that there are several different stages in the process; in the initial stage, for example, an object's features are processed separately. Treisman proposed that attention can be likened to a "glue" that binds the various features together. Other researchers have proposed that working memory is responsible for this binding process, wherein the ability to simultaneously hold different features or objects in memory while they are being processed could be likened to such an integrative mechanism (Allen, Baddeley, & Hitch, 2006).

The role of working memory and its interaction with attention is a widely studied and debated phenomenon (e.g., Feng, Pratt, & Spence, 2012; Postle, 2006). Working memory is a dynamic form of memory that is manipulated quickly (within seconds) and used to temporarily store information for further analysis (Baddeley, 2003). In fact, working memory is often associated with objects in attention, and the two concepts are somewhat interconnected; a functional magnetic resonance imaging (fMRI) study, for example, strongly implicated their overlap (LaBar, Gitelman, Parrish, & Mesulam, 1999). Generally, the brain regions involved in working memory vary depending on the task, but a common overlap has been consistently observed in the prefrontal cortex (PFC). Studies examining lesions in the PFC have confirmed its role in working memory (Miller & Cohen, 2001). Indeed, the integration of the PFC with other sensory areas has also led some to propose the PFC as an executive controller (the executive controller is a central component in the multi-component model of working memory; see Baddeley, 2003).

Research in the visual domain has shown that working memory may function as an automatic binding mechanism for sequential visual events. Yet this binding ability may not be entirely impervious to cognitive strain, as experiments that place participants under dual-task conditions with heavy memory (and attentional) demands have shown (Allen et al., 2006; Lavie, 2005; Toro, Sinnett, & Soto-Faraco, 2005; Toro, Sinnett, & Soto-Faraco, 2011). While studies in the auditory domain have examined the binding of spatial and verbal features through sequential exposure (Maybery et al., 2009), to date no study has systematically and directly investigated the relationship between working memory and binding mechanisms in the specific context of melodic perception.

To more directly examine this relationship, the final experiment in this dissertation directly manipulated the availability of these resources in order to observe any effects on participants' abilities to process melodies. This approach was used to determine whether working memory is in fact a prerequisite for the holistic perception of melodies. Given that forming relations between constituent units underlies melodic recognition, the question remains as to how robust this ability to form relations is. By examining the extent to which melodic perception depends on working memory resources, the role of working memory in relational processing can be inferred.
Whether working memory resources are used similarly across different types of sequential processing, or whether there may be a bias towards musical processing due to a predisposition for musical stimuli (as some have suggested, e.g., Trehub, 2001), is also an open question.

When constructing a test for relational processing, attention must be given to the featural basis from which relational representations are actually composed. Recall from an earlier discussion (see section 1.5, Domain General or Domain Specific?) that the "relational shift" is a ubiquitous domain-general trend in the cognitive development of children, describing a change in preference from featural (or absolute) percepts towards relational percepts, and that extensive evidence indicates that this prepotency for relational processing is observed in how children process melodies as well. That is, when recalling melodies, younger children typically recall more absolute (i.e., featural) pitch properties, whereas older children recall more relational properties. Critically, this shift (and seeming dichotomy) from featural to relational processing indicates a change in preference or bias of attention towards relational aspects (Allport, 1924; Vernon, 1940), and not necessarily an inability to process featural aspects as children get older. In a similar vein, most adults can still attend to featural properties if attention is directed to them; it is merely the case that attending to relational properties becomes the default modus operandi in most contexts. Thus it is hypothesized that if working memory resources are a prerequisite for relational processing (Doumas et al., 2008; Morrison et al., 2011; Morrison, Doumas, & Richland, 2006; Morrison, Holyoak, & Truong, 2001), the ability to perceive melodic content should increasingly falter as these resources become depleted. As relational processing falters, perception could in theory default to basic stimulus features. Towards this end, Experiment 3 included a baseline condition and two n-back tasks designed to tax working memory to varying extents while recall behavior was observed along a relational-to-featural continuum.

4.2 Baseline, 1-back and 2-back conditions

The basic approach in Experiment 3 was to require participants to first listen to a pitch stream that consisted of repeated melodies interleaved with random notes. In a subsequent two-alternative forced choice (2AFC) task, participants would then select the melody most similar to the exposed melody. Critically, the 2AFC task included a novel melody constructed using either the same relational structure as the exposure melodies but without any of the previously heard notes, or a completely different relational structure but with the same note frequencies heard in the exposure melody. Thus, in this subsequent task participants were required to choose between a relational and a featural match. Importantly, working memory was taxed between participants using 1-back and 2-back tasks,
for which performance could be compared against the baseline group. The baseline group had no concurrent working memory task, but was similar to the 1-back and 2-back groups in all other regards.

The task for all three groups consisted of two phases. In the first, exposure, phase, participants listened to an auditory stream consisting of random pitches interleaved between a repeated three-note melody (see section 4.3.2). After this exposure phase, participants were presented with the test phase, consisting of four trials. Each test trial included two three-note melodies, and participants were required to indicate which of the two melodies was more similar to what they had previously heard in the exposure stream. Critically, and as explained above, one choice contained featural attributes from the exposure stream while the other contained relational attributes.

During exposure to the auditory pitch stream, a visual letter stream was simultaneously presented on a computer screen (present in all three groups, but task-relevant only in the 1-back and 2-back groups). In the baseline group, participants were only required to monitor the visual letter stream without responding. Thus, any relationship between available working memory resources and melodic perception would be observed through behavioral trends on the recognition task in the test phase. This would indicate whether melodic perception (i.e., the extraction of relationships between elemental pitches) is dependent on working memory—in this case the hypothesized mechanism for the binding of notes within a melody.8

8 Note that the use of a pitch stream with an interleaved melody has previously been used in studies examining statistical learning and absolute pitch with infants (Saffran & Griepentrog, 2001). Although statistical learning alone could account for the extraction of featural regularities (absolute pitch) in such a context, it would not suffice to explain how the relations of the interleaved melody used in Experiment 3 could be extracted. In other words, for a participant to select answer b) in Figure 15, they must have learned the relations between each note (the contour and intervals), since those are the only similarities between a) and b).

The 1-back condition was identical to the baseline condition, with the key difference being that participants were now required to monitor for and respond to any repeated letters within the visual letter stream. This requirement placed a demand on participants' working memory resources (they had to remember one place back in the letter stream). The 2-back condition was identical to the 1-back condition, with the key difference being the use of a 2-back memory task instead of a 1-back task. The reason for this manipulation was to obtain multiple behavioral measurements corresponding to different levels of available working memory capacity.

4.3 Methods

4.3.1 Participants

Participants were recruited until there were at least 20 subjects with n-back accuracy scores of 70% or above in each of the three conditions. This resulted in a total of 69 participants (11 males, age = 23 ± 6), with 20 participants in the baseline condition, 20 in the 1-back condition, and 29 in the 2-back condition. Participants' musical experience (M = 4 years, SD = 4) and self-reported perfect pitch abilities (15 participants) did not differ across the conditions (p = 0.9).
All participants were recruited from undergraduate psychology courses at the University of Hawaii at Manoa, similar to Experiments 1 and 2.

4.3.2 Stimuli and apparatus

The auditory pitch stream in the exposure phase was constructed using randomly determined pitches according to Equation 1 (section 2.2.2), with n sampled from the following subset of five integers: [0, 2, 4, 6, 8].9 The pitch stream was assembled by the paradigm script using the following procedure: 1) first, a random melody was constructed; 2) next, this melody was played repeatedly, while 3) random notes in random quantities (between 0-2 notes) were interspersed between each repetition of the melody (see Figure 14). In addition, a visual letter stream was concurrently presented on a computer screen (participants were only required to respond to the letter stream in the 1-back and 2-back conditions). This letter stream was constructed from randomly chosen non-repeated letters (from the following set: B, C, D, E, F, J, K, L, M, N, P, R, S, T, Y, X, Z). Each pitch/letter event was presented for 700 ms, with 16.7 ms of silence and a blank gray screen separating events. In the 1-back condition, letters would repeat (e.g., B-B) at randomly allocated positions, and in the 2-back condition letters would repeat after one intervening letter (e.g., B-A-B) at randomly allocated positions.

9 This whole-tone scale was used to avoid the possibility of having any harmonic or scale-related information within the melody and pitch stream as possible confounds.
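For concreteness, the stream-construction procedure can be sketched as below. Equation 1 itself is not reproduced in this chapter, so the `pitch` function assumes a standard equal-tempered mapping from n to frequency, and the melody is assumed to use three distinct scale steps; both are placeholders rather than the actual implementation.

```python
import random

def pitch(n, base=440.0):
    # Placeholder for Equation 1 (not reproduced here): an equal-tempered
    # mapping from integer n to frequency in Hz is assumed.
    return base * 2 ** (n / 12)

WHOLE_TONE = [0, 2, 4, 6, 8]  # the subset of n values (whole-tone scale)

def build_stream(n_repetitions=100):
    """Interleave a fixed random three-note melody with 0-2 random notes."""
    melody = [pitch(n) for n in random.sample(WHOLE_TONE, 3)]  # distinct notes assumed
    stream = []
    for _ in range(n_repetitions):
        stream.extend(melody)                      # the repeated melody
        for _ in range(random.randint(0, 2)):      # 0-2 random filler notes
            stream.append(pitch(random.choice(WHOLE_TONE)))
    return melody, stream

melody, stream = build_stream()
print([round(f, 1) for f in melody], len(stream))
```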
[Figure 14 schematic: frequency plotted over time, with melodic notes interleaved among random notes.]
Figure 14: Auditory stream with interleaved melody used in Experiment 3.

The test phase consisted of four questions asking participants to indicate which of two melodies was most similar to what they had heard during the exposure phase. The two choices were three-note sequences that either 1) retained the relational aspects of the interleaved melody and none of the featural aspects, or 2) retained the featural aspects of the interleaved melody but none of the relational aspects (see Figure 15).
[Figure 15 schematic: frequency plotted over time for a) the exposed melody, b) the relational test item, and c) the featural test item.]
Figure 15: Test stimuli used in Experiment 3. During the test phase participants were asked to indicate whether the relational (b) or featural (c) test item sounded most familiar. The featural item preserved the specific frequencies, or absolute features, of the exposed melody, whereas the relational item preserved the intervallic and contour sequence (the magnitude difference and direction between each note).

4.3.3 Procedure

All three conditions consisted of two phases (with the addition of training prior to the 1-back and 2-back tasks). For the first, exposure, phase, participants listened to the pitch stream described in section 4.3.2. Concurrent with the pitch stream was a visual letter stream that participants were required to monitor (see Figure 16). In the baseline condition participants were not required to respond to the visual stream (included to control for the visual display used in the 1-back and 2-back conditions, where a response was required).
In the 1-back condition, participants had to perform a visual 1-back task in which they pressed the spacebar at every detected repetition in the letter stream (e.g., A-A, G-G; see Figure 16). The procedure for the 2-back condition was identical to the 1-back condition except that a 2-back visual task was substituted for the 1-back task (i.e., responding with the spacebar each time a 2-back repetition occurred, e.g., A-B-A, G-Y-G). The auditory pitch stream lasted for two minutes, with the repeated melody being played approximately 100 times during this period. After the pitch stream ended, the letter stream continued to be displayed for one minute in order to ensure that no short-term memory trace for the melodies remained when completing the test phase. After the exposure phase, participants were presented with the test phase, where they heard two three-note melodies and were asked to indicate which of the two was more similar to what they had heard during the exposure phase.

To ensure that participants were familiar with the tasks and could perform the 1-back and 2-back components, participants in those groups were trained on the n-back test prior to the experiment. In addition, to acclimatize participants to the 1-back and 2-back tasks, a 30-second lead-in period of the visual letter stream was presented prior to the onset of the pitch stream during the actual exposure phase (see Figure 16).
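As an illustration of the n-back logic, the following sketch generates a letter stream with n-back repeats at random positions and scores responses; it is a simplified stand-in for the actual paradigm script, whose code is not described in the text.

```python
import random

LETTERS = list("BCDEFJKLMNPRSTYXZ")

def make_letter_stream(length, n, p_target=0.2):
    """Build a letter stream in which a position may repeat the letter
    n places back (an n-back target) with probability p_target."""
    stream, targets = [], []
    for i in range(length):
        if i >= n and random.random() < p_target:
            stream.append(stream[i - n])   # insert an n-back repeat
            targets.append(i)
        else:
            # choose a letter that does not accidentally create a target
            options = [c for c in LETTERS if i < n or c != stream[i - n]]
            stream.append(random.choice(options))
    return stream, targets

def nback_accuracy(targets, responses, length):
    """Accuracy as hits on targets plus correct rejections elsewhere."""
    hits = len(set(targets) & set(responses))
    false_alarms = len(set(responses) - set(targets))
    return (hits + (length - len(targets) - false_alarms)) / length

stream, targets = make_letter_stream(60, n=2)
print("".join(stream), targets[:5])
```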
[Figure 16 schematic: visual letter stream (700 ms per letter, 16.7 ms gaps) with a 30-second letters-only lead-in to get used to the n-back task, followed by two minutes of concurrent pitch events (500 ms each), with the melodic pattern notes (n = p1, p2, p3) randomly allocated among random pitches (n = random) throughout the stream.]
Figure 16: Exposure phase for Experiment 3. In the 1-back and 2-back conditions the letter stream carried the corresponding n-back task, presented simultaneously with a pitch stream containing a melodic pattern distributed throughout. In the baseline condition participants were only required to monitor the randomly constructed letter stream.

4.4 Results

The first analysis in Experiment 3 compared the baseline, 1-back, and 2-back groups against chance in order to determine whether relational response rates were significantly different from those expected if participants were responding at chance. In addition, performance across the three conditions was compared using a one-way ANOVA. The outcome measure was the average ratio of relational (melodic) responses to featural (absolute pitch) responses across the four test trials. A score of 1 indicated relational-only responses, a score of 0 indicated featural-only responses, and a score of 0.5 indicated half relational and half featural responses (i.e., chance).

One-sample t-tests against chance (0.5) were conducted on participants scoring 70% or higher on the n-back tasks. The relational response rate in the baseline condition was marginally better than chance (M = 0.61, SD = 0.31, t(19) = 1.63, p = .06), while neither the 1-back (M = 0.58, SD = 0.30, t(19) = 1.10, p = .3) nor the 2-back condition (M = 0.56, SD = 0.29, t(19) = 0.96, p = .3) was significantly different from chance. A one-way ANOVA with the factor of concurrent task revealed that the ratio of relational responses did not differ significantly across the baseline (M = 0.61), 1-back (M = 0.58), and 2-back (M = 0.56) conditions, F(2, 57) < 1, p = 0.9 (see Figure 17a).
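The outcome measure can be made concrete with a short sketch; the response matrix below is hypothetical, with True marking trials on which the relational item was chosen.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical responses: rows = participants, columns = 4 test trials,
# True where the relational (melodic) item was chosen.
responses = np.array([
    [True, True, False, True],
    [False, True, True, False],
    [True, False, True, True],
])

ratios = responses.mean(axis=1)   # per-participant relational response ratio
t, p = ttest_1samp(ratios, popmean=0.5)
print(f"M = {ratios.mean():.2f}, t({len(ratios) - 1}) = {t:.2f}, p = {p:.3f}")
```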
[Figure 17 bar charts: A) percent relational responses by n-back condition (none, 1, 2); B) percent relational responses within the 2-back condition by 2-back accuracy band (0 to 0.6, n = 4; 0.6 to 0.8, n = 9; 0.8 to 1, n = 16).]
Figure 17: A) Relational response averages across the three n-back conditions. B) Responses within the 2-back condition at different levels of 2-back accuracy.

Considering that the critical manipulation in Experiment 3 was the taxing of working memory resources, n-back accuracy scores in the 1-back and 2-back groups were compared to determine whether the manipulation was effective. An independent-samples t-test revealed that accuracy scores on the 1-back task (M = .99, SD = .02) were significantly higher than on the 2-back task (M = .82, SD = .16), t(29) = 5.6, p < .001 (see Figure 18).
[Figure 18: A) mean n-back accuracy by condition (1-back M = .99, 2-back M = .82, p < .001); B) individual n-back accuracy scores by condition.]
Figure 18: N-back accuracy scores across the two n-back conditions. A) Average scores. B) Individual plots.

The question remains as to the best approach for analyzing the test data, given the way in which the second task was constructed. Recall that four test trials were used in the melodic recognition test, and the average of the responses to all four trials was used in the analyses above. However, it could be argued that using the first trial only may provide a purer measure of relational processing (i.e., a first impression). That is, it is possible that participants learned from the 2AFC test anywhere between the first and fourth trials, noticing that one of the choices always retained the featural aspects while the other retained the relational aspects, and subsequently used this information to respond differently. To examine this possibility more closely, an ANOVA with factors of trial number (1-4) and task condition (baseline, 1-back, and 2-back) was conducted. No main effects were observed (p > .4), although there was a marginal interaction (p = .08), suggesting that the pattern of responses across the four trials differed among the three task conditions (see Figure 19).
[Figure 19 line plots: percent relational responses across test trials 1-4, paneled by n-back condition (none, 1, 2).]
Figure 19: Ratio of relational responses across the four test trials in Experiment 3 by task condition. Error bars indicate standard error of the mean.

Given that the initial response may be a more accurate index of relational processing, the same analysis conducted on all four trials was repeated using only the first test trial. Although the ratios of relational responses did not differ from one another (p = .5) or from chance across all three conditions (all p > .3, see Figure 20a), it is worth noting a trend in the 2-back data whereby relational response rates were higher for those with lower accuracy scores on the 2-back task (see Figure 20b).10 Tentatively, this may suggest that when working memory resources were less taxed, and consequently more available, the relations within the melodic stream were detected more often. Note, however, that this trend did not exist when analyzing responses across all four trials.

10 This trend could not be examined within the 1-back task since the majority of n-back scores were clustered around 100% (i.e., the task was too easy; see Figure 18b).
[Figure 20 bar charts: A) first-trial percent relational responses by n-back condition (none, 1, 2); B) first-trial relational responses within the 2-back condition by 2-back accuracy band (0 to 0.6, n = 4; 0.6 to 0.8, n = 9; 0.8 to 1, n = 16).]
Figure 20: A) Relational responses on first trials across the three n-back conditions. B) First-trial responses within the 2-back condition at different levels of 2-back accuracy.

4.5 Discussion

Several important findings arose from this exploratory study of the relationship between working memory and the relational processing of melodies. First, the comparison of n-back accuracy scores across the 1-back and 2-back conditions revealed a significant decline in performance on the 2-back (see Figure 18), thereby verifying that the increased task difficulty taxed memory resources. The greater variability of n-back performance on the 2-back compared to the 1-back (Figure 18b) further supports the finding that the 2-back task was considerably more difficult.

The lack of a clear effect of working memory taxation on relational processing may suggest either 1) that the binding mechanism for melodic perception or relational processing is something other than working memory (e.g., attention), or 2) that relational processing is automatic and impervious to attentional and working memory demands. These points are only speculative and would require more research to eliminate other possible explanations, including, for example, the possibility that the findings obtained here resulted from the specific configuration of stimuli used in this experiment.

Bearing in mind that the n-back tasks did successfully tax working memory resources, the lack of a difference in relational response rates on the testing section across the different groups would appear to weaken the hypothesis of a strong relationship between working memory and relational processing—at least within the capacities measured here. Indeed, the only finding indicating a trend in the data is the marginally better-than-chance rate of relational responses in the baseline group, which was not observed in either of the n-back groups. However, the fact that the relational response rate was only marginally different from chance in the baseline condition could be due to other potential problems within the experimental design (such as the construction of the stimuli used in the pitch stream and subsequent testing).

Another issue to consider is the reliability of the methodology. Recall that during the testing section participants were given a 2AFC test between two melodies that retained either the relational or the featural aspects of the melody previously heard during the exposure phase. In an attempt to keep the instructions from biasing participants' responses towards either the relational or the featural answer, the testing prompt had to be purposefully vague. Participants were asked which of the two choices sounded "the most familiar" relative to what they had heard during the exposure phase. Thus, the most "familiar" choice could be either featural or relational, depending on how participants interpreted the question. That is, participants could have been aware of the relations in the melody but still chosen the featural answer if they interpreted "familiar" on the basis of features instead of relations. This inherent subjectivity in how participants could have interpreted the test prompt may have had the unfortunate outcome of not ensuring that the crucial measure to be inferred from this response—the processing of relations in the repeated melody—was actually measured. Given that a direct test of relational processing could be difficult to employ without also biasing participants' answers (e.g., "what shape was the melody", or "what were the relationships between the notes in the melody"), future studies might improve on this methodology by employing indirect measures in which the recall test of relational content is orthogonal to the actual responses, thus ensuring that the obtained answer is a more reliable measure of relational processing.
CHAPTER 5: GENERAL DISCUSSION

This dissertation employed novel approaches aimed at better understanding the representational nature of melodies and visual sequences. Overall, the findings provide insights into the flexibility of melodic representations under active and passive learning conditions, and into how explicit learning may play a role in developing domain-general representations. The overarching goals of this dissertation were to determine the degree to which different classes of relational representations acquired within a particular modality can be mapped to a separate modality (i.e., domain-general vs. domain-specific), and whether the manner in which the representations were acquired (explicitly or implicitly) affects mapping behavior (Chapters 2 and 3). Lastly, given that knowledge of music is for most people acquired implicitly, this dissertation also explored the extent to which the availability of working memory resources dictates subsequent perception of implicitly acquired relational melodic content (Chapter 4).

Before discussing the findings it is worth revisiting the specific goals and aims (stated in section 1.1) this dissertation set out to accomplish:

1. Assess the extent to which underlying representations used for processing music are domain-general. This will be accomplished by systematically testing whether representations of contour (direction) and intervals (magnitude) of explicitly learned melodic stimuli can be mapped to the visual modality (Gabor patch sequences), and vice versa.

2. Assess whether implicit learning conditions influence domain-general processes. This will be accomplished by systematically testing whether representations of contour and intervals from implicitly learned melodic stimuli are mapped within the same modality as well as to the visual modality (Gabor patch sequences), and vice versa. Assess also whether mapping rates differ from the previously found explicit learning conditions.

3. Explore the relationship between implicit melodic perception and working memory resources.

5.1 Domain-general or Domain-specific

One of the central goals of this dissertation was to determine to what extent the underlying representations used for processing music are domain-general. As such, this is the first study of its kind to employ a counterbalanced design using sequential stimuli with analogous dimensions across both modalities. Although studies in the past have examined correspondences between non-sequential instances of visual spatial height, and even visual frequency, in relation to pitch (for a review, see Spence, 2011), none have looked at the correspondence between sequential melodic and visual stimuli and their inherent relational properties. This last point is important, as it allowed for comparisons across modalities that go beyond mere sensory correspondence; the rationale here was to examine the underlying representations that are encoded from exposure to melodies and other analogous sequential stimuli (e.g., Gabor patch sequences). Crucially, if representations acquired in one modality can subsequently be used in another modality, this would seemingly suggest that such representations are domain-general. Recall from the introduction (section 1.5, Domain General or Domain Specific?) that a central question in the study of music cognition is whether the underlying mechanisms of music perception are specific to music alone, or whether they also underpin other cognitive mechanisms.
With regard to how relations are learned, the differential findings from Experiments 1 and 2 may tentatively suggest that explicit learning leads to more domain-general representations, whereas implicit learning leads to more domain-specific representations. First, the higher rate of crossmodal mappings in Experiment 1 may imply shared, or domain-general, representations. Second, the failure to map crossmodally in Experiment 2—in conjunction with the success of unimodal mapping—suggests that the representations in Experiment 2 were domain-specific. It is important to note that these two experiments were essentially identical, the key exception being that participants learned the relations explicitly in Experiment 1 and implicitly in Experiment 2. The other slight difference was the testing procedure (see below).

Note also that mapping success (an operational definition of domain generality) may depend on relational complexity, since crossmodal mappings were worse for transfers of intervallic relations across both modalities in Experiment 2, and were worse for the transfer of intervallic relations from melodies to visual sequences in Experiment 1. This suggests that complex relations are harder to process (recall also the higher fail rate for learning intervallic relations in Experiment 1), such that mapping to a separate modality would be even more difficult. In addition, intervallic scores were lower than their contour counterparts in all conditions of both Experiments 1 and 2 (five out of six pairs), except for unimodal melodic transfer in Experiment 2, where the intervallic mean was slightly higher than the contour mean.

In light of these findings, it would appear that the manner in which representations are acquired essentially determines the domain-generalizability of those representations. When representations are acquired explicitly, they may be used in a more domain-general manner (i.e., crossmodally), but when they are acquired implicitly, their range of use may be limited to the domain of acquisition (i.e., unimodally). Furthermore, the complexity of the representations themselves seems to dictate their general usability or domain-generalizability (i.e., crossmodal mapping), with more complex representations (interval or magnitude) being more difficult to process (e.g., through discrimination or mapping) and simpler representations (contour or direction) being more amenable to processing.

There is a caveat to these conclusions, however, in that they are somewhat tentative given the slightly different approaches used in the testing phases of Experiments 1 and 2. Although both were 2AFC tests, Experiment 1 required participants to categorize exemplars into categories, whereas Experiment 2 required participants to confirm or disconfirm an exemplar as belonging to a previously learned category. Despite every effort to create nearly identical experiments, this difference was inevitable given the nature of explicit and implicit learning. Further complicating the issue is the fact that the amount of learning exposure in each experiment differed. Whereas the number of learning trials in Experiment 1 varied depending on how long it took participants to reach the learning criterion (an average of 30 exemplars per subject, from two categories), learning examples in Experiment 2 were limited to eight exemplars (one category) per participant. For these reasons it may be problematic to compare Experiments 1 and 2 directly, and any conclusions regarding domain-generality are somewhat tentative. Also, while it would appear that implicit learning in Experiment 2 was not sufficient for the establishment of supramodal representations, future studies should determine whether increasing the number of exemplars has any effect on crossmodal mapping accuracy. Having said that, it should be noted that participants were able to map successfully within modalities after only eight exposures in Experiment 2, suggesting that the relations could be learned, at least in a domain-specific way, with the given amount of exposure.
Moreover, the results obtained here do supplement existing findings on the domain-general aspects of musical processing (i.e., a lack of behavioral and neurological differences between musicians and non-musicians on passive listening tasks; see Bigand & Poulin-Charronnat, 2006; Koelsch et al., 2000).

Lastly, it should be noted that there are different levels of processing at which the domain-general nature of music can be characterized. The above discussion considered the domain-general nature of the relations, or higher-order representations, that were ultimately learned from musical stimuli. Other studies have looked, for example, at how lower-level processes or mechanisms in music perception may be domain-general. These domain-general processes may include statistical learning, speech segmentation, or the ability to process sequential stimuli. For example, speech segmentation may be a domain-general process that accounts not only for the perception of discrete units in speech (i.e., words and phonemes), but also for the perception of discrete units in music (i.e., individual notes) from auditory signals that are continuous at the sensory level.

When discussing the domain-general nature of music, perhaps the most frequent comparisons are made between music and language, since the two domains share many commonalities. Relevant to this study, sequential learning is fundamental to both music and language. It has been suggested that the processing of sound provides an important opportunity during development that subsequently aids the brain in learning how to process sequential stimuli in general (Conway et al., 2009). This notion is consistent, for example, with a study that used an implicit learning task to measure visual sequencing abilities in deaf children with cochlear implants. Deaf children exhibited general sequencing deficits that were correlated with deficits in language outcomes, leading the investigators to hypothesize that deprivation of early sequential learning opportunities in deaf children may explain their continued difficulties with language even after receiving cochlear implants (Conway et al., 2011).

It has been proposed that temporality and sequential processes may also separate music from other art forms. Although music may rely on domain-general mechanisms, its unique appeal may lie in its inherently temporal nature, which allows for the close interaction of prediction and novelty (Kivy, 1993; Marcus, 2012). Thus it is possible that different aspects of music are domain-general to different extents. The focus of this dissertation was on the characteristics of the relations and representations acquired from melodic exposure. Here, the findings from Experiments 1 and 2 suggest that factors such as learning approach (i.e., explicit or implicit) and representational complexity (i.e., contour or interval) could play important roles in determining whether the representations are best characterized as domain-general or domain-specific.

5.2 Asymmetrical crossmodal findings

One of the interesting trends observed across Experiments 1 and 2 is the asymmetrical nature of learning transfer across modalities. In Experiment 1, mapping transfer was not observed from melodies to visual sequences for intervallic relations, and in Experiment 2 there was an advantage for unimodal testing of melodic stimuli over visual sequences for intervallic discrimination. Overall, these trends could be due to a variety of factors, including 1) that people may have more experience and knowledge of melodic stimuli than of visual gratings, 2) that it is easier to map acquired representations to melodies than to visual grating sequences, 3) that the bias is due to the specific stimulus configuration used here, or 4) that the bias towards melodies could be a manifestation of an auditory coding bias.
This last point—of a potential bias towards melodies resulting from a preference for auditory stimuli—is mentioned due to findings that the majority of information within short-term memory is encoded in the auditory modality (Conrad, 1964), and also that temporal structures—even when presented in the visual modality—are represented using an auditory code (Guttman, Gilroy, & Blake, 2005).

5.3 Effect of musical experience

One approach that researchers have used to examine the link between general cognition and musical processing is to look at existing differences between musicians and non-musicians. It could be inferred from such comparisons that any differences observed are due to expertise or overtraining in music, thus providing insights into the effects of musical training, and perhaps even into what types of processes are used while listening to or playing music (but for a discussion of the limitations inherent in cross-sectional designs, see Boot, Blakely, & Simons, 2011). Aside from increased expertise within the domain of music, studies have also shown that musical training may have concomitant benefits in other domains as well (Helmbold, Rammsayer, & Altenmüller, 2005; Jones & Yee, 1997; Patston, Hogg, & Tippett, 2007; Rauscher, 2003). Thus, it is possible that musical experience may have had an effect on performance in the three experiments.11 To determine whether such relationships exist, an analysis of the correlations between musical experience and performance was conducted.

11 Note, however, that in all experiments musical experience did not differ across the different conditions; therefore, expertise was unlikely to have driven any of the effects observed in each experiment.
[Figure 21 scatterplots: A) crossmodal mapping accuracy (0-1) and B) number of learning trials needed (0-150), each as a function of years of musical experience (0-15), with linear regression lines.]
Figure 21: Experiment 1 scatterplot of A) individual mapping accuracy scores and B) number of learning trials until criterion was reached as a function of musical experience, with linear regression lines.

In Experiment 1 there was an overall weak positive correlation between musical experience and mapping accuracy, r(57) = .29, p = .03 (see Figure 21a), and a weak negative correlation between musical experience and number of trials until criterion was reached, r(57) = -.22, p = .09 (see Figure 21b). However, examining each sub-condition revealed a strong correlation between musical experience and mapping accuracy within the intervallic discrimination condition for learning transfer from visual to melodic sequences, r(13) = .63, p = .01, as well as for transfer from melodic to visual sequences, r(12) = .48, p = .08 (see Figure 22, right panels). Correlations for the contour discrimination conditions were weaker and non-significant (p > .1).
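Correlations of this kind are straightforward to compute with scipy; the sketch below uses hypothetical experience and accuracy vectors in place of the actual data.

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: years of musical experience and mapping accuracy.
experience = np.array([0, 1, 2, 3, 5, 7, 8, 10, 12, 15])
accuracy = np.array([0.55, 0.60, 0.50, 0.65, 0.70, 0.75,
                     0.65, 0.80, 0.85, 0.90])

r, p = pearsonr(experience, accuracy)
print(f"r({len(experience) - 2}) = {r:.2f}, p = {p:.3f}")
```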
[Figure 22 scatterplots: mapping accuracy (0-1) as a function of years of musical experience (0-15), paneled by relation type (contour, interval) and testing modality (melody, visual), with regression lines.]
Figure 22: Scatterplot of individual mapping accuracy scores in Experiment 1 as a function of musical experience, with regression lines for each group. Plots are grouped by relation type (horizontal panels) and testing modality (vertical panels).

In contrast to Experiment 1, there were no significant correlations between musical experience and performance scores in Experiments 2 and 3 (see Figure 23). What might have caused the significant correlation in Experiment 1 that did not similarly occur in the other experiments? Although there were several differences between the experiments, the answer may lie in the fact that the primary difference between the first experiment and the other two was the use of an explicit learning task. It could be speculated that musical experience had more of an effect in Experiment 1 due to the active learning procedure, which consequently enabled greater access to existing stores of musical knowledge and concepts than in Experiments 2 and 3, where musical learning occurred only at an implicit level.
Figure 23: Scatterplots of A) individual mapping scores in Experiment 2 and B) ratio of relational responses in Experiment 3 as a function of musical experience, with regression lines by experiment. Note the lack of correlation.

This cross-experiment correlational survey tentatively suggests that musical experience may have a greater influence on performance in explicit learning conditions than in implicit conditions. That is, it may be the case that under certain circumstances (such as the ones used within these experiments) musical knowledge and experience can only be used explicitly, and perhaps not implicitly. This hypothesis may be consistent, for example, with research showing no difference between musicians and non-musicians on listening tasks (Bigand & Poulin-Charronnat, 2006; Koelsch et al., 2000). To elaborate on this hypothesis, note first that passive listening to music, in the common everyday sense that we are familiar with, does not require expertise. Note also that when both musicians and non-musicians are exposed to these types of passive and implicit listening tasks, no differences are found across the two groups (Bigand & Poulin-Charronnat, 2006; Koelsch et al., 2000). Thus, if the musicians in these studies had more experience or knowledge than the non-musicians, such experience did not have any observable effect on implicit learning, a pattern similar to the correlational results and hypothesis presented above. Such notions would also be consistent, for example, with the argument that the basic set of perceptual capabilities used for music perception is less dependent on experience and perhaps of innate origin (Hauser & McDermott, 2003).

Lastly, the claim that musical experience has greater effects on explicit learning than on implicit learning may be limited to the current set of stimuli and procedures used within this dissertation, as there is some evidence that musical training or experience may have some concomitant processing "benefits" outside of the specific domain of training. A study by Helmbold and colleagues (2005), for example, compared 70 musicians to non-musicians on psychometric assessments of intelligence and general mental abilities and found differences in performance on two tests: flexibility of closure and perceptual speed. The flexibility of closure task (identifying hidden visual patterns) involved detecting single elements in complex objects, while the measure of perceptual speed involved finding visually presented letters amongst digit distractors. The authors speculated that the better performance on tasks measuring perceptual speed could be explained by the demands of musical training, which requires quick recognition of musical symbols and structures. In a related study, Jones and Yee (1997) found musicians to be better at discriminating temporal changes in auditory rhythmic patterns, but only for simple patterns. Other research using a line bisection task also showed faster reaction times and fewer errors in musicians, suggesting better visual perceptual processing (Patston et al., 2007). There is also evidence from neuropsychological studies suggesting that musicians have greater neuroplasticity, resulting in both functional and anatomical differences, including increased grey and white matter volume in the left cerebellum, more pronounced cortical reorganization for musically related motor activity, and larger (by 25%) evoked potential fields in response to instrumental tones when compared to controls (Gaser & Schlaug, 2003; Münte, Altenmüller, & Jäncke, 2002). Despite these findings12, there is still a lack of consensus among researchers as to how much musical training can truly affect processing in other domains. To this end, the correlations presented here can only provide speculation; the hypotheses they suggest should be tested with experiments designed to manipulate the relevant causal mechanisms (e.g., musical training).

12 Note here the great potential for the "file drawer problem" (Rosenthal, 1979): many file drawers across research labs could in fact contain stacks of evidence for the null hypothesis of no difference between musicians and non-musicians on countless measures (perhaps even contradicting published ones) without us ever knowing of their existence.

5.4 Significance and future research

The experiments presented in this dissertation relate both to existing knowledge on music cognition and to the broader field of cognition. First, the rationale for the experimental designs used within this dissertation was based on numerous studies designed to measure basic musical properties that can be discriminated even by non-musically trained individuals.
Although humans can discriminate along an impressive array of musical dimensions, the main properties examined here were contour (Dowling, 1978; Trainor et al., 2002) and intervals (Attneave & Olson, 1971; Dowling, 1978, 1984, 1988; Fujioka et al., 2004; Trainor et al., 2002). Given that these properties are inherent in music, the research conducted herein sought to answer whether these relational properties are domain-general or domain-specific, by measuring the flexibility with which they can be mapped to a separate modality.

Domain-generality is a central concept in cognition due to its inherent parsimony and its potential to unify separate aspects of cognition. Indeed, several lines of research support the argument that music perception uses domain-general processes. Research with infants has shown, for example, that they are able to discriminate among the auditory properties necessary for the perception of music with greater precision than is actually required for music (for a review, see Trehub, 2001). Many have taken this as clear evidence for an innate perceptual faculty, or at the very least a predisposition, for music. In addition, studies examining statistical learning abilities in infants have found that they can discriminate musical stimuli just as well as speech stimuli, thus extending the domain-general repertoire of music to segmentation abilities (Saffran, Johnson, Aslin, & Newport, 1999). Given this predisposition for music, the question remains: what are the underlying representations used in music? What mental currency do these predispositions deal with? The approaches used within this dissertation are initial steps aimed precisely at answering such questions.

By examining whether the categories implicitly learned in Experiment 2 could be transferred to a separate modality, the flexibility and supramodal/unimodal nature of these representations can be inferred. The novel crossmodal approach used here provided tentative evidence that implicitly learned relations in melodic stimuli are domain-specific. Note that this does not contradict the existence of a domain-general mechanism. For example, the abovementioned studies on statistical learning with infants found that the mechanism for statistical learning was the same for musical and speech streams. The crucial difference is that infants in those studies were tested within the same medium. That is, after being exposed to speech streams, infants were tested on words within the speech streams (Saffran, Aslin, & Newport, 1996), and when exposed to pitch streams, infants were tested on recognition of melodies in the pitch stream (Saffran et al., 1999). To date, infants have not been tested across mediums (e.g., exposed to a speech stream and tested on analogous melody stimuli), but the results obtained from Experiment 2 would appear to predict failure on such cross-medium tests.

One of the central themes of cognition is the mental representations underlying intelligent behavior. Given that representations cannot be observed directly, their nature can only be inferred indirectly through measured behavior. This was the rationale behind the crossmodal mapping paradigm used in both Experiments 1 and 2. Note, however, that the novelty of the present studies lay in the use of sequential stimuli in a crossmodal mapping paradigm (melodic stimuli and their visual grating sequence analogues). As mentioned earlier, sequential learning may play a special role in human cognition, as it involves the processing of dynamic representations and relations (Freyd, 1987). While the mapping of static stimulus pairs has been investigated thoroughly, there have been fewer investigations of the mapping of sequential stimuli. In a review of such findings, Spence (2011) proposes that crossmodal correspondence can arise both from associative learning and from semantic sources. For the data obtained in Experiment 1, it is unlikely that associative learning could have accounted for the observed crossmodal correspondences, since visual gratings (Gabor patches) are a class of stimuli that participants would seldom encounter outside of a cognitive or vision-related experiment. On the other hand, semantic coding could indeed have played a role in the crossmodal correspondence observed here, for the reasons detailed above. To test whether this is the case, future studies could add a priming component to the methods used in Experiment 1, for example by exposing participants to different classes of words (either congruent or incongruent with the relations in the stimulus sequence) and observing any effects on behavior during the crossmodal mapping test (an illustrative sketch of such a design appears below). Such tests may help to discern whether the representations in question are encoded at the semantic level.

In the study of cognition, music has historically been eclipsed by other topics and viewed as a more peripheral subject matter. This may be due in part to the attitude many researchers have held towards musical phenomena. Recall the quote by Darwin (1873) at the beginning of the dissertation, for example, where he claims that music has "the least direct use to man in reference to his ordinary habits of life." Similar views are echoed by Steven Pinker (1997), who describes music as mere "auditory cheesecake," which, although undeniably enjoyable, arose only accidentally as an indirect result of more purposeful evolutionary progressions.
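Before concluding, the priming extension proposed above can be made concrete. The following is a minimal Python sketch of how congruent and incongruent word primes might be interleaved with the Experiment 1 mapping trials; the word lists, function names, and trial counts are hypothetical illustrations, not part of the original method:

```python
import random

# Hypothetical prime words: "congruent" words name the relational properties
# carried by the sequences (e.g., contour direction); "incongruent" words do not.
CONGRUENT_WORDS = ["rising", "falling", "higher", "lower"]
INCONGRUENT_WORDS = ["bitter", "square", "heavy", "soft"]

def build_primed_trials(n_trials=40, seed=None):
    """Assign a word prime to each crossmodal mapping trial,
    half congruent and half incongruent, in a shuffled order."""
    rng = random.Random(seed)
    conditions = ["congruent", "incongruent"] * (n_trials // 2)
    rng.shuffle(conditions)
    trials = []
    for condition in conditions:
        pool = CONGRUENT_WORDS if condition == "congruent" else INCONGRUENT_WORDS
        trials.append({"condition": condition, "prime": rng.choice(pool)})
    return trials
```

Any systematic difference in mapping accuracy or response time between the two prime conditions would suggest that the relational representations are accessible to, and modifiable by, semantic codes.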
Nevertheless, in contrast to such dismissive views, there are others who claim that studying the underlying basis of music has the potential to provide key insights into fundamental cognitive mechanisms due to its many domain-general characteristics (Hauser, Chomsky, & Fitch, 2002; McDermott & Hauser, 2005). These characteristics and functions include, but are not limited to, auditory perception, segmentation, syntax, concepts, representations, relations, statistical learning, explicit and implicit learning, memory, and developmental trajectory. In the same vein, the argument can also be made that the interdisciplinary nature of music makes it a prime candidate for studying the mind (for discussion, see Pearce & Rohrmeier, 2012).

In conclusion, this dissertation provides evidence that music is a well-suited medium for studying many aspects of cognition. To return to the quote by Darwin from which we began, I would argue that music might prove itself to be useful for the very same reason that it is mysterious. For, if we should succeed in unraveling the particularly intriguing mysteries of music, we could find ourselves well on the way towards unraveling much larger mysteries of cognition.

REFERENCES

Abecassis, M., Sera, M. D., Yonas, A., & Schwade, J. (2001). What's in a shape? Children represent shape variability differently than adults when naming objects. Journal of Experimental Child Psychology, 78(3), 213-239.

Allen, R. J., Baddeley, A. D., & Hitch, G. J. (2006). Is the binding of visual features in working memory resource-demanding? Journal of Experimental Psychology: General, 135(2), 298.

Allport, G. W. (1924). Eidetic imagery. British Journal of Psychology. General Section, 15(2), 99-120.

Attneave, F., & Olson, R. K. (1971). Pitch as a medium: A new approach to psychophysical scaling. The American Journal of Psychology, 84, 147-166.

Baddeley, A. (2003). Working memory: Looking back and looking forward. Nature Reviews Neuroscience, 4(10), 829-839.

Bartlett, J. C., & Dowling, W. J. (1980). Recognition of transposed melodies: A key-distance effect in developmental perspective. Journal of Experimental Psychology: Human Perception and Performance, 6(3), 501.

Bengtsson, S. L., Csíkszentmihályi, M., & Ullén, F. (2007). Cortical regions involved in the generation of musical structures during improvisation in pianists. Journal of Cognitive Neuroscience, 19(5), 830-842.

Biederman, I. (1987). Recognition-by-components: A theory of human image understanding. Psychological Review, 94(2), 115-147.

Bigand, E., & Poulin-Charronnat, B. (2006). Are we "experienced listeners"? A review of the musical capacities that do not depend on formal musical training. Cognition, 100(1), 100-130.

Bohlman, P. V. (1999). Ontologies of music. In N. Cook & M. Everist (Eds.), Rethinking music (pp. 17-34). Oxford, England: Oxford University Press.

Boot, W. R., Blakely, D. P., & Simons, D. J. (2011). Do action video games improve perception and cognition? Frontiers in Psychology, 2. doi: 10.3389/fpsyg.2011.00226

Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10(4), 433-436.

Case, R., & Khanna, F. (1981). The missing links: Stages in children's progression from sensorimotor to logical thought. New Directions for Child and Adolescent Development, 1981(12), 21-32.

Conard, N. J., Malina, M., & Münzel, S. C. (2009). New flutes document the earliest musical tradition in southwestern Germany. Nature, 460(7256), 737-740.

Connell, L., Cai, Z. G., & Holler, J. (2013). Do you see what I'm singing? Visuospatial movement biases pitch perception. Brain and Cognition, 81(1), 124-130.

Conrad, R. (1964). Acoustic confusions in immediate memory. British Journal of Psychology, 55(1), 75-84.

Conway, C. M., Pisoni, D. B., Anaya, E. M., Karpicke, J., & Henning, S. C. (2011). Implicit sequence learning in deaf children with cochlear implants. Developmental Science, 14(1), 69-82.

Conway, C. M., Pisoni, D. B., & Kronenberger, W. G. (2009). The importance of sound for cognitive sequencing abilities: The auditory scaffolding hypothesis. Current Directions in Psychological Science, 18(5), 275-279.

Darwin, C. (1873). The descent of man, and selection in relation to sex (Vol. 2). London: John Murray.

DeLoache, J. S., Sugarman, S., & Brown, A. L. (1985). The development of error correction strategies in young children's manipulative play. Child Development, 928-939.

Dietrich, E., & Markman, A. B. (2003). Discrete thoughts: Why cognition must use discrete representations. Mind & Language, 18(1), 95-119.

Douglas, K. M., & Bilkey, D. K. (2007). Amusia is associated with deficits in spatial processing. Nature Neuroscience, 10(7), 915-921.

Doumas, L. A. A., & Hummel, J. E. (2010). A computational account of the development of the generalization of shape information. Cognitive Science, 34(4), 698-712.

Doumas, L. A. A., Hummel, J. E., & Sandhofer, C. M. (2008). A theory of the discovery and predication of relational concepts. Psychological Review, 115(1), 1.

Doumas, L. A. A., Morrison, R. G., & Richland, L. E. (2009). The development of analogy: Working memory in relational learning and mapping. Paper presented at the 31st Annual Conference of the Cognitive Science Society, Amsterdam, The Netherlands.

Dowling, W. J. (1978). Scale and contour: Two components of a theory of memory for melodies. Psychological Review, 85(4), 341.

Dowling, W. J. (1984). Assimilation and tonal structure: Comment on Castellano, Bharucha, and Krumhansl. Journal of Experimental Psychology, 113(3), 417-420.

Dowling, W. J. (1988). Tonal structure and children's early learning of music. In J. Sloboda (Ed.), Generative processes in music. Oxford: Oxford University Press.

Eitan, Z., & Timmers, R. (2010). Beethoven's last piano sonata and those who follow crocodiles: Cross-domain mappings of auditory pitch in a musical context. Cognition, 114(3), 405-422.

Evans, K. K., & Treisman, A. (2010). Natural cross-modal mappings between visual and auditory features. Journal of Vision, 10(1), 6.

Fadiga, L., Craighero, L., & D'Ausilio, A. (2009). Broca's area in language, action, and music. Annals of the New York Academy of Sciences, 1169(1), 448-458.

Feng, J., Pratt, J., & Spence, I. (2012). Attention and visuospatial working memory share the same processing resources. Frontiers in Psychology, 3, 1-11.

Freyd, J. J. (1987). Dynamic mental representations. Psychological Review, 94(4), 427.

Friederici, A. D. (2002). Towards a neural basis of auditory sentence processing. Trends in Cognitive Sciences, 6(2), 78-84.

Fujioka, T., Trainor, L. J., Ross, B., Kakigi, R., & Pantev, C. (2004). Musical training enhances automatic encoding of melodic contour and interval structure. Journal of Cognitive Neuroscience, 16(6), 1010-1021.

Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience, 23(27), 9240.

Gelman, R., & Gallistel, C. (1978). The child's understanding of number. Cambridge, MA: Harvard University Press.

Gentner, D. (1983). Structure-mapping: A theoretical framework for analogy. Cognitive Science, 7(2), 155-170.

Gentner, D. (1988). Metaphor as structure mapping: The relational shift. Child Development, 47-59.

Gentner, D., & Rattermann, M. J. (1991). Language and the career of similarity. In S. A. Gelman & J. P. Byrnes (Eds.), Perspectives on language and thought: Interrelations in development (pp. 225-277). London: Cambridge University Press.

Gick, M. L., & Holyoak, K. J. (1980). Analogical problem solving. Cognitive Psychology, 12(3), 306-355.

Guttman, S. E., Gilroy, L. A., & Blake, R. (2005). Hearing what the eyes see: Auditory encoding of visual temporal sequences. Psychological Science, 228-235.

Hagoort, P. (2005). On Broca, brain, and binding: A new framework. Trends in Cognitive Sciences, 9(9), 416-423.

Halford, G. S. (2005). Development of thinking. In K. J. Holyoak & R. G. Morrison (Eds.), The Cambridge handbook of thinking and reasoning (pp. 529-558). New York: Cambridge University Press.

Hannon, E., & Trainor, L. (2007). Music acquisition: Effects of enculturation and formal training on development. Trends in Cognitive Sciences, 11(11), 466-472.

Hauser, M. D., Chomsky, N., & Fitch, W. T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298(5598), 1569-1579.

Hauser, M. D., & McDermott, J. (2003). The evolution of the music faculty: A comparative perspective. Nature Neuroscience, 6(7), 663-668.

Helmbold, N., Rammsayer, T., & Altenmüller, E. (2005). Differences in primary mental abilities between musicians and nonmusicians. Journal of Individual Differences, 26(2), 74-85.

Heron, J., Roach, N. W., Hanson, J. V., McGraw, P. V., & Whitaker, D. (2012). Audiovisual time perception is spatially specific. Experimental Brain Research, 218(3), 477-485.

Holyoak, K. J., & Thagard, P. (1995). Mental leaps: Analogy in creative thought. Cambridge, MA: MIT Press.

Honing, H., & Ploeger, A. (2012). Cognition and the evolution of music: Pitfalls and prospects. Topics in Cognitive Science, 4(4), 513-524.

Hubbard, T. L. (1996). Synesthesia-like mappings of lightness, pitch, and melodic interval. The American Journal of Psychology, 219-238.

Hummel, J. E., & Biederman, I. (1992). Dynamic binding in a neural network for shape recognition. Psychological Review, 99(3), 480.

Jones, M., & Yee, W. (1997). Sensitivity to time change: The role of context and skill. Journal of Experimental Psychology: Human Perception and Performance, 23(3), 693.

Kim, J. J., Pinker, S., Prince, A., & Prasada, S. (1991). Why no mere mortal has ever flown out to center field. Cognitive Science, 15(2), 173-218.

Kivy, P. (1993). The fine art of repetition: Essays in the philosophy of music. Cambridge, England: Cambridge University Press.

Koelsch, S. (2011). Toward a neural basis of music perception: A review and updated model. Frontiers in Psychology, 2.

Koelsch, S., Gunter, T., Friederici, A. D., & Schröger, E. (2000). Brain indices of music processing: "Nonmusicians" are musical. Journal of Cognitive Neuroscience, 12(3), 520-541.

Koelsch, S., Kasper, E., Sammler, D., Schulze, K., Gunter, T., & Friederici, A. D. (2004). Music, language and meaning: Brain signatures of semantic processing. Nature Neuroscience, 7(3), 302-307.

LaBar, K. S., Gitelman, D. R., Parrish, T. B., & Mesulam, M. (1999). Neuroanatomic overlap of working memory and spatial attention networks: A functional MRI comparison within subjects. Neuroimage, 10(6), 695-704.

Lavie, N. (2005). Distracted and confused?: Selective attention under load. Trends in Cognitive Sciences, 9(2), 75-82.

Lim, A., Doumas, L. A. A., & Sinnett, S. (2012). Modeling melodic perception as relational learning using a symbolic-connectionist architecture (DORA). Paper presented at the 34th Annual Conference of the Cognitive Science Society, Sapporo, Japan.

Lim, A., Doumas, L. A. A., & Sinnett, S. (2013). Modeling the relational shift in melodic processing of young children. Paper presented at the 35th Annual Conference of the Cognitive Science Society, Berlin, Germany.

Limb, C. J. (2006). Structural and functional neural correlates of music perception. The Anatomical Record Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology, 288(4), 435-446.

Lovett, M. C., & Anderson, J. R. (2005). Thinking as a production system. In K. J. Holyoak & R. Morrison (Eds.), The Cambridge handbook of thinking and reasoning (pp. 401-429). New York: Cambridge University Press.

Marcus, G. F. (2012). Musicality: Instinct or acquired skill? Topics in Cognitive Science, 4, 498-512.

Maybery, M. T., Clissa, P. J., Parmentier, F. B., Leung, D., Harsa, G., Fox, A. M., & Jones, D. M. (2009). Binding of verbal and spatial features in auditory working memory. Journal of Memory and Language, 61(1), 112-133.

McDermott, J. H., & Hauser, M. (2005). The origins of music: Innateness, uniqueness, and evolution. Music Perception, 23(1), 29-59.

McDermott, J. H., Lehr, A. J., & Oxenham, A. J. (2008). Is relative pitch specific to pitch? Psychological Science, 19(12), 1263-1271.

Medin, D. L., Goldstone, R. L., & Gentner, D. (1993). Respects for similarity. Psychological Review, 100(2), 254.

Melara, R. D., & O'Brien, T. P. (1987). Interaction between synesthetically corresponding dimensions. Journal of Experimental Psychology: General, 116(4), 323.

Michie, S. (1985). Development of absolute and relative concepts of number in preschool children. Developmental Psychology, 21(2), 247.

Miller, E. K., & Cohen, J. D. (2001). An integrative theory of prefrontal cortex function. Annual Review of Neuroscience, 24(1), 167-202.

Morrison, R. G., Doumas, L. A., & Richland, L. E. (2011). A computational account of children's analogical reasoning: Balancing inhibitory control in working memory and relational representation. Developmental Science, 14(3), 516-529.

Morrison, R. G., Doumas, L. A. A., & Richland, L. E. (2006). The development of analogical reasoning in children: A computational account. Paper presented at the 28th Annual Conference of the Cognitive Science Society.

Morrison, R. G., Holyoak, K. J., & Truong, B. (2001). Working memory modularity in analogical reasoning. Paper presented at the 23rd Annual Conference of the Cognitive Science Society.

Münte, T., Altenmüller, E., & Jäncke, L. (2002). The musician's brain as a model of neuroplasticity. Nature Reviews Neuroscience, 3(6), 473-478.

Nygaard, L. C., Herold, D. S., & Namy, L. L. (2009). The semantics of prosody: Acoustic and perceptual evidence of prosodic correlates to word meaning. Cognitive Science, 33(1), 127-146.

Page, M. P. A. (1994). Modelling the perception of musical sequences with self-organizing neural networks. Connection Science, 6(2-3), 223-246.

Patel, A. D. (2003). Language, music, syntax and the brain. Nature Neuroscience, 6(7), 674-681.

Patel, A. D. (2008). Music, language, and the brain. Oxford, England: Oxford University Press.

Patston, L., Hogg, S., & Tippett, L. (2007). Attention in musicians is more bilateral than in non-musicians. Laterality, 12(3), 262.

Pearce, M. T., & Rohrmeier, M. (2012). Music cognition and the cognitive sciences. Topics in Cognitive Science, 4(4), 468-484.

Pinker, S. (1997). How the mind works. New York: Norton.

Pollack, R. (1969). Some implications of ontogenetic changes in perception. In Elkind & Flavell (Eds.), Studies in cognitive development (pp. 365-407). Oxford University Press.

Postle, B. R. (2006). Working memory as an emergent property of the mind and brain. Neuroscience, 139(1), 23-38.

Pratt, C. C. (1930). The spatial character of high and low tones. Journal of Experimental Psychology, 13(3), 278-285.

Purwins, H., Herrera, P., Grachten, M., Hazan, A., Marxer, R., & Serra, X. (2008). Computational models of music perception and cognition I: The perceptual and cognitive processing chain. Physics of Life Reviews, 5(3), 151-168.

Rauscher, F. (2003). Can music instruction affect children's cognitive development? ERIC Clearinghouse on Early Education and Parenting. Retrieved 10 October, 2011, from http://www.ericdigests.org/2004-3/cognitive.html

Rohrmeier, M., & Rebuschat, P. (2012). Implicit learning and acquisition of music. Topics in Cognitive Science, 4(4), 525-553.

Rosenthal, R. (1979). The file drawer problem and tolerance for null results. Psychological Bulletin, 86(3), 638.

Rouget, G. (2004). Music with effects: "Musicking" to survive (the case of the Pygmies). L'Homme, (171-172), 27.

Rusconi, E., Kwan, B., Giordano, B. L., Umilta, C., & Butterworth, B. (2006). Spatial representation of pitch height: The SMARC effect. Cognition, 99(2), 113-129.

Saffran, J. R. (2003). Absolute pitch in infancy and adulthood: The role of tonal structure. Developmental Science, 6(1), 35-43.

Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926-1928.

Saffran, J. R., & Griepentrog, G. J. (2001). Absolute pitch in infant auditory learning: Evidence for developmental reorganization. Developmental Psychology, 37(1), 74.

Saffran, J. R., Johnson, E. K., Aslin, R. N., & Newport, E. L. (1999). Statistical learning of tone sequences by human infants and adults. Cognition, 70(1), 27-52.

Sergeant, D. (1969). Experimental investigation of absolute pitch. Journal of Research in Music Education, 17(1), 135-143.

Sergeant, D., & Roche, S. (1973). Perceptual shifts in the auditory information processing of young children. Psychology of Music, 1(2), 39-48.

Spence, C. (2011). Crossmodal correspondences: A tutorial review. Attention, Perception, & Psychophysics, 73(4), 971-995.

Spence, C., & Deroy, O. (2013). How automatic are crossmodal correspondences? Consciousness and Cognition, 22(1), 245-260.

Stalinski, S. M., & Schellenberg, E. G. (2010). Shifting perceptions: Developmental changes in judgments of melodic similarity. Developmental Psychology, 46(6), 1799.

Takeuchi, A. H., & Hulse, S. H. (1993). Absolute pitch. Psychological Bulletin, 113(2), 345.

Tillmann, B. (2005). Implicit investigations of tonal knowledge in nonmusician listeners. Annals of the New York Academy of Sciences, 1060(1), 100-110.

Tillmann, B. (2012). Music and language perception: Expectations, structural integration, and cognitive sequencing. Topics in Cognitive Science, 4(4), 568-584.

Tillmann, B., & Poulin-Charronnat, B. (2010). Auditory expectations for newly acquired structures. The Quarterly Journal of Experimental Psychology, 63(8), 1646-1664.

Toro, J. M., Sinnett, S., & Soto-Faraco, S. (2005). Speech segmentation by statistical learning depends on attention. Cognition, 97(2), B25-B34.

Toro, J. M., Sinnett, S., & Soto-Faraco, S. (2011). Generalizing linguistic structures under high attention demands. Journal of Experimental Psychology: Learning, Memory, and Cognition, 37(2), 493.

Trainor, L. J., Clark, E. D., Huntley, A., & Adams, B. A. (1997). The acoustic basis of preferences for infant-directed singing. Infant Behavior and Development, 20(3), 383-396.

Trainor, L. J., McDonald, K. L., & Alain, C. (2002). Automatic and controlled processing of melodic contour and interval information measured by electrical brain activity. Journal of Cognitive Neuroscience, 14(3), 430.

Trehub, S. E. (2001). Musical predispositions in infancy. Annals of the New York Academy of Sciences, 930(1), 1.

Trehub, S. E., Bull, D., & Thorpe, L. A. (1984). Infants' perception of melodies: The role of melodic contour. Child Development, 55(3), 821-830.

Trehub, S. E., & Hannon, E. E. (2006). Infant music perception: Domain-general or domain-specific mechanisms? Cognition, 100(1), 73-99.

Trehub, S. E., Trainor, L. J., & Unyk, A. M. (1993). Music and speech processing in the first year of life. In Advances in child development and behavior (Vol. 24, pp. 1-35). New York: Academic Press.

Treisman, A. M. (1986). Features and objects in visual processing. Scientific American, 255(5), 114-125.

Treisman, A. M. (1998). Feature binding, attention and object perception. Philosophical Transactions of the Royal Society of London. Series B: Biological Sciences, 353(1373), 1295-1306.

Vernon, M. D. (1940). The relation of cognition and phantasy in children. British Journal of Psychology. General Section, 31(1), 1-21.

Walker, P., & Smith, S. (1984). Stroop interference based on the synaesthetic qualities of auditory pitch. Perception, 13(1), 75-81.

Wallin, N. (1991). Biomusicology: Neurophysiological, neuropsychological, and evolutionary perspectives on the origins and purposes of music (Vol. 68). Stuyvesant, NY: Pendragon Press.