University of Miami Scholarly Repository
Electronic Theses and Dissertations
Open Access Dissertations

2010-05-03

The Empirical Testing of Musical Performance Assessment Paradigm
Brian Eugene Russell, University of Miami, [email protected]

Recommended Citation:
Russell, Brian Eugene, "The Empirical Testing of Musical Performance Assessment Paradigm" (2010). Open Access Dissertations. 387.
http://scholarlyrepository.miami.edu/oa_dissertations/387

UNIVERSITY OF MIAMI

THE EMPIRICAL TESTING OF A MUSICAL PERFORMANCE ASSESSMENT PARADIGM

By Brian E. Russell

A DISSERTATION

Submitted to the Faculty of the University of Miami in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Coral Gables, Florida
May 2010

©2010 Brian E. Russell
All Rights Reserved

UNIVERSITY OF MIAMI

A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

THE EMPIRICAL TESTING OF A MUSICAL PERFORMANCE ASSESSMENT PARADIGM

Brian E. Russell

Approved:
Stephen F. Zdzinski, Ph.D., Associate Professor of Music Education and Music Therapy
Terri A. Scandura, Ph.D., Dean of the Graduate School
Nicholas DeCarbo, Ph.D., Professor of Music Education and Music Therapy
Edward Asmus, Ph.D., Associate Dean, Graduate Studies
Joyce Jordan, Ph.D., Professor of Music Education and Music Therapy
Jill Kaplan, Ph.D., Lecturer, School of Psychology

RUSSELL, BRIAN E.
(Ph.D., Music Education)
The Empirical Testing of a Musical Performance Assessment Paradigm. (May 2010)
Abstract of a dissertation at the University of Miami. Dissertation supervised by Associate Professor Stephen F. Zdzinski. No. of pages in text: 163.

The purpose of this study was to test a hypothesized model of aurally perceived performer-controlled musical factors that influence assessments of performance quality. Previous research studies on musical performance constructs, musical achievement, musical expression, and scale construction were examined to identify the factors that influence assessments of performance quality. A total of eight factors were identified: tone, intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, and interpretation. These factors were categorized as either technique or musical expression factors. Items representing these eight variables were chosen from previous research on scale development. Additional items, along with researcher-created items, were chosen to represent the variables of technique, musical expression, and overall perceptions of performance quality. The 44 selected items were placed on the Aural Musical Performance Quality (AMPQ) measure and paired with a four-point Likert scale. The reliability of the AMPQ measure was .977. A total of 58 volunteer adjudicators were recruited to evaluate four recordings, one from each instrumental category of interest: brass, woodwind, voice, and string. The resulting performance evaluations (N = 232) were analyzed using statistical regression and path analysis techniques. The results of the analysis provide empirical support for the existence of the model of aurally perceived performer-controlled musical factors. Technique demonstrated significant direct effects on overall perceptions of performance quality and musical expression.
Musical expression also demonstrated a significant direct effect on overall perceptions of performance quality. The results of this study are consistent with a hypothesized model of performer-controlled musical factors.

Dedicated to Kimberly, my loving and supportive wife.

Acknowledgements

I would like to acknowledge my advisor, Dr. Stephen F. Zdzinski, for all of the guidance and patience he has provided throughout the development of this study. I would also like to acknowledge my parents for keeping the house for a little while longer so I could accomplish my goals. I would also like to thank the members of my committee for offering valuable advice and guidance. In addition, I want to acknowledge Dr. Nick Myers for providing me with the statistical knowledge for completing this study and future research endeavors. Thank you to all of my friends and family for all of the support and words of encouragement. I truly appreciate all of you.

CONTENTS

CHAPTER 1 Statement of the Problem
    Justification
    The Theory
    Purpose of the Study
    Delimitations of the Study
CHAPTER 2 Related Literature
    Research Attempts to Identify Performance Variables
    Musical Performance Achievement as a Dependent Variable
    Adjudicators and the Adjudication Process
    Performance Aspects of Musical Expression
    Development of Musical Performance Measures
        Watkins-Farnum Performance Rating Scale
        Facet-factorial rating scales
        Criteria-specific rating scales
    Summary of Related Literature
CHAPTER 3 Method
    Gathering Performance Dimensions
    Development of the Tentative Model
    Construction of the Aural Musical Performance Quality Measure
    Gathering Recordings of Solo Music Performance
    Evaluations of Recorded Performances
    Data Analysis and Preparation
CHAPTER 4 Results and Discussion
    Results
    Discussion
CHAPTER 5 Summary and Conclusions
    Summary
    Conclusions
    Suggestions for Further Research
    Implications for Teachers
References
Appendix A Variables Collected from Performance Assessment Research
Appendix B Categorization of Performance Assessment Variables
Appendix C Aural Musical Performance Quality (AMPQ) Measure
Appendix D Evaluation Packet Instruction Sheet
Appendix E Waiver of Signed Consent Form
Appendix F Confirmatory Factor Analysis of AMPQ Items
Appendix G AMOS Output of Estimated Performer-Controlled Musical Factors Model Across Brass, Woodwind, Voice and String Instruments
Appendix H AMOS Output of Estimated Performer-Controlled Musical Factors: Woodwind Model
Appendix I AMOS Output of Estimated Performer-Controlled Musical Factors: Voice Model
Appendix J AMOS Output of Estimated Performer-Controlled Musical Factors: String Model
Appendix K AMOS Output of Estimated Performer-Controlled Musical Factors: Brass Model

Table of Figures

Figure 1. Hypothesized Model of Performer-Controlled Components of Technique
Figure 2. Hypothesized Model of Performer-Controlled Components of Musical Expression
Figure 3. Hypothesized Model of Performer-Controlled Musical Factors
Figure 4. Model of Performer-Controlled Components of Technique
Figure 5. Model of Performer-Controlled Components of Musical Expression
Figure 6. Performer-Controlled Musical Performance Factors: Standardized Estimate Model
Figure F1. Confirmatory Factor Analysis of AMPQ Items

List of Tables

Table 1 Comparison of Facet-Factorial Factor Structures
Table 2 Total and Subscale Reliabilities for AMPQ Measure
Table 3 Correlations Between Technique and Component Factors
Table 4 Correlations Between Musical Expression and Component Factors
Table 5 Summary of Simultaneous Regression for Variables Predicting Technique
Table 6 Summary of Simultaneous Regression Analysis for Variables Predicting Musical Expression
Table 7 Means, Standard Deviations, and Pearson Correlations for Combined Instrument Path Model of Performer-Controlled Musical Factors
Table 8 Path Estimates for Model of Performer-Controlled Musical Factors across Brass, String, Voice, and Woodwind Instruments
Table 9 Summary of Sequential Regression Analysis for Variables Predicting Overall Perception of Performance Quality
Table 10 Summary of Simultaneous Regression of Musical Expression on Technique
Table 11 Standardized Path Coefficient Comparisons between Combined and Individual Instrument Path Models
Table 12 Estimated Path Coefficients for the Woodwind Model of Performer-Controlled Musical Factors
Table 13 Means, Standard Deviations, and Pearson Correlations for Woodwind Path Model of Performer-Controlled Musical Factors
Table 14 Estimated Path Coefficients for the Voice Model of Performer-Controlled Musical Factors
Table 15 Means, Standard Deviations, and Pearson Correlations for Voice Model of Performer-Controlled Musical Factors
Table 16 Estimated Path Coefficients for the String Model of Performer-Controlled Musical Factors
Table 17 Means, Standard Deviations, and Pearson Correlations for String Model of Performer-Controlled Musical Factors
Table 18 Estimated Path Coefficients for the Brass Model of Performer-Controlled Musical Factors
Table 19 Means, Standard Deviations, and Pearson Correlations for Brass Model of Performer-Controlled Musical Factors
Table F1 Pattern Coefficients for AMPQ Factor Analysis
Table F2 Model-fit Comparisons

CHAPTER 1
Statement of the Problem

Musical performance is a complex process.
Nevertheless, it is imperative that we continue to clarify those complex processes that elude explanation. Understanding a subject such as music performance is a form of abstraction that requires a process of collecting pertinent information concerning characteristic features. “Abstraction involves selecting, from all those available, certain prominent features by which the real-world system can be represented meaningfully” (van Gigch, 1991, p. 119). The prominent features identified as a result of abstraction can be used to develop a model useful for analytical purposes (Lippitt, 1973). Models help people visualize the structure of both concrete and conceptual processes. A model serves as a symbolic representation of what is known and understood about the various components of a complex process (Lippitt, 1973). By design, models are a simplification of a complex real-world structure. This simplification is used to facilitate an overall understanding of the process and of the interrelationships between its individual components. Modeling is used by social scientists, educators, economists, physicists, and mathematicians to correlate experience with proposed conceptual frameworks (Lippitt, 1973; Becker, 1983; van Gigch, 1991; Wrigley, 2005). Music researchers have employed the modeling process to facilitate understanding of complex concepts and processes, including musical preference (LeBlanc, 1980), student course affect (Asmus, 1980), musical affect (Asmus, 1981), the music assessment process (McPherson & Thompson, 1998; Wrigley, 2005), extramusical influences on solo and ensemble ratings (Bergee, 2006), sight-reading (Kopiez & Lee, 2008), and listening (Madsen & Geringer, 2008). However, research regarding the structure of performance assessment has been limited (McPherson & Thompson, 1998; Wrigley, 2005).
In order to provide clarity to measurements, musical contexts, and factors related to musical performance, further research into the structure of performance assessment is necessary. The inherently subjective nature of musical performance assessments, and the situations to which they are applied within the educational arena, calls for as much objectivity as possible (Bergee, 1987, 2003; Radocy, 1986). In an effort to inject more objectivity into the performance evaluation process, researchers have made positive steps toward identifying factors that influence assessments of performance quality for individual instrument categories (Abeles, 1971; Burnsed, Hinkle, & King, 1985; Bergee, 1987; Thompson, Diamond, & Balkwill, 1998; Zdzinski & Barnes, 2002; Wrigley, 2005; Russell, 2007), factors that influence performance achievement (Schleuter, 1978; Zdzinski, 1993; Geringer & Johnson, 2007; Miksza, 2007), and higher-order factors (Bergee, 1995; Wrigley, 2005). However, the interrelationships of these influential factors across instrument categories are still unknown. The identification of a structure of musical factors that influence assessments of performance quality across instrument categories would benefit performers, students, educators, and researchers by illuminating the process of determining overall musical performance quality.

Musical performance assessment is a divisive subject. Some feel that, in order to fully perceive music, a performance cannot and should not be broken down into its component parts (Langer, 1953; van Gigch, 1991). “The import of an art symbol cannot be built up like the meaning of a discourse, but must be seen in toto” (Langer, 1953, p. 379). Other researchers, however, have found that the evaluation of performance in separate component factors has no effect on overall assessments of performance quality (Burnsed, Hinkle, & King, 1985; Mills, 1987; Bergee, 1995; Saunders & Holahan, 1997; Zdzinski & Barnes, 2002; Russell, 2007).
The evaluation of separate musical factors can serve diagnostic purposes for the improvement of teaching and learning strategies (Saunders & Holahan, 1997). In music education, “evaluative procedures are used to determine status so that progress toward educational goals can be appraised” (Leonard & House, 1972, p. 29). Musical performance assessment implies the use of both performance measures and observations. Classroom teachers employ informal assessments to determine the pace of a lesson or unit of study. Formal performance measurement applications can include teacher-administered tests, juries, and auditions. The information gathered from these observations and measurements, in both formal and informal assessment situations, provides important information to both performer and evaluator. The high-stakes nature of some performance measurement applications requires valid and reliable measures that accurately represent what was heard by the evaluator (Bergee, 1987; Payne, 2003).

Previous research on music performance assessment issues includes rating scale construction for solo instruments (Watkins, 1942; Watkins & Farnum, 1956; Gutch, 1964, 1965; Abeles, 1971; Bergee, 1987; Saunders & Holahan, 1997; Zdzinski & Barnes, 2002; Russell, 2007), performance constructs (Mills, 1987; Bergee, 1995; Thompson, Diamond, & Balkwill, 1998), interjudge reliability (Bergee, 2003), adjudicator experience (Schleff, 1992; Kim, 2000), listening and discrimination (Geringer & Madsen, 1998), and instrument bias (Hewitt, 2007). This body of performance assessment research supports the ability to validly and reliably evaluate musical performance situations (i.e., solo and ensemble evaluations, juries, etc.). Additionally, these studies support the utilization of performance measurement as a viable means of identifying the underlying structures associated with musical performance.
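The reliability of such rating scales is commonly indexed with an internal-consistency coefficient such as Cronbach's alpha, the type of statistic reported for the AMPQ measure. The sketch below is illustrative only: the six adjudicators' four-point Likert ratings and the four-item subscale are invented for the example, but the computation itself is the standard one.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_raters x n_items) matrix of ratings."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Invented four-point Likert ratings: six adjudicators by four subscale items.
ratings = np.array([
    [4, 4, 3, 4],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
    [2, 1, 2, 2],
])
subscale_score = ratings.mean(axis=1)  # composite score for each adjudicator
alpha = cronbach_alpha(ratings)        # internal consistency of the subscale
```

With items this consistent, alpha lands above .9; near-unity values such as the AMPQ's .977 indicate that the items rise and fall together across performances.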
The validity and reliability of performance measures are subject to the inevitable and inherent subjectivity of the performance assessment process (Radocy, 1986). Improved reliabilities have been observed for measures that utilize a combination of specified evaluation criteria and standardized rating scales (Abeles, 1971; Bergee, 1987; Saunders & Holahan, 1997; Zdzinski & Barnes, 2002; Wrigley, 2005; Russell, 2007). While criteria for individual instrument categories have been identified, no consensus has been reached concerning which criteria should be specified for the evaluation of a musical performance across instrument groups.

Research on evaluation criteria has utilized several strategies for the creation of performance measures with specified criteria. Studies employing factor analysis techniques as a means of construct identification and scale construction have produced solo performance measures for string instruments (Zdzinski & Barnes, 2002), brass (Bergee, 1987), woodwind (Abeles, 1971), solo voice (Jones, 1986), percussion (Nichols, 2005), and guitar (Russell, 2007). These studies provide evidence of underlying factors that influence the evaluation of a musical performance (Saunders & Holahan, 1997). Using this statistical method, many valid and reliable measures have been created. However, the question of a generalized set of music performance evaluation criteria remains unresolved due to the limited generalizability of these studies.

Justification

Understanding the structure of the component factors of musical performance that affect assessments of performance quality is crucial to understanding how music is aurally perceived during evaluation.
Research studies concerning musical performance variables (Burnsed, Hinkle, & King, 1985; Mills, 1987; Bergee, 1995; Thompson, Diamond, & Balkwill, 1998; Wrigley, 2005; Johnson & Geringer, 2007), musical performance achievement (Hodges, 1975; Suchor, 1977; Schleuter, 1978; Zdzinski, 1993; Geringer & Johnson, 2007; Miksza, 2007), performance adjudication (Fiske, 1975, 1979; Bergee, 1997, 2003; Geringer & Madsen, 1998; Thompson & Williamon, 2003), musical expression (Levi, 1978; Juslin & Lindstrom, 2003; Juslin & Laukka, 2004), and rating scale development (Abeles, 1971; Bergee, 1987; Zdzinski & Barnes, 2002; Russell, 2007) reveal consistencies in the conceptualization of musical performance assessment. Russell (2007) stated that the occurrence of common musical factors across instrument categories in facet-factorial studies suggests that an overall structure of music performance assessment may exist (see Table 1). However, no research has been conducted to investigate these commonalities. Occurrences of common musical factors suggest the possible existence of an overall structure of performance assessment across woodwind instruments, string instruments, voice, and brass instruments. This structure could serve as a catalyst for important research in the area of music performance and provide clarity to the factors that influence evaluations of performance quality across all instrument categories in various applications. An examination of the components and commonalities found within the literature would satisfy the need to verify the literature analysis and help promote further understanding of the perception of music performance overall.
Table 1
Comparison of Facet-Factorial Factor Structures

  Abeles (1971)       Jones (1986)            Bergee (1987)             Zdzinski & Barnes (2002)   Russell (2007)
  interpretation      interpretation/         interpretation/           interpretation/            interpretation/
                      musical effect          musical effect            musical effect             musical effect
  tone                tone/musicianship       tone quality/intonation   articulation/tone          tone
  rhythm/continuity   suitability/ensemble    rhythm/tempo              rhythm/tempo               rhythm/tempo
  articulation        technique               technique                 vibrato                    technique
  intonation          diction                 intonation                intonation                 tempo

The Theory

Research on performance variables, musical performance achievement, adjudication, musical expression, and performance measure development reveals commonalities between assessments of woodwind, string, voice, and brass performance (Abeles, 1971; Levi, 1978; Jones, 1986; Mills, 1987; Bergee, 1987, 1995, 2003; Zdzinski, 1993; Geringer & Madsen, 1998; Zdzinski & Barnes, 2002; Wrigley, 2005; Johnson & Geringer, 2007; Russell, 2007). These commonalities hint at a possible structure of musical performance factors across these instrument categories. This structure of musical performance factors is hypothesized to influence overall perceptions regarding musical performance quality.

The related literature in Chapter 2 reveals an exhaustive list of musical factors that influence assessments of musical performance (see Appendix A). These factors can be separated into general categories, such as aural and visual factors, and more specific categories, such as performer factors, composer factors, adjudicator factors, environmental factors, etc. (see Appendix B) (Juslin & Lindstrom, 2003; Juslin & Laukka, 2004). The factors used for this hypothesized model concentrate on performer factors that are also aural factors.
Performer-controlled factors are defined for the purpose of this study as the musical components that are controlled by the performer at the time of performance; aural factors are defined as musical components of performance that can be aurally perceived. The theory proposed in this study hypothesizes that a structure of aurally perceived performer-controlled musical factors exists and remains stable across assessments of brass, woodwind, voice, and string instruments. The proposed structure concentrates on the variables of technique, musical expression, and overall perception of performance quality (Levi, 1978; Mills, 1987; Bergee, 1995; Wrigley, 2005). Technique is hypothesized to be a composite of performance variables that represent technical ability: tone, intonation, articulation, and rhythmic accuracy (see Figure 1). Musical expression is a composite of performance variables that represent the ability to communicate the expressive aspects of musical performance: tempo, dynamics, timbre, and interpretation (see Figure 2).

Figure 1. Hypothesized Model of Performer-Controlled Components of Technique

Figure 2. Hypothesized Model of Performer-Controlled Components of Musical Expression

Technique is hypothesized to have a direct effect on overall perception of performance quality (Mills, 1987; Bergee, 1995; Wrigley, 2005). The hypothesized effect of technique on overall perceptions of performance quality is also mediated through musical expression (Levi, 1978; Bergee, 1995; Juslin & Laukka, 2004; Wrigley, 2005; Geringer & Johnson, 2007; Johnson & Geringer, 2007). It is this proposed structure that is the focus of this study (see Figure 3).

Figure 3. Hypothesized Model of Performer-Controlled Musical Factors

This theorized paradigm is based on three main lines of evidence outlined by Keith (2006): 1) time precedence, 2) relevant research, and 3) logic.
The time precedence of technique over musical expression suggests that the ability to make a sound must come first in order to express oneself musically. Technique is considered necessary to make a sound using a musical instrument. The research presented in Chapter 2 provides support for the participant population (Fiske, 1975; Kim, 2000; Bergee, 2003), the variables selected (Abeles, 1971; Levi, 1978; Jones, 1986; Mills, 1987; Bergee, 1987, 1995, 2003; Zdzinski, 1993; Geringer & Madsen, 1998; Zdzinski & Barnes, 2002; Wrigley, 2005; Johnson & Geringer, 2007; Russell, 2007), the categorization methods (Zdzinski, 1993; Juslin & Lindstrom, 2003; Juslin & Laukka, 2004; Miksza, 2007), and the direction of the hypothesized paths (Levi, 1978; Mills, 1987; Bergee, 1995; Juslin & Laukka, 2004; Wrigley, 2005; Geringer & Johnson, 2007; Johnson & Geringer, 2007). It is reasonable and logical to infer the influence of technique on overall perceptions of musical performance quality. The logic behind this model suggests that an improvement in technical ability would increase the ability to express oneself musically on an instrument. In turn, this increased ability in technique and musical expression would influence the assessment of overall performance quality.

Purpose of the Study

The purpose of this study is to examine a hypothesized model of the aurally perceived performer-controlled musical factors that influence assessments of musical performance quality. Specifically, this study intends to answer the following research questions:

1. Do the first-order performance factors of tone, intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, and interpretation adequately represent the second-order factors of Technique and Musical Expression according to the hypothesized model?

2. What are the relative contributions of Technique and Musical Expression on judgments of Overall Perceptions of Performance Quality according to the hypothesized model?

3. How well does the proposed model fit the data collected? Can a model of musical performance assessment be created and tested using performer-controlled musical factors for the outcome of evaluating aurally perceived musical performance quality?

4. Does the hypothesized model of performer-controlled musical factors remain stable for the individual brass, woodwind, voice, and string instrument categories?

Delimitations of the Study

The focus of this study is limited to the aural aspects of musical performance. This excludes aspects of performance that are perceived visually. These delimitations also restrict consideration to aspects of solo musical performance that are at the disposal of the performer. Specifically, this excludes factors of musical performance that are considered to be composer factors, ensemble factors, adjudicator factors, environmental factors, and nonmusical factors.

The research approach that is employed is an additional delimitation of this study. This study utilizes a model-rejection approach to theoretical model testing. A model-rejection approach dictates that the theoretical model of performer-controlled musical factors will be tested using a “reject” or “fail-to-reject” decision. This approach provides information regarding the performance of the hypothesized model and avoids any post-hoc recalculation once the model has been estimated.

CHAPTER 2
Related Literature

Music performance has long been a fundamental focus of music research. Specifically, the evaluation of music performance is of key interest to music educators and to music education as a profession. Scholars conducting research on musical performance evaluation have investigated the accuracy of evaluations, musical and extramusical influences on performance evaluations, and the construction of valid and reliable performance measures. An examination of early performance evaluation literature reveals a focus on sight-singing achievement.
Researchers such as Hillbrand (1923), Mosher (1925), Knuth (1933), and Watkins (1942) developed measures to evaluate performances of sight-read music. These measures focused mainly on the rhythm and pitch accuracy of the excerpts performed. Early research on performance assessment also focused on the identification of factors that influence sight-reading achievement. Stelzer (1935) developed a sight-reading measure for organ performance. Stelzer used this measure to analyze the underlying fundamentals of organ performance. Other researchers, such as Bean (1938) and Wheelwright (1940), developed sight-reading measures to investigate factors that influence sight-reading achievement. Bean (1938) investigated sight-reading methodology and utilized a measure that evaluated the pitch accuracy of sight-read piano performances. Wheelwright (1940) also investigated piano sight-reading achievement and the influence of music spacing on sight-reading achievement.

These early studies provided a basis for continued research on music performance. Present-day researchers continue to investigate the issues and aspects surrounding musical performance. The review contained in this chapter examines performance assessment literature. Specifically, the musical variables utilized for musical performance evaluation are of main interest. The literature will be discussed in the following sections: a) research attempts to identify performance variables, b) musical performance achievement as a dependent variable, c) adjudicators and the adjudication process, d) performance aspects of musical expression, and e) the development of musical performance measures.

Research Attempts to Identify Performance Variables

Many researchers have made efforts to improve the accuracy and efficiency of performance evaluations through investigations of the criteria used during performance evaluation. Most early research in evaluation criteria centered on band performance and festival ranking.
Owen (1969) found that the performance dimensions of technical accuracy, rhythm, pitch, musicality, tone, and sight-reading produced the most reliable rankings of student band auditions. A research study by Oakley (1972) compiled performance criteria from rating sheets used during evaluations of marching band performances. The musical criteria that appeared most frequently were technical accuracy, rhythm, intonation, tone quality, balance, expression, and precision. A checklist created by Neilson (1973) presented factors that adversely affect ensembles performing in a festival evaluation setting. The factors of intonation, phrasing, dynamics, melodic transparency, tempo, attacks and releases (articulation), and timbre were derived from analysis of handwritten comments entered onto evaluation sheets. An attempt at evaluating musical performance across performance mediums was undertaken by Oldefendt (1976). Oldefendt developed a procedure and criteria for scoring both solo instrumental and solo vocal performances. Adjudicators evaluated musical performances in terms of completeness, pitch accuracy, and rhythmic accuracy. The frequency of performance errors determined the score for each performance dimension. An overall score was estimated as the sum of these criteria scores. The performance dimensions of tone and intonation were considered in the development of the criteria but were excluded due to limitations in quantifying the quality of instrumental and vocal tones. Research by St. Cyr (1977) established an exhaustive list of criteria for high school band, orchestra, and chorus performance evaluation. The criteria compiled by St. Cyr represent both musical and non-musical variables. Musical variables include technique, interpretation, time, intonation, phrasing, pitch, balance, expression, articulation, and diction.
The non-musical variables reported were appearance, breathing, conductor, accompaniment, instrumentation, voices, instrument quality, difficulty level, and arrangement quality. Criticisms of the performance evaluation process prompted Burnsed, Hinkle, and King (1985) to investigate inconsistency in performance evaluation. These criticisms included a lack of appropriate measures, poor judge reliability, a lack of criteria agreement, the absence of an evaluation model, and inconsistent standards. Specifically, Burnsed, Hinkle, and King (1985) attempted to determine whether the set of factors that included technique, interpretation, intonation, musical effect, tone, and balance were viable predictors of overall performance ratings. Ratings for the performance dimensions were closely related to each other and to the overall performance ratings. The results indicate a high degree of intercorrelation between the six performance dimensions and the overall ratings (.78-.91). Bergee (1993) suggests that the close relationship between musical effect and the overall rating (.91) indicates the possibility that musical performance is evaluated in a global fashion regardless of the presence of separate performance dimensions. A study by Mills (1987) intended to uncover the nature of this global rating. The purpose of the study was to explain the assessment of solo performance of Western classical music. Specifically, Mills attempted to define the constructs that adjudicators employ to assess solo music performance. Mills conducted this study in two phases. The first phase focused on establishing a vocabulary that could be used by both music teachers/specialists and non-specialist individuals with musical experience. Participants in this phase included performers and adjudicators. The performers were all full-time students at least 15 years of age. Performing participants included a harpist, two horn players, a pianist, an oboist, and a violinist.
Each performance was videotaped for the adjudication portion. Volunteer adjudicators (N = 11) were separated into two groups: Group 1, music teachers and music specialist students (n = 2), and Group 2, non-specialists with experience in music performance (n = 9). Each adjudicator evaluated five consecutive performances on the videotape. The assessment sessions lasted between 45 and 60 minutes. Adjudicators were asked to evaluate each performance by writing comments and assigning a grade out of thirty as if it were an Associated Board of the Royal Schools of Music (ABRSM) examination. At the end of the five evaluated performances, adjudicators were encouraged to discuss the performances in terms of performance constructs. These conversations were recorded and content analyzed. The results of this phase provided twelve constructs with which the performances were evaluated. The performance constructs compiled from the post-adjudication interviews included: performer confidence/nervousness, performer enjoying/not enjoying performance, performer familiarity with performance material, performer does/does not make sense, use of dynamics appropriate/inappropriate, use of tempi appropriate/inappropriate, performer phrasing appropriate/inappropriate, technical problems distracting/hardly noticeable, performance was hesitant/fluent, performance was insensitive/sensitive, performance was muddy/clean, and performance was dull/interesting (Mills, 1987). Phase 2 of this study focused on the extent to which assessments of performance could be predicted from the performance constructs identified in Phase 1. Performances (N = 10) of violin, horn, piano, soprano, clarinet, harp, oboe, flute, double bass, and trombone were used. Adjudicating participants (N = 29) included 12 from Group 1 and 17 from Group 2. Adjudicators were given a two-sided assessment form to complete for each of the ten performances to be evaluated. The first side provided the same instructions used during Phase 1.
The second side consisted of the twelve bipolar statements paired with a four-point semantic differential. The results of this study indicate that the performance variables, which include player nervousness, performer enjoyment, performer knowledge of the music, holistic sense of the music, dynamics, tempo, phrasing, technique, hesitation, insensitivity, and performance clarity, demonstrate small to moderate correlations with the overall marks. Correlations between the performance constructs ranged from 0.2 (between "tempi appropriate/inappropriate" and "technical problems distracting/hardly noticeable") to 0.7 (between "performer confidence/nervousness" and "performer enjoying/not enjoying performance"; "performer does/does not make sense" and "performance was dull/interesting"; "performance was hesitant/fluent" and "performance was muddy/clean"; "performance was insensitive/sensitive" and "performance was dull/interesting"; and "performance was muddy/clean" and "performance was dull/interesting"). However, correlations between overall ratings and performance constructs were uniformly negative, ranging from -0.4 ("use of tempi appropriate/inappropriate") to -0.7 ("performance was insensitive/sensitive" and "performance was muddy/clean") (Mills, 1987). A multiple regression analysis indicated that the twelve performance variables accounted for 73% of the total variance. No difference between the two groups of adjudicators was apparent: Group 1 accounted for 75% of the variance and Group 2 accounted for 72% of the variance. Zdzinski (1991) suggests that these results should be considered tentative due to the lack of reported significance levels and the small sample size. Mills (1987) concluded that it is possible to explain solo music performance using characteristics comprehensible by non-musicians. She suggests that these results have implications not only for solo musical performance assessment, but for music education as a whole.
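The core of Mills's predictive analysis, regressing overall marks on construct ratings and reporting the proportion of variance explained (R-squared), can be sketched briefly. The sketch below is a deliberate simplification with synthetic values: it collapses the twelve construct ratings into a single composite predictor and uses invented numbers, not Mills's data.

```python
import random
import statistics

def r_squared(x, y):
    """Proportion of variance in y explained by a least-squares line on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: each of 29 adjudicators' mean construct rating (a
# composite of the twelve 4-point scales) paired with an overall mark.
random.seed(1)
composite = [random.uniform(1.0, 4.0) for _ in range(29)]
overall = [6 * c + random.gauss(0, 1.5) for c in composite]
print(f"R^2 = {r_squared(composite, overall):.2f}")
```

Mills's reported 73% corresponds to an R-squared of .73 from the full twelve-predictor multiple regression; the single-predictor version above shows only the variance-partitioning idea.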
The utilization of performance constructs as predictors of overall performance can streamline the adjudication process and serve as a valuable source of information for the performer. Continued investigation into the predictors of overall performance success has led to research concentrating on the organization of performance factors into higher-order constructs. Bergee (1995) conducted an investigation into the existence of higher-order performance factors. The purpose of this study was to identify the intermediate-level constructs of the Band Performance Rating Scale (BPRS) previously developed by Sagen (1983). Specifically, Bergee attempted to determine a) primary, intermediate, and higher-order factors within the BPRS, b) intercorrelations of the factors and the correlations of the BPRS items to the higher-order factor, and c) interjudge reliability and criterion-related validity of the item regroupings. Sagen (1983) constructed the BPRS using a rational approach (Butt & Fiske, 1968) to choose and group items based on a preconceived notion of the subject. Statements regarding various aspects of band performance were collected from experienced band directors enrolled in graduate music education courses. An analysis of these statements yielded a pool of 206 statements grouped into six categories: tone quality, technical accuracy, musical interpretation, intonation, rhythmic accuracy, and general musical effect. Sagen, along with three university professors, selected eight representative items for each of the six performance dimensions. Each of the 48 items was paired with a five-point Likert scale that ranged from Strongly Disagree (SD) to Strongly Agree (SA). Interjudge reliability was determined by an analysis of variance. Category-by-category test-retest coefficients for evaluations of two performances ranged from .64 to .94 (p < .05). Total score test-retest r's were reported at .92 and .84 (p < .01), respectively.
Bergee (1995) recruited 245 graduate and undergraduate band students from three universities to evaluate a prerecorded high school band performance of Charles Carter's Rhapsodic Episode. Participants were given instructions and information regarding the study and were allowed to listen to the recording as many times as necessary. The data were analyzed using a principal components method to group the items onto related factors. Three criteria were used to determine the number of factors to be rotated: (1) eigenvalues greater than 1.00, (2) the scree plot, and (3) an examination of the proportion of variance accounted for by the factors. The items on the BPRS were regrouped according to the results of the promax oblique rotation. To determine interjudge reliability, Bergee (1995) recruited seven music education graduate students to evaluate five prerecorded high school band performances using the revised version of the BPRS. In addition, an independent panel of six judges was asked to record a global rating from I+ to V- for each of the same five recordings. The factors were rotated using three-, four-, five-, and six-factor structures according to the criteria defined by Bergee. The three-factor rotation yielded the most interpretable results. The three factors identified by Bergee are: (1) Tone Quality/Intonation, (2) Musicianship/Expressiveness, and (3) Rhythm/Articulation. A factor analysis of the primary factor matrix yielded one higher-order factor, with all three factors loading from .73 to .81. Bergee (1995) then related this newly identified higher-order factor to the original variables from the BPRS using Pearson's r. The total score interjudge reliability for the revised BPRS was reported at .96.
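Two of the three retention criteria Bergee applied, the Kaiser criterion (eigenvalues greater than 1.00) and the cumulative proportion of variance, are simple enough to sketch; the third, the scree plot, is a visual judgment and is not computed here. The eigenvalues below are hypothetical illustrations, not Bergee's actual values.

```python
def factors_to_retain(eigenvalues, var_threshold=0.70):
    """Apply two common factor-retention criteria to a set of eigenvalues:
    (1) Kaiser criterion: count components with eigenvalues > 1.00;
    (2) variance criterion: smallest number of components whose eigenvalues
        sum to at least var_threshold of the total variance."""
    kaiser = sum(1 for ev in eigenvalues if ev > 1.0)
    total = sum(eigenvalues)
    cum, by_variance = 0.0, 0
    for ev in sorted(eigenvalues, reverse=True):
        cum += ev
        by_variance += 1
        if cum / total >= var_threshold:
            break
    return kaiser, by_variance

# Hypothetical eigenvalues for a 48-item scale (not Bergee's actual values):
evs = [18.4, 4.1, 2.6, 0.9, 0.7] + [0.45] * 43
print(factors_to_retain(evs))  # → (3, 18)
```

The two criteria frequently disagree, as they do here (3 vs. 18 components), which is why Bergee weighed all three jointly and settled on the most interpretable rotation.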
Bergee concluded that the results support his hypothesis that band performance is adjudicated on a three-level hierarchical structure that includes: (1) a factor-analyzed set of items, (2) distinct primary factors (Tone Quality/Intonation, Musicianship/Expressiveness, and Rhythm/Articulation), and (3) a higher-order factor that correlates strongly with the identified primary factors. These results group the primary factors differently than previous factor analysis studies involving instrumental performance. Bergee suggests that this is due to the intimate links between tone and intonation, musicianship and expressiveness, and rhythm and articulation. In contrast to the empirically based quantitative studies, a study by Thompson, Diamond, and Balkwill (1998) illustrated a qualitative technique for eliciting and exploring the constructs involved in the adjudication of piano music performance. Experienced adjudicators (n = 5) were recruited to evaluate six expert performances of Chopin's Etude, Op. 25, No. 6. Each performance presented a different interpretation of the piece. This study was executed in two stages. The first stage involved eliciting six performance dimensions from each adjudicator. The first five constructs were selected according to personal criteria that the adjudicators used to distinguish between performances. A sixth factor was selected to describe the adjudicator's evaluation of overall performance. The second stage required the adjudicators to apply these constructs to the evaluation of six expert piano performances. Adjudicators were tested individually. Each performance was recorded in a random order onto a cassette tape and played through high-quality headphones. Repeated hearings were permitted but not necessary, as adjudicators completed each evaluation while listening. After all six performances were evaluated, five constructs were elicited using the triad method via a computer interface.
The triad method extracts constructs through a series of random comparisons. After the constructs were identified, adjudicators were requested to provide the opposing end points for each construct. It was made clear to the participants that these statements need not be semantic opposites. Once all polar statements were entered, the participant adjudicators were prompted to rate each performance on the provided statements using a scale ranging from 1 to 9. This process was repeated five times. The data from the piano evaluations were analyzed using a repeated measures analysis of variance. The results of this study indicated a significant main effect of performance, F(4, 20) = 6.00, p < .01. This indicates that the performance assessments demonstrated overall reliability. A Pearson correlation between all pairs of adjudicator ratings indicated a moderate degree of agreement between adjudicators (median correlation = .68). A total of fourteen constructs were extracted from the information provided by the adjudicators. These constructs were identified as: right-hand expression, phrasing, dynamics, rubato, form/structure, tonal balance, pedaling, attention to rhythm and meter, articulation, technical competence, tempo, expression in bars 27-30, expression at the climactic phrase, and expression at the end of the piece. Thompson et al. (1998) also state that overall preference was strongly associated with the right-hand expression, phrasing, and balance constructs. Thompson, Diamond, and Balkwill (1998) concluded that qualitative research techniques could reveal constructs that influence performance adjudication. These research findings have implications for identifying performance constructs that can accurately represent the processes employed by experienced adjudicators through not only quantitative methods, but qualitative ones as well.
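The agreement statistic used here, the median Pearson correlation over all pairs of adjudicators, can be computed directly. The ratings below are hypothetical placeholders, not Thompson, Diamond, and Balkwill's data.

```python
import statistics
from itertools import combinations

def pearson(x, y):
    """Pearson product-moment correlation between two rating vectors."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def median_agreement(ratings):
    """Median correlation over all pairs of adjudicators, where each row of
    `ratings` holds one adjudicator's scores for the same six performances."""
    pairs = [pearson(a, b) for a, b in combinations(ratings, 2)]
    return statistics.median(pairs)

# Hypothetical overall ratings (1-9 scale) from five adjudicators:
ratings = [
    [7, 5, 8, 4, 6, 3],
    [6, 5, 9, 3, 7, 4],
    [8, 4, 7, 5, 6, 2],
    [5, 6, 8, 4, 7, 3],
    [7, 4, 9, 5, 5, 4],
]
print(round(median_agreement(ratings), 2))
```

A median near .68, as reported in the study, indicates that a typical pair of adjudicators ordered the six performances in broadly similar ways without agreeing completely.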
A study by Wrigley (2005) attempted to identify underlying performance constructs by employing both qualitative and quantitative research methodologies, investigating ways of addressing current issues of accountability in collegiate music education by improving music performance evaluation. Specifically, the study outlined the structure of an ecological model of music performance that addresses both the musical and non-musical influences of music performance assessment. Utilizing both qualitative and quantitative methods, Wrigley (2005) examined musical performance aspects along with intrapersonal and interpersonal influences on the music evaluation process (i.e., performer flow state, self-evaluation, performance experience, gender, instrument type, and examiner fairness). This study was executed in four phases and involved faculty and students from an Australian university. The first and second phases involved the identification of performance dimensions from a total of 655 performance examination reports. The reports were content analyzed by experienced music faculty (N = 36) and performance constructs were extracted. Representatives from the string, voice, brass, and woodwind departments were consulted and interviewed to reach a consensus on the performance dimensions to be used on each of the instrument-specific measures to be created in the third phase. The resulting dimensions were further analyzed into higher-order constructs with accompanying descriptors provided by the content analysis and faculty consensus. These descriptors were then placed in rank order and assigned to their corresponding higher-order constructs for each instrument category to create the Performance Evaluation Report (PER) (Wrigley, 2005). Phase 3 of this investigation involved the implementation of the PER by 30 adjudicators among five instrument families. The implementation of the PER was executed over two years and four consecutive semesters.
Each adjudicator was asked to rate performances within each performance dimension according to three categories: Needs Attention, Satisfactory, and Excellent. All performance examinations were administered during regularly scheduled examination periods within each semester. Reliability coefficients for the factors used on the Performance Evaluation Report ranged from .81 to .98. Factors selected to represent the instrument families explained between 57% and 71% of the overall variance for each model. Correlations between performance dimensions for each PER indicated a great deal of overlap between factors: brass (.97), strings (.93), woodwind (.92), and piano (.95). This overlap suggests a nonorthogonal relationship between the factors used on each of the measures (Wrigley, 2005). Data from each of the performance examinations within each instrument category were factor analyzed to confirm the models created by the qualitative analysis in Phase 2. The results confirm the separate factor structure for each instrument family created by the original sorting and assignment of performance constructs. Two factors concerning technical proficiency and musical interpretation, along with seven core constructs (tone, tempo, rhythm, confidence, style/character, phrase/shape, and dynamics), were found to be common among instrument families. Wrigley (2005) states that the occurrence of common factors could possibly imply the existence of a generic set of cross-instrument assessment criteria. Definitions for musical interpretation and technical accuracy were unique to each instrument category. However, due to low sample sizes, particularly for woodwind and brass, he suggests these results be viewed with caution (2005). The fourth phase of the study included the administration of a Music Performance Questionnaire (MPQ).
This questionnaire consisted of the Flow State Scale-2 (FSS-2); self-ratings of skill level, challenge of the performance, and overall quality; ratings of the frequency of participation in assessment performances and of solo/ensemble experience; the number of years of experience on the participant's instrument; and demographic information. The FSS-2 was developed by Jackson and Marsh (1996) according to a model proposed by Csikszentmihalyi (1990). A total of 373 participants completed the self-evaluation questionnaire. Each MPQ was completed immediately after either the midterm or end-of-year jury performances. Results from the investigation of flow state and its influence on music performance were significant. Structural equation modeling and a multivariate analysis indicated a strong nonlinear relationship. Participants experiencing a high state of flow scored higher than those experiencing a low state of flow. This is consistent with the findings of Csikszentmihalyi (1975), who suggested that all complex activities require a higher state of flow. Wrigley (2005) suggests that a more empirical focus in student performance assessment would have a positive impact on the development of teaching strategies to enhance student learning and understanding. However, caution must be used in the interpretation of these empirical tests due to the numerous sources of error variance in musical performance evaluation. The development of criterion-specific scales paired with a concentration on the diagnostic and feedback applications of assessment would be most beneficial. The ecological model proposed in this study illustrates the relationship between education institutions, community, and policy makers and their influence on the assessment of music performance. This holistic approach has implications for promoting an effective and beneficial method of addressing the accountability imperative in music education.
More recently, the continued effort to improve wind band adjudication was addressed by Johnson and Geringer (2007). The purpose of this study was to examine the possible influences of music elements in the prediction of overall music performance evaluations of wind band pieces. Johnson and Geringer examined evaluator assessments of specific musical elements, including balance/blend, dynamics, tone/intonation, rhythm/tempo, and musical expression, for discernible patterns of judgment. In addition, the relationships between musical evaluations and acoustical measures of dynamics and rhythm were investigated. Eighty-four music students were asked to adjudicate recordings of four different excerpts. Each excerpt contained three versions of the same piece: high school band, college band, and professional band. Evaluations were made using a 7-point semantic differential scale with student performance and professional performance as anchors. Dynamic and rhythmic measures were determined using two computer software applications that allowed for the measurement of maximum and minimum dynamics and of rhythmic note duration. The results of this study report significant main effects for ensemble level and replication. The main effect of ensemble level was responsible for the greatest contribution to total variance. Professional groups were consistently rated higher than college groups, and college groups were consistently rated higher than high school groups. A stepwise multiple regression of overall rating on balance/blend, dynamics, tone/intonation, rhythm/tempo, and musical effect was performed to determine if a pattern existed for predicting overall ratings. Johnson and Geringer (2007) reported that a compilation of the resulting standardized coefficients indicated that musical expression accounted for 50% of the predicted overall ratings, followed by tone/intonation (22.5%), dynamics (17.5%), and balance/blend (10%).
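One plausible way to express standardized regression coefficients as percentage contributions is to divide each absolute coefficient by the sum of all absolute coefficients. The sketch below uses hypothetical coefficient values chosen to reproduce the reported percentages; this normalization is an assumption for illustration, not necessarily the exact compilation method Johnson and Geringer used.

```python
def coefficient_shares(betas):
    """Express standardized regression coefficients as percentage
    contributions to the prediction (absolute weight over total weight)."""
    total = sum(abs(b) for b in betas.values())
    return {k: round(100 * abs(b) / total, 1) for k, b in betas.items()}

# Hypothetical standardized coefficients (not the study's actual betas):
betas = {"musical expression": 0.50, "tone/intonation": 0.225,
         "dynamics": 0.175, "balance/blend": 0.10}
print(coefficient_shares(betas))
# → {'musical expression': 50.0, 'tone/intonation': 22.5,
#    'dynamics': 17.5, 'balance/blend': 10.0}
```

Predictors dropped by the stepwise procedure, such as rhythm/tempo in this study, simply do not appear among the retained coefficients and so contribute 0%.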
Rhythm and tempo did not demonstrate predictability of overall ratings for any of the trials. A frequency analysis of the areas most in need of improvement, as provided by the adjudication participants, indicated that across all levels of performance, tone/intonation (34%) was the most frequently occurring response, followed by balance/blend (21%), musical expression (18%), dynamics (16%), and rhythm/tempo (11%). Results from the correlations between overall performance evaluations and acoustic measures of dynamics (r = .06) and rhythm (r = .08) indicated that no significant relationships existed. Johnson and Geringer (2007) state that the results of this study suggest that concentration on the musical elements of musical expression and tone/intonation would be prudent. Teachers would be wise to focus on those aspects of performance that are given the most weight in an evaluation. The basics of instrumental technique and musical expression are delineating factors between students and professionals. Research on the identification and examination of musical performance variables provides valuable insight into the nature of musical performance assessment. The variables identified in this body of research include rhythm, interpretation, intonation, tone, expression, pitch, musicality, phrasing, balance, articulation, diction, musical effect, and dynamics (Owen, 1969; Oakley, 1972; Neilson, 1973; Oldefendt, 1976; St. Cyr, 1977; Sagen, 1983; Burnsed, Hinkle, & King, 1985; Mills, 1987; Bergee, 1995; Thompson, Diamond, & Balkwill, 1998; Wrigley, 2005; Johnson & Geringer, 2007). An examination of performance factor research supports the existence of a hierarchical factor structure that consists of both technical and expressive components (Bergee, 1995; Thompson, Diamond, & Balkwill, 1998; Wrigley, 2005).
Musical Performance Achievement as a Dependent Variable

In an effort to understand the influences on musical performance evaluation, researchers have conducted a number of studies on musical achievement. The measures utilized in music performance achievement research provide valuable information regarding notions of the structure of perceived performance achievement. This section concentrates on the factors used by these researchers to define musical achievement. Early research on music achievement focused on auditory-visual discrimination skills. Research conducted by Stecklein and Aliferis (1957) investigated the influence of instrument on music achievement. Musical achievement was measured using the Aliferis Music Achievement Test (Aliferis, 1954), a listening test which measured melodic, harmonic, and rhythmic discrimination skills. Colwell (1963) investigated musical achievement, defined as auditory-visual discrimination ability, in both vocal and instrumental classrooms. The measures employed by Colwell were the Aliferis Music Achievement Test, the Farnum Music Notation Test (Farnum, 1950), and the Knuth Achievement Tests (Knuth, 1967). Colwell (1963) concluded that musical achievement could easily be approximated through the use of a short sight-singing measure. Later research began to focus on representing musical achievement with separate factors. Hodges (1975) defined musical achievement in terms of five representative musical factors, examining the influence of recorded aural models on the performance achievement of beginning band students. One hundred students from fourteen band classes participated in this study. Band classes were placed into either an experimental or a control group. The performance achievement criteria utilized for these measures included tone quality, pitch accuracy, rhythm accuracy, tempo stability, dynamics, and an overall performance score derived from the total of all performance skill scores.
An important study conducted by Suchor (1977) contributed toward the development of the model of performer-controlled musical factors by defining performance achievement using two main areas: aesthetic and technical. Suchor investigated the influence of personality type on piano performance achievement, group interaction, and perception of the group. The judging-perceiving personality preference was measured on the Myers-Briggs Type Indicator. Participants (N = 24) were grouped into one of three categories: predominantly perceiving, predominantly judging, and equally mixed. Performance achievement was scored using two variables: aesthetic-expressive and accuracy. The aesthetic-expressive variable was represented by the dimensions of volume, touch, and tempo intention; the accuracy variable included melody, rhythm, rhythmic continuity, and harmony (Suchor, 1977). No effect for personality type (p = .219) was found to influence performance achievement. However, significant differences were found for perception (p = .019) and interaction (p = .028). Suchor (1977) suggests that flexibility is an invaluable teacher trait in developing the problem-solving skills of students. This study helped to define musical achievement as a combination of both technical and expressive aspects. A study that examined the technical aspects of performance was conducted by Schleuter (1978). Specifically, Schleuter (1978) studied the influence of lateral dominance, sex differences, and music aptitude on instrumental achievement. Results from this study indicated no significant effect for lateral dominance or sex differences. However, Schleuter did find a significant effect for music aptitude on music achievement scores. Schleuter (1978) states that these findings possibly suggest that music achievement in its initial stages is influenced by music aptitude.
Musical achievement data were gathered using measures of tonal skills (sense of tonality, tone quality, and intonation), rhythmic skills (consistency of tempo beats, accuracy of meter, and melodic rhythm patterns), instrument physical manipulation skills (finger, hand, and arm dexterity, and general muscle coordination), and general instrumental music performance skills (Schleuter, 1978). This study helped to support the hypothesis that perceptions of technical achievement in musical performance are represented by component factors that include tone, rhythmic accuracy, and intonation. Another study, by Zdzinski (1993), supports the definition of musical achievement as a combination of both technical and musically expressive aspects. In this study, Zdzinski explored the relationships between parental involvement, cognitive and affective student attributes, and music learning outcomes. Specifically, the effects of parental involvement, music aptitude, grade level, and gender on musical learning outcomes were investigated. The learning outcomes, defined as cognitive music outcomes, performance outcomes, and affective outcomes, served as dependent variables. Performance outcomes measured in this study were gathered using two separate performance measures. Objective performance data consisting of note and rhythm accuracy were measured using the Watkins-Farnum Performance Scale (WFPS; Watkins & Farnum, 1954). An additional measure, the Performance Rating Scale Supplement (PRSS), was created by the researcher in order to gather subjective data in addition to objective data. The factors included on the PRSS were musicality, tone quality, intonation, and technique (Zdzinski, 1993). Interjudge reliability coefficients for the WFPS and the PRSS were reported at .979 and .882, respectively. Participants (N = 406) in this study included instrumental music students enrolled in five separate band programs in rural New York and Pennsylvania. Students in these programs ranged from grades four through twelve.
The instruments played by the participants included flute, oboe, clarinet, saxophone, bassoon, trumpet, French horn, trombone, baritone, tuba, and percussion. The results of this study found a significant relationship between parental involvement and both cognitive and affective outcomes. However, the relationship between parental involvement and performance outcomes was reported to be mixed at best. Zdzinski (1993) reported no significance between parental involvement and performance outcomes at the secondary level, but did find significance at the elementary level, with a shared variance of 13.8%. Grade level was also reported to account for 25% of the variance in performance scores. Zdzinski states that these results should be viewed with caution due to wide-ranging definitions of performance achievement. Geringer and Johnson (2007) explored the influence of general performance factors on perceptions of musical achievement. The purpose of this research was to examine the effects of performance duration on the consistency of wind band adjudication. Additionally, the design of this study controlled for the effects of tempo and performance level during adjudication. Participants in this study (N = 96) included music students enrolled in music programs at one of three large universities in Missouri, Florida, and Kansas. Each participant was asked to evaluate a series of eighteen listening examples. The listening examples were presented in a series of fast and slow excerpts that varied in duration and performance level. The performance evaluations measured adjudicator responses to two prompts: (1) perceived performance level and (2) perceived performance dimension in need of attention. Perceived performance level was measured on a continuum with student performance and professional performance as opposing anchors.
The second prompt asked evaluators to select an area in need of improvement from a list of factors that included balance, blend, dynamics, tone, intonation, rhythm, tempo, and expression. Evaluators were also permitted to write in an alternate response if their evaluation did not include a factor listed on the form. Adjudicator responses indicated no significant main effect for duration. These results support previous research by Vasil (1973), who found that reliability and ratings were no different for performances of different durations. However, Geringer and Johnson (2007) state that tempo did have a significant interaction with performance level and duration. This effect is consistent with previous research that illustrates a preference for faster tempi. Geringer and Johnson state, however, that this result indicates an impression of performance quality rather than a preference. Adjudicator responses also revealed that tone and intonation were the areas indicated as most in need of improvement at the high school level. Musical expression was the area indicated as most in need of improvement for the more experienced groups. These results coincide with previous research by Johnson and Geringer (2007). Geringer and Johnson conclude that the results of this study indicate that a consistent and reliable evaluation of musical performance can be made rather quickly. In addition, these results illustrate the influence of tempo, tone, intonation, and musical expression on musical performance evaluation. Miksza (2007) examined the relationship between practice behaviors and performance achievement. Specifically, the purpose of this study was to examine relationships among observed practice behaviors, self-reported practice habits, and the performance achievement of high school wind players. Participants in this study included 60 high school students from six different high school music programs in Indiana and New Jersey.
Each participant was observed during three separate practice sessions. Practice sessions were scheduled to include preparation time, practice time, and self-evaluation time. Students were not accompanied during the practice time in order to avoid any influence the observer might have on the practice situation (Miksza, 2007). Participants were recorded once at the beginning of each practice session and once at the end to facilitate the pre-post test design. Performance measures employed in this study included the objective performance measure (OPM) and the subjective performance measure (SPM). The OPM was an adaptation of the Watkins-Farnum Performance Scale (WFPS; 1954), modified to evaluate errors in pitch, rhythm, dynamics, and articulation in terms of the number of beats performed incorrectly. The SPM was utilized as a measure of the subjective aspects of performance not evaluated by the WFPS. This measure was an adaptation of the Performance Rating Scale Supplement (PRSS) originally developed by Zdzinski (1993). The constructs measured on the SPM included etude-specific criteria (execution of dynamics, etc.), interpretation/musical effect, tone/intonation, and technique/articulation. Miksza (2007) reports the internal consistency of the SPM at .98. Miksza (2007) states that the results from this study support previous research by Miksza (2006), who found a lack of correlation between time spent practicing and musical achievement. He suggests that the quality of a practice session is possibly more influential than the quantity of practice time. Instructors can influence practice quality by focusing on specific performance dimensions for individual practice. The results of this study also lend support to the identification, utilization, and influence of individual performance factors such as pitch, rhythm, dynamics, interpretation/musical effect, tone/intonation, and technique/articulation.
Research on musical achievement has investigated the effect of both musical and non-musical influences on musical performance. This research is important because it provides insight into musical factors considered to be representative of musical performance, and information regarding possible sources of error in performance assessment. The musical factors represented in this section include tone, technique, volume, pitch, rhythm, melody, harmony, touch, intonation, musicality, expression, and tempo (Hodges, 1975; Suchor, 1977; Zdzinski, 1993; Geringer & Johnson, 2007; Miksza, 2007).

Adjudicators and the Adjudication Process

Along with investigations that examine the influences on musical achievement, researchers have also conducted many studies regarding the process of evaluating musical achievement. These researchers have not only contributed valuable information concerning the process of performance evaluation, but have also provided further support and information regarding the musical factors and their influence on performance evaluations. The musical factors used in adjudication research provide valuable information regarding aspects of performance considered to be predictive of overall performance quality. Research conducted by Fiske (1975) sought to determine what differences exist, if any, between brass and non-brass specialist evaluations of trumpet performance. Musical performances were evaluated on four separate factors: intonation, rhythm, interpretation, and technique. An overall category was also utilized. The results of this study indicated no significant difference between the evaluations made by brass and non-brass specialist adjudicators. Fiske (1975) concludes that brass adjudication can be equally served by both brass and non-brass specialists. In 1979, Fiske investigated the influence of performance achievement and non-performance achievement on assessments of musical performance.
Performance achievement was measured using applied performance grades. Non-performance achievement was measured using music theory and music history grades. Fiske (1979) employed a test-retest design. Adjudicators were asked to listen to and evaluate recordings of trumpet performances (N = 40). Performances were evaluated using a rating sheet that included five separate performance categories: intonation, rhythm, technique, phrasing, and overall. Fiske reports subscale reliabilities between .60 and .63 for rhythm, phrasing, and technique. Subscale reliability for intonation was reported at .46. Fiske concludes that no significant relationships exist between performance ability and judge reliability, or between performance ability and non-performance achievement. In an effort to continue exploration of the influence of the adjudicator in the adjudication process, Duerksen (1972) studied the effects of adjudicator expectation on evaluations of recorded musical performances. Music majors and non-music majors (N = 517) served as participants in this study and were randomly assigned to either the control or experimental group. Each participant was asked to evaluate two recordings of identical piano performances. Participants were not informed of the test-retest design, but instead were told that one of the performances was that of a professional and the other was that of a student seeking entrance into a university piano performance program. Performances were rated using separate scales for pitch accuracy, rhythmic accuracy, appropriateness of tempo, appropriateness of accent, dynamic contrast, tone quality, interpretation, and overall quality. The results of this study indicate a significant difference (p < .01) between the control and experimental groups for all trait measures of musical performance (Duerksen, 1972). This suggests that adjudicator expectation has a significant influence on the outcome of performance assessments.
Duerksen also reported that the objective aspects (i.e., rhythm and pitch) of performance were no more influential than the subjective aspects (i.e., interpretation and overall effect). Additionally, no significant difference was found between the evaluations of music majors and those of non-music majors. Duerksen's research supports the influence of both technical and expressive factors in predictions of overall performance quality. A number of studies have also examined the influence of adjudicator experience on the process of performance adjudication. A study by Schleff (1992) sought to determine whether a difference exists between the judgments of undergraduate music students and professional music critics with regard to the quality of recorded music performances. Specifically, Schleff attempted to determine the extent to which the judgments of undergraduate music students and professional music critics conformed when evaluating two styles (Classical and Romantic) of prerecorded instrumental, piano, and vocal performances. A total of 117 performances were selected from a pool of more than 700 performance reviews. The recordings selected for inclusion in this study met two criteria: (1) a minimum of three published reviews, and (2) either a superior or inferior performance quality evaluation from more than half of the reviewers. Reviews that were deemed neither inferior nor superior were discarded. Excerpts of the reviewed recordings were prepared and sent to university professors for consensus of opinion regarding quality of performance. Using a global rating scale, the university professors evaluated the 100 performance excerpts. Excerpts that received a consensus of opinion regarding performance quality (n = 52) from both university professors and music critics were placed into a final pool from which 30 randomly selected excerpts were extracted.
Graduate music education students (n = 18) participated in a pilot study that asked each participant to compare the 30 performance excerpts to their personal conception of an ideal performance. From the results of the pilot study, 28 performances were selected for inclusion. The selected performance excerpts were played for instrumental (n = 99), keyboard (n = 27), and vocal (n = 78) music education students (N = 204). After each recording was played, participants were asked to rate overall performance quality on 13 performance characteristics: (1) projection and expressive import, (2) intonation, (3) rhythmic precision, (4) appropriate rhythmic flexibility, (5) phrasing, (6) expressive line, (7) blend with and between ensembles, (8) articulation, (9) appropriate use of dynamics, (10) balance between ensembles and parts, (11) tone quality, (12) technical facility of the performer(s), and (13) diction (if applicable). Each performance characteristic was paired with a nine-point semantic differential with anchors of inferior and superior. The results of this study indicate that undergraduate students more often agree than disagree with the opinions of professional music critics with regard to superior performances. Schleff (1992) suggests that the disagreement regarding inferior performances is probably due to the inexperience of the undergraduate musician. This conclusion coincides with Tiede (1971), who determined that student conductors with less experience were less critical of inferior performances than more experienced conductors. A difference was also reported regarding the participants' performance area. Participants in both vocal and instrumental groups reported a lower level of agreement than the keyboard students regardless of performance medium (instrumental, vocal, or keyboard).
Schleff also reported that keyboard participants indicated a higher level of confidence in their performance evaluations than both the vocal and instrumental groups. He concludes that the ability to make critical judgments regarding performance quality is a function of experience. This suggests that performance evaluation, whether conducted by faculty or through self-evaluation, can be influenced by adjudicator experience. A research study by Bergee (1997) explored assessment accuracy in peer and self-evaluations. The purpose of this study was to further explore the consistency and accuracy of peer evaluation and self-evaluation of end-of-semester applied music performances. Bergee attempted to provide evidence that the results obtained in previous research by Bergee (1993) applied to areas beyond brass performance. For this investigation the emphasis was expanded to include voice, percussion, string, and wind instruments. Specifically, this study examined the following research questions: (1) What is the interjudge reliability of faculty and peer evaluations of undergraduate applied voice, percussion, wind, and stringed instrument end-of-semester performances? (2) To what extent do faculty, peer, and self-evaluations of undergraduate applied voice, percussion, wind, and stringed instrument end-of-semester performances intercorrelate? (3) Are there differences in ability to self-evaluate among different performance concentrations (voice, percussion, etc.) or between two levels of performance achievement? Applied music faculty from three universities who were normally responsible for evaluating end-of-semester performances were recruited for the purposes of this study. In addition, participants representing each of the performance categories (voice, percussion, brass, woodwind, and strings) enrolled as music education or music performance majors were recruited from the same universities. The end-of-semester performances took place over one or two days.
Each performance was videotaped for the self-evaluation portion of the study. Performances were measured using the categories found on the Music Educators National Conference (1958) solo adjudication forms. Performance categories for voice included tone, intonation, diction, technique, interpretation, and musical effect. Percussion categories included tone, mallet/sticking technique, body and hand position, interpretation, and musical effect. Wind and stringed instrument performance categories included tone, intonation, technique, interpretation, and musical effect. An additional category of articulation was added to the wind instrument category per the request of the faculty. Following the end-of-semester performances, student participants received a copy of the videotaped performances from their respective institutions. Each participant was asked to evaluate each performance (including their own) as objectively as possible. Participants were allowed to play back each performance at their leisure (Bergee, 1997). The results of this investigation indicate that faculty total and subscale interjudge reliabilities were uneven. Student peer group total and subscale interjudge reliabilities were more uniform and consistent. Faculty and peer correlations for total and subscale scores were high. Self-evaluations correlated poorly with both faculty and peer evaluations. No significant difference was reported to exist between performance levels. Bergee suggests that training be considered to alleviate the issue of uneven faculty interjudge reliability. He states, however, that this instability is also probably due in part to the different methods used to evaluate the performances. Since the facet-factorial method of scale development has demonstrated high reliabilities, further consideration of the development of facet-factorial rating scales for all performing media would benefit music performance measurement.
Bergee (1997) also points out that the poor correlation of self-evaluation with both peer and faculty evaluations is consistent with Bergee (1993). This situation could be improved with an open and supportive dialogue that discusses these discrepancies, and with the implementation of self-evaluation techniques shown to be effective in teacher training (Duke, 1987; Prickett, 1987; Arnold, 1995). The ability to listen critically is an invaluable skill for musicians and performance adjudicators alike. A fifth study in a line of research conducted by Geringer and Madsen (1998) attempted to determine whether musicians demonstrate consistent listening patterns when listening to music. Previous studies focused on intonation, tone quality, or both simultaneously during listening (Madsen, Geringer, & Heller, 1991, 1993; Madsen & Geringer, 1998). These studies demonstrated the participating musicians' ability to focus on one or more performance dimensions and successfully discriminate between both good and bad performances. Another important aspect of this line of research was the examination of continuous response versus paper-and-pencil rating scale response modes. The results of this inquiry concluded that the response mode had no significant impact on the outcome of the listening responses. The focus of this investigation was to determine whether or not musicians could focus on other musical aspects during listening. Specifically, participants were asked to rate performance excerpts in the categories of phrasing/expression, intonation, rhythm, dynamics, tone quality, and overall performance. Undergraduate and graduate students (N = 48) enrolled in music theory, group piano, and music education courses at a southern university were recruited to participate (Geringer & Madsen, 1998). The recordings used for this research were previously recorded and used in the four previous studies.
Four performers (soprano, tenor, violinist, and cellist) were employed to record both good and bad excerpts of Schubert's and Gounod's versions of "Ave Maria." Participants were randomly assigned to an accompanied group (n = 24) and an unaccompanied group (n = 24); the unaccompanied examples did not include a piano accompaniment. Participants were asked to evaluate each recorded example using six performance dimensions. Each dimension was accompanied by a five-point Likert scale ranging from 1 (representing a poor performance) to 5 (representing an excellent performance). An additional overall global rating was also assigned to each performance. At the end of each excerpt participants were asked to respond to one question: "What aspect of performance does this student need to improve most?" The results demonstrated that participants consistently and clearly discriminated between the good and bad performances. In response to the final question, intonation was the category most often identified as in need of improvement for the bad performances (78%). For the good performances, responses were almost evenly distributed across all performance categories (Geringer & Madsen, 1998). These results also extend the findings within this line of research by demonstrating that consistent discriminations could be made between good and bad performances across several performance dimensions: phrasing/expression, intonation, rhythm, dynamics, tone quality, and overall performance. Geringer and Madsen (1998) also conclude that intonation and tone quality are important when evaluating a performance. Kim (2000) focused on whether or not inexperienced judges are as consistent as experienced judges. This issue of consistency in piano performance evaluation was examined under four separate conditions: (1) using a rating scale, (2) using both a rating scale and the musical score, (3) using neither a rating scale nor the musical score, and (4) using the musical score only.
Additionally, this study attempted to determine which condition encourages the highest reliability. Participants in this study included experienced university-level piano instructors (n = 3) and doctoral-level piano students (n = 3). The piano instructors were recruited as "experienced" evaluators. The doctoral-level students were recruited as "inexperienced" evaluators. The performing participants were all undergraduate piano majors enrolled in a music conservatory in New York City. All musical selections performed were from the Romantic era and were selected based upon similarity in both musical and technical content. A total of five pieces of musical literature were selected for this study. Each performance was recorded and transferred to tape. Participants in both adjudication groups were asked to evaluate each performance under each of the four conditions. Performances were evaluated using the Piano Performance Evaluation Rating Scale (PPERS). The PPERS measures performance achievement across eight performance dimensions: tempo, rhythm, articulation, technique, interpretation, dynamics, tone/pedaling, and memory. The four conditions used to evaluate the music performances yielded several interesting results. The condition that implemented the musical score as the sole means of evaluation demonstrated the greatest impact. These results are contrary to those of Wapnick et al. (1993), who suggested that the use of musical scores for performance evaluation has no effect on judge consistency. Kim (2000) concludes that the utility of rating scales for increasing reliability and consistency in performance evaluation is still inconclusive. This conclusion is contrary to research by Abeles (1971), which suggests that employing rating scales for performance evaluation will improve interjudge reliability.
The results of this study suggest that both experienced and inexperienced judges can achieve acceptable levels of interjudge reliability (above .88 and .60, respectively). Both experienced and inexperienced adjudicators demonstrated consistency in evaluating high-level piano performance. Kim (2000) suggests that the lower reliability scores exhibited by the inexperienced judges are indicative of deficiencies in the traditional piano curriculum. In addition, these results suggest that even though experience is shown to be significantly influential (Duerksen, 1972), the size of the effect may not be great. Performance adjudications can be influenced by many variables. Some commonly occurring factors that have raised questions regarding their influence on adjudication reliability were investigated by Bergee (2003), who proposed to examine the interjudge reliability of faculty evaluations of end-of-semester jury performances in applied music. Specifically, Bergee attempted to investigate the effects of three separate circumstances commonly encountered in jury evaluations: variability in the size of the adjudication panel, the mode of evaluation employed, and adjudicator experience. In an effort to collect the most reliable data from the end-of-semester performances, Bergee employed previously developed performance measures for brass, woodwinds, strings, percussion, and voice. Each of these measures was created using factor analysis as the method for selecting both items and factor structure. Bergee used this same method to develop a measure of piano performance for use in this study. Factor analysis has been demonstrated as a viable method for creating valid and reliable measures of instrumental performance (Abeles, 1971; Bergee, 1986; Zdzinski & Barnes, 2002; Russell, 2007). Bergee (2003) revised each performance measure to include only three representative items per performance dimension.
This revision accommodated the limited amount of time allotted for each performance evaluation. Items with the highest factor loadings were selected for inclusion. Each subscale was examined for items that clearly represented distinct performance aspects. Redundant items were replaced. All items were paired with a five-point Likert scale that ranged from SD (Strongly Disagree) to SA (Strongly Agree). In addition to the performance measure, each adjudicator was asked to assign an overall letter grade for the performance ranging from A+ (excellent performance in all aspects) to F (exceedingly poor performance in all aspects). On a separate form, evaluators were asked to indicate what position they held at the institution and how many years of experience they had evaluating jury performances. Participants in this study included brass (n = 4), percussion (n = 2), woodwind (n = 5), voice (n = 5), piano (n = 3), and string (n = 5) faculty members and teaching assistants who were slated to evaluate the end-of-semester jury performances for graduate and undergraduate music education majors and minors. Each participating adjudicator was briefed on the use of the measures. The results of this study indicated that interjudge reliability remained stable on all total scores, subscale scores, and global letter grades gathered during this study. These results support previous research by Fiske (1975, 1977) that demonstrates an increase in stability with an increase in panel size. Bergee (2003) attributes the lower total score reliabilities than those reported in previous measurement development studies to the number of items used in each subscale. An increase in the number of items would increase the total score reliability, but would negatively impact the amount of time needed to evaluate each performance.
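The trade-off Bergee describes between item count and total-score reliability is conventionally quantified by the Spearman-Brown prophecy formula. A brief sketch with hypothetical numbers (the reliability values below are illustrative, not figures from Bergee's study):

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when the number of items is multiplied by k,
    given current reliability r (Spearman-Brown prophecy formula)."""
    return (k * r) / (1 + (k - 1) * r)

# Hypothetical: a 3-item subscale with reliability .70, doubled to 6 items
r_doubled = spearman_brown(0.70, 2)
print(round(r_doubled, 3))  # 0.824
```

Doubling the items raises predicted reliability from .70 to roughly .82, which is why shortening each subscale to three items, as Bergee did, costs some reliability while saving evaluation time.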
Adjudicator experience, consistent with results from Fiske (1975) and Kim (2000) and in contrast to Duerksen (1972), seemed to have no effect on the outcome of the performance evaluations (Bergee, 2003). Participants anecdotally stated that this had to do with the relationships between more experienced faculty and new faculty: experienced faculty members help newer faculty members become comfortable with the process of jury evaluations. The issues surrounding performance assessment have prompted some researchers to question the stability of music assessment as a viable research tool. Research by Thompson and Williamon (2003) addressed conceptual and practical problems regarding the use of performance assessment as a research tool. The questions this research intended to answer were: To what extent do the evaluators' marks concur? In what ways do they differ? To what extent is the system capable of reliably discriminating between features of the performance? Participants in this study volunteered to perform for up to 15 minutes. All participants were enrolled at the Royal College of Music (N = 61). The instrument categories represented in this student population were keyboard (n = 15), woodwinds (n = 10), strings (n = 24), and other (i.e., harp, guitar, brass, and voice) (n = 12). Each student was videotaped performing two contrasting pieces before a panel of adjudicators. The videotapes were then sent to a panel of external evaluators (n = 3) who evaluated each performance using a segmented marking scale. The measurement instrument consisted of four main areas: overall quality, perceived instrumental competence, musicality, and communication. Each of these categories contained representative items paired with a scale that ranged from 1 to 10. Adjudicators were asked to consider both performances when assessing overall quality.
The data were analyzed using a nonparametric statistical method (Spearman's rho) to calculate the correlation coefficients between each combination of evaluators. The results demonstrated a moderate positive correlation over the complete set of performances (mean ρ = 0.480, range 0.332-0.651, p < .05). This accounts for approximately 25% of the observed variance. These results do not support previous findings by Fiske (1977), who suggested that overall judgments are more reliable across evaluators than segmented evaluations of separate performance aspects. However, Thompson and Williamon (2003) indicated that these results may not be comparable due to methodological differences. Further analysis of the data indicated a high degree of multicollinearity. Using the three main categories of perceived instrumental competence, musicality, and communication as predictors for the overall quality mark, the data revealed a high correlation among all sets of variables. Thompson and Williamon (2003) suggest that the interjudge reliability of performance assessment is at best moderate, and that the lack of a uniform performance assessment research tool makes research results difficult to compare. The authors note the possible sources of error in their evaluation process, but they also point out that this scenario is representative of a realistic assessment situation and is no less reliable than a controlled research environment. A measure developed from the standardization, identification, and definition of the categories that influence performance assessment is possible. However, the downside involves the amount of time evaluators must spend becoming acquainted with such an instrument. In conclusion, music performance assessment is simply not open to reliable and consistent scrutiny (Thompson & Williamon, 2003).
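The analysis above pairs Spearman's rho with its square to report variance explained (ρ = .480 gives ρ² ≈ .23, the "approximately 25%" figure). A minimal stdlib-only sketch of that computation, using hypothetical judge marks rather than data from the study:

```python
def rank(values):
    """Average ranks (1-based), assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical marks from two evaluators for six performances
judge_a = [7, 5, 9, 6, 8, 4]
judge_b = [6, 5, 8, 7, 9, 3]
rho = spearman_rho(judge_a, judge_b)
print(round(rho, 3))      # 0.886
print(round(rho ** 2, 2))  # 0.78 — variance explained; .480**2 gives ~.23 in the study
```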
Research on musical performance adjudication supports the ability to reliably evaluate musical performance regardless of adjudicator experience or primary instrument (Fiske, 1975; Schleff, 1992; Bergee, 1997, 2003; Geringer & Madsen, 1998). The relevance to the current study is the support for the utilization of a participant sample that includes a wide array of people with collegiate or professional experience in music. This body of literature also provides support for the use of individual performance factors to evaluate musical performance (Bergee, 1997, 2003; Thompson & Williamon, 2003). The performance factors represented in the adjudication literature include intonation, rhythm, technique, phrasing, pitch accuracy, tempo, accent, dynamics, interpretation, tone, blend, articulation, balance, musical effect, memory, musicality, and communication. In addition, this research demonstrates the assessment of both technical and expressive aspects of musical performance.

Performance Aspects of Musical Expression

Research on expression in musical performance has provided a better understanding of the subjective aspects of musical performance quality. Early research on musical expression focused on the influence of musical elements on perceived expression in music performance. Research by Gatewood (1927) compared rhythm, melody, harmony, and timbre to the reported musical effects for thirty-five female participants. Participants listened to ten separate musical selections and answered three prompts regarding the most prominent musical element, the reason for preferring the selection, and the emotion associated with the selection, chosen from a list of 12 adjectives. The results of this study illustrated rhythm as the factor most associated with feelings of excitement or stir. Melody was associated with feelings of rest and seriousness. Both harmony and timbre showed no definite associations among the participating listeners.
In a related study with a more elaborate statistical treatment, Gundlach (1935) utilized a list of 17 adjectives to describe the expressive aspects of musical excerpts. A factor analysis of the participant responses produced four separate factors: dynamics, tonality, motility, and an unnamed factor. Gundlach concluded that quality music may elicit predictable emotional reactions from some listeners. This research laid the foundation for understanding the effects of certain musical elements on the perceived quality of musical expression in musical performance. A series of studies by Hevner (1935, 1936, 1937, 1938) sought to answer the question of whether a clearly apprehended meaning in music, one that occurs with high frequency, exists. A series of six experiments studied the influence that changes in modality, rhythm, tempo, harmony, melody, and pitch had on the perceived expression of music. Hevner (1938) reported tempo as the factor of greatest importance in determining expressive response. The remaining variables, in order of importance, were modality, pitch, harmony, and rhythm. Melody, descending or ascending, was reported to carry practically no expressive meaning. Hevner concludes that the results of these studies indicate a "uniformity and consistency in the apprehension of musical meaning" (1938, p. 207). The results of Hevner's line of research suggest that a clearly apprehended meaning in music can indeed be ascertained with relative consistency. Additionally, Hevner provided a hierarchical model of expressive elements in musical performance. Research by Hoffren (1964a, 1964b) extended the line of research by Hevner (1935, 1936, 1937) in an effort to measure the abilities of secondary students to discriminate expressive phrasing in musical performance.
Expressive phrasing for the purposes of this study was defined as the application of musical elements identified as rubato, smoothness, articulation, phrasing, unity, continuity, dynamics, and dynamic accentuation. Hoffren (1964a) states that “standards of value do exist and all applied music teaching is based on the premise that there are both acceptable and unacceptable expressive performances” (p. 32).

Levi (1978) sought to establish a Gestalt concept of musical expression in music education. He argued that previous research on the subject of musical expression lacked a solid base of explanatory concepts. This deficiency “…has prevented perceptual concepts and processes from receiving full consideration in determining the educational role of musical expressiveness” (Levi, 1978, p. 425). Issues with musical expression arise when listeners use emotional terms to describe their perceptions of musical performances. Levi states that these emotional terms refer to specific musical expressive qualities within the perception of a performance. Gestalt theory identifies two defining characteristics of expressive qualities: (1) emotional attributes are applied to perceptions, and (2) the particularity of expressive qualities is determined by the structure of the perceived event (Levi, 1978). Levi asserts that a conceptual economy is achieved “since expressive qualities depend on perceptual structure, no special or additional processes need be hypothesized to account for the applicability of emotional terms to music…nothing extra-musical or non-auditory is asserted” (p. 427). This concept of expressiveness allows for emotional descriptions of music without presupposing the involvement of any non-auditory features, and categorizes these descriptions as indicators of features within the musical performance. Levi (1978) concludes that musical expression is an integral part of the elementary processes of musical perception.
Students who are exposed to educational experiences in musical expression will be provided insight into timeless life-values. The support for the application of expressive qualities to musical perception is demonstrated in the consistency of emotional attributions to musical performance. This point is supported by research conducted by Juslin (1997b), who found that the communication of basic emotions through musical performance is reliable regardless of response format.

In a series of two experiments, Juslin (1997b) compared listener descriptions of emotions in musical performance using two different response formats. Participants in the first experiment (N = 15) were asked to respond to ten different interpretations of the melody Nobody Knows by Stephen Foster. Juslin notes that the different interpretations of happiness, sadness, fear, anger, and love/tenderness were performed on both the electric guitar and electronic sequencer. The performances achieved the different emotional results by manipulating musical performance variables such as tempo, timbre, amplitude, and rhythm (1997b). The response sheet contained 16 different terms that included two intensities of each of the following: anger, happiness, sadness, fear, disgust, and love. The last four emotions included interest, curiosity, desire, and jealousy. The results of the experiment demonstrated agreement regarding the general emotion being communicated, but not the intensity. This forced-choice format was compared to an open-ended qualitative response format in Experiment 2. The second experiment asked volunteer participants (N = 12) to judge which emotion a musician was attempting to communicate. Each participant was instructed to select two emotional terms that described the performance. The performances utilized in Experiment 2 were the same as those used in Experiment 1.
Results of the second experiment confirmed listener agreement regarding the general emotional character of the musical performances, but also indicated low levels of agreement regarding the intensity of the emotion communicated. Juslin (1997b) suggested that a multi-method approach to investigating expression in musical performance might provide the answers that researchers have been looking for. The results of this study demonstrate the ability of performers to reliably communicate emotions through performance, and the ability of listeners to reliably perceive this emotion regardless of response format.

Similar results were reported by Gabrielsson (1999), who investigated the ability to communicate what were defined as basic emotions (happiness, sadness, anger, tenderness, fear, solemnity, and no expression) through musical performance. The instruments used for this study included violin, flute, saxophone, electric guitar, percussion, synthesizer, guitar rock band, and singing voice. Performers in this study were reported to manipulate performance variables including tempo, timing, amplitude/dynamics, intonation, timbre, tone onsets and offsets, and vibrato. Each performance consisted of an interpretation of What Shall We Do with the Drunken Sailor and was rated on a scale from 0 (minimum) to 10 (maximum). Gabrielsson (1999) reports that performers were indeed able to communicate general emotions through performance. An analysis of variance supplemented by post hoc comparisons illustrated a significant difference between intended and non-intended emotions (p < .05). The mean ratings for the perceived emotions indicated that performers were able to communicate happiness, anger, and fear reliably, whereas listeners often confused sadness and tenderness.
The “no expression” category received ratings on all emotions ranging from 4.1 to 7.7, suggesting that a melody without an intended emotional interpretation can elicit a wide range of emotions from listeners.

The previously presented research studies on musical expression by Levi (1978), Juslin (1997b), and Gabrielsson (1999) support the consistency of emotional communication and perception through musical performance. This consistency applies only to broad emotional categories (e.g., happiness, sadness, anger). Finer distinctions between specific complex emotions (e.g., jealousy, shame) are not reliably communicated (Juslin, 1997a; Juslin & Laukka, 2004). A meta-analysis of 41 studies concerning emotional expression by Juslin and Lindstrom (2003) illustrates the ability of professional musicians to successfully communicate five basic emotions: happiness, anger, sadness, fear, and tenderness. Juslin and Laukka (2004) state that the inability to communicate specific emotions through musical performance is due in part to the redundancy of the musical features involved in communicating emotions. This redundancy limits the complexity of the emotions that can be communicated. Emotions in music are expressed through the manipulation of musical features including tempo, mode, harmony, tonality, pitch, micro-intonation, contour, interval, rhythm, sound level, timbre, timing, articulation, accents, tone attacks and decays, and vibrato (Juslin & Laukka, 2004). Some of these features are manipulated at the compositional stage (e.g., mode and harmony), while the performer manipulates other features, such as tempo and timbre, in real time. These musical features combine to form various configurations that represent different emotions. Understanding these configurations of musical features has implications for music education and the ability to improve expressive performance in music students.
Juslin and Lindstrom (2003) present an expanded version of the Brunswikian lens model (Brunswik, 1952) that illustrates the direct effects and interactions of both composer and performer cues on the communication of emotions to listeners. The direct effects explain approximately 75-85% of the variance in emotional judgment. A simultaneous regression of listener happiness ratings on composer cues, performer cues, and their interactions reports that these independent variables account for 90% of the variance in listener happiness (adj. R2 = .90). The most influential musical features indicated in the model are mode (β = -0.73), tempo (β = 0.55), and the cross product of mode and tempo (β = 0.16). This model provides valuable information regarding emotional communication to listening audiences, but more research needs to be conducted.

A questionnaire study conducted by Juslin and Laukka (2004) helped to shed light on the expression, perception, and induction of emotional responses to music in the context of everyday life. The purpose of this study was to gather information regarding the social context of music listening. Specifically, this study attempted to explore the possibility that social context influences the perceptions of intended emotional communication by performers. A total of 141 music listeners from Sweden participated in the study, including approximately 51% trained musicians and 49% untrained musicians. Results from this study yielded information regarding listener ideas about a definition of musical expression, a hierarchy of musical virtues/features, the extent and content of musical communication (e.g., listener and performer connection), and the basis of musical judgments about musical expression. Listeners defined musical expression as a communication of emotions and/or ideas. A ranking of the relative importance of musical virtues/features indicated a hierarchy of listener values: composition, expression, and uniqueness of sound.
More musician than non-musician participants indicated that technical skills are an important aspect of musical expression (Mann-Whitney U test, z = 2.25, p < .05). Listeners also indicated, through several different prompts, that the majority feel that music and musicians communicate emotions through performance. A total of 74% of listeners indicated that judgments of expression in music are based upon musical elements (Juslin & Laukka, 2004). Overall, Juslin and Laukka state that these results suggest that music, depending on the listener, may induce emotions beyond the basic emotions. As for the influence of social context, this topic still needs to be researched further.

The studies regarding expression in musical performance provide important information about the nature of emotion in music. The focus of the current study is to examine the performer-controlled musical features that influence the communication of expression in music. An examination of the research presented in this section yields a convergence of concepts concerning the performer-influenced aural aspects of musical performance. A representative model of these musical aspects is found in the next chapter.

Development of Musical Performance Measures

Research studies on the development of performance measures provide support for the ability to measure musical performance objectively, accurately, and reliably. Watkins (1942) conducted the earliest significant research study that attempted to improve performance measurement. Watkins created an objectively scored measure for cornet performance from a content analysis of 23 well-known cornet methods. Sixty-eight exercises of varying difficulty were then created from the content analysis. Volunteer participants (N = 105) were selected and ranked the exercises according to difficulty. From these exercises, twenty-eight exercises were evenly distributed between two separate forms.
Test-retest and equivalent-forms reliability coefficients were reported to be .953 after administration of both forms of the test to 71 volunteer participants.

Another early series of studies by Gutsch (1964, 1965) attempted to improve performance assessment by creating an objective performance measure. Gutsch constructed an objectively based solo rhythmic sight-reading performance measure. The measure was based on a system of rhythmic construction introduced by Schillinger (1946). This system employed nineteen mathematical equations to generate 300 rhythmic figures. The rhythmic figures were examined for redundancy, and the remaining 200 figures were arranged and ordered according to difficulty and separated onto two equivalent forms. Data gathered from participant evaluations (N = 771) indicated an equivalent-forms test reliability score of .92. A random sample of 81 evaluations was stratified and reordered. A modified version of the test was created and administered to 137 participants. A rank-difference correlation between the stratified and random sample (n = 81) and the new sample (n = 137) was reported to be .98. These studies demonstrated the potential for reliability in an objectively based musical performance measure; however, due to the limited number of variables measured, the results of these studies are limited to evaluations of rhythm.

Watkins-Farnum Performance Rating Scale. A measure developed by Watkins and Farnum (1954), known as the Watkins-Farnum Performance Scale (WFPS), was an extension of the measure for cornet performance by Watkins (1942). Watkins and Farnum transposed both forms of the original cornet performance measure to facilitate the evaluation of flute, clarinet (soprano, alto, and bass), oboe, saxophone, French horn, tuba, trombone, and snare drum. Participants (N = 153) performed exercises using both forms of the WFPS and were evaluated according to the provided criteria.
The equivalent-forms reliability coefficient was reported at .95 for the measure. Watkins and Farnum reported a range of criterion-related validity, based on rank-order correlations, between .68 and .87. This score differed by instrument category. Performance evaluations using the WFPS are based on the presence of errors within each measure. Points are deducted for errors in pitch and rhythm as well as tempo, dynamics, articulations, and other written musical directions. Only one error can be scored in any one measure. This binary state of correctness, present or not present, does not account for the intensity of an error within a measure. Critics of this scale fault not only the measure’s inability to account for the frequency of errors within one measure, but also its apparent inability to measure other essential aspects such as expression, tone, or intonation (Bergee, 1987; Zdzinski, 1991).

A study by Stivers (1972) examined the reliability and validity of the Watkins-Farnum Performance Scale. A total of 198 participants were randomly separated into eight groups of approximately the same size. Groups 1-4 sight-read the WFPS twice, groups 5-6 sight-read one form of the WFPS once and then practiced the same exercises for one week before being evaluated, and groups 7-8 were given an opportunity to sight-read the WFPS just once. Equivalent-forms reliability and test-retest reliability coefficients were reported to be .97. Intra-judge reliability was reported to be .98 after both scorings of the WFPS. Inter-judge reliability was estimated between .88 and .97 for the different groups of judges. However, content validity was reported as moderate (.63) overall and low for correlations between the WFPS and both grades (.40) and contest ratings (.12). In addition, scores from different instruments could not be compared due to inconsistencies in scoring between groups.
This is most likely due to the fact that the exercises on the two forms of the WFPS are easier to perform on some instruments than on others (Stivers, 1972). Stivers (1972) states that the WFPS gives reliable information about note and rhythm performance, but should always be supplemented with other ratings or comments if a true indication of performance abilities is to be assessed.

Numerous research studies have utilized the Watkins-Farnum Performance Scale as a means of collecting performance data. Several of these studies have addressed the limited number of variables measured by the WFPS by adding supplemental criteria for performance dimensions such as tone quality, intonation, and phrasing. Folts (1973) studied the relative effectiveness of employing recorded materials during practice sessions on the performance of flute, clarinet, and cornet students. The WFPS, plus an additional panel of judges to rate participant tone quality, was utilized to rate performance achievement. In a study of entrance on achievement and retention in beginning band programs, Silliman (1977) used tape recordings of student performances to rate tone quality, intonation, and phrasing in addition to the WFPS. Abdoo (1980) studied the effects of gestalt and associationist learning theories on the performance achievement of beginning wind and percussion students. Similar to Folts (1973), Abdoo utilized a panel of three judges to rate participant tone quality in addition to the WFPS.

Zdzinski (1993) conducted research that utilized the Watkins-Farnum Performance Scale (1954) along with a researcher-designed Performance Rating Scale Supplement (PRSS). The purpose of this study was to examine the relationships among selected aspects of parental involvement and cognitive, affective, and instrumental performance outcomes. Participants in this study (N = 406) included students in grades 4 through 12 enrolled in band programs located in either New York or Pennsylvania.
In order to examine participant performance outcomes, Zdzinski (1993) employed both the WFPS and the PRSS. The Performance Rating Scale Supplement was modeled after previously developed facet-factorial measures by Abeles (1971) and Bergee (1987) and was designed to measure musicality, tone quality, intonation, and technique. Alpha reliability coefficients were reported as .979 and .882 for the WFPS and the PRSS, respectively. The results of this study suggested that parental involvement is related to overall cognitive, performance, and affective outcomes. Parental involvement was specifically related to performance outcomes at the elementary level.

Facet-factorial rating scales. The need for greater accuracy in performance measurement led to research that focused on a wider variety of empirically supported performance variables. Researchers began to find objective methods of analysis that provided empirical support for a given factor structure. One factor-analytic method used by many researchers is called facet-factorial. This method employs factor analysis to group items onto component variables according to a matrix of factor loadings. Studies on facet-factorial strategies for rating scale development utilize this method to facilitate item selection and identify performance variables.

A landmark study by Abeles (1971) examined the assessment of clarinet performance. The purpose of this study was to investigate the application of a facet-factorial approach for constructing a measure of clarinet performance. This research was the first to apply factor analysis to the construction of a rating scale for musical performance. Twenty-five students enrolled in music education courses at the University of Maryland were asked to write one- or two-page essays describing the auditory aspects of a good or bad performance by a junior high school clarinetist (grades 7, 8, and 9). These descriptive statements were content analyzed for items describing performance.
A content analysis yielded 54 different descriptive statements. The 54 descriptive statements were separated into seven a priori categories: tone, intonation, interpretation, technique, rhythm, tempo, and general effect. An additional set of 40 items gathered from previous literature was also added to the item pool. Each item was transformed into a statement usable for evaluating a clarinet performance and paired with a five-point Likert scale. A total of 94 descriptive statements were examined for appropriateness and positive or negative tone.

One hundred different solo clarinet performances were collected from 7th-, 8th-, and 9th-grade clarinet students (n = 50). These recordings were separated into random groups of two and distributed to instrumental music teachers (n = 50) from Prince George’s County, Maryland. Each judge was instructed to listen to each recording and respond to each of the 94 Likert statements. No time limit was imposed on the evaluation procedure, and judges were informed that each performer was an eighth-grade student enrolled in instrumental music for three years. The results of the item pool performance responses were subjected to a factor analysis. A principal components method with a varimax rotation was used to produce six underlying performance factors: interpretation, intonation, rhythm-continuity, tempo, articulation, and tone. An analysis of the resulting factor matrix indicated thirty items as most representative of the a priori categories. Each item selected had relatively high factor loadings and was factor simple (low correlations with the other factors). The thirty items were evenly distributed among the six factors and paired with a five-point Likert scale. To establish interjudge reliability, music teachers (n = 32) enrolled at the University of Maryland were recruited as volunteer judges. The judges were separated into three judge panels (n = 9, n = 12, and n = 11 judges).
High interjudge reliabilities were reported for both the total score (> .90) and scale scores (> .60) for each panel of judges for the revised Clarinet Performance Rating Scale (CPRS). Criterion-related validity (> .80) was established by correlating scores from both the CPRS and global performance ratings. Abeles (1971) states that the identification of factors that reflect non-idiosyncratic characteristics of clarinet performance has the potential for utilization as a general measure of music performance. The results of this study also demonstrate the ability of facet-factorial scale construction techniques to produce reliable and valid measures of musical performance.

The research conducted by Abeles (1971) was followed by other studies examining the aural aspects of music performance assessment using facet-factorial techniques. Cooksey (1974) employed facet-factorial techniques to develop a rating scale for high school choral groups. The resulting factors identified for choral evaluation were diction, precision, dynamics, tone control, tempo, balance/blend, and interpretation/musical effect. DCamp (1980) constructed a facet-factorial rating scale for high school band. The factors identified for band performance evaluation were tone/intonation, balance, musical interpretation, rhythm, and technical accuracy. Both the Cooksey (1974) and DCamp (1980) studies demonstrate the effectiveness of utilizing facet-factorial scales for the evaluation of group performance.

A study by Jones (1986) attempted to expand the use of facet-factorial scales to examine both the aural and visual aspects of musical performance through the development of a solo vocal performance rating scale. This study was an extension of previous research by Abeles (1971), Cooksey (1974), and DCamp (1980). The purpose of this study was to develop a rating scale for individual vocal performance, using facet-factorial techniques, that would be appropriate for use in a high school choral rehearsal.
Previous performance assessment research employing facet-factorial methods had focused on the aural aspects of musical performance. Jones intended to use facet-factorial methods to develop a measure for evaluating both the aural and visual aspects of vocal performance. Vocal performances by high school students (n = 30), ranging in age from 15 to 18 and drawn from nine different schools in Arkansas, were videotaped. Participating students prepared solo music appropriate for contest. A wide range of vocal abilities, from beginner to advanced, was recorded for use in the subsequent phases of the study. Judges (n = 50) from six different states were used to adjudicate the videotaped performances. No judge participated in more than one phase of the study.

Members of the National Association of Teachers of Singing (NATS) contributed forty-three essays regarding good and bad vocal performances. The essays were content analyzed, and the extracted items were separated into a priori factors established by the National Interscholastic Music Activities Commission (NIMAC): tone, intonation, technique, interpretation, musical effect, and other factors. The item statements were examined for appropriateness and positive or negative disposition. Each of the 168 item statements selected for the final item pool was accompanied by a five-point Likert scale ranging from Strongly Agree to Strongly Disagree. Fifteen volunteer judges experienced in vocal adjudication were used to evaluate two randomly selected performances each. Each judge was given an unlimited amount of time to adjudicate each performance using the item pool developed in the previous phase. The results of the item pool adjudications were then submitted to a factor analysis. A principal-components technique was used to identify the underlying factors that relate to vocal performance. This factor analysis produced twenty-six initial factors.
From these twenty-six factors, a five-factor structure was chosen that includes interpretation/musical effect, tone/musicianship, technique, suitability/ensemble, and diction. These factors were then rotated using a varimax rotation to produce uncorrelated, factor-simple items that adequately represent each of the underlying performance dimensions. The results produced thirty-two items to be included in the Vocal Performance Rating Scale (VPRS).

Interjudge reliability was calculated using three panels of judges, with the fifteen judges evenly distributed among them. Using the final form of the VPRS, the total interjudge reliability was reported as .894, .917, and .920 for each group, respectively. Interjudge reliability calculated for judge panels ranging from one to ten judges ranged from .627 to .958. Subscale reliability estimates ranged from .201 to .958. Jones attributes the wide spread in reliability to the instability of the suitability/ensemble factor. Criterion-related validity was established by correlating scores from the VPRS with scores from the NIMAC Vocal Solo Adjudication Form. Zero-order correlation coefficients between the VPRS total scores, subscale scores, and the NIMAC scores ranged from .351 to .878. A step-wise multiple regression of the VPRS subscale scores on the NIMAC criterion produced a corrected R2 of .897.

The results of Jones’s (1986) research coincided with the results found in previous studies by Abeles (1971), Cooksey (1974), and DCamp (1980). Utilizing facet-factorial scale construction techniques, a reliable and valid scale was produced. However, judges tended to react differently to various visual components of performance such as age, appearance, physical size, and grooming. The adjudicators’ reactions to the visual aspects of the Vocal Performance Rating Scale may have contributed to the instability of the measure (Jones, 1986).
Bergee (1987) conducted facet-factorial research on the development of a rating scale for low brass performance. This study was a replication of previous research by Abeles (1971) on woodwind performance. The purpose of this research was to develop an empirically valid and reliable rating scale for the evaluation of tuba and euphonium performance. In addition, Bergee intended to identify the factors that contribute to tuba and euphonium performance and to select items that appropriately represent the performance factors.

Essays from professional tuba and euphonium players and adjudication sheets collected from area music teachers were used to form the initial item pool. The essays contained descriptions of good and bad tuba or euphonium performance. Additional statements were collected from previous research on performance assessment. A content analysis was performed on the item statements, and a total of 210 descriptive statements were extracted. These statements were examined for redundancy and appropriateness. Any item deemed redundant or inappropriate was eliminated. A panel of three judges determined the positive or negative tone of the item statements and agreed on 112 statements to be used in the final item pool. The item pool statements were separated into the a priori categories previously established by Abeles (1971): tone, intonation, interpretation, tempo, rhythm, technique, and general effect. Each statement was reworded to be appropriate for tuba and euphonium assessment, and the items were randomly ordered. A 5-point Likert scale ranging from Strongly Disagree to Strongly Agree was added to each item statement. A total of 100 collegiate and public school tuba (n = 50) and euphonium (n = 50) performances were recorded and evaluated using the 112 Likert-type items. Instrumental music teachers, University of Kansas faculty members, and graduate students served as volunteer judges (n = 50).
Each judge was asked to adjudicate two randomly assigned performances using the item pool. The results of the initial administrations of the item pool were factor analyzed. The factors retained for rotation were determined by three criteria: (1) precedent established by prior researchers, (2) eigenvalues greater than 1.00, and (3) the scree plot. The selected factors were then orthogonally rotated to obtain uncorrelated factors. The investigator decided on a four-factor structure that includes interpretation/musical effect, tone quality/intonation, rhythm/tempo, and technique. The final version of the Euphonium and Tuba Performance Rating Scale (ETPRS) included 27 items and accounted for 75.8% of the total variance.

Three separate panels of judges (n = 10 in each) were formed to determine the interjudge reliability of the ETPRS. The interjudge reliability for the total scores of the ETPRS was .944 for panel 1, .985 for panel 2, and .975 for panel 3. Interjudge reliability was also calculated for groups of 1 to 20 judges using the Spearman-Brown prophecy formula. These scores ranged from .835 to .984. Bergee (1987) states that these results support Fiske’s (1983) contention that interjudge reliability increases up to the tenth judge and then levels off beyond this point. The criterion-related validity of the ETPRS was established in two separate studies. The first study employed three judge panels (n = 10 in each) comprised of instrumental music teachers. Each panel was asked to perform magnitude estimation on a set of performances. An examination of the zero-order correlation coefficients indicates a strong relationship between the ETPRS and the magnitude estimations (Bergee, 1987). A multiple regression analysis using the magnitude estimation as the criterion reported corrected R2 scores ranging from .831 to .913. This indicates that the ETPRS is an appropriate predictor of magnitude estimation criteria.
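For reference, the Spearman-Brown prophecy formula used to project panel reliabilities estimates the reliability of a composite of k judges from the reliability of a single judge. The formula itself is not reproduced in the cited studies; its standard psychometric form is:

```latex
% Spearman-Brown prophecy formula (standard form)
r_k = \frac{k \, r_1}{1 + (k - 1) \, r_1}
```

where r1 is the single-judge reliability and rk is the predicted reliability of the k-judge composite. Because the gains in rk diminish as k grows, projected reliabilities flatten for larger panels, which is consistent with Fiske’s (1983) observation that interjudge reliability levels off after about the tenth judge.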
The second study used 10 judges to evaluate the recorded performances using the aural items from the MENC (1958) adjudication form for solo wind instruments. The interjudge reliability for the MENC adjudication form was estimated at .978. In order to compare the results of the ETPRS to the MENC adjudication form, zero-order correlation coefficients were calculated. These coefficients ranged from .823 to .936. The results from the step-wise multiple regression indicated an R2 of .857. The results from this study suggest that a facet-factorial approach to rating scale development for low brass performance is a viable method for producing valid and reliable measures (Bergee, 1987). Bergee notes that the high reliability and quantifiable criterion-related validity are probably due to the procedure used to construct the measure. He also suggests that these results could potentially lead to the development of a comprehensive measure of music performance.

A study by Zdzinski and Barnes (2002) developed a string instrument performance assessment using facet-factorial techniques. Specifically, this study sought to: (1) identify the factors that influence string performance assessment, (2) identify the items that best represent the performance factors, and (3) determine the reliability and validity of the new measure. This study follows the progression of facet-factorial research previously established by Abeles (1971), Cooksey (1974), DCamp (1980), Jones (1986), and Bergee (1987). The initial pool of item statements was generated from descriptive essays gathered from string teachers, string education students, and the researchers (n = 25). The essays consisted of descriptions of the aural aspects of good or bad stringed instrument performances. Additional items were supplied by previously developed facet-factorial performance rating scales (Abeles, 1971; Bergee, 1987).
All item pool statements were then placed into the a priori categories of tone, intonation, interpretation, technique, rhythm, tempo, and general effect previously established by Abeles (1971). Upon further examination of the item statements, Zdzinski and Barnes (2002) decided to employ an additional category for vibrato previously established in a study by Gillespie (1997). The item statements were examined for appropriateness and redundancy and transformed into statements that could be used for string performance adjudication. Each of the 90 item statements was paired with a 5-point Likert scale ranging from Strongly Agree to Strongly Disagree and randomly ordered. Middle school and high school string performers were recruited to record a total of 100 string performances. Each performance was digitally recorded and averaged approximately 31 seconds in length. Public school string teachers, university string faculty, graduate students, and both junior and senior undergraduate string education majors (n = 50) were recruited to adjudicate two randomly selected recorded performances using the Likert-type items. During the adjudication process, judges were permitted to play each recording as many times as needed. The results of the adjudications using the item pool were factor analyzed. A principal component extraction and a varimax rotation of 4 to 10 factors were used to obtain an uncorrelated factor structure. Zdzinski and Barnes (2002) presented a five-factor structure that included interpretation/musical effect, articulation/tone, intonation, rhythm/tempo, and vibrato. The researchers noted that a five-factor structure was the best fit to maintain the desired factor-simple structure. Based on the factor loadings, twenty-eight items were selected for inclusion on the final version of the String Performance Rating Scale (SPRS). Each factor was represented by six item statements with the exception of the vibrato factor, which was represented by four statements.
The results from this study indicate overall interjudge reliability to be above .85 for all panels. Zdzinski and Barnes (2002) state that the subscale reliabilities, ranging from .67 to .92, are satisfactory with the exception of the low subscale reliability for vibrato in panel three (.065). Criterion-related validity was established by comparing SPRS scores with scores obtained from both magnitude estimations and the MENC adjudication ballot (MENC, 1958). The zero-order correlations range from .67 to .77 between the SPRS and the MENC ballot, and from .605 to .61 between the SPRS and the magnitude estimation scores. Content and construct validity were established through the use of previously established categories (Abeles, 1971; Bergee, 1987) and the item generation procedure. The factor structure of the SPRS differs slightly from those developed by Abeles (1971) and Bergee (1987). The factors of interpretation/musical effect and rhythm/tempo were represented in the Zdzinski and Barnes study as well as in the Abeles and Bergee studies. However, the factors of articulation, tone, technique, and intonation were grouped differently in each study, and the vibrato factor was present only in the Zdzinski and Barnes study. Zdzinski and Barnes (2002) stated that the differences in the factor groupings may be representative of the unique characteristics of wind and stringed instrument technique. More recently, a study by Russell (2007) examined the use of facet-factorial techniques in the development of a rating scale for guitar performance. The purpose of this study was to develop a valid and reliable measure usable for rating solo guitar performance. In addition, this study intended to identify the aural musical factors that influence evaluations of guitar performance. Statements describing both “good” and “bad” quality guitar performances were collected from guitar instructors, professors, and professional performers.
Additional statements concerning performance quality were gathered from previous performance research conducted by Abeles (1971), Bergee (1987), and Zdzinski and Barnes (2002). The item pool statements were examined for suitability, redundancy, and positive/negative tone and then placed into a priori categories that included tone, intonation, technique, rhythm, tempo, interpretation, and musical effect. The resulting 99 item statements were paired with a five-point Likert scale. Participants in this study (N = 55) included high school, college, and professional guitar players. Each performer was asked to perform one or two repertoire selections of their choice. A total of 100 recordings averaging 27 seconds in length were gathered from the participant performances. The recordings were randomly paired into groups of two and transferred onto compact discs. Music professors and both undergraduate and graduate music students (n = 67) were recruited from Florida, California, and South Carolina to act as volunteer judges. Each judge was asked to evaluate two performances using the 99-statement item pool. The results of the 134 item pool adjudications were factor analyzed using a varimax rotation to identify the underlying factor structure and the items that best supported each factor. The results of the factor analysis yielded a five-factor structure consisting of interpretation/musical effect, tone, technique, rhythm/tempo, and intonation. These factors accounted for approximately 71% of the total variance. To create the final version of the Guitar Performance Rating Scale (GPRS), an examination of the factor matrix was conducted in order to select the items that were most representative of the identified performance dimensions. A total of 32 items were selected as most representative of the factors of the GPRS. Cronbach’s alpha for the GPRS was estimated at .962 for the 32-item scale.
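The internal-consistency statistic reported here can be computed directly from raw item ratings. The following is a minimal Python sketch of Cronbach's alpha using the standard formula; the rating matrix below is invented for illustration and is not data from the GPRS study:

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a matrix of ratings:
    one row per adjudication, one column per item."""
    k = len(scores[0])                               # number of items
    items = list(zip(*scores))                       # column-wise view
    item_var = sum(pvariance(col) for col in items)  # sum of item variances
    total_var = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)

# Invented Likert ratings: 4 adjudications x 3 items
ratings = [[4, 4, 3], [3, 3, 3], [2, 2, 2], [4, 3, 4]]
print(round(cronbach_alpha(ratings), 3))  # → 0.896
```

Alpha rises toward 1 as the items covary more strongly relative to their individual variances, which is why a coherent 32-item scale can reach values such as the .962 reported.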
Russell (2007) suggests that these results support previous studies demonstrating the facet-factorial approach as an appropriate method for developing valid and reliable performance rating scales. In addition, the performance factors identified in this study are similar to those found in previous research; this similarity suggests that a common factor structure may be appropriate across all instrument groups. Zdzinski and Barnes (2002) suggested that performance measurement might be improved by using the facet-factorial approach as well as through the criteria-specific approach used by Levinowitz (1989), Rutkowski (1990), Azzara (1993), and Saunders and Holahan (1997). Criteria-specific rating scales. Rating scales that employ criteria-specific development strategies intend to objectively measure instrumental performance while providing specific performance feedback to both teacher and student. Criteria-specific rating scales include descriptions of performance capability at various levels of achievement and give students a better sense of the qualities of a good performance (Whitcomb, 1999). Researchers investigating the development and application of criteria-specific rating scales report substantial reliability when evaluating instrumental and vocal performances (Levinowitz, 1989; Rutkowski, 1990; Azzara, 1993; Saunders & Holahan, 1997). In an effort to increase the accuracy of evaluated performances, researchers began to focus on the criteria used for the evaluation of performance variables. Kruth (1973) defined specific objectives to be met in an exhaustive evaluation measure for clarinet performance. The ten-page measure evaluated the following performance dimensions: embouchure, articulation, breath control, playing position, technical facility, and reading ability. Unfortunately, the length of this measure made it impractical to use in an evaluation situation where time is of the essence.
An informative rubric format resulted from the idea of informing the performer of the criteria used for evaluation. A research study by Levinowitz (1989) utilized the rubric format to evaluate the rhythmic and tonal aspects of children’s vocal performance. The purpose of this research was to examine the relationship between a child’s ability to sing a song containing words and language development. In addition, this study attempted to answer whether or not a young child can perform a rote melody with words better than a melody presented on a neutral syllable. Participants in this study included two classes of nursery school students (n = 35) in Fort Washington, Pennsylvania. Each participant received music instruction for 30 minutes per week for five months. During the course of the study each class was taught two criterion songs. One song was performed with words and the other song was performed using the neutral syllable “bum.” At the conclusion of instruction each participant was recorded singing both songs. Two judges evaluated the recordings using two researcher-designed measures of student achievement in the areas of tone and rhythm. These five-point rating scales employed descriptions of the characteristics of differing levels of tonal and rhythmic achievement. Interjudge reliability ranged from r = .78 to .93 for the tonal rating scale and from r = .84 to .90 for the rhythm rating scale. Levinowitz claimed direct validity for both rating scales due to the consistency of the adjudication scores. The t-test results indicated no significant difference between the ability to perform rhythm with or without words. However, these t-tests did illustrate a significant difference between the tonal abilities of songs performed with words and without words. Levinowitz (1989) suggested that a child could potentially perform a melody more accurately if there is no distraction from words.
Scores from the tonal and rhythmic rating scales were also correlated with scores from the Stanford-Binet I.Q. Test and the Peabody Picture Vocabulary Test (PPVT). The results indicated no significant correlation between a child’s ability to perform a song with words and language development. The results from this study suggested the existence of two separate mental processes necessary for learning a song: (1) acquisition of melodic material and (2) learning the text. Additional conclusions regarding the efficacy of criteria-specific rating scales indicated the potential for high interjudge reliability, but suggested further investigation into the validity of these measures (Levinowitz, 1989). The trend of high interjudge reliabilities is also apparent in a study by Azzara (1993) on the effects of improvisation instruction on the musical achievement of fifth grade students. Participants in this study included fifth grade students (N = 66) from two separate schools. Participants in the experimental group received instruction on improvisation in addition to the regular class curriculum. At the end of the 14-week instruction period each participant recorded three etudes: (1) a prepared etude, (2) a teacher-assisted etude, and (3) a sight-read etude. A criteria-specific rating scale was used to evaluate the recorded etudes. The measure was designed to evaluate tonal, rhythm, and expression factors using a five-point scale. Each successive interdependent criterion assumed that the performer had gained proficiency at the previous level(s) (Azzara, 1993). Interjudge reliability was estimated at .94, and no source of validity was mentioned. A panel of four judges adjudicated the 198 recordings on three separate occasions in order to evaluate the tonal, rhythm, and expression factors separately. The composite results of the tonal, rhythmic, and expression evaluations were subjected to a two-way analysis of variance.
The results indicate that students who received instruction in improvisation obtained higher composite scores than students who did not receive this instruction (Azzara, 1993). The performance of the criteria-specific measure used in this study was consistent with previously developed rating scales by Levinowitz (1989) and Rutkowski (1990). Since a source of validity was not identified, continued investigation into the establishment of validity for criteria-specific rating scales is necessary. Saunders and Holahan (1997) investigated the suitability of criteria-specific rating scales in the selection of high school honors ensemble participants. The research questions proposed in this study included: (1) Do criteria-specific rating scales yield adequate measurement results? (2) Do criteria-specific scales help judges discriminate between different levels of instrumental performance? (3) Which performance factors are most predictive of the overall score? A total of 926 woodwind and brass performers seeking entrance into the Connecticut All-State Band and 36 judges participated in this study. The measure used to evaluate the student performers addressed the three sections of audition material: solo evaluation, scales, and sight-reading. The solo evaluation included the factors of tone, intonation, technique/articulation, melodic accuracy, rhythmic accuracy, tempo, and interpretation. The scale section included the factors of technique, note accuracy, and musicianship. The sight-reading portion of the measure consisted of tone, note accuracy, rhythmic accuracy, technique/articulation, and interpretation factors. Each factor was accompanied by continuous or additive criteria on a five-point scale. The results of the adjudications were analyzed to determine the frequency of response, means, and standard deviations for each performance dimension. The alpha reliability for the combined scores of all instruments was estimated at .915.
This high alpha reliability is consistent with prior criteria-specific research studies (Levinowitz, 1989; Rutkowski, 1990; Azzara, 1993). Scores from each of the performance dimensions were correlated with the total score. The median correlation was estimated at .73. In addition, a stepwise multiple regression and an analysis of variance were performed to determine the amount of variance contributed by each factor. The results indicated that all of the performance factors contributed significantly (p < .001) to the prediction of overall woodwind and brass performance and accounted for 92% of the variance. Saunders and Holahan (1997) claimed that the overall pattern of correlations provides indirect evidence of the validity of the criteria-specific rating scales used in this study. Saunders and Holahan stated that the results of this study suggest that criteria-specific rating scales are a viable means for assessing high school woodwind and brass performance with sizeable reliability. Furthermore, the authors stated that the results of this research indicated evidence of superior diagnostic validity. Saunders and Holahan (1997) also suggested that future research should investigate the utilization of factor analysis to determine the stability of the factor structure within criteria-specific rating scales. A more recent study by Norris and Borst (2007) compared the reliability of two choral performance measures. The first measure (Form A) was a traditional rating scale that required adjudicators to assign a score of 1-5 for each performance dimension. The second measure (Form B) was a rubric format that utilized descriptors defining each level of achievement for a given performance dimension. Both forms used the same performance dimensions of tone, diction, blend, intonation, rhythm, balance, and interpretation. Participants included randomly selected SATB choruses (N = 15). Each chorus performed one selection that was recorded onto compact discs.
A panel of four judges reviewed the recordings twice. Form A was used in the first evaluation, and Form B was used for the second evaluation. The two judging sessions were separated by a 1.5-hour lunch break. Means and standard deviations, as well as intraclass correlations, were calculated from the data gathered. The results of the t-tests indicated significant differences between the two forms (p < .05) in all performance dimensions except interpretation (t = -1.79, p = .079). Norris and Borst (2007) state that the intraclass overall reliability for Form B was .15 higher than that of Form A. However, neither form provided any real information about levels of rhythmic achievement (Norris & Borst, 2007). The authors conclude that measures that include dimension-specific criteria are more appropriate for evaluations of musical performance than measures without these descriptors.

Summary of Related Literature

Research on evaluation criteria has provided valuable information regarding the process of musical performance assessment. The literature reviewed in this section outlines research on the identification of performance constructs, musical achievement, adjudicators and the adjudication process, musical expression, and performance measure development. Studies concerning the identification of influential performance variables have provided important insights into the overall structure of musical performance. Early research studies on performance constructs centered on investigating band performance and festival rankings (Owen, 1969; Oakley, 1972; Neilson, 1973). This early research focused on content analysis of measures and comments from band evaluations to determine the underlying performance dimensions that were influencing judgments of performance quality.
Some of the performance dimensions identified were rhythm, interpretation, intonation, tone, expression, pitch, musicality, phrasing, balance, articulation, diction, musical effect, and dynamics (Owen, 1969; Oakley, 1972; Neilson, 1973; Oldefendt, 1976; St. Cyr, 1977; Sagen, 1983; Burnsed, Hinkle, & King, 1985; Mills, 1987; Bergee, 1995; Thompson, Diamond, & Balkwill, 1998; Wrigley, 2005; Johnson & Geringer, 2007). As research on performance constructs continued, researchers began to shed light on the nature of performance evaluation and the global rating. Research by Burnsed, Hinkle, and King (1985) demonstrated high correlations between individual performance dimensions and global ratings. These results are also supported by Mills (1987), who suggested that performance could indeed be represented by individual performance constructs. An investigation by Bergee (1995) looked into the existence of higher-order musical performance factors. Three higher-order factors were identified: Tone Quality/Intonation, Musicianship/Expressiveness, and Rhythm/Articulation. This study helps establish the importance of both technical and expressive aspects of performance. Research by Thompson, Diamond, and Balkwill (1998) also suggests that aspects of expression play an important role during evaluations of musical performance. More recently, research by Wrigley (2005) investigated the musical and nonmusical influences on performance evaluation. This study provides support for the existence of two main cross-instrument factors, identified as technical proficiency and interpretation. A study by Johnson and Geringer (2007) sought to examine the influences of musical elements on evaluations of wind band performances. Musical expression and tone/intonation accounted for approximately 77.5% of the variance in wind band evaluations.
The conclusions made by these researchers lend support to a primary influence of expressive and technical aspects of musical performance during evaluations of performance quality. Research on musical achievement has investigated the influence of a variety of variables on musical achievement, such as musical instrument (Stecklein & Aliferis, 1957), classroom type (Colwell, 1963), personality type (Suchor, 1977), lateral dominance/sex/music aptitude (Schleuter, 1978), parental involvement/cognitive and affective student attributes (Zdzinski, 1993), and practice behaviors (Miksza, 2007). Early research defined musical achievement as aural discrimination ability (Stecklein & Aliferis, 1957; Colwell, 1963). Later research began to widen the concept of musical achievement to include tone, technique, pitch, rhythm, intonation, musicality, expression, and tempo (Hodges, 1975; Zdzinski, 1993; Geringer & Johnson, 2007). The idea of musical performance achievement as a combination of both expressive (aesthetic) and accuracy factors was supported in a research study by Suchor (1977), who investigated the influence of personality type on piano performance achievement, group interaction, and perception of group. This duality of subjective and objective influences on performance achievement is also reflected in research by Zdzinski (1993) and Miksza (2007). Research on the process of musical performance adjudication focuses on the variables that are suspected to influence adjudicators. This body of research has helped to determine where possible sources of measurement error originate in an evaluation situation. Early contributions to adjudication research found possible influences of evaluator expectation (Duerksen, 1972) and no influence of non-performance achievement on music performance evaluations (Fiske, 1979).
Some later adjudication research has concentrated on the educational level and musical experience of the adjudicator and how these factors influence the consistency and reliability of musical performance judgments. Research in this area has found no difference between specialists’ and non-specialists’ judgments of musical performance quality (Fiske, 1975; Kim, 2000; Bergee, 2003; Hewitt, 2007). Studies of undergraduate musicians reveal their ability to capably evaluate musical performances (Schleff, 1992; Bergee, 1997; Geringer & Madsen, 1998; Hewitt, 2007). Research by Bergee (2003) studied the effects of variability in the size of the adjudication panel, the response mode of evaluation employed, and adjudicator experience on end-of-semester jury adjudications. The results of this study found no significant effect for response mode or evaluator experience. However, Bergee also reports that an increase in evaluation stability could be made possible by an increase in judge panel size and number of subscale items. The question of stability and consistency in adjudication was also addressed by Thompson and Williamon (2003), who suggest that the reliability of musical evaluations is at best moderate. The results of this study suggest that the evaluation of separate performance aspects could be more reliable than overall judgments. Research by Hewitt (2007), who found that adjudicators could successfully concentrate on multiple performance factors, also supports the utilization of separate performance factors. Thompson and Williamon conclude that the development of a standardized set of defined performance criteria could indeed be possible. The identification of uniform methods of evaluation can lead to better stability in musical performance evaluations (Bergee, 1997; Thompson & Williamon, 2003).
Research on expression in music indicates the influence of rhythm, melody, harmony, timbre, dynamics, tonality, pitch, and tempo on listeners’ impressions of a performer’s emotional communication abilities. Early research by Hevner (1938) suggested a ranked order of importance for expressive musical elements: tempo, modality, pitch, harmony, and rhythm. Hevner (1938) concluded from her studies in musical expression that a consistency of apprehended meaning in music exists. Hoffren (1964a, 1964b) established secondary students’ ability to discern meaning in music and concluded that standards of value do exist with regard to expressive performance in music. Levi (1978) supports this by stating that musical expression is an integral part of the perception of performance quality. The idea of consistency in apprehended meaning in music has inspired many recent research studies on the subject. Research by Juslin (1997b), Gabrielsson (1999), and Juslin and Lindstrom (2003) explored the possibility of apprehended meaning in music. These researchers concluded that performers could indeed reliably communicate emotion through performance. A research study by Juslin and Laukka (2004) illustrated distinctions between composer-controlled and performer-controlled musical elements that influence evaluations of musical expression. In a survey conducted by Juslin and Laukka, participants were asked to define musical expression as well as to rank musical elements in terms of importance. Musical expression was defined as the communication between performers and audience. Participants ranked the factors of expression and technical skills highly as factors of musical importance. Studies on the development of performance measures have identified objective strategies for creating musical performance measures.
Performance measure development research demonstrates that the utilization of different response modes makes no significant difference in measure reliability (Saunders & Holahan, 1997; Zdzinski & Barnes, 2002; Russell, 2007; Norris & Borst, 2007). This conclusion is supported by adjudication research by Bergee (2003). These objective measure development strategies also offer empirical insight into factors that influence evaluations of performance quality. Measures utilizing evaluations of separate performance dimensions have shown considerable reliability (Abeles, 1971; Bergee, 1987; Zdzinski & Barnes, 2002; Russell, 2007). Research on performance measure development has established the important influence of rhythm, pitch, tone, intonation, musical effect, interpretation, and technique. The recurrence of performance dimensions across several instrument categories indicates the possibility of cross-instrument performance factors that influence judgments of performance quality (Wrigley, 2005; Russell, 2007). The literature presented in this chapter presents a collection of musical factors thought to influence assessments of musical performance quality, including: technique, interpretation, rhythm, intonation, phrasing, pitch, musicality, tone, balance, blend, expression, dynamics, tempo, articulation, timbre, accent, rubato, vibrato, embouchure, smoothness, unity, continuity, style, position, posture, ensemble, accompaniment, appearance, breathing, conductor, instrumentation, instrument quality, difficulty, arrangement, attack, release, range, communication, melody, tonality, and harmony. These musical factors represent both technical and expressive aspects of musical performance. This literature review also reveals a further categorization of these musical factors into performer-controlled and composer-controlled musical aspects.
The purpose of the present study is to examine a hypothesized model of the performer-controlled musical components that influence assessments of musical performance quality.

CHAPTER 3

Method

The purpose of this study is to verify the existence of a model of musical factors that influence assessments of aurally presented musical performance. Specifically, this model represents the aurally perceived performer-controlled musical performance factors that influence assessments of solo musical performance quality. The support for this model is founded upon previous research concerning performance constructs, musical achievement, musical performance adjudication, musical expression, and musical performance rating scale development. This model assumes that the quality of performer-controlled musical factors has an impact on assessments of overall performance quality. The order of this investigation proceeded as follows: a) gathering of a priori performance variables from previous research on performance constructs, musical achievement, musical performance adjudication, musical expression, and musical performance rating scale development; b) development of a tentative model of performer-controlled musical performance factors; c) construction of the aural musical performance quality measure; d) gathering recordings of solo brass, woodwind, string, voice, and guitar performances; e) evaluation of recorded performances by volunteer adjudicators; and f) data analysis (Keith, 2006; van Gigch, 1991; Lippitt, 1973).

Gathering Performance Dimensions

Researchers investigating the issues surrounding performance assessment have utilized numerous performance variables for the purpose of evaluating some aspect of musical performance.
An examination of this research yields an exhaustive list of performance dimensions that includes technique, interpretation, rhythm, intonation, phrasing, pitch, musicality, tone, balance, blend, expression, dynamics, tempo, articulation, timbre, accent, rubato, vibrato, embouchure, smoothness, unity, continuity, style, position, posture, ensemble, accompaniment, appearance, breathing, conductor, instrumentation, instrument quality, difficulty, arrangement, attack, release, range, communication, melody, tonality, and harmony. An adaptation of the Brunswik Lens Model (1952) by Juslin and Lindstrom (2003) illustrates a separation of performer and composer musical factors that influence the communication of emotions to listeners. This concept of separate performer and composer musical cues was extended for the purpose of assessing musical performance achievement. The collected variables were examined for redundancy and appropriateness and then separated into one of two categories: performer-controlled cues and non-performer-controlled cues. The variables were then categorized further into visual, ensemble, performer, and composer factors (see Appendix B). Since this study is interested in modeling performer-controlled performance factors, all variables making reference to composer cues, visual cues, ensemble cues, non-musical factors, and descriptions of performance factors were set aside. The remaining variables are categorized as aurally perceived performer-controlled cues: technique, interpretation, intonation, musicality, tone, expression, dynamics, tempo, articulation, timbre, rubato, vibrato, and communication.
The variable of rhythm was labeled by Juslin and Lindstrom (2003) as a composer cue; however, the frequent occurrence of rhythm as an influential performance dimension demanded a reconsideration (Fiske, 1972, 1975, 1979; Hodges, 1974, 1975; Schleuter, 1978; Sagen, 1978, 1983; Abeles, 1971; Bergee, 1987, 1989, 1993, 1995, 2003; Radocy & Boyle, 1988; Geringer & Madsen, 1989; Zdzinski & Barnes, 2002; Russell, 2007; Miksza, 2007). Rhythm was relabeled to reflect a more specific performer-controlled musical component, rhythmic accuracy, which refers to the metric accuracy of the rhythms performed in relation to the pulse of the music.

Development of the Tentative Model

A tentative model was developed from previous research studies on performance variables, performance achievement, performance adjudication, and music performance measure development. Models are developed from conclusions drawn from both formal and informal theories, previous research, time precedence, common sense, and logic (Keith, 2006). “A model is by nature a simplification and thus may or may not include all the variables. It should include, however, all of those variables which the model-builder considers important…” (Lippitt, 1973, p. 2). Since the purpose of this study is to investigate a tentative model of the performer-controlled aural musical factors that influence assessments of musical performance quality, the hypothesized model consists of the performer-controlled musical variables gathered in the previous step. The structure of performer-controlled musical factors is hypothesized to consist of two subcategories of performance variables: technique and musical expression.
An examination of the aurally perceived performer-controlled performance variables gathered in the previous step, along with a content analysis of previous research on performance assessment, supported the utilization of both technical and expressive aspects during evaluations of musical performance (Suchor, 1977; Zdzinski, 1993; Bergee, 1995; Wrigley, 2005; Johnson & Geringer, 2007; Miksza, 2007). The collected musical factors were then separated into categories of either technique or musical expression. The subcategory of technique consists of the variables tone, intonation, articulation, and rhythmic accuracy. For the purpose of this study, the assessment of tone is defined as an evaluation of the quality of sound produced on a musical instrument. An assessment of intonation is defined as an evaluation of the accuracy of pitch relations (Miller, Fellbaum, Tengi, & Langone, 2006). Articulation is evaluated on the quality of tonal attacks and releases. Rhythmic accuracy is defined as the metric exactitude of the performed rhythms in relation to a steady pulse. Definitions for the performance components of technique originated from previous studies on performance constructs and musical performance rating scale development (Abeles, 1971; Jones, 1986; Bergee, 1987, 1995; Zdzinski & Barnes, 2002; Russell, 2007). These first-order performance components are hypothesized to have a direct effect on evaluations of the technical aspects of performer-controlled musical factors. An analysis of research concerning performance assessment and musical expression reveals some redundancy in the terms used to describe musical expression. Many research studies show an overlap in the usage of the terms musicality, musicianship, musical effect, communication, and expression (Owen, 1969; Cooksey, 1974; Burnsed, Hinkle, & King, 1985; Bergee, 1987, 1995; Geringer & Madsen, 1998; Zdzinski & Barnes, 2002; Thompson & Williamon, 2003; Russell, 2007; Abril & Flowers, 2007; Miksza, 2007).
These terms were described in previous research using items or definitions that refer to communication of style or emotion. In research by Juslin and Laukka (2004), results indicated that over 74% of the participants surveyed defined musical expression as the communication of musical emotions and ideas. Therefore, for the purposes of this research, these terms are considered to be synonymous with the term “musical expression.”

The subcategory of musical expression is hypothesized to include the variables tempo, dynamics, timbre, and interpretation (Hevner, 1938; Hoffren, 1964a; Smith, 1968; Bergee, 1995; Juslin, 1997b; Juslin & Lindstrom, 2003; Juslin & Laukka, 2004; Russell, 2007). Tempo, for the purposes of this study, is defined as the frequency of rhythmic pulses within a given time period, usually measured in beats per minute. Dynamics refers to the amplitude of the musical sounds and their appropriateness within the context of the music. The assessment of timbre is defined as the manipulation of a sound wave that results in a change in the characteristic of the sound produced by a musical instrument. Interpretation is defined as a manipulation of the original or expected musical components in order to effectively express the stylistic concerns of the music. The identified factors vibrato and rubato are considered to be functions of performer interpretation and were omitted to avoid redundancy. These variables are hypothesized to have a direct effect on evaluations of expressive aspects of performer-controlled musical factors.

The hypothesized paradigm suggests that both technique and musical expression have direct effects on overall perceptions of performance quality. Technique is also theorized to have an indirect effect on overall performance quality perception through musical expression.
The model presented in this section illustrates the theoretical importance of both technical and expressive musical factors when evaluating the quality of musical performance, and suggests that these performance dimensions have an effect on the outcome of perceived performance quality. Both technical and expressive factors consist of smaller musical components. This hypothesized model of aurally perceived performer-controlled musical factors illustrates the predicted structure of identified performance variables on the global perception of musical performance quality.

Construction of the Aural Musical Performance Quality Measure

Data for each of the identified performance variables were gathered using a researcher-designed measure of aural musical performance quality. The first-order variables intended for measurement include tone, intonation, articulation, rhythmic accuracy, tempo, dynamics, timbre, and interpretation. Descriptive items that represent each of the performer-controlled musical factors were used to assess participant performance. The items chosen to represent the first-order performance factors originated from previous research concerning musical performance assessment and musical performance rating scale construction (Abeles, 1971; Neilson, 1973; Bergee, 1987; Zdzinski, 1993, 2002; Russell, 2007). This initial item pool was examined for redundancy, appropriateness, and completeness. Both positive and negative items are desired in order to prevent a set item response during evaluation. Items selected for inclusion in the final item pool were paired with a four-point Likert scale ranging from Strongly Disagree to Strongly Agree. The first-order performance dimensions (tone, intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, interpretation) were each represented by four items.
Additionally, items were gathered and selected in order to collect data regarding technique, musical expression, and overall perceptions of performance quality. The items chosen to represent both technique and musical expression were gathered from previously existing musical performance rating scales (Bergee, 1987, 1993, 1995, 2003; Geringer & Madsen, 1998; Russell, 2007). Each item was examined for appropriateness and redundancy and paired with a four-point Likert scale ranging from Strongly Disagree to Strongly Agree. The independent variables of technique and musical expression were each represented by four items. Data for overall perception of performance quality were obtained using four researcher-created items modeled after items from previous facet-factorial studies that demonstrated high factor loadings (Bergee, 2003; Russell, 2007). The items representing the eleven observed variables were placed on the Aural Musical Performance Quality (AMPQ) measure (see Appendix C).

Gathering Recordings of Solo Music Performance

Participants in the performance portion of this study include professional musicians and both undergraduate and graduate performance and music education majors from a large southeastern university. The instrumentation of volunteer performers represents the brass, woodwind, voice, and string (including guitar) instrument families. A total of 50 solo performance recordings were gathered in order to produce an adequate sample of performances (brass n = 7, strings n = 20, voice n = 6, woodwind n = 17). From this pool of instrumental and vocal performances, a total of four recordings were selected at random to represent each musical instrument category: voice, strings, brass, and woodwind. The volunteer participants were asked to perform excerpts of prepared pieces from their repertoire.
Each performance was digitally recorded using a Sony Net-MD Mini-disc recorder (MZ-N10) and a Sony Electret Condenser Microphone (ECM-MS907) to ensure a high-fidelity reproduction of each performance. The ability levels of the performances range from beginner to expert in order to provide a wide array of performance abilities. The four performance recordings were randomly ordered and recorded onto compact discs. Each compact disc was then placed into evaluation packets. The evaluation packets contained: (1) one information/direction sheet, (2) one judging consent form, (3) four Aural Musical Performance Quality (AMPQ) measures (one per listening example), (4) one compact disc containing four representative solo performances, and (5) one mailing envelope (for judges solicited out of state) (see Appendices D & E).

Evaluations of Recorded Performances

College undergraduate and graduate music students, university music professors, primary and secondary school music educators, and professional musicians from Florida, Oklahoma, Virginia, and Colorado were solicited as volunteer judges (N = 58) to evaluate the performance recordings using the Aural Musical Performance Quality (AMPQ) measure. The selection of judges was limited to those who have instrumental or vocal performance experience at a professional or collegiate level. Each volunteer judge received one evaluation packet. Adjudicators were instructed to listen to each recording separately. The participant judges evaluated each performance using the AMPQ measure. Judges were permitted to review the recordings as many times as necessary to aid in the adjudication of the recordings (Zdzinski & Barnes, 2002; Russell, 2007).

Data Analysis and Preparation

Once the recording evaluations (N = 232) were collected, data were recorded onto an electronic spreadsheet according to the following response key: 4 = Strongly Agree, 3 = Agree, 2 = Disagree, 1 = Strongly Disagree.
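This coding scheme, together with the recoding of negatively worded items described in the next step, can be sketched in a few lines. The function name and the layout below are illustrative, not part of the study's materials:

```python
# Response key from the study: numeric coding of the four-point Likert scale.
RESPONSE_KEY = {"Strongly Agree": 4, "Agree": 3, "Disagree": 2, "Strongly Disagree": 1}

def code_response(label, negatively_worded=False):
    """Map a verbal response to its numeric code. Negatively worded
    items are reverse-coded (4 <-> 1, 3 <-> 2) so that a higher score
    always reflects a more favorable evaluation."""
    score = RESPONSE_KEY[label]
    return 5 - score if negatively_worded else score

code_response("Strongly Agree")                          # -> 4
code_response("Strongly Agree", negatively_worded=True)  # -> 1
```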
All negatively worded items were recoded in order to maintain the same metric throughout the analysis. Once all responses were entered, the variables of tone, intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, interpretation, technique, musical expression, and overall perception of performance quality were analyzed using the raw data from the representative items on the AMPQ measure. These variables were analyzed by utilizing reliability, correlation, and regression sub-routines in the Statistical Package for the Social Sciences (SPSS) as well as path analysis sub-routines from the Analysis of Moment Structures (AMOS) software package.

CHAPTER 4

Results and Discussion

Results

Solo instrumental evaluations using the Aural Musical Performance Quality (AMPQ) measure (N = 232) were collected to estimate the proposed model of aurally perceived performer-controlled musical performance factors. A reliability analysis was conducted on the AMPQ measure. The alpha reliability for the total AMPQ measure was estimated at .977. Individual alpha reliabilities for all subscales contained in the AMPQ measure are provided in Table 2.

Table 2
Total and Subscale Reliabilities for AMPQ Measure

Scale                                                # of Items   Cronbach’s α
Total (all variables)                                    44          .977
Technique/Musical Expression/Overall Perception          12          .957
  Technique                                               4          .922
  Musical expression                                      4          .891
  Overall Perception                                      4          .930
Tone/Intonation/Rhythmic Accuracy/Articulation           16          .937
  Tone                                                    4          .896
  Intonation                                              4          .826
  Rhythmic accuracy                                       4          .886
  Articulation                                            4          .789
Tempo/Dynamics/Timbre/Interpretation                     16          .927
  Tempo                                                   4          .838
  Dynamics                                                4          .909
  Timbre                                                  4          .887
  Interpretation                                          4          .891

Note. Reliabilities were calculated with N = 232.

According to the proposed model, the components of technique include tone, intonation, rhythmic accuracy, and articulation. The subscale reliability for this group of component factors was estimated at .937.
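Cronbach's α, the reliability coefficient used throughout Table 2, can be sketched directly from its definition. The small response matrix below is invented for illustration only and is not the study's data:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score columns of equal length:

        alpha = (k / (k - 1)) * (1 - sum(item variances) / variance(totals))

    Sample variances (denominator n - 1) are used throughout."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]  # per-judge subscale totals
    return (k / (k - 1)) * (1 - sum(var(col) for col in items) / var(totals))

# Hypothetical responses: 4 items (rows) x 6 judges, already on the 1-4 metric.
item_scores = [
    [4, 3, 4, 2, 3, 4],
    [4, 3, 3, 2, 3, 4],
    [3, 3, 4, 1, 2, 4],
    [4, 2, 4, 2, 3, 3],
]
alpha = cronbach_alpha(item_scores)  # ~ .91 for this invented data
```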
The model also describes the components of musical expression as tempo, dynamics, timbre, and interpretation. The subscale reliability for this grouping of component factors is .927. The hypothesized component structures of technique and musical expression (illustrated in Figures 1 & 2) are supported through strong correlations between the component factors and the observed variables of technique and musical expression (see Tables 3 & 4). A correlation between variables must exist if variables are to be included in the same model (Keith, 2006).

Table 3
Correlations Between Technique and Component Factors

Subscale              Tone      Intonation   Rhythmic Accuracy   Articulation
Evaluations (N = 232)
Tone                  1         .667***      .542***             .743***
Intonation            .667***   1            .592***             .680***
Rhythmic Accuracy     .542***   .592***      1                   .649***
Articulation          .743***   .680***      .649***             1
Technique             .795***   .693***      .655***             .827***

*** p < .001

Table 4
Correlations Between Musical Expression and Component Factors

Subscale              Tempo     Dynamics     Timbre              Interpretation
Evaluations (N = 232)
Tempo                 1         .303***      .466***             .529***
Dynamics              .303***   1            .585***             .658***
Timbre                .466***   .585***      1                   .727***
Interpretation        .529***   .658***      .727***             1
Musical Expression    .454***   .646***      .705***             .853***

*** p < .001

The first research question asks about the representativeness of the component factors of technique and musical expression. The proposed model in this study hypothesizes that performer-controlled factors of technique are represented by tone, intonation, rhythmic accuracy, and articulation. The performer-controlled factors of musical expression are hypothesized to consist of tempo, dynamics, timbre, and interpretation. To address this research question, two separate regression analyses were employed. A regression analysis allows for the calculation of direct effects for each of the component factors of technique and musical expression.
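In the standardized metric, such direct effects (beta weights) can be obtained from the zero-order correlations alone. The study's regressions used four predictors each (via SPSS), so the two-predictor sketch below is only a generic illustration of the computation, not the study's analysis:

```python
def betas_two_predictors(r_y1, r_y2, r_12):
    """Standardized regression weights for y on two predictors x1, x2,
    from zero-order correlations (standard OLS result):

        beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2), and symmetrically."""
    denom = 1 - r_12 ** 2
    b1 = (r_y1 - r_y2 * r_12) / denom
    b2 = (r_y2 - r_y1 * r_12) / denom
    return b1, b2

# When the predictors are uncorrelated, each beta equals its zero-order r.
b1, b2 = betas_two_predictors(0.6, 0.4, 0.0)   # -> (0.6, 0.4)
# Correlated predictors redistribute the weights.
b1c, b2c = betas_two_predictors(0.6, 0.4, 0.5)
```

This is why a beta weight is read as the predicted SD change in the criterion per one-SD change in a predictor, holding the other predictors constant.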
The standardized coefficients from the sub-routine provide the beta weights necessary to estimate the influence of each variable. A supplementary confirmatory factor analysis, analyzing the representativeness of items chosen to represent each of the component factors, is located in Appendix F.

The results of the regression of technique on tone, intonation, articulation, and rhythmic accuracy indicate that these component factors account for 76% of the variance in technique as measured by the AMPQ instrument (R² = .760, F(4, 227) = 179.28, p < .001) (see Table 5). Tone, rhythmic accuracy, and articulation predict significant (p < .01) moderate to large increases in appraised quality of technique in musical performance. Intonation, however, predicts a small but meaningful, non-significant increase in appraised technical quality (β = 0.09, b = 0.11, p = .075) (see Figure 4).

Table 5
Summary of Simultaneous Regression for Variables Predicting Technique

Variable            B      SE B    β         CI
Tone                .381   .059    .338***   [.265, .497]
Intonation          .114   .064    .087      [-.012, .240]
Rhythmic Accuracy   .194   .057    .153**    [.081, .307]
Articulation        .513   .072    .406***   [.371, .655]

Note. N = 232 for this regression. CI = Confidence Interval. **p < .01. ***p < .001.

Figure 4. Model of Performer-Controlled Components of Technique

Musical expression was regressed on tempo, dynamics, timbre, and interpretation (see Table 6). The variables tempo, dynamics, timbre, and interpretation combined to account for 75.7% of the variance in musical expression as measured by the AMPQ instrument (R² = .757, F(4, 227) = 176.84, p < .001). The results illustrated in Table 6 indicate that the representativeness of these component factors is mixed. Dynamics, timbre, and interpretation demonstrate significant (p < .01) moderate to large effects on musical expression.
In contrast, tempo (β = 0.01, b = 0.01, p = .771) predicts a negligible (0.01 SD) non-significant increase in appraised quality of musical expression for every one SD increase in appraised quality of tempo, after controlling for the effects of dynamics, timbre, and interpretation (see Figure 5).

Table 6
Summary of Simultaneous Regression Analysis for Variables Predicting Musical Expression

Variable         B      SE B    β         CI
Tempo            .012   .043    .011      [-.072, .097]
Dynamics         .138   .042    .141**    [.056, .220]
Timbre           .135   .047    .141**    [.043, .226]
Interpretation   .640   .053    .660***   [.537, .744]

Note. N = 232 for this regression. CI = Confidence Interval. **p < .01. ***p < .001.

Figure 5. Model of Performer-Controlled Components of Musical Expression

A path analysis was employed to address the second research question regarding the relative contributions of technique and musical expression to overall perception of performance quality. Path analysis allows for the estimation of standardized path coefficients, also known as beta weights. The path analysis can be calculated by utilizing the correlations, standard deviations, means, and number of cases to produce a covariance matrix (see Table 7).

Table 7
Means, Standard Deviations, and Pearson Correlations for Combined Instrument Path Model of Performer-Controlled Musical Factors

                                              Pearson r
Variable                 M        SD      1        2        3
Evaluations (N = 232)
1. Technique             10.987   3.201   1        .715**   .844**
2. Musical Expression    11.560   2.638   .715**   1        .786**
3. Overall Perception    11.448   3.155   .844**   .786**   1

**Correlation is significant for p < .01 (2-tailed)

Table 8 illustrates the results of the path analysis of the performer-controlled musical factors across evaluations of brass, woodwind, voice, and string instruments.
Standardized path coefficients estimating direct effects for the model of performer-controlled musical factors are interpreted as follows: technique → overall perception (β = 0.58) predicts a large increase in appraised quality of overall perception of performance quality for every one SD increase in appraised quality of technique, after controlling for the effects of musical expression; technique → musical expression (β = 0.72) predicts a large increase in appraised quality of musical expression for every one SD increase in appraised quality of technique; musical expression → overall perception (β = 0.38) predicts a large increase in appraised quality of overall perception of performance quality for every one SD increase in appraised quality of musical expression, after controlling for the effects of technique. A complete report of the path analysis output is located in Appendix G.

Table 8
Path Estimates for Model of Performer-Controlled Musical Factors across Brass, String, Voice, and Woodwind Instruments

Estimated Path                              B      SE B    β
Technique → Musical Expression              .589   .038    .715***
Technique → Overall Perceptions             .569   .043    .577***
Musical Expression → Overall Perceptions    .447   .053    .373***

Note. N = 232. ***p < .001.

In the case of simple recursive models such as the one presented in this study, the paths estimated using structural equation modeling programs are equal to the coefficients estimated in a series of simultaneous and sequential regressions (Keith, 2006). This series of regressions provides valuable estimations regarding the amount of overall variance accounted for by technique and musical expression while estimating the same paths in the proposed model.
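Because the model is recursive and just-identified, the indirect and total effects, the implied correlations, and the variance accounted for can all be recovered from the three standardized paths in Table 8 by path tracing. A plain-Python sketch of that arithmetic (small discrepancies from the reported values reflect rounding in the inputs):

```python
# Standardized paths from the combined model (Table 8).
b_te = 0.715  # technique -> musical expression
b_to = 0.577  # technique -> overall perception (direct)
b_eo = 0.373  # musical expression -> overall perception

# Path tracing: the indirect effect of technique runs through expression.
indirect = b_te * b_eo      # ~ .27
total = b_to + indirect     # reproduces r(technique, overall) = .844 (Table 7)

# Implied correlations for the remaining pairs.
r_te = b_te                 # technique with expression
r_eo = b_eo + b_te * b_to   # ~ .786, cf. Table 7

# Variance in overall perception explained with and without expression.
r2_full = b_to * total + b_eo * r_eo    # ~ .78
r2_technique_only = total ** 2          # ~ .71
delta_r2 = r2_full - r2_technique_only  # ~ .068, the increment due to expression
```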
Two separate regressions were employed to estimate the model: (1) a sequential regression of overall perception of performance quality on the predictor variables of technique and musical expression, and (2) a simultaneous regression of musical expression on technique. The hypothesized model of performer-controlled musical factors determined the entrance order of the variables in the sequential regression. Results of the sequential regression indicate that the variables technique and musical expression combined to account for 77.9% of the variance in overall perceptions of performance quality (R² = .779, F(2, 229) = 407.55, p < .001). Musical expression accounted for an additional 6.9% of variance in overall perception, after controlling for effects of technique (ΔR² = .069, F(1, 229) = 71.84, p < .001) (see Table 9).

Table 9
Summary of Sequential Regression Analysis for Variables Predicting Overall Perception of Performance Quality

Variable               B      SE B    β         CI
Step 1
  Technique            .832   .035    .844***   [.763, .900]
Step 2
  Technique            .568   .044    .576***   [.482, .653]
  Musical Expression   .448   .053    .375***   [.344, .553]

Note. N = 232. *** p < .001

Results of the simultaneous regression indicate that technique is estimated to account for 51.1% of the variance in musical expression (R² = .511, F(1, 230) = 239.96, p < .001) (see Table 10).

Table 10
Summary of Simultaneous Regression of Musical Expression on Technique

Variable    B      SE B    β         CI
Technique   .589   .038    .715***   [.514, .664]

Note. N = 232 for this regression. CI = Confidence Interval. *** p < .001

The third research question inquires about the fit of the hypothesized model. This is answered through indices of model fit, which compare the observed covariance matrix to the expected covariance matrix. Model fit indices essentially provide information regarding how well the collected data fit the proposed model. Absolute indices of model fit report a non-significant chi-square estimate (χ² = 0.00, df = 0, N = 232).
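The df = 0 in this chi-square report follows from a simple count: with v observed variables the covariance matrix supplies v(v + 1)/2 unique elements, and the three-variable model here estimates three paths, one exogenous variance, and two disturbance variances. A minimal sketch (the parameter count is the conventional one for this recursive model, not taken from the AMOS output):

```python
def model_df(n_observed, n_free_params):
    """Degrees of freedom for a covariance-structure model:
    unique variances/covariances minus freely estimated parameters."""
    moments = n_observed * (n_observed + 1) // 2
    return moments - n_free_params

# Three observed variables: technique, musical expression, overall perception.
# Free parameters: 3 paths + 1 exogenous variance + 2 disturbance variances = 6.
df = model_df(3, 6)  # -> 0, i.e. just-identified, so chi-square is exactly 0
```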
This score can be interpreted as an indication of good fit, and is a result of the just-identified status of the proposed model. A just-identified model is one in which the number of parameters estimated equals the number of unique elements in the covariance matrix; in other words, there is exactly enough information to solve for the paths of the proposed model (Keith, 2006).

The second half of this research question asks: can a model of musical performance assessment be created and tested using performer-controlled musical factors for the outcome of evaluating aurally perceived musical performance quality? The answer to this question was contingent upon the ability to estimate the hypothesized model of performer-controlled musical factors using available statistical techniques and the data collected. The methods used to construct the AMPQ measure and the results of the reliability analyses indicate that musical performance can be measured with acceptable reliability and construct validity. Results of the path analysis demonstrate the ability to estimate the hypothesized paths of the proposed model (Figure 6). Technique demonstrated large direct (β = .58) and indirect (β = .27) effects on overall perceptions of performance quality, as well as a significant direct effect (β = .72) on musical expression. Musical expression also demonstrated a significant direct effect (β = .37) on overall perception of performance quality (see Appendix D). The theory, time precedence, relevant research, and logic suggest it is indeed possible to create and test a hypothesized model of performer-controlled musical factors.

Figure 6. Performer-Controlled Musical Performance Factors: Standardized Estimate Model

The fourth research question inquires about the existence and stability of the proposed model among the separate musical instrument categories of solo brass, woodwind, string, and voice performance.
By sorting the data by instrument category, the necessary matrices of case counts, standard deviations, and correlations between technique, musical expression, and overall perception of performance quality were calculated (see Tables 13, 15, 17, & 19). Individual path estimates of the performer-controlled musical factors categorized by instrument indicate that the performer-controlled musical factor model does remain stable across assessments of string, voice, and woodwind performance quality. However, the brass model demonstrates a moderate but non-significant path estimate of technique on overall perceptions of performance quality (see Table 11).

Table 11
Standardized Path Coefficient Comparisons between Combined and Individual Instrument Path Models

                                                        Model
Estimated Path                              Combined    Woodwind   Voice      String     Brass
                                            (N = 232)   (n = 58)   (n = 58)   (n = 58)   (n = 58)
Technique → Musical Expression              .715***     .459***    .541***    .452***    .279*
Technique → Overall Perceptions             .577***     .332**     .561***    .610***    .182
Musical Expression → Overall Perceptions    .373***     .519***    .413***    .271**     .481**

*p < .05. **p < .01. ***p < .001.

Since each participant evaluated all four musical examples, the number of evaluations for each instrument category was equal to the number of total participants (N = 58). Correlations, standard deviations, and means (see Table 13) were used to calculate the standardized path coefficients for the woodwind model of performer-controlled musical performance factors.
The standardized coefficients are interpreted as follows: technique → overall perception (β = 0.33) predicts a large increase in appraised quality of overall perception of performance quality for every one SD increase in appraised quality of technique; technique → musical expression (β = 0.46) predicts a large increase in appraised quality of musical expression for every one SD increase in appraised quality of technique; musical expression → overall perception (β = 0.52) predicts a large increase in appraised quality of overall perception of performance quality for every one SD increase in appraised quality of musical expression (see Table 12). Appendix H contains all of the path analysis output related to the woodwind model.

Table 12
Estimated Path Coefficients for the Woodwind Model of Performer-Controlled Musical Factors

Estimated Path                              B      SE B    β
Technique → Musical Expression              .464   .119    .459***
Technique → Overall Perceptions             .311   .095    .332**
Musical Expression → Overall Perceptions    .480   .094    .519***

Note. n = 58 for this path analysis. **p < .01. ***p < .001.

Table 13
Means, Standard Deviations, and Pearson Correlations for Woodwind Path Model of Performer-Controlled Musical Factors

                                              Pearson r
Variable                  M        SD      1         2         3
Evaluations (n = 58)
1. Technique              11.672   2.114   1         .459***   .570***
2. Musical Expression     11.483   2.138   .459***   1         .671***
3. Overall Perceptions    12.362   1.980   .570***   .671***   1

*** p < .001 (2-tailed)

The standardized path coefficients estimated for the voice model of performer-controlled musical performance factors are reported and interpreted as follows: technique → overall perception (β = 0.56) predicts a large increase in appraised quality of overall perception of performance quality for every one SD increase in appraised quality of technique; technique → musical expression (β = 0.54) predicts a large increase in appraised quality of musical expression for every one SD increase in appraised quality of technique; musical expression → overall perception (β = 0.41) predicts a large increase in appraised quality of overall perception of performance quality for every one SD increase in appraised quality of musical expression (see Table 14). Correlations, standard deviations, and means necessary for calculation of the standardized path coefficients are provided in Table 15. Appendix I contains all path analysis output for the voice model.

Table 14
Estimated Path Coefficients for the Voice Model of Performer-Controlled Musical Factors

Estimated Path                              B      SE B    β
Technique → Musical Expression              .548   .113    .541***
Technique → Overall Perceptions             .521   .075    .561***
Musical Expression → Overall Perceptions    .378   .074    .413***

Note. n = 58 for this path analysis. ***p < .001.

Table 15
Means, Standard Deviations, and Pearson Correlations for Voice Model of Performer-Controlled Musical Factors

                                              Pearson r
Variable                  M        SD      1         2         3
Evaluations (n = 58)
1. Technique              11.707   1.910   1         .541***   .784***
2. Musical Expression     12.070   1.936   .541***   1         .716***
3. Overall Perceptions    12.207   1.775   .784***   .716***   1

*** p < .001 (2-tailed)

Standardized path coefficients, calculated from the necessary descriptive statistics and correlations, were estimated for the string model of performer-controlled musical performance factors (see Tables 16 & 17).
These path coefficients are interpreted as follows: technique → overall perception (β = 0.61) predicts a large increase in appraised quality of overall perception of performance quality for every one SD increase in appraised quality of technique; technique → musical expression (β = 0.45) predicts a large increase in appraised quality of musical expression for every one SD increase in appraised quality of technique; musical expression → overall perception (β = 0.27) predicts a large increase in appraised quality of overall perception of performance quality for every one SD increase in appraised quality of musical expression (see Table 16). Appendix J contains all path analysis output for the string model.

Table 16
Estimated Path Coefficients for the String Model of Performer-Controlled Musical Factors

Estimated Path                              B      SE B    β
Technique → Musical Expression              .451   .118    .452***
Technique → Overall Perceptions             .637   .099    .610***
Musical Expression → Overall Perceptions    .284   .099    .271**

Note. n = 58 for this path analysis. **p < .01. ***p < .001.

Table 17
Means, Standard Deviations, and Pearson Correlations for String Model of Performer-Controlled Musical Factors

                                              Pearson r
Variable                  M        SD      1         2         3
Evaluations (n = 58)
1. Technique              13.741   1.888   1         .452***   .733***
2. Musical Expression     13.776   1.883   .452***   1         .547***
3. Overall Perceptions    13.897   1.971   .733***   .547***   1

*** p < .001 (1-tailed)

After collecting the means, standard deviations, and sample size statistics for the brass model (see Table 19), beta weights were estimated.
The results of the path analysis are interpreted as follows: technique → overall perception (β = 0.18, p = .112) predicts a small but meaningful, non-significant increase in appraised quality of overall perception of performance quality for every one SD increase in appraised quality of technique; technique → musical expression (β = 0.28) predicts a large increase in appraised quality of musical expression for every one SD increase in appraised quality of technique; musical expression → overall perception (β = 0.48, b = 0.52, p < .001) predicts a large increase in appraised quality of overall perception of performance quality for every one SD increase in appraised quality of musical expression (see Table 18). Appendix K contains all path analysis output for the brass model.

Table 18
Estimated Path Coefficients for the Brass Model of Performer-Controlled Musical Factors

Estimated Path                              B      SE B    β
Technique → Musical Expression              .295   .135    .279*
Technique → Overall Perceptions             .207   .130    .182
Musical Expression → Overall Perceptions    .519   .123    .481***

Note. n = 58 for this path analysis. *p < .05. ***p < .001.

Table 19
Means, Standard Deviations, and Pearson Correlations for Brass Model of Performer-Controlled Musical Factors

                                             Pearson r
Variable                  M       SD      1         2        3
Evaluations (n = 58)
1. Technique              6.828   1.875   1         .279**   .316***
2. Musical Expression     8.914   1.985   .279**    1        .532*
3. Overall Perception     7.328   2.139   .316***   .532*    1

*p < .05 (1-tailed). **p < .01 (1-tailed). ***p < .001 (1-tailed).

Discussion

This study investigated the theoretical influence of aurally perceived performer-controlled musical factors on assessments of musical performance quality. The analysis of hypothesized higher-order factors indicates that the proposed model successfully illustrates the positive relationship between technique, musical expression, and assessments of performance quality.
These results coincide with previous research on performance constructs and assessment that suggests the importance of both technical and expressive aspects in the evaluation of musical performance (Hevner, 1938; Abeles, 1971; Suchor, 1977; Levi, 1978; Jones, 1986; Bergee, 1987, 1995, 2003; Zdzinski, 1993; Thompson, Diamond, & Balkwill, 1998; Zdzinski & Barnes, 2002; Wrigley, 2005; Russell, 2007).

This study also examines the composition of both technique and musical expression. Results from this analysis of technique and musical expression components reveal a number of first-order factors that influence assessments of technique and musical expression quality. These findings correspond with previous performance assessment research studies that support the ability to evaluate musical performance and represent higher-order factors using related component factors (Hevner, 1938; Abeles, 1971; Levi, 1978; Jones, 1986; Mills, 1987; Bergee, 1987, 1995, 2003; Zdzinski, 1993; Juslin, 1997b; Zdzinski & Barnes, 2002; Miksza, 2007; Russell, 2007).

The results of the regressions of technique and musical expression on their respective component factors unveil the representative nature of the hypothesized first-order component factors of technique and musical expression. The variables tone, intonation, rhythmic accuracy, and articulation combined to account for 76% of the variance in technique as measured by the AMPQ instrument. Reliability scores for this group of factors and subscales indicate that these components were measured with a small amount of error (α = .937). The initial inclusion of the component factors of both technique and musical expression was supported by the occurrence of strong correlations with each of the component factors (Keith, 2006). Analysis of the observed components of technique is mostly consistent with the hypothesized model of the performer-controlled components of technique.
Tone, rhythmic accuracy, and articulation indicated significant effects on technique. However, intonation indicated a non-significant effect. The lack of significance demonstrated by the intonation variable (p = .075) could be attributed to a number of causes. One simple reason is the use of a modest number of cases (N = 232), since significance estimates are linked to sample size. Another is the possibility that intonation might not be part of the performer-controlled technique factors. Research by Bergee (1987, 1993, 1995) and Miksza (2007) provides support for this interpretation. However, because of the occurrence of a moderate effect size, the significance for intonation in this model most likely depends on a larger sample size (Kline, 2005). This model of technique indicates that increases in perceived quality of tone, intonation, rhythmic accuracy, and articulation can lead to predictable increases in perceived quality of technique.

The component factors of musical expression demonstrated mixed results. The variables tempo, dynamics, timbre, and interpretation combined to account for 75.7% of the variance in musical expression as measured by the AMPQ instrument. The alpha reliability for this sub-group of factors is reported at .927. Even though not all of the component factors emerged as significant components of musical expression, dynamics, timbre, and interpretation all demonstrated positive beta weights, indicating that increases in perceived quality of these factors will lead to predictable increases in the perceived quality of musical expression in performance. Tempo, however, did not exhibit this statistically significant relationship (p = .771). The negligible path coefficient and lack of significance (p > .05) of the tempo variable could be attributed to several factors. The modest sample size could have influenced the lack of significance.
However, the lack of effect size indicated by the standardized path coefficient suggests that tempo, however correlated with musical expression, may not belong in the component model of musical expression. This outcome is consistent with Johnson and Geringer (2005), who found that tempo did not demonstrate predictability. However, these results contradict those of Geringer and Johnson (2007), which demonstrated tempo as an influential factor in perceptions of musical quality. The exclusion of the tempo factor is an interesting finding. The correlation (r = .454) indicates that a positive relation between tempo and musical expression exists. However, previous research on the relation of tempo to musical performance may hold some clues to this non-significant effect. Many researchers have utilized a combination of rhythm and tempo as indicators of performance quality (Bergee, 1987; Zdzinski & Barnes, 2002; Johnson & Geringer, 2007; Russell, 2007). One possible reason for the positive correlation between musical expression and tempo alongside a non-significant effect might be that tempo is actually related to the components of technique. For example, tempo could be a component factor of rhythmic accuracy. The phasing out of the tempo factor could also be a result of differences in musical style. Non-jazz music is generally performed at tempos decided in advance by the composer, whereas jazz performance tempos are often dictated by the performer. This stylistic difference could have played a part in the current study, since the randomly selected performances were all performed in a non-jazz style. This implies that tempo could belong to a model other than that of performer-controlled musical factors.
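The component regressions discussed above estimate standardized beta weights, and whether a given weight reaches significance depends jointly on its size and on the number of cases. A minimal numpy sketch with simulated ratings (the weights, noise level, and data below are invented for illustration; they are not the AMPQ estimates):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 232  # same number of cases as the study

# Hypothetical component ratings (tone, intonation, rhythm, articulation)
X = rng.normal(size=(n, 4))
# Technique score built from the components plus noise (illustrative weights)
y = (0.4 * X[:, 0] + 0.1 * X[:, 1] + 0.3 * X[:, 2] + 0.4 * X[:, 3]
     + rng.normal(scale=0.5, size=n))

def standardized_betas(X, y):
    """Regress z-scored y on z-scored predictors; slopes are beta weights."""
    Xz = (X - X.mean(0)) / X.std(0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    design = np.column_stack([np.ones(len(Xz)), Xz])
    coef, *_ = np.linalg.lstsq(design, yz, rcond=None)
    return coef[1:]  # drop the intercept

betas = standardized_betas(X, y)
print(np.round(betas, 2))
```

Because the standard error of each beta shrinks roughly with the square root of n, a moderate but non-significant effect such as intonation's could plausibly reach significance in a larger replication, as the text argues.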
Results from the analysis of both technique and musical expression component models support conclusions made by previous researchers indicating the existence of higher-order relationships between component factors of musical performance (Bergee, 1995; Thompson, Diamond, & Balkwill, 1998; Wrigley, 2005; Russell, 2007). The results from the path analysis of performer-controlled musical factors and their influence on assessments of overall performance quality indicate the ability to predict increases in perceptions of overall quality both directly and indirectly through technique and directly through musical expression. This model also illustrates the ability to positively affect perceptions of musical expression through an increase in perceived quality of technique. This influence of technical and expressive musical performance factors is supported in previous research on performance constructs and musical achievement (Suchor, 1977; Levi, 1978; Zdzinski, 1993; Thompson & Williamon, 2003; Johnson & Geringer, 2007; Miksza, 2007). The results of this path analysis are consistent with the predicted model of performer-controlled musical factors. The absolute index of good fit indicated that the collected data demonstrated good fit with the proposed model of performer-controlled factors (χ² = 0.00, df = 0, N = 232). This fit index could be interpreted as a perfect fit between the observed and expected covariance matrices. However, caution should be taken with such an interpretation when dealing with just-identified models. Since most, if not all, just-identified models obtain the same indication of good fit, care must be taken in the development of the theory to which the data is applied. Theory, supported by relevant research, time precedence, and logic, must be developed before assumptions about relationships can be tested in a causal framework.
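Why a just-identified path model always yields χ² = 0 follows from the degrees-of-freedom arithmetic: model df is the number of unique variances and covariances among the observed variables minus the number of freely estimated parameters. A short sketch (the tally of six parameters for the three-variable model is my own illustrative count, assuming one exogenous variance, three paths, and two residual variances):

```python
def path_model_df(n_observed: int, n_free_params: int) -> int:
    """df for a path model: unique elements of the observed covariance
    matrix minus the number of freely estimated parameters."""
    n_moments = n_observed * (n_observed + 1) // 2
    return n_moments - n_free_params

# Technique, musical expression, and overall quality give 3*4/2 = 6
# unique moments; a model estimating 6 parameters is just-identified,
# so df = 0 and the chi-square fit statistic is 0 by construction.
print(path_model_df(3, 6))
```

With zero degrees of freedom the model can reproduce the covariance matrix exactly, which is why the "perfect fit" carries no evidential weight and the theory itself must justify the specification.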
Research that tests a hypothesis without theory is purely exploratory in nature and cannot support causal assumptions (Kline, 2005; Keith, 2006). The theoretical structure proposed and tested in the current study is based on previous research concerning performance constructs, musical achievement, adjudicators and the adjudication process, musical expression, and performance measure development, in addition to the necessary facets of theoretical development that include time precedence and logic. A comparison of the proposed model among the instrumental categories included in this study indicated a general stability of the proposed structure. Solo musical performances of woodwind, voice, and string instruments demonstrated that the proposed influence of technique and musical expression on overall perception of performance quality remains significant (p < .01). These results are consistent with Wrigley (2005), who found a great deal of overlap between brass, string, woodwind, and piano instruments. Differences in significant beta weights for the individual models could be attributed to several factors. One possible reason for the observed differences between the individual instrument models and the combined instrument model is the difference in sample size. The sample size for the combined model (N = 232) was four times larger than the sample size for each of the individual instrumental models (n = 58). A larger sample size would be helpful in obtaining a truer population estimate. Another possible reason, supported in prior research by Abeles (1971), Jones (1986), Bergee (1987, 2003), Zdzinski and Barnes (2002), Wrigley (2005), and Russell (2007), is the notion of instrument differences. This suggests that differences in model structure are perhaps due to physical and technical differences between instruments. However, this cannot be confirmed or denied until a replication of the current study is conducted with larger sample sizes.
When comparing the structures of each solo instrument model, as listed in Table 11, there are some interesting similarities between the general instrument categories. Both wind instrument categories, woodwind and brass, demonstrated the same pattern of beta weights. Each of the wind instrument categories indicated that the direct effect of musical expression on overall perceptions of performance quality was greater than the direct effects of technique on overall perception and technique on musical expression. The non-wind instrument families indicated that the path between technique and overall perceptions had a greater effect than the direct effects of both musical expression on overall perceptions and technique on musical expression. The individual solo brass instrument model requires a separate discussion. This model did not demonstrate stability regarding the effects of technique and musical expression on perception of overall performance quality. This could be due to several issues not present in the woodwind, voice, or string models. One issue is the occurrence of low mean scores for technique, musical expression, and overall performance quality as compared to the entire sample (see Tables 7 & 19). Another issue is the occurrence of relatively low correlations (r = .28 - .53) in comparison to the entire sample (r = .72 - .84). Since path estimates are calculated from these descriptive statistics, it is reasonable to assume that low means and correlations would have a detrimental effect on a path analysis. The low mean score also suggests that the quality of the randomly chosen solo brass performance was scored lower than the rest of the solo instrument samples. It is possible that this particular performance could have skewed the results of the path estimates.
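Because path estimates derive directly from these correlations, the effect of technique on overall quality decomposes into a direct path plus an indirect path mediated through musical expression. A minimal sketch with illustrative standardized estimates (the values of a and b are invented; only the 0.18 direct path echoes the brass estimate reported in the study):

```python
# Illustrative standardized path estimates (hypothetical values,
# except the 0.18 direct path reported for the brass model)
a = 0.55  # technique -> musical expression
b = 0.60  # musical expression -> overall perceived quality
c = 0.18  # technique -> overall perceived quality (direct path)

indirect = a * b      # effect of technique routed through expression
total = c + indirect  # total effect of technique on overall quality
print(indirect, total)
```

A small or non-significant direct path c can therefore coexist with a substantial total effect when the mediated product a*b is large, which is the pattern the brass model displays.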
A replication of the current study with a wider range of performance samples could provide an answer to the question of the stability of the proposed model of performer-controlled musical factors within the brass instrument family. The significance outcome of the brass model resists a straightforward explanation. According to the model, brass instrument technique has no significant direct effect on perceptions of performance quality. Instead, the results suggest that technique has only an indirect effect on quality assessments through musical expression. The main issue with the significance of this particular model is suspected to be the small sample size. The moderate effect of technique on overall perception hints that this effect could become clearer with a larger sample. The estimates of these hypothesized models indicate that it is indeed possible to test theoretical models of musical performance using available statistical methods. The proposed theory of performer-controlled components of perceived performance quality demonstrates this ability. Technique demonstrates a significant effect on musical expression, and both technique and musical expression demonstrate significant effects on overall perceptions of performance quality. The stability of this model among individual instrument categories provides further support for the validity of the proposed model of aurally perceived performer-controlled musical factors.

CHAPTER 5

Summary and Conclusions

Summary

Previous research on music performance adjudication and the evaluation process yields consistencies in the conceptual framework of musical performance factors (Abeles, 1971; Sagen, 1983; Burnsed, Hinkle, & King, 1985; Mills, 1987; Bergee, 1987, 1995; Saunders & Holahan, 1997; Thompson, Diamond, & Balkwill, 1998; Zdzinski & Barnes, 2002; Wrigley, 2005; Johnson & Geringer, 2007; Russell, 2007).
The consistencies revealed in the research studies point to the importance of both technique and musical expression in the evaluation of performance quality (Levi, 1978; Juslin, 1997b; Zdzinski, 1993; Gabrielsson, 1999; Juslin & Lindstrom, 2003; Juslin & Laukka, 2004). The purpose of this study was to test a hypothesized model of the aurally perceived performer-controlled musical factors that influence assessments of musical performance quality. This model consists of three main variables: technique, musical expression, and overall perception of musical performance quality. The model asserts that technique has a direct effect on overall perceptions of performance quality and an indirect effect through musical expression. In addition, this model also examines the hypothesized individual component factors of both technique and musical expression. Variables from research on musical performance constructs, musical achievement, performance adjudication, musical expression, and performance measurement were gathered and categorized into aural, visual, performer-controlled, composer, ensemble, and non-musical influences. Since the purpose of this study was to examine the aurally perceived performer-controlled musical factors, any variable not categorized as aural and performer-controlled was excluded. The remaining factors were examined for redundancy and appropriateness. The factors selected for the current study were tone, intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, and interpretation. These selected variables were further categorized into either technique or musical expression (Levi, 1978; Juslin & Lindstrom, 2003; Juslin & Laukka, 2004; Wrigley, 2005). To collect information on the performance factors of interest, the Aural Musical Performance Quality (AMPQ) measure was constructed.
The AMPQ measure was developed using items from previous performance measurement studies that demonstrated strong factor loadings and high reliabilities, as well as researcher-created items designed after those demonstrating high factor loadings (Abeles, 1971; Bergee, 1987, 1995; Zdzinski, 1993; Saunders & Holahan, 1997; Zdzinski & Barnes, 2002; Russell, 2007). The total reliability analysis reported a Cronbach's alpha reliability score of .977. Individual subscale reliabilities ranged from .789 to .957, with most alpha reliabilities above .80. The AMPQ measure was used to evaluate solo performance recordings. The four performance recordings used for the present study were randomly selected from a total of 50 volunteer solo performances ranging from beginner to professional and represented brass, string, voice, and woodwind instruments. Volunteer adjudicators (N = 58) from Florida, Colorado, Oklahoma, and Virginia were recruited to evaluate the four performances using the AMPQ measure. Each adjudicator evaluated all four recordings. A total of 232 evaluations using the AMPQ measure were collected. Results from the performance data revealed high positive correlations between technique, musical expression, and overall perceptions of performance quality (r = .72 to .84). These correlations lend support to the inclusion of these variables in the same model (Keith, 2006). Positive correlations were also found between the respective component factors of technique (r = .66 to .83) and musical expression (r = .45 to .85). A path analysis examining the effects of technique and musical expression on overall perceptions of performance quality yielded standardized path coefficients ranging from 0.38 to 0.72. The regression to determine the influences of component factors on technique produced standardized coefficients ranging from 0.09 (intonation) to 0.41 (articulation).
A regression of musical expression on hypothesized component factors reported significant standardized coefficients ranging from 0.14 to 0.66. The only variables to yield non-significant beta weights were tempo (β = 0.01, p > .05) and intonation (β = 0.09, p > .05). Individual analyses of the performer-controlled factors categorized by instrument type revealed the stability of the structure of technique, musical expression, and overall perceptions of performance quality among brass, string, woodwind, and voice instruments. The brass model indicated a significant (p < .05) indirect path between technique and overall perception that is mediated through musical expression, and a moderate but non-significant path estimate between technique and overall perceptions of performance quality (β = 0.18, p > .05). This particular result is attributed to the small sample size for individual instruments.

Conclusions

Music performance is indeed a complex process. Quality assessments of musical performances continue to be an important part of the process that educates musicians. Whether these assessments are made formally by adjudicators or privately by the musicians themselves, informed evaluations of performance quality can lead to marked improvement in overall performance quality. The identification of a structure of factors that influence assessments of performance quality could provide musicians and educators with the knowledge to make informed assessments of musical performance quality. The identification of this successful model of performer-controlled performance factors satisfies the proposed theory based on logic, time precedence, and related relevant literature. The structure of the model proposed in the present study illustrates the importance of technique and musical expression in perceptions of overall performance quality.
Results from the analysis of performance evaluations suggest that deficiencies in technique will influence not just assessments of technique, but musical expression and overall perception of performance quality as well. Furthermore, the stability of the proposed paradigm among string, voice, and woodwind instruments suggests that the utilization of separate structures for the purpose of performance quality assessment may not be necessary. Component factors of both technique and musical expression are also identified in the model proposed in the current study. The component factors of technique are identified as tone, intonation, rhythmic accuracy, and articulation. This suggests that a concentration on the improvement of these individual performance factors could lead to improvements in overall technique. The identified component factors of musical expression are dynamics, timbre, and interpretation. This suggests that a concentration on incorporating and improving these performance factors could lead to predictable increases in perceptions of the expressive qualities of a musical performance. The factor of tempo was dropped pending further investigation of its influence on overall perceptions of performance quality. A larger benefit concerning the identification of musical factors that influence assessments of performance quality is the information provided to the performer. Musicians aiming to improve the quality of their musical performances can employ this model to diagnose deficiencies in preparation and practice strategies. Audiences will benefit through more enjoyable concert experiences and possibly respond with increased patronage of the arts and arts education. On a more global scale, music education can benefit from the added stability of established theoretical models.
The establishment of theoretical models concerning assessment of music performance could silence those who would argue against the ability to teach music to students and demonstrate predictable educational outcomes. In this age of accountability in education, music education needs empirical support in order to continue to spread the joy of making music to students in public and private school settings.

Suggestions for Further Research

A continued examination of the stability of the model of aurally perceived performer-controlled musical factors is needed. Since musical performance is such a complex process to examine, it is imperative that we continue to test theories that could have great impact on music teaching and learning. A replication of the present study with much larger sample sizes is necessary to continue to examine the stability of the performer-controlled musical factors. In addition, a replication would help to clarify the stability of the component structures of both technique and musical expression. Specifically, an examination of the influence of tempo on overall perception and its possible link to rhythmic accuracy, and an examination of intonation and its possible link to tone, are necessary. In a replication of the present study, a concentration on obtaining large samples of homogeneous instrument groupings would be necessary to obtain a clearer estimation of the stability of the model of performer-controlled musical factors within individual instrument categories. An emphasis should be placed on the investigation of the stability of the brass model. Another research suggestion would be to extend this structure to instrument categories not utilized in the present study. An investigation into the stability of the model of performer-controlled musical factors in piano and pitched percussion could be executed with the same research design employed in the current study.
This continued investigation is necessary to come closer to fully realizing the stability of the current model across instrument categories. A continued examination of the component models of technique and musical expression could also be helpful. One suggestion would be to investigate the reclassification of the tempo variable into technique or into a separate component factor. The identification and classification of individual performer-controlled musical factors would have implications for improved diagnosis in music instruction. The existence of the proposed model of performer-controlled musical factors also alludes to the existence of related musical models. These models may include, but are not limited to, composer cues, visual cues, and ensemble cues. An investigation into the development of these theoretical models could lead to a greater understanding of the process of musical performance assessment.

Implications for Teachers

Teachers can utilize information in the present study to help focus strategies for student music learning. By focusing instruction on technique and musical expression, teachers can help students reach higher levels of achievement in musical performance. Classroom and studio teachers can also use the individual factors of technique and musical expression as an aid in the diagnosis of student performance deficiencies. The model proposed in the current study also provides teachers with theoretical support for the aspects of music already present in their curriculum. Teachers can utilize the model of performer-controlled musical factors to illustrate to students the effects of musical components on overall perceptions of quality. This structure can be used to create a routine approach to problem solving in a music classroom. Music performance is an integral part of the music education process.
Assessment of music performance quality is used to determine professional placements, higher education admission, school ensemble placements, juries, and competitions. The high-stakes nature of these assessments calls for as much objectivity as possible. In order to attain these higher levels of objectivity and accuracy in musical performance assessment, it is imperative that we continue to explore the nature of musical performance. The testing of theoretical models is necessary to explore and continue to understand the world in which we live. Music is an omnipresent aspect of every culture. Yet the structure of music is so complex that we must decompose it in order to understand it. By continuing to test theories concerning the various aspects of music we can come closer to understanding and improving experiences with music.

References

Abdoo, F. B. (1980). A comparison of the effects of gestalt and associationist learning theories on the musical development of elementary school beginning wind and percussion instrumental students (Doctoral dissertation, University of Southern California). Dissertation Abstracts International, 41, 1268A.

Abeles, H. F. (1971). An application of the facet-factorial approach to scale construction in the development of a rating scale for clarinet music performance. Unpublished doctoral dissertation, University of Maryland.

Abril, C. R., & Flowers, P. J. (2007). Attention, preference, and identity in music listening by middle school students of different linguistic backgrounds. Journal of Research in Music Education, 55, 204-219.

Arnold, J. A. (1995). Effects of competency-based methods of instruction and self-observation on ensemble directors' use of sequential patterns. Journal of Research in Music Education, 43, 127-138.

Asmus, E. P. (1980). Empirical testing of an affective learning paradigm. Journal of Research in Music Education, 28(3), 143-154.

Asmus, E. P. (1981).
Higher order factors of a multidimensional instrument for the measurement of musical affect. Paper presented at the Research Symposium on the Psychology and Acoustics of Music, University of Kansas, Lawrence, KS.

Asmus, E. P. (1989). Factor analysis: A look at the technique through the data of Rainbow. Bulletin of the Council for Research in Music Education, 101.

Asmus, E. (1999). Music assessment concepts. Music Educators Journal, 86(2), 19-24.

Asmus, E. P. (2009). The measurement of musical expression. Paper presented at the Suncoast Music Education Symposium, Tampa, FL.

Azzara, C. D. (1993). Audiation-based improvisation techniques and elementary instrumental students' music achievement. Journal of Research in Music Education, 41, 328-342.

Bean, K. L. (1938). An experimental approach to the reading of music. Unpublished doctoral dissertation, University of Michigan. Dissertation Abstracts International, AAT 0135656.

Becker, W. E., Jr. (1983). Research in economic education. Journal of Economic Education, 14(2), 4-10.

Bergee, M. J. (1987). An application of the facet-factorial approach to scale construction in the development of a rating scale for euphonium and tuba music performance (Doctoral dissertation, University of Kansas, 1987). Dissertation Abstracts International, 49(05), 1086A.

Bergee, M. J. (1993). A comparison of faculty, peer, and self-evaluation of applied brass jury performances. Journal of Research in Music Education, 41, 19-27.

Bergee, M. J. (1995). Primary and higher-order factors in a scale assessing concert band performance. Bulletin of the Council for Research in Music Education, 126, 1-14.

Bergee, M. J. (1997). Relationships among faculty, peer, and self-evaluations of applied music performances. Journal of Research in Music Education, 45, 601-612.

Bergee, M. J. (2003). Faculty interjudge reliability of music performance evaluation. Journal of Research in Music Education, 51, 137-150.

Bloom, B. S. (1976).
Human characteristics and school learning. New York: McGraw-Hill.

Boyle, J. D., & Radocy, R. E. (1987). Measurement and evaluation of musical experiences. New York: Schirmer Books.

Brunswick, E. (1952). The conceptual framework of psychology. Chicago: University of Chicago Press.

Burnsed, V., Hinkle, B., & King, S. (1985). Performance evaluation reliability at selected concert festivals. Journal of Band Research, 21(1), 22-29.

Butt, D. S., & Fiske, D. W. (1968). Comparison of strategies in developing scales for dominance. Psychological Bulletin, 70(6), 505-519.

Byo, J. L., & Brooks, R. (1994). A comparison of junior high musicians' and music educators' performance evaluations of instrumental music. Contributions to Music Education, 21, 26-38.

Csikszentmihalyi, M. (1975). Beyond boredom and anxiety. San Francisco: Jossey-Bass.

Csikszentmihalyi, M. (1990). Flow: The psychology of optimal experience. New York: Harper & Row.

Colwell, R. (1963). An investigation of musical achievement among vocal students, vocal-instrumental students, and instrumental students. Journal of Research in Music Education, 11, 123-130.

Cooksey, J. M. (1977). A facet-factorial approach to rating high school choral music performance. Journal of Research in Music Education, 25, 100-114.

DCamp, C. B. (1980). An application of the facet-factorial approach to scale construction in the development of a rating scale for high school band performance (Doctoral dissertation, University of Iowa, 1980). Dissertation Abstracts International, 41, 1462A.

Deaton, W. L., Poggio, J. P., & Glasnapp, D. R. (1977). A scale to assess the affective entry level of students. Paper presented at the meeting of the National Council on Measurement in Education, New York.

Duerksen, G. L. (1972). Some effects of expectation on evaluation of recorded musical performance. Journal of Research in Music Education, 20, 268-272.

Duke, R. L. (1987).
Observation of applied music instruction: The perceptions of trained and untrained observers. In C. K. Madsen & C. A. Prickett (Eds.), Applications of research in music behavior (pp. 115-124). Tuscaloosa: University of Alabama Press.

Edmonston, W. E., Jr. (1966). The use of the semantic differential technique in the esthetic evaluation of musical excerpts. American Journal of Psychology, 79, 650-652.

Edmonston, W. E., Jr. (1969). Familiarity and musical training in esthetic evaluation of music. Journal of Social Psychology, 79, 109-111.

Farnum, S. E. (1950). Prediction of success in instrumental music. Unpublished doctoral dissertation, Harvard University.

Fiske, H. E. (1975). Judge-group differences in the rating of secondary school trumpet performances. Journal of Research in Music Education, 23, 186-196.

Fiske, H. E. (1977). Relationship of selected factors in performance adjudication reliability. Journal of Research in Music Education, 25, 256-263.

Fiske, H. E. (1979). Musical performance evaluation ability: Toward a model of specificity. Bulletin of the Council for Research in Music Education, 59, 27-31.

Folts, M. L. (1973). The relative effect of two procedures, as followed by flute, clarinet, and trumpet students while practicing, on the development of tone quality and on selected performance skills: An experiment in student use of sound-recorded material (Doctoral dissertation, New York University). Dissertation Abstracts International, 34, 1312A.

Gabrielsson, A. (1999). Studying emotional expression in musical performance. Bulletin of the Council for Research in Music Education, 141, 47-53.

Gatewood, E. L. (1927). An experimental study of musical enjoyment. In Schoen (Ed.), The effects of music. Freeport, NY: Books for Library Press.

Geringer, J. M., & Johnson, C. M. (2007). Effects of duration, tempo, and performance level on musicians' ratings of wind band performances. Journal of Research in Music Education, 55, 289-301.

Geringer, J.
M., & Madsen, C. (1998). Musicians' ratings of good versus bad vocal and string performances. Journal of Research in Music Education, 46, 522-534.

Gillespie, R. (1997). Ratings of violin and viola vibrato performance in audio-only and audiovisual presentations. Journal of Research in Music Education, 44, 212-220.

Glasnapp, D. R., Poggio, J. P., & Deaton, W. L. (1976). Causal analysis within a mastery learning paradigm. Paper presented at the meeting of the American Educational Research Association, San Francisco.

Gundlach, R. H. (1935). Factors determining the characterization of musical phrases. American Journal of Psychology, 47, 624-643.

Gutsch, K. U. (1964). One approach toward the development of an individual test for assessing one aspect of instrumental music achievement. Bulletin of the Council for Research in Music Education, 2.

Gutsch, K. U. (1965). Evaluation in instrumental music performance: An individual approach. Bulletin of the Council for Research in Music Education, 4.

Hevner, K. (1935). The affective character of the major and minor modes in music. American Journal of Psychology, 47, 103-118.

Hevner, K. (1936). Experimental studies of the elements of expression in music. American Journal of Psychology, 48, 246-268.

Hevner, K. (1937). The affective value of pitch and tempo in music. American Journal of Psychology, 49, 621-630.

Hevner, K. (1938). Studies in expressiveness of music. Music Teachers National Association Proceedings, 39, 199-217.

Hewitt, M. P. (2002). Self-evaluation tendencies of junior high school instrumentalists. Journal of Research in Music Education, 50, 215-226.

Hewitt, M. P. (2005). Self-evaluation accuracy among high school and middle school instrumentalists. Journal of Research in Music Education, 53, 148-161.

Hewitt, M. P. (2007). Influence of primary performance instrument and education level on music performance evaluation. Journal of Research in Music Education, 55, 18-30.

Hewitt, M. P., & Smith, B. P. (2004).
The influence of teaching career level and primary performance instrument on the assessment of music performance. Journal of Research in Music Education, 52, 314-327.

Hillbrand, E. K. (1923). Hillbrand sight-singing test. New York: World Book.

Hodges, D. A. (1975). The effects of recorded aural models on the performance achievement of beginning band classes. Journal of Band Research, 12(1), 30-34.

Hoffren, J. A. (1964a). A test of musical expression. Bulletin of the Council for Research in Music Education, 2, 32-35.

Hoffren, J. A. (1964b). The construction and validation of a test of expressive phrasing in music. Journal of Research in Music Education, 12, 159-164.

Horacek, L. (1955). The relationship of mood and melodic pattern in folk songs (Doctoral dissertation, University of Kansas, 1957). Dissertation Abstracts, 17, 1567.

Iltis, J. L. (1970). The construction and validation of a test to measure the ability of high school students to evaluate musical performance (Doctoral dissertation, Indiana University, 1970). Dissertation Abstracts International, 31, 3582A.

Jackson, S. A., & Marsh, H. W. (1996). Development and validation of a scale to measure optimal experience: The Flow State Scale. Journal of Sport & Exercise Psychology, 18, 17-35.

Johnson, C. M., & Geringer, J. M. (2007). Predicting music majors' overall ratings of wind band performances: Elements of music. Bulletin of the Council for Research in Music Education, 173, 25-38.

Jones, H. (1986). An application of the facet-factorial approach to scale construction in the development of a rating scale for high school vocal solo performance (Doctoral dissertation, University of Oklahoma, 1986). Dissertation Abstracts International, 47, 1230A.

Juslin, P. N. (1997a). Emotional communication in music performance: A functionalist perspective and some data. Music Perception, 14, 383-418.

Juslin, P. N. (1997b).
Can results from studies of perceived expression in music performances be generalized across response formats? Psychomusicology, 16, 77-101. Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening (Stockholm Music Acoustics Conference, 2003). Journal of New Music Research, 33, 217-238. Juslin, P. N., & Lindstrom, E. (2003). Musical expression of emotions: Modeling composed and performed features. Paper presented at the 5th Triennial ESCOM Conference at Hanover University of Music and Drama, Germany. Keith, T. Z. (2006). Multiple regression and beyond. Boston: Pearson Education. Kim, S. Y. (2000). Group differences in piano performance evaluation by experienced and inexperienced judges. Contributions to Music Education, 27(2), 23-36. Kline, R. B. (2005). Principles and practice of structural equation modeling. New York: Guilford Press. Knuth, W. (1967). Achievement tests in music: Recognition of rhythm and melody: Complete manual of directions (rev. ed., Divisions 1, 2, and 3; Forms A and B). San Francisco: Creative Arts Research Associates. Knuth, W. E. (1933). The construction and validation of music tests designed to measure certain aspects of sight-reading. Unpublished doctoral dissertation, University of California. Dissertation Abstracts International, AAT 0140276. Kopiez, R., & Lee, J. I. (2008). Toward a general model of skills involved in sight reading music. Music Education Research, 10, 41-62. Kruth, E. C. (1973). A suggested technique for evaluating wind instrument performance. Journal of Band Research, 10, 24-36. Langer, S. K. (1953). Feeling and form. New York: Charles Scribner's Sons. LeBlanc, A. (1980). Outline of a proposed model of sources of variation in musical taste. Bulletin of the Council for Research in Music Education, 61, 29-34. Leonhard, C., & House, R. W. (1972). Foundations and principles of music education. 
New York: McGraw-Hill. Levi, D. S. (1978). Expressive qualities in music perception and music education. Journal of Research in Music Education, 26, 425-435. Levinowitz. (1989). An investigation of preschool children's comparative capability to sing songs with and without words. Bulletin of the Council for Research in Music Education, 100, 14-19. Lippitt, G. L. (1973). Visualizing change: Model building and the change process. La Jolla, CA: University Associates. Madsen, C. K., & Geringer, J. M. (1998). Comparison of good versus bad tone quality/intonation of vocal and string performances: Issues concerning measurement and reliability of the continuous response digital interface. Paper presented at the meeting of the Research Commission of the International Society for Music Education, Johannesburg, South Africa. Madsen, C. K., Geringer, J. M., & Heller, J. (1991). Comparison of good versus bad intonation of accompanied and unaccompanied vocal and string performances using a continuous response digital interface (CRDI). Canadian Music Educator: Special Research Edition, 33, 123-130. Madsen, C. K., Geringer, J. M., & Heller, J. (1993). Comparison of good versus bad tone quality of accompanied and unaccompanied vocal and string performances. Bulletin of the Council for Research in Music Education, 119, 93-100. Marchand, D. J. (1975). A study of two approaches to developing expressive performance. Journal of Research in Music Education, 23, 14-22. McPherson, G. E., & Thompson, W. F. (1998). Assessing music performance: Issues and influences. Research Studies in Music Education, 10, 12-24. Miksza, P. (2006). Relationships among impulsiveness, locus of control, gender, and music practice. Journal of Research in Music Education, 54, 308-323. Miksza, P. (2007). An investigation of observed practice behaviors, self-reported practice habits, and the performance achievement of high school wind players. Journal of Research in Music Education, 55, 359-375. 
Miller, G. A., Fellbaum, C., Tengi, R., & Langone, H. (2006). WordNet: A lexical database for the English language. Retrieved June 16, 2009, from the Princeton University Cognitive Science Laboratory Web site: http://wordnet.princeton.edu/ Mills, J. (1987). Assessment of solo musical performance: A preliminary study. Bulletin of the Council for Research in Music Education, 91, 119-125. Mosher, R. M. (1926). A study of the group method of measurement of sight-singing. Unpublished doctoral dissertation, Columbia University. Dissertation Abstracts International, AAT 0127542. Neilson, J. (1973). A blueprint for adjudicators. The Instrumentalist, 28(5), 46-48. Nichols, J. P. (2005). A factor analysis approach to the development of a rating scale for snare drum performance (Doctoral dissertation, University of Iowa). Dissertation Abstracts International, 46, 3282A. Norris, C. E., & Borst, J. D. (2007). An examination of the reliabilities of two choral festival adjudication forms. Journal of Research in Music Education, 55, 237-251. Oakley, D. L. (1972). An investigation of criteria used in the evaluation of marching bands. Journal of Band Research, 9(1), 32-37. Oldefendt, S. J. (1976). Scoring instrumental and vocal musical performances. Unpublished paper presented at the National Council on Measurement in Education, San Francisco. [ERIC Document Reproduction Service No. ED 129 839]. Owen, C. D. (1969). A study of criteria for the evaluation of secondary school instrumentalists when auditioning for festival bands. Unpublished doctoral dissertation, East Texas State University. Payne, D. A. (2003). Applied educational assessment (2nd ed.). Belmont, CA: Wadsworth Thompson Learning. Prickett, C. A. (1987). The effect of self-monitoring on the rate of verbal mannerism of song teachers. In C. K. Madsen & C. A. Prickett (Eds.), Applications of research in music behavior (pp. 125-134). Tuscaloosa: University of Alabama Press. Radocy, R. E. (1986). 
On quantifying the uncountable in musical behavior. Bulletin of the Council for Research in Music Education, 88, 22-31. Russell, B. E. (2007). An application of the facet-factorial approach to scale construction in the development of a guitar performance rating scale. Unpublished master's thesis, University of Miami. Russell, B. E. (in press). An application of the facet-factorial approach to scale construction in the development of a guitar performance rating scale. Bulletin of the Council for Research in Music Education, 2009. Rutkowski, J. (1990). The measurement and evaluation of children's singing voice development. Quarterly Journal of Music Teaching and Learning, 1(1 & 2), 81-95. Sagen, D. P. (1983). The development and validation of a university band performance rating scale. Journal of Band Research, 18(2), 1-11. Saunders, T. C., & Holahan, J. M. (1997). Criteria-specific rating scales in the evaluation of high school instrumental performance. Journal of Research in Music Education, 45, 259-272. Schillinger, J. (1946). The Schillinger system of musical composition. New York: Carl Fischer, Inc. Schleff, J. S. (1992). Critical judgments of undergraduate music education students in response to recorded performances. Contributions to Music Education, 19, 60-74. Schleuter, S. L. (1978). Effects of certain lateral dominance traits, music aptitude, and sex differences with instrumental music achievement. Journal of Research in Music Education, 26, 22-31. Silliman, T. E. (1977). The effect of entrance age in achievement and retention in the beginning band instrument program (Doctoral dissertation, University of Maryland). Dissertation Abstracts International, 48, 5982A. Smith, N. E. (1968). A study of certain expressive-acoustic equivalents in the performance styles of five trumpet players (Doctoral dissertation, Florida State University, 1968). Dissertation Abstracts International, 30, 5021A-5022A. St. Cyr, A. W. (1977). 
Evaluative criteria for band, orchestra, chorus. Unpublished doctoral dissertation, Boston College. Dissertation Abstracts International, AAT 7718638. Stecklein, J. E., & Aliferis, J. (1957). The relationship of instrument to music achievement scores. Journal of Research in Music Education, 5, 3-15. Stelzer, T. G. W. (1935). Construction, interpretation, and use of a sight reading scale in organ music, with an analysis of organ playing into fundamental abilities (Doctoral dissertation, University of Nebraska, 1935). Dissertation Abstracts International, AAT DP13957. Stivers, J. D. (1972). A reliability and validity study of the Watkins-Farnum Performance Scale. Unpublished doctoral dissertation, University of Illinois at Urbana-Champaign. Suchor, V. (1977). The influence of personality composition in applied piano groups. Journal of Research in Music Education, 25(3), 171-183. Thompson, W., Diamond, C. T. P., & Balkwill, L. (1998). The adjudication of six performances of a Chopin etude: A study of expert knowledge. Psychology of Music, 26, 154-175. Thompson, S., & Williamon, A. (2003). Evaluating evaluation: Musical performance assessment as a research tool. Music Perception, 21, 21-41. Tiede, R. L. (1971). A study of the effects of experience in evaluating unidentified instrumental performances on the student conductor's critical perception of performance (Doctoral dissertation, University of Illinois at Urbana-Champaign, 1971). Dissertation Abstracts International, 32, 4653A-4654A. Van Gigch, J. P. (1991). System design modeling and metamodeling. New York: Plenum Press. Vasil, T. (1973). The effects of systematically varying selected factors on music performance adjudication. Unpublished doctoral dissertation, University of Connecticut, Storrs. Wapnick, J., Flowers, P., Alegant, M., & Jasinskas, L. (1993). Consistency in piano performance evaluation. Journal of Research in Music Education, 41, 282-292. Watkins, J. G. (1942). 
Objective measurement of instrumental music. New York: Teachers College Bureau of Publications, Columbia University. Watkins, J. G., & Farnum, S. E. (1954). The Watkins-Farnum Performance Scale, Form A & Form B. Milwaukee, WI: Hal Leonard. Wheelwright, L. F. (1940). An experimental study of the perceptibility and spacing of music symbols. Unpublished doctoral dissertation, Columbia University, 1940. Dissertation Abstracts International, AAT 0165563. Whitcomb, R. (1999). Writing rubrics for the music classroom. Music Educators Journal, 85(6), 26-32. Wrigley, W. J. (2005). Improving music performance assessment. Unpublished doctoral thesis, Griffith University, 2005. Zdzinski, S. F. (1991). Measurement of solo instrumental music performance: a review of literature. Bulletin of the Council for Research in Music Education, 109. Zdzinski, S. F. (1993). Relationships among parental involvement, selected student attributes, and learning outcomes in instrumental music. Unpublished doctoral dissertation, Indiana University. Zdzinski, S. F., & Barnes, G. V. (2002). Development and validation of a string performance rating scale. Journal of Research in Music Education, 50, 245-255. 
Appendix A
Variables Collected from Performance Assessment Research

Technique, Interpretation, Rhythm, Intonation, Phrasing, Pitch, Musicality, Tone, Balance, Blend, Expression, Dynamics, Amplitude, Tempo, Articulation, Timbre, Accent, Instrumentation, Instrument quality, Difficulty, Arrangement, Attack, Release, Reading, Range, Left-hand, Communication, Melody, Mode, Tonality, Harmony, Memory, Bow, Breathing, Rubato, Vibrato, Embouchure, Smoothness, Unity, Continuity, Style, Position, Posture, Ensemble, Accompaniment, Appearance, Conductor, Accuracy

Appendix B
Categorization of Performance Assessment Variables

Performer: Tempo, Amplitude, Dynamics, Articulation, Accent, Timbre, Tone, Intonation, Technique, Interpretation, Accuracy, Musicality, Phrasing, Vibrato, Breathing, Attack, Release, Range, Left-hand, Continuity, Smoothness, Reading, Memory, Communication
Composer: Mode, Tonality, Harmony, Pitch, Melody, Rhythm, Arrangement, Difficulty, Style, Unity
Ensemble: Balance, Blend, Accompaniment, Conductor, Instrumentation
Visual: Embouchure, Position, Posture, Appearance, Bow, Instrument quality

Appendix C
Aural Musical Performance Quality (AMPQ) Measure

Each statement is rated on a four-point scale: SA (strongly agree), A (agree), D (disagree), SD (strongly disagree).

Questions:
1. Tone is strong
2. Tone is full
3. Thin tone quality
4. Sound is clear
5. Played out of tune
6. Performer was able to adjust pitch
7. Intonation is inconsistent
8. Intonation is good
9. Correct rhythms
10. Off-beats played properly
11. Rhythm was distorted
12. Insecure rhythm
13. Poor synchronization
14. Attacks and releases were clean
15. Impeccable articulation
16. Articulation is overly percussive
17. Tempo is steady
18. Tempo not controlled
19. The tempo was in good taste
20. Lack of a steady pulse
21. Dynamics are played
22. Dynamics used to help phrasing
23. Good dynamic contrast
24. Appropriate dynamics
25. Timbre was harsh or strident
26. Demonstrated a singing quality
27. Lacked resonance
28. Timbre appropriate for style
29. The interpretation was musical
30. Lack of style in performance
31. Effective musical communication
32. Melodic phrasing
33. Made numerous errors in technique
34. Insecure technique
35. Precision is lacking
36. Played fluently
37. Performance not expressive
38. Performance reflected sensitivity
39. Melodic expression
40. Spiritless playing
41. Overall quality lacking
42. Excellent performance overall
43. Poor performance quality
44. Quality of performance is good

Appendix D
Evaluation Packet Instruction Sheet

Please read this page first!

Title: The Empirical Testing of a Musical Performance Assessment Paradigm

DIRECTIONS:
To begin:
1. Locate the following materials in the packet:
   a. Compact disc (1).
   b. Adjudication forms (4).
   c. Return envelope (1, postage paid).
2. Listen to the first track on the compact disc. Read each statement on the adjudication form carefully and indicate your response by circling "SA" if you strongly agree, "A" if you agree, "D" if you disagree, or "SD" if you strongly disagree. You may listen to the recording as many times as needed to complete the adjudication form. Please indicate which track you are evaluating by circling the appropriate track number at the top of the adjudication form.
3. Repeat step 2 for each of the remaining tracks.

When you are finished:
1. Please include the following items in the return envelope:
   a. 4 adjudication forms (completed).
   b. 1 compact disc.
2. Send the return envelope via the U.S. Postal Service: place the return envelope with the materials mentioned above in the mailbox.

That's it!

Brian Russell
7700 S.W. 
146 Road, Miami, FL 33183

Questions? Please feel free to call Brian Russell at (305) 720-4099.

Appendix E
Waiver of Signed Consent Form
Empirical Testing of a Performance Assessment Paradigm

PURPOSE: You are being asked to participate in a research study, conducted by the University of Miami in Coral Gables, Florida, that examines the factors surrounding musical performance evaluation. The purpose of this study is to empirically test a performance assessment paradigm. The results of this study will aid in strengthening the body of research dealing with success in musical performance and evaluation. All evaluations submitted for the purposes of this research will remain anonymous and cannot be traced back to you. No identifiable information (name, address, identification numbers, etc.) will be collected. Completion of the performance evaluation is considered your consent to participate.

RISKS: There are no anticipated risks associated with the recording or evaluation form.

BENEFITS: No direct benefit can be promised to you from your participation in this study. However, guitar students and teachers may benefit from this research.

COSTS: There is no cost to participants involved in this study.

PAYMENT TO PARTICIPANT: There is no payment for participation in this study.

CONFIDENTIALITY: The investigator will consider your responses confidential to the extent permitted by law. No identifiable information is necessary to take part in this study.

RIGHT TO WITHDRAW: Your participation in this study is voluntary; you have the right to withdraw. After the materials are distributed, you can decide to stop at any time. There are no negative consequences if you decide not to participate in this study.

OTHER PERTINENT INFORMATION: The investigator will answer any questions you might have about the study. The investigator will give you a copy of this consent form, and you may contact him at (305) 720-4099. 
If you have any questions regarding your rights as a research participant, you should contact the University of Miami Human Subjects Research Office at (305) 243-3195.

Appendix F
Confirmatory Factor Analysis of AMPQ Items

A supplemental confirmatory factor analysis of the Aural Musical Performance Quality (AMPQ) measure was conducted to determine the representativeness of the items selected to represent the component factors of tone, intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, and interpretation. Confirmatory factor analysis is a method of establishing evidence for the validity of measures. The model depicted below is also referred to as a latent variable measurement model. The AMPQ measure uses 32 Likert scale items to measure performance achievement. The 32 rectangular boxes on the right side of the model, the observed variables, represent the items selected for inclusion on the AMPQ measure. The items are theorized to assess eight different latent constructs. The eight ovals on the left side of the model represent the latent, unobserved, variables: Latent Tone, Latent Intonation, Latent Rhythm Accuracy, Latent Articulation, Latent Tempo, Latent Dynamics, Latent Timbre, and Latent Interpretation. The smaller circles on the leftmost side of the diagram represent the unique variance unaccounted for by the associated latent variable plus any measurement error that may exist. For example, u1 represents the specific variance in the first tone item that is unaccounted for by Latent Tone, plus the measurement error for that item (see Figure F1).

Figure F1. Confirmatory Factor Analysis of AMPQ Items

Model-fit indices indicate that the imposed model exhibits adequate fit to the data collected (SRMR = .058; TLI = .916; CFI = .926). In short, this means that the proposed theory is supported by the data, and the proposed model is one viable representation of the true relations underlying the data. 
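The measurement model just described can be illustrated with a small simulation. This sketch is not part of the original analysis; the loadings it uses are the standardized estimates reported for the four Latent Tone items in Table F1, and all variable names are hypothetical. Each observed item is generated as its loading times the latent factor plus unique variance, so the model-implied correlation between two items is the product of their loadings:

```python
import numpy as np

# Minimal sketch of a latent-variable measurement model:
#   item_i = loading_i * latent_factor + sqrt(1 - loading_i^2) * unique_i
# The loadings are the standardized estimates reported for Q1-Q4 (Latent Tone).
rng = np.random.default_rng(0)
n = 200_000
loadings = np.array([0.852, 0.905, 0.746, 0.822])

latent_tone = rng.standard_normal(n)   # the unobserved factor (an oval in Figure F1)
unique = rng.standard_normal((n, 4))   # unique variance plus error (u1..u4)
items = loadings * latent_tone[:, None] + np.sqrt(1 - loadings**2) * unique

# The implied correlation between the first two items is .852 * .905 (about .77);
# the simulated data recover it closely at this sample size.
observed_r = np.corrcoef(items[:, 0], items[:, 1])[0, 1]
print(round(observed_r, 3))
```

The same generating scheme extends to all eight factors by drawing one latent variable per factor and letting the factors correlate, which is what the covariances in Figure F1 represent.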
Results of the confirmatory factor analysis indicate that the items selected to represent the performance factors (tone, intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, and interpretation) are indeed representative, as indicated by the significant pattern coefficients between the observed and latent variables (see Table F1). The covariances illustrated in the model are equivalent, in this case, to correlations between the latent variables. All covariances depicted in this model are significant.

Table F1
Pattern Coefficients for AMPQ Factor Analysis

Item  Latent Variable     Unstandardized Estimate  Standardized Estimate  S.E.   C.R.     p value
Q1    Tone                .658                     .852                   .041   15.894   ***
Q2    Tone                .770                     .905                   .044   17.558   ***
Q3    Tone                .604                     .746                   .046   13.035   ***
Q4    Tone                .665                     .822                   .044   15.028   ***
Q5    Intonation          .511                     .687                   .045   11.371   ***
Q6    Intonation          .420                     .675                   .038   11.111   ***
Q7    Intonation          .602                     .720                   .050   12.093   ***
Q8    Intonation          .682                     .877                   .042   16.090   ***
Q9    Rhythmic Accuracy   .525                     .781                   .038   13.759   ***
Q10   Rhythmic Accuracy   .499                     .777                   .037   13.646   ***
Q11   Rhythmic Accuracy   .654                     .842                   .042   15.387   ***
Q12   Rhythmic Accuracy   .702                     .857                   .044   15.816   ***
Q13   Articulation        .454                     .654                   .042   10.807   ***
Q14   Articulation        .758                     .839                   .050   15.303   ***
Q15   Articulation        .739                     .809                   .051   14.495   ***
Q16   Articulation        .350                     .509                   .044   7.979    ***
Q17   Tempo               .675                     .880                   .042   16.188   ***
Q18   Tempo               .560                     .788                   .041   13.752   ***
Q19   Tempo               .367                     .567                   .041   8.935    ***
Q20   Tempo               .615                     .789                   .045   13.777   ***
Q21   Dynamics            .715                     .928                   .039   18.294   ***
Q22   Dynamics            .757                     .914                   .042   17.822   ***
Q23   Dynamics            .601                     .823                   .040   15.074   ***
Q24   Dynamics            .497                     .709                   .041   12.167   ***
Q25   Timbre              .606                     .749                   .046   13.061   ***
Q26   Timbre              .749                     .868                   .046   16.305   ***
Q27   Timbre              .706                     .849                   .045   15.733   ***
Q28   Timbre              .539                     .793                   .038   14.173   ***
Q29   Interpretation      .609                     .797                   .043   14.273   ***
Q30   Interpretation      .644                     .807                   .044   14.521   ***
Q31   Interpretation      .720                     .887                   .043   16.861   ***
Q32   Interpretation      .592                     .788                   .042   14.022   ***

*** p < .001

The high 
correlation between Latent Tone and Latent Timbre (r = .89) suggests the possibility that these two latent variables are actually measuring the same thing. This revised model can be compared to the original model by constraining the covariance between Latent Tone and Latent Timbre to one, essentially collapsing these two separate latent variables into one latent variable. This hypothesis was tested and compared to the original model (Table F2).

Table F2
Model-fit Comparisons

Model     CMIN     df    TLI    CFI    AIC       BIC
Original  841.20   436   .916   .926   1025.20   1342.30
Revised   890.11   437   .906   .917   1072.11   1385.77

The results of this comparison analysis indicate that the revised model does not demonstrate a significantly better fit than the original model; in fact, it demonstrates a mathematically worse fit according to the model-fit indices (TLI = .906; CFI = .917). The revised model affords one more degree of freedom (df = 437) than the original model, but this increase in parsimony is not justified given a comparison of the model-fit indices. Since the results of the model comparison indicate that the items selected to represent tone and timbre do indeed measure separate latent variables, the original measurement model was retained.

Appendix G
AMOS Output of Estimated Performer-Controlled Musical Factors Model Across Brass, Woodwind, Voice and String Instruments

Regression Weights: (Group number 1 - Default model) Estimate S.E. C.R. P MUSEX <--- TECHNQ .589 .038 15.544 *** OVERALL <--- TECHNQ .569 .043 13.086 *** OVERALL <--- MUSEX .447 .053 8.470 *** Label Standardized Regression Weights: (Group number 1 - Default model) MUSEX <--- TECHNQ OVERALL <--- TECHNQ OVERALL <--- MUSEX Estimate .715 .577 .373 Variances: (Group number 1 - Default model) TECHNQ d1 d2 Estimate S.E. C.R. 
P Label 10.202 .949 10.747 *** 3.386 .315 10.747 *** 2.175 .202 10.747 *** Matrices (Group number 1 - Default model) Total Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .589 .832 MUSEX .000 .447 Standardized Total Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .715 .844 MUSEX .000 .373 149 150 Direct Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .589 .569 MUSEX .000 .447 Standardized Direct Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .715 .577 MUSEX .000 .373 Indirect Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .263 MUSEX .000 .000 Standardized Indirect Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .267 MUSEX .000 .000 Indirect Effects - Standard Errors (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .035 MUSEX .000 .000 Indirect Effects (Group number 1 - Default model) Indirect Effects - Lower Bounds (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .207 MUSEX .000 .000 151 Indirect Effects - Upper Bounds (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .323 MUSEX .000 .000 Indirect Effects - Two Tailed Significance (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ ... .001 MUSEX ... ... Appendix H AMOS Output of Estimated Performer-Controlled Musical Factors: Woodwind Model Regression Weights: (Group number 1 - Default model) Estimate S.E. MUSEX <--- TECHNQ .464 .119 OVERALL <--- TECHNQ .311 .095 OVERALL <--- MUSEX .480 .094 C.R. P 3.901 *** 3.273 .001 5.114 *** Label Standardized Regression Weights: (Group number 1 - Default model) MUSEX <--- TECHNQ OVERALL <--- TECHNQ OVERALL <--- MUSEX Estimate .459 .332 .519 Variances: (Group number 1 - Default model) TECHNQ d1 d2 Estimate S.E. 4.393 .823 3.545 .664 1.783 .334 C.R. 
5.339 5.339 5.339 P Label *** *** *** Matrices (Group number 1 - Default model) Total Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .464 .534 MUSEX .000 .480 Standardized Total Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .459 .570 MUSEX .000 .519 152 153 Direct Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .464 .311 MUSEX .000 .480 Standardized Direct Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .459 .332 MUSEX .000 .519 Indirect Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .223 MUSEX .000 .000 Standardized Indirect Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .238 MUSEX .000 .000 Indirect Effects - Standard Errors (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .074 MUSEX .000 .000 Indirect Effects (Group number 1 - Default model) Indirect Effects - Lower Bounds (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .128 MUSEX .000 .000 154 Indirect Effects - Upper Bounds (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .379 MUSEX .000 .000 Indirect Effects - Two Tailed Significance (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ ... .000 MUSEX ... ... Appendix I AMOS Output of Estimated Performer-Controlled Musical Factors: Voice Model Regression Weights: (Group number 1 - Default model) MUSEX <--- TECHNQ OVERALL <--- TECHNQ OVERALL <--- MUSEX Estimate S.E. .548 .113 .521 .075 .378 .074 C.R. P 4.857 *** 6.918 *** 5.090 *** Label Standardized Regression Weights: (Group number 1 - Default model) MUSEX <--- TECHNQ OVERALL <--- TECHNQ OVERALL <--- MUSEX Estimate .541 .561 .413 Variances: (Group number 1 - Default model) TECHNQ d1 d2 Estimate S.E. 3.587 .672 2.606 .488 .820 .154 C.R. 
5.339 5.339 5.339 P Label *** *** *** Matrices (Group number 1 - Default model) Total Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .548 .728 MUSEX .000 .378 Standardized Total Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .541 .784 MUSEX .000 .413 155 156 Direct Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .548 .521 MUSEX .000 .378 Standardized Direct Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .541 .561 MUSEX .000 .413 Indirect Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .207 MUSEX .000 .000 Standardized Indirect Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .223 MUSEX .000 .000 Indirect Effects - Standard Errors (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .060 MUSEX .000 .000 Indirect Effects (Group number 1 - Default model) Indirect Effects - Lower Bounds (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .127 MUSEX .000 .000 157 Indirect Effects - Upper Bounds (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .333 MUSEX .000 .000 Indirect Effects - Two Tailed Significance (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ ... .000 MUSEX ... ... Appendix J AMOS Output of Estimated Performer-Controlled Musical Factors: String Model Regression Weights: (Group number 1 - Default model) MUSEX <--- TECHNQ OVERALL <--- TECHNQ OVERALL <--- MUSEX Estimate S.E. .451 .118 .637 .099 .284 .099 C.R. P 3.826 *** 6.466 *** 2.871 .004 Label Standardized Regression Weights: (Group number 1 - Default model) MUSEX <--- TECHNQ OVERALL <--- TECHNQ OVERALL <--- MUSEX Estimate .452 .610 .271 Variances: (Group number 1 - Default model) TECHNQ d1 d2 Estimate S.E. 3.500 .656 2.772 .519 1.543 .289 C.R. 
5.339 5.339 5.339 P Label *** *** *** Matrices (Group number 1 - Default model) Total Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .451 .765 MUSEX .000 .284 Standardized Total Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .452 .733 MUSEX .000 .271 158 159 Direct Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .451 .637 MUSEX .000 .284 Standardized Direct Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .452 .610 MUSEX .000 .271 Indirect Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .128 MUSEX .000 .000 Standardized Indirect Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .123 MUSEX .000 .000 Indirect Effects - Standard Errors (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .058 MUSEX .000 .000 Indirect Effects (Group number 1 - Default model) Indirect Effects - Lower Bounds (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .055 MUSEX .000 .000 160 Indirect Effects - Upper Bounds (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .256 MUSEX .000 .000 Indirect Effects - Two Tailed Significance (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ ... .002 MUSEX ... ... Appendix K AMOS Output of Estimated Performer-Controlled Musical Factors: Brass Model Regression Weights: (Group number 1 - Default model) MUSEX <--- TECHNQ OVERALL <--- TECHNQ OVERALL <--- MUSEX Estimate S.E. .295 .135 .207 .130 .519 .123 C.R. P 2.194 .028 1.590 .112 4.211 *** Label Standardized Regression Weights: (Group number 1 - Default model) MUSEX <--- TECHNQ OVERALL <--- TECHNQ OVERALL <--- MUSEX Estimate .279 .182 .481 Variances: (Group number 1 - Default model) TECHNQ d1 d2 Estimate S.E. 3.453 .647 3.570 .669 3.087 .578 C.R. 
5.339 5.339 5.339 P Label *** *** *** Matrices (Group number 1 - Default model) Total Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .295 .361 MUSEX .000 .519 Standardized Total Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .279 .316 MUSEX .000 .481 161 162 Direct Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .295 .207 MUSEX .000 .519 Standardized Direct Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .279 .182 MUSEX .000 .481 Indirect Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .153 MUSEX .000 .000 Standardized Indirect Effects (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .134 MUSEX .000 .000 Indirect Effects - Standard Errors (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .082 MUSEX .000 .000 Indirect Effects (Group number 1 - Default model) Indirect Effects - Lower Bounds (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .048 MUSEX .000 .000 163 Indirect Effects - Upper Bounds (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ .000 .333 MUSEX .000 .000 Indirect Effects - Two Tailed Significance (BC) (Group number 1 - Default model) MUSEX OVERALL TECHNQ ... .016 MUSEX ... ...
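As a sanity check on the AMOS output reproduced above, the effect decomposition for the combined (all-instrument) model in Appendix G can be verified by hand: in path analysis, an indirect effect is the product of the coefficients along the mediated path, and the total effect is the direct effect plus the indirect effect. A minimal sketch using the standardized estimates from Appendix G (variable names are illustrative, not AMOS syntax):

```python
# Standardized direct effects from the combined model (Appendix G).
technq_to_musex = 0.715    # TECHNQ -> MUSEX
musex_to_overall = 0.373   # MUSEX -> OVERALL
technq_to_overall = 0.577  # TECHNQ -> OVERALL (direct)

# Indirect effect of technique on overall quality, routed through
# musical expression, and the resulting total effect.
indirect = technq_to_musex * musex_to_overall
total = technq_to_overall + indirect
print(round(indirect, 3), round(total, 3))  # prints: 0.267 0.844
```

These reproduce the standardized indirect effect (.267) and total effect (.844) reported in the Appendix G matrices, confirming that technique influences overall quality both directly and through musical expression.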