UNIVERSITY OF MIAMI
THE EMPIRICAL TESTING OF A MUSICAL PERFORMANCE ASSESSMENT
PARADIGM
By
Brian E. Russell
A DISSERTATION
Submitted to the Faculty
of the University of Miami
in partial fulfillment of the requirements for
the degree of Doctor of Philosophy
Coral Gables, Florida
May 2010
©2010
Brian E. Russell
All Rights Reserved
UNIVERSITY OF MIAMI
A dissertation submitted in partial fulfillment of
the requirements for the degree of
Doctor of Philosophy
THE EMPIRICAL TESTING OF A MUSICAL PERFORMANCE ASSESSMENT
PARADIGM
Brian E. Russell
Approved:
____________________
Stephen F. Zdzinski, Ph.D.
Associate Professor of Music
Education and Music Therapy
____________________
Terri A. Scandura, Ph.D.
Dean of the Graduate School
____________________
Nicholas DeCarbo, Ph.D.
Professor of Music Education and
Music Therapy
____________________
Edward Asmus, Ph.D.
Associate Dean, Graduate Studies
____________________
Joyce Jordan, Ph.D.
Professor of Music Education and
Music Therapy
____________________
Jill Kaplan, Ph.D.
Lecturer, School of Psychology
RUSSELL, BRIAN E. (Ph.D., Music Education)
The Empirical Testing of a Musical Performance Assessment Paradigm (May 2010)

Abstract of a dissertation at the University of Miami.
Dissertation supervised by Associate Professor Stephen F. Zdzinski.
Number of pages in text: 163.
The purpose of this study was to test a hypothesized model of aurally perceived
performer-controlled musical factors that influence assessments of performance quality.
Previous research studies on musical performance constructs, musical achievement,
musical expression, and scale construction were examined to identify the factors that
influence assessments of performance quality. A total of eight factors were identified:
tone, intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, and
interpretation. These factors were categorized as either technique or musical expression
factors. Items representing these eight variables were chosen from previous research on
scale development. Additional items, along with researcher-created items, were chosen
to represent the variables of technique, musical expression, and overall perceptions
of performance quality. The 44 selected items were placed on the Aural Musical
Performance Quality (AMPQ) measure and paired with a four-point Likert scale. The
reliability of the AMPQ measure was reported at .977. A total of 58 volunteer
adjudicators were recruited to evaluate four recordings, one from each
instrumental category of interest: brass, woodwind, voice, and string. The resulting
performance evaluations (N = 232) were analyzed using statistical regression and path
analysis techniques. The results of the analysis provide empirical support for the
existence of the model of aurally perceived performer-controlled musical factors.
Technique demonstrated significant direct effects on overall perceptions of performance
quality and musical expression. Musical expression also demonstrated a significant direct
effect on overall perceptions of performance quality. The results of this study are
consistent with a hypothesized model of performer-controlled musical factors.
Dedicated to Kimberly, my loving and supportive wife.
Acknowledgements
I would like to acknowledge my advisor, Dr. Stephen F. Zdzinski, for all of the
guidance and patience he has provided throughout the development of this study. I would
also like to acknowledge my parents for keeping the house for a little while longer so I
could accomplish my goals. I would also like to thank the members of my committee for
offering valuable advice and guidance. In addition, I want to acknowledge Dr. Nick
Myers for providing me with the statistical knowledge for completing this study and
future research endeavors. Thank you to all of my friends and family for all of the
support and words of encouragement. I truly appreciate all of you.
CONTENTS

Table of Figures
List of Tables
CHAPTER 1  Statement of the Problem
    Justification
    The Theory
    Purpose of the Study
    Delimitations of the Study
CHAPTER 2  Related Literature
    Research Attempts to Identify Performance Variables
    Musical Performance Achievement as a Dependent Variable
    Adjudicators and the Adjudication Process
    Performance Aspects of Musical Expression
    Development of Musical Performance Measures
        Watkins-Farnum Performance Rating Scale
        Facet-factorial rating scales
        Criteria-specific rating scales
    Summary of Related Literature
CHAPTER 3  Method
    Gathering Performance Dimensions
    Development of the Tentative Model
    Construction of the Aural Musical Performance Quality Measure
    Gathering Recordings of Solo Music Performance
    Evaluations of Recorded Performances
    Data Analysis and Preparation
CHAPTER 4  Results and Discussion
    Results
    Discussion
CHAPTER 5  Summary and Conclusions
    Summary
    Conclusions
    Suggestions for Further Research
    Implications for Teachers
References
Appendix A  Variables Collected from Performance Assessment Research
Appendix B  Categorization of Performance Assessment Variables
Appendix C  Aural Musical Performance Quality (AMPQ) Measure
Appendix D  Evaluation Packet Instruction Sheet
Appendix E  Waiver of Signed Consent Form
Appendix F  Confirmatory Factor Analysis of AMPQ Items
Appendix G  AMOS Output of Estimated Performer-Controlled Musical Factors Model Across Brass, Woodwind, Voice and String Instruments
Appendix H  AMOS Output of Estimated Performer-Controlled Musical Factors: Woodwind Model
Appendix I  AMOS Output of Estimated Performer-Controlled Musical Factors: Voice Model
Appendix J  AMOS Output of Estimated Performer-Controlled Musical Factors: String Model
Appendix K  AMOS Output of Estimated Performer-Controlled Musical Factors: Brass Model
Table of Figures

Figure 1. Hypothesized Model of Performer-Controlled Components of Technique
Figure 2. Hypothesized Model of Performer-Controlled Components of Musical Expression
Figure 3. Hypothesized Model of Performer-Controlled Musical Factors
Figure 4. Model of Performer-Controlled Components of Technique
Figure 5. Model of Performer-Controlled Components of Musical Expression
Figure 6. Performer-Controlled Musical Performance Factors: Standardized Estimate Model
Figure F1. Confirmatory Factor Analysis of AMPQ Items
List of Tables

Table 1  Comparison of Facet-Factorial Factor Structures
Table 2  Total and Subscale Reliabilities for AMPQ Measure
Table 3  Correlations Between Technique and Component Factors
Table 4  Correlations Between Musical Expression and Component Factors
Table 5  Summary of Simultaneous Regression for Variables Predicting Technique
Table 6  Summary of Simultaneous Regression Analysis for Variables Predicting Musical Expression
Table 7  Means, Standard Deviations, and Pearson Correlations for Combined Instrument Path Model of Performer-Controlled Musical Factors
Table 8  Path Estimates for Model of Performer-Controlled Musical Factors across Brass, String, Voice, and Woodwind Instruments
Table 9  Summary of Sequential Regression Analysis for Variables Predicting Overall Perception of Performance Quality
Table 10  Summary of Simultaneous Regression of Musical Expression on Technique
Table 11  Standardized Path Coefficient Comparisons between Combined and Individual Instrument Path Models
Table 12  Estimated Path Coefficients for the Woodwind Model of Performer-Controlled Musical Factors
Table 13  Means, Standard Deviations, and Pearson Correlations for Woodwind Path Model of Performer-Controlled Musical Factors
Table 14  Estimated Path Coefficients for the Voice Model of Performer-Controlled Musical Factors
Table 15  Means, Standard Deviations, and Pearson Correlations for Voice Model of Performer-Controlled Musical Factors
Table 16  Estimated Path Coefficients for the String Model of Performer-Controlled Musical Factors
Table 17  Means, Standard Deviations, and Pearson Correlations for String Model of Performer-Controlled Musical Factors
Table 18  Estimated Path Coefficients for the Brass Model of Performer-Controlled Musical Factors
Table 19  Means, Standard Deviations, and Pearson Correlations for Brass Model of Performer-Controlled Musical Factors
Table F1  Pattern Coefficients for AMPQ Factor Analysis
Table F2  Model-fit Comparisons
CHAPTER 1
Statement of the Problem
Musical performance is a complex process. Nevertheless, it is imperative that we
continue to clarify those complex processes that elude explanation. Understanding a
subject such as music performance is a form of abstraction that requires a process of
collecting pertinent information concerning characteristic features. “Abstraction involves
selecting, from all those available, certain prominent features by which the real-world
system can be represented meaningfully” (van Gigch, 1991, p.119). The prominent
features identified as a result of abstraction can be used to develop a model useful for
analytical purposes (Lippitt, 1973).
Models help people visualize the structure of both concrete and conceptual
processes. A model serves as a symbolic representation of what is known and understood
about the various components of a complex process (Lippitt, 1973). By design, models
are a simplification of a complex real-world structure. This simplification is used to
facilitate an overall understanding of the process and the interrelationships between the
individual components. Modeling is used by social scientists, educators, economists,
physicists and mathematicians to correlate experience with proposed conceptual
frameworks (Wrigley, 2005; van Gigch, 1991; Becker, 1983; Lippitt, 1973).
Music researchers have employed the modeling process to facilitate
understanding of complex concepts and processes including musical preference
(LeBlanc, 1980), student course affect (Asmus, 1980), musical affect (Asmus, 1981),
music assessment process (McPherson & Thompson, 1998; Wrigley, 2005), extramusical
influences on solo and ensemble ratings (Bergee, 2006), sight-reading (Kopiez & Lee,
2008), and listening (Madsen & Geringer, 2008). However, research regarding the
structure of performance assessment has been limited (McPherson & Thompson, 1998;
Wrigley, 2005). In order to provide clarity to measurements, musical contexts, and
factors related to musical performance, further research into the structure of performance
assessment is necessary.
The inherently subjective nature of musical performance assessments and the
situations to which they are applied within the educational arena call for as much
objectivity as possible (Bergee, 2003, 1987; Radocy, 1986). In efforts to inject more
objectivity into the performance evaluation process, researchers have made positive steps
toward identifying factors that influence assessments of performance quality for
individual instrument categories (Abeles, 1971; Burnsed, Hinkle, & King, 1985; Bergee,
1987; Thompson, Diamond, & Balkwill, 1998; Zdzinski & Barnes, 2002; Wrigley, 2005;
Russell, 2007), factors that influence performance achievement (Schleuter, 1978;
Zdzinski, 1993; Geringer & Johnson, 2007; Miksza, 2007), and higher-order factors
(Bergee, 1995; Wrigley, 2005). However, the interrelationships of these influential
factors across instrument categories are still unknown. The identification of a structure of
musical factors that influence assessments of performance quality across instrument
categories would benefit performers, students, educators, and researchers by illuminating
the process of determining overall musical performance quality.
Musical performance assessment is a divisive subject. Some feel that in order to
fully perceive music, the performance of it cannot and should not be broken down into its
component parts (van Gigch, 1991; Langer, 1953). “The import of an art symbol cannot
be built up like the meaning of a discourse, but must be seen in toto” (Langer, 1953, p.
379). Other researchers, however, have found that the evaluation of performance in
separate component factors has no effect on overall assessments of performance quality
(Burnsed, Hinkle, & King, 1985; Mills, 1987; Bergee, 1995; Saunders & Holahan, 1997;
Zdzinski & Barnes, 2002; Russell, 2007). The evaluation of separate musical factors can
serve diagnostic purposes for improvement of teaching and learning strategies (Saunders
& Holahan, 1997).
In music education, “evaluative procedures are used to determine status so that
progress toward educational goals can be appraised” (Leonard & House, 1972, p. 29).
Musical performance assessments imply the use of both performance measures and
observations. Classroom teachers employ informal assessments to determine the pace of a
lesson or unit of study. Formal performance measurement applications can include
teacher administered tests, juries, and auditions. The information gathered from these
observations and measurements in both formal and informal assessment situations
provides important information to both performer and evaluator. The high stakes nature
of some performance measurement applications requires valid and reliable measures that
accurately represent what was heard by the evaluator (Payne, 2003; Bergee, 1987).
Previous research on music performance assessment issues includes rating scale construction for solo instruments (Watkins, 1942; Watkins & Farnum, 1956; Gutch, 1964, 1965; Abeles, 1971; Bergee, 1987; Saunders & Holahan, 1997; Zdzinski & Barnes,
2002; Russell, 2007), performance constructs (Bergee, 1995; Mills, 1987; Thompson,
Diamond & Balkwill, 1998), interjudge reliability (Bergee, 2003), adjudicator experience
(Schleff, 1992; Kim, 2000), listening and discrimination (Geringer & Madsen, 1998), and
instrument bias (Hewitt, 2007). This body of performance assessment research supports
the ability to validly and reliably evaluate musical performance situations (i.e., solo and
ensemble evaluations, juries, etc.). Additionally, these studies support the utilization of
performance measurement as a viable means of identifying the underlying structures
associated with musical performance.
The validity and reliability of performance measures are subject to the inevitable
and inherent subjectivity of the performance assessment process (Radocy, 1986).
Improved reliabilities have been observed on measures that utilized a combination of
specified evaluation criteria and standardized rating scales (Abeles, 1971; Bergee, 1987;
Saunders & Holahan, 1997; Zdzinski & Barnes, 2002; Wrigley, 2005; Russell, 2007).
While criteria for individual instrument categories have been identified, no consensus has
been reached concerning which criteria should be specified for evaluation of a musical
performance across instrument groups.
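The reliabilities cited throughout this line of research are typically internal-consistency estimates such as Cronbach's alpha (the coefficient later reported for the AMPQ measure, .977, is of this kind). As a minimal sketch of how such a coefficient is computed for a Likert-type measure, assuming only a simple evaluations-by-items rating matrix (the data below are randomly generated and purely illustrative):

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for an (evaluations x items) matrix of Likert ratings."""
    ratings = np.asarray(ratings, dtype=float)
    k = ratings.shape[1]                          # number of items on the measure
    item_vars = ratings.var(axis=0, ddof=1)       # variance of each item
    total_var = ratings.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Illustrative only: 232 evaluations x 44 items on a four-point scale.
# Random ratings yield a near-zero alpha; consistent ratings drive it toward 1.
rng = np.random.default_rng(0)
print(round(cronbach_alpha(rng.integers(1, 5, size=(232, 44))), 3))
```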
Research on evaluation criteria has utilized several strategies for the creation of
performance measures with specified criteria. Studies employing factor analysis
techniques as a means of construct identification and scale construction have produced
solo performance measures for string instruments (Zdzinski & Barnes, 2002), brass
(Bergee, 1987), woodwind (Abeles, 1971), solo voice (Jones, 1986), percussion (Nichols,
2005), and guitar (Russell, 2007). These studies provide evidence of underlying factors
that influence the evaluation of a musical performance (Saunders & Holahan, 1997).
Using this statistical method, many valid and reliable measures have been created. However, the question of a generalized set of music performance evaluation criteria remains unresolved due to the limited generalizability of these studies.
Justification
Understanding the structure of the component factors of musical performance that
affect assessments of performance quality is crucial to understanding how music is
aurally perceived during evaluation. Research studies concerning musical performance
variables (Burnsed, Hinkle, & King, 1985; Mills, 1987; Bergee, 1995; Thompson,
Diamond & Balkwill, 1998; Wrigley, 2005; Johnson & Geringer, 2007), musical
performance achievement (Hodges, 1975; Suchor, 1977; Schleuter, 1978; Zdzinski, 1993;
Geringer & Johnson, 2007; Miksza, 2007), performance adjudication (Fiske, 1975, 1979;
Bergee, 1997, 2003; Geringer & Madsen, 1998; Thompson & Williamon, 2003), musical
expression (Levi, 1978; Juslin & Lindstrom, 2003; Juslin & Laukka, 2004) and rating
scale development (Abeles, 1971; Bergee, 1987; Zdzinski & Barnes, 2002; Russell,
2007) reveal consistencies in the conceptualization of musical performance assessment.
Russell (2007) stated that the occurrence of common musical factors across instrument
categories in facet-factorial studies suggests that an overall structure of music
performance assessment may exist (see Table 1). However, no research has been
conducted to investigate these commonalities.
Occurrences of common musical factors suggest the possible existence of an
overall structure of performance assessment across woodwind instruments, string
instruments, voice, and brass instruments. This structure could serve as a catalyst for
important research in the area of music performance and provide clarity to the factors that
influence evaluations of performance quality across all instrument categories in various
applications. An examination of the components and commonalities found within the
literature would satisfy the need to verify the literature analysis and help promote further
understanding of the perception of music performance overall.
Table 1

Comparison of Facet-Factorial Factor Structures

| Abeles (1971)     | Jones (1986)                  | Bergee (1987)                 | Zdzinski & Barnes (2002)      | Russell (2007)                |
|-------------------|-------------------------------|-------------------------------|-------------------------------|-------------------------------|
| interpretation    | interpretation/musical effect | interpretation/musical effect | interpretation/musical effect | interpretation/musical effect |
| tone              | tone/musicianship             | tone quality/intonation       | articulation/tone             | tone                          |
| rhythm/continuity | suitability/ensemble          | rhythm/tempo                  | rhythm/tempo                  | rhythm/tempo                  |
| articulation      | technique                     | technique                     | vibrato                       | technique                     |
| intonation        | diction                       | intonation                    | intonation                    | tempo                         |
The Theory
Research on performance variables, musical performance achievement,
adjudication, musical expression, and performance measure development reveals
commonalities between assessments of woodwind, string, voice, and brass performance
(Abeles, 1971; Levi, 1978; Jones, 1986; Mills, 1987; Bergee, 1987, 1995, 2003; Zdzinski,
1993; Geringer & Madsen, 1998; Zdzinski & Barnes, 2002; Wrigley, 2005; Johnson &
Geringer, 2007; Russell, 2007). These commonalities hint at a possible structure of
musical performance factors across these instrument categories. This structure of musical
performance factors is hypothesized to influence overall perceptions regarding musical
performance quality.
The related literature in Chapter 2 reveals an exhaustive list of musical factors
that influence assessments of musical performance (see Appendix A). These factors can
be separated into general categories such as aural and visual factors, and more specific
categories such as performer factors, composer factors, adjudicator factors,
environmental factors, etc. (see Appendix B) (Juslin & Lindstrom, 2003; Juslin & Laukka,
2004). The factors used for this hypothesized model concentrate on performer factors that
are also aural factors. Performer-controlled factors are described for the purpose of this
study as the musical components that are controlled by the performer during the time of
performance; aural factors are described as musical components of performance that can
be aurally perceived.
The theory proposed in this study hypothesizes that the structure of aurally
perceived performer-controlled musical factors exists and remains stable across
assessments of brass, woodwind, voice, and string instruments. The proposed structure
concentrates on the variables of technique, musical expression, and overall perception of
performance quality (Levi, 1978; Mills, 1987; Bergee, 1995; Wrigley, 2005). Technique
is hypothesized to be a composite of performance variables that represent technical
ability (tone, intonation, articulation, and rhythmic accuracy) (see Figure 1); musical
expression is a composite of performance variables that represent the ability to
communicate the expressive aspects of musical performance (tempo, dynamics, timbre,
and interpretation) (see Figure 2).
Figure 1. Hypothesized Model of Performer-Controlled Components of Technique
Figure 2. Hypothesized Model of Performer-Controlled Components of Musical
Expression
Technique is hypothesized to have a direct effect on overall perception of
performance quality (Mills, 1987; Bergee, 1995; Wrigley, 2005). The hypothesized effect
of technique on overall perceptions of performance quality is also mediated through
musical expression (Levi, 1978; Bergee, 1995; Juslin & Laukka, 2004; Wrigley, 2005;
Johnson & Geringer, 2007; Geringer & Johnson, 2007). It is this proposed structure that
is the focus of this study (see Figure 3).
Figure 3. Hypothesized Model of Performer-Controlled Musical Factors
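The dissertation estimates this structure with AMOS (see Appendices G through K). Purely as an illustrative sketch, the same hypothesized model can be written in lavaan-style syntax using the Python package semopy; the data file and column names below are hypothetical stand-ins for the AMPQ factor scores, not the study's actual variables:

```python
import pandas as pd
import semopy

# Measurement model: each second-order construct is indicated by its four
# hypothesized components (Figures 1 and 2). Structural model: Technique has
# a direct path to Overall quality and an indirect path mediated through
# Musical Expression (Figure 3).
MODEL_DESC = """
Technique =~ tone + intonation + articulation + rhythmic_accuracy
Expression =~ tempo + dynamics + timbre + interpretation
Expression ~ Technique
Overall ~ Technique + Expression
"""

# "ratings.csv" is a hypothetical file of per-evaluation scores.
data = pd.read_csv("ratings.csv")
model = semopy.Model(MODEL_DESC)
model.fit(data)
print(model.inspect())           # path estimates and standard errors
print(semopy.calc_stats(model))  # model-fit indices (CFI, RMSEA, etc.)
```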
This theorized paradigm is based on three main lines of evidence outlined by Keith (2006): 1) time precedence, 2) relevant research, and 3) logic. The time precedence of technique over musical expression suggests that the ability to produce a sound must come first in order for a performer to express himself or herself musically. Technique is considered necessary to produce a sound using a musical instrument. The research presented in Chapter 2 provides support
for the participant population (Fiske, 1975; Kim, 2000; Bergee, 2003), variables selected
(Abeles, 1971; Levi, 1978; Jones, 1986; Mills, 1987; Bergee, 1987, 1995, 2003; Zdzinski,
1993; Geringer & Madsen, 1998; Zdzinski & Barnes, 2002; Wrigley, 2005; Johnson &
Geringer, 2007; Russell, 2007), categorization methods (Zdzinski, 1993; Juslin &
Lindstrom, 2003; Juslin & Laukka, 2004; Miksza, 2007), and the direction of the
hypothesized paths (Levi, 1978; Mills, 1987; Bergee, 1995; Juslin & Laukka, 2004;
Wrigley, 2005; Johnson & Geringer, 2007; Geringer & Johnson, 2007).
It is reasonable and logical to infer the influence of technique on overall
perceptions of musical performance quality. The logic behind this model suggests that an
improvement in technical ability would increase the ability to express oneself musically
on an instrument. In turn, this increased ability in technique and musical expression
would influence the assessment of overall performance quality.
Purpose of the Study
The purpose of this study is to examine a hypothesized model of the aurally
perceived performer-controlled musical factors that influence assessments of musical
performance quality. Specifically, this study intends to answer the following research
questions:
1. Do the first-order performance factors of tone, intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, and interpretation adequately represent the second-order factors of Technique and Musical Expression according to the hypothesized model?
2. What are the relative contributions of Technique and Musical Expression on
judgments of Overall Perceptions of Performance Quality according to the
hypothesized model?
3. How well does the proposed model fit the data collected? Can a model of musical
performance assessment be created and tested using performer-controlled musical
factors for the outcome of evaluating aurally perceived musical performance quality?
4. Does the hypothesized model of performer-controlled musical factors remain stable
for the individual brass, woodwind, voice, and string instrument categories?
Delimitations of the Study
The focus of this study is limited to the aural aspects of musical performance.
This excludes aspects of performance that are perceived visually. These delimitations
also include aspects of solo musical performance that are at the disposal of the performer.
Specifically, this excludes factors of musical performance that are considered to be
composer factors, ensemble factors, adjudicator factors, environmental factors, and
nonmusical factors.
The research approach that is employed is an additional delimitation of this study. This study utilizes a model-rejection approach to theoretical model testing. A model-rejection approach dictates that the theoretical model of performer-controlled musical factors will be tested using a “reject” or a “fail-to-reject” decision. This approach provides information regarding the performance of the hypothesized model and avoids any post-hoc recalculation once the model has been estimated.
CHAPTER 2
Related Literature
Music performance has long been a fundamental focus of music research.
Specifically, the evaluation of music performance is of key interest to music educators
and music education as a profession. Scholars conducting research on musical
performance evaluation have investigated the accuracy of evaluations, musical and
extramusical influences on performance evaluations, and the construction of valid and
reliable performance measures.
An examination of early performance evaluation literature reveals a focus on
sight-singing achievement. Researchers such as Hillbrand (1923), Mosher (1925), Knuth
(1933), and Watkins (1942) developed measures to evaluate performances of sight-read
music. These measures focused mainly on the rhythm and pitch accuracy of the excerpts
performed.
Early research on performance assessment also focused on the identification of
factors that influence sight-reading achievement. Stelzer (1935) developed a sight-reading measure for organ performance. Stelzer used this measure to analyze the
underlying fundamentals of organ performance. Other researchers such as Bean (1938)
and Wheelwright (1940) developed sight-reading measures to investigate factors that
influence sight-reading achievement. Bean (1938) investigated sight-reading
methodology and utilized a measure that evaluated the pitch accuracy of sight-read piano
performances. Wheelwright (1940) also investigated piano sight-reading achievement
and the influence of music spacing on sight-reading achievement.
These early studies provided a basis for continued research on music
performance. Present day researchers continue to investigate the issues and aspects
surrounding musical performance. The review contained in this chapter examines
performance assessment literature. Specifically, the musical variables utilized for musical
performance evaluation are of main interest. The literature will be discussed in the
following sections: a) research attempts to identify performance variables, b) musical performance achievement as a dependent variable, c) adjudicators and the adjudication process, d) performance aspects of musical expression, and e) the development of musical performance measures.
Research Attempts to Identify Performance Variables
Many researchers have made efforts to improve the accuracy and efficiency of
performance evaluations through investigations of the criteria used during performance
evaluation. Most early research in evaluation criteria was centered on band performance
and festival ranking. Owen (1969) found that the performance dimensions of technical
accuracy, rhythm, pitch, musicality, tone, and sight-reading produced the most reliable
rankings of student band auditions. A research study by Oakley (1972) compiled
performance criteria from rating sheets used during evaluations of marching band
performances. The musical criteria that most frequently appeared were technical accuracy,
rhythm, intonation, tone quality, balance, expression, and precision. A checklist created
by Neilson (1973) presented factors that adversely affect ensembles performing in a
festival evaluation setting. The factors of intonation, phrasing, dynamics, melodic
transparency, tempo, attacks and releases (articulation), and timbre were derived from
analysis of handwritten comments entered onto evaluation sheets.
An attempt at evaluating musical performance across performance mediums was
undertaken by Oldefendt (1976). Oldefendt developed a procedure and criteria for
scoring both solo instrumental and solo vocal performances. Adjudicators evaluated
musical performances in terms of completeness, pitch accuracy, and rhythmic accuracy.
The frequency of performance errors determined the score for each performance
dimension. An overall score was estimated as the sum of these criteria scores.
Performance dimensions of tone and intonation were considered in the development of
the criteria, but were excluded due to limitations in quantifying the quality of
instrumental and vocal tones.
Research by St. Cyr (1977) established an exhaustive list of criteria for high
school band, orchestra, and chorus performance evaluation. The criteria compiled by St.
Cyr represent both musical and non-musical variables. Musical variables include
technique, interpretation, time, intonation, phrasing, pitch, balance, expression,
articulation, and diction. The non-musical variables reported were appearance, breathing,
conductor, accompaniment, instrumentation, voices, instrument quality, difficulty level,
and arrangement quality.
Criticisms of the performance evaluation process prompted Burnsed, Hinkle, and
King (1985) to investigate inconsistency in performance evaluation. These criticisms
included lack of appropriate measures, poor judge reliability, the lack of criteria
agreement, the lack of an evaluation model, and inconsistent standards. Specifically,
Burnsed, Hinkle, and King (1985) attempted to determine whether the factors of technique, interpretation, intonation, musical effect, tone, and balance were viable predictors of overall performance ratings. Ratings for the performance dimensions
were closely related to each other and to the overall performance ratings. The results
indicate a high degree of intercorrelation between the six performance dimensions and
the overall ratings (.78-.91). Bergee (1993) suggests that the close relationship between
musical effect and the overall rating (.91) indicates the possibility that musical
performance is evaluated in a global fashion regardless of the presence of separate
performance dimensions.
A study by Mills (1987) sought to determine the nature of this global rating.
purpose of this study was to explain the assessment of solo music performance of
Western Classical Music. Specifically, Mills attempted to define the constructs that
adjudicators employ to assess solo music performance. Mills conducted this study in two
phases. The first phase focused on establishing a vocabulary that could be used by both
music teachers/specialists and non-specialist individuals with musical experience.
Participants in this phase included performers and adjudicators. The performers used
were all full-time students at least 15 years of age. Performing participants included a
harpist, two horn players, a pianist, an oboist, and a violinist. Each performance was videotaped for the adjudication portion.
Volunteer adjudicators (N = 11) were separated into two groups: Group 1, music teachers and music specialist students (n = 2); and Group 2, non-specialists with experience in music performance (n = 9). Each adjudicator evaluated five consecutive performances
on the videotape. The assessment sessions lasted between 45 and 60 minutes.
Adjudicators were asked to evaluate each performance by writing comments and
assigning a grade out of thirty as if it were an Associated Board of the Royal Schools of Music (ABRSM) examination. At the end of the five evaluated performances,
adjudicators were encouraged to discuss the performances in terms of performance
constructs. These conversations were recorded and content analyzed.
The results of this phase provided twelve constructs with which the performances
were evaluated. The performance constructs compiled from the post adjudication
interviews included: performer confidence/nervousness, performer enjoying/not enjoying
performance, performer familiarity with performance material, performer does/does not
make sense, use of dynamics are appropriate/inappropriate, use of tempi
appropriate/inappropriate, performer phrasing appropriate/inappropriate, technical
problems distracting/hardly noticeable, performance was hesitant/fluent, performance
was insensitive/sensitive, performance was muddy/clean, performance was
dull/interesting (Mills, 1987).
Phase 2 of this study focused on the extent to which assessments of performance
could be predicted from the performance constructs identified in Phase 1. Performances
(N = 10) of violin, horn, piano, soprano, clarinet, harp, oboe, flute, double bass, and
trombone were used. Adjudicating participants (N = 29) included 12 from Group 1 and
17 from Group 2. Adjudicators were given a two-sided assessment form to complete for
each of the ten performances to be evaluated. The first side provided the same
instructions used during Phase 1. The second side consisted of the twelve bipolar
statements paired with a four-point semantic differential.
The results of this study indicate that the performance variables of player nervousness, performer enjoyment, performer knowledge of music, holistic sense of the music, dynamics, tempo, phrasing, technique, hesitation, insensitivity, and performance clarity demonstrate small to moderate correlations with the overall marks. Correlations
between the performance constructs ranged from 0.2 (Variables “tempi
appropriate/inappropriate” and “technical problems distracting/hardly noticeable”) to 0.7
(Variables “performer confidence/nervousness” and “performer enjoying/not enjoying
performance,” “performer does/does not make sense” and “performance was
dull/interesting,” “performance was hesitant/fluent” and “performance was
muddy/clean,” “performance was insensitive/sensitive” and “performance was
dull/interesting,” “performance was muddy/clean” and “performance was
dull/interesting”); however, correlations between overall ratings and performance
constructs were uniformly negative ranging from -0.4 (Variable “use of tempi
appropriate/inappropriate”) to -0.7 (Variable “performance was insensitive/sensitive” and
“performance was muddy/clean”) (Mills, 1987). A multiple regression analysis indicated
that twelve performance variables accounted for 73% of the total variance. No difference
between the two groups of adjudicators was apparent: Group 1 accounted for 75% of the variance and Group 2 accounted for 72% of the variance. Zdzinski (1991) suggests that
these results should be considered tentative due to the lack of significance levels and
small sample size.
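As a sketch of the kind of analysis Mills reports, regressing the overall marks on the twelve construct ratings and reading off the squared multiple correlation might look as follows; the file and column names are hypothetical, since the original data are not reproduced here:

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical layout: one row per adjudicated performance, twelve bipolar
# construct ratings plus the overall mark out of thirty.
df = pd.read_csv("mills_ratings.csv")
constructs = ["confidence", "enjoyment", "familiarity", "sense", "dynamics",
              "tempi", "phrasing", "technical", "fluency", "sensitivity",
              "clarity", "interest"]

X = sm.add_constant(df[constructs])          # add the intercept term
fit = sm.OLS(df["overall_mark"], X).fit()
print(fit.rsquared)    # proportion of variance explained (Mills reports .73)
print(fit.summary())   # coefficients, t-values, and significance levels
```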
Mills (1987) concluded that it is possible to explain solo music performance using
characteristics comprehensible to non-musicians. She suggests that these results have
implications for not only solo musical performance assessment, but for music education
as a whole. The utilization of performance constructs as predictors of overall
performance can streamline the adjudication process and serve as a valuable source of
information for the performer.
Continued investigation into the predictors of overall performance success has led
to research concentrating on the organization of performance factors into higher-order constructs. Bergee (1995) conducted an investigation into the existence of higher-order
performance factors. The purpose of this study was to identify the intermediate level
construct of the Band Performance Rating Scale (BPRS) previously developed by Sagen
(1983). Specifically, Bergee attempted to determine a) primary, intermediate, and higher-order factors within the BPRS, b) intercorrelations of the factors and the correlations of the BPRS items to the higher-order factor, and c) interjudge reliability and criterion-related validity of the item regroupings.
Sagen (1983) constructed the BPRS using a rational approach (Butt and Fiske,
1968) to choose and group items based on a preconceived notion of the subject.
Statements regarding various aspects of band performance were collected from
experienced band directors enrolled in graduate music education courses. An analysis of
these statements yielded 206 item pool statements grouped into six categories: tone
quality, technical accuracy, musical interpretation, intonation, rhythmic accuracy, and
general musical effect. Sagen, along with three university professors, selected eight
representative items for each of the six performance dimensions. Each of the 48 items
was paired with a five-point Likert scale that ranged from Strongly Disagree (SD) to
Strongly Agree (SA). Interjudge reliability was determined by an analysis of variance.
Category-by-category test-retest coefficients for evaluations of two performances ranged from .64 to .94 (p < .05). Total-score test-retest r’s were reported at .92 and .84 (p < .01), respectively.
Bergee (1995) recruited 245 graduate and undergraduate band students from three
universities to evaluate a prerecorded high school band performance of Charles Carter’s
Rhapsodic Episode. Participants were given instructions and information regarding the
study and were allowed to listen to the recording as many times as necessary.
The data were analyzed using a principal components method to group the items
onto related factors. Three criteria were used to determine the number of factors to be
rotated: (1) Eigenvalues greater than 1.00, (2) scree plot, and (3) an examination of the
proportion of variance accounted for by the factors. The items on the BPRS were
regrouped according to results of the promax oblique rotation.
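A present-day approximation of this procedure, assuming the BPRS item ratings were available as a simple data file (the file and its columns are hypothetical), might use the factor_analyzer package to apply the same retention criteria and promax rotation:

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

# Hypothetical data: one row per evaluation, one column per BPRS item.
items = pd.read_csv("bprs_items.csv")

# Criterion 1: eigenvalues of the unrotated solution (values > 1.00 are
# retained); the same sequence supplies the scree plot and the proportion
# of variance accounted for.
fa = FactorAnalyzer(rotation=None, method="principal")
fa.fit(items)
eigenvalues, _ = fa.get_eigenvalues()
print(eigenvalues[:10])

# Re-fit with the retained number of factors and a promax oblique rotation,
# then inspect the loadings used to regroup the items.
fa3 = FactorAnalyzer(n_factors=3, rotation="promax", method="principal")
fa3.fit(items)
print(pd.DataFrame(fa3.loadings_, index=items.columns))
```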
To determine interjudge reliability, Bergee (1995) recruited seven music
education graduate students to evaluate five prerecorded high school band performances
using the revised version of the BPRS. In addition, an independent panel of six judges
was asked to record a global rating from I+ to V- for each of the same five recordings.
The factors were rotated using three, four, five, and six factor structures according
to the criteria defined by Bergee. The three-factor rotation yielded the most interpretable results. The three factors identified by Bergee are: (1) Tone
Quality/Intonation, (2) Musicianship/Expressiveness, and (3) Rhythm/Articulation.
A factor analysis of the primary factor matrix yielded one higher-order factor with
all three factors loading from .73 to .81. Bergee (1995) then related this newly identified
higher-order factor with the original variables from the BPRS using Pearson’s r. The total
score interjudge reliability for the revised BPRS was reported at .96.
Bergee concluded that the results supported his hypothesis that band performance is adjudicated on a three-level hierarchical structure that includes: (1) a factor-analyzed set of items, (2) distinct primary factors (Tone Quality/Intonation, Musicianship/Expressiveness, and Rhythm/Articulation), and (3) a higher-order factor that is correlated strongly with the
identified primary factors. These results group the primary factors differently than
previous factor analysis studies involving instrumental performance. Bergee suggests that
this is due to the intimate links between tone and intonation, musicianship and
expressiveness, and rhythm and articulation.
In contrast to the empirically based quantitative studies, a study by Thompson,
Diamond, and Balkwill (1998) illustrated a qualitative technique for eliciting and
exploring the constructs involved in the adjudication of piano music performance.
Experienced adjudicators (n = 5) were recruited to evaluate six expert performances of
Chopin’s Etude, Op. 25, No. 6. Each performance presented a different interpretation of
the piece.
This study was executed in two stages. The first stage involved eliciting six
performance dimensions from each adjudicator. The first five constructs were selected
according to personal criteria that the adjudicators used to distinguish between
performances. A sixth factor was selected that describes the adjudicator’s evaluation of
overall performance.
The second stage required the adjudicators to apply these constructs to the
evaluation of six expert piano performances. Adjudicators were tested individually. Each
performance was recorded in a random order onto a cassette tape and played through high
quality headphones. Repeated hearings were permitted but not necessary as adjudicators
completed each evaluation while listening.
After all six performances were evaluated, five constructs were elicited using the
triad method via a computer interface. The triad method extracts constructs through a
series of random comparisons. After the constructs were identified, adjudicators were asked to provide the opposing end points for each construct. It was made clear to the participants that these statements need not be semantic opposites. Once all polar statements were entered, the participant adjudicators were prompted to rate each performance on the provided statements using a scale ranging from 1 to 9. This
process was repeated five times.
The data from the piano evaluations were analyzed using a repeated measures
analysis of variance. The results of this study indicated a significant main effect of
performance, F(4, 20) = 6.00, p < .01. This indicates that the performance assessments
demonstrated overall reliability. A Pearson correlation between all pairs of adjudicator
ratings indicated a moderate degree of agreement between adjudicators (median
correlation = .68).
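The agreement statistic reported here is simply the median of the Pearson correlations over all adjudicator pairs; a minimal sketch with a fabricated rating matrix (it does not reproduce Thompson, Diamond, and Balkwill's data):

```python
from itertools import combinations
import numpy as np

# Fabricated example: rows are 5 adjudicators, columns are their ratings of
# the 6 performances (here collapsed to a single score per performance).
ratings = np.array([
    [7, 4, 8, 5, 6, 3],
    [6, 5, 9, 4, 6, 2],
    [8, 3, 7, 5, 5, 4],
    [7, 5, 8, 6, 7, 3],
    [5, 4, 6, 3, 5, 2],
])

pairwise = [np.corrcoef(ratings[i], ratings[j])[0, 1]
            for i, j in combinations(range(len(ratings)), 2)]
print(np.median(pairwise))  # the study reports a median correlation of .68
```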
A total of fourteen constructs were extracted from the information provided by the
adjudicators. These constructs were identified as: right-hand expression, phrasing,
dynamics, rubato, form/structure, tonal balance, pedaling, attention to rhythm and meter,
articulation, technical competence, tempo, expression in bars 27-30, expression at the
climactic phrase, and expression at the end of the piece. Thompson et al (1998) also
states that overall preference was strongly associated with the right-hand expression,
phrasing and balance constructs.
Thompson, Diamond, and Balkwill (1998) concluded that qualitative research
techniques could reveal constructs that influence performance adjudication. These
research findings have implications for identifying performance constructs that can
accurately represent the processes employed by experienced adjudicators through not
only quantitative methods, but qualitative as well. A study by Wrigley (2005) attempted
to identify underlying performance constructs by employing both qualitative and
quantitative research methodologies.
Wrigley (2005) investigated ways of addressing current issues of
accountability in collegiate music education by improving music performance evaluation.
Specifically, the study outlined the structure of an ecological model of music
performance that addresses both the musical and non-musical influences of music
performance assessment. Utilizing both qualitative and quantitative methods, Wrigley
(2005) examined musical performance aspects along with intrapersonal and interpersonal
influences of the music evaluation process (i.e., performer flow state, self-evaluation,
performance experience, gender, instrument type, and examiner fairness).
This study was executed in four phases and involved faculty and students from an
Australian university. The first and second phases involved the identification of
performance dimensions from a total of 655 performance examination reports. The
reports were content analyzed by experienced music faculty (N = 36) and performance
constructs were extracted. Representatives from the string, voice, brass, and woodwind
departments were consulted and interviewed to reach a consensus on the performance
dimensions to be used on each of the instrument specific measures to be created in the
third phase. The resulting dimensions were further analyzed into higher order constructs
with accompanying descriptors provided by the content analysis and faculty consensus.
These descriptors were then placed in rank order and assigned to their corresponding
23
higher order constructs for each instrument category to create the Performance
Evaluation Report (PER) (Wrigley, 2005).
Phase 3 of this investigation involved the implementation of the PER by 30
adjudicators among five instrument families. The implementation was executed over two years and four consecutive semesters. Each adjudicator was asked to
rate performances within each performance dimension according to three categories:
Needs Attention, Satisfactory, and Excellent. All performance examinations were
administered during regularly scheduled examination periods within each semester.
Reliability coefficients for the factors used on the Performance Evaluation Report
ranged from .81 to .98. Factors selected to represent the instrument families explained
between 57% and 71% of the overall variance for each model. Correlations between
performance dimensions for each PER indicated a great deal of overlap between factors:
brass (.97), strings (.93), woodwind (.92), and piano (.95). This overlap suggests a non-orthogonal relationship between the factors used on each of the measures (Wrigley,
2005).
Data from each of the performance examinations within each instrument category
were factor analyzed to confirm the models created by the qualitative analysis in Phase 2.
The results confirm the separate factor structure for each instrument family created by the
original sorting and assignment of performance constructs. Two factors concerning
technical proficiency and musical interpretation along with seven core constructs (tone,
tempo, rhythm, confident, style/character, phrase/shape, and dynamics) were found to be
common among instrument families.
Wrigley (2005) states that the occurrence of common factors could possibly imply
the existence of a generic set of cross-instrument assessment criteria. Definitions for
musical interpretation and technical accuracy were unique to each instrument category.
However, due to low sample size, particularly with woodwind and brass, he suggests
these results be viewed with caution.
The fourth phase of the study included the administration of a Music Performance
Questionnaire (MPQ). This questionnaire consisted of the Flow State Scale-2 (FSS-2),
self-ratings of skill level, challenge of performance, and overall quality, a rating of
frequency regarding participation in assessment performances and solo/ensemble
experience, the number of years of experience on their instrument, and demographic
information. The FSS-2 was developed by Jackson and Marsh (1996) according to a model proposed by Csikszentmihalyi (1990). A total of 373 participants completed the self-evaluation questionnaire. Each MPQ was completed immediately after either the midterm or end-of-year jury performances.
Results from the investigation of flow state and its influence on music performance were significant. Structural equation modeling and a
multivariate analysis indicated a strong nonlinear relationship. Participants experiencing a
high state of flow scored higher than those participants experiencing a low state of flow.
This is consistent with the findings of Csikszentmihalyi (1975) who suggested that all
complex activities require a higher state of flow.
Wrigley (2005) suggests that a more empirical focus of student performance
assessment would have a positive impact on the development of teaching strategies to
enhance student learning and understanding. However, caution must be used in the
interpretation of these empirical tests due to the numerous sources of error variance in
musical performance evaluation. The development of criterion-specific scales paired with
a concentration on the diagnostic and feedback applications of assessment would be most
beneficial. The ecological model proposed in this study illustrates the relationship
between education institutions, community, and policy makers and their influence of the
assessment of music performance. This holistic approach has implications for promoting
an effective and beneficial method of addressing the accountability imperative in music
education.
More recently, the continued effort to improve wind band adjudication was
addressed by Johnson and Geringer (2007). The purpose of this study was to examine the
possible influences of music elements in the prediction of overall music performance
evaluations of wind band pieces. Johnson and Geringer examined evaluator assessments
of specific musical elements including balance/blend, dynamics, tone/intonation,
rhythm/tempo, and musical expression for discernible patterns of judgment. In addition,
the relationships between musical evaluations and acoustical measures of dynamics and
rhythm were investigated.
Eighty-four music students were asked to adjudicate recordings of four different
excerpts. Each excerpt contained three versions of the same piece: high school band,
college band, and professional band. Evaluations were made using a 7-point semantic
differential scale with student performance and professional performance as anchors.
Dynamic and rhythmic measures were determined using two computer software
applications that allowed for the measurement of maximum and minimum dynamic and
measurement of rhythmic note duration.
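The study does not name the two applications, but comparable acoustic measurements can be sketched with the librosa audio library; the file name is hypothetical, and the inter-onset intervals are only a rough proxy for the note-duration measurement described:

```python
import librosa

# Hypothetical recording of one excerpt.
y, sr = librosa.load("excerpt.wav")

# Dynamics: maximum and minimum short-time RMS energy, expressed in decibels.
rms = librosa.feature.rms(y=y)[0]
db = librosa.amplitude_to_db(rms, ref=rms.max())
print("dynamic range (dB):", db.max() - db.min())

# Rhythm: note durations approximated from inter-onset intervals.
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
durations = onsets[1:] - onsets[:-1]
print("mean inter-onset interval (s):", durations.mean())
```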
The results of this study report significant main effects for ensemble level and replication. The main effect of ensemble level was responsible for the greatest contribution to total variance. Professional groups were consistently rated higher than college groups, and college groups were consistently rated higher than high school groups.
A stepwise multiple regression of overall rating on balance/blend, dynamics,
tone/intonation, rhythm/tempo, and musical effect was performed to determine if a
pattern existed for predicting overall ratings. Johnson and Geringer (2007) reported that a compilation of the resulting standardized coefficients indicated that musical expression accounted for 50% of the predicted overall ratings, followed by tone/intonation (22.5%), dynamics (17.5%), and balance/blend (10%). Rhythm and tempo did not demonstrate predictability
of overall ratings for any of the trials. A frequency analysis of the areas in most need of
improvement, as provided by the adjudication participants, indicated that, across all levels of performance, tone/intonation (34%) was the most frequently occurring response, followed by balance/blend (21%), musical expression (18%), dynamics (16%), and
rhythm/tempo (11%). Results from the correlations between overall performance
evaluations and acoustic measures of dynamics (r = .06) and rhythm (r = .08) indicated
no significant relationships existed.
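Arithmetically, the percentage breakdown above amounts to expressing each predictor's standardized coefficient as a share of the summed absolute coefficients. A minimal sketch, with coefficient values fabricated only so that the shares match the reported percentages:

```python
# Hypothetical standardized coefficients; only their relative sizes matter.
betas = {"musical expression": 0.40, "tone/intonation": 0.18,
         "dynamics": 0.14, "balance/blend": 0.08}

total = sum(abs(b) for b in betas.values())
for element, beta in betas.items():
    # Prints 50.0%, 22.5%, 17.5%, and 10.0%, matching the reported shares.
    print(f"{element}: {abs(beta) / total:.1%}")
```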
Johnson and Geringer (2007) state that the results of this study suggest that concentration on the musical elements of musical expression and tone/intonation would
be prudent. Teachers would be wise to focus on those aspects of performance that are
given the most weight in an evaluation. The basics of instrumental technique and musical
expression are delineating factors between the students and professionals.
Research on the identification and examination of musical performance variables
provides valuable insight into the nature of musical performance assessment. The
variables identified in this body of research include rhythm, interpretation, intonation,
tone, expression, pitch, musicality, phrasing, balance, articulation, diction, musical effect,
and dynamics (Owen, 1969; Oakley, 1972; Neilson, 1973; Oldefendt, 1976; St. Cyr,
1977; Sagen, 1983; Burnsed, Hinkle, and King, 1985; Mills, 1987; Bergee, 1995;
Thompson, Diamond, & Balkwill, 1998; Wrigley, 2005; Johnson and Geringer, 2007).
An examination of performance factor research supports the existence of a hierarchical
factor structure that consists of both technical and expressive components (Bergee, 1995;
Thompson, Diamond, & Balkwill, 1998; Wrigley, 2005).
Musical Performance Achievement as a Dependent Variable
In an effort to understand the influences on musical performance evaluation,
researchers have conducted a number of studies on musical achievement. The measures
utilized in music performance achievement research provide valuable information
regarding notions of the structure of perceived performance achievement. This section
concentrates on the factors used by these researchers to define musical achievement.
Early research on music achievement focused on auditory-visual discrimination
skills. Research conducted by Stecklein and Aliferis (1957) investigated the influence of
instrument on music achievement. Musical achievement was measured using the Aliferis
Music Achievement Test (Aliferis, 1954), a listening test which measured melodic,
harmonic, and rhythmic discrimination skills. Colwell (1963) investigated musical
achievement, defined as auditory-visual discrimination ability, in both vocal and
instrumental classrooms. The measures employed by Colwell were the Aliferis Music
Achievement Test, the Farnum Music Notation Test (Farnum, 1950), and the Knuth
Achievement Tests (Knuth, 1967). Colwell concluded that musical achievement could
easily be approximated through the use of a short sight-singing measure.
Later research began to focus on representing musical achievement with separate
factors. Hodges (1975) defined musical achievement in terms of five representative
musical factors, examining the influence of recorded aural models on the performance
achievement of beginning band students. One hundred students from fourteen band
classes participated in this study. Band classes were placed into either an experimental or
control group. Performance achievement criteria utilized for these measures included
tone quality, pitch accuracy, rhythm accuracy, tempo stability, dynamics, and an overall
performance score derived from the total of all performance skill scores.
An important study conducted by Suchor (1977) contributed toward the
development of the model of performer-controlled musical factors by defining
performance achievement using two main areas: aesthetic and technical. Suchor
investigated the influence of personality type on piano performance achievement, group
interaction, and perception of group. The judging-perceiving personality preference was
measured on the Myers-Briggs Type Indicator. Participants (N = 24) were grouped into
one of three categories: predominantly perceiving, predominantly judging, and equally
mixed. Performance achievement was scored using two variables: aesthetic-expressive
and accuracy. The aesthetic-expressive variable was represented by the dimensions of
volume, touch, and tempo intention; the accuracy variable included melody, rhythm,
rhythmic continuity, and harmony (Suchor, 1977). No effect for personality type (p = .219) was
found to influence performance achievement. However, significant differences were
found for perception (p = .019) and interaction (p = .028). Suchor (1977) suggests that
teacher flexibility is an invaluable trait in developing the problem-solving skills
of students. This study helped to define musical achievement as a combination of both
technical and expressive aspects. A study that examined the technical aspects of
performance was conducted by Schleuter (1978).
Specifically, Schleuter (1978) studied the influence of lateral dominance, sex
differences, and music aptitude on instrumental achievement. Results from this study
indicated no significant effect for lateral dominance or sex differences. However,
Schleuter did find a significant effect for music aptitude on music achievement scores.
Schleuter (1978) states that these findings possibly suggest that music achievement in the
initial stages is influenced by music aptitude. Musical achievement data were gathered
using measures of tonal skills (sense of tonality, tone quality, and intonation), rhythmic
skills (consistency of tempo beats, accuracy of meter, and melodic rhythm patterns),
instrument physical manipulation skills (finger, hand, arm dexterity, general muscle
coordination), and general instrumental music performance skills (1978). This study
helped to support the hypothesis that perceptions of technical achievement in musical
performance are represented by component factors that include tone, rhythmic accuracy,
and intonation.
Another study by Zdzinski (1993) supports the definition of musical achievement
as a combination of both technical and musically expressive aspects. In this study,
Zdzinski explored the relationships between parental involvement, cognitive and
affective student attributes, and music learning outcomes. Specifically, the effects of
parental involvement, music aptitude, grade level, and gender on musical learning
outcomes were investigated. The learning outcomes, defined as cognitive music
outcomes, performance outcomes, and affective outcomes, served as dependent variables.
Performance outcomes measured in this study were gathered using two separate
performance measures. Objective performance data consisting of note and rhythm
accuracy were measured using the Watkins-Farnum Performance Scale (WFPS; Watkins & Farnum, 1954). An
additional measure, the Performance Rating Scale Supplement (PRSS), was created by
the researcher in order to gather subjective data in addition to objective data. The factors
included on the PRSS were musicality, tone quality, intonation, and technique (Zdzinski,
1993). Interjudge reliability coefficients for the WFPS and the PRSS were reported at
.979 and .882 respectively.
Participants (N = 406) in this study included instrumental music students enrolled
in five separate band programs in rural New York and Pennsylvania. Students in these
programs ranged from grades four through twelve. The instruments played by the
participants included flute, oboe, clarinet, saxophone, bassoon, trumpet, French horn,
trombone, baritone, tuba, and percussion.
The results of this study found a significant relationship between parental
involvement and both cognitive and affective outcomes. However, the relationship
between parental involvement and performance outcomes was reported to be mixed at
best. Zdzinski (1993) reported no significance between parental involvement and
performance outcomes at the secondary level, but did find significance at the elementary
level with a shared variance of 13.8%. Grade level was also reported to account for 25%
31
of variance with performance scores. Zdzinski states these results should be viewed with
caution due to wide ranging definitions of performance achievement.
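Because shared-variance figures such as these are squared correlations, the underlying correlation coefficients can be recovered directly; a brief illustrative computation follows.

```python
# Shared variance is the squared correlation: 13.8% shared variance implies
# r is about .37, and 25% implies r = .50.
import math

for shared in (0.138, 0.25):
    print(f"shared variance {shared:.1%} -> r = {math.sqrt(shared):.2f}")
```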
Geringer and Johnson (2007) explored the influence of general performance
factors on perceptions of musical achievement. The purpose of this research was to
examine the effects of performance duration on the consistency of wind band
adjudication. Additionally, the design of this study controlled for the effects of
tempo and performance level during adjudication.
Participants in this study (N = 96) included music students enrolled in music
programs at one of three large universities in Missouri, Florida, and Kansas. Each
participant was asked to evaluate a series of eighteen listening examples. The listening
examples were presented in a series of fast and slow excerpts that varied in duration and
performance level.
The performance evaluations measured adjudicator responses to two prompts: (1)
perceived performance level and (2) perceived performance dimension in need of
attention. Perceived performance level was measured on a continuum with both student
performance and professional performance as opposing anchors. The second prompt
asked evaluators to select an area in need of improvement from a list of factors that
included balance, blend, dynamics, tone, intonation, rhythm, tempo, and expression.
Evaluators were also permitted to write in an alternate response if their evaluation did not
include a factor listed on the form.
The results of adjudicator responses indicated no significant main effect for
duration. These results support previous research by Vasil (1973) who found that
reliability and ratings were no different for performances of different durations. However,
Geringer and Johnson (2007) state that tempo did have a significant interaction with
performance level and duration. This effect is consistent with previous research that
illustrates a preference for faster tempi. Geringer and Johnson state, however, that this
result indicates an impression of performance quality rather than a preference.
Adjudicator responses also revealed that tone and intonation were the areas
indicated as in need of improvement at the high school level. Musical expression was the
area indicated as in need of improvement for the more experienced groups. These results
coincide with previous research by Johnson and Geringer (2007). Geringer and Johnson
conclude that the results of this study indicate that a consistent and reliable evaluation of
musical performance can be made rather quickly. In addition, these results illustrate the
influence of tempo, tone, intonation, and musical expression on musical performance
evaluation.
Miksza (2007) examined the relationship between practice behaviors and
performance achievement. Specifically, the purpose of this study was to examine
relationships among observed practice behaviors, self-reported practice habits, and the
performance achievement of high school wind players.
Participants in this study included 60 high school students from six different high
school music programs in Indiana and New Jersey. Each participant was monitored
during three separate practice sessions. Practice sessions were scheduled to
include preparation time, practice time, and self-evaluation time. Students were left
unaccompanied during the practice time in order to avoid any influence an observer might
have on the practice situation (Miksza, 2007). Participants were recorded once at the
beginning of each practice session and once at the end to facilitate the pre-post test
design.
Performance measures employed in this study included the objective performance
measure (OPM) and the subjective performance measure (SPM). The OPM was an
adaptation of the Watkins-Farnum Performance Scale (Watkins & Farnum, 1954). The
OPM adapted the WFPS to evaluate errors in pitch, rhythm, dynamics, and articulations
in terms of numbers of beats performed incorrectly.
The SPM was utilized as a measure of the subjective aspects of performance not
evaluated by the Watkins-Farnum Performance Scale (WFPS). This measure was an
adaptation of the Performance Rating Scale Supplement (PRSS) scale originally
developed by Zdzinski (1993). The constructs measured on the SPM included etude
specific criteria (execution of dynamics, etc.), interpretation/musical effect,
tone/intonation, and technique/articulation. Miksza (2007) reports internal consistency
estimates for the SPM of up to .98.
Miksza (2007) states that the results from this study support previous research by
Miksza (2006) who found a lack of correlation between the time spent practicing and
musical achievement. He suggests that the quality of a practice session is possibly more
influential than the quantity of practice time. Instructors can influence practice quality by
focusing on specific performance dimensions for individual practice. The results of this
study also lend support to the identification, utilization, and influence of individual
performance factors such as pitch, rhythm, dynamics, interpretation/musical effect,
tone/intonation, and technique/articulation.
Research on musical achievement has investigated the effect of both musical and
non-musical influences on musical performance. This research is important because it
provides insight into musical factors considered to be representative of musical
performance, and information regarding possible sources of error in performance
assessment. The musical factors represented in this section include tone, technique,
volume, pitch, rhythm, melody, harmony, touch, intonation, musicality, expression, and
tempo (Hodges, 1975; Suchor, 1977; Zdzinski, 1993; Geringer and Johnson, 2007;
Miksza, 2007).
Adjudicators and the Adjudication Process
Along with investigations that examine the influences on musical achievement,
researchers have also conducted many studies regarding the process of evaluating
musical achievement. These researchers have not only contributed valuable information
concerning the process of performance evaluation, but have also provided further support
and information regarding the musical factors that influence performance
evaluations. The musical factors used in adjudication research provide valuable
information regarding aspects of performance considered to be predictive of overall
performance quality.
Research conducted by Fiske (1975) sought to determine what differences exist, if
any, between brass and non-brass specialist evaluations of trumpet performance. Musical
performances were evaluated on four separate factors including intonation, rhythm,
interpretation, and technique. In addition, an overall category was also utilized. The
results of this study indicated no significant difference between the evaluations made by
brass and non-brass specialist adjudicators. Fiske (1975) concludes that brass
adjudication can be equally served by both brass and non-brass specialists.
In 1979, Fiske investigated the influence of performance achievement and non-performance achievement on assessments of musical performance. Performance
achievement was measured using applied performance grades. Non-performance
achievement was measured using music theory and music history grades. Fiske (1979)
employed a test-retest design. Adjudicators were asked to listen to and evaluate
recordings of trumpet performances (N = 40). Performances were evaluated using a
rating sheet that included five separate performance categories: intonation, rhythm,
technique, phrasing, and overall. Fiske reports sub-scale reliabilities between .60 and .63
for rhythm, phrasing, and technique. Sub-scale reliability for intonation was reported at
.46. Fiske concludes that no significant relationships exist between performance ability
and judge reliability, and performance ability and non-performance achievement.
In an effort to continue exploration of the influence of the adjudicator in the
adjudication process, Duerksen (1972) studied the effects of adjudicator expectation on
evaluations of recorded musical performances. Music majors and non-music majors (N =
517) served as participants in this study and were randomly assigned to either the control or
experimental group. Each participant was asked to evaluate two recordings of identical
piano performances. Participants were not informed of the test-retest design, but instead
were told that one of the performances was that of a professional and the other was a
student seeking entrance into a university piano performance program. Performances
were rated using separate scales for pitch accuracy, rhythmic accuracy, appropriateness
of tempo, appropriateness of accent, dynamic contrast, tone quality, interpretation, and
overall quality.
The results of this study indicate a significant difference (p < .01) between the
control and experimental groups for all trait measures of musical performance (Duerksen,
1972). This suggests that adjudicator expectation has a significant influence on the
outcome of performance assessments. Duerksen also reported that the objective aspects
(i.e., rhythm and pitch) of performance were no more influential than the subjective
aspects (i.e., interpretation and overall effect). Additionally, no significant difference was
found between the evaluations of music majors when compared to non-music majors.
Duerksen’s research supports the influence of both technical and expressive factors in
predictions of overall performance quality.
A number of studies have also examined the influence of adjudicator experience on
the process of performance adjudication. A study by Schleff (1992) sought to determine
if a difference exists between the judgments of undergraduate music students and
professional music critics with regards to the quality of recorded music performances.
Specifically, Schleff attempted to determine the extent to which the judgments of
undergraduate music students and professional musical critics conformed when
evaluating two styles (Classical and Romantic) of prerecorded instrumental, piano, and
vocal performances.
A total of 117 performances were selected from a pool of more than 700
performance reviews. The recordings selected for inclusion in this study ultimately met
two criteria: (1) a minimum of three published reviews, (2) either a superior or inferior
performance quality evaluation from more than half of the reviewers. Reviews that were
deemed neither inferior nor superior were discarded. Excerpts of the reviewed recordings
were prepared and sent to university professors for consensus of opinion regarding
quality of performance.
Using a global rating scale, university professors evaluated the 100 performance
excerpts. Excerpts that received a consensus of opinion regarding performance quality (n
= 52) from both university professors and music critics were placed into a final pool from
which 30 randomly selected excerpts were extracted. Graduate music education students
(n = 18) participated in a pilot study that asked each participant to compare the 30
performance excerpts to their personal conception of an ideal performance. From the
results of the pilot study 28 performances were selected for inclusion.
The selected performance excerpts were played for instrumental (n = 99),
keyboard (n = 27), and vocal (n = 78) music education students (N = 204). After each
recording was played, participants were asked to provide an overall performance quality
rating on 13 performance characteristics: (1) projection and expressive import, (2)
intonation, (3) rhythmic precision, (4) appropriate rhythmic flexibility, (5) phrasing, (6)
expressive line, (7) blend with and between ensembles, (8) articulation, (9) appropriate
use of dynamics, (10) balance between ensembles and parts, (11) tone quality, (12)
technical facility of the performer(s), and (13) diction (if applicable). Each performance
characteristic was paired with a nine-point semantic differential with anchors of inferior
and superior.
The results of this study indicate that undergraduate students more often agree
than disagree with the opinions of professional music critics with regard to superior
performances. Schleff (1992) suggests that the disagreement regarding inferior
performances is probably due to the inexperience of the undergraduate musician. This
conclusion coincides with Tiede (1971) who determined that student conductors with less
experience were less critical of inferior performances than more experienced conductors.
A difference was also reported regarding the participants’ performance area.
Participants in both vocal and instrumental groups reported a lower level of agreement
than the keyboard students regardless of performance medium (instrumental, vocal, or
keyboard). Schleff also reported that keyboard participants indicated a higher level of
confidence in their performance evaluations than both the vocal and instrumental groups.
He concludes that the ability to make critical judgments regarding performance quality is
a function of experience. This suggests that performance evaluation, whether assessed by
faculty or through self-evaluation, can be influenced by adjudicator experience.
A research study by Bergee (1997) explored assessment accuracy in peer evaluations
and self-evaluations. The purpose of this study was to further explore the consistency and
accuracy of peer evaluation and self-evaluation of end-of-semester applied music
performances. Bergee attempted to provide evidence that the results obtained in previous
research by Bergee (1993) applied to areas beyond brass performance. For this
investigation the emphasis was expanded to include voice, percussion, string and wind
instruments. Specifically, this study examined the following research questions: (1) What
is the interjudge reliability of faculty and peer evaluations of undergraduate applied
voice, percussion, wind, and stringed instrument end-of-semester performances? (2) To
what extent do faculty, peer, and self-evaluations of undergraduate applied voice,
percussion, wind, and stringed instrument end-of-semester performances intercorrelate?
(3) Are there differences in ability to self-evaluate among different performance
concentrations (voice, percussion, etc.) or between two levels of performance
achievement?
Applied music faculty from three universities who were normally responsible for
evaluating end-of-semester performances were recruited for the purposes of this study. In
addition, participants representing each of the performance categories (voice, percussion,
brass, woodwind, and strings) enrolled as music education or music performance majors
were recruited from the same universities. The end-of-semester performances took place
over one or two days. Each performance was videotaped for the self-evaluation portion
of the study.
Performances were measured using the categories found on the Music Educators
National Conference (1958) solo adjudication forms. Performance categories for voice
included tone, intonation, diction, technique, interpretation, and musical effect.
Percussion categories included tone, mallet/sticking technique, body and hand position,
interpretation and musical effect. Wind and stringed instrument performance categories
included tone, intonation, technique, interpretation and musical effect. An additional
category of articulation was added to the wind instrument category per the request of the
faculty.
Following the end-of-semester performances, student participants received a copy
of the videotaped performances from their respective institutions. Each participant was
asked to evaluate each performance (including their own) as objectively as possible.
Participants were allowed to play back each performance at their leisure (Bergee, 1997).
The results of this investigation indicate that faculty total and subscale interjudge
reliabilities were uneven. Student peer group total and subscale interjudge reliabilities
were more uniform and consistent. Faculty and peer correlations for total and
subscale scores were high. Self-evaluations correlated poorly with both faculty and peer
evaluations. No significant difference was reported to exist between performance levels.
Bergee suggests that training be considered to alleviate the issue of uneven faculty
interjudge reliability. He states, however, that this instability is also probably due in part
to the different methods used to evaluate the performances. Since the facet-factorial
method of scale development has demonstrated high reliabilities, further consideration to
the development of facet-factorial rating scales for all performing media would benefit
music performance measurement.
Bergee (1997) also points out that the poor correlation of self-evaluation with
both peer and faculty evaluations is consistent with Bergee (1993). This situation could
be improved with an open and supportive dialogue that discusses these discrepancies and
the implementation of self-evaluation techniques shown to be effective in teacher training
(Duke, 1987; Prickett, 1987; Arnold, 1995).
The ability to listen critically is an invaluable skill for musicians and performance
adjudicators alike. A fifth study in a line of research conducted by Geringer & Madsen
(1998) attempted to determine whether musicians demonstrate consistent listening
patterns when listening to music. Previous studies focused on intonation, tone quality or
both simultaneously during listening (Madsen, Geringer, & Heller, 1991, 1993; Madsen
& Geringer, 1998). These studies demonstrated the participating musicians’ ability to
focus on one or more performance dimensions and successfully discriminate between
both good and bad performances. Another important aspect of this line of research was
the examination of continuous response versus paper-and-pencil rating scale response
modes. The results of this inquiry concluded that the response mode made no significant
impact on the outcome of the listening responses.
The focus of this investigation was to determine whether or not musicians could
attend to other musical aspects during listening. Specifically, participants were asked to
rate performance excerpts in the categories of phrasing/expression, intonation, rhythm,
dynamics, tone quality, and overall performance. Undergraduate and graduate (N = 48)
students enrolled in music theory, group piano, and music education courses at a southern
university were recruited to participate (Geringer & Madsen, 1998).
The recordings used for this research were those recorded and used in the
four previous studies. Four performers (soprano, tenor, violinist, and cellist) were
employed to record both good and bad excerpts of Schubert and Gounod’s versions of
“Ave Maria.” Participants were randomly assigned to an accompaniment group (n = 24)
and an unaccompanied group (n = 24). The unaccompanied examples did not include a
piano accompaniment.
Participants were asked to evaluate each recorded example using six performance
dimensions. Each dimension was accompanied with a five-point Likert scale ranging
from 1 (representing a poor performance) to 5 (representing an excellent performance).
An overall global rating was also assigned to each performance. At the end of
each excerpt participants were asked to respond to one question, “What aspect of
performance does this student need to improve most?”
The results demonstrated that participants consistently and clearly discriminated
between the good and bad performances. In response to the end question, intonation was
the most frequently identified category in need of improvement for the bad performances (78%). For
the good performances, responses were almost evenly distributed across all performance
categories (Geringer & Madsen, 1998).
These results also extend the findings within this line of research by
demonstrating that consistent discriminations could be made between both good and bad
performances across several performance dimensions that included: phrasing/expression,
intonation, rhythm, dynamics, tone quality, and overall performance. Geringer and
Madsen (1998) also conclude that intonation and tone quality are important when
evaluating a performance.
Kim (2000) focused on whether or not inexperienced judges are as consistent as
experienced judges. This issue of consistency in piano performance evaluation was
examined under four separate conditions: (1) using a rating scale, (2) using both rating
scale and musical score, (3) neither rating scale nor musical score, (4) using musical score
only. Additionally, this study attempted to determine which condition encourages the
highest reliability.
Participants in this study included experienced university level piano instructors
(n = 3) and doctoral level piano students (n = 3). The piano instructors were recruited as
“experienced” evaluators. The doctoral level students were recruited as “inexperienced”
evaluators. The performing participants were all undergraduate piano majors enrolled in a
music conservatory in New York City. All musical selections performed were from the
Romantic era and were selected based upon similarity in both musical and technical
content.
A total of five pieces of musical literature were selected for this study. Each
performance was recorded and transferred to tape. Participants in both adjudication
groups were asked to evaluate each performance under each of the four conditions.
Performances were evaluated using the Piano Performance Evaluation Rating Scale
(PPERS). The PPERS measures performance achievement across eight performance
dimensions: tempo, rhythm, articulation, technique, interpretation, dynamics,
tone/pedaling, and memory.
The four conditions used to evaluate the music performances yielded several
interesting results. The condition that used the musical score as the sole
evaluation criterion demonstrated the greatest impact on judge consistency. These results are
contrary to those of Wapnick et al. (1993), who suggested that the use of musical scores
for performance evaluation has no effect on judge consistency. Kim (2000) concludes
that the evidence for rating scales producing increased reliability and
consistency in performance evaluation is still inconclusive. This conclusion is contrary to
research by Abeles (1971) which suggests that employing rating scales for performance
evaluation will improve interjudge reliability.
The results of this study suggest that both experienced and inexperienced judges
can achieve acceptable levels of interjudge reliability (above .88 and .60 respectively).
Both experienced and inexperienced adjudicators demonstrated consistency in evaluating
high-level piano performance. Kim (2000) suggests that the lower reliability scores
exhibited by the inexperienced judges are indicative of the deficiencies in the traditional
piano curriculum. In addition, these results suggest that even though experience is shown
to be significantly influential (Duerksen, 1972), the size of the effect may not be too
great.
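Interjudge reliability coefficients such as those reported by Kim are commonly estimated by treating judges as items and performances as cases; the following sketch computes Cronbach's alpha in that manner using made-up ratings, as one plausible illustration rather than Kim's actual procedure.

```python
# Interjudge reliability as Cronbach's alpha across judges; ratings are
# hypothetical (rows = performances, columns = judges).
import numpy as np

scores = np.array([
    [28, 30, 27],
    [22, 21, 24],
    [35, 33, 34],
    [18, 20, 19],
    [30, 29, 31],
], dtype=float)

k = scores.shape[1]
item_vars = scores.var(axis=0, ddof=1).sum()  # sum of per-judge variances
total_var = scores.sum(axis=1).var(ddof=1)    # variance of summed ratings
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(f"interjudge alpha = {alpha:.2f}")
```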
Performance adjudications can be influenced by many variables. Bergee (2003)
investigated several commonly occurring factors that have raised questions regarding
their influence on adjudication reliability, examining the interjudge reliability of
faculty evaluations of end-of-semester jury performances in applied music.
Specifically, Bergee attempted to investigate the effects of three separate circumstances
commonly encountered in jury evaluations: variability in size of adjudication panel, mode
of evaluation employed, and adjudicator experience.
In an effort to collect the most reliable data from the end-of-semester
performances, Bergee employed previously developed performance measures for brass,
woodwinds, strings, percussion, and voice. Each of these measures was created using
factor analysis as the method for selecting both items and factor structure. Bergee used
this same method to develop a measure of piano performance for use in this study. Factor
analysis has been demonstrated as a viable method for creating valid and reliable
measures for instrumental performance (Abeles, 1971; Bergee, 1986; Zdzinski & Barnes,
2002; Russell, 2007).
Bergee (2003) revised each performance measure to include only three
representative items per performance dimension. This revision accommodated the limited
amount of time allotted for each performance evaluation. Items with the highest factor
loadings were selected for inclusion. Each subscale was examined for items that clearly
represented distinct performance aspects. Redundant items were replaced. All items were
paired with a five-point Likert scale that ranged from SD (Strongly Disagree) to SA
(Strongly Agree).
In addition to the performance measure, each adjudicator was asked to assign an
overall letter grade for the performance ranging from A+ (excellent performances in all
aspects) to F (exceedingly poor performance in all aspects). On a separate form
evaluators were asked to indicate what position they held at the institution and how many
years of experience they had evaluating jury performances.
Participants in this study included brass (n = 4), percussion (n = 2), woodwind (n
= 5), voice (n = 5), piano (n = 3), and string (n = 5) faculty members and teaching
assistants who were slated to evaluate the end-of-semester jury performances for graduate
and undergraduate music education majors and minors. Each participating adjudicator
was briefed on the use of the measures.
The results of this study indicated that interjudge reliability remained stable on all
total scores, subscale scores, and global letter grades gathered during this study. These
results support previous research by Fiske (1975, 1977) that demonstrates an increase in
stability with an increase in panel size. Bergee (2003) attributes the lower total score
reliabilities, relative to those reported in previous measurement development studies, to
the small number of items used in each subscale. An increase in the number of items would
increase the total score reliability, but would negatively impact the amount of time
needed to evaluate each performance.
Adjudicator experience, in conjunction with results from Fiske (1975) and Kim
(2000) and in contrast to Duerksen (1972), seemed to have no effect on the outcome of
the performance evaluations (Bergee, 2003). Participants anecdotally stated that this had
to do with the relationships between more experienced faculty and new faculty.
Experienced faculty members help newer faculty members become comfortable with the
process of the jury evaluations.
The issues surrounding performance assessment have prompted some
researchers to question the stability of music assessment as a viable research tool.
Research by Thompson and Williamon (2003) addressed conceptual and practical
problems regarding the use of performance assessment as a research tool. The questions
this research intended to answer were: To what extent do the evaluators’ marks concur? In
what ways do they differ? To what extent is the system capable of reliably discriminating
between features of the performance?
Participants in this study volunteered to perform for up to 15 minutes. All
participants were enrolled at the Royal College of Music (N = 61). The instrument
categories represented in this student population were keyboard (n = 15), woodwinds (n =
10), strings (n = 24), and other (i.e., harp, guitar, brass, and voice) (n = 12). Each student
was videotaped performing two contrasting pieces before a panel of adjudicators. The
videotapes were then sent to a panel of external evaluators (n = 3) who evaluated each
performance using a segmented marking scale.
The measurement instrument consisted of four main areas: overall quality,
perceived instrumental competence, musicality, and communication. Each of these categories
contained representative items paired with a scale that ranged from 1 to 10. Adjudicators
were asked to consider both performances when assessing overall quality.
The data were analyzed using a nonparametric statistical method (Spearman’s
rho) to calculate the correlation coefficients between each combination of evaluators. The
results demonstrated a moderate positive correlation over the complete set of
performances (mean ρ = 0.480, range 0.332-0.651, p < .05). This accounts for
approximately 25% of the observed variance. These results do not support previous
findings by Fiske (1977) who suggested that overall judgments are more reliable across
evaluators than segmented evaluations of separate performance aspects. However,
Thompson and Williamon (2003) indicated that these results may not be comparable due to
methodological differences. Further analysis of the data indicated a high degree of
multicollinearity. Using the three main categories of perceived instrumental competence,
musicality, and communication as predictors for the overall quality mark, the data
revealed a high correlation among all sets of variables.
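The analysis described above can be sketched as follows: Spearman's rho is computed for each pair of evaluators, and the mean coefficient is squared to estimate the variance accounted for. The marks below are hypothetical, not Thompson and Williamon's data.

```python
# Pairwise Spearman correlations among evaluators; data are hypothetical.
from itertools import combinations
import numpy as np
from scipy.stats import spearmanr

marks = {  # marks from three evaluators over six performances
    "judge_a": [62, 71, 55, 80, 66, 74],
    "judge_b": [60, 68, 59, 77, 70, 72],
    "judge_c": [65, 66, 52, 82, 61, 78],
}

rhos = []
for a, b in combinations(marks, 2):
    rho, p = spearmanr(marks[a], marks[b])
    rhos.append(rho)
    print(f"{a} vs {b}: rho = {rho:.3f} (p = {p:.3f})")

mean_rho = np.mean(rhos)
print(f"mean rho = {mean_rho:.3f}, shared variance ~ {mean_rho ** 2:.0%}")
```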
Thompson and Williamon (2003) suggest that the interjudge reliability of
performance assessment is at best moderate, and the lack of a uniform performance
assessment research tool makes research results difficult to compare. The authors note the
possible sources of error in their evaluation process, but they also point out that this
scenario is representative of a realistic assessment situation and is no less reliable than a
controlled research environment. A measure developed from the standardization,
identification, and definition of the categories that influence performance assessment is
possible. However, the downside is the amount of time evaluators must spend becoming
acquainted with such an instrument. In conclusion, music performance
assessment is simply not open to reliable and consistent scrutiny (Thompson and
Williamon, 2003).
Research on musical performance adjudication supports the ability to reliably
evaluate musical performance regardless of adjudicator experience or primary instrument
(Fiske, 1975; Schleff, 1992; Bergee, 1997, 2003; Geringer & Madsen, 1998). The
relevance to the current study is the support for the utilization of a participant sample that
includes a wide array of people with collegiate or professional experience in music. This body
of literature also provides support for the use of individual performance factors to
evaluate musical performance (Bergee, 1997, 2003; Thompson & Williamon, 2003). The
performance factors represented in the adjudication literature include intonation, rhythm,
technique, phrasing, pitch accuracy, tempo, accent, dynamics, interpretation, tone, blend,
articulation, balance, musical effect, memory, musicality, and communication. In
addition, this research demonstrates the assessment of both technical and expressive
aspects of musical performance.
Performance Aspects of Musical Expression
Research on expression in musical performance has provided a better
understanding of the subjective aspects of musical performance quality. Early research on
musical expression focused on the influence of musical elements on perceived expression
in music performance. Research by Gatewood (1927) compared rhythm, melody,
harmony and timbre to the reported musical effects of thirty-five female participants.
Participants listened to ten separate musical selections and answered three prompts
regarding the most prominent musical element, reason for preference of selection, and
emotion associated with the musical selection from a list of 12 adjectives. The results of
this study illustrated rhythm as the factor most associated with feelings of excitement or
stir. Melody was associated with feelings of rest and seriousness. Both harmony and
timbre showed no definite associations among the participating listeners.
In a related study with a more elaborate statistical treatment, Gundlach (1935)
utilized a list of 17 adjectives to describe the expressive aspects of musical excerpts. A
factor analysis of the participant responses produced four separate factors: dynamics,
tonality, motility, and an unnamed factor. Gundlach concluded that quality music may
elicit predictable emotional reactions from some listeners. This research laid the
foundation for understanding the effects of certain musical elements on the perceived
quality of musical expression in musical performance.
A series of studies by Hevner (1935, 1936, 1937, and 1938) sought to answer the
question of whether a clearly apprehended meaning in music, one that occurs with high
frequency, exists. A series of six experiments studied the influence that changes in
modality, rhythm, tempo, harmony, melody, and pitch had on the perceived expression of
music. Hevner (1938) reported tempo as the factor of greatest importance in determining
expressive response. The succession of the remaining variables in order of importance
was reported as follows: modality, pitch, harmony, and rhythm. Melody, descending or
ascending, was reported to carry practically no expressive meaning. Hevner concludes
that the results of these studies indicate a “uniformity and consistency in the
apprehension of musical meaning” (Hevner, 1938, p. 207).
The results of Hevner’s line of research suggest that a clearly apprehended
meaning in music can indeed be ascertained with relative consistency. Additionally,
Hevner provided a hierarchical model of expressive elements in musical performance.
Research by Hoffren (1964a, 1964b) built on Hevner’s line of research (1935, 1936,
1937) in an effort to measure the abilities of secondary students to discriminate
expressive phrasing in musical performance. Expressive phrasing for the purposes of this
study was defined as the application of musical elements identified as rubato,
smoothness, articulation, phrasing, unity, continuity, dynamics, and dynamic
accentuation. Hoffren (1964a) states that “standards of value do exist and all applied
music teaching is based on the premise that there are both acceptable and unacceptable
expressive performances” (p. 32).
Levi (1978) sought to establish a Gestalt concept of musical expression in music
education. He argued that previous research on the subject of musical expression lacked a
solid base of explanatory concepts. This deficiency “…has prevented perceptual concepts
and processes from receiving full consideration in determining the educational role of
musical expressiveness” (Levi, 1978, p. 425).
Issues with musical expression arise when listeners use emotional terms to
describe their perceptions of musical performances. Levi states that these emotional terms
refer to specific musical expressive qualities within the perception of a performance.
Gestalt theory identifies two defining characteristics of expressive qualities: (1)
emotional attributes applied to perceptions, (2) particularity of expressive qualities is
determined by the structure of the perceived event (Levi, 1978). Levi asserts that a
conceptual economy is achieved “since expressive qualities depend on perceptual
structure, no special or additional processes need be hypothesized to account for the
applicability of emotional terms to music…nothing extra-musical or non-auditory is
asserted” (p. 427). This concept of expressiveness allows for emotional descriptions of
music without presupposing the involvement of any non-auditory features, and
categorizes these descriptions as indicators of features within the musical performance.
Levi (1978) concludes that musical expression is an integral part of the
elementary processes of musical perception. Students who are exposed to educational
experiences in musical expression will be provided insight into timeless life-values. The
support for the application of expressive qualities to musical perception is demonstrated
in the consistency of emotional attributions to musical performance. This point is
supported by research conducted by Juslin (1997b) who found that communication of
basic emotions through musical performance is reliable regardless of response format.
In a series of two experiments, Juslin (1997b) compared listener descriptions of
emotions in musical performance using two different response formats. Participants in the
first experiment (N = 15) were asked to respond to ten different interpretations of the
melody Nobody Knows by Stephen Foster. Juslin notes that the different interpretations
of happiness, sadness, fear, anger, and love/tenderness were performed on both the
electric guitar and electronic sequencer. The performances achieved the different
emotional results by manipulating musical performance variables such as tempo, timbre,
amplitude, and rhythm (Juslin, 1997b).
The response sheet contained 16 different terms that included two intensities of
the following: anger, happiness, sadness, fear, disgust, and love. The remaining four
terms were interest, curiosity, desire, and jealousy. The results of the experiment
demonstrated an agreement regarding the general emotion being communicated, but not
the intensity. This forced-choice format was compared to an open-ended qualitative
response format in Experiment 2.
The second experiment asked volunteer participants (N = 12) to judge which
emotion a musician was attempting to communicate. Each participant was instructed to
select two emotional terms that described the performance. The performances utilized in
Experiment 2 were the same as those used in Experiment 1. Results of the second
experiment confirmed a listener agreement regarding the general emotional character of
the musical performances, but also indicated low levels of agreement regarding the
intensity of the emotion communicated.
Juslin (1997b) suggested that a multi-method approach to investigating expression
in musical performance might provide the answers that researchers have been looking
for. The results of this study demonstrate the ability of performers to reliably
communicate emotions through performance, and the ability of listeners to reliably
perceive this emotion regardless of response format. Gabrielsson (1999), who
investigated the expression and communication of specific emotions through
performance, reported similar results.
Gabrielsson (1999) investigated the ability to communicate what was defined as
basic emotions (happiness, sadness, anger, tenderness, fear, solemnity, and no
expression) through musical performance. The instruments used for this study included
violin, flute, saxophone, electric guitar, percussion, synthesizer, guitar rock band, and
singing voice. Performers in this study were reported to manipulate performance
variables including tempo, timing, amplitude/dynamics, intonation, timbre, tone onsets
and offsets, and vibrato. Each performance consisted of an interpretation of What Shall
We Do with the Drunken Sailor and was rated on a scale from 0 (minimum) to 10
(maximum).
Gabrielsson (1999) reports that performers were indeed able to communicate general
emotions through performance. An analysis of variance supplemented by post hoc
comparisons illustrated a significant difference between intended and non-intended
emotions (p < .05). The mean ratings for the perceived emotions indicated that
performers were able to communicate happiness, anger, and fear reliably whereas
listeners often confused sadness and tenderness. The “no expression” category received
ratings on all emotions ranging from 4.1 to 7.7 suggesting that a melody without an
intended emotional interpretation can elicit a wide range of emotions from listeners.
The previously presented research studies on musical expression by Levi (1978),
Juslin (1997b), and Gabrielsson (1999) support the consistency of emotional
communication and perception through musical performance. This consistency applies
only to broad emotional categories (i.e., happiness, sadness, anger). Finer distinctions
between specific complex emotions (i.e., jealousy, shame) are not reliably communicated
(Juslin, 1997a; Juslin & Laukka, 2004). A meta-analysis of 41 studies concerning
emotional expression by Juslin and Lindstrom (2003) illustrates the ability of
professional musicians to successfully communicate five basic emotions: happiness,
anger, sadness, fear, and tenderness.
Juslin and Laukka (2004) state that the inability to communicate specific
emotions through musical performance is due in part to the redundancy of musical
features involved in communicating emotions. The redundancy of musical features limits
the complexity of emotions to be communicated. Emotions in music are expressed
through the manipulation of musical features including tempo, mode, harmony, tonality,
pitch, micro-intonation, contour, interval, rhythm, sound level, timbre, timing,
articulation, accents, tone attacks and decays, and vibrato (Juslin & Laukka, 2004). Some
of these features are manipulated at the compositional stage (i.e., mode and harmony),
while the performer manipulates other features, such as tempo and timbre, in real time.
These musical features combine to form various configurations that represent different
emotions.
Understanding these configurations of musical features has implications for music
education and the ability to improve expressive performance in music students. Juslin and
Lindstrom (2003) present an expanded version of the Brunswik lens model (Brunswik, 1952) that
illustrates the direct effects and interactions of both composer and performer cues on the
communication of emotions to listeners. The direct effects explain approximately 75-85%
of the variance in emotional judgment. A simultaneous regression of listener happiness
ratings on composer cues, performer cues, and their interactions reports that these
independent variables account for 90% of the variance in listener happiness (adj. R² = .90).
The most influential musical features indicated in the model are mode (β = -0.73), tempo
(β = 0.55), and the cross product of mode and tempo (β = 0.16). This model provides
valuable information regarding emotional communication to listening audiences, but
more research needs to be conducted.
A questionnaire study conducted by Juslin and Laukka (2004) helped to shed light
on expression, perception, and induction of emotional responses to music in the context of
everyday life. The purpose of this study was to gather information regarding the social
context of music listening. Specifically, this study attempted to explore the possibility of
social context influences on the perceptions of intended emotional communication by
performers. A total of 141 music listeners from Sweden participated in the study,
including approximately 51% trained musicians and 49% untrained musicians.
Results from this study yielded information regarding listener ideas about a
definition of musical expression, a hierarchy of musical virtues/features, the extent and
content of musical communication (e.g., listener and performer connection), and the basis
of musical judgments about musical expression. Listeners defined musical expression as
a communication of emotions and/or ideas. A ranking of the relative importance of
musical virtues/features indicated a hierarchy of listener values: composition, expression,
and uniqueness of sound. More musician than non-musician participants indicated that
technical skills are an important aspect of musical expression (Mann-Whitney’s U-test, z
= 2.25, p < .05). Responses to several different prompts also indicated that the majority
of listeners feel that music and musicians communicate emotions through
performance. A total of 74% of listeners indicated that judgments of expression in music
are based upon musical elements (Juslin & Laukka, 2004). Overall, Juslin and Laukka
state that these results suggest that music, depending on the listener, may induce
emotions beyond the basic emotions. As for the influence of social context, this topic still
needs to be researched further.
The studies regarding expression in musical performance provide important
information about the nature of emotion in music. The focus of the current study is the
performer-controlled musical features that influence the communication of
expression in music. An examination of the research presented in this section yields a
convergence of concepts concerning the performer-influenced aural aspects of musical
performance. A representative model of these musical aspects is found in the next
chapter.
Development of Musical Performance Measures
Research studies on the development of performance measures provide support
for the ability to objectively measure musical performance accurately and reliably.
Watkins (1942) conducted the earliest significant research study that attempted to
improve performance measurement. Watkins created an objectively-scored measure for
cornet performance from a content analysis of 23 well-known cornet methods. Sixty-eight
exercises of varying difficulty were then created from the content analysis.
Volunteer participants (N = 105) were selected and ranked the exercises according to
difficulty. From these exercises, twenty-eight exercises were evenly distributed between
two separate forms. Test-retest and equivalent forms reliability coefficients were reported
to be .953 after administration of both forms of the test to 71 volunteer participants.
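An equivalent-forms reliability coefficient of this kind is the correlation between the same performers' scores on the two parallel forms; a minimal sketch with invented scores follows.

```python
# Equivalent-forms reliability as the correlation between two parallel forms;
# scores are hypothetical.
from scipy.stats import pearsonr

form_a = [41, 35, 52, 28, 47, 33, 50, 39]
form_b = [43, 34, 50, 30, 45, 35, 52, 38]

r, _ = pearsonr(form_a, form_b)
print(f"equivalent-forms reliability r = {r:.3f}")
```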
Another early series of studies by Gutsch (1964, 1965) attempted to improve
performance assessment by creating an objective performance measure. Gutsch
constructed an objectively based solo rhythmic sight-reading performance measure. The
measure was based on a system of rhythmic construction introduced by Schillinger
(1946). This system employed nineteen mathematical equations to generate 300 rhythmic
figures. The rhythmic figures were examined for redundancy and the remaining 200
figures were arranged and ordered according to difficulty, and separated onto two
equivalent forms.
Data gathered from participant evaluations (N = 771) indicated an equivalent
forms test reliability score of .92. A random sample of 81 evaluations was stratified and
reordered. A modified version of the test was created and administered to 137
participants. A rank difference correlation between the stratified and random sample (n =
81) and the new sample (n = 137) was reported to be .98. These studies demonstrated the
potential for reliability in an objectively based musical performance measure; however,
due to the limited set of variables measured, the results of these studies are limited to evaluations of
rhythm.
Watkins-Farnum Performance Rating Scale. A measure developed by Watkins
and Farnum (1954) known as the Watkins-Farnum Performance Scale (WFPS) was an
extension of the measure for cornet performance by Watkins (1942). Watkins and
Farnum transposed both forms of the original cornet performance measure to facilitate
the evaluation of flute, clarinet (soprano, alto, and bass), oboe, saxophone, French horn,
tuba, trombone, and snare drum. Participants (N = 153) performed exercises using both
forms of the WFPS and were evaluated according to the provided criteria. The equivalent
forms reliability coefficient was reported at .95 for the measure. Watkins and Farnum
reported a range of criterion-related validity based on rank order correlations between .68
and .87. These coefficients differed by instrument category.
Performance evaluations using the WFPS are based on the presence of errors
within each measure. Points are deducted for errors in pitch and rhythm as well as tempo,
dynamics, articulations, and other written musical directions. Only one error can be
scored in any one measure. This binary state of correctness, present or not present, does
not account for the intensity of an error within a measure. Critics of this scale fault not
only the measure’s inability to account for the frequency of errors within one measure,
but the apparent inability to measure other essential aspects such as expression, tone, or
intonation as well (Bergee, 1987; Zdzinski, 1991).
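The one-error-per-measure rule described above can be expressed as a simple scoring function; the sketch below is a hypothetical illustration of the rule, not the published scale.

```python
# One-error-per-measure scoring: a measure counts as correct only if it
# contains no errors, however many errors it actually holds.
def measures_correct(measures: list[list[str]]) -> int:
    """measures: one list of error labels ('pitch', 'rhythm', ...) per measure."""
    return sum(1 for errors in measures if not errors)

performance = [
    [],                   # measure 1: clean
    ["pitch", "rhythm"],  # measure 2: two errors, but still costs one measure
    ["tempo"],            # measure 3: one error
    [],                   # measure 4: clean
]
print(measures_correct(performance))  # -> 2
```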
A study by Stivers (1972) examined the reliability and validity of the Watkins-Farnum Performance Scale. A total of 198 participants were randomly separated into
eight groups of approximately the same size. Groups 1-4 sight-read the WFPS twice,
groups 5-6 sight-read one form of the WFPS once and then practiced the same exercises
for one week before being evaluated, and groups 7-8 were given an opportunity to sight-read the WFPS just once.
Equivalent forms reliability and test-retest reliability coefficients were reported to
be .97. Intra-judge reliability was reported to be .98 after both scorings of the WFPS.
Inter-judge reliability was estimated between .88 and .97 for the different groups of
judges. However, concurrent validity was reported as moderate (.63) overall and low for
correlations between WFPS and both grades (.40) and contest ratings (.12). In addition,
scores from different instruments could not be compared due to inconsistencies in scoring
between groups. This is most likely due to the fact that the exercises on the two forms of
the WFPS are easier to perform on some instruments than others (Stivers, 1972). Stivers
states that the WFPS gives reliable information about note and rhythm performance, but
should always be supplemented with other ratings or comments if a true indication of
performance abilities is to be assessed.
Numerous research studies have utilized the Watkins-Farnum Performance Scale
as a means of collecting performance data. Several of these studies have addressed the
lack of variables measured by the WFPS by adding supplemental criteria for performance
dimensions such as tone quality, intonation, and phrasing. Folts (1973) studied the
relative effectiveness of employing recorded materials during practice sessions on the
performance of flute, clarinet, and cornet students. Performance achievement was rated using the WFPS plus an additional panel of judges who rated participant tone quality. In a
study of the effect of entrance age on achievement and retention in beginning band programs, Silliman
(1977) used tape recordings of student performances to rate tone quality, intonation, and
phrasing in addition to the WFPS. Abdoo (1980) studied the effects of gestalt and
associationist learning theories on the performance achievement of beginning wind and
percussion students. Similar to Folts (1973), Abdoo utilized a panel of three judges to rate
participant tone quality in addition to the WFPS.
Zdzinski (1993) conducted research that utilized the Watkins-Farnum
Performance Scale (1954) along with a researcher-designed Performance Rating Scale
Supplement (PRSS). The purpose of this study was to examine the relationships among
selected aspects of parental involvement and cognitive, affective, and instrumental
performance outcomes. Participants in this study (N = 406) included students in grades 4
through 12 enrolled in band programs located in either New York or Pennsylvania.
In order to examine participant performance outcomes, Zdzinski (1993) employed
both the WFPS and the PRSS. The Performance Rating Scale Supplement was modeled
after previously developed facet-factorial measures by Abeles (1971) and Bergee (1987)
and designed to measure musicality, tone quality, intonation and technique. Alpha
reliability coefficients were reported as .979 and .882 for the WFPS and the PRSS
respectively. The results of this study suggested that parental involvement is related to
overall cognitive, performance, and affective outcomes. Parental involvement was
specifically related to performance outcomes at the elementary levels.
Facet-factorial rating scales. The need for a greater accuracy in performance
measurement led to research that focused on a wider variety of empirically supported
performance variables. Researchers began to find objective methods of analysis that
provided the empirical support for a given factor structure. One factor analytical method
used by many researchers is called facet-factorial. This method employs factor analysis to
group items onto component variables according to a matrix of factor loadings. Studies
on facet-factorial strategies for rating scale development utilize this method to facilitate
item selection and identify performance variables.
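As an illustration of the mechanics only (simulated responses and arbitrary retention cutoffs; none of the cited studies' data), the following sketch extracts principal components from item intercorrelations, applies a varimax rotation, and retains the factor-simple items:

```python
# A minimal sketch of the facet-factorial item selection step: extract
# principal components from item intercorrelations, rotate with varimax,
# and keep items that load highly on exactly one factor.
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a factor loading matrix (Kaiser, 1958)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    var = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated * (rotated ** 2).sum(axis=0) / p)
        )
        rotation = u @ vt
        if s.sum() < var * (1 + tol):
            break
        var = s.sum()
    return loadings @ rotation

rng = np.random.default_rng(0)
responses = rng.normal(size=(100, 12))       # hypothetical: 100 ratings x 12 items
corr = np.corrcoef(responses, rowvar=False)  # item intercorrelation matrix

# Principal components extraction: top eigenvectors scaled by root eigenvalues.
eigvals, eigvecs = np.linalg.eigh(corr)
top = np.argsort(eigvals)[::-1][:3]          # retain three components here
loadings = eigvecs[:, top] * np.sqrt(eigvals[top])
rotated = varimax(loadings)

# Keep "factor simple" items: high loading on one factor, low on the others.
sorted_abs = np.sort(np.abs(rotated), axis=1)
keep = (sorted_abs[:, -1] > 0.40) & (sorted_abs[:, -2] < 0.30)
print("items retained for the scale:", np.where(keep)[0])
```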
A landmark study by Abeles (1971) examined the assessment of clarinet
performance. The purpose of this study was to investigate the application of a facet-factorial approach for constructing a measure of clarinet performance. This research was
the first to apply factor analysis to the construction of a rating scale for musical
performance.
Twenty-five students enrolled in music education courses at the University of
Maryland were asked to write one- or two-page essays describing the auditory aspects of a
good or bad performance of a junior high school clarinetist (grades 7, 8, and 9). These
descriptive statements were content analyzed for items describing performance. A
content analysis yielded 54 different descriptive statements.
The 54 descriptive statements were separated into seven a priori categories that
included tone, intonation, interpretation, technique, rhythm, tempo, and general effect.
An additional set of 40 items gathered from previous literature was also added to
the item pool. Each item was transformed into a statement usable for evaluating a clarinet
performance and paired with a five-point Likert scale. A total of 94 descriptive
statements were examined for appropriateness and positive or negative tone.
One hundred different solo clarinet performances were collected from 7th, 8th, and
9th grade clarinet students (n = 50). These recordings were separated into random groups
of two and distributed to instrumental music teachers (n = 50) from Prince Georges
County, Maryland. Each judge was instructed to listen to each recording and respond to
each of the 94 Likert statements. No time limit was imposed on the evaluation procedure,
and judges were informed that each performer was an eighth grade student enrolled in
instrumental music for three years.
The results of the item pool performance responses were subjected to a factor
analysis. A principal components method with a varimax rotation was used to produce six
underlying performance factors including interpretation, intonation, rhythm-continuity,
tempo, articulation, and tone. An analysis of the resulting factor matrix indicated thirty
items as most representative of the a priori categories. Each item selected had relatively
high factor loadings and was factor simple (low correlations with the other factors). The
thirty items were evenly distributed among the six factors and paired with a five-point
Likert scale.
To establish interjudge reliability, music teachers (n = 32) enrolled at the
University of Maryland were recruited as volunteer judges. The judges were separated
into three judge panels (n = 9, n = 12, n = 11 judges). High interjudge reliabilities were
reported for both the total score (> .90) and scale scores (> .60) for each panel of judges
for the revised Clarinet Performance Rating Scale (CPRS). The criterion-related validity (> .80) was established by correlating
scores from both the CPRS and global performance ratings. Abeles (1971) states that the
identification of factors that reflect non-idiosyncratic characteristics of clarinet
performance has the potential for utilization as a general measure of music performance.
The results of this study also demonstrate the ability of facet-factorial scale construction
techniques to produce reliable and valid measures of musical performance.
The research conducted by Abeles (1971) was followed by other studies
examining the aural aspects of music performance assessment using facet-factorial
techniques. Cooksey (1974) employed facet-factorial techniques to develop a rating scale
for high school choral groups. The resulting factors identified for choral evaluation were
diction, precision, dynamics, tone control, tempo, balance/blend, and interpretation/
musical effect. DCamp (1980) constructed a facet-factorial rating scale for high school
band. The factors identified for band performance evaluation were tone/intonation,
balance, musical interpretation, rhythm, and technical accuracy. Both the Cooksey (1974)
and DCamp (1980) studies demonstrate the effectiveness of utilizing facet-factorial scales
for the evaluation of group performance.
A study by Jones (1986) attempted to expand the use of facet-factorial scales to
examine both the aural and visual aspects of musical performance through the
development of a solo vocal performance rating scale. This study was an extension of
previous research by Abeles (1971), Cooksey (1974), and DCamp (1980). The purpose of
this study was to develop a rating scale for individual vocal performance using facet-factorial techniques that would be appropriate for use in a high school choral rehearsal.
Previous performance assessments employing facet-factorial methods had focused on the
aural aspects of musical performance. Jones intended to use facet-factorial methods to
develop a measure for evaluating both aural and visual aspects of vocal performance.
Vocal performances by high school students (n = 30), ranging in age from 15
to 18 and drawn from nine different schools in Arkansas, were videotaped.
Participating students prepared solo music appropriate for contest. Wide ranges of vocal
abilities from beginner to advanced were recorded for use in the subsequent phases of the
study. Judges (n = 50) from six different states were used to adjudicate the videotaped
performances. No judges participated in more than one phase of the study.
Members of the National Association of Teachers of Singing (NATS) contributed
forty-three essays regarding good and bad vocal performances. The essays were content
analyzed and the extracted items were separated into a priori factors established by the
National Interscholastic Music Activities Commission (NIMAC): tone, intonation,
technique, interpretation, musical effect, and other factors. The item statements were
examined for appropriateness and positive or negative disposition. Each of the 168 item
statements selected for inclusion in the final item pool was accompanied by a five-point Likert scale ranging from Strongly Agree to Strongly Disagree.
Fifteen volunteer judges experienced in vocal adjudication were used to evaluate
two randomly selected performances. Each judge was given an unlimited amount of time
to adjudicate each performance using the item pool developed in the previous phase. The
results of the item pool adjudications were then submitted to a factor analysis.
A principal-components technique was used to identify the underlying factors that
relate to vocal performance. This factor analysis produced twenty-six initial factors. From
these twenty-six factors, a five-factor structure was chosen that included interpretation/
musical effect, tone/musicianship, technique, suitability/ensemble, and diction. These
factors were then rotated using a varimax rotation to produce uncorrelated factor-simple
items that adequately represent each of the underlying performance dimensions. The
results produced thirty-two items to be included in the Vocal Performance Rating Scale
(VPRS).
Interjudge reliability was calculated using three judge panels, with fifteen
judges distributed evenly among them. Using the final form of the VPRS, the total interjudge
reliability was reported as .894, .917, and .920 for each group respectively. Interjudge
reliability calculated for judge panels ranging from one to ten judges ranged from .627 to
.958. Sub-scale reliability estimates ranged from .201 to .958. Jones attributes the wide
spread in reliability to the instability of the Suitability/Ensemble factor.
Criterion-related validity was established by correlating scores from the VPRS
and scores from the NIMAC Vocal Solo Adjudication Form. Zero-order correlation
coefficients between the VPRS total scores, sub-scale scores, and the NIMAC scores
ranged from .351 to .878. A step-wise multiple regression of the VPRS sub-scale scores
and the NIMAC criterion produced a corrected R2 score of .897.
The results from Jones's (1986) research coincided with the results found in
previous studies by Abeles (1971), Cooksey (1974), and DCamp (1980). Utilizing facet-factorial scale construction techniques, a reliable and valid scale was produced. However,
judges tended to react differently to various visual components of performance such as
age, appearance, physical size, and grooming. The adjudicators' reactions to the visual
aspects addressed by the Vocal Performance Rating Scale may have contributed to the instability of the
measure (Jones, 1986).
Bergee (1987) conducted facet-factorial research on the development of a rating
scale for low brass performance. This study was a replication of previous research by
Abeles (1971) on woodwind performance. The purpose of this research was to develop an
empirically valid and reliable rating scale for the evaluation of tuba and euphonium
performance. In addition, Bergee intended to identify the factors that contribute to tuba
and euphonium performance, and select items that appropriately represent the
performance factors.
Essays from professional tuba and euphonium players and adjudication sheets
collected from area music teachers were used to form the initial item pool. The essays
contained descriptions of good and bad tuba or euphonium performance. Additional
statements were collected from previous research on performance assessment. A content
analysis was performed on the item statements and a total of 210 descriptive statements
were extracted. These statements were examined for redundancy and appropriateness.
Any item deemed as redundant or inappropriate was eliminated. A panel of three judges
determined the positive or negative tone of the item statements and agreed on 112
statements to be used in the final item pool.
The item pool statements were separated into the a priori categories previously
established by Abeles (1971): tone, intonation, interpretation, tempo, rhythm, technique,
and general effect. Each statement was translated into an item appropriate for tuba and
euphonium assessment, and the items were randomly ordered. A 5-point Likert scale ranging from
Strongly Disagree to Strongly Agree was added to each item statement.
A total of 100 collegiate and public school tuba (n = 50) and euphonium (n = 50)
performances were recorded and evaluated using the 112 Likert-type items. Instrumental
music teachers, University of Kansas faculty members, and graduate students served as
volunteer judges (n = 50). Each judge was asked to adjudicate two randomly assigned
performances using the item pool.
The results of the initial administrations of the item pool were factor analyzed.
The factors retained for rotation were determined by three criteria: (1) precedent established
by prior researchers, (2) eigenvalues greater than 1.00, and (3) examination of the scree plot. The selected
factors were then orthogonally rotated to obtain uncorrelated factors. The investigator
decided on a four-factor structure that includes: interpretation/musical effect, tone
quality/intonation, rhythm/tempo, and technique. The final version of the Euphonium and
Tuba Performance Rating Scale (ETPRS) included 27 items and accounted for 75.8% of
the total variance.
Three separate panels of judges (n = 10 in each) were formed to determine the
interjudge reliability of the ETPRS. The interjudge reliability for the total scores of the
ETPRS was .944 for panel 1, .985 for panel 2, and .975 for panel 3. Interjudge reliability
was also calculated for groups of judges of 1 to 20 using the Spearman-Brown Prophecy
formula. These estimates ranged from .835 to .984. Bergee (1987) states that these results
support Fiske’s (1983) contention that interjudge reliability increases to the tenth judge
and then levels off beyond this point.
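The Spearman-Brown prophecy formula projects the reliability of a single judge onto a panel of k judges: r_k = k r_1 / (1 + (k - 1) r_1). A quick sketch with an assumed single-judge reliability (not a figure from Bergee) illustrates the leveling-off pattern:

```python
# The Spearman-Brown prophecy formula; r1 is an assumed single-judge
# reliability, not a figure taken from Bergee (1987).
def spearman_brown(r1: float, k: int) -> float:
    """Projected reliability of the mean rating of k parallel judges."""
    return k * r1 / (1 + (k - 1) * r1)

r1 = 0.80
for k in (1, 2, 5, 10, 15, 20):
    print(f"{k:2d} judges: {spearman_brown(r1, k):.3f}")
# The projected values flatten noticeably after about ten judges.
```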
The criterion-related validity for the ETPRS was established in two separate
studies. The first study employed three judge panels (n = 10 in each) comprised of
instrumental music teachers. Each panel was asked to perform magnitude estimation on a
set of performances. An examination of these zero-order correlation coefficients indicates
a strong relationship between the ETPRS and the magnitude estimation (Bergee, 1987). A
multiple regression analysis using the magnitude estimation as criterion reports corrected
R2 scores ranging from .831 to .913. This indicates that the ETPRS is an appropriate
predictor of magnitude estimation criteria.
The second study used 10 judges to evaluate the recorded performances using the
aural items from the MENC (1958) adjudication form for solo wind instruments. The
interjudge reliability for the MENC adjudication form is estimated at .978. In order to
compare the results of the ETPRS to the MENC adjudication form, zero-order correlation
coefficients were calculated. The results of the zero-order correlation coefficients ranged
from .823 to .936. The results from the step-wise multiple regression indicated an R2 of .857.
The results from this study suggest that a facet-factorial approach to rating scale
development for low brass performance is a viable method for producing valid and
reliable measures (Bergee, 1987). Bergee notes that the high reliability and quantifiable
criterion-related validity are probably due to the procedure used to construct the measure.
He also suggests that these results could potentially lead to the development of a
comprehensive measure of music performance.
A study by Zdzinski and Barnes (2002) developed a string instrument
performance assessment using facet-factorial techniques. Specifically, this study sought
to: (1) identify the factors that influence string performance assessment, (2) identify the
items that best represent the performance factors, and (3) determine the reliability and
validity of the new measure. This study follows the progression of facet-factorial research
previously established by Abeles (1971), Cooksey (1974), DCamp (1980), Jones (1986),
and Bergee (1987).
The initial pool of item statements was generated from descriptive essays
gathered from string teachers, string education students, and the researchers (n = 25). The
essays consisted of descriptions of the aural aspects of good or bad stringed instrument
performances. Additional items were supplied by previously developed facet-factorial
performance rating scales (Abeles, 1971; Bergee, 1987). All item pool statements were
then placed into the a priori categories of tone, intonation, interpretation, technique,
rhythm, tempo, and general effect previously established by Abeles (1971). Upon further
examination of the item statements, Zdzinski and Barnes (2002) decided to employ an
additional category for vibrato previously established in a study by Gillespie (1997). The
item statements were examined for appropriateness and redundancy and transformed into
statements that could be used for string performance adjudication. Each of the 90 item
statements was paired with a 5-point Likert scale ranging from Strongly Agree to
Strongly Disagree and randomly ordered.
Middle school and high school string performers were recruited to record a total
of 100 string performances. Each performance was digitally recorded and averaged
approximately 31 seconds in length. Public school string teachers, university string
faculty, graduate students, and junior and senior undergraduate string education majors (n =
50) were recruited to adjudicate two randomly selected recorded performances using the
Likert-type items. During the adjudication process, judges were permitted to play each
recording as many times as needed.
The results of the adjudications using the item pool were factor analyzed. A
principal component extraction and a varimax rotation of 4 to 10 factors were used to
obtain an uncorrelated factor structure. Zdzinski and Barnes (2002) presented a five-factor
structure that included interpretation/musical effect, articulation/tone, intonation,
rhythm/tempo, and vibrato. The researchers noted that a five-factor structure was the best
fit to maintain the desired factor-simple structure. Based on the factor loadings, twenty-eight
items were selected for inclusion on the final version of the String Performance
Rating Scale (SPRS). Each factor was represented by six item statements, with
exception of the vibrato factor, which was represented by four statements.
The results from this study indicate overall interjudge reliability to be above .85
for all panels. Zdzinski and Barnes (2002) state that the subscale reliabilities, ranging
from .67 to .92, are satisfactory with the exception of low subscale reliability on vibrato
in panel three (.065). Criterion-related validity was established by comparing SPRS
scores with scores obtained from both magnitude estimations and the MENC adjudication
ballot (MENC, 1958). The zero-order correlations range from .67 to .77 between the
SPRS and the MENC ballot, and .605 to .61 between the SPRS and the magnitude
estimation scores. Content and construct validity were established through the use of
previously established categories (Abeles, 1971; Bergee, 1987) and the item generation
procedure.
The factor structure of the SPRS differs slightly from those developed by Abeles
(1971) and Bergee (1987). The factors for interpretation/musical effect and rhythm/tempo
were represented in the Zdzinski and Barnes study as well as the Abeles and Bergee
studies. However, the factors of articulation, tone, technique, and intonation were
grouped differently in each study, and the vibrato factor was only present in the Zdzinski
and Barnes study. Zdzinski and Barnes (2002) stated that the differences in the factor groupings
may be representative of the unique characteristics of wind and stringed instrument
technique.
More recently, a study by Russell (2007) examined the use of facet-factorial
techniques in the development of a rating scale for guitar performance. The purpose of
this study was to develop a valid and reliable measure usable for rating solo guitar
performance. In addition, this study intended to identify the aural musical factors that
influence evaluations of guitar performance.
Statements concerning descriptions of both “good” and “bad” quality guitar
performances were collected from guitar instructors, professors, and professional
performers. Additional statements concerning performance quality were gathered from
previous performance research conducted by Abeles (1971), Bergee (1987) and Zdzinski
and Barnes (2002). The item pool statements were examined for suitability, redundancy,
and positive/negative tone and then placed into a priori categories that included tone,
intonation, technique, rhythm, tempo, interpretation, and musical effect. The resulting 99
item statements were paired with a five-point Likert scale.
Participants in this study (N = 55) included high school, college, and professional
guitar players. Each performer was asked to perform one or two repertoire selections of
their choice. A total of 100 recordings averaging 27 seconds in length were gathered from
the participant performances. The recordings were randomly paired into groups of two
and transferred onto compact discs.
Music professors and both undergraduate and graduate music students (n = 67)
were recruited from Florida, California, and South Carolina to act as volunteer judges.
Each judge was asked to evaluate two performances using the 99-statement item pool.
The results of the 134 item pool adjudications were factor analyzed using a varimax
rotation to identify the underlying factor structure and the items that best supported each
factor. The results of the factor analysis yielded a five-factor structure consisting of
interpretation/musical effect, tone, technique, rhythm/tempo, and intonation. These
factors accounted for approximately 71% of the total variance.
To create the final version of the Guitar Performance Rating Scale (GPRS), an
examination of the factor matrix was conducted in order to select the items that were
most representative of the identified performance dimensions. A total of 32 items were
selected to be most representative of the factors of the GPRS. Cronbach’s alpha for the
GPRS was estimated at .962 for the 32-item scale.
Russell (2007) suggests that the results from this research support the results of previous
studies that demonstrate the facet-factorial approach as an appropriate method for
developing valid and reliable performance rating scales. In addition, the performance
factors identified in this study are similar to those found in previous research; this
similarity suggests that there could possibly be a factor structure that is appropriate across
all instrument groups. Zdzinski and Barnes (2002) suggested that performance
measurement might be improved by using the facet-factorial approach as well as through
the criteria-specific approach used by Levinowitz (1989), Rutkowski (1990), Azzara
(1993), and Saunders and Holahan (1997).
Criteria-specific rating scales. Rating scales that employ criteria-specific
development strategies intend to objectively measure instrumental performance while
providing specific performance feedback to both teacher and student. Criteria-specific
rating scales include descriptions of performance capability at various levels of
achievement and give students a better sense of the qualities of a good performance
(Whitcomb, 1999). Researchers investigating the development and application of criteria-specific rating scales indicate substantial reliability when evaluating instrumental and
vocal performances (Levinowitz, 1989; Rutkowski, 1990; Azzara, 1993; Saunders &
Holahan, 1997).
In an effort to increase the accuracy of evaluated performances, researchers began
to focus on the criteria used for the evaluation of performance variables. Kruth (1973)
defined specific objectives to be met in an exhaustive evaluation measure for clarinet
performance. The ten-page measure evaluated the following performance dimensions:
embouchure, articulation, breath control, playing position, technical facility, and reading
ability. Unfortunately, the length of this measure made it impractical to use in
evaluation situations where time is of the essence. The idea of informing the performer
of the criteria used for evaluation gave rise to the informative rubric format.
A research study by Levinowitz (1989) utilized the rubric format to evaluate the
rhythmic and tonal aspects of children's vocal performance. The purpose of this research
was to examine the relationship between a child's ability to sing a song containing words and
language development. In addition, this study attempted to answer whether or not a
young child can perform a rote melody with words better than a melody presented on a
neutral syllable. Participants in this study included two classes of nursery school students
(n = 35) in Fort Washington, Pennsylvania.
Each participant received music instruction for 30 minutes per week for five
months. During the course of the study each class was taught two criterion songs. One
song was performed with words and the other song was performed using the neutral
syllable “bum.” At the conclusion of instruction each participant was recorded singing
both songs.
Two judges evaluated the recordings using two researcher-designed measures
of student achievement in the areas of tone and rhythm. These five-point rating scales employed descriptions of the characteristics of differing levels of tonal
and rhythmic achievement. Interjudge reliability for the tonal rating scale ranged from r =
.78 to .93 and from r = .84 to .90 for the rhythm rating scale. Levinowitz claimed direct
validity for both rating scales due to the consistency of the adjudication scores.
The t-test results indicated no significant difference in rhythmic performance
between songs with and without words. However, the t-tests did indicate a significant
difference in tonal performance between songs performed with and without words.
Levinowitz (1989) suggested that a child could potentially perform a melody more
accurately if there is no distraction from words. Scores from the tonal and rhythmic rating
scales were also correlated with scores from the Stanford-Binet I.Q. Test and the
Peabody Picture Vocabulary Test (PPVT). The results indicated no significant correlation
between a child’s ability to perform a song with words and language development.
The results from this study suggested the existence of two separate mental
processes necessary for learning a song: (1) acquisition of melodic material and (2)
learning the text. Additional conclusions regarding the efficacy of criteria-specific rating
scales indicated the potential for high interjudge reliability, but suggested further
investigation into the validity of these measures (Levinowitz, 1989).
The trend of high interjudge reliabilities is also apparent in a study by Azzara
(1993) on the effects of improvisation instruction on the musical achievement of fifth
grade students. Participants in this study included fifth grade students (N = 66) from two
separate schools. Participants in the experimental group received instruction on
improvisation in addition to the regular class curriculum. At the end of the 14-week
instruction period each participant recorded three etudes: (1) a prepared etude, (2) a
teacher-assisted etude, and (3) a sight-read etude.
A criteria-specific rating scale was used to evaluate the recorded etudes. The
measure was designed to evaluate tonal, rhythm, and expression factors using a five-point
scale. Each successive interdependent criterion assumed that the performer had gained
proficiency at the previous level(s) (Azzara, 1993). Interjudge reliability was estimated at
.94, and no source of validity was mentioned. A panel of four judges adjudicated the 198
recordings on three separate occasions in order to rate the tonal,
rhythm, and expression factors separately.
The composite results of the tonal, rhythmic, and expression evaluations were
subjected to a two-way analysis of variance. The results indicate that students who
received instruction in improvisation obtained higher composite scores than students who
did not receive this instruction (Azzara, 1993). The performance of the criteria-specific
measure used in this study was consistent with previously developed rating scales by
Levinowitz (1989) and Rutkowski (1990). Since a source of validity was not identified,
continued investigation into the establishment of validity for criteria-specific rating scales
is necessary.
Saunders and Holahan (1997) investigated the suitability of criteria-specific rating
scales in the selection of high school honors ensemble participants. The research
questions proposed in this study included: (1) Do criteria-specific rating scales yield
adequate measurement results? (2) Do criteria-specific scales help judges discriminate
between different levels of instrumental performance? (3) Which performance factors are
most predictive of the overall score? A total of 926 woodwind and brass performers
seeking entrance into the Connecticut All-State Band and 36 judges participated in this
study.
The measure used to evaluate the student performers addressed the three sections
of audition material: solo evaluation, scales, and sight-reading. The solo evaluation
included the factors of tone, intonation, technique/articulation, melodic accuracy,
rhythmic accuracy, tempo, and interpretation. The scale section included the factors of
technique, note accuracy, and musicianship. The sight-reading portion of the measure
consisted of tone, note accuracy, rhythmic accuracy, technique/articulation, and
interpretation factors. Each factor was accompanied by continuous or additive criteria on
a five-point scale.
The results of adjudications were analyzed to determine the frequency of
response, means, and standard deviations for each performance dimension. The alpha
reliability for the combined scores of all instruments was estimated at .915. This high
alpha reliability is consistent with prior criteria-specific research studies (Levinowitz,
1989; Rutkowski, 1990; Azzara, 1993). Scores from each of the performance dimensions
were correlated to the total score. The median correlation was estimated at .73. In
addition, a stepwise multiple regression and an analysis of variance were performed to
determine the amount of variance contributed by each factor. The results indicated that all
of the performance factors contributed significantly (p < .001) to the prediction of overall
woodwind and brass performance and accounted for 92% of the variance. Saunders and
Holahan (1997) claimed that the overall pattern of correlations provides indirect evidence
of the validity for criteria-specific rating scales used in this study.
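A rough sketch of this style of analysis (simulated ratings, not the study's data, with statsmodels assumed to be available) regresses an overall score on separate dimension scores and reads off the explained variance and the significance of each dimension:

```python
# Simulated illustration of regressing an overall rating on separate
# performance-dimension scores; the coefficients and data are invented.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
tone = rng.normal(size=n)
intonation = rng.normal(size=n)
rhythm = rng.normal(size=n)
overall = 0.5 * tone + 0.3 * intonation + 0.2 * rhythm + rng.normal(scale=0.3, size=n)

X = sm.add_constant(np.column_stack([tone, intonation, rhythm]))
fit = sm.OLS(overall, X).fit()
print(f"R^2 = {fit.rsquared:.3f}")  # proportion of variance in the overall score explained
print(fit.pvalues)                  # significance of each dimension's contribution
```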
Saunders and Holahan stated that the results of this study suggest that criteria-specific rating scales are a viable means for assessing high school woodwind and brass
performance with sizeable reliability. Furthermore, the authors stated that the results of
this research indicated evidence of superior diagnostic validity. Saunders and Holahan
(1997) also suggested future research should investigate the utilization of factor-analysis
to determine the stability of the factor structure within criteria-specific rating scales.
A more recent study by Norris and Borst (2007) compared the reliability of two
choral performance measures. The first measure (Form A) was a traditional rating scale
that required adjudicators to assign a score of 1-5 for each performance dimension. The
second measure (Form B) was a rubric format that utilized descriptors that defined each
level of achievement for a given performance dimension. Both forms used the same
performance dimensions of tone, diction, blend, intonation, rhythm, balance, and
interpretation.
Participants included randomly selected SATB choruses (N = 15). Each chorus
performed one selection that was recorded onto compact discs. A panel of four judges
reviewed the recordings twice. Form A was used in the first evaluation, and Form B was
used for the second evaluation. The two judging sessions were separated by a 1.5-hour
lunch break. Means and standard deviations, as well as intraclass correlations, were
calculated from the data gathered.
The results of the t-tests indicated significant differences between the two forms
(p < .05) in all performance dimensions except interpretation (t = -1.79, p = .079). Norris
and Borst (2007) state that the intraclass overall reliability for Form B was .15 higher
than that of Form A. However, neither form provided any real information about levels of
rhythmic achievement (Norris & Borst, 2007). The authors conclude that measures
that include dimension-specific criteria are more appropriate for evaluations of
musical performance than measures without these descriptors.
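For illustration, the following sketch (fabricated ratings; the pingouin package is an assumption, not what Norris and Borst used) estimates intraclass correlations for a judges-by-choruses design like the one described above:

```python
# Fabricated judge-by-chorus ratings used to demonstrate an intraclass
# correlation (ICC); the pingouin package is assumed to be installed.
import numpy as np
import pandas as pd
import pingouin as pg

rng = np.random.default_rng(2)
n_choruses, n_judges = 15, 4
quality = rng.normal(size=n_choruses)  # latent quality of each chorus

rows = [{"chorus": c, "judge": j, "score": quality[c] + rng.normal(scale=0.5)}
        for c in range(n_choruses) for j in range(n_judges)]
ratings = pd.DataFrame(rows)

icc = pg.intraclass_corr(data=ratings, targets="chorus",
                         raters="judge", ratings="score")
print(icc[["Type", "ICC"]])  # several ICC variants; higher values = more agreement
```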
Summary of Related Literature
Research on evaluation criteria has provided valuable information regarding the
process of musical performance assessment. The literature reviewed in this section
outlines research on the identification of performance constructs, musical achievement,
adjudicators and the adjudication process, musical expression, and performance measure
development.
Studies concerning the identification of influential performance variables have
provided important insights into the overall structure of musical performance. Early
research studies on performance constructs were centered on investigating band
performance and festival rankings (Owen, 1969; Oakley, 1972; Neilson, 1973). This
early research focused on content analysis of measures and comments from band
evaluations to determine the underlying performance dimensions that were influencing
judgments of performance quality. Some of the performance dimensions identified were
rhythm, interpretation, intonation, tone, expression, pitch, musicality, phrasing, balance,
articulation, diction, musical effect, and dynamics (Owen, 1969; Oakley, 1972; Neilson,
1973; Oldefendt, 1976; St. Cyr, 1977; Sagen, 1983; Burnsed, Hinkle, and King, 1985;
Mills, 1987; Bergee, 1995; Thompson, Diamond, & Balkwill, 1998; Wrigley, 2005;
Johnson and Geringer, 2007).
As research on performance constructs continued, researchers began to shed light
on the nature of performance evaluation and the global rating. Research by Burnsed,
Hinkle, and King (1985) demonstrated high correlations between individual performance
dimensions and global ratings. These results are also supported by Mills (1987) who
suggested that performance could indeed be represented by individual performance
constructs.
An investigation by Bergee (1995) looked into the existence of higher order
musical performance factors. Three higher order factors were identified: Tone
Quality/Intonation, Musicianship/Expressiveness, and Rhythm/Articulation. This study
helps establish the importance of both technical and expressive aspects of performance.
Research by Thompson, Diamond, and Balkwill (1998) also suggests that aspects of
expression play an important role during evaluations of musical performance.
More recently, research by Wrigley (2005) investigated the musical and non-musical influences on performance evaluation. This study provides support for the
existence of two main cross-instrument factors. These factors were identified as technical
proficiency and interpretation. A study by Johnson and Geringer (2007) sought to
examine the influences of musical elements on evaluations of wind band performances.
Musical expression and tone/intonation accounted for approximately 77.5% of the
variance in wind band evaluations. The conclusions made by these researchers lend
support to a primary influence of expressive and technical aspects of musical
performance during evaluations of performance quality.
Research on musical achievement has investigated the influence of a variety of
variables on musical achievement, such as musical instrument (Stecklein and Aliferis,
1957), classroom type (Colwell, 1963), personality type (Suchor, 1977), lateral
dominance/sex/music aptitude (Schleuter, 1978), parental involvement/cognitive and
affective student attributes (Zdzinski, 1993), and practice behaviors (Miksza, 2007).
Early research on musical achievement defined musical achievement as aural
discrimination ability (Stecklein and Aliferis, 1957; Colwell, 1963). Later on, research
began to widen the concept of musical achievement to include tone, technique, pitch,
rhythm, intonation, musicality, expression, and tempo (Hodges, 1975; Zdzinski, 1993;
Geringer & Johnson, 2007).
The idea of musical performance achievement as a combination of both aesthetic-expressive
and accuracy factors was supported in a research study by Suchor (1977).
Suchor investigated the influence of personality type on piano performance achievement,
group interaction, and perception of group. This duality of subjective and objective
influences on performance achievement is also reflected in research by Zdzinski (1993)
and Miksza (2007).
Research on the process of musical performance adjudication focuses on the
variables that are suspected to influence adjudicators. This body of research has helped to
determine where possible sources of measurement error originate in an evaluation
situation. Early contributions to adjudication research found possible influences of
evaluator expectation (Duerksen, 1972) and no influence of non-performance
achievement on music performance evaluations (Fiske, 1979).
Some later adjudication research has concentrated on the educational level and
musical experience of the adjudicator and how these factors influence the consistency and
reliability of musical performance judgments. Research in this area has found no
difference between specialists' and non-specialists' judgments of musical performance
quality (Fiske, 1975; Kim, 2000; Bergee, 2003; Hewitt, 2007). Studies of undergraduate
musicians reveal their ability to capably evaluate musical performances (Schleff, 1992;
Bergee, 1997; Geringer & Madsen, 1998; Hewitt, 2007).
Research by Bergee (2003) studied the effects of variability in size of adjudication
panel, response mode of evaluation employed, and adjudicator experience on end of the
semester jury adjudications. The results of this study found no significant effect for
response mode or evaluator experience. However, Bergee also reports that an increase in
evaluation stability could be achieved by increasing judge panel size and the number of
subscale items.
The question of stability and consistency in adjudication was also addressed by
Thompson and Williamon (2003) who suggest that the reliability of musical evaluations
is at best moderate. The results of this study suggest that the evaluation of separate
performance aspects for the purpose of musical performance evaluation could be more
reliable than overall judgments. Research by Hewitt (2007), who found that adjudicators
could successfully concentrate on multiple performance factors, also supports the
utilization of separate performance factors. Thompson and Williamon conclude that the
development of a standardized set of defined performance criteria could indeed be
possible. The identification of uniform methods of evaluation can lead to better stability
in musical performance evaluations (Bergee, 1997; Thompson & Williamon, 2003).
Research on expression in music indicates the influence of rhythm, melody,
harmony, timbre, dynamics, tonality, pitch, and tempo on listeners' impressions of a
performer’s emotional communication abilities. Early research by Hevner (1938)
suggested a ranked order of importance for expressive musical elements: tempo,
modality, pitch, harmony, and rhythm. Hevner (1938) concluded from her studies in musical
expression that a consistency of apprehended meaning in music exists.
Hoffren (1964a, 1964b) established secondary students' ability to discern
meaning in music. Hoffren concluded that standards of value do exist with regard
to expressive performance in music. Levi (1978) supports this by stating that musical
expression is an integral part of the perception of performance quality. The idea of
consistency in apprehended meaning in music has inspired many recent research studies
on the subject. Research by Juslin (1997b), Gabrielsson (1999), and Juslin and Lindstrom
(2003) explored the possibility of apprehended meaning in music. These researchers
concluded that performers could indeed reliably communicate emotion through
performance.
A research study by Juslin and Laukka (2004) illustrated distinctions between
composer controlled and performer controlled musical elements that influence
evaluations of musical expression. In a survey conducted by Juslin and Laukka,
participants were asked to define musical expression as well as to rank musical elements in
terms of importance. Musical expression was defined as the communication between
performers and audience. Participants ranked expression and technical skill highly
in terms of musical importance.
Studies on the development of performance measures have identified objective
strategies for creating musical performance measures. Performance measure development research
demonstrates that the utilization of different response modes makes no significant
difference in measure reliability (Saunders & Holahan, 1997; Zdzinski & Barnes, 2002;
Russell, 2007; Norris & Borst, 2007). This conclusion is supported by adjudication
research by Bergee (2003).
These objective measure development strategies also offer empirical insight into
factors that influence evaluations of performance quality. Measures utilizing evaluations
of separate performance dimensions have shown considerable reliability (Abeles, 1971;
Bergee, 1987; Zdzinski & Barnes, 2002; Russell, 2007). Research on performance
measure development has established the important influence of rhythm, pitch, tone,
intonation, musical effect, interpretation, and technique. The occurrence of repeating
performance dimensions across several instrument categories indicates the possibility of
cross-instrument performance factors that influence judgments of performance quality
(Wrigley, 2005; Russell, 2007).
The literature reviewed in this chapter presents a collection of musical factors
that are thought to influence assessments of musical performance quality, including
technique, interpretation, rhythm, intonation, phrasing, pitch, musicality, tone, balance,
blend, expression, dynamics, tempo, articulation, timbre, accent, rubato, vibrato,
embouchure, smoothness, unity, continuity, style, position, posture, ensemble,
accompaniment, appearance, breathing, conductor, instrumentation, instrument quality,
difficulty, arrangement, attack, release, range, communication, melody, tonality, and
harmony. These musical factors represent both technical and expressive aspects of
musical performance. This literature review also reveals a further categorization of these
musical factors into performer-controlled and composer-controlled musical aspects. The
purpose of the present study is to examine a hypothesized model of the performer-controlled musical components that influence assessments of musical performance
quality.
CHAPTER 3
Method
The purpose of this study is to verify the existence of a model of musical factors
that influence assessments of aurally presented musical performance. Specifically, this
model represents the aurally perceived performer-controlled musical performance factors
that influence assessments of solo musical performance quality. The support for this
model is founded upon previous research concerning performance constructs, musical
achievement, musical performance adjudication, musical expression, and musical
performance rating scale development. This model assumes that the quality of performer-controlled musical factors has an impact on assessments of overall performance quality.
The order of this investigation proceeded as follows: a) gathering of a priori
performance variables from previous research on performance constructs, musical
achievement, musical performance adjudication, musical expression, and musical
performance rating scale development; b) development of a tentative model of performercontrolled musical performance factors; c) construction of the aural musical performance
quality measure; d) gathering recordings of solo brass, woodwinds, strings, voice, and
guitar performances; e) evaluation of recorded performances by volunteer adjudicators;
and f) data analysis (Keith, 2006; van Gigch, 1991; Lippitt, 1973).
Gathering Performance Dimensions
Researchers investigating the issues surrounding performance assessment have
utilized numerous performance variables for the purpose of evaluating some aspect of
musical performance. An examination of this research yields an exhaustive list of
performance dimensions that include technique, interpretation, rhythm, intonation,
phrasing, pitch, musicality, tone, balance, blend, expression, dynamics, tempo,
articulation, timbre, accent, rubato, vibrato, embouchure, smoothness, unity, continuity,
style, position, posture, ensemble, accompaniment, appearance, breathing, conductor,
instrumentation, instrument quality, difficulty, arrangement, attack, release, range,
communication, melody, tonality, and harmony. An adaptation of the Brunswik Lens
Model (1952) by Juslin and Lindstrom (2003) illustrates a separation of performer and
composer musical factors that influence the communication of emotions to listeners. This
concept of separate performer and composer musical cues was extended for the purpose
of assessing musical performance achievement. The collected variables were examined
for redundancy and appropriateness and then separated into one of two categories:
performer-controlled cues and non-performer-controlled cues.
The variables were then categorized further into visual, ensemble, performer, and
composer factors (see Appendix B). Since this study is interested in modeling performer-controlled performance factors, all variables making reference to composer cues, visual
cues, ensemble cues, non-musical factors, and descriptions of performance factors were
set aside. The remaining variables are categorized as aurally perceived performer-controlled cues: technique, interpretation, intonation, musicality, tone, expression,
dynamics, tempo, articulation, timbre, rubato, vibrato, and communication. The variable
of rhythm was labeled by Juslin and Lindstrom (2003) as a composer cue; however, the
frequent occurrence of rhythm as an influential performance dimension demanded a
reconsideration (Fiske, 1972, 1975, 1979; Hodges, 1974, 1975; Schleuter, 1978; Sage,
1978, 1983; Abeles, 1971; Bergee, 1987, 1989, 1993, 1995, 2003; Radocy & Boyle,
1988; Geringer & Madsen, 1989; Zdzinski & Barnes, 2002; Russell, 2007; Miksza,
2007). Rhythm was relabeled to reflect a more specific performer-controlled musical
component entitled rhythmic accuracy. This variable refers to the metric accuracy of the
rhythms performed in relation to the pulse of the music.
Development of the Tentative Model
A tentative model was developed from previous research studies on performance
variables, performance achievement, performance adjudication, and music performance
measure development. Models are developed from conclusions drawn from both formal
and informal theories, previous research, time precedence, common sense and logic
(Keith, 2006). “A model is by nature a simplification and thus may or may not include all
the variables. It should include, however, all of those variables which the model-builder
considers important…” (Lippitt, 1973, p.2). Since the purpose of this study is to
investigate a tentative model of the performer-controlled aural musical factors that
influence assessments of musical performance quality, the hypothesized model will
consist of the performer-controlled musical variables gathered in the previous step.
The structure of performer-controlled musical factors is hypothesized to consist of
two subcategories of performance variables: technique and musical expression. An
examination of the aurally perceived performer-controlled performance variables
gathered in the previous step, and a content analysis of previous research on performance
assessment supported the utilization of both technical and expressive aspects during
evaluations of musical performance (Suchor, 1977; Zdzinski, 1993; Bergee, 1995;
Wrigley, 2005; Johnson & Geringer, 2007; Miksza, 2007). The collected musical factors
were then separated into categories of either technique or musical expression.
The subcategory of technique consists of the variables tone, intonation,
articulation, and rhythmic accuracy. For the purpose of this study, the assessment of tone
is defined as an evaluation of the quality of sound produced on a musical instrument. An
assessment of intonation is defined as an evaluation of the accuracy of pitch relations
(Miller, Felbaum, Tengi, & Langone, 2006). Articulation is evaluated on the quality of
tonal attacks and releases. Rhythmic accuracy is defined as the metric exactitude of the
performed rhythms in relation to a steady pulse. Definitions for the performance
components of technique originated from previous studies on performance constructs and
musical performance rating scale development (Abeles, 1971; Jones, 1986; Bergee, 1987,
1995; Zdzinski & Barnes, 2002; Russell, 2007). These performance first-order
components are hypothesized to have a direct effect on evaluations of the technical
aspects of performer-controlled musical factors.
An analysis of research concerning performance assessment and musical
expression reveals some redundancy in terms used to describe musical expression. Many
research studies show an overlap in the usage of the terms: musicality, musicianship,
musical effect, communication, and expression (Owen, 1969; Cooksey, 1974; Burnsed,
Hinkle, & King, 1985; Bergee, 1987, 1995; Geringer & Madsen, 1998; Zdzinski &
Barnes, 2002; Thompson & Williamon, 2003; Russell, 2007; Abril & Flowers, 2007;
Miksza, 2007). These terms were described in previous research using items or
definitions that refer to communication of style or emotion. In research by Juslin and
Laukka (2004), results indicated that over 74% of the participants surveyed defined
musical expression as the communication of musical emotions and ideas. Therefore, for
the purposes of this research these terms are considered to be synonymous with the term
“musical expression.”
The subcategory of musical expression is hypothesized to include the variables
tempo, dynamics, timbre, and interpretation (Hevner, 1938; Hoffren, 1964a; Smith, 1968;
Bergee, 1995; Juslin, 1997b; Juslin & Lindstrom, 2003; Juslin & Laukka, 2004; Russell,
2007). Tempo, for the purposes of this study, is defined as a frequency of rhythmic pulses
within a given time period. This is usually measured in beats per minute. Dynamics refers
to the amplitude of the musical sounds and the appropriateness within the context of the
music. The assessment of timbre is defined as the manipulation of a sound wave that
results in a change in the characteristic of the sound produced by a musical instrument.
Interpretation is defined as a manipulation of the original or expected musical
components in order to effectively express the stylistic concerns of the music. The
identified factors vibrato and rubato are considered to be functions of performer
interpretation and were omitted to avoid redundancy. These variables are hypothesized to
have a direct effect on evaluations of expressive aspects of performer-controlled musical
factors.
The hypothesized paradigm suggests that both technique and musical expression
have direct effects on overall perceptions of performance quality. Technique is also
theorized to have an indirect effect on overall performance quality perception through
musical expression. The model presented in this section illustrates the theoretical
importance of both technical and expressive musical factors when evaluating the quality
of musical performance, and suggests that these performance dimensions have an effect
on the outcome of perceived performance quality. Both technical and expressive factors
consist of smaller musical components. This hypothesized model of aurally perceived
performer-controlled musical factors illustrates the predicted structure of identified
performance variables on the global perception of musical performance quality.
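As a sketch of how such a path model can be specified (the present study used SPSS and AMOS; the semopy package and the simulated data below are purely illustrative), the hypothesized direct and indirect paths might be written as:

```python
# A path-model sketch of the hypothesized structure using simulated data:
# technique -> musical expression -> overall quality, plus a direct
# technique -> overall path. The study itself used AMOS; the semopy package
# and all numbers below are purely illustrative.
import numpy as np
import pandas as pd
import semopy

rng = np.random.default_rng(3)
n = 232
technique = rng.normal(size=n)
expression = 0.6 * technique + rng.normal(scale=0.8, size=n)  # indirect route
overall = 0.4 * technique + 0.5 * expression + rng.normal(scale=0.5, size=n)
data = pd.DataFrame({"technique": technique,
                     "expression": expression,
                     "overall": overall})

model = semopy.Model("""
expression ~ technique
overall ~ technique + expression
""")
model.fit(data)
print(model.inspect())  # estimated path coefficients
```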
Construction of the Aural Musical Performance Quality Measure
Data for each of the identified performance variables were gathered using a
researcher-designed measure of aural musical performance quality. The first-order
variables intended for measurement include: tone, intonation, articulation, rhythmic
accuracy, tempo, dynamics, timbre, and interpretation. Descriptive items that represent
each of the performer-controlled musical factors were used to assess participant
performance. The items chosen to represent the first-order performance factors originated
from previous research concerning musical performance assessment and musical
performance rating scale construction (Abeles, 1971; Neilson, 1973; Bergee, 1987;
Zdzinski, 1993; Zdzinski & Barnes, 2002; Russell, 2007). This initial item pool was examined for
redundancy, appropriateness, and completeness. Both positively and negatively worded items were
included in order to prevent a response set during evaluation. Items selected for
inclusion in the final item pool were paired with a four-point Likert scale ranging from
Strongly Disagree to Strongly Agree. The first-order performance dimensions (tone,
intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, interpretation) were
each represented by four items.
Additionally, items were gathered and selected in order to collect data regarding
technique, musical expression, and overall perceptions of performance quality. The items
chosen to represent both technique and musical expression were gathered from
previously existing musical performance rating scales (Bergee, 1987, 1993, 1995, 2003;
Geringer & Madsen, 1998; Russell, 2007). Each item was examined for appropriateness
and redundancy and paired with a four-point Likert scale ranging from Strongly Disagree
to Strongly Agree. The independent variables of technique and musical expression were
each represented by four items. Data for overall perception of performance quality were
obtained using four researcher-created items modeled after items from previous facet-factorial studies that demonstrated high factor loadings (Bergee, 2003; Russell, 2007).
The items representing the eleven observed variables were placed on the Aural Musical
Performance Quality (AMPQ) measure (see Appendix C).
Gathering Recordings of Solo Music Performance
Participants in the performance portion of this study include professional
musicians and both undergraduate and graduate performance and music education majors
from a large southeastern university. The instrumentation of volunteer performers
represents the brass, woodwinds, voice, and strings (including guitar) instrument families.
A total of 50 solo performance recordings were gathered in order to produce an adequate
sample of performances (brass n = 7, strings n = 20, voice n = 6, woodwind n = 17). From
this pool of instrumental and vocal performances, a total of four recordings were selected
at random to represent each musical instrument category: voice, strings, brass, and
woodwind.
The volunteer participants were asked to perform excerpts of prepared pieces
from their repertoire. Each performance was digitally recorded using a Sony Net-MD
Mini-disc recorder (MZ-N10) and Sony Electret Condenser Microphone (ECM-MS907)
to ensure a high-fidelity reproduction of each performance. The ability levels of the
performances ranged from beginner to expert in order to provide a wide array of
performance abilities.
The four performance recordings were randomly ordered and recorded onto
compact discs. Each compact disc was then placed into an evaluation packet. The
evaluation packets contained: (1) one information/direction sheet, (2) one judging
consent form, (3) four Aural Musical Performance Quality (AMPQ) measures (one per
listening example), (4) one compact disc containing four representative solo
performances, and (5) one mailing envelope (for judges solicited out of state) (see
Appendices D & E).
Evaluations of Recorded Performances
College undergraduate and graduate music students, university music professors,
primary and secondary school music educators, and professional musicians from Florida,
Oklahoma, Virginia, and Colorado were solicited as volunteer judges (N = 58) to evaluate
the performance recordings using the Aural Musical Performance Quality (AMPQ)
measure. The selection of judges was limited to those who had instrumental or vocal
performance experience at a professional or collegiate level. Each volunteer judge
received one evaluation packet.
Adjudicators were instructed to listen to each recording separately. The
participant judges evaluated each performance using the AMPQ measure. Judges were
permitted to review the recordings as many times as necessary to aid in the adjudication
of the recordings (Zdzinski & Barnes, 2002; Russell, 2007).
Data Analysis and Preparation
Once the recording evaluations (N = 232) were collected, data were recorded onto
an electronic spreadsheet according to the following response key: 4 = Strongly Agree, 3
= Agree, 2 = Disagree, 1 = Strongly Disagree. All negatively worded items were recoded
in order to maintain the same metric throughout the analysis. Once all responses were
entered, the variables of tone, intonation, rhythmic accuracy, articulation, tempo,
dynamics, timbre, interpretation, technique, musical expression, and overall perception of
performance quality were analyzed using the raw data from the representative items on
the AMPQ measure. These variables were analyzed by utilizing reliability, correlation,
and regression sub-routines in the Statistical Package for Social Sciences (SPSS) as well
as path analysis sub-routines from the Analysis of Moment Structures (AMOS) software
package.
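Although these steps were carried out in SPSS, the recoding and reliability analyses can be sketched in a few lines of Python for readers who wish to reproduce the preparation. The file name, the item column names, and the choice of which items are reverse-scored are illustrative assumptions, not the actual AMPQ item layout or SPSS syntax.

    import pandas as pd

    def recode_reversed(df, reversed_items):
        # On the 1-4 Likert metric, reverse scoring maps 1<->4 and 2<->3.
        out = df.copy()
        out[reversed_items] = 5 - out[reversed_items]
        return out

    def cronbach_alpha(items):
        # items: DataFrame with one column per item of a single (sub)scale.
        k = items.shape[1]
        item_variances = items.var(ddof=1).sum()
        total_variance = items.sum(axis=1).var(ddof=1)
        return (k / (k - 1)) * (1 - item_variances / total_variance)

    # Hypothetical usage: four tone items, two of them negatively worded.
    # evaluations = pd.read_csv("ampq_evaluations.csv")
    # evaluations = recode_reversed(evaluations, ["tone_2", "tone_4"])
    # print(cronbach_alpha(evaluations[["tone_1", "tone_2", "tone_3", "tone_4"]]))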
CHAPTER 4
Results and Discussion
Results
Solo instrumental evaluations using the Aural Musical Performance Quality
(AMPQ) measure (N = 232) were collected to estimate the proposed model of aurally
perceived performer-controlled musical performance factors. A reliability analysis was
conducted on the AMPQ measure. The alpha reliability for the total AMPQ measure was
estimated at .977. Individual alpha reliabilities for all subscales contained in the AMPQ
measure are provided in Table 2.
Table 2
Total and Subscale Reliabilities for AMPQ Measure

Scale                                               # of Items   Cronbach's α
Total (all variables)                                   44           .977
Technique/Musical Expression/Overall Perception         12           .957
    Technique                                            4           .922
    Musical expression                                   4           .891
    Overall Perception                                   4           .930
Tone/Intonation/Rhythmic accuracy/Articulation          16           .937
    Tone                                                 4           .896
    Intonation                                           4           .826
    Rhythmic accuracy                                    4           .886
    Articulation                                         4           .789
Tempo/Dynamics/Timbre/Interpretation                    16           .927
    Tempo                                                4           .838
    Dynamics                                             4           .909
    Timbre                                               4           .887
    Interpretation                                       4           .891

Note. Reliabilities were calculated with N = 232.
According to the proposed model, the components of technique include tone,
intonation, rhythmic accuracy, and articulation. The subscale reliability for this group of
component factors was estimated at .937. The model also describes the components of
musical expression as tempo, dynamics, timbre, and interpretation. The subscale
reliability for this grouping of component factors was estimated at .927. The hypothesized
component structures of technique and musical expression (illustrated in Figures 1 & 2)
are supported by strong correlations between the component factors and the observed
variables of technique and musical expression (see Tables 3 & 4). A correlation
between variables must exist if variables are to be included in the same model (Keith,
2006).
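As an illustration of this screening criterion, the component correlations reported in Tables 3 and 4 can be generated directly from subscale scores. The file and column names below are hypothetical stand-ins for the AMPQ data, not the actual study files.

    import pandas as pd

    # One row per completed AMPQ evaluation; one column per subscale score.
    evaluations = pd.read_csv("ampq_subscale_scores.csv")
    components = ["tone", "intonation", "rhythmic_accuracy",
                  "articulation", "technique"]
    # A component that failed to correlate with technique would be a
    # candidate for exclusion from the model (Keith, 2006).
    print(evaluations[components].corr(method="pearson").round(3))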
Table 3
Correlations Between Technique and Component Factors

Subscale            Tone       Intonation   Rhythmic Accuracy   Articulation
Tone                1          .667***      .542***             .743***
Intonation          .667***    1            .592***             .680***
Rhythmic Accuracy   .542***    .592***      1                   .649***
Articulation        .743***    .680***      .649***             1
Technique           .795***    .693***      .655***             .827***

Note. Evaluations (N = 232). ***p < .001.
Table 4
Correlations Between Musical Expression and Component Factors

Subscale             Tempo      Dynamics   Timbre     Interpretation
Tempo                1          .303***    .466***    .529***
Dynamics             .303***    1          .585***    .658***
Timbre               .466***    .585***    1          .727***
Interpretation       .529***    .658***    .727***    1
Musical Expression   .454***    .646***    .705***    .853***

Note. Evaluations (N = 232). ***p < .001.
The first research question asks about the representativeness of the component
factors of technique and musical expression. The proposed model in this study
hypothesizes that performer-controlled factors of technique are represented by tone,
intonation, rhythmic accuracy, and articulation. The performer-controlled factors of
musical expression are hypothesized to consist of tempo, dynamics, timbre, and
interpretation. To address this research question, two separate regression analyses were
employed. A regression analysis allows for the calculation of direct effects for each of the
component factors of technique and musical expression. The standardized coefficients
from the sub-routine provide the beta weights necessary to estimate the influence of each
variable. A supplementary confirmatory factor analysis, analyzing the representativeness
of items chosen to represent each of the component factors, is located in Appendix F.
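A minimal sketch of how such beta weights can be obtained outside of SPSS, assuming hypothetical subscale-score columns: standardizing every variable before an ordinary least squares fit makes the resulting slopes the standardized coefficients.

    import pandas as pd
    import statsmodels.api as sm

    df = pd.read_csv("ampq_subscale_scores.csv")   # hypothetical file name
    predictors = ["tone", "intonation", "rhythmic_accuracy", "articulation"]
    variables = predictors + ["technique"]

    # z-scoring every variable makes the OLS slopes equal to the beta weights.
    z = (df[variables] - df[variables].mean()) / df[variables].std(ddof=1)

    fit = sm.OLS(z["technique"], sm.add_constant(z[predictors])).fit()
    print(fit.params.round(3))     # standardized coefficients (beta weights)
    print(round(fit.rsquared, 3))  # variance in technique accounted for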
The results of the regression of technique on tone, intonation, articulation, and
rhythmic accuracy indicate that these component factors account for 76% of the variance
in technique as measured by the AMPQ instrument (R² = .760, F(4, 227) = 179.28, p <
.001) (see Table 5). Tone, rhythmic accuracy, and articulation predict significant (p < .01)
moderate to large increases in appraised quality of technique in musical performance.
Intonation, however, predicts a small but meaningful non-significant increase in
appraised technical quality (β = 0.09, b = 0.11, p = .075) (see Figure 4).
Table 5
Summary of Simultaneous Regression for Variables Predicting Technique

Variable            B      SE B    β          CI
Tone                .381   .059    .338***    [.265, .497]
Intonation          .114   .064    .087       [-.012, .240]
Rhythmic Accuracy   .194   .057    .153**     [.081, .307]
Articulation        .513   .072    .406***    [.371, .655]

Note. N = 232 for this regression. CI = Confidence Interval. **p < .01. ***p < .001.
Figure 4. Model of Performer-Controlled Components of Technique
Musical expression was regressed on tempo, dynamics, timbre, and interpretation
(see Table 6). The variables tempo, dynamics, timbre, and interpretation combined to
account for 75.7% of the variance in musical expression as measured by the AMPQ
instrument (R² = .757, F(4, 227) = 176.84, p < .001). The results illustrated in Table 6
indicate that the representativeness of these component factors is mixed. Dynamics,
timbre, and interpretation demonstrate significant (p < .01) moderate to large effects on
musical expression. In contrast, tempo (β = 0.01, b = 0.01, p = .771) predicts a negligible
(0.01 SD) non-significant increase in appraised quality of musical expression for every
one SD increase in appraised quality of tempo after controlling for the effects of
dynamics, timbre, and interpretation (see Figure 5).
Table 6
Summary of Simultaneous Regression Analysis for Variables Predicting Musical
Expression

Variable         B      SE B    β          CI
Tempo            .012   .043    .011       [-.072, .097]
Dynamics         .138   .042    .141**     [.056, .220]
Timbre           .135   .047    .141**     [.043, .226]
Interpretation   .640   .053    .660***    [.537, .744]

Note. N = 232 for this regression. CI = Confidence Interval. **p < .01. ***p < .001.
Figure 5. Model of Performer-Controlled Components of Musical Expression
A path analysis was employed to address the second research question regarding
the relative contributions of technique and musical expression on overall perception of
performance quality. Path analysis allows for the estimation of standardized path
coefficients, also known as beta weights. The path analysis can be calculated by using
the correlations, standard deviations, means, and number of cases to produce a covariance
matrix (see Table 7).
Table 7
Means, Standard Deviations, and Pearson Correlations for Combined Instrument Path
Model of Performer-Controlled Musical Factors

Variable                M        SD      1         2         3
1. Technique            10.987   3.201   1         .715**    .844**
2. Musical Expression   11.560   2.638   .715**    1         .786**
3. Overall Perception   11.448   3.155   .844**    .786**    1

Note. Evaluations (N = 232). **Correlation is significant at p < .01 (2-tailed).
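As a worked sketch of that step, the covariance matrix implied by Table 7 can be reconstructed from the correlation matrix R and the standard deviations as Σ = DRD, where D is the diagonal matrix of standard deviations; the numpy code below simply substitutes the Table 7 values.

    import numpy as np

    # Standard deviations and Pearson correlations taken from Table 7
    # (order: technique, musical expression, overall perception).
    sd = np.array([3.201, 2.638, 3.155])
    R = np.array([[1.000, 0.715, 0.844],
                  [0.715, 1.000, 0.786],
                  [0.844, 0.786, 1.000]])

    D = np.diag(sd)
    covariance = D @ R @ D   # Sigma = D * R * D
    print(covariance.round(3))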
Table 8 illustrates the results of the path analysis of the performer-controlled
musical factors across evaluations of brass, woodwind, voice, and string instruments.
Standardized path coefficients estimating direct effects for the model of
performer-controlled musical factors are interpreted as follows: technique → overall
perception (β = 0.58) predicts a large increase in appraised quality of overall perception
of performance quality for every one SD increase in appraised quality of technique after
controlling for the effects of musical expression; technique → musical expression (β =
0.72) predicts a large increase in appraised quality of musical expression for every one
SD increase in appraised quality of technique; musical expression → overall perception
(β = 0.38) predicts a large increase in appraised quality of overall perception of
performance quality for every one SD increase in appraised quality of musical expression
after controlling for the effects of technique. A complete report of the path analysis
output is located in Appendix G.
Table 8
Path Estimates for Model of Performer-Controlled Musical Factors across Brass, String,
Voice, and Woodwind Instruments

Estimated Path                              B      SE B    β
Technique → Musical Expression              .589   .038    .715***
Technique → Overall Perceptions             .569   .043    .577***
Musical Expression → Overall Perceptions    .447   .053    .373***

Note. N = 232. ***p < .001.
In the case of simple recursive models such as the one presented in this study, the
paths estimated using structural equation modeling programs are equal to the coefficients
estimated in a series of simultaneous and sequential regressions (Keith, 2006). This series
of regressions provides valuable estimations regarding the amount of overall variance
accounted for by technique and musical expression while estimating the same paths in
the proposed model.
Two separate regressions were employed to estimate the model: 1) sequential
regression of overall perception of performance quality on the predictor variables of
technique and musical expression, 2) simultaneous regression of musical expression on
technique. The hypothesized model of performer-controlled musical factors determined
the entrance order of the variables in the sequential regression. Results of the sequential
regression indicate that the variables technique and musical expression combined to
account for 77.9% of the variance in overall perceptions of performance quality (R² =
.779, F(2, 229) = 407.55, p < .001). Musical expression accounted for an additional 7.0%
of variance in overall perception, after controlling for effects of technique (ΔR² = .069,
F(1, 229) = 71.84, p < .001) (see Table 9).
Table 9
Summary of Sequential Regression Analysis for Variables Predicting Overall Perception
of Performance Quality

Variable                 B      SE B    β          CI
Step 1
    Technique            .832   .035    .844***    [.763, .900]
Step 2
    Technique            .568   .044    .576***    [.482, .653]
    Musical Expression   .448   .053    .375***    [.344, .553]

Note. N = 232. ***p < .001.
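A sketch of this sequential step in Python, assuming the same hypothetical column names as above; the R² change for musical expression and its F test, with (1, n − 3) degrees of freedom, correspond to the ΔR² = .069 and F(1, 229) reported in the text.

    import pandas as pd
    import statsmodels.api as sm
    from scipy import stats

    df = pd.read_csv("ampq_subscale_scores.csv")   # hypothetical file name
    y = df["overall_perception"]

    step1 = sm.OLS(y, sm.add_constant(df[["technique"]])).fit()
    step2 = sm.OLS(y, sm.add_constant(df[["technique",
                                          "musical_expression"]])).fit()

    # R-squared change attributable to musical expression, with its F test;
    # one added predictor gives (1, n - 3) degrees of freedom.
    n = len(df)
    r2_change = step2.rsquared - step1.rsquared
    f_change = r2_change / ((1 - step2.rsquared) / (n - 3))
    p_change = stats.f.sf(f_change, 1, n - 3)
    print(round(r2_change, 3), round(f_change, 2), round(p_change, 4))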
Results of the simultaneous regression indicate that technique is estimated to account for
51.1% of the variance in musical expression (R² = .511, F(1, 230) = 239.96, p < .001)
(see Table 10).
Table 10
Summary of Simultaneous Regression of Musical Expression on Technique

Variable    B      SE B    β          CI
Technique   .589   .038    .715***    [.514, .664]

Note. N = 232 for this regression. CI = Confidence Interval. ***p < .001.
The third research question inquires about the fit of the hypothesized model. This
is answered through indices of model fit, which compare the observed covariance matrix
to the expected covariance matrix. Model fit indices essentially provide information
regarding how well the collected data fit the proposed model. Absolute indices of model
fit report a non-significant chi-square estimate (χ² = 0.00, df = 0, N = 232). This score
can be interpreted as an indication of good fit, and is a result of the just-identified status
of the proposed model. A just-identified model is one in which the number of parameters
that could be estimated is equal to the number of parameters actually estimated. In other
words, there is exactly enough information to solve for the paths of the proposed model
(Keith, 2006).
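As a worked check of that just-identified status (the parameter count below follows the standard accounting for a recursive path model and is inferred from the model description rather than taken from the AMOS output): three observed variables supply p(p + 1)/2 = 3(4)/2 = 6 unique variances and covariances, while the model estimates six parameters (three path coefficients, the variance of technique, and two disturbance variances), leaving 6 − 6 = 0 degrees of freedom.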
The second half of this research question asks: can a model of musical
performance assessment be created and tested using performer-controlled musical factors
for the outcome of evaluating aurally perceived musical performance quality? The
answer to this question was contingent upon the ability to estimate the hypothesized
model of performer-controlled musical factors using available statistical techniques and
the data collected. The methods used to construct the AMPQ measure and the results of
the reliability analyses indicate that musical performance can be measured with
acceptable reliability and construct validity. Results of the path analysis demonstrate the
ability to estimate the hypothesized paths of the proposed model (Figure 6). Technique
demonstrated large direct (β = .58) and indirect effects (β = .27) on overall perceptions of
performance quality, as well as a significant direct effect (β = .72) on musical expression.
Musical expression also demonstrated significant direct effects (β = .37) on overall
perception of performance quality (see Appendix D). The theory, time precedence,
relevant research, and logic suggest it is indeed possible to create and test a hypothesized
model of performer-controlled musical factors.
Figure 6. Performer-controlled Musical Performance Factors: Standardized Estimate
Model
The fourth research question inquires about the existence and stability of the
proposed model among the separate musical instrument categories of solo brass,
woodwind, strings, and voice performance. By sorting the data by instrument category,
the necessary matrices of case counts, standard deviations, and correlations among
technique, musical expression, and overall perception of performance quality were
calculated (see Tables 13, 15, 17, & 19).
Individual path estimates of the performer-controlled musical factors categorized
by instrument indicate that the performer-controlled musical factor model does remain
stable across assessments of string, voice, and woodwind performance quality. However,
the brass model demonstrates a moderate but non-significant path estimate of technique
on overall perceptions of performance quality (see Table 11).
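A brief sketch of this sorting step, again with hypothetical file, column, and category names: grouping on an instrument label yields the per-category descriptive statistics and correlation matrices that feed the individual path models.

    import pandas as pd

    df = pd.read_csv("ampq_subscale_scores.csv")   # hypothetical file name
    cols = ["technique", "musical_expression", "overall_perception"]

    # Per-instrument means and standard deviations (cf. Tables 13, 15, 17, & 19).
    print(df.groupby("instrument")[cols].agg(["mean", "std"]).round(3))

    # Pearson correlations within each instrument category.
    for name, group in df.groupby("instrument"):
        print(name)
        print(group[cols].corr().round(3))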
Table 11
Standardized Path Coefficient Comparisons between Combined and Individual
Instrument Path Models

                                            Combined    Woodwind   Voice      String     Brass
Estimated Path                              (N = 232)   (n = 58)   (n = 58)   (n = 58)   (n = 58)
Technique → Musical Expression              .715***     .459***    .541***    .452***    .279*
Technique → Overall Perceptions             .577***     .332**     .561***    .610***    .182
Musical Expression → Overall Perceptions    .373***     .519***    .413***    .271**     .481**

Note. *p < .05. **p < .01. ***p < .001.
Since each participant evaluated all four musical examples, the number of
evaluations for each instrument category was equal to the number of total participants (N
= 58). Correlations, standard deviations, and means (see Table 13) were used to calculate
the standardized path coefficients for the woodwind model of performer-controlled
musical performance factors. The standardized coefficients are interpreted as follows:
technique  overall perception (β = 0.33) predicts a large increase in appraised quality of
overall perception of performance quality for every one SD increase in appraised quality
of technique; technique  musical expression (β = 0.46) predicts a large increase in
appraised quality of musical expression for every one SD increase in appraised quality of
technique; musical expression  overall perception (β = 0.52) predicts a large increase in
appraised quality of overall perception of performance quality for every one SD increase
in appraised quality of musical expression (see Table 12). Appendix H contains all of the
path analysis output related to the woodwind model.
Table 12
Estimated Path Coefficients for the Woodwind Model of Performer-Controlled Musical
Factors

Estimated Path                              B      SE B    β
Technique → Musical Expression              .464   .119    .459***
Technique → Overall Perceptions             .311   .095    .332**
Musical Expression → Overall Perceptions    .480   .094    .519***

Note. n = 58 for this path analysis. **p < .01. ***p < .001.
Table 13
Means, Standard Deviations, and Pearson Correlations for Woodwind Path Model of
Performer-Controlled Musical Factors

Variable                 M        SD      1         2         3
1. Technique             11.672   2.114   1         .459***   .570***
2. Musical Expression    11.483   2.138   .459***   1         .671***
3. Overall Perceptions   12.362   1.980   .570***   .671***   1

Note. Evaluations (n = 58). ***p < .001 (2-tailed).
The standardized path coefficients estimated for the voice model of performer-controlled
musical performance factors are reported and interpreted as follows: technique → overall
perception (β = 0.56) predicts a large increase in appraised quality of overall perception
of performance quality for every one SD increase in appraised quality of technique;
technique → musical expression (β = 0.54) predicts a large increase in appraised quality
of musical expression for every one SD increase in appraised quality of technique;
musical expression → overall perception (β = 0.41) predicts a large increase in appraised
quality of overall perception of performance quality for every one SD increase in
appraised quality of musical expression (see Table 14). Correlations, standard deviations,
and means necessary for calculation of the standardized path coefficients are provided in
Table 15. Appendix I contains all path analysis output for the voice model.
Table 14
Estimated Path Coefficients for the Voice Model of Performer-Controlled Musical
Factors

Estimated Path                              B      SE B    β
Technique → Musical Expression              .548   .113    .541***
Technique → Overall Perceptions             .521   .075    .561***
Musical Expression → Overall Perceptions    .378   .074    .413***

Note. n = 58 for this path analysis. ***p < .001.
Table 15
Means, Standard Deviations, and Pearson Correlations for Voice Model of
Performer-Controlled Musical Factors

Variable                 M        SD      1         2         3
1. Technique             11.707   1.910   1         .541***   .784***
2. Musical Expression    12.070   1.936   .541***   1         .716***
3. Overall Perceptions   12.207   1.775   .784***   .716***   1

Note. Evaluations (n = 58). ***p < .001 (2-tailed).
Standardized path coefficients, calculated from the necessary descriptive statistics
and correlations, were estimated for the strings model of performer-controlled musical
performance factors (see Tables 16 & 17). These path coefficients are interpreted as
follows: technique → overall perception (β = 0.61) predicts a large increase in appraised
quality of overall perception of performance quality for every one SD increase in
appraised quality of technique; technique → musical expression (β = 0.45) predicts a
large increase in appraised quality of musical expression for every one SD increase in
appraised quality of technique; musical expression → overall perception (β = 0.27)
predicts a large increase in appraised quality of overall perception of performance quality
for every one SD increase in appraised quality of musical expression (see Table 16).
Appendix J contains all path analysis output for the string model.
Table 16
Estimated Path Coefficients for the String Model of Performer-Controlled Musical
Factors

Estimated Path                              B      SE B    β
Technique → Musical Expression              .451   .118    .452***
Technique → Overall Perceptions             .637   .099    .610***
Musical Expression → Overall Perceptions    .284   .099    .271**

Note. n = 58 for this path analysis. **p < .01. ***p < .001.
Table 17
Means, Standard Deviations, and Pearson Correlations for String Model of
Performer-Controlled Musical Factors

Variable                 M        SD      1         2         3
1. Technique             13.741   1.888   1         .452***   .733***
2. Musical Expression    13.776   1.883   .452***   1         .547***
3. Overall Perceptions   13.897   1.971   .733***   .547***   1

Note. Evaluations (n = 58). ***p < .001 (1-tailed).
After collecting the means, standard deviations, and sample size statistics for the
brass model (see Table 19), beta weights were estimated. The results of the path analysis
are interpreted as follows: technique → overall perception (β = 0.18, p = .112) predicts a
small but meaningful non-significant increase in appraised quality of overall perception
of performance quality for every one SD increase in appraised quality of technique;
technique → musical expression (β = 0.28) predicts a large increase in appraised quality
of musical expression for every one SD increase in appraised quality of technique;
musical expression → overall perception (β = 0.48, b = 0.52, p < .001) predicts a large
increase in appraised quality of overall perception of performance quality for every one
SD increase in appraised quality of musical expression (see Table 18). Appendix K
contains all path analysis output for the brass model.
Table 18
Estimated Path Coefficients for the Brass Model of Performer-Controlled Musical
Factors

Estimated Path                              B      SE B    β
Technique → Musical Expression              .295   .135    .279*
Technique → Overall Perceptions             .207   .130    .182
Musical Expression → Overall Perceptions    .519   .123    .481***

Note. n = 58 for this path analysis. *p < .05. ***p < .001.
Table 19
Means, Standard Deviations, and Pearson Correlations for Brass Model of
Performer-Controlled Musical Factors

Variable                M       SD      1         2         3
1. Technique            6.828   1.875   1         .279**    .316***
2. Musical Expression   8.914   1.985   .279**    1         .532*
3. Overall Perception   7.328   2.139   .316***   .532*     1

Note. Evaluations (n = 58). *p < .05 (1-tailed). **p < .01 (1-tailed). ***p < .001 (1-tailed).
Discussion
This study investigated a theoretical influence of aurally perceived performer-controlled
musical factors on assessments of musical performance quality. The analysis
of hypothesized higher-order factors indicates that the proposed model successfully
illustrates the positive relationship between technique and musical expression and
assessments of performance quality. These results coincide with previous research on
performance constructs and assessment that suggest the importance of both technical and
expressive aspects in the evaluation of musical performance (Hevner, 1938; Abeles,
1971; Suchor, 1977; Levi, 1978; Jones, 1986; Bergee, 1987, 1995, 2003; Zdzinski, 1993;
Thompson, Diamond, & Balkwill, 1998; Zdzinski & Barnes, 2002; Wrigley, 2005;
Russell, 2007).
This study also examines the composition of both technique and musical
expression. Results from this analysis of technique and musical expression components
reveal a number of first-order factors that influence assessments of technique and musical
expression quality. These findings correspond with previous performance assessment
research studies that support the ability to evaluate musical performance and represent
higher-order factors using related component factors (Hevner, 1938; Abeles, 1971; Levi,
1978; Jones, 1986; Mills 1987; Bergee, 1987, 1995, 2003; Zdzinski, 1993; Juslin, 1997b;
Zdzinski & Barnes, 2002; Miksza, 2007; Russell, 2007).
The results of the regressions of technique and musical expression on their
respective component factors unveil the representative nature of the hypothesized
first-order component factors of technique and musical expression. The variables tone,
intonation, rhythmic accuracy, and articulation combined to account for 76% of the
variance in technique as measured by the AMPQ instrument. Reliability scores for this
group of factors and subscales indicate that these components were measured with a
small amount of error (α = .937). The initial inclusion of the component factors of both
technique and musical expression was supported by the occurrence of strong correlation
with each of the component factors (Keith, 2006).
Analysis of the observed components of technique is mostly consistent with the
hypothesized model of the performer-controlled components of technique. Tone,
rhythmic accuracy, and articulation indicated significant effects on technique. However,
intonation indicated a non-significant effect. The lack of significance demonstrated by the
intonation variable (p = .075) could be attributed to a number of causes. One simple
reason for the lack of significance, since significance estimates are linked to sample size,
is the use of a modest number of cases (N = 232). Another reason for this would be the
possibility that intonation might not be part of performer-controlled technique factors.
Research by Bergee (1987, 1993, 1995) and Miksza (2007) provides support for this
interpretation. However, because of the occurrence of a moderate effect size, the
significance for intonation in this model is most likely dependent upon a larger sample
size (Kline, 2005). This model of technique indicates that increases in perceived quality
of tone, intonation, rhythmic accuracy, and articulation can lead to predictable increases
in perceived quality of technique.
The component factors of musical expression demonstrated mixed results. The
variables tempo, dynamics, timbre, and interpretation combined to account for 75.7% of
the variance in musical expression as measured by the AMPQ instrument. The alpha
reliability for the sub-group of factors is reported at .927. Even though not all of the
component factors emerged as significant components of musical expression, dynamics,
timbre, and interpretation all demonstrated positive beta weights that indicate increases in
perceived quality of these factors will lead to predictable increases in the perceived
quality of musical expression in performance. Tempo, however, did not exhibit this
statistically significant relationship (p = .771).
The negligible path coefficient and lack of significance (p > .05) of the tempo
variable could be attributed to several factors. The modest sample size could have
influenced the lack of significance. However, the lack of effect size as indicated by the
standardized path coefficient suggests that tempo, however correlated to musical
expression, may not belong to the component model of musical expression. This outcome
is consistent with Johnson and Geringer (2005), who found that tempo did not
demonstrate predictability. However, these results also contradict those found by
Geringer and Johnson (2007) that demonstrated tempo as an influential factor in
perceptions of musical quality.
The exclusion of the tempo factor is an interesting finding. The correlation (r =
.454) indicates that a positive relation between tempo and musical expression exists.
However, previous research on the relation of tempo to musical performance may hold
some clues to this non-significant effect. Many researchers have utilized a combination of
rhythm and tempo as indicators of performance quality (Bergee, 1987; Zdzinski &
Barnes, 2002; Johnson & Geringer, 2007; Russell, 2007). One possible reason for the
positive correlation between musical expression and tempo and a non-significant effect
might be that tempo is actually related to the components of technique. For example,
tempo could be a component factor of rhythmic accuracy.
The phasing out of the tempo factor could also be a result of differences in
musical style. Non-jazz music is generally performed at tempos decided in advance by
the composer, whereas jazz performance tempos are often dictated by the performer.
This stylistic difference could have played a part in the current study
since the randomly selected performances were all performed in a non-jazz style. This
implies that tempo could belong to a model other than that of performer-controlled
musical factors. Results from the analysis of both technique and musical expression
component models support conclusions made by previous researchers that indicate the
existence of higher order relationships between component factors of musical
performance (Bergee, 1995; Thompson, Diamond, & Balkwill, 1998; Wrigley, 2005;
Russell, 2007).
The results from the path analysis of performer-controlled musical factors and the
influence on assessments of overall performance quality indicate that increases in
perceptions of overall quality can be predicted both directly and indirectly through
technique and directly through musical expression. This model also illustrates the ability
to positively affect perceptions of musical expression through an increase in perceived
quality of technique. This influence of technical and expressive musical performance
factors is supported in previous research on performance constructs and musical
achievement (Suchor, 1977; Levi, 1978; Zdzinski, 1993; Thompson & Williamon, 2003;
Johnson & Geringer, 2007; Miksza, 2007). The results of this path analysis are consistent
with the predicted model of performer-controlled musical factors.
The absolute index of fit indicated that the collected data demonstrated
good fit with the proposed model of performer-controlled factors (χ² = 0.00, df = 0, N =
232). This fit index could be interpreted as a perfect fit between the observed and
expected covariance matrices. However, caution should be taken with such an
interpretation when dealing with just-identified models. Since most, if not all,
just-identified models would obtain the same indication of good fit, care must be taken in
the development of the theory to which the data are applied. Theory, supported by relevant
research, time precedence, and logic must be developed before assumptions about
relationships can be tested in a causal framework. Research that tests a hypothesis
without theory is purely of an exploratory nature and cannot make any causal
assumptions (Kline, 2005; Keith, 2006). The theoretical structure proposed and tested in
the current study is based on previous research concerning performance constructs,
musical achievement, adjudicators and the adjudication process, musical expression, and
performance measure development in addition to the necessary facets of theoretical
development that include time precedence and logic.
A comparison of the proposed model among the instrumental categories included
in this study indicated a general stability of the proposed structure. Solo musical
performances of woodwind, voice, and string instruments demonstrated that the proposed
influence of technique and musical expression on overall perception of performance
quality remains significant (p < .01). These results are consistent with Wrigley (2005),
who found a great deal of overlap between brass, strings, woodwind, and piano
instruments. Differences in significant beta weights for the individual models could be
attributed to several factors.
One possible reason for the observed differences between the individual
instrument models and the combined instrument model is the difference in sample size.
The sample size for the combined model (N = 232) was four times larger than the sample
size for each of the individual instrumental models (n = 58). A larger sample size would
be helpful in obtaining a truer population estimate.
Another possible reason supported in prior research by Abeles (1971), Jones
(1986), Bergee (1987, 2003), Zdzinski & Barnes (2002), Wrigley (2005), and Russell
(2007) is the notion of instrument differences. This suggests that differences in model
structure are perhaps due to physical and technical differences between instruments.
However, this can be neither confirmed nor denied until a replication of the current study is
conducted with larger sample sizes.
When comparing the structures of each solo instrument model, as listed in Table
11, there are some interesting similarities between the general instrument categories.
Both wind instrument categories, woodwind and brass, demonstrated the same pattern of
beta weights. Each of the wind instrument categories indicated that the direct effect of
musical expression on overall perceptions of performance quality was greater than the
direct effects of technique on overall perception and technique on musical expression.
The non-wind instrument families indicated that the path between technique and overall
perceptions had a greater effect than the direct effects of both musical expression on
overall perceptions and technique on musical expression.
The individual solo brass instrument model requires a separate discussion. This
model did not demonstrate stability regarding the effects of technique and musical
expression on perception of overall performance quality. This could be due to several
issues not present in the woodwind, voice, or string models. One issue is the occurrence
of low mean scores for technique, musical expression, and overall performance quality as
compared to the entire sample (see Tables 7 & 19). Another issue is the occurrence of
relatively low correlations (r = .28 to .53) in comparison to the entire sample (r = .72 to
.84). Since path estimates are calculated from these descriptive statistics, it is logical
to assume that low means and correlations would have a detrimental effect on a
path analysis.
The low mean score also suggests that the quality of the randomly chosen solo
brass performance was scored lower than the rest of the solo instrument samples. It is
possible that this particular performance could have skewed the results of the path
estimates. A replication of the current study with a wider range of performance samples
to choose from could provide an answer to the question of stability of the proposed model
of performer-controlled musical factors within the brass instrument family.
The significance outcome of the brass model resists a straightforward explanation.
According to the model, brass instrument technique has no significant direct effect on
perceptions of performance quality. Instead the results suggest that technique only has an
indirect effect on quality assessments through musical expression. The main issue with
the significance of this particular model is suspected to be small sample size. The
moderate effect of technique on overall perception hints at the possibility that this
effect could become clearer with a larger sample.
The estimates of these hypothesized models indicate that it is indeed possible to
test theoretical models of musical performance by using available statistical methods. The
proposed theory of performer-controlled components of perceived performance quality
demonstrates this ability. Technique demonstrates a significant effect on musical
expression, and both technique and musical expression demonstrate significant effects on
overall perceptions of performance quality. The stability of this model among individual
instrument categories provides further support for the validity of the proposed model of
aurally perceived performer-controlled musical factors.
CHAPTER 5
Summary and Conclusions
Summary
Previous research on music performance adjudication and the evaluation process
yields consistencies in the conceptual framework of musical performance factors (Abeles,
1971; Sagen, 1983; Burnsed, Hinkle, & King, 1985; Mills, 1987; Bergee, 1987, 1995;
Saunders & Holahan, 1997; Thompson, Diamond, & Balkwill, 1998; Zdzinski & Barnes,
2002; Wrigley, 2005; Johnson & Geringer, 2007; Russell, 2007). The consistencies
revealed in the research studies point to the importance of both technique and musical
expression in the evaluation of performance quality (Levi, 1978; Juslin, 1997b; Zdzinski,
1993; Gabrielsson, 1999; Juslin & Lindstrom, 2003; Juslin & Laukka, 2004). The purpose
of this study was to test a hypothesized model of the aurally perceived
performer-controlled musical factors that influence assessments of musical performance
quality.
This model consists of three main variables: technique, musical expression, and overall
perception of musical performance quality. The model asserts that technique has a direct
effect on overall perceptions of performance quality and an indirect effect through
musical expression. In addition, this model also examines the hypothesized individual
component factors of both technique and musical expression.
Variables from research on musical performance constructs, musical achievement,
performance adjudication, musical expression, and performance measurement were
gathered and categorized into aural, visual, performer-controlled, composer, ensemble
and non-musical influences. Since the purpose of this study is to examine the aurally
perceived performer-controlled musical factors, any variable not categorized as aural and
performer-controlled was excluded. The remaining factors were examined for
redundancy and appropriateness. The factors selected for the current study were tone,
intonation, rhythmic accuracy, articulation, tempo, dynamics, timbre, and interpretation.
These selected variables were further categorized into either technique or musical
expression (Levi, 1978; Juslin & Lindstrom, 2003; Juslin & Laukka, 2004; Wrigley,
2005).
To collect information on the performance factors of interest, the Aural Musical
Performance Quality (AMPQ) measure was constructed. The AMPQ measure was
developed using previous items from performance measurement studies that
demonstrated strong factor loadings and high reliabilities, as well as researcher-created
items modeled after those items demonstrating high factor loadings (Abeles,
1971; Bergee, 1987, 1995; Zdzinski, 1993; Saunders & Holahan, 1997; Zdzinski &
Barnes, 2002; Russell, 2007). The total reliability analysis reported a Cronbach’s alpha
reliability score of .977. Individual subscale reliabilities ranged from .789 to .957 with
most alpha reliabilities above .80.
The AMPQ measure was used to evaluate solo performance recordings. The four
performance recordings used for the present study were randomly selected from a total of
50 volunteer solo performances ranging from beginner to professional and represented
brass, string, voice, and woodwind instruments. Volunteer adjudicators (N = 58) from
Florida, Colorado, Oklahoma, and Virginia were recruited to evaluate the four
performances using the AMPQ measure. Each adjudicator evaluated all four recordings.
A total of 232 evaluations using the AMPQ measure were collected.
Results from the performance data revealed high positive correlations between
technique, musical expression, and overall perceptions of performance quality (r = .72 to
.84). These correlations lend support to the inclusion of these variables in the same model
(Keith, 2006). Positive correlations were also found between the respective component
factors of technique (r = .66 to .83) and musical expression (r = .45 to .85). A path
analysis examining the effects of technique and musical expression on overall
perceptions of performance quality yielded standardized path coefficients ranging from
0.38 to 0.72. The regression to determine the influences of component factors on
technique produced standardized coefficients ranging from 0.09 (intonation) to 0.41
(articulation). A regression of musical expression on hypothesized component factors
reported significant standardized coefficients ranging from 0.14 to 0.66. The only
variables to yield non-significant beta weights were tempo (β = 0.01, p > .05) and
intonation (β = 0.09, p > .05).
Individual analyses of the performer-controlled factors categorized by instrument
type revealed the stability of the structure of technique, musical expression, and overall
perceptions of performance quality among brass, string, woodwind, and voice
instruments. The brass model indicated a significant (p < .05) indirect path between
technique and overall perception that is mediated through musical expression, and a
moderate but non-significant path estimate between technique and overall perceptions of
performance quality (β = 0.18, p > .05). This particular result is attributed to the small
sample size for individual instruments.
Conclusions
Music performance is indeed a complex process. Quality assessments of musical
performances continue to be an important part of the process that educates musicians.
Whether these assessments are made formally by adjudicators or privately by the
musicians themselves, informed evaluations of performance quality can lead to marked
improvement in overall performance quality. The identification of a structure of factors
that influence assessments of performance quality could provide musicians and educators
with the knowledge to make informed assessments of musical performance quality.
The identification of this successful model of performer-controlled performance
factors satisfies the proposed theory based on logic, time precedence, and relevant
literature. The structure of the model proposed in the present study illustrates the
importance of technique and musical expression in perceptions of overall performance
quality. Results from the analysis of performance evaluations suggest that deficiencies in
technique will influence not just assessments of technique, but musical expression and
overall perception of performance quality as well. Furthermore, the stability of the
proposed paradigm among string, voice, and woodwind instruments suggests that the
utilization of separate structures for the purpose of performance quality assessment may
not be necessary.
Component factors of both technique and musical expression are also identified in
the model proposed in the current study. The component factors of technique are
identified as tone, intonation, rhythmic accuracy, and articulation. This suggests that a
concentration on the improvement of these individual performance factors could lead to
improvements in overall technique. The identified component factors of musical
expression are dynamics, timbre, and interpretation. This suggests that a concentration on
incorporating and improving these performance factors could lead to predictable
increases in perceptions of the expressive qualities of a musical performance. The factor
of tempo was dropped due to the need for further investigation regarding the influence of
tempo on overall perceptions of performance quality.
A larger benefit concerning the identification of musical factors that influence
assessments of performance quality is the information provided to the performer.
Musicians aiming to improve the quality of their musical performances can employ this
model to diagnose deficiencies in preparation and practice strategies. Audiences will
benefit through more enjoyable concert experiences and possibly respond with increased
patronage to the arts and arts education.
On a more global scale, music education can benefit from the added stability of
established theoretical models. The establishment of theoretical models concerning
assessment of music performance could silence those who would argue against the ability to
teach music to students and demonstrate predictable educational outcomes. In this age of
accountability in education, music education needs empirical support in order to continue
to spread the joy of making music to students in public and private school settings.
Suggestions for Further Research
A continued examination into the stability of the model of aurally perceived
performer-controlled musical factors is needed. Since musical performance is such a
complex process to examine, it is imperative that we continue to test theories that could
have great impact on music teaching and learning. A replication of the present
study with much larger sample sizes is necessary to continue to examine the stability of
the performer-controlled musical factors. In addition, a replication would help to clarify
the stability of the component structures of both technique and musical expression.
Specifically, an examination of the influence of tempo on overall perception and the
possible link to rhythmic accuracy, and an examination of intonation and the possible link
to tone are necessary.
With the replication of the present study, a concentration on obtaining large
samples of homogeneous instrument groupings would be necessary to obtain a clearer
estimation of the stability of the model of performer-controlled musical factors within
individual instrument categories. An emphasis should be placed on the investigation of
the stability of the brass model.
Another research suggestion would be to extend this structure to instrument
categories not utilized in the present study. An investigation into the stability of the
model of performer-controlled musical factors in piano and pitched percussion could be
executed with the same research design employed in the current study. This continued
investigation is necessary to come closer to fully realizing the stability of the current
model across instrument categories.
A continued examination of the component models of technique and musical
expression could also be helpful. One suggestion would be to investigate the
reclassification of the tempo variable into technique or a separate component factor. The
identification and classification of individual performer-controlled musical factors would
have implications for improved diagnosis in music instruction.
The existence of the proposed model of performer-controlled musical factors also
alludes to the existence of related musical models. These models may include, but are not
limited to, composer cues, visual cues, and ensemble cues. An investigation into the
development of these theoretical models could lead to a greater understanding of the
process of musical performance assessment.
Implications for Teachers
Teachers can utilize information in the present study to help focus strategies for
student music learning. By focusing instruction on technique and musical expression
teachers can help students reach higher levels of achievement in musical performance.
Classroom and studio teachers can also use the individual factors of technique and
musical expression as an aid in the diagnosis of student performance deficiencies.
The model proposed in the current study also provides teachers with theoretical
support for the aspects of music already present in their curriculum. Teachers can utilize
the model of performer-controlled musical factors to illustrate to students the effects of
musical components on overall perceptions of quality. This structure can be used to
create a routine approach to problem solving in a music classroom.
Music performance is an integral part of the music education process. Assessment
of music performance quality is used to determine professional placements, higher
education admission, school ensemble placements, juries, and competitions. The high
stakes nature of these assessments call for as much objectivity as possible. In order to
attain these higher levels of objectivity and accuracy in musical performance assessment,
it is imperative that we continue to explore the nature of musical performance.
The testing of theoretical models is necessary to explore and continue to
understand the world in which we live. Music is an omnipresent aspect of every culture.
Yet the structure of music is so complex that we must decompose it in order to
understand it. By continuing to test theories concerning the various aspects of music we
can come closer to understanding and improving experiences with music.
References
Abdoo, F. B. (1980). A comparison of the effects of gestalt and associationist learning
theories on the musical development of elementary school beginning wind and
percussion instrumental students (Doctoral dissertation, University of Southern
California). Dissertation Abstracts International, 41, 1268A.
Abeles, H. F. (1971). An application of the facet-factorial approach to scale construction
in the development of a rating scale for clarinet music performance. Unpublished
doctoral dissertation, University of Maryland, 1971.
Abril, C. R., & Flowers, P. J. (2007). Attention, preference, and identity in music
listening by middle school students of different linguistic backgrounds. Journal of
Research in Music Education, 55, 204-219.
Arnold, J. A. (1995). Effects of competency-based methods of instruction and
self-observation on ensemble directors’ use of sequential patterns. Journal of
Research in Music Education, 43, 127-138.
Asmus, E. P. (1980). Empirical testing of an affective learning paradigm. Journal of
Research in Music Education, 28(3), 143-154.
Asmus, E. P. (1981). Higher order factors of a multidimensional instrument for the
measurement of musical affect. Paper presented at the Research Symposium on
the Psychology and Acoustics of Music, University of Kansas, Lawrence, KS.
Asmus, E. P. (1989). Factor analysis: A look at the technique through the data of
Rainbow. Bulletin of the Council for Research in Music Education, 101.
Asmus, E. (1999). Music assessment concepts. Music Educators Journal, 86(2), 19-24.
Asmus, E. P. (2009). The measurement of musical expression. Paper presented at the
Suncoast Music Education Symposium, Tampa, FL.
Azzara, C. D. (1993). Audiation-based improvisation techniques and elementary
instrumental students' music achievement. Journal of Research in Music
Education, 41, 328-342.
Bean, K. L. (1938). An experimental approach to the reading of music. Unpublished
doctoral dissertation, University of Michigan, 1938. Dissertation Abstracts
International, AAT 0135656.
Becker, W. E., Jr. (1983). Research in economic education. Journal of Economic
Education, 14(2), 4-10.
Bergee, M. J. (1987). An application of the facet-factorial approach to scale construction
in the development of a rating scale for euphonium and tuba music performance
(Doctoral dissertation, University of Kansas, 1987). Dissertation Abstracts
International, 49 (05), 1086A.
Bergee, M. J. (1993). A comparison of faculty, peer, and self-evaluation of applied brass
jury performances. Journal of Research in Music Education, 41, 19-27.
Bergee, M. J. (1995). Primary and higher-order factors in a scale assessing concert band
performance. Bulletin of the Council for Research in Music Education, 126, 1-14.
Bergee, M. J. (1997). Relationships among faculty, peer, and self-evaluations of applied
music performances. Journal of Research in Music Education, 45, 601-612.
Bergee, M. J. (2003). Faculty interjudge reliability of music performance evaluation.
Journal of Research in Music Education, 51, 137-150.
Bloom, B. S. (1976). Human characteristics and school learning. New York: McGraw-Hill.
Boyle, J. D., & Radocy, R. E. (1987). Measurement and evaluation of musical
experiences. New York: Schirmer Books.
Brunswik, E. (1952). The conceptual framework of psychology. Chicago: University of
Chicago Press.
Burnsed, V., Hinkle, B., & King, S. (1985). Performance evaluation reliability at selected
concert festivals. Journal of Band Research, 21(1), 22-29.
Butt, D. S., & Fiske, D. W. (1968). Comparison of strategies in developing scales for
dominance. Psychological Bulletin, 70(6), 505–519.
Byo, J. L., & Brooks, R. (1994). A comparison of junior high musicians and music
educators’ performance evaluations of instrumental music. Contributions to
Music Education, 21, 26-38.
Csikszentmihalyi, M. (1975). Beyond boredom and anxiety. San Francisco: Jossey-Bass.
Csikszentmihalyi, M. (1990). Flow: The psychology of optimal experience. New York:
Harper & Row.
Colwell, R. (1963). An investigation of musical achievement among vocal students,
vocal-instrumental students, and instrumental students. Journal of Research in
Music Education, 11, 123-130.
Cooksey, J. M. (1977). A facet-factorial approach to rating high school choral music
performance. Journal of Research in Music Education, 25, 100-114.
DCamp, C. B. (1980). An application of the facet-factorial approach to scale construction
in the development of a rating scale for high school band performance (Doctoral
dissertation, University of Iowa, 1980). Dissertation Abstracts International, 41,
1462A.
Deaton, W. L., Poggio, J. P., & Glasnapp, D. R. (1977). A scale to assess the affective
entry level of students. Paper presented at the meeting of the National Council on
Measurement in Education, New York, 1977.
Duerksen, G. L. (1972). Some effects of expectation on evaluation of recorded musical
performance. Journal of Research in Music Education, 20, 268-272.
Duke, R. L. (1987). Observation of applied music instruction: The perceptions of trained
and untrained observers. In C. K. Madsen & C. A. Prickett (Eds.), Applications of
research in music behavior (pp. 115-124). Tuscaloosa: University of Alabama
Press.
Edmonston, W. E., Jr. (1966). The use of the semantic differential technique in the
esthetic evaluation of musical excerpts. American Journal of Psychology, 79, 650-652.
Edmonston, W. E., Jr. (1969). Familiarity and musical training in esthetic evaluation of
music. Journal of Social Psychology, 79, 109-111.
Farnum, S. E. (1950) Prediction of success in instrumental music. Unpublished doctoral
dissertation, Harvard University, 1950.
Fiske, H. E. (1975). Judge-group differences in the rating of secondary school trumpet
performances. Journal of Research in Music Education, 23, 186-196.
Fiske, H. E. (1977). Relationship of selected factors in performance adjudication
reliability. Journal of Research in Music Education, 25, 256-263.
Fiske, H.E. (1979). Musical performance evaluation ability: Toward a model of
specificity. Bulletin of the Council for Research in Music Education, 59, 27-31.
Folts, M. L. (1973). The relative effect of two procedures as followed by flute, clarinet,
and trumpet students while practicing, on the development of tone quality and on
selected performance skills: An experiment in student use of sound-recorded
material (Doctoral dissertation, New York University). Dissertation Abstracts
International, 34, 1312A.
Gabrielsson, A. (1999). Studying emotional expression in musical performance. Bulletin
of the Council for Research in Music Education, 141, 47-53.
Gatewood, E. L. (1927). An experimental study of musical enjoyment. In Schoen (Ed.),
The effects of music. Freeport, NY: Books for Library Press.
Geringer, J. M., & Johnson, C. M. (2007). Effects of duration, tempo, and performance
level on musicians’ ratings of wind band performances. Journal of Research in
Music Education, 55, 289-301.
Geringer, J. M., & Madsen, C. (1998). Musician’s ratings of good versus bad vocal and
string performances. Journal of Research in Music Education, 46, 522-534.
Gillespie, R. (1997). Ratings of violin and viola vibrato performance in audio-only and
audiovisual presentations. Journal of Research in Music Education, 44, 212-220.
Glasnapp, D. R., Poggio, J. P., & Deaton, W. L. (1976). Causal analysis within a mastery
learning paradigm. Paper presented at the meeting of the American Educational
Research Association, San Francisco, 1976.
Gundlach, R. H. (1935). Factors determining the characterization of musical phrases.
American Journal of Psychology, 47, 624-643.
Gutsch, K. U. (1964). One approach toward the development of an individual test for
assessing one aspect of instrumental music achievement. Bulletin of the Council
for Research in Music Education, 2.
Gutsch, K. U. (1965). Evaluation in instrumental music performance: An individual
approach. Bulletin of the Council for Research in Music Education, 4.
Hevner, K. (1935). The affective character of the major and minor modes in music.
American Journal of Psychology, 47, 103-118.
Hevner, K. (1936). Experimental studies of the elements of expression in music.
American Journal of Psychology, 48, 246-268.
Hevner, K. (1937). The affective value of pitch and tempo in music. American Journal of
Psychology, 49, 621-630.
Hevner, K. (1938). Studies in expressiveness of music. Music Teachers National
Association Proceedings, 39, 199-217.
Hewitt, M. P. (2002). Self-evaluation tendencies of junior high school instrumentalists.
Journal of Research in Music Education, 50, 215-226.
Hewitt, M. P. (2005). Self-evaluation accuracy among high school and middle school
instrumentalists. Journal of Research in Music Education, 53, 148-161.
Hewitt, M. P. (2007). Influence of primary performance instrument and education level
on music performance evaluation. Journal of Research in Music Education, 55,
18-30.
Hewitt, M. P., & Smith, B. P. (2004). The influence of teaching career level and primary
performance instrument on the assessment of music performance. Journal of
Research in Music Education, 52, 314-327.
Hillbrand, E. K. (1923). Hillbrand sight-singing test. New York: World Book.
Hodges, D. A. (1975). The effects of recorded aural models on the performance
achievement of beginning band classes, Journal of Band Research, 12(1), 30-34.
Hoffren, J. A. (1964a). A test of musical expression. Bulletin of the Council for Research
in Music Education, 2, 32-35.
Hoffren, J. A. (1964b). The construction and validation of a test of expressive phrasing in
music. Journal of Research in Music Education, 12, 159-164.
Horacek, L. (1955). The relationship of mood and melodic pattern in folk songs (Doctoral
dissertation, University of Kansas, 1957). Dissertation Abstracts, 17, 1567.
Iltis, J. L. (1970). The construction and validation of a test to measure the ability of high
school students to evaluate musical performance (Doctoral dissertation, Indiana
University, 1970). Dissertation Abstracts International, 31, 3582A.
Jackson, S. A., & Marsh, H. W. (1996). Development and validation of a scale to
measure optimal experience: The flow state scale. Journal of Sport & Exercise
Psychology, 18, 17-35.
Johnson, C. M., & Geringer, J. M. (2007). Predicting music majors’ overall ratings of
wind band performances: Elements of music. Bulletin of the Council for Research
in Music Education, 173, 25-38.
Jones, H. (1986). An application of the facet-factorial approach to scale construction in
the development of a rating scale for high school vocal solo performance
(Doctoral dissertation, University of Oklahoma, 1986). Dissertation Abstracts
International, 47, 1230A.
Juslin, P. N. (1997a). Emotional communication in music performance: a functionalist
perspective and some data. Music Perception, 14, 383-418.
133
Juslin, P. N. (1997b). Can results from studies of perceived expression in music
performances be generalized across response formats? Psychomusicology, 16, 77101.
Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical
emotions: A review and a questionnaire study of everyday listening (Stockholm
Music Acoustics Conference, 2003). Journal of New Music Research, 33, 217238.
Juslin, P. N., & Lindstrom, E. (2003). Musical expression of emotions: Modeling
composed and performed features. Paper presented at the 5th Triennial ESCOM
Conference at Hanover University of Music and Drama, Germany.
Keith, T. Z. (2006). Multiple Regression and Beyond. Boston: Pearson Education, 2006.
Kim, S. Y. (2000). Group differences in piano performance evaluation by experienced
and inexperienced judges. Contributions to Music Education, 27(2), 23-36.
Kline, R. B. (2005). Principles and Practice of Structural Equation Modeling. New
York: Guilford Press, 2005.
Knuth, W. (1967). Achievement Tests in Music: Recognition of Rhythm and Melody:
Complete Manual of Directions. rev. ed. Divisions 1, 2, and 3; Forms A and B.
San Francisco: Creative Arts Research Associates, 1967.
Knuth, W. E. (1933). The construction and validation of music tests designed to measure
certain aspects of sight-reading. Unpublished doctoral dissertation, University of
California, 1933. Dissertation Abstracts International AAT 0140276.
Kopiez, R., & Lee, J. I. (2008). Toward a general model of skills involved in sight
reading music. Music Education Research, 10, 41-62.
Kruth, E. C. (1973). A suggested technique for evaluating wind instrument performance.
Journal of Band Research, 10, 24-36.
Langer, S. K. (1953). Feeling and form. New York: Charles Scribner’s Sons.
LeBlanc, A. (1980). Outline of a proposed model of sources of variation in musical taste.
Bulletin of the Council for Research in Music Education, 61, 29-34.
Leonard, C.; House, R. W. (1972). Foundations and principles of music education. New
York: McGraw Hill.
Levi, D. S. (1978). Expressive qualities in music perception and music education. Journal
of Research in Music Education, 26, 425-435.
134
Levinowitz. (1989). An investigation of preschool children's comparative capability to
sing songs with and without words. Bulletin of the Council for Research in Music
Education, 100, 14-19.
Lippitt, G. L. (1973). Visualizing change: Model building and the change process. La
Jolla, CA: University Associates, 1973.
Madsen, C. K., & Geringer, J. M. (1998). Comparison of good versus bad tone
quality/intonation of vocal and string performances: Issues concerning
measurement and reliability of the continuous response digital interface. Paper
presented at the meeting of the Research Commission of the International Society
for Music Education, Johannesburg, South Africa.
Madsen, C. K., Geringer, J. M., & Heller, J. (1991). Comparison of good versus bad
intonation of accompanied and unaccompanied vocal and string performances
using a continuous response digital interface (CRDI). Canadian Music Educator:
Special Research Edition, 33, 123-130.
Madsen, C. K., Geringer, J. M., & Heller, J. (1993). Comparison of good versus bad tone
quality of accompanied and unaccompanied vocal and string performances.
Bulletin for the Council of Research in Music Education, 119, 93-100.
Marchand, D. J. (1975). A study of two approaches to developing expressive
performance. Journal of Research in Music Education, 23, 14-22.
McPherson, G. E., & Thompson, W. F. (1998). Assessing music performance: Issues and
influences. Research Studies in Music Education (10), 12-24.
Miksza, P. (2007). An investigation of observed practice behaviors, self-reported practice
habits, and the performance achievement of high school wind players. Journal of
Research in Music Education, 55, 359-375.
Miksza, P. (2006). Relationships among impulsiveness, locus of control, gender, and
music practice. Journal of Research in Music Education, 54, 308-323.
Miller, G. A., Felbaum, C., Tengi, R., & Langone, H. (2006). Wordnet: A lexical
database for the English language. Retrieved June 16, 2009, from the Princeton
University, Cognitive Science Laboratories Web Site:
http://wordnet.princeton.edu/
Mills, J. (1987). Assessment of solo musical performance- a preliminary study. Bulletin
of the Council for Research in Music Education, 91, 119-125.
Mosher, R. M., (1926). A study of the group method of measurement of sightsinging. Unpublished doctoral dissertation, Columbia University, 1926.
Dissertation Abstracts International, AAT 0127542.
135
Neilson, J. (1973). A blueprint for adjudicators. The Instrumentalist, 28(5), 46-48.
Nichols, J. P. (2005). A factor analysis approach to the development of a rating scale for
snare drum performance. (Doctoral dissertation, University of Iowa). Dissertation
Abstracts International, 46, 3282A.
Norris, C. E., & Borst, J. D. (2007). An examination of the reliabilities of two choral
festival adjudication forms. Journal of Research in Music Education, 55, 237-251.
Oakley, D. L. (1972). An investigation of criteria used in the evaluation of marching
bands. Journal of Band Research, 9(1), 32-37.
Oldefendt, S. J. (1976). Scoring instrumental and vocal musical performances.
Unpublished paper presented at the National Council on Measurement in
Education: San Francisco. [ERIC Document Reproduction Service No. ED 129
839].
Owen, C. D. (1969). A study of criteria for the evaluation of secondary school
instrumentalists when auditioning for festival bands. Unpublished doctoral
dissertation, East Texas State University, 1969.
Payne, D. A. (2003). Applied educational assessment (2nd ed.). Belmont, CA: Wadsworth
Thompson Learning.
Prickett, C. A. (1987). The effect of self-monitoring on the rate of verbal mannerism of
song teachers. In C. K. Madsen & C. A. Prickett (Eds.), Applications of research
in music behavior (pp. 125-134). Tuscaloosa: University of Alabama Press.
Radocy, R. E. (1986). On quantifying the uncountable in musical behavior. Bulletin of the
Council for Research in Music Education, 88, 22-31.
Russell, B. E. (2007). An application of the facet-factorial approach to scale construction
in the development of a guitar performance rating scale. Unpublished master
thesis, University of Miami, 2007.
Russell, B. E. (in press). An application of the facet-factorial approach to scale
construction in the development of a guitar performance rating scale. Bulletin of
the Council for Research in Music Education, 2009.
Rutkowski, J. (1990). The measurement and evaluation of children's singing voice
development. Quarterly Journal of Music Teaching and Learning, 1(1 & 2), 8195.
Sagen, D. P., (1983). The development and validation of a university band performance
rating scale. Journal of Band Research 18(2), 1- 11.
136
Saunders, T. C., & Holahan, J. M. (1997). Criteria-specific rating scales in the evaluation
of high school instrumental performance. Journal of Research in Music
Education, 45, 259-272.
Schillinger, J. (1946). The Schillinger system of musical composition. New York: Carl
Fischer, Inc.
Schleff, J. S. (1992). Critical judgments of undergraduate music education students in
response to recorded performances. Contributions to Music Education, 19, 60-74.
Schleuter, S. L. (1978). Effects of certain lateral dominance traits, music aptitude, and
sex differences with instrumental music achievement. Journal of Research in
Music Education, 26, 22-31.
Silliman, T. E. (1977). The effect of entrance age in achievement and retention in the
beginning band instrument program (Doctoral dissertation, University if
Maryland). Dissertation Abstracts International, 48, 5982A.
Smith, N. E. (1968). A study of certain expressive-acoustic equivalents in the
performance styles of five trumpet players. (Doctoral dissertation, Florida State
University, 1968). Dissertation Abstracts International, 30, 5021A-5022A.
St. Cyr, A. W. (1977). Evaluative criteria for band, orchestra, chorus. Unpublished
doctoral dissertation, Boston College, 1977. Dissertation Abstracts International,
AAT 7718638.
Stecklein, J. E., & Aliferis, J. (1957). The relationship of instrument to music
achievement scores. Journal of Research in Music Education, 5, 3-15.
Stelzer, T. G. W. (1935). Construction, interpretation, and use of a sight reading scale in
organ music, with an analysis of organ playing into fundamental abilities
(Doctoral dissertation, University of Nebraska, 1935). Dissertation Abstracts
International, AAT DP13957.
Stivers, J. D. (1972). A reliability and validity study of the Watkins-Farnum Performance
Scale. Unpublished doctoral dissertation, University of Illinois at UrbanaChampaign, 1972.
Suchor, V. (1977). The influence of personality composition in applied piano groups.
Journal of Research in Music Education, 25(3), 171-83.
Thompson, W., Diamond, C.T.P., & Balkwill, L. (1998). The adjudication of six
performances of a Chopin etude: A study of expert knowledge. Psychology of
Music, 26, 154-175.
137
Thompson, S., & Williamon, A. (2003). Evaluating evaluation: Musical performance
assessment as a research tool. Music Perception, 21, 21-41.
Tiede, R. L. (1971). A study of the effects of experience in evaluating unidentified
instrumental performances on the student conductor’s critical perception of
performance. Doctoral dissertation, University of Illinois at Urbana Champaign,
1971. Dissertation Abstracts International, 32, 4653A-4654A.
Van Gigch, J. P. (1991). System Design Modeling and Metamodeling. New York: Plenum
Press, 1991.
Vasil, T. (1973). The effects of systematically varying selected factors of on music
performance adjudication. Unpublished doctoral dissertation, University of
Connecticut, Storrs, 1973.
Wapnick, J., Flowers, P., Alegant, M., & Jasinskas, L. (1993). Consistency in piano
performance evaluation. Journal of Research in Music Education, 41, 282-292.
Watkins, J. G. (1942). Objective measurement of instrumental music. New York:
Teachers College Bureau of Publications, Columbia University.
Watkins, J. G., & Farnum, S. E. (1954). The Watkins-Farnum Performance Scale, Form
A & Form B. Milwaukee, WI: Hal Leonard.
Wheelwright, L. F. (1940). An experimental study of the perceptibility and spacing of
music symbols. Unpublished doctoral dissertation, Columbia University, 1940.
Dissertation Abstracts International, AAT 0165563.
Whitcomb, R. (1999). Writing rubrics for the music classroom. Music Educators Journal,
85(6), 26-32.
Wrigley, W. J. (2005). Improving music performance assessment. Unpublished doctoral
thesis, Griffith University, 2005.
Zdzinski, S. F. (1991). Measurement of solo instrumental music performance: a review of
literature. Bulletin of the Council for Research in Music Education, 109.
Zdzinski, S. F. (1993). Relationships among parental involvement, selected student
attributes, and learning outcomes in instrumental music. Unpublished doctoral
dissertation, Indiana University.
Zdzinski, S. F., & Barnes, G. V. (2002). Development and validation of a string
performance rating scale. Journal of Research in Music Education, 50, 245-255.
Appendix A
Variables Collected from Performance Assessment Research
Technique
Interpretation
Rhythm
Intonation
Phrasing
Pitch
Musicality
Tone
Balance
Blend
Expression
Dynamics
Amplitude
Tempo
Articulation
Timbre
Accent
Instrumentation
Instrument quality
Difficulty
Arrangement
Attack
Release
Reading
Range
Left-hand
Communication
Melody
Mode
Tonality
Harmony
Memory
Bow
Breathing
Rubato
Vibrato
Embouchure
Smoothness
Unity
Continuity
Style
Position
Posture
Ensemble
Accompaniment
Appearance
Conductor
Accuracy
Appendix B
Categorization of Performance Assessment Variables
Performer: Tempo, Amplitude, Dynamics, Articulation, Accent, Timbre, Tone, Intonation,
Technique, Interpretation, Accuracy, Musicality, Phrasing, Vibrato, Breathing, Attack,
Release, Range, Left-hand, Continuity, Smoothness, Reading, Memory, Communication

Composer: Mode, Tonality, Harmony, Pitch, Melody, Rhythm, Arrangement, Difficulty,
Style, Unity

Ensemble: Balance, Blend, Accompaniment, Conductor, Instrumentation

Visual: Embouchure, Position, Posture, Appearance, Bow, Instrument Quality
Appendix C
Aural Musical Performance Quality (AMPQ) Measure
Questions (each statement is rated by circling SA = strongly agree, A = agree,
D = disagree, or SD = strongly disagree):

1. Tone is strong
2. Tone is full
3. Thin tone quality
4. Sound is clear
5. Played out of tune.
6. Performer was able to adjust pitch.
7. Intonation is inconsistent.
8. Intonation is good
9. Correct rhythms
10. Off-beats played properly
11. Rhythm was distorted
12. Insecure rhythm
13. Poor synchronization
14. Attacks and releases were clean
15. Impeccable articulation
16. Articulation is overly percussive
17. Tempo is steady
18. Tempo not controlled
19. The tempo was in good taste
20. Lack of a steady pulse
21. Dynamics are played
22. Dynamics used to help phrasing
23. Good dynamic contrast
24. Appropriate dynamics
25. Timbre was harsh or strident.
26. Demonstrated a singing quality
27. Lacked resonance
28. Timbre appropriate for style
29. The interpretation was musical.
30. Lack of style in performance.
31. Effective musical communication
32. Melodic phrasing
33. Made numerous errors in technique.
34. Insecure technique
35. Precision is lacking
36. Played fluently
37. Performance not expressive
38. Performance reflected sensitivity
39. Melodic expression
40. Spiritless playing
41. Overall quality lacking
42. Excellent performance overall
43. Poor performance quality
44. Quality of performance is good
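For readers who want to turn circled responses into scores, the sketch below shows one
way to do so in Python. It is illustrative only: the 4-3-2-1 anchor mapping and the set
of reverse-keyed (negatively worded) items are assumptions judged from the items'
phrasing, since no scoring key is printed here.

    # Hypothetical scoring sketch for the AMPQ measure (not an official key).
    # Assumes SA/A/D/SD map to 4/3/2/1 and that negatively worded items
    # (e.g., item 5, "Played out of tune.") are reverse-coded before summing.
    LIKERT = {"SA": 4, "A": 3, "D": 2, "SD": 1}

    # Items assumed to be negatively worded, judged from their phrasing.
    NEGATIVE_ITEMS = {3, 5, 7, 11, 12, 13, 16, 18, 20, 25, 27, 30,
                      33, 34, 35, 37, 40, 41, 43}

    def score_item(item_number, response):
        """Convert one circled response to a 1-4 score, reverse-coding negatives."""
        raw = LIKERT[response]
        return 5 - raw if item_number in NEGATIVE_ITEMS else raw

    def total_score(responses):
        """Sum item scores over all answered items ({item_number: response})."""
        return sum(score_item(n, r) for n, r in responses.items())

    # A rater who strongly agrees tone is strong (item 1) and disagrees that
    # the performance was out of tune (item 5) contributes 4 + 3 = 7.
    print(total_score({1: "SA", 5: "D"}))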
Appendix D
Evaluation Packet Instruction Sheet
Please read this page first!
Title: The Empirical Testing of a Musical Performance Assessment Paradigm.
DIRECTIONS:
To begin:
1. Locate the following materials in the packet:
a. Compact disc (1).
b. Adjudication form(s) (4).
c. Return Envelope (1 postage-paid envelope).
2. Listen to the first track on the compact disc. Read each statement
on the adjudication form carefully and indicate your response by
circling the “SA” if you strongly agree, the “A” if you agree, the
“D” if you disagree, or the “SD” if you strongly disagree. You may
listen to the recording as many times as is needed to complete the
adjudication form. Please indicate which track you are evaluating
by circling the appropriate track number at the top of the
adjudication form.
3. Repeat STEP 2 for each of the remaining tracks.
When you are finished:
1. Please include the following items in the Return Envelope:
a. 4- Adjudication forms (completed).
b. 1- Compact Disc.
2. Send the Return Envelope via the U.S. Postal Service: place the
envelope, containing the materials listed above, in the mailbox.
That’s it!
Brian Russell
7700 S.W. 146 Road
Miami, FL 33183
Questions? Please feel free to call Brian Russell at (305) 720-4099.
Appendix E
Waiver of Signed Consent Form
Empirical Testing of a Performance Assessment Paradigm Consent
PURPOSE:
You are being asked to participate in a research study, conducted by the University of
Miami in Coral Gables, Florida, that examines the factors surrounding musical
performance evaluation. The purpose of this study is to empirically test a performance assessment
paradigm. The results of this study will aid in strengthening the body of research dealing
with success in musical performance and evaluation. All evaluations submitted for the
purposes of this research will remain anonymous and cannot be traced back to you. No
identifiable information (name, address, identification numbers, etc.) will be collected.
Completion of the performance evaluation is considered your consent to participate.
RISKS:
There are no anticipated risks associated with the recording or evaluation form.
BENEFITS:
No direct benefit can be promised to you from your participation in this study. However,
music students and teachers may benefit from this research.
COSTS:
There is no cost to participants involved in this study.
PAYMENT TO PARTICIPANT:
There is no payment for participation in this study.
CONFIDENTIALITY:
The investigator will consider your responses confidential to the extent permitted by law.
No identifiable information is necessary to take part in this study.
RIGHT TO WITHDRAW:
Your participation in this study is voluntary; you have the right to withdraw. After the
materials are distributed, you can decide to stop at any time. There are no negative
consequences if you decide not to participate in this study.
OTHER PERTINENT INFORMATION:
The investigator will answer any questions you might have about the study. The
investigator will give you a copy of this consent form and you may contact him at
(305)720-4099. If you have any questions regarding your rights as a research participant
you should contact the University of Miami Human Subjects Research Office at (305)
243-3195.
Appendix F
Confirmatory Factor Analysis of AMPQ Items
A supplemental confirmatory factor analysis of the Aural Musical Performance
Quality (AMPQ) measure was conducted to determine the representativeness of items
selected to represent the component factors of tone, intonation, rhythmic accuracy,
articulation, tempo, dynamics, timbre, and interpretation. Confirmatory factor analysis is
a method of establishing evidence for the validity of measures. The model depicted
below is also referred to as a latent variable measurement model.
The analysis uses the 32 Likert-scale items of the AMPQ measure that represent these
component factors of performance achievement. The 32 rectangular boxes on the right
side of the model, the observed variables, represent these items. The items are
theorized to assess eight different latent constructs. The eight ovals on the left side of the
model represent the latent (unobserved) variables: Latent Tone, Latent Intonation, Latent
Rhythm Accuracy, Latent Articulation, Latent Tempo, Latent Dynamics, Latent Timbre,
and Latent Interpretation. The smaller circles on the leftmost side of the diagram
represent the unique variance unaccounted for by the associated latent variable plus any
measurement error that may exist. For example, u1 represents the specific variance in the
first tone item that is unaccounted for by Latent Tone plus the measurement error for that
item (see Figure F1).
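Written out as equations, the measurement model for a single factor takes the standard
CFA form. The sketch below shows the four tone items, where each u term is the unique
variance just described and each lambda is the item's pattern coefficient (the
unstandardized estimates in Table F1); the lambda notation is introduced here for
illustration only.

    % Measurement equations for the Latent Tone factor (items Q1-Q4):
    % observed item = loading x latent factor + unique term.
    \begin{aligned}
    Q_1 &= \lambda_1\,\mathrm{Tone} + u_1 \\
    Q_2 &= \lambda_2\,\mathrm{Tone} + u_2 \\
    Q_3 &= \lambda_3\,\mathrm{Tone} + u_3 \\
    Q_4 &= \lambda_4\,\mathrm{Tone} + u_4
    \end{aligned}

With the estimates from Table F1, the first equation reads, for example,
Q1 = .658 x Tone + u1.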
Figure F1. Confirmatory Factor Analysis of AMPQ Items
Model-fit indices indicate that the imposed model exhibits adequate fit to the data
collected (SRMR = .058; TLI = .916; CFI = .926). In short, this means that the proposed
theory is supported by the data, and the proposed model is one viable representation of
the true relations underlying the data. Results of the confirmatory factor analysis indicate
that the items selected to represent the performance factors (tone, intonation, rhythmic
accuracy, articulation, tempo, dynamics, timbre, and interpretation) are indeed
representative, as indicated by the significant pattern coefficients between the observed
and latent variables (see Table F1). The covariances illustrated in the model are
equivalent, in this case, to correlations between the latent variables. All covariances
depicted in this model are significant.
Table F1

Pattern Coefficients for AMPQ Factor Analysis

Item   Latent Variable      Unstandardized   Standardized   S.E.    C.R.     p
                            Estimate         Estimate
Q1     Tone                 .658             .852           .041    15.894   ***
Q2     Tone                 .770             .905           .044    17.558   ***
Q3     Tone                 .604             .746           .046    13.035   ***
Q4     Tone                 .665             .822           .044    15.028   ***
Q5     Intonation           .511             .687           .045    11.371   ***
Q6     Intonation           .420             .675           .038    11.111   ***
Q7     Intonation           .602             .720           .050    12.093   ***
Q8     Intonation           .682             .877           .042    16.090   ***
Q9     Rhythmic Accuracy    .525             .781           .038    13.759   ***
Q10    Rhythmic Accuracy    .499             .777           .037    13.646   ***
Q11    Rhythmic Accuracy    .654             .842           .042    15.387   ***
Q12    Rhythmic Accuracy    .702             .857           .044    15.816   ***
Q13    Articulation         .454             .654           .042    10.807   ***
Q14    Articulation         .758             .839           .050    15.303   ***
Q15    Articulation         .739             .809           .051    14.495   ***
Q16    Articulation         .350             .509           .044     7.979   ***
Q17    Tempo                .675             .880           .042    16.188   ***
Q18    Tempo                .560             .788           .041    13.752   ***
Q19    Tempo                .367             .567           .041     8.935   ***
Q20    Tempo                .615             .789           .045    13.777   ***
Q21    Dynamics             .715             .928           .039    18.294   ***
Q22    Dynamics             .757             .914           .042    17.822   ***
Q23    Dynamics             .601             .823           .040    15.074   ***
Q24    Dynamics             .497             .709           .041    12.167   ***
Q25    Timbre               .606             .749           .046    13.061   ***
Q26    Timbre               .749             .868           .046    16.305   ***
Q27    Timbre               .706             .849           .045    15.733   ***
Q28    Timbre               .539             .793           .038    14.173   ***
Q29    Interpretation       .609             .797           .043    14.273   ***
Q30    Interpretation       .644             .807           .044    14.521   ***
Q31    Interpretation       .720             .887           .043    16.861   ***
Q32    Interpretation       .592             .788           .042    14.022   ***

*** p < .001
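As a reading aid: the C.R. (critical ratio) column is simply each unstandardized
estimate divided by its standard error, a Wald z statistic, which can be verified from
any row of Table F1. A minimal check in Python using the Q1 row; the small discrepancy
with the printed value reflects rounding of the estimate and standard error.

    # C.R. = unstandardized estimate / S.E. (Wald z); Q1 (Tone) row of Table F1.
    estimate, se = 0.658, 0.041
    print(round(estimate / se, 3))  # 16.049 vs. the printed 15.894 (rounding)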
The high correlation between Latent Tone and Latent Timbre (r = .89) suggests
the possibility that these two latent variables are actually measuring the same
construct. This hypothesis can be tested with a revised model that constrains the
covariance between Latent Tone and Latent Timbre to one, essentially collapsing the two
separate latent variables into a single latent variable. The revised model was estimated
and compared to the original model (Table F2).
Table F2

Model-fit Comparisons

Model      CMIN     df    TLI    CFI    AIC       BIC
Original   841.20   436   .916   .926   1025.20   1342.30
Revised    890.11   437   .906   .917   1072.11   1385.77
The results of this comparison analysis indicate that the revised model does not
demonstrate a significantly better fit than the original model, and actually demonstrates a
mathematically worse fit than the original model according to model-fit indices (TLI =
.906; CFI = .917). The revised model affords one more degree of freedom (df = 437)
than the original model, but this gain in parsimony is not justified by the comparison
of the model-fit indices. Since the results of the model comparison indicate that the items
selected to represent tone and timbre do indeed measure separate latent variables, the
original measurement model will be retained.
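Because the revised model is nested in the original (it adds exactly one constraint),
the CMIN values in Table F2 also support a chi-square difference test. This
supplementary check is not reported above, but it follows directly from the table and
points to the same conclusion; a minimal sketch:

    # Chi-square difference test for the nested comparison in Table F2.
    from scipy.stats import chi2

    cmin_original, df_original = 841.20, 436
    cmin_revised, df_revised = 890.11, 437  # one extra constraint

    delta_cmin = cmin_revised - cmin_original  # 48.91
    delta_df = df_revised - df_original        # 1
    p_value = chi2.sf(delta_cmin, delta_df)
    print(f"chi-square diff = {delta_cmin:.2f}, df = {delta_df}, p = {p_value:.2e}")
    # p << .001: the tone/timbre constraint significantly worsens fit,
    # consistent with retaining the original measurement model.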
Appendix G
AMOS Output of Estimated Performer-Controlled Musical Factors Model Across
Brass, Woodwind, Voice and String Instruments
Regression Weights: (Group number 1 - Default model)

                         Estimate   S.E.    C.R.     P
MUSEX    <--- TECHNQ     .589       .038    15.544   ***
OVERALL  <--- TECHNQ     .569       .043    13.086   ***
OVERALL  <--- MUSEX      .447       .053     8.470   ***

Standardized Regression Weights: (Group number 1 - Default model)

                         Estimate
MUSEX    <--- TECHNQ     .715
OVERALL  <--- TECHNQ     .577
OVERALL  <--- MUSEX      .373

Variances: (Group number 1 - Default model)

          Estimate   S.E.    C.R.     P
TECHNQ    10.202     .949    10.747   ***
d1         3.386     .315    10.747   ***
d2         2.175     .202    10.747   ***

Matrices (Group number 1 - Default model)

Total Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .589     .000
OVERALL   .832     .447

Standardized Total Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .715     .000
OVERALL   .844     .373

Direct Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .589     .000
OVERALL   .569     .447

Standardized Direct Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .715     .000
OVERALL   .577     .373

Indirect Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .263     .000

Standardized Indirect Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .267     .000

Indirect Effects - Standard Errors (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .035     .000

Indirect Effects - Lower Bounds (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .207     .000

Indirect Effects - Upper Bounds (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .323     .000

Indirect Effects - Two Tailed Significance (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     ...      ...
OVERALL   .001     ...
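The matrices above obey the usual path-analytic decomposition: the indirect effect of
TECHNQ on OVERALL is the product of the two constituent paths, and the total effect is
the direct effect plus the indirect effect. A minimal check against the combined-model
estimates:

    # Effect decomposition for the combined model (Appendix G, unstandardized).
    a = 0.589         # TECHNQ -> MUSEX
    c_direct = 0.569  # TECHNQ -> OVERALL (direct)
    b = 0.447         # MUSEX -> OVERALL

    indirect = a * b              # .263, as in the Indirect Effects matrix
    total = c_direct + indirect   # .832, as in the Total Effects matrix
    print(round(indirect, 3), round(total, 3))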
Appendix H
AMOS Output of Estimated Performer-Controlled Musical Factors: Woodwind
Model
Regression Weights: (Group number 1 - Default model)

                         Estimate   S.E.    C.R.    P
MUSEX    <--- TECHNQ     .464       .119    3.901   ***
OVERALL  <--- TECHNQ     .311       .095    3.273   .001
OVERALL  <--- MUSEX      .480       .094    5.114   ***

Standardized Regression Weights: (Group number 1 - Default model)

                         Estimate
MUSEX    <--- TECHNQ     .459
OVERALL  <--- TECHNQ     .332
OVERALL  <--- MUSEX      .519

Variances: (Group number 1 - Default model)

          Estimate   S.E.    C.R.    P
TECHNQ    4.393      .823    5.339   ***
d1        3.545      .664    5.339   ***
d2        1.783      .334    5.339   ***

Matrices (Group number 1 - Default model)

Total Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .464     .000
OVERALL   .534     .480

Standardized Total Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .459     .000
OVERALL   .570     .519

Direct Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .464     .000
OVERALL   .311     .480

Standardized Direct Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .459     .000
OVERALL   .332     .519

Indirect Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .223     .000

Standardized Indirect Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .238     .000

Indirect Effects - Standard Errors (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .074     .000

Indirect Effects - Lower Bounds (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .128     .000

Indirect Effects - Upper Bounds (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .379     .000

Indirect Effects - Two Tailed Significance (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     ...      ...
OVERALL   .000     ...
Appendix I
AMOS Output of Estimated Performer-Controlled Musical Factors: Voice Model
Regression Weights: (Group number 1 - Default model)

                         Estimate   S.E.    C.R.    P
MUSEX    <--- TECHNQ     .548       .113    4.857   ***
OVERALL  <--- TECHNQ     .521       .075    6.918   ***
OVERALL  <--- MUSEX      .378       .074    5.090   ***

Standardized Regression Weights: (Group number 1 - Default model)

                         Estimate
MUSEX    <--- TECHNQ     .541
OVERALL  <--- TECHNQ     .561
OVERALL  <--- MUSEX      .413

Variances: (Group number 1 - Default model)

          Estimate   S.E.    C.R.    P
TECHNQ    3.587      .672    5.339   ***
d1        2.606      .488    5.339   ***
d2         .820      .154    5.339   ***

Matrices (Group number 1 - Default model)

Total Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .548     .000
OVERALL   .728     .378

Standardized Total Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .541     .000
OVERALL   .784     .413

Direct Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .548     .000
OVERALL   .521     .378

Standardized Direct Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .541     .000
OVERALL   .561     .413

Indirect Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .207     .000

Standardized Indirect Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .223     .000

Indirect Effects - Standard Errors (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .060     .000

Indirect Effects - Lower Bounds (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .127     .000

Indirect Effects - Upper Bounds (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .333     .000

Indirect Effects - Two Tailed Significance (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     ...      ...
OVERALL   .000     ...
Appendix J
AMOS Output of Estimated Performer-Controlled Musical Factors: String Model
Regression Weights: (Group number 1 - Default model)

                         Estimate   S.E.    C.R.    P
MUSEX    <--- TECHNQ     .451       .118    3.826   ***
OVERALL  <--- TECHNQ     .637       .099    6.466   ***
OVERALL  <--- MUSEX      .284       .099    2.871   .004

Standardized Regression Weights: (Group number 1 - Default model)

                         Estimate
MUSEX    <--- TECHNQ     .452
OVERALL  <--- TECHNQ     .610
OVERALL  <--- MUSEX      .271

Variances: (Group number 1 - Default model)

          Estimate   S.E.    C.R.    P
TECHNQ    3.500      .656    5.339   ***
d1        2.772      .519    5.339   ***
d2        1.543      .289    5.339   ***

Matrices (Group number 1 - Default model)

Total Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .451     .000
OVERALL   .765     .284

Standardized Total Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .452     .000
OVERALL   .733     .271

Direct Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .451     .000
OVERALL   .637     .284

Standardized Direct Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .452     .000
OVERALL   .610     .271

Indirect Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .128     .000

Standardized Indirect Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .123     .000

Indirect Effects - Standard Errors (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .058     .000

Indirect Effects - Lower Bounds (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .055     .000

Indirect Effects - Upper Bounds (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .256     .000

Indirect Effects - Two Tailed Significance (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     ...      ...
OVERALL   .002     ...
Appendix K
AMOS Output of Estimated Performer-Controlled Musical Factors: Brass Model
Regression Weights: (Group number 1 - Default model)

                         Estimate   S.E.    C.R.    P
MUSEX    <--- TECHNQ     .295       .135    2.194   .028
OVERALL  <--- TECHNQ     .207       .130    1.590   .112
OVERALL  <--- MUSEX      .519       .123    4.211   ***

Standardized Regression Weights: (Group number 1 - Default model)

                         Estimate
MUSEX    <--- TECHNQ     .279
OVERALL  <--- TECHNQ     .182
OVERALL  <--- MUSEX      .481

Variances: (Group number 1 - Default model)

          Estimate   S.E.    C.R.    P
TECHNQ    3.453      .647    5.339   ***
d1        3.570      .669    5.339   ***
d2        3.087      .578    5.339   ***

Matrices (Group number 1 - Default model)

Total Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .295     .000
OVERALL   .361     .519

Standardized Total Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .279     .000
OVERALL   .316     .481

Direct Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .295     .000
OVERALL   .207     .519

Standardized Direct Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .279     .000
OVERALL   .182     .481

Indirect Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .153     .000

Standardized Indirect Effects (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .134     .000

Indirect Effects - Standard Errors (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .082     .000

Indirect Effects - Lower Bounds (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .048     .000

Indirect Effects - Upper Bounds (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     .000     .000
OVERALL   .333     .000

Indirect Effects - Two Tailed Significance (BC) (Group number 1 - Default model)

          TECHNQ   MUSEX
MUSEX     ...      ...
OVERALL   .016     ...
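The lower and upper bounds reported throughout Appendices G-K are bias-corrected (BC)
bootstrap confidence limits for the indirect effects. As a rough illustration of where
such bounds come from, the sketch below runs a plain percentile bootstrap on simulated
data; the BC variant used by AMOS additionally adjusts the percentiles for median bias,
and the simulated data, path coefficients, and variable names here are illustrative
assumptions only, not the study's data.

    # Percentile-bootstrap sketch for an indirect effect a*b (illustrative only).
    import numpy as np

    rng = np.random.default_rng(0)
    n = 232  # matches the number of performance evaluations in the study
    technq = rng.normal(size=n)
    musex = 0.6 * technq + rng.normal(size=n)
    overall = 0.5 * technq + 0.4 * musex + rng.normal(size=n)

    def indirect_effect(t, m, o):
        # a: slope of MUSEX on TECHNQ; b: partial slope of OVERALL on MUSEX.
        a = np.polyfit(t, m, 1)[0]
        design = np.column_stack([t, m, np.ones_like(t)])
        b = np.linalg.lstsq(design, o, rcond=None)[0][1]
        return a * b

    boot = [
        indirect_effect(technq[idx], musex[idx], overall[idx])
        for idx in (rng.integers(0, n, n) for _ in range(2000))
    ]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"95% percentile CI for the indirect effect: [{lo:.3f}, {hi:.3f}]")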