Developing Standards for Empirical Examinations of Evaluation Theory

Robin Lin Miller
Department of Psychology, Michigan State University, East Lansing, MI, USA
Email: [email protected]

American Journal of Evaluation, 31(3), 390-399. DOI: 10.1177/1098214010371819

Keywords: evaluation theory, evaluation practice, practice-theory relationship, research on evaluation
Evaluation scholars have long called for research on evaluation (Mark, 2001, 2003; Shadish, Cook, & Leviton, 1991; Smith, 1993; Worthen, 2001) to provide an empirical basis for improving its theory and practice. Although calls to investigate evaluation have struck a chord in some quarters of the evaluation community, these calls have been infrequently answered, with the exception of research in the area of evaluation use. Specific frameworks and guidance on how to study evaluation have seldom been offered, which may contribute to the relative dearth of research on evaluation. This article focuses on one area in which research on evaluation is sorely needed: the relationship between theory and practice. I develop a preliminary framework for studying critical aspects of the value of theory to practice.
Why Study Theory’s Use in Practice?
Evaluation theories are intended to provide evaluators with the bases for making the myriad decisions that are part of designing and conducting an evaluation. Evaluation theories provide practitioners with ideological perspectives on evaluation (Smith, 2007), with sensitizing concepts to guide practice, and, to varying degrees, with specific guidance on matters such as defining the appropriate role of the evaluator in relation to the evaluand and to individuals in the settings that house it (e.g., Ryan & Schwandt, 2002; Skolits, Morrow, & Burr, 2009); selecting evaluation questions and pairing these with methods (e.g., Greene, 2007; Mark, Henry, & Julnes, 2000; Rossi, Lipsey, & Freeman, 2004); determining whose informational needs are to be met via the evaluation (e.g., Abma & Stake, 2001; Greene, 1997; Mark & Shotland, 1985); selecting who may participate in shaping the direction of the evaluation and in what fashion (e.g., Cousins & Earl, 1992; Cousins & Whitmore, 1998; Fetterman, 1994); and identifying when, how, and to whom evaluation findings are to be disseminated and with what purpose (e.g., Patton, 2008; Preskill & Torres, 1999a, 1999b).
Taking theoretical prescriptions seriously should result in evaluations that are markedly different on
multiple dimensions, including the consequences of doing evaluation in a particular fashion.
By way of example, extending an idea that William Shadish and Nick Smith each developed in course syllabi in the 1990s, I created a class assignment in which graduate students in my program evaluation course had to adopt the role of a leading evaluation theorist. Choices of theorist included influential scholars such as Michael Scriven, Robert Stake, Michael Quinn Patton,
theorist included influential scholars such as Michael Scriven, Robert Stake, Michael Quinn Patton,
Yvonna Lincoln, and Carol Weiss, among others. Over the course of the semester, each student had
to conduct an evaluation of precisely the same evaluand following the prescriptions of their theorist
closely. To prepare to design and conduct their evaluations, students relied primarily on their chosen
theorists’ prescriptive writing about why and how evaluation ought to be done, rather than on reading reports of actual evaluations that may have been conducted by those theorists. That is, I pushed
students to envision evaluations based on what theorists say one ought to do, rather than on what
theorists may actually do.
The evaluations that students produced each time I taught this course were strikingly different on
matters such as questions posed, methods applied, information generated, perceived utility of the
evaluation by program stakeholders, nature of judgments about the program, and so on. The students’ observations on how easy or difficult it was to follow the prescriptions and how clear the theorists were regarding the details of practice also varied widely.
This simple exercise highlighted a point made by Smith and others: Sorting through theories and
determining their ultimate feasibility and merit would benefit from close empirical examination of how
evaluation theories can be and are applied in practice, whether they consistently and reliably lead to
successful evaluation, and under what circumstances ‘‘good’’ evaluations are likely to emerge. As
Smith notes,
Theorists present and advocate theories largely in abstract conceptual terms, seldom in concrete terms
based on how the theories would be applied in practice. We need to know how practitioners articulate
or operationalize various models or theories, or whether, in fact, they actually do so. Indeed, it is not clear
what is meant when an evaluator claims to be using a particular theoretical approach. (Smith, 1993, p.
240)
Just as understanding practice can provide a basis for developing theory, Smith argues that studies of
practice and the use of theory in practice can also form the building blocks for developing stronger
evaluation theory (see also Shadish et al., 1991).
Although the benefits of evaluating how theories perform in practice seem obvious, there have
been few attempts to examine theories in this way. We also lack well-developed frameworks for
considering how theories might be examined empirically. Prior attempts to evaluate whether and
how evaluation theories are put to practice suggest an emergent framework for empirically exploring
how theory informs practice and whether particular theories of practice yield better evaluations. In
the remainder of this article, I will propose a set of criteria to guide research on evaluating the theory-practice relationship. I will illustrate the value of each criterion by drawing upon published studies that have investigated theory-practice congruence and divergence.
Criterion 1: Operational Specificity
For a theory to be used in practice, it must translate into clear guidance and sensitizing ideas for practitioners, and its theoretical signature must be recognizable. Specific guidance for practice may
include the normative bases for and procedural guidelines regarding when, how, and what evaluation
questions are identified and prioritized; who participates in each stage of the evaluation process;
what role the evaluator assumes; what methods are ideal; how values underlying the theory are best
enacted; and how plans for using the evaluation process and its results are considered. Empirical
evaluation of theory, therefore, requires precise articulation of the implications for practice inherent
in the theory, as well as the identification of operational ambiguities. A recognizable signature is
necessary to support the claim that any particular theory adds value to practice.
Shadish and Epstein (1987) surveyed a sample of members of the Evaluation Research Society
and Evaluation Network to examine patterns in evaluators’ approach to their practice and perceived
theoretical influences on those patterns. The questionnaire asked evaluators to report on characteristics of their training and work setting and details regarding the last evaluation that they conducted, and to answer questions about the influence on their practice of 21 publications that the researchers perceived as seminal works authored by prominent theorists, such as Carol Weiss, Robert Stake, Michael Scriven, and Donald Campbell.
Perhaps the most striking finding of the investigation was the overall low level of familiarity with
most of the 21 publications, calling into question the degree to which they were indeed direct influences on practice. According to the study authors, a majority of respondents were unfamiliar with
71% of the seminal writings that appeared on the survey. Thus, among a sample of individuals who
identify with evaluation enough to associate with professional evaluation societies, the theoretical
work under investigation was not readily recognized by practitioners as having influence on their
practice. Christie’s more recent study of Healthy Start evaluators in California (Christie, 2003) also
found that practicing evaluators did not report explicit connections between select evaluation theories and their practice. Although these studies suggest that the link between evaluation theory and
evaluation practice is tenuous, neither study examined in detail the various implications of specific
evaluation theories for practice.
The study of Shadish and Epstein (1987) indicates a connection between classes of theory, contexts of practice, and evaluation purposes. Shadish and Epstein identified four discernable patterns of practice (academically oriented, service-oriented, decision-oriented, and outcome-oriented), each of which could be predicted by training, work setting, and theoretical influences. These authors
found, for example, that evaluators who were academically and outcome-oriented (e.g., those who favored selecting questions based on their probable contribution to a substantive body of literature, focused on causal questions, and oriented their evaluation findings toward academic publication) were most likely to perceive the work of Donald Campbell, Lee Cronbach, and Peter Rossi as influential. By contrast, service- and decision-oriented evaluators, those oriented toward selecting questions to meet the informational needs of a client or for accountability purposes, reported Michael Scriven and Robert Stake as influential. Although the investigation stopped short of exploring in depth and in fine-grained fashion how evaluators used these ideas in practice and to what end, the
investigation provides initial insight into how the operationalization of evaluation theory provides
orienting guideposts for practice among the evaluators who reported familiarity with the identified
works.
Alkin and Christie (2004) use a tree metaphor to assign select theorists to one of three branches:
valuing, methods, and use. The theorists whom Shadish and Epstein (1987) found were associated
with the academically oriented outcome perspective are classified by Alkin and Christie on the
branch of their theory tree associated with knowledge construction, the methods branch. By contrast,
the theorists who were perceived as influential on service- and decision-oriented evaluators are classified on the valuing branch of the theory tree, a branch characterized by its theoretical emphasis on
articulating evaluation’s and evaluators’ roles in determining the value of social programming. The
classifications of Alkin and Christie reflect the importance of methods and values as separate orienting ideas for evaluators occupying distinct professional niches.
Empirical research on theory-practice connections indicates that evaluators may affiliate with broad
classes of evaluation theories. The research that has been conducted to date on the ways in which
theories may inform practice has not considered the specific prescriptive elements of theories in
detail or examined whether and how these prescriptions inform evaluators’ practice decisions. Further empirical examination of theories in practice would be facilitated by careful assessment of how
theoretical prescriptions may translate into practice guidance across the many activities, decisions,
and roles that designing and conducting evaluations entail.
Criterion 2: Range of Application
The study of Shadish and Epstein (1987) highlights another criterion against which to consider and research theoretical prescriptions for practice. Shadish and Epstein found that particular theories appeared to have the highest relevance to particular practice circumstances. That is, academicians were drawn to theories that placed a high priority on matters of method and on the purpose of evaluation as contributing to a larger knowledge base, whereas evaluators outside the academy were drawn to theories that attended more closely to issues such as determining merit and worth and decision-oriented use. No single theory may be ideally suited to every situation an evaluator may
encounter. The empirical evaluation of theory must, therefore, consider the described limits of the
theory’s application. What are the most suitable conditions for applying the theory? When the theory
is applied under ideal circumstances, are the processes and outcomes similar to or different from
those that occur when it is applied in circumstances that are less than ideal? How adaptable is a theory across a range of conditions? Descriptively, it is also important to identify under what practice circumstances and in pursuit of what evaluative questions any theory has been and can be applied.
Williams (1989) used similarity ratings to develop a taxonomy of evaluation practice. Fourteen
theorists provided ratings regarding their theoretical similarity to each of the other theorists included
in the study. The theorists also rated their practice along seven dimensions. Data were analyzed
through multidimensional scaling techniques. Theorists differed along four principal dimensions: quantitative-qualitative methodological preference, accountability versus policy orientation, client involvement versus noninvolvement, and conceptual use for unspecified users versus decision-oriented use for specific users. The map of theorists' practice revealed only two dimensions: interpretive-descriptive versus causal claims and specific versus general use. Williams concludes from
this that theorists’ prescriptions are more complex than their practice. Importantly, however, she
finds a set of theorists whose practice does not sit at the extremes of the practice dimensions and
who may be considered flexible practitioners who adapt their evaluation practice to circumstances.
Although she does not consider how adaptable their theories are, her findings point to the importance
of considering how relevant any one theoretical approach may be when one encounters the realities
of the field and the evaluator’s professional and disciplinary context.
In one of the few studies that explicitly examined how a particular theory was used in practice, a
study of empowerment evaluation, Miller and Campbell (2006) found that in a small number of cases evaluators reported that a majority of stakeholders were uninterested in and had no time for the
intensive engagement required of them to be part of an empowerment evaluation. The approach,
which relies on stakeholders to engage in an iterative and collective process of taking stock of their
goals, objectives, processes, and outcomes, could not be readily adapted to a situation in which setting members were disinclined toward its use and preferred that the evaluator conduct a different
kind of evaluation. In these cases, the evaluators reported that they had to adopt kindred participatory approaches that required less of setting members to move the evaluations forward. Thus,
empowerment evaluation’s relevance to settings in which staff cannot or will not dedicate their
effort to an evaluation may be low and other models may be of greater relevance. Miller and Campbell also found that what they called a Socratic approach to empowerment evaluation occurred more
often in single-site evaluations than in multisite evaluations and that the Socratic approach best
reflected the principles articulated by empowerment evaluation theorists, again suggesting another
boundary on the approach's range of application. Both studies underscore the need for and benefits of examining the contingencies governing the range of application of particular evaluation theories
and describing the limits to their application through empirical means.
Criterion 3: Feasibility in Practice
Many theories represent a set of ideals that may not be easily applied in practice. Evaluating theories
of practice should include some assessment of how easy or difficult the prescriptions for practice and
sensitizing ideas in the theory are to apply. Can an evaluator readily do what the theory requires of
him or her?
Smith (1985) examined all of the published case examples of the use of adversary and committee
hearing procedures in evaluation. His review of case examples identified a range of challenges to
using the approach including its high preparation costs, intensive management demands, and expertise requirements. Despite its appeal to democratic deliberative ideals, the infeasibility of the
approach appears to have led to its rare use. Recent debates about the appropriate use of experimental designs in evaluation frequently note that these designs are difficult to do well. The technical,
ethical, skill, and resource requirements associated with these designs have implications for the evaluation circumstances under which it is feasible to follow theoretical prescriptions that emphasize
cause-and-effect questions and the use of experiments to address those questions (see, for instance, the edited collection of essays on credible evidence in evaluation by Donaldson, Christie, & Mark, 2009, and the edited collection of essays on fundamental and enduring issues in evaluation by Smith & Brandon, 2008).
Many theorists agree that evaluation is not simply a technical activity. It is a political and social
activity too. Many of my students, having completed the exercise I described earlier, have been surprised to find that succeeding as utilization-focused evaluators would be a great deal easier
if only they had the political savvy, seasoned expertise, and interpersonal gifts of Michael Patton!
Emerging taxonomies of core evaluator competencies (e.g., Stevahn, King, Ghere, & Minnema,
2005) also reflect the need for evaluators to be more than technicians.
Skolits and colleagues (2009) identify roles that an evaluator enacts over the course of conducting
an evaluation. The roles that they identify include manager, negotiator, detective, diplomat, judge,
reporter, learner, and researcher, among others. Theories vary in the degree to which they require
and emphasize skills associated with each of these roles and also in the extent to which they call
on the evaluator to enact few or many of these roles. Theories may also have implications for how
roles are ideally enacted and role switches performed. For example, emerging theories of culturally
competent evaluation place strong emphasis on the evaluator as a reflexive learner (e.g., Symonette,
2004). These theories expect evaluators to be adept in acquiring informal knowledge about the cultural
rules and perspectives in evaluation settings and in interrogating the self in relation to those with
whom the evaluator interacts. Eisner's (1991) connoisseurship approach requires the evaluator to have expert authority on, and a specialist's eye for, the programmatic substance. Other theories (e.g., Rossi
et al., 2004) place great emphasis on formal knowledge of evaluation and the researcher role. Some
theories may have limits to their feasibility because of the nature and combination of role demands
placed on evaluators or may only be feasible for evaluators who possess particular combinations of
skills and traits.
Although technical and role aspects of evaluation models are only two examples of features that
may influence any evaluation model’s feasibility, determining whether particular theoretical prescriptions are infeasible altogether or under particular circumstances is necessary to permit informed
selection of approaches to practice.
Criterion 4: Discernable Impact
Many theories are intended to achieve impacts that result from how the evaluation is conducted. For
instance, theories have been offered to emphasize the value of evaluation to promote democratic dialogue among stakeholders (House & Howe, 1999), facilitate organizational learning (Baizerman,
Compton, & Stockdill, 2002; Preskill & Torres, 1999a, 1999b; Sanders, 2003), transform social arrangements
(Mertens, 2009; Vanderplaat, 1995), and improve evaluation influence (Patton, 2008; Wholey,
1983). A critical area of empirical assessment of theory concerns close examination of whether the
use of a particular theory actually leads to the impacts that are expected and desired and whether
unintended effects occur (see also Henry & Mark, 2003).
A principal focus of the Miller and Campbell (2006) study on empowerment evaluation concerned documenting the evidence that empowerment evaluation processes led to empowered outcomes. That is, to what extent did using empowerment evaluation empower individuals and
organizations or redress social injustice? Theoretically, if empowerment evaluation or any other
approach is implemented as intended by its developers, there should be discernable benefits directly attributable to the approach itself. Indeed, many theories are justified on the basis of claims
about the desirable outcomes produced by applying the particular approach.
In their review, Miller and Campbell (2006) found only seven cases in which an author attempted
to evaluate the evaluation itself through systematic data collection and provided results on the outcome of using the empowerment evaluation process. They found weak evidence for claims regarding the specific benefits of empowerment evaluation, though they did find some evidence that engaging
with the stakeholders and setting in a Socratic manner was associated with more reported benefits
than using the approach in other ways.
Amo and Cousins (2007) recently studied process use in practice. By examining cases, they identified three broad categories of indicators of process use: learning, behavior, and attitude.
They note, however, ‘‘The literature search conducted in the context of this study, although not
exhaustive, shows a relative paucity of empirical studies examining the concept of process use
directly or indirectly. Almost a decade after the concept was coined, there remains much opportunity
to study, question, and substantiate process use'' (Amo & Cousins, 2007, p. 21). They go on to note
the weakness of the empirical evidence regarding process use and the procedures for encouraging it.
Criterion 5: Reproducibility
An important component of determining the impact of evaluation theories is whether any impacts
that are observed can be reproduced across time, occasions, and evaluators. It therefore becomes
essential to know what diverse evaluators actually do when they claim to employ an approach,
whether their implementation of that approach approximates the standards set for it, and whether
the approach can achieve its intended outcomes in diverse evaluators’ hands.
In the study of empowerment evaluation, Miller and Campbell (2006) found that despite its comparatively clear operational guidelines (e.g., broad community and stakeholder involvement in all
aspects of the evaluation, a taking stock process, evaluator acting as coach), case reports indicated
wide variation in how the approach was implemented. For instance, Miller and Campbell identified
one case in which there was no stakeholder input in any aspect of the evaluation. In roughly a third of
the cases, only staff members were involved in the evaluation and in limited ways, such as reviewing
measures selected by the evaluator. Although it is reasonable to expect that those who use an
approach may not follow it prescriptively, the wide variation Miller and Campbell found indicated
that in a majority of cases evaluators had not reproduced a process that was recognizable as empowerment evaluation. Is this because the approach is difficult to reproduce?
The data of Miller and Campbell (2006) are not adequate to address this question fully, but they provide some evidence that the approach is reproducible. Miller and Campbell identified cases conducted by evaluators other than Drs. Fetterman and Wandersman, the originators of empowerment evaluation, in which relatively close adherence to the principles and procedures Fetterman and Wandersman prescribe was evident. Some evaluators did reproduce the approach successfully. Because theories are to be used by evaluators other than their inventors, examination of whether evaluators can reproduce the approach and its outcomes is essential. Close examination
of the reproducibility of theories may help to categorize theories regarding the degree to which they
are primarily useful as sensitizing ideologies or sources of practical guidance on carrying out aspects
of evaluation.
Better evaluations, as defined by the terms of the various theories, are a major reason theories were developed in the first place. In crafting their theories, theorists set out to improve evaluation and leverage its impact. Research on evaluation should investigate the impacts of following various approaches to practice, the extent to which these lead to the qualities of an evaluation for which theorists hope, and the degree to which these impacts may be produced consistently.
Consequences of Criterion-Based Investigation of Evaluation Theory
The evaluation profession must gain a better understanding of the requirements needed to concretize, measure, and test the effects of evaluation frameworks and procedures empirically. The criteria
I have proposed here represent an initial step toward articulating what we might do to move research in this area forward and to generate a solid descriptive account of the relationship between real-world
practice and evaluation theory.
The criteria I propose complement the recent general framework for research on evaluation practice proposed by Mark (2008). Mark identifies four general categories of inquiry: context, process,
consequences, and professional issues. He proposes general subcategories of investigation in each
area, such as researching the societal and organizational context in which evaluations occur and suggests potential modes of inquiry for research in each area. The criteria I propose here provide a specific way in which to engage Mark’s framework to include consideration of the practical merits of an
evaluation theory across Mark’s inquiry domains. These criteria provide a point of departure for
addressing a more specific set of research questions on evaluation theory and
building the empirical base to inform the refinement of contingent theories of practice (see Shadish,
1998; Shadish et al., 1991).1
Employing the framework I have proposed also facilitates posing questions about the theory-practice relationship that are not yet well explored. Questions that emerge from this framework include those that consider how each of these features of a theory (operational clarity, range of application, implementation feasibility, evaluation impact, and reproducibility) may relate to one another.
For instance, how do impacts vary along dimensions of range of application? The framework may
also assist evaluators to identify questions about how aspects of a theory may relate to the domains of
factors identified by Mark. Do experts achieve higher reproducibility than novices, regardless of the
operational clarity of these approaches? How does combining approaches affect the expected
impacts of each approach? Are theories with particular features, such as being high on operational
clarity or feasibility, more or less influential on and used in particular practice contexts? These criteria also provide a unique framework against which theories may be classified and one that may be
of particular benefit to practitioners. By understanding that theories have optimal ranges of application, do or do not combine well with other approaches, or present challenging role demands, practitioners may be better able to sort through and discern how to take the benefits of a range of theories
to daily practice, and theorists may be pushed to consider the practice components of their theories in
new and more refined ways.
Applying criteria such as those I have described has important implications for what evaluators report
in describing practice experiences and also for methodological criteria for selecting and assessing
evaluations in which particular theories have been applied. Perhaps most obvious, cases must be
described with adequate detail to allow others to study them. Details of importance would include
clear statements of the evaluation setting, evaluation purpose, rationale for applying the theoretical
approach, articulation of how the theory was enacted in the particular case, descriptions of all actors
and their roles, a chronological event history of the evaluation, and information on what outcomes
were expected to accrue from applying the approach and when and how these were substantiated.
These criteria highlight the importance of acquiring multiple case examples in which evaluators believe that they have translated a theory to practice, so that an adequate sample of cases is available for study. Additionally, they point to the need for prospective evaluation of evaluations to become a
standard of practice.
In the real world of practice, evaluators may not use any theory as more than a vague map to facilitate and guide their action. Evaluators may also use combinations of theories rather than apply any
theory in pure fashion. And, the demands of the particular evaluation situation may predominate
over theoretical prescriptions, preferences, and training. Yet, evaluators have invested heavily in
theorizing how to maximize the success of evaluations. Systematic and rigorous research on these
theories can provide essential information in the development of an evidence base for a theoretically rooted evaluation practice, as well as provide the evidentiary base for the development of practice-based theory.
Note
1. These criteria are distinct in purpose from the criteria articulated for evaluating evaluations in so far as most
meta-evaluation addresses whether evaluations meet practice standards such as the Joint Committee Standards. Meta-evaluations do not typically consider the quality of the theory that underlies the specific practice
instance under consideration.
Acknowledgments
The author thanks Melvin M. Mark, Nicholas Smith, Karen E. Kirkhart, Rebecca Campbell, and
Miles A. McNall for their helpful comments and suggestions on prior drafts of this essay.
Declaration of Conflicting Interests
The author(s) declared no conflicts of interest with respect to the authorship and/or publication of
this article.
Funding
The author(s) received no financial support for the research and/or authorship of this article.
References
Abma, T. A., & Stake, R. E. (2001). Stake’s responsive evaluation: Core ideas and evolution. New Directions
for Evaluation, 92, 7-22.
Alkin, M. C., & Christie, C. A. (2004). An evaluation theory tree. In M. C. Alkin (Ed.), Evaluation roots: Tracing theorists’ views and influences. Thousand Oaks, CA: SAGE.
Amo, C., & Cousins, J. B. (2007). Going through the process: An examination of the operationalization of process use in empirical research on evaluation. New Directions for Evaluation, 116, 5-26.
Baizerman, M., Compton, D. W., & Stockdill, S. H. (Eds.). (2002). The art, craft, and science of evaluation capacity building. New Directions for Evaluation, 93.
Christie, C. A. (2003). What guides evaluation? A study of how evaluation practice maps on to evaluation theory. New Directions for Evaluation, 97, 7-36.
Cousins, J. B., & Earl, L. M. (1992). The case for participatory evaluation. Educational Evaluation and Policy
Analysis, 14, 397-418.
Cousins, J. B., & Whitmore, E. (1998). Framing participatory evaluation. New Directions for Evaluation, 80,
5-23.
Donaldson, S. I., Christie, C. A., & Mark, M. M. (Eds.). (2009). What counts as credible evidence in applied
research and evaluation practice? Thousand Oaks, CA: SAGE.
Eisner, E. (1991). The enlightened eye. New York, NY: Macmillan.
Fetterman, D. (1994). Empowerment evaluation. Evaluation Practice, 15, 1-15.
Greene, J. C. (2007). Mixed methods in social inquiry: research methods for the social sciences. San Francisco,
CA: Jossey-Bass.
Greene, J. C. (1997). Evaluation as advocacy. Evaluation Practice, 18, 25-35.
Henry, G. T., & Mark, M. M. (2003). Toward an agenda for research on evaluation. New Directions for Evaluation, 97, 69-80.
House, E. R., & Howe, K. R. (1999). Values in evaluation. Thousand Oaks, CA: SAGE.
Mark, M. M. (2008). Building a better evidence base for evaluation theory: Beyond general calls to a framework of types of research on evaluation. In N. L. Smith & P. Brandon (Eds.), Fundamental issues in evaluation (pp. 111-134). New York, NY: The Guilford Press.
Mark, M. M. (2003). Toward an integrative view of the theory and practice of program and policy evaluation. In S. I. Donaldson & M. Scriven (Eds.), Evaluating social programs and problems: Visions for the new millennium (pp. 183-204). Mahwah, NJ: Lawrence Erlbaum Associates.
Mark, M. M. (2001). Evaluation’s future: Furor, futile, or fertile? American Journal of Evaluation, 22,
457-479.
Mark, M. M., Henry, G. T., & Julnes, G. (2000). Evaluation: An integrated framework for understanding, guiding, and improving policies and programs. San Francisco, CA: Jossey-Bass.
Mark, M. M., & Shotland, R. L. (1985). Stakeholder-based evaluation and value judgments. Evaluation Review,
9, 605-626.
Mertens, D. M. (2009). Transformative research and evaluation. New York, NY: The Guilford Press.
Miller, R. L., & Campbell, R. (2006). Taking stock of empowerment evaluation: An empirical review. American Journal of Evaluation, 27, 296-319.
Patton, M. Q. (2008). Utilization-focused evaluation (4th ed.). Thousand Oaks, CA: SAGE.
Preskill, H., & Torres, R. T. (1999a). Building capacity for organizational learning through evaluative inquiry.
Evaluation, 5, 42-60.
Preskill, H., & Torres, R. T. (1999b). Evaluative inquiry for learning in organizations. Thousand Oaks, CA:
SAGE.
Rossi, P. H., Lipsey, M. W., & Freeman, H. E. (2004). Evaluation: A systematic approach (7th ed.). Thousand
Oaks, CA: SAGE.
Ryan, K. E., & Schwandt, T. A. (Eds.). (2002). Exploring evaluator role and identity. Greenwich, CT: Information Age Publishing.
Sanders, J. R. (2003). Mainstreaming evaluation. New Directions for Evaluation, 99, 3-6.
Shadish, W. R. (1998). Evaluation theory is who we are. American Journal of Evaluation, 19, 1-19.
Shadish, W. R., Cook, T. D., & Leviton, L. C. (1991). Foundations of program evaluation: Theories of practice.
Newbury Park, CA: SAGE.
Shadish, W. R., & Epstein, R. (1987). Patterns of program evaluation practice among members of the
Evaluation Research Society and Evaluation Network. Evaluation Review, 11, 555-590.
Skolits, G., Morrow, J., & Burr, E. (2009). Evaluator responses to evaluation activities and demands: A
re-conceptualization of evaluator roles. American Journal of Evaluation, 30, 275-295.
Smith, N. L. (2007). Empowerment evaluation as ideology. American Journal of Evaluation, 28, 169-178.
Smith, N. L. (1993). Improving evaluation theory through the empirical study of evaluation practice. Evaluation Practice, 14, 237-242.
Smith, N. L. (1985). Adversary and committee hearings as evaluation models. Evaluation Review, 9, 735-750.
Smith, N. L., & Brandon, P. (Eds.) (2008). Fundamental issues in evaluation. New York, NY: The Guilford
Press.
Stevahn, L., King, J. A., Ghere, G., & Minnema, J. (2005). Establishing essential competencies for program
evaluators. American Journal of Evaluation, 26, 43-59.
Symonette, H. (2004). Walking pathways toward becoming a culturally competent evaluator: Boundaries, borderlands, and border crossings. New Directions for Evaluation, 102, 95-109.
Vanderplaat, M. (1995). Beyond technique: Issues in evaluating for empowerment. Evaluation, 1, 81-96.
Wholey, J. (1983). Evaluation and effective public management. Boston, MA: Little, Brown.
Williams, J. E. (1989). A numerically developed taxonomy of evaluation theory and practice. Evaluation
Review, 13, 18-31.
Worthen, B. R. (2001). Whither evaluation? That all depends. American Journal of Evaluation, 22, 409-418.