
The Canadian Journal of Program Evaluation Vol. 26 No. 2 Pages 1–45
ISSN 0834-1516 Copyright © 2012 Canadian Evaluation Society
In Search of a Balanced Canadian
Federal Evaluation Function:
Getting to Relevance
Robert P. Shepherd
School of Public Policy and Administration, Carleton University
Ottawa, Ontario
Abstract: In April 2009, the Treasury Board Secretariat enacted a new
Evaluation Policy replacing the previous 2001 version. This new
policy has generated much discussion among the evaluation
community, including the criticism that it has failed to repair the many shortcomings the function has faced since it was centralized in 1977. This article reviews the history of the federal function to explain why these shortcomings persist and makes two assertions. First,
if program evaluation is going to maintain its relevance, it will
have to shift its focus from the individual program and services
orientation to understanding how these programs and services
relate to larger public policy objectives. Second, if program evaluation is to assume a whole-of-government approach, then evidentiary forms must be constructed to serve that purpose. The
author makes the argument that evaluation must be far more
holistic and calibrative than in the past; this means assessing
the relevance, rationale, and effect of public policies. Only in this
way can the function both serve a practical managerial purpose
and be relevant to senior decision-makers.
Résumé : En avril 2009, le Secrétariat du Conseil du Trésor édictait une
nouvelle politique d’évaluation remplaçant la version antérieure
de 2001. Cette nouvelle politique a suscité de grands débats
dans le milieu de l’évaluation. D’après certaines critiques, elle
a omis de combler les nombreuses lacunes qui affectent la fonction depuis sa centralisation, en 1977. Dans cet article, l’auteur
relate l’histoire de la fonction fédérale afin de trouver des motifs
à la persistance de ces lacunes et formule deux affirmations à
cet égard. Premièrement, si l’on veut maintenir la pertinence
de l’évaluation des programmes, il faudra cesser de l’orienter
uniquement sur les programmes et les services, afin de comprendre les liens entre ces programmes et ces services et les
objectifs élargis des politiques publiques. Deuxièmement, si
l’on souhaite adopter une démarche étendue à l’ensemble du
gouvernement en matière d’évaluation de programmes, il faut créer des formes de preuve qui favorisent l’atteinte de cet objectif. D’après l’argument de l’auteur, l’évaluation doit revêtir un caractère nettement plus holistique et calibratif que par le passé. Par conséquent, il faut évaluer la pertinence, le fondement, et l’effet des politiques publiques. Ce n’est qu’à cette condition que la fonction servira à des fins de gestion pratique et se révélera également utile aux principaux décideurs.

Corresponding author: Robert P. Shepherd, School of Public Policy and Administration, Carleton University, 1125 Colonel By Dr., 5126RB, Ottawa, ON, Canada K1S 5B6; <[email protected]>
INTRODUCTION
“Design is not just what it looks like and feels like. Design is how it works.” Steve Jobs (Walker, 2003)
Policy and form matter. Canada’s federal government
evaluation policy and function has undergone a great deal of change
since its introduction in 1977. Much of this is due to shifting expectations about the purpose of program evaluation over different time
periods. However, at a very basic level, “the primary aim of evaluation is to aid stakeholders in their decision making on policies and
programs” (Alkin, 2004, p. 127). That is, evaluation must not only
inform grants and contribution program decisions on matters of
program theory, coherence, approach, and results, it must ultimately
serve to provide evidence that government-wide decisions are the most effective means of resolving, or contributing to the resolution of, a specified public problem defined by elected officials. Likewise, program evaluation serves
a management function by supporting program improvement and
design, and it serves central agency purposes by demonstrating accountability for the distribution and use of public resources. There
are several other uses, but in public terms the function supports a
variety of actors under various conditions, contexts, and degrees of
complexity.
Likewise, decision-makers’ expectations of evaluation have tended
to shift between various evidentiary forms to support fiscal prudence (i.e., the cautious appropriation of public resources relative
to revenues, Canada, 1997b, p. 6; McKinney & Howard, 1998, pp.
373–374), programmatic effectiveness (i.e., the extent to which a
program is achieving expected outcomes, Centre for Excellence for
Evaluation [CEE], 2009b, Appendix A), or political responses to current or emergent public problems (Aucoin, 2005). Maintaining some
balance among these competing governmental policy and program
concerns and evidentiary forms continues to pose challenges in the
field. As any one of these governmental concerns and forms takes
priority or precedence at any given time, program evaluation as a
function must continuously struggle to find its place and determine
its relevance for the many actors who both produce and use evaluation outputs. The fact is that political priorities can and often do
drive line and functional forms and priorities including evaluation
and other oversight activities.
Essential decision-making in government is a complex process in
which evidence from prescribed systematic research and practical experience from delivering government programs and services mix with other governmental functions and processes that include ideation, ideology, public and private interests, institutions, and individuals. The combination of these various systems, processes, and environmental variables forms the basis of decisions taken by political and administrative actors. Given the social, economic, and political circumstances at the time, and the political regimes in place, different types of evidence will predominate. The questions posed by political and administrative leaders will vary
over time according to the circumstances, thereby determining the
weight attached to evidence. Program evaluation, like other research
forms, adapts to changing circumstances the best it can to support
what constitutes good governance.
Governments have come and gone, and with them differing conceptions about the most suitable role for various checking and oversight
functions including program evaluation. Some governments have
placed greater emphasis on whole-of-government policy and decision-making that responds to public needs, others have considered sound
delivery of programs and services to be paramount, while still others have emphasized economic performance and management to be
pivotal aspects of good governance. With these shifts in matters of
significance, the importance and role of evaluation have waxed and
waned, but evaluators have always tried to demonstrate its value by
providing the advice demanded at the time.
In April 2009, the Treasury Board Secretariat enacted a new Policy
on Evaluation (CEE, 2009b) to replace the 2001 version. This new
policy has generated much discussion among the evaluation community, in much the same way that previous iterations did. Again, the outcry from the community is that the policy has failed to address the several challenges the function has faced since it was centralized in 1977.
As in many other jurisdictions, including the UK, Australia, and the United States, evaluation has had a less than stellar impact on financial, programmatic, and strategic decisions, but this time the buzz from the community is that this is likely the last chance evaluation will get to demonstrate its value to political decision-makers (Zussman, 2010). In Canada, as in many
western nations, the overwhelming concern of governments has been
about economic and fiscal prudence.
Given these comments on governmental priorities and evidentiary
forms, this article attempts to highlight two assertions. First, if program evaluation is going to maintain its relevance, it will have to
shift its focus from a mainly grants-and-contributions program and service orientation to understanding how these programs and
services relate to larger public policy objectives. In other words, what
has been traditionally understood as “program evaluation” must
take a whole-of-government perspective and understand more fully
that government is more than the sum of its parts. It must understand whether public policies, institutional arrangements and frameworks, program architectures, regulations and other instruments,
and resource allocations are working cohesively toward common
public policy objectives. Second, if program evaluation is to assume
a whole-of-government approach, then evidentiary forms must also
be constructed to serve that purpose. The values school of evaluation
is premised on the idea that the principal aim of evaluation as a field is to appropriately assign value or merit to government policies, programs, processes, and ultimately results (Alkin, 2004, pp. 12–17). The
criteria upon which it assigns such values will necessarily frame the
inquiry. The 2009 Policy on Evaluation uses a particular set of value
criteria that supports central agency concerns for fiscal prudence and
accountability. In my view, these value criteria are likely to devalue
evaluation to the role of central audit, and may do more damage than
good to the federal function.
In order to appreciate these two assertions, it is important to briefly
review how federal evaluation has changed over time, and how the
function has responded broadly to changing circumstances at the
time. Likewise, evidentiary forms have also shifted to suit the roles
demanded of the function. As such, the first section describes the
evolution of federal evaluation in Canada from roughly 1970 to 2000.
It will attempt to highlight historical emphases or preferences in the
use of evaluation and the rationale for shifts in those preferences.
Some of these shifts are rooted in the new public management and
results-based management, while others were simply political preferences to support government initiatives.
The second section sheds some light on characteristics of the new
evaluation policy by first tracking the federal government’s reform
initiatives since 2000. This section will show that the Canadian
policy is leading the function toward supporting centralized political-level directives in fiscal prudence and accountability at the expense of impartial and strategic assessments of programs to inform meta-level decision-making.
The last section provides some overall assessments about the future
of federal evaluation as it extends from the new policy. It questions the usefulness of evaluation in supporting overall governmental decision-making. It also attempts to provide some thoughts on how the
evaluation function can best contribute to governmental concerns for
not only accountability, but also policy and program coherence (i.e.,
whether policies and programs as a system of government actions
address meta-level problems such as poverty reduction or building technological and research capacity), budgetary and strategic
planning (i.e., sound financial planning), and program performance
(i.e., the extent to which effectiveness, efficiency, and economy are
achieved by a program; CEE, 2009b, Appendix A). Is this direction
toward an audit-based culture premised on fiscal prudence and accountability likely to be sustainable, or even desirable?
Some Critical Notes on Language and Assumptions
It is important to set the theoretic context of this article, given that
the academic literature and government policy are not always certain about the meaning of terms or the significance attached to
ideas. First, I take the view of Alkin that evaluation research is
applied social research (Alkin, 2004, p. 127). That is, evaluation
consists of the application of various social research methods to
provide credible information that can aid in forming public policy,
designing programs, and assessing effectiveness and efficiency of
social policies and programs. Evaluation is established to assess
policy-oriented questions that provide guidance on how best to address public problems. This view is consistent with Rossi, Lipsey, and
Freeman’s understanding that “evaluation is the systematic collection and analysis of evidence on the outcomes of programs to make
judgments about their relevance, performance, and alternative ways
to deliver them or to achieve the same results” (2004, p. 4). I use this
particular construct, because it is the one used in the 2009 Policy on
Evaluation (CEE, 2009b, p. 3.1).
Second, for the purposes of this article, “program evaluation” is considered separately from the larger concept or field of “evaluation.”
Program evaluation refers to “a group of related activities that are
designed and managed to meet a specific public need and are often
treated as a budgetary unit” (CEE, 2009b, Appendix A). In other
words, programs may be part of larger-order strategies, initiatives,
or institutional/corporate-level projects that may comprise several
individual programs. As such, programs are usually administered by
mid-level managers (e.g., EX-01/02 in the federal Canadian system).
The strategic users of evaluation products are generally those at the
senior management ranks including the deputy head and assistant
deputy ministers.
Third, the purpose of any central policy framework such as the
Treasury Board’s Evaluation Policy is to provide guidance on how
the function ought to be directed across departments and agencies. In this context, centralized evaluation policies aim to bridge a
number of key considerations, including identifying the appropriate
evaluand (e.g., tasks, projects, programs, strategies/initiatives, policies); the purpose of evaluation; identifying the appropriate methods of evaluation; determining the timing of evaluations and their
type; identifying the appropriate evaluator competencies to bring
to bear, including whether it ought to be internally or externally
driven; the ethics associated with the research; and the budgeting
constraints. The requirements and preferences of central agents
regarding each of these aspects will ultimately determine how the
function is understood by both those carrying out evaluation studies and those using them. It is these requirements and preferences,
and how these have changed over time, that are the subject of this
article. I contend that the federal function has gradually moved
away from the strategic uses of evaluation for program effectiveness and political responsiveness, and toward central agency concerns for essentially fiscal prudence and accountability; some of the reasons for this shift will be discussed.
Fourth, evaluation has traditionally tended to be multifaceted in the
sense that it asks several different types of questions, aimed at different dimensions of evaluands and public actors. That is, evaluation
is commonly concerned with matters of rationale and relevance (why
programs were created and whether these meet identified needs),
program delivery (whether program design matches implementation), program theory (whether theories of change are supported
by evidence), program efficiency (understanding costs of producing
outputs), program effect (attributing effects to the program), and
program improvement (alternatives that can be considered that are
more cost-effective or efficient, and whether these are more effective in terms of results achieved). Program monitoring (or program
performance) is generally concerned more with matters of program
implementation and the efficient production of outputs. I maintain
that the federal function has moved toward this latter focus at the
expense of effectiveness evaluation.
Finally, the purpose of evaluation has changed over time. Although
this has already been implied, I think it should be made quite explicit. This is not to suggest that “the good old days” are gone and we
are left with something inferior in the new policy. On the contrary,
the federal function has learned from its history and is indeed attempting to make the function relevant in today’s context when
fiscal prudence and management drive virtually every decision.
This has meant that the function has had to adapt. The challenge is
whether the function is adapting in a way that ensures its continued
relevance and strategic usefulness.
EVOLUTION OF EVALUATION IN CANADA: GETTING TO
RELEVANCE
Framing the Function
Evaluation policies and forms vary from country to country based on
their history, institutions, culture, forms of government, and degree
to which they are held as important or relevant by decision-makers.
The extent to which these policies and forms can be translated into regular public management practice has been widely debated in an attempt to find some common insights into how to make these evaluation policies and their resultant frames more relevant to those who
use evaluation products (Furubo, Rist, & Sandahl, 2002; Mayne,
2006; Pollitt & Bouckaert, 2004). Evaluation functions were created
in several jurisdictions for generally three, not necessarily overlapping, reasons: to assist political decision-makers in making strategic policy decisions; to demonstrate fiscal prudence, efficiency, and
accountability (Good, 2003)1; and to provide decision-makers with
information as to whether the right programs were in place and addressing the right problems effectively. In Canada, attempts have
been made at different times to find a balance among these. However,
as is often the case, one or more of these rationales has predominated
at any given time, leading to a function that has continued to struggle to find its appropriate and legitimate place in the federal schema.
In response to these priorities at different times, the federal function
has evolved and federal evaluators have debated the appropriate
degree of centralization, and how to position themselves in departments with reasonable independence and control. Evaluators have
also debated appropriate evaluation methodologies ranging from
positivist ideas centring on quantitative research rigour and attribution to more post-positivist and realist ideas of creating an accurate
or realistic picture of a program’s contribution positioned according
to priorities at the time. There have also been competing “visions” of
program evaluation in Canada from program-specific applications to
more holistic or institutional contributions. Finally, there have been
debates about the appropriate purpose of evaluation ranging from
those who argue use is critical (e.g., Patton, 1978; Stufflebeam, 1983);
to those who maintain that the assignment of value or merit criteria
to programs is key (e.g., Alkin, 2004, 2012; Scriven, 1974, 1978; Stake,
2003); to those who believe that the rigour of methodology must prevail (e.g., Campbell, 1957; Rossi & Freeman, 1985; Tyler, 1942; Weiss,
1972). This schema of debates has had significant influence on the
progression of the field generally. However, they have also tended
to divide practitioners, much more so than audit or other oversight
functions, in ways that have left strategic decision-makers with a
great deal of space to define the practice of evaluation over time.
In Canada, evaluation as a field has been influenced by its history,
including the pre-centralization period prior to 1977. Evaluation
was always a part of the federal system, and deputy ministers were
responsible and held to account not only for financial scrutiny, but
also the efficiency and effectiveness of ministry operations. However,
evaluation was regarded in very different ways as an assessment tool
prior to centralization in 1977.
The Office of the Auditor General, established in 1878, was responsible mainly for actuarial assurance, but beginning in the 1960s there was growing demand for effectiveness information that spoke to how well programs were delivering on governmental policy promises, rather
than simply on financial and procedural efficiency. More than any
other initiative, the Royal Commission on Government Organization
(Glassco Commission; Canada, 1962) framed evaluation at this time.
Departments were not tracking spending and its effects under the
existing centralized expenditure management system established in
1931. The commission recommended decentralizing financial management authority to departments in order to “let the managers
manage” (Canada, 1990) and evaluation was regarded as a means to
achieve greater program accountability. As such, the Treasury Board
directed departments to better monitor and evaluate their programs
(Office of the Auditor General of Canada [OAG], 1975) using the Planning, Programming, and Budgeting System (PPBS) in 1968, and its successor, the Policy and Expenditure Management System (PEMS), in
1978. Evaluation was geared to support program accountability and
control at the departmental level and to validate program expenditure data using new quantitative and qualitative research methods
(Stokey & Zeckhauser, 1978).
In addition to the Glassco recommendations, the Auditor General at
the time, Maxwell Henderson (1960–1973), was reporting whether
programs were effective, taking a more holistic approach and adding his own comments on the integrity of government programming
using a government-wide lens (Henderson, 1984).2 Such comments
drew considerable attention to the office and attracted the ire of government (Segsworth, 1990).3 In addition to these developments, the
merits of proposals were assessed by the Cabinet as a whole, rather than by individual ministers, and formalized as memoranda to Cabinet. This
system was created in response to the emergence of powerful departments from the Depression to the post-war period. Under this system,
Cabinet almost always deferred to ministers and their departments
in deliberating on program proposals. There appeared to be a symbiotic relationship between effectiveness evaluation and Cabinet decision-making. This symbiosis was based, to some extent, on confidence
in a highly technocratic and professional public service. Despite these
advances, there were new calls for greater program accountability
given the push to decentralization. These were early signs toward
an evaluative function framed to support program accountability
and performance. This was an important development as precision
was brought to bear on the meaning of “program.” Prior to 1969, a
program was any activity of government on a small or large scale.
With the Planning, Programming and Budgeting Guide, program was
defined to mean a “collection of activities having the same objective
or set of objectives” (Treasury Board Secretariat [TBS], 1969, p. 2).
The addition of several new social programs at this time required
a lens to assess innovation and effectiveness, thereby moving the
function from a macro orientation to one that essentially focused on
individual program initiatives—an orientation that has persisted to
this day. In addition, evaluation was being asked to assess program
accountability and performance of these new social programs (Jordan
& Sutherland, 1979, p. 586), which set a new tone for micro-level evaluation. It also set high expectations to assess both effectiveness and program performance. The function struggled with both aspects,
especially given that the evaluation community at this time was
steadfast in its desire to separate these. Given these differences, calls
were being made by departments to centralize and formalize evaluation as a corporate function in order to ensure a consistent approach.
The fear in the evaluation community was that centralization would
move the orientation of evaluation from assisting strategic ministry
decisions to supporting central concerns for accountability.
Experimenting Within the Field: 1970 to 1980
The 1970s were defined by a number of developments framing accountability of programs. First, the Operational Performance Measurement System (OPM) was approved by the Treasury Board in
1973, which required departments to provide performance data to
the Treasury Board before the 1978 forecast. The system was the
foundation of the evaluation framework and the direct input to the
performance measurement requirements (TBS, 1976, p. 4), which
directed departments to improve their evaluation function in order that they would be able to use “adequate and reliable means,
wherever feasible, for performance measurement” (TBS, 1976, p.
5). The function was directed to institute measures mainly of program efficiency related to resource usage and service quality, and
operational or procedural effectiveness for grants and contributions
type programs. Finally, the Treasury Board was given the mandate
to monitor the progress of evaluation units and the quality of their
submissions as these supported the OPM.
While these changes were occurring at the central agency level, the
Auditor General, J. J. Macdonell, released a blistering report on the
government, stating in his 1976 report that “the Government …
has lost, or is close to losing, effective control of the public purse …
financial management and control in the Government of Canada is
grossly inadequate” (OAG, 1976, p. 1.2). This report led in part to the
creation of the Royal Commission on Financial Management and
Accountability in 1976 (Lambert Commission; Canada, 1976), which
reported in March 1979. Where Glassco recommended a strategy
of letting the managers manage, Lambert recommended that central
controls be improved. Specifically, he proposed the creation of a fiscal plan for government, covering five-year periods, which would
see the federal government allocate resources according to centrally
derived priorities, but within the limits of revenues—the beginning
of government concern for mainly fiscal prudence. These plans would
be set jointly by the Financial Secretariat of the Board of Management (Treasury Board), Privy Council Office, and the Department of
Finance, which would assume the lead role for budgeting. The fiscal
plan was expected to restore financial accountability by ensuring
central guidance of funding priorities and the setting of spending
ceilings at the departmental level (Sutherland, 1986, pp. 119–124).
Such events contributed to the creation of the first formalized evaluation policy in 1977 with earmarked resources attached from the
Treasury Board (TBS, 1977). In this respect, Canada was second
only to the United States in formally embedding centralized program evaluation in its machinery of government. Evaluation units were established,
which reported directly to the heads of departments and agencies.
The intent of this first policy was that all departments were to conduct “periodic evaluations in carrying out their responsibilities for
the management of their programs” on a five-year cycle (coordinated
with the fiscal plan) and that deputies were to make informed management decisions, demonstrate accountability, and provide advice to
ministers on the strategic direction of programs (TBS, 1977, p. 2). Of
note in this first policy was that evaluations were to be “objective” according to three criteria: terms of reference were to be established for
each evaluation project; evaluations were to be conducted independently; and reporting was to be clearly established and communicated
to senior management. What could be defined as the “directive” on
evaluation clarified that evaluations were to assess the operation of
programs (formative), clarify program objectives where necessary
(formative), reduce or eliminate programs (strategic), and identify
those programs that were held in high priority by the government
of the day (strategic). Evaluation would be concerned with supporting sound financial management and accountability in line with the
Lambert recommendations.
The evaluation function worked in concert with the Auditor General,
who was limited to examining the efficiency and economy of government activities (Canada, 1977, p. 7[2]).4 Examination of effectiveness
was left to the evaluation function. However, this division was deceptive. In fact, the policy was a departure from the 1969 Planning, Programming and Budgeting Guide that preceded it. Whereas the Guide assumed
an “objective” and independent
thermostat-like financial control system that constantly
self-corrects and self-reports on itself according to objective, quantitative data and criteria, the Treasury Board’s
policy directed departments and agencies to establish
three-to-five year evaluation strategies entirely based on
the best judgement of program managers [in concert with
senior managers]. (Muller-Clemm & Barnes, 1997, p. 56)
Sutherland rightly concluded that the Treasury Board change was
an “ad hoc analysis, only a judgemental kind of control … without
providing any formulae for how the information should be weighted,
evaluated and interpreted within the bureaucratic-political context”
(Sutherland, 1990, p. 145). The importance of this conclusion cannot be overstated:
it represented a departure from past experience and one that dogs
the function today. The previous value of the function was that its
products had a direct input into strategic decision-making. After
the change, without either criteria-driven selection of the objects
of evaluation or an unequivocal requirement that the findings of
evaluation be used, they were easily lost within the system with no
required input into management decisions.
In 1978, the Office of the Comptroller General (OCG) was re-established5 under Harry Rogers, along with a separate Program Evaluation Branch, to assist departments and agencies to institute and
maintain their evaluation function as per the new policy. The contribution of the Comptroller General at the time was to distinguish
between “big-P” and “little-P” programs. Big-P programs were major
federal government-wide initiatives, which were the subject of attention in the annual Estimates. Little-P programs were departmental
programs focused on particular or functional responsibilities. The
OCG directed evaluation units to focus on little-P programs (Dobell &
Zussman, 1981; Office of the Comptroller General of Canada [OCG],
1979) despite the recommendations of the Lambert Commission to
concentrate on “big” programs on a five-year cycle. The combination
of the new evaluation policy and the orientation undertaken by the
OCG to focus on departmental programs essentially set the evaluation function on a path that has been exceedingly difficult to adjust.
Despite this dramatic shift in policy, the evaluation community had
grown considerably during the 1970s. The Canadian Evaluation Society (CES) was established in 1981, which added lustre to the new
field. Textbooks were being written, academic journals were springing up, evaluation courses were becoming commonplace in public administration programs, and standardized methodologies were being
agreed upon. Overall, the 1970s established major direction for the
field that remains firmly in place.
Engraining Concerns for Fiscal Prudence: 1980 to 1990
The 1980s are characterized as a decade of centralizing and decentralizing responsibilities in line with the rise of the New Public Management (NPM) in Canada. The concern raised by the NPM was that
governments were becoming too unwieldy, that governments should
behave more like the private sector to gain efficiencies in the delivery
of public services, and that these services be delivered at an appropriate point of subsidiarity (Aucoin, 1995, pp. 8–10). Several attempts
were made during this period to create a performance management
and data collection system that respected innovations in financial
planning such as PEMS. In 1981, the TBS directed departments
to include performance information in their annual Estimates to Parliament so that program performance could be judged (TBS, 1982). The challenge for evaluation units was that judgement regarding the type of
performance information to be collected was left mainly to program
managers without necessarily tying this information to larger public policy objectives that may involve more than one program. Such shifts only
served to reinforce the fact that evaluation was tied to program-level
considerations for process and performance (i.e., outputs production).
Attempts were made to consider results-based outcomes, but without
adequate performance management systems and performance monitoring, tracking results was challenging.
In 1984, amendments were made to the Financial Administration
Act that required evaluation units to gauge the value-for-money of
“little-P” programs. In 1988, the Treasury Board spearheaded the
Increased Authority and Accountability for Ministers and Departments initiative (IMAA), followed by guidance from the OCG, Working Standards, released in 1989. The guidance removed scheduling
decisions for evaluations from program managers and created a
negotiated agreement between departments and the TBS through
memoranda of understanding. The idea was to reframe the function
to support assessing results. It also emphasized the responsibility of
senior managers to evaluate their programs on a regular basis.
In 1989, the Supreme Court ruled that the Auditor General must
seek access to documents through the courts when access has been
refused by Parliament or Cabinet. Access to information on the rationale and relevance of programs has long been a sticking point.
The response to the ruling was greater use of the media by all Auditors General. The cost has been an erosion of the impartiality and
neutrality of the Auditor General (Saint-Martin, 2004). Although
evaluation has not relied on similar strong-arm approaches for data
collection, accessing source documents on rationale and relevance
has been an equal concern.
Although the Office of the Comptroller General was generally concerned with making evaluation a relevant part of government decision-making, it played a significant role in setting the function’s
direction (Rutman, 1986, p. 20). There was consensus that a key intent of the new 1977 Evaluation Policy was to focus on summative evaluation with a view to understanding results (CEE, 2005b;
Foote, 1986, pp. 91–92; Maxwell, 1986; Raynor, 1986; Rutman, 1986),
but the fact was that the OCG contributed to a different direction
for the function. Under the leadership of Comptroller General Harry
Rogers in 1983, comptrollers, responsible in part for the planning and coordination of management information, were introduced in all departments. Rogers instituted a “challenge function” whereby financial officers would subject departmental program proposals (i.e., programs managed by mid-level managers) to systematic review before these proceeded to external actors such as the TBS and the Privy
Council Office. The result was generally the creation of some unit in
departments to review program proposals for operational integrity—
a focus on formative evaluation with a view to serving line managers
rather than the deputy head as intended (CEE, 2004a, 2005b; Mayne,
1986, p. 99; Rutman, 1986, p. 21).
Meanwhile, federal governments continued to make a concerted
effort to reform their financial and program management regimes.
In 1984, Prime Minister Brian Mulroney announced his intention
to review the size of government, and in 1985 he established the
Nielsen Task Force to review all federal programs. However, the
task force did not use evaluations in any significant way. In fact,
evaluation had come under a great deal of skepticism as a result of
this exercise because departmental evaluation units could supply
little of the information the task force needed for a government-wide
perspective (Mayne, 1986, pp. 98–100; Rutman, 1986, pp. 21–22).
Likewise, departmental evaluations tended to show little direction
as to where and how programs could be improved, let alone inform
other strategic decision-making exercises (Rist, 1990; Savoie, 1994).
The function faces quite similar problems today as an input to the
Strategic Review exercises of departments.
Once again the Auditor General examined the program evaluation
function and, although it was found that progress was being made
in implementing evaluation in departments, the greatest criticisms
were, ironically, at the government-wide level. At the departmental
level, the audit found that of the 86 evaluation studies assessed, “approximately half the studies which attempted to measure the effectiveness of programs were unable to adequately attribute outcomes
to activities” (OAG, 1983, p. 3.14). At the government-wide level, the
Auditor General concluded that
[a]lthough current policy and guidelines recognize the
existence of interdepartmental programs, they fail to
specify procedures to be followed in conducting evaluations of them. The consequence was that interdepartmental programs were not subjected to the same type of
orderly review and evaluation as programs administered
wholly within single departments and agencies. (OAG,
1983, p. 3.22)
That is, the ability of evaluations to support larger decisions related to government-wide public policy outcomes remained a serious concern.
Ultimately, the 1980s ended with growing signs of scepticism. As
Dobell and Zussman concluded about the experience of evaluation
between 1969 and 1981, “a solid decade, almost two, has gone into
changing the words and the forms” (1981, p. 406). In the 1980s, the
function lacked an agreed-upon theory and practice despite some guidance from Canada (e.g., 1981 TBS guides; Zalinger, 1987) and the United States (e.g., Boruch, McSweeney, & Soderstrom, 1978; Campbell, 1975; Tyler, 1942), showed resistance to addressing the evaluation needs of decision-makers, and exhibited an overall resistance within the evaluation community to evaluation activities aimed at effectiveness (Sutherland, 1990, p. 163). These pathologies were no better addressed in
the 1990s.
The Function under Criticism: 1990 to 2000
The 1990s did not begin well for the function as the Comptroller
General, Andy MacDonald, released a paper entitled “Into the 1990s:
Government Program Evaluation Perspectives.” The paper questioned the usefulness of the evaluation function, arguing that although “federal evaluation clearly pays for itself … it is not a major player in resource allocation.” It further stated that the review cycle for programs, rather than being five years, was much closer to twelve (OCG, 1991, pp. 3–6). It added that between the IMAA and
Public Service 2000 initiatives and the decentralizing of management authorities into the hands of departments and agencies, the
evaluation function would continue to weaken. The authors of the
report recommended that in order to combat this problem, evaluators
should be centralized under the authority of the OCG (Foote, 1986;
OCG, 1991, p. 21). Although this was not a new idea, resistance was
and continues to be high.
In addition, the Auditor General again reviewed the evaluation function in 1993. Three chapters were dedicated to this task: Chapter 8
reviewed the field and examined “the case” for maintaining evaluation; Chapter 9 audited the operation of evaluation units in departments and agencies; and Chapter 10 explored how best to make the
evaluation function work better. In addition to these three chapters,
the Auditor General identified in Chapter 1 some “intractable issues,”
one of which was “ensuring that program evaluation asks the tough
questions and assesses significant expenditures” (OAG, 1993a, p.
1.13). The general conclusions were significant:
Program evaluations frequently are not timely or relevant. Many large-expenditure programs have not been
evaluated under the policy. Despite policy requirements
for evaluating all regulatory measures, half have been
evaluated, although other reviews have taken place in
many departments. The development of program evaluation over the last ten years has been primarily in departments’ evaluation units. More progress is required
in developing program evaluation systems that respond
effectively to the interest in effectiveness information
shown by Cabinet, Parliament and the public. (OAG,
1993c, pp. 8.5, 8.6)
The OAG found that most evaluations tended to be formative, focusing on program implementation and improving program design and
management. The OAG also examined the performance of evaluation
units. The results were not encouraging:
in 1991–92, only $28.5 million was spent on program
evaluation across all federal government departments.
Yet program evaluation is charged with considering for
evaluation all the programs and activities of government.
It evaluated about one quarter of government expenditures from 1985–86 to 1991–92, far short of original
expectations that all programs would be evaluated over
five years. For example, only 53 percent of regulatory programs were evaluated over the required seven-year timeframe. The program evaluation capability established
by the federal government in the early 1980s is still in
place, but its strength is declining. Fewer resources tend
to produce fewer studies. (OAG, 1993b, p. 9.1)
With respect to the governance of evaluation units, the OAG found
that
[p]riority has been given to meeting the needs of departmental managers. As a result, evaluations examine
smaller program units or lower-budget activities and
focus on operational performance. They are less likely to
challenge the existence of a program or to evaluate its
cost-effectiveness. (OAG, 1993b, p. 9.2)
In September 1993, TBS concluded in a separate examination that
“[f]usion into a single-function Review group carries potentially more
serious negative consequences, and tends to be initiated for quite different reasons than the two preceding forms of linkage” (TBS, 1993,
p. 22).
As a result of its major audit, some recommendations were made to
improve the performance and effectiveness of the evaluation function: evaluations should be subject to external review and assessment in order to ensure objectivity (OAG, 1993d, pp. 10.2, 10.3);
timeliness and relevance of evaluations could be enhanced if linked
to the major decisions of government, especially resource allocation,
program and policy reviews, and accountability reporting (OAG,
1993d, p. 10.4); quality assurance could be improved with monitoring
by the Comptroller General (OAG, 1993d, p. 10.5); and evaluation
units should be made more responsive by linking the function to
decision-makers’ needs (OAG, 1993d, p. 10.6). Evaluations must be
timely, political interference must be decreased, cooptation by program and audit units must be curbed, and recommendations must
have an effective entry point into decision-making (Muller-Clemm
& Barnes, 1997).
The next iterative change to the federal evaluation policy emerged in
1994 under the “Review Policy,” which formally combined audit and
evaluation despite advice to the contrary. The thought was that this
could better serve the reporting needs of program managers. Despite
this error in governance, evaluation was still expected to produce
timely, relevant, objective, practical and cost-effective
evaluation products.… [and to review] the continued
relevance of government policies and programs; the impacts they are producing; and on opportunities for using
alternative and more cost-effective policy instruments
or program delivery mechanisms to achieve objectives.
(TBS, 1994, pp. 14–16)
This was reinforced by such initiatives as the 1994/95 Government-Wide Review of Year-End Spending, which called on deputy heads
to consider whether “value-for-money was obtained in expenditures
decisions” (TBS, 1994), and the extent to which expenditures met a
defined program need. Supporting documentation, including program evaluations, was sought to validate decisions. This was one of several
attempts to steer audits and evaluations in the direction of verifying
value-for-money.
The OAG assessed the function again in 1996 and found that little
had changed. In particular, it noted that “[e]valuations continue
to emphasize the needs of departmental managers—focusing on
smaller program components and operational matters” (OAG, 1996,
p. 3.3), and that even less progress had been made on value-for-money considerations. This certainly indicates a continued misalignment of evaluation efforts
and the needs of senior decision-makers.
The evaluation policy change also came on the heels of a 1993 study
on public service organization launched by former Clerk of the Privy
Council, Gordon Osbaldeston, who recommended that the Treasury
Board shift its role from functional oversight and control to creating the conditions necessary for improved financial management
performance at the departmental level (Osbaldeston, 1989, p. 174).
In effect, departments and agencies were to have greater responsibility for ensuring management control and oversight. In addition,
Prime Minister Jean Chrétien initiated a government-wide program
review to improve the overall “management” of government. It was
intended, like the Nielsen Task Force before it, that the evaluation
function would play a significant role. In fact, a TBS-sponsored report
in 2000 argued that evaluation units needed to do more to assist program managers to develop appropriate evaluation frameworks and
performance measures. However, it found that the Program Review
exercise “led to a serious undermining of the capacity of evaluation
functions” given that most of the effort of evaluation units was spent
on performance measurement, not evaluation. Finally, it reiterated
previous reports calling for a major review of the evaluation policy
given that the 1994 policy “muddies the distinction between audit
and evaluation” (TBS, 2000b, p. 2). In short, the 1994 policy was met
with great disappointment—the function had once again failed to
deliver in the eyes of decision-makers (Gow, 2001; Muller-Clemm &
Barnes, 1997; Pollitt, 2000).
The 1990s also saw the federal government considering performance measurement as a way of improving accountability for results
(Lahey, 2010, p. 2). The Public Service 2000 (PS 2000) initiative called for the reduction of
red tape, empowerment of staff, devolution of authorities to depart-
ments, decentralizing decision-making structures, and eliminating
unnecessary regulations. Improving the quality of the public service
was the principal aim, and focused on improving the management
culture of the bureaucracy. In part, it called on public servants to be
more “entrepreneurial,” an idea later cited in Reinventing Government (Osborne & Gaebler, 1992). In this vein, and unlike previous reform efforts, public servants would be given more authority to make
decisions, but under a system of “effective accountability for the use
of the authorities” (Canada, 1990, p. 89). The report proposed the
implementation of results and performance standards for managers
(Canada, 1990, p. 90). Such an accountability structure was meant to
create a culture of service to the public. Few of these ideas were new,
but they did frame additional reform efforts in the 2000s, including
a revision of the Evaluation Policy in 2001.
THE 2000s: MECHANIZING THE FUNCTION
The 2000s was a decade of what could be called the mechanization
and reform of federal governance. The idea was that systems engineering could be brought to bear on rationalizing public management. For program evaluation, it assumed that with centralized
and consistent application of organizational frameworks, oversight
becomes simpler: auditors and evaluators simply check program
performance against departmental and program “placemats” and
“strategic objectives.” Although a key aim of results-based management is to understand results, such results were still conceived at
the programmatic level, and it was assumed they could be aggregated to strategic-level outcomes.
The Antecedents of the Current Reforms: 2000 to April 2009
Public Management Reform: New Comptrollership
The root of current reforms to several federal functions including
evaluation can be traced to the Modern Comptrollership Initiative
(MCI) introduced in 1998 and the subsequent Results for Canadians
Initiative established in 2000. Subsequent reforms emanated from
the Federal Accountability Act enacted in 2006.
An objective of the 1994 Program Review was to improve the overall
management of government. Based on the recommendations of an
independent review panel, Prime Minister Chrétien designated the
Treasury Board as the federal government’s “management board”
in 1997, which set out to improve the efficiency of resource management and bureaucratic decision-making. MCI was a set of principles,
rather than rules, driven by a commitment to generally accepted
standards, values, and planned results achieved through flexible
delivery models as opposed to centrally driven processes (Canada,
1997a; Library of Parliament, 2003). The review panel recommended
ways to integrate private sector comptrollership practices into federal public management practices, and suggested that central agency
and departmental financial analysts and program managers must
collaborate to prioritize, plan, set goals and objectives, and participate in processes for defining and achieving results. It set out four
key areas for effective departmental stewardship: integrating timely
performance information, instituting sound risk management, effecting appropriate stewardship and control systems, and rallying
public servants around a shared values and ethics code (Canada,
1997a, pp. 3–4).
The general principles of the MCI were formalized into practice
with the Results for Canadians Initiative launched in 2000. This
initiative focused “on results and value for the taxpayer’s dollar, and
demonstrate[d] a continuing commitment to modern comptrollership” (TBS, 2000a, p. 3). It identified four areas critical to a well-performing public sector: a citizen focus in government activities, management guided by clear values and ethics, a focus on results in departments and agencies, and responsible spending (TBS, 2000a, pp. 5–6). It was in this spirit of stewardship that the
next evaluation policy was crafted. Again, evaluation was regarded
as a way to move government to results-based decision-making.
However, the framework document was critical of the function, raising expectations that it would be repaired: “Historically, governments have focused
their attention on resource inputs (what they spend), activities (what
they do), and outputs (what they produce). Accurate information at
this level is important but insufficient in a results-focused environment” (TBS, 2000a, p. 11).
Evaluation Policy 2001
The 2001 Evaluation Policy, which came into effect on 1 April, was
the result of several months of consultations within the federal
evaluation community. The preface of the policy lays out its objectives: “this policy supports the generation of accurate, objective, and
evidence-based information to help managers make sound, more
effective decisions on their policies, programs, and initiatives and
through this provide results for Canadians” (CEE, 2001, Preface).
In effect, the function was regarded as a “management tool” (CEE,
2001, p. 1) intended to support program managers in their efforts at
program monitoring. It outlined two purposes for evaluation:
• To help managers design or improve the design of policies, programs, and initiatives;
• To provide, where appropriate, periodic assessments of policy or program effectiveness, of impacts both intended and unintended, and of alternate ways of achieving expected results. (CEE, 2001, p. 2)
Although the policy promised emphasis on program effectiveness, it
required program line managers to embed evaluation into the lifecycle management of policies, programs, and initiatives by
• Developing Results-based Management Accountability Frameworks (RMAFs) for new or renewed policies, programs, and initiatives;
• Establishing ongoing performance monitoring and performance measurement practices;
• Evaluating issues related to the early implementation and administration of the policy, program, or initiative, including those that are delivered through partnership arrangements (formative and mid-term evaluation); and
• Evaluating issues related to relevance, results, and cost-effectiveness. (CEE, 2001, p. 2)
A key priority for heads of evaluation was to “provide leadership and
direction to the practice of evaluation in the department” by ensuring “strategically focused evaluation plans, working with managers
to help them enhance the design, delivery and performance measurement of the organization’s policies, programs, and initiatives,
and informing senior management and departmental players of any
findings that indicate major concerns respecting the management
or effectiveness of policies, programs or initiatives” (CEE, 2001, pp.
3–4). Likewise, departmental managers must “draw on the organization’s evaluation capacity … and ensure that they have reliable,
timely, objective, and accessible information for decision-making and
performance improvement” (CEE, 2001, p. 4). Interestingly, a TBS
study conducted in April 2004 concluded that
the function has not lived up to the original policy expectations set out in 1997, measuring the effectiveness of
policy and programs. In fact evaluations have resulted
largely in the operational improvements to and monitoring of programs, rather than more fundamental changes.
(CEE, 2004c, p. 3)
As such, the policy exhorted that “evaluation discipline should be
used in synergy with other management tools to improve decision-making” (CEE, 2001, p. 2). However reasonable a goal, policy and
practice were observed to be disconnected.
The evaluation community was critical of the impact of the 2001
policy:
The environment of program evaluation practice today
presents three key interrelated threats: the dominance of
program monitoring, the lack of program evaluation self-identity, and insufficient connection with management
needs. Various events have propelled performance monitoring and short-term performance measurement to the
front of the management scene; despite the intentions of
the RMAF initiatives that aim to focus people on results,
many managers are now content to possess a performance measurement framework that often focuses on the
obvious outputs rather than providing a more in-depth
assessment of program logic and performance to better
understand why certain results are or are not observed.
(Gauthier et al., 2004, p. 167)
This set of conclusions was supported by a Treasury Board study in
2004, which concluded that although the overall quality of evaluations had certainly improved, more work needed to be done with
respect to producing high-quality effectiveness studies, connecting
evaluation to performance and efficiency, and assisting strategic
decisions (CEE, 2004c, 2005).
Management Accountability Framework
To provide more impetus to program performance support, the MCI
would be formalized into a set of ideal management systems
and practices with the creation of the Management Accountability
Framework (MAF) in 2003. The intent of the MAF was “to develop
a comprehensive system that would attempt to gauge and report on
the quality of management of departments and agencies, and encour-
age improvement every year” (Lindquist, 2009, p. 51; OAG, 2002).
MAF was an engineering spectacle designed to improve management
performance in the areas of governance and strategic direction; public service values; policy and programs; people (HR management);
citizen-focused service; risk management; stewardship; accountability, results, and performance; and learning, innovation, and change
management (CEE, 2003). The role of evaluation was to support the
achievement of program performance (CEE, 2004a, p. 4).
Federal Accountability Act 2006
The Federal Accountability Act was enacted on 12 December 2006.
Prime Minister Stephen Harper identified 13 areas in his Action
Plan that this legislation was to address for the purpose of “rebuilding the confidence and trust of Canadians” (Canada, 2006b,
p. 1). One priority called for strengthening the accountability of
departments by clarifying the responsibilities of deputy heads and
bolstering internal audit units (Canada, 2006a). Section 16.1 of the
Act makes “the deputy head or chief executive officer of a department responsible for ensuring an internal audit capacity appropriate to the needs of the department” (Canada, 2006c, p. 188). Section
16.4 of the Act designates the deputy head “the accounting officer of
a department … accountable before the appropriate committees of
the Senate and the House of Commons” (Canada, 2006c, p. 189) for
such matters as “(a) the measures taken to organize the resources
of the department to deliver departmental programs in compliance
with government policies and procedures; (b) the measures taken
to maintain effective systems of internal control” (Canada, 2006c, p.
189). Section 16.2 of the Act also requires deputy heads to establish
internal audit committees comprising external members to provide
some external “functional oversight” over internal audit and on those
internal systems requiring management attention under the MAF
(TBS, 2003, 2006).
The combination of the MAF, accounting officer, and audit committee
directives, and the new comptrollership reinforces that management
of the department is “job 1” for deputy heads as opposed to traditional
responsibilities mainly for policy development (Shepherd, 2011). The
role of evaluation in this schema is significant. Although not referred
to directly in the Federal Accountability Act, its role in the oversight
functions of the department is fundamental, as it supports the accounting officer responsibilities of the deputy head. These are spelled
out further in the new Evaluation Policy 2009.
Expenditure Management System
Amendments made to the Expenditure Management System (EMS) in 2007 are responsible for driving much of the discussion of
reforms to the 2009 Evaluation Policy. The EMS is built on three pillars: Managing for Results (benchmarking and evaluating programs
and demonstrating results); Up-Front Discipline (providing critical
information for Cabinet decision-making by ensuring all funding
proposals have clear measures of success); and Ongoing Assessment
(reviewing all direct program spending on an ongoing basis to ensure
program efficiency and effectiveness) (TBS, 2007, pp. 1–2).
With respect to managing for results, evaluation is expected to support the Management, Resources and Results Structure (MRRS) Policy, established in 2005, contributing toward a consistent government-wide approach to the collection, management, and reporting
of financial and non-financial information on program objectives, performance, and results. As such, the evaluation policy was identified as
a critical element to support improved reporting to Parliament. The
TBS published the Performance Reporting: Good Practices Handbook
in August 2007, which provided guidance on ways to produce effective
reporting using the MAF. Reporting would be based on departmental
Program Activity Architectures (PAA) that distinguish departmental
from whole-of-government reporting. The PAA was intended to serve
as the basis for all parliamentary reporting through departmental
performance reports, essentially self-report cards on plans and priorities (Report on Plans and Priorities) set at the beginning of each fiscal
year. The idea appeared sound: set out a one-year plan at the beginning of the year and then report on how the organization fared at the
end of the year against those plans. Driven by the TBS, departments
were to inventory their program activities and validate whether they
were contributing to their strategic outcomes. These PAAs were regarded as a departmental “logic model” against which results would
be assessed. The main challenge, however, for the use of PAAs was
that while departments were asked to frame their plans and priorities against strategic objectives and results, funding from the centre
continued to flow through individual programs. In essence, the challenge for evaluation was to assess individual programs—but against
a mechanism interested in policy-level “strategic objectives.” The
engineering was akin to asking evaluators to examine
the performance of an automobile’s brakes when the driver is more
interested in knowing whether the entire car is safe and reliable. The
function continues to struggle with how to support corporate decisions about results when “programs” are the main lens.
With respect to up-front discipline, the TBS published a revised
Guide to Preparing Treasury Board Submissions in July 2007. The
purpose of the guide was to improve the quality of information in TB
submissions by requiring departments to demonstrate the “linkages
between policy, program and spending information by requiring that
MRRS information be included” (TBS, 2007, p. 5). The guide also
required that evaluation costs be separated from regular program
costs—a key development that may serve to improve the independence of the function.
Finally, with respect to ongoing assessment, departments and agencies are required to carry out regular “Strategic Reviews.” These
reviews are to be carried out every four years “to assess how and
whether programs are aligned with priorities and core federal roles,
whether they provide value-for-money, whether they are still relevant
in meeting the needs of Canadians, and whether they are achieving
results” (TBS, 2007, p. 5). Reviews are carried out according to the
PAA framework using evaluations, audits, MAF assessments, and
other sources as supporting evidence. The idea is to regularly align
financial resources to government priorities by identifying high- and
low-performing programs and assigning savings to new priorities.
Overall, the idea behind the EMS was to carry out regular evaluation
of programs and policies followed up by a regular financial alignment
exercise every four years, essentially a formal institutionalization of the 1994
Program Review exercise.
The 2009 Evaluation Policy
The Requirements
The current Evaluation Policy took effect on 1 April 2009 and comprises three important elements: policy (overall requirements),
directive (operational requirements), and standard (minimum requirements for quality, neutrality, and utility). The principal objective of the policy “is to create a comprehensive and reliable base
of evaluation evidence that is used to support policy and program
improvement, expenditure management, Cabinet decision-making,
and public reporting” (CEE, 2009b, p. 5.1).
Under Section 6.1, Deputy Heads are now directly responsible for
“establishing a robust, neutral evaluation function in their department” (CEE, 2009b). Such responsibilities include
•The Head of Evaluation reports directly to the Deputy Head (6.1.1, 6.1.2);
•A departmental evaluation committee is established to advise the deputy head on all evaluation-related activities (6.1.3);
•Evaluation findings should be used to inform program, policy, resource allocation, and reallocation decisions (6.1.5);
•A rolling five-year departmental evaluation plan that aligns with the MRRS, supports the EMS (including Strategic Reviews), and includes all ongoing programs of grants and contributions is to be maintained and submitted to the TBS (6.1.7);
•Coverage: Evaluation must include all direct program spending (excluding grants and contributions) every five years (6.1.8a); all ongoing grants and contributions programs every five years (6.1.8b); the administrative aspect of major statutory spending every five years (6.1.8c); programs set to terminate over a specified period of time (6.1.8d); specific programs requested by the Secretary of the Treasury Board in consultation with the deputy head (6.1.8e); and programs identified in the Government of Canada Evaluation Plan (6.1.8f);
•Ensure that ongoing performance measurement is implemented throughout the department in order to support the evaluation of programs (6.1.10);
•The Secretary of the Treasury Board is responsible for functional leadership of the evaluation function, including monitoring the health of evaluation as a function (6.3.1a) and developing a government-wide Evaluation Plan (6.3.1b).
The Directive on the Evaluation Function sets out the responsibilities
of the Head of Evaluation including developing a rolling five-year
evaluation plan (CEE, 2009a, p. 6.1.3a); ensuring the alignment of
the plan as described in the policy with the MRRS (6.1.3b.i.), the
EMS (6.1.3b.ii.), and appropriate coverage (6.1.3b.iii–viii); identifying and recommending to the deputy head and evaluation committee
a risk-based approach for determining the evaluation approach and
level of effort to be applied to individual evaluations (6.1.3c); submitting and implementing the evaluation plan annually (6.1.3d, e); and
“ensuring that all evaluations that are intended to count toward the
coverage requirements of subsections ‘a,’ ‘b,’ or ‘c’ of subsection 6.1.8
of the Policy on Evaluation, include clear and valid conclusions about
the relevance and performance of programs” (6.1.3f).6 This final subsection in particular makes it clear that Heads of Evaluation are to carry out the studies prescribed by the policy, and that other studies desired by the deputy head or evaluation committee are considered only if adequate resources remain in the budget.
The Directive also identifies program managers (as defined in the
Evaluation Policy) as responsible for implementing and monitoring
ongoing performance measurement strategies, and ensuring that
credible and reliable performance data are being collected to effectively support evaluation (CEE, 2009a, p. 6.2.1). A key requirement
for program managers is also to develop and implement management
responses and action plans for evaluation reports (6.2.2). This is a
sound development in the Evaluation Policy that builds on similar
requirements in the federal Audit Policy.
An issue of significance in the Evaluation Policy is a reduction from
four to two in the issues to be studied in evaluations. Specifically, the
policy identifies the relevance and performance of programs as the main concerns, as opposed to the 2001 policy issues of rationale/relevance,
design/delivery, success/impacts, and cost effectiveness/alternatives.
Although many of these are covered under the new questions, there
remains some frustration in the evaluation community about the
focus of these value criteria on central agency concerns rather than on departmental ones. The specific value questions
are summarized accordingly:
Relevance
Issue 1: Continued need for the program: Assessment of the extent to which the program continues to address a demonstrable need and is responsive to the needs of Canadians.
Issue 2: Alignment with government priorities: Assessment of the linkages between program objectives and (a) federal government priorities and (b) departmental strategic objectives.
Issue 3: Alignment with federal roles and responsibilities: Assessment of the role and responsibilities of the federal government in delivering the program.
Performance
Issue 4: Achievement of expected outcomes: Assessment of progress toward specified outcomes (including immediate, intermediate, and ultimate outcomes) with reference to performance targets, program reach, and program design, including the linkage and contribution of outputs to outcomes.
Issue 5: Demonstration of efficiency and economy: Assessment of resource utilization in relation to the production of outputs and progress toward expected outcomes.
There are several moving parts to the policy. There is a cacophony
of supporting and ancillary policies relating to oversight and accountability, management, stewardship, planning and budgeting,
and control. The final section attempts to make sense of the current requirements by focusing on some aspects that are positive additions and on others that could further challenge the function's ability to improve in the future.
A POLICY ASSESSMENT: INCREMENTAL IMPROVEMENT OR
MORE EROSION?
In the fall of 2009, the Auditor General, Sheila Fraser, included a
chapter titled “Evaluating the Effectiveness of Programs” in her annual report, which concluded that “departmental evaluations covered
a relatively low proportion of its program expenses—between five
and thirteen percent annually across the six departments [and that]
the audited departments do not regularly identify and address weaknesses in effectiveness evaluation” (OAG, 2009, pp. 2, 11). Clearly,
as this refrain has been stated repeatedly since at least 1978, one
must ask what evaluation units have been doing all this time. As
posited, either evaluators have not been providing a product that
decision-makers want, or they have not been able to deliver on their
commitments.
Although the 2009 audit focused on evaluation coverage, it recognized that departmental capacity for evaluation has traditionally
been hampered by shortages of evaluators, extra responsibilities placed on units, and general workload pressures requiring the use of contractors (Cathexis Consulting Inc., 2010;
OAG, 2009). Many of these observations were also identified by a
meeting of the CES National Capital Chapter (Canadian Evaluation
Society, 2009). The evaluation community is concerned about the changes in the evaluation policy and how departmental units will cope. In particular, evaluators are deeply concerned about the support they receive from TBS to assist them in meeting their responsibilities and expectations (Canadian Evaluation Society, 2009).
This section concludes with a few brief thoughts on the direction of
the new policy. In particular, I am interested in whether it is likely to move the function back to its traditional roots in determining policy and program effect, which in my view would make the function more relevant, or to lock it onto a critical path of supporting fiscal prudence that could lead to its continued decline.
Issues of Concern Moving Forward
Based on the several studies to date, the following issues appear to
capture the challenges facing Canadian federal evaluators and evaluation units. There have been some shifts in policy emphasis from
the 2001 version. For example, the TBS is attempting to harmonize
some of the policy engineering around oversight, budgeting, planning, and performance measurement in hopes of making the function
relevant by connecting its products and services to these areas more
effectively. The challenge is that even if this were possible, it would
take time to acclimatize the evaluation community and harmonize
information and reporting systems. Some high-level challenges are:
• the appropriate evaluand and target of evaluation products
• the “governance” of the Policy on Evaluation
• the appropriate areas of input for evaluation products to
corporate decisions including operational cycles (strategic
reviews and evaluation plans)
• the focus of the function (issues and questions) and coverage.
Appropriateness of the Evaluand and Target of Evaluation
Products
The need to refocus evaluation on strategic decision-making has been argued
throughout the function’s evolution (Gauthier et al., 2004; Mayne,
1986, 2001; Prieur, 2011). Indeed, evaluation is well-placed to ask the
larger questions of program, initiative, strategy, and policy effectiveness. That is, are departments (and indeed government as a whole)
doing the right things in a way that addresses real public policy
problems? In this respect, evaluation in Canada must effectively
balance two principles: to provide objective and useful findings, conclusions, and recommendations relating to programs; and to support
Parliament’s efforts to serve the public good, the principal target of
public policy. Evaluation was centralized in 1977 with this purpose
in mind, but has gradually moved away from it. To be
relevant, evaluation has to tackle the big questions that perplex
departmental and government-wide decision-makers. In addition, it
has to do better at examining not simply policies and programs but
the instruments of policy as well, including regulation, exhortative
instruments, guidelines, partnerships, contracts, and hybrid tools
and processes. These are key contributors to effective policies and
programs. The idea is not to evaluate the extent to which program
results can be attributed to these instruments, but to understand
whether these instruments in combination with relevant policies and
programs contribute to resolving specified public problems (Mayne,
2001, 2008; McDavid & Huse, 2006).
As shown, the evaluation function has evolved with a predisposition to examine small evaluands (or “small-p” programs), leading
one to the conclusion that evaluation has not done well at giving
deputy heads the information they need at the strategic level. This
observation was borne out in a recent study that consulted deputy
heads on their evaluation functions, noting that performance measurement and evaluation “are not currently providing deputy heads
with a complete picture of organization-wide performance” (Lahey,
2011, p. 4.4). Targeting errors persist: although these tools are adept at helping deputy heads to manage programs “individually,” little effort is being made to make evaluation a strategic decision-making tool (Lahey, 2011, p. 4.4). This predisposition has limited
the ability of senior departmental decision-makers to make judgements about departmental effectiveness in meeting their strategic
or corporate public policy objectives.
By extension, it is a reasonable conclusion that evaluation continues
to support program managers (those who manage individual programs), and not strategic decision-makers, including deputy heads.
Although program managers are an appropriate user of evaluation, the policy suggests implicitly that they are the principal “clients”
of evaluations. This remains a problem: evaluation must be more
strategic, moving beyond a focus on program processes. The question that
must always be asked is whether public problems are being resolved,
and how programs can be improved or replaced to achieve expected
results (Peters, Baggett, Gonzales, DeCotis, & Bronfman, 2007). Focusing on the needs of program managers suggests that evaluation
is beholden mainly to program units: such units are interested mainly
in program operations, not usually government priorities or perspectives. This is not to suggest that evaluation ought to ignore such
needs, but that they are better aligned with strategic results—the
intent of the MRRS.
On the positive side, it seems reasonable that as deputy heads are
required to identify savings in the Strategic Review process, gaps will
be observed in the usefulness of evaluation information. If this argument bears out in reality, then it is reasonable to assume that deputy
heads may become more involved in setting priorities for evaluation and more active in the evaluation planning process than perhaps they were in the past.
The new policy attempts to correct these targeting errors by making
Heads of Evaluation report directly to deputy heads. The implicit
assumption is that both offices will coordinate their planning efforts
so that the basket of programs being evaluated and their scope will
be negotiated and coordinated with other oversight activities, such
as audit. The challenge, however, is that evaluation is regarded under the policy as another deputy head responsibility among other
oversight and stewardship responsibilities (Lahey, 2011, p. 7). The
inclination is to consider evaluation another “ticky-box” exercise
rather than orienting it for strategic purposes.
The Governance of the Evaluation Policy
Although there have been some benefits of a standardized and centralized evaluation function, there have also been costs. An important benefit is that evaluation has contributed in some limited ways
to understanding departmental fiscal program performance. The
downside is that centralization stunts program creativity at the
departmental level by, again, stressing central agency concerns for
fiscal prudence over program effect. As such, evaluation is caught
between serving two masters—the department and the centre. Each
of these has demands that are quite different, which has contributed
to some confusion in the function. For departments, evaluation is best
suited to assess program effectiveness whereas the centre is more
interested in accountability and fiscal prudence. Evaluation units
have been attempting to serve both with equal rigour, and this is not
working out very well as central agency concerns generally win out
in this equation.
If, indeed, stewardship of departmental resources is the primary
responsibility of deputy heads, then it stands to reason that internal
preferences for oversight, including evaluation, ought to be driven by
deputy heads. The role of central agencies, including the Treasury
Board, is to support, not dictate, matters of what to cover and how.
Alternatively, if the objective of evaluation is to support government-wide accountability as the principal driver, then a strong case
can be made to centralize evaluation much like the audit function
and remove this from departmental control altogether. Serving both
departmental and government-wide needs and objectives has not
worked, especially given the accountability and resource use focus
of the evaluation criteria. A strong case can be made that if fiscal
prudence is the driving factor behind evaluation, as is currently the
case, then removing the function from departments is a viable option.
This would leave departments the flexibility needed to evaluate their
own priorities. In other words, evaluation activities could be divided between corporate evaluation and departmental functions. This
would allow central agencies to focus on concerns of accountability,
and departments to build capacity and carry out rigorous outcomes-based studies using questions that make sense in particular contexts.
Such observations were reinforced by deputy heads, who were concerned about the “one-size-fits-all” approach to the policy (Lahey,
2011, p. 4.2iv). They expressed a desire for more flexibility in the
design, scoping, and conduct of evaluations, especially with respect
to large versus small programs, and low- and high-risk programs. In
addition, the policy requires that all evaluation questions receive
equal attention, when perhaps other questions that fall outside the
TBS requirements are more pertinent. The fact that questions preferred by senior decision-makers are not given equal weight, attention, or credit raised some frustrations (Lahey, 2011, p. 4.2iv). In this
respect, the addition of evaluation committees may be a positive innovation in the sense that appropriate scoping advice can be brought
to bear on the design of evaluation projects. However, central agency
requirements would appear to be limiting committee responsibilities
mainly to checking departmental work, rather than assisting with
appropriate scoping and other advice, which could be regarded as
added expense on individual projects. Another important conclusion
is that as long as the TBS questions take precedence, there is limited
opportunity for internal evaluators to learn from the studies, given
the constant churn demanded under the coverage requirements.
Appropriate Areas of Input for Evaluation Products
If one accepts that the appropriate target of evaluation products is
senior departmental and agency management, then it follows that
evaluation products should support mainly departmental policy
planning, budgeting, and feedback systems. The most appropriate
use of evaluation would be for senior management, in concert with
heads of evaluation, to plan evaluations around a rotating cycle of
examining designated strategic objectives in the PAA. Such plans
would take into consideration senior management’s policy and planning concerns rather than the current focus on program-level expenditures. This would allow for a systematic but also strategic
assessment of all departmental programs, thereby realigning evaluation with departmental priorities, not individual program managers’
needs. One must understand program-level concerns for operations,
but not at the expense of effective departmental planning and policy
decision-making.
That being said, the PAA vehicle would have to be amended to include “expected results” that flow from each of the subactivities. At
present, the PAA does not work: it identifies activities; evaluation
examines results—the engineering is wrong. Such a planning cycle
would facilitate more rational planning regarding ways to improve
the effectiveness of a combination of programs toward understood
policy goals, assuming that expected results have been defined appropriately.
With regard to Strategic Reviews, deputies noted that new life was being
breathed into the evaluation function. However, they also noted that
higher expectations of evaluation are inevitable insofar as credible information can be gleaned on fiscal prudence to support such
reviews. As long as evaluation is regarded as a key input to these
reviews, fiscal prudence becomes the overriding consideration. It
is appropriate to use strategic reviews to calibrate and stunt the
growth in A-base budgets by focusing on the identification of spending priorities. This may be another argument for a corporate evaluation function to support these particular purposes.
Focus of the Function and Coverage
It is not unreasonable for citizens to expect that evaluation will focus
on whether government interventions into public problems are actually working and that all such interventions are examined on a systematic basis. The problem resides in the emphasis that one places
on accountability for spending versus actually resolving problems.
With respect to the application of the evaluation questions, relevance (alignment with departmental and government-wide priorities, constitutionality) and performance (impact assessment) are
valid areas of investigation. The challenge with respect to relevance
is that the current focus is on finding ways to reduce federal
expenditures by shifting program responsibilities to other jurisdictions. This serves federal accountability purposes, but does little to
understand the best ways to resolve pressing public problems that
could involve a federal presence.
With respect to performance, current evaluation methodologies tend
to favour the use of information provided by the program or program
recipients. If evaluation efficacy is to improve, then alternative research methods and information sources will be needed to ensure
attribution, and information validity and reliability. The principal
assumption under the evaluation policy is to use multiple sources
of evidence (CEE, 2009c, p. 6.2.2). This assumption should be revisited to the extent that valid quantitative and qualitative research
methods can support findings without a disproportionate reliance on
embedded sources of program data. As long as this problem persists,
it will be difficult to evaluate programs in a way that speaks to the appropriateness of the program theory of change, rather than simply to the management of program operations.
Coverage is a major concern under the new policy. This is not a new
stipulation, but there are indications that the centre will place more
emphasis on enforcement. There is some evidence to suggest that
departments and agencies have begun to “cluster” like programs to
achieve coverage. This is unlikely to work well, as reports will more
than likely show superficial findings based on the fiscal and accountability nature of the centralized TBS questions.
Equally important with respect to coverage, there is very little strategic value to the 100% coverage requirement. Although the public can
be assured that all spending is reviewed on a five-year cycle, there
is little to suggest that, without a serious increase in evaluation resources, much will be contributed by way of understanding the actual
value of these programs in the resolution of public policy problems.
That is, evaluating everything means essentially evaluating nothing
under this schema—all programs are subject to fiscal review, rather
than a strategic assessment of departmental action. Perhaps the
greater concern regarding coverage is that it reinforces an already
short-term public policy focus. Programs are seen to have lifecycles
of five years with demands for immediate results, rather than a focus on long-term change in areas such as climate change.
Aside from concerns of policy value, the coverage requirements are
simply unsustainable, especially in times of fiscal restraint and limited numbers of trained evaluators to do the work (Lahey, 2011, p.
4.2). Few departments will be able to fulfill the requirement, and
even if they can, it is questionable what value this will create for the
department other than satisfying accountability concerns. The fact
of the matter is that there is a confluence of accountability processes
including requirements generated under the strategic review process, Transfer Payments Policy, Audit Policy, MAF, and hyper-partisan
parliamentary committees. The logical question is whether there is
any value being generated through all of these oversight mechanisms
for the increasing resources being injected into them. If there were
some coordination of effort to actually learn from the products being
generated, deputy heads might consider them more than ticky-box
exercises or hoops to jump through in order to satisfy their performance accords.
SOME CLOSING THOUGHTS
The federal stimulus package has brought on unprecedented spending, contributing to a deficit of more than $50 billion in 2009/10. This is
combined with the Web of Rules Action Plan, which in 2008/09 was
aimed at reducing the reporting requirements of Treasury Board policies by 25%, online human resources reporting across government by 85%, and Management Accountability Framework assessments by 50%. Such measures are aimed at improving federal fiscal
performance, reducing inefficiency, protecting against key risks, and
preserving accountability. At the same time, the role of audit is being
bolstered, reinforced, and positioned as the front-line defence against
waste. Where is the concern for program effectiveness?
Effective feedback on overall government effectiveness is being
pushed aside by an obsessive predilection by parliamentarians for
fiscal prudence. This has driven the orientation of oversight mechanisms such as evaluation mainly toward concerns with accountability over effect. One cannot help but conclude that the federal government
is diminishing its evaluation function to that of just another auditor and could even be encouraging its swift demise at a time when
the US and other governments are stepping up their commitment
to effectiveness evaluation. My fear is that the stars are aligning
in a way that puts evaluation at a disadvantage. The obsessive focus on austerity can only lead in one direction—governments more
concerned with reducing spending than with finding creative solutions to public problems.
This article has made the argument that federal evaluation must
refocus its efforts and return to the thinking that inspired PPBS
and to a concern for better aligning programs with strategic solutions
to public problems. The MRRS is one vehicle that reminds departments, and indeed elected officials, that the principal aim of government spending is to support citizen demands for action on wicked
problems. Although it is laudable that evaluation ought to support
these purposes, it is fully realized that the appetite among elected officials for hearing evidence-based concerns about the effectiveness of
public resources is highly limited. This has always been the case, and
the point could not be made more clearly than by the circumstances that
inspired the Gomery Inquiry or recent audits into stimulus spending. Indeed, no elected official wants to be questioned on their policy
choices, nor on the means they bring to bear in resolving public problems. However, this does not mean that public servants should not aspire to a
culture of inquiry into policy and program relevance, effective public
action, and realistic and coordinated solutions. The current public engineering for evaluation is predisposed to understanding grants and
contributions programs. However, governments are increasingly depending on other instruments for intervention including regulation,
exhortation, and self-guided markets with government guidelines. At
present, evaluation focuses very little on these main instruments for
governing. As long as the predisposition to a grants and contributions
approach persists, there can be no progress toward assessing government policy performance rather than individual program performance.
With the attention that federal evaluation is currently enjoying, opportunities to improve are being squandered on illogical concerns
mainly for accountability at the expense of government-wide learning. Although austerity and fiscal prudence may be considered by
some to be appropriate goals for evaluation to support, the fact is that
these are short-sighted. Coverage, centrally-driven value criteria,
inflexible planning and scoping requirements, and hyper-accountability concerns limit what evaluation can do. A more supportive central agency focus on building well-trained and competent evaluators
who focus on the right things in the right way is the key to moving
forward. Remaining on the path of enforcement, while expedient in
the short term, does not contribute well to a sustainable and relevant
evaluation function in the long term. Centralized federal evaluation
has been around since 1977 and the concerns remain. It is time for
the community to come together and resolve this dynamic or face an
uncertain future.
Notes
1. With respect to ensuring accountability, Good (2003, p. 166) argues there are three main reasons for accountability within the Canadian model of accountability: control, assurance, and learning.
2. Henderson (1984) speaks about his often raucous relationship with
Prime Ministers Pearson and Trudeau, describing how for the first
time an Auditor General was using his annual report to embarrass
the government. More importantly, he describes his interest in social
science research methods to ascertain program effectiveness.
3. For example, Prime Minister Pierre Trudeau had introduced Bill
C-190 in 1969 to limit the responsibilities of the Auditor General.
The bill was dropped in response to media pressure.
4. Section 7(2)(e) states that the Auditor General under subsection (1)
shall call attention to anything that he considers to be of significance
and of a nature that should be brought to the attention of the House
of Commons, including any cases in which he has observed that
“satisfactory procedures have not been established to measure and
report the effectiveness of programs, where such procedures could
appropriately and reasonably be implemented” (Canada, 1977).
5. The responsibilities of the OCG were diffused among departments,
Supply and Services, and the Treasury Board Secretariat between
1969 and 1978.
6. A risk-based approach is a method for considering risk when planning
the extent of evaluation coverage of direct program spending pending
full implementation of section 6.1.8 of the Policy. Risk criteria may
include the size of the population(s) affected by non-performance of
their individual programs, the probability of non-performance, the
severity of the consequences that could result, the materiality of
their individual programs, and their importance to Canadians (CEE,
2009b, Annex A).
REFERENCES
Alkin, M. (2004). Evaluation roots: Tracing theorists’ views and influences.
Thousand Oaks, CA: Sage.
Alkin, M. (2012). Evaluation roots: A wider perspective of theorists’ views
and influences. Thousand Oaks, CA: Sage.
Aucoin, P. (1995). Canada: The new public management in comparative
perspective. Montreal, QC: IRPP.
Aucoin, P. (2005). Decision-making in government: The role of program
evaluation (Discussion Paper). Ottawa, ON: Centre for Excellence for
Evaluation, Treasury Board Secretariat.
Boruch, R., McSweeney, A., & Soderstrom, E. (1978). Randomized field
experiments for program planning, development and evaluation.
Evaluation Quarterly, 2(4), 655–695.
Campbell, D. (1957). Factors relevant to the validity of experiments in social
settings. Psychological Bulletin, 54, 297–312.
Campbell, D. (1975). Assessing the impact of planned social change. In G.M.
Lyons (Ed.), Social research and public policies (pp. 3–45). Hanover,
NH: Dartmouth College, Public Affairs Center.
Canada. (1962). Royal Commission on Government Organization (Glassco
Commission). Ottawa, ON: Supply and Services.
Canada. (1976). Royal Commission on Financial Management and Accountability, Final Report (Lambert Commission). Ottawa, ON: Supply
and Services.
Canada. (1977). Auditor General Act 1977. Ottawa, ON: Justice Canada.
Canada. (1990). Public service 2000: The renewal of the public service of
Canada. Ottawa, ON: Supply and Services.
Canada. (1997a). Report of the independent review panel on the modernization of comptrollership in the government of Canada. Ottawa, ON:
Supply and Services.
Canada. (1997b). Results for Canadians: A management framework for the
government of Canada. Ottawa, ON: Supply and Services.
Canada. (2006a). Accountability Act and Action Plan. Ottawa, ON: Supply
and Services.
Canada. (2006b). Federal Accountability Act and Action Plan [brochure].
Ottawa, ON: Supply and Services.
Canada. (2006c). Statutes of Canada: An Act providing for conflict of interest rules, restrictions on election financing and measures respecting
administrative transparency, oversight and accountability (Federal
Accountability Act). Ottawa, ON: Parliament of Canada.
Canadian Evaluation Society. (2009). As was said, session report: Professional day 2009. Ottawa, ON: CES National Capital Chapter. Retrieved
from www.evaluationcanada.ca
Cathexis Consulting Inc. (2010). Evaluator compensation: Survey findings.
Ottawa, ON: Author. Retrieved from www.cathexisconsulting.ca
Centre for Excellence for Evaluation, Treasury Board Secretariat. (2001).
Policy on evaluation 2001. Ottawa, ON: Author.
Centre for Excellence for Evaluation, Treasury Board Secretariat. (2003).
Interim evaluation of the Treasury Board evaluation policy. Ottawa,
ON: Author.
Centre for Excellence for Evaluation, Treasury Board Secretariat. (2004a). Evaluation function in the government of Canada. Ottawa, ON: Author.
Centre for Excellence for Evaluation, Treasury Board Secretariat. (2004b). Review of the quality of evaluation across departments and agencies. Ottawa, ON: Author.
Centre for Excellence for Evaluation, Treasury Board Secretariat. (2004c). Study of the evaluation function in the federal government. Ottawa, ON: Author.
Centre for Excellence for Evaluation, Treasury Board Secretariat. (2005).
Case studies on the uses and drivers of effective evaluation in the
government of Canada. Ottawa, ON: Author.
Centre for Excellence for Evaluation, Treasury Board Secretariat. (2009a). Directive on the evaluation function. Ottawa, ON: Author.
Centre for Excellence for Evaluation, Treasury Board Secretariat. (2009b). Policy on evaluation. Ottawa, ON: Author.
Centre for Excellence for Evaluation, Treasury Board Secretariat. (2009c).
Standard on evaluation for the government of Canada. Ottawa, ON:
Author.
Dobell, R., & Zussman, D. (1981). An evaluation system for government: If
politics is theatre, then evaluation is (mostly) art. Canadian Public
Administration, 24(3), 404–427.
Foote, R. (1986). The case for a centralized program evaluation function
within the government of Canada. Canadian Journal of Program
Evaluation, 1(2), 89–95.
Furubo, J.E., Rist, R., & Sandahl, R. (Eds.). (2002). International atlas of
evaluation. New Brunswick, NJ: Transaction Press.
Gauthier, B., Barrington, G., Bozzo, S. L., Chaytor, K., Cullen, J., Lahey, R.,
… Roy, S. (2004). The lay of the land: Evaluation practice in Canada
today. Canadian Journal of Program Evaluation, 19(1), 143–178.
Good, D. (2003). The politics of public management: The HRDC audit of
grants and contributions. Toronto, ON: University of Toronto Press.
Gow, I. (2001). Accountability, rationality, and new structures of governance:
Making room for political rationality. Canadian Journal of Program
Evaluation, 16(2), 55–70.
Henderson, M. (1984). Plain talk memoir of an auditor general. Toronto, ON:
McClelland & Stewart.
Jordan, J. M., & Sutherland, S.L. (1979). Assessing the results of public expenditure: Program evaluation in the Canadian federal government.
Canadian Public Administration, 22(4), 581–609.
Lahey, R. (2010). The Canadian M&E system: Lessons learned from 30
years of development (World Bank EDD Working Paper Series No.
23). Washington, DC: World Bank. Retrieved from www.worldbank.
org/ieg/ecd
Lahey, R. (2011). Deputy head consultations on the evaluation function.
Ottawa, ON: Centre for Excellence for Evaluation, Treasury Board
Secretariat.
Library of Parliament, Canada. (2003). Modern comptrollership (PRB 03-13E). Ottawa, ON: Author.
Lindquist, E. (2009). How Ottawa assesses departmental/agency performance: Treasury Board’s management accountability framework. In A.
Maslove (Ed.), How Ottawa spends: Economic upheaval and political
dysfunction (pp. 47–88). Montreal, QC: McGill-Queen’s University
Press.
Maxwell, N. (1986). Linking ongoing performance measurement and program evaluation in the Canadian federal government. Canadian
Journal of Program Evaluation, 1(2), 39–44.
Mayne, J. (1986). In defense of program evaluation. Canadian Journal of
Program Evaluation, 1(2), 97–102.
Mayne, J. (2001). Addressing attribution through contribution analysis: Using performance measures sensibly. Canadian Journal of Program
Evaluation, 16(1), 1–24.
Mayne, J. (2006). Audit and evaluation in public management: Challenges,
reforms, and different roles. Canadian Journal of Program Evaluation, 21(1), 11–45.
Mayne, J. (2008). Contribution analysis: An approach to exploring cause and
effect. Institutional Learning and Change, 16(May), 1–4.
McDavid, J. C., & Huse, I. (2006). Will evaluation prosper in the future?
Canadian Journal of Program Evaluation, 21(3), 47–72.
McKinney, J., & Howard, L. (1998). Public administration: Balancing power
and accountability (2nd ed.). Westport, CT: Praeger.
Muller-Clemm, W., & Barnes, M. P. (1997). A historical perspective on federal program evaluation in Canada. Journal of Program Evaluation,
12(1), 47–70.
Office of the Auditor General of Canada. (1975). Report of the Auditor General of Canada 1975. Ottawa, ON: Supply and Services.
Office of the Auditor General of Canada. (1976). Report of the Auditor General of Canada 1976. Ottawa, ON: Supply and Services.
Office of the Auditor General of Canada. (1983). Program evaluation (Chapter 3). In Report of the Auditor General of Canada 1983. Ottawa, ON: Supply and Services.
Office of the Auditor General of Canada. (1993a). Matters of special importance and interest (Chapter 1). In Report of the Auditor General of
Canada 1993. Ottawa, ON: Supply and Services.
Office of the Auditor General of Canada. (1993b). Program evaluation in departments: The operation of program evaluation units (Chapter 9). In Report of the Auditor General of Canada 1993. Ottawa, ON: Supply and Services.
Office of the Auditor General of Canada. (1993c). Program evaluation in the federal government: The case for program evaluation (Chapter 8). In Report of the Auditor General of Canada 1993. Ottawa, ON: Supply and Services.
Office of the Auditor General of Canada. (1993d). The program evaluation
system – Making it work (Chapter 10). In Report of the Auditor General of Canada 1993. Ottawa, ON: Supply and Services.
Office of the Auditor General of Canada. (1996). Evaluation in the federal
government (Chapter 3). In Report of the Auditor General of Canada
1996. Ottawa, ON: Supply and Services.
Office of the Auditor General of Canada. (2002). Financial management and
control in the government of Canada (Chapter 5). In Report of the
Auditor General of Canada 2002. Ottawa, ON: Supply and Services.
Office of the Auditor General of Canada. (2009). Evaluating the effectiveness of programs (Chapter 1). In Report of the Auditor General of
Canada 2009. Ottawa, ON: Supply and Services.
Office of the Comptroller General of Canada. (1979). Internal audit and
program evaluation in the government of Canada. Ottawa, ON: Supply and Services.
Office of the Comptroller General of Canada. (1991). Into the 1990s: Government program evaluation perspectives. Ottawa, ON: Author.
Osbaldeston, G. (1989). Keeping deputy ministers accountable. Toronto, ON:
McGraw-Hill Ryerson.
Osborne, S., & Gaebler, T. (1992). Reinventing government: How the entrepreneurial spirit is transforming the public sector. Reading, MA:
Addison-Wesley.
Patton, M. Q. (1978). Utilization focused evaluation: The new century text
(3rd ed.). Thousand Oaks, CA: Sage.
Peters, J., Baggett, S., Gonzales, P., DeCotis, P., & Bronfman, B. (2007). How
organizations implement evaluation results. Proceedings of the 2007
International Energy Program Evaluation Conference, Chicago, IL,
35–47.
Pollitt, C. (2000). How do we know how good public services are? In B. G.
Peters & D. Savoie (Eds.), Governance in the 21st century: Revitalizing the public service (pp. 119–254). Montreal, QC: McGill-Queen’s
University Press.
Pollitt, C., & Bouckaert, G. (2004). Public management reform: A comparative analysis. New York, NY: Oxford University Press.
Prieur, P. (2011). Evaluating government policy. Draft paper.
Raynor, M. (1986). Using evaluation in the federal government. Canadian
Journal of Program Evaluation, 1(2), 1–10.
Rist, R. (1990). Program evaluation and the management of government.
New Brunswick, NJ: Transaction Press.
Rossi, P., & Freeman, H. (1985). Evaluation: A systematic approach (3rd ed.).
Beverly Hills, CA: Sage.
Rossi, P. H., Lipsey, M., & Freeman, H. (2004). Evaluation: A systematic approach (7th ed.). Thousand Oaks, CA: Sage.
Rutman, L. (1986). Some thoughts on federal level evaluation. Canadian
Journal of Program Evaluation, 1(2), 19–27.
Saint-Martin, D. (2004). Managerialist advocate or ‘control freak’? The
Janus-faced Office of the Auditor General. Canadian Public Administration, 47(2), 121–140.
Savoie, D. (1994). Thatcher, Reagan, Mulroney: In search of a new bureaucracy. Toronto, ON: University of Toronto Press.
Scriven, M. (1974). Evaluating program effectiveness or, if the program is
competency-based, how come the evaluation is costing so much? ERIC
Document No. SP008 235 (ED 093866).
Scriven, M. (1978). Merit vs. value. Evaluation News, 20–29.
Segsworth, R. V. (1990). Auditing and evaluation in the government of
Canada: Some reflections. Canadian Journal of Program Evaluation,
5(1), 41–56.
Shepherd, R. (2011). Departmental audit committees and governance: Improving scrutiny or allaying public perceptions of poor management?
Canadian Public Administration, 54(2), 277–304.
Stake, R. (2003). Standards-based and responsive evaluation. Thousand
Oaks, CA: Sage.
Stokey, E., & Zeckhauser, R. (1978). A primer for policy analysis. New York,
NY: W.W. Norton.
Stufflebeam, D. (1983). The CIPP model for program evaluation. In G. F.
Madaus, M. S. Scriven, & D. Stufflebeam (Eds.), Evaluation models: Viewpoints in educational and human services evaluation (pp.
117–141). Boston, MA: Kluwer-Nijhoff.
Sutherland, S. L. (1986). The politics of audit: The Federal Office of the
Auditor General in comparative perspective. Canadian Public Administration, 29(1), 118–148.
Sutherland, S. L. (1990). The evolution of program budget ideas in Canada:
Does Parliament benefit from estimates reform? Canadian Public
Administration, 33(2), 133–164.
Treasury Board Secretariat. (1969). Planning programming and budgeting
guide of the government of Canada. Ottawa, ON: Author.
Treasury Board Secretariat. (1976). Measurement of the performance of
government operations (Circular 1976-25). Ottawa, ON: Author.
Treasury Board Secretariat. (1977). Evaluation of programs by departments
and agencies (Circular 1977-47). Ottawa, ON: Author.
Treasury Board Secretariat. (1982). Circular 1982-8. Ottawa, ON: Author.
Treasury Board Secretariat. (1993). Linkages between audit and evaluation
in federal departments. Ottawa, ON: Author.
Treasury Board Secretariat. (1994). Manual on review, audit and evaluation. Ottawa, ON: Author.
Treasury Board Secretariat. (2000a). Results for Canadians: A management
framework for the government of Canada. Ottawa, ON: Author.
Treasury Board Secretariat. (2000b). Study of the evaluation function. Ottawa, ON: Author.
Treasury Board Secretariat. (2003). Management accountability framework.
Ottawa, ON: Author. Retrieved from www.tbs-sct.gc.ca/maf-crg_e.asp
Treasury Board Secretariat. (2006). Directive on departmental audit committees. Ottawa, ON: Author.
Treasury Board Secretariat. (2007). Government response to the fourth report of the Standing Committee on Public Accounts: The expenditure
management system at the Government Centre and the expenditure
management system in departments. Ottawa, ON: Author.
Tyler, R. (1942). General statement on evaluation. Journal of Educational
Research, 35, 492–501.
Walker, R. (2003, November 30). The guts of a new machine. The New
York Times. Retrieved from http://www.nytimes.com/2003/11/30/
magazine/30IPOD.html?ex=1386133200&en=750c9021e58923d5&e
i=5007&partner=USERL
Weiss, C. (1972). Evaluation research: Methods of assessing program effectiveness. Englewood Cliffs, NJ: Prentice Hall.
Zalinger, D. (1987). Contracting for program evaluation resources. Canadian Journal of Program Evaluation, 2(2), 85–87.
Zussman, D. (2010). What ever happened to program evaluation? IT in
Canada. Retrieved from www.itincanada.ca
Robert Shepherd is Associate Professor at the School of Public Policy and Administration, Carleton University. He assumed
the role of supervisor for the Diploma in Policy and Program Evaluation in 2011. In 1986, he co-founded a management consulting firm
in Ottawa focusing on public management and program evaluation
research. Between 1986 and 2007, he also served in various capacities within government, taking on short-term assignments in departments including the Canada School of Public Service (CSPS), and most recently serving as Head of Evaluation for the
Canadian Food Inspection Agency (2006-2007). His research interests lie mainly in the areas of public management reform efforts,
primarily in Westminster countries. As such, he is concerned about
the role of parliamentary agents and officers, the changing roles of
audit and evaluation functions, how various oversight roles such as
ombudsmen and other offices relate to their departments and carry
out their work, and the changing relationships between central agencies and departments to manage within an increasingly austere and
oversight-laden system.