The Canadian Journal of Program Evaluation Vol. 26 No. 2 Pages 1–45
ISSN 0834-1516 Copyright © 2012 Canadian Evaluation Society

In Search of a Balanced Canadian Federal Evaluation Function: Getting to Relevance

Robert P. Shepherd
School of Public Policy and Administration, Carleton University
Ottawa, Ontario

Corresponding author: Robert P. Shepherd, School of Public Policy and Administration, Carleton University, 1125 Colonel By Dr., 5126RB, Ottawa, ON, Canada K1S 5B6; <[email protected]>

Abstract: In April 2009, the Treasury Board Secretariat enacted a new Evaluation Policy replacing the previous 2001 version. This new policy has generated much discussion among the evaluation community, including the criticism that it has failed to repair the many shortcomings the function has faced since it was centralized in 1977. This article reviews the history of the federal function to explain why shortcomings persist and makes two assertions. First, if program evaluation is going to maintain its relevance, it will have to shift its focus from an individual program and services orientation to understanding how these programs and services relate to larger public policy objectives. Second, if program evaluation is to assume a whole-of-government approach, then evidentiary forms must be constructed to serve that purpose. The author argues that evaluation must be far more holistic and calibrative than in the past; this means assessing the relevance, rationale, and effect of public policies. Only in this way can the function both serve a practical managerial purpose and be relevant to senior decision-makers.

Résumé : En avril 2009, le Secrétariat du Conseil du Trésor édictait une nouvelle politique d’évaluation remplaçant la version antérieure de 2001. Cette nouvelle politique a suscité de grands débats dans le milieu de l’évaluation. D’après certaines critiques, elle a omis de combler les nombreuses lacunes qui affectent la fonction depuis sa centralisation, en 1977. Dans cet article, l’auteur relate l’histoire de la fonction fédérale afin de trouver des motifs à la persistance de ces lacunes et formule deux affirmations à cet égard. Premièrement, si l’on veut maintenir la pertinence de l’évaluation des programmes, il faudra cesser de l’orienter uniquement sur les programmes et les services, afin de comprendre les liens entre ces programmes et ces services et les objectifs élargis des politiques publiques. Deuxièmement, si l’on souhaite adopter une démarche étendue à l’ensemble du gouvernement en matière d’évaluation de programmes, il faut créer des formes de preuve qui favorisent l’atteinte de cet objectif. D’après l’argument de l’auteur, l’évaluation doit revêtir un caractère nettement plus holistique et calibratif que par le passé. Par conséquent, il faut évaluer la pertinence, le fondement, et l’effet des politiques publiques. Ce n’est qu’à cette condition que la fonction servira à des fins de gestion pratique et se révélera également utile aux principaux décideurs.

INTRODUCTION

“Design is not just what it looks like and feels like. Design is how it works.” Steve Jobs (Walker, 2003)

Policy and form matter. Canada’s federal government evaluation policy and function have undergone a great deal of change since their introduction in 1977. Much of this is due to shifting expectations about the purpose of program evaluation over different time periods.
However, at a very basic level, “the primary aim of evaluation is to aid stakeholders in their decision making on policies and programs” (Alkin, 2004, p. 127). That is, evaluation must not only inform grants and contributions program decisions on matters of program theory, coherence, approach, and results; it must ultimately serve to provide evidence that government-wide decisions are the most effective at resolving, or contributing to the resolution of, a specified public problem defined by elected officials. Likewise, program evaluation serves a management function by supporting program improvement and design, and it serves central agency purposes by demonstrating accountability for the distribution and use of public resources. There are several other uses, but in public terms the function supports a variety of actors under various conditions, contexts, and degrees of complexity. Likewise, decision-makers’ expectations of evaluation have tended to shift between various evidentiary forms to support fiscal prudence (i.e., the cautious appropriation of public resources relative to revenues; Canada, 1997b, p. 6; McKinney & Howard, 1998, pp. 373–374), programmatic effectiveness (i.e., the extent to which a program is achieving expected outcomes; Centre of Excellence for Evaluation [CEE], 2009b, Appendix A), or political responses to current or emergent public problems (Aucoin, 2005).

Maintaining some balance among these competing governmental policy and program concerns and evidentiary forms continues to pose challenges in the field. As any one of these governmental concerns and forms takes priority or precedence at any given time, program evaluation as a function must continuously struggle to find its place and determine its relevance for the many actors who both produce and use evaluation outputs. The fact is that political priorities can and often do drive line and functional forms and priorities, including evaluation and other oversight activities.

Essential decision-making in government is a complex process in which evidence from prescribed systematic research and practical experience from delivering government programs and services mixes with other governmental functions and processes that include ideation, ideology, public and private interests, institutions, and individuals. The combination of these various systems, processes, and environmental variables forms the basis of decisions taken by political and administrative actors. Depending on the social, economic, and political circumstances of the time, and on the political regimes in place, different types of evidence will predominate. The questions being posed by political and administrative leaders will vary over time according to the circumstances, thereby determining the weight attached to evidence. Program evaluation, like other research forms, adapts to changing circumstances as best it can to support what constitutes good governance.

Governments have come and gone, and with them differing conceptions about the most suitable role for various checking and oversight functions, including program evaluation. Some governments have placed greater emphasis on whole-of-government policy and decision-making that responds to public needs, others have considered sound delivery of programs and services to be paramount, while still others have emphasized economic performance and management as pivotal aspects of good governance.
With these shifts in matters of significance, the importance and role of evaluation have waxed and waned, but evaluators have always tried to demonstrate the function’s value by providing the advice demanded at the time.

In April 2009, the Treasury Board Secretariat enacted a new Policy on Evaluation (CEE, 2009b) to replace the 2001 version. This new policy has generated much discussion among the evaluation community, in much the same way that previous iterations did. Again, the outcry from the community is that the policy has failed to address the many challenges the function has faced since it was centralized in 1977. As in many other jurisdictions, including the UK, Australia, and the United States, evaluation has had a less than stellar impact on financial, programmatic, and strategic decisions, but this time the buzz from the community is that this is likely the last chance evaluation will get to demonstrate its value to political decision-makers (Zussman, 2010). In Canada, as in many western nations, the overwhelming concern of governments has been economic and fiscal prudence.

Given these comments on governmental priorities and evidentiary forms, this article attempts to highlight two assertions. First, if program evaluation is going to maintain its relevance, it will have to shift its focus from a mainly grants and contributions program and service orientation to understanding how these programs and services relate to larger public policy objectives. In other words, what has been traditionally understood as “program evaluation” must take a whole-of-government perspective and understand more fully that government is more than the sum of its parts. It must understand whether public policies, institutional arrangements and frameworks, program architectures, regulations and other instruments, and resource allocations are working cohesively toward common public policy objectives. Second, if program evaluation is to assume a whole-of-government approach, then evidentiary forms must also be constructed to serve that purpose. The values school of evaluation is premised on the idea that the principal aim of evaluation as a field is to appropriately assign value or merit to government policies, programs, processes, and ultimately results (Alkin, 2004, pp. 12–17). The criteria upon which it assigns such values will necessarily frame the inquiry. The 2009 Policy on Evaluation uses a particular set of value criteria that supports central agency concerns for fiscal prudence and accountability. In my view, these value criteria are likely to relegate evaluation to the role of central audit, and may do more damage than good to the federal function.

In order to appreciate these two assertions, it is important to briefly review how federal evaluation has changed over time, and how the function has responded broadly to the circumstances of each period. Likewise, evidentiary forms have also shifted to suit the roles demanded of the function. As such, the first section describes the evolution of federal evaluation in Canada from roughly 1970 to 2000. It attempts to highlight historical emphases or preferences in the use of evaluation and the rationale for shifts in those preferences. Some of these shifts are rooted in the new public management and results-based management, while others were simply political preferences to support government initiatives.
The second section sheds some light on characteristics of the new evaluation policy by first tracking the federal government’s reform initiatives since 2000. This section shows that the Canadian policy is leading the function toward supporting centralized, political-level directives on fiscal prudence and accountability at the expense of impartial and strategic assessments of programs to inform meta-level decision-making. The last section provides some overall assessments about the future of federal evaluation as it extends from the new policy. It questions the usefulness of evaluation in supporting overall governmental decision-making. It also attempts to provide some thoughts on how the evaluation function can best contribute to governmental concerns for not only accountability, but also policy and program coherence (i.e., whether policies and programs as a system of government actions address meta-level problems such as poverty reduction or building technological and research capacity), budgetary and strategic planning (i.e., sound financial planning), and program performance (i.e., the extent to which effectiveness, efficiency, and economy are achieved by a program; CEE, 2009b, Appendix A). Is this direction toward an audit-based culture premised on fiscal prudence and accountability likely to be sustainable, or even desirable?

Some Critical Notes on Language and Assumptions

It is important to set the theoretical context of this article, given that the academic literature and government policy are not always certain about the meaning of terms or the significance attached to ideas. First, I take the view of Alkin that evaluation research is applied social research (Alkin, 2004, p. 127). That is, evaluation consists of the application of various social research methods to provide credible information that can aid in forming public policy, designing programs, and assessing the effectiveness and efficiency of social policies and programs. Evaluation is established to address policy-oriented questions that provide guidance on how best to address public problems. This view is consistent with Rossi, Lipsey, and Freeman’s understanding that “evaluation is the systematic collection and analysis of evidence on the outcomes of programs to make judgments about their relevance, performance, and alternative ways to deliver them or to achieve the same results” (2004, p. 4). I use this particular construct because it is the one used in the 2009 Policy on Evaluation (CEE, 2009b, p. 3.1).

Second, for the purposes of this article, “program evaluation” is considered separately from the larger concept or field of “evaluation.” Program evaluation refers to “a group of related activities that are designed and managed to meet a specific public need and are often treated as a budgetary unit” (CEE, 2009b, Appendix A). In other words, programs may be part of larger-order strategies, initiatives, or institutional/corporate-level projects that may comprise several individual programs. As such, programs are usually administered by mid-level managers (e.g., EX-01/02 in the federal Canadian system). The strategic users of evaluation products are generally those at the senior management ranks, including the deputy head and assistant deputy ministers.
Third, the purpose of any central policy framework such as the Treasury Board’s Evaluation Policy is to provide guidance on how the function ought to be directed across departments and agencies. In this context, centralized evaluation policies aim to bridge a number of key considerations, including identifying the appropriate evaluand (e.g., tasks, projects, programs, strategies/initiatives, policies); the purpose of evaluation; identifying the appropriate methods of evaluation; determining the timing of evaluations and their type; identifying the appropriate evaluator competencies to bring to bear, including whether the work ought to be internally or externally driven; the ethics associated with the research; and the budgeting constraints. The requirements and preferences of central agents regarding each of these aspects will ultimately determine how the function is understood by both those carrying out evaluation studies and those using them. It is these requirements and preferences, and how they have changed over time, that are the subject of this article. I contend that the federal function has gradually moved away from the strategic uses of evaluation for program effectiveness and political responsiveness, and toward central agency concerns for essentially fiscal prudence and accountability; some of the reasons for this shift will be discussed.

Fourth, evaluation has traditionally tended to be multifaceted in the sense that it asks several different types of questions, aimed at different dimensions of evaluands and public actors. That is, evaluation is commonly concerned with matters of rationale and relevance (why programs were created and whether these meet identified needs), program delivery (whether program design matches implementation), program theory (whether theories of change are supported by evidence), program efficiency (understanding the costs of producing outputs), program effect (attributing effects to the program), and program improvement (alternatives that can be considered that are more cost-effective or efficient, and whether these are more effective in terms of results achieved). Program monitoring (or program performance) is generally concerned more with matters of program implementation and the efficient production of outputs. I maintain that the federal function has moved toward this latter focus at the expense of effectiveness evaluation.

Finally, the purpose of evaluation has changed over time. Although this has already been implied, I think it should be made quite explicit. This is not to suggest that “the good old days” are gone and we are left with something inferior in the new policy. On the contrary, the federal function has learned from its history and is indeed attempting to make the function relevant in today’s context, when fiscal prudence and management drive virtually every decision. This has meant that the function has had to adapt. The challenge is whether the function is adapting in a way that ensures its continued relevance and strategic usefulness.

EVOLUTION OF EVALUATION IN CANADA: GETTING TO RELEVANCE

Framing the Function

Evaluation policies and forms vary from country to country based on their history, institutions, culture, forms of government, and the degree to which they are held as important or relevant by decision-makers.
The extent to which these policies and forms can be translated into regular public management has been widely debated in an attempt to find some common insights into how to make these evaluation policies and their resultant frames more relevant to those who use evaluation products (Furubo, Rist, & Sandahl, 2002; Mayne, 2006; Pollitt & Bouckaert, 2004). Evaluation functions were created in several jurisdictions for generally three, not necessarily overlapping, reasons: to assist political decision-makers in making strategic policy decisions; to demonstrate fiscal prudence, efficiency, and accountability (Good, 2003); and to provide decision-makers with information as to whether the right programs were in place and addressing the right problems effectively. In Canada, attempts have been made at different times to find a balance among these. However, as is often the case, one or more of these rationales has predominated at any given time, leading to a function that has continued to struggle to find its appropriate and legitimate place in the federal schema.

In response to these priorities at different times, the federal function has evolved, and federal evaluators have debated the appropriate degree of centralization and how to position themselves in departments with reasonable independence and control. Evaluators have also debated appropriate evaluation methodologies, ranging from positivist ideas centring on quantitative research rigour and attribution to more post-positivist and realist ideas of creating an accurate or realistic picture of a program’s contribution, positioned according to the priorities of the time. There have also been competing “visions” of program evaluation in Canada, from program-specific applications to more holistic or institutional contributions. Finally, there have been debates about the appropriate purpose of evaluation, ranging from those who argue use is critical (e.g., Patton, 1978; Stufflebeam, 1983); to those who maintain that the assignment of value or merit criteria to programs is key (e.g., Alkin, 2004, 2012; Scriven, 1974, 1978; Stake, 2003); to those who believe that the rigour of methodology must prevail (e.g., Campbell, 1957; Rossi & Freeman, 1985; Tyler, 1942; Weiss, 1972). These debates have had a significant influence on the progression of the field generally. However, they have also tended to divide practitioners, much more so than in audit or other oversight functions, in ways that have left strategic decision-makers with a great deal of space to define the practice of evaluation over time.

In Canada, evaluation as a field has been influenced by its history, including the pre-centralization period prior to 1977. Evaluation was always a part of the federal system, and deputy ministers were responsible and held to account not only for financial scrutiny, but also for the efficiency and effectiveness of ministry operations. However, evaluation was regarded in very different ways as an assessment tool prior to centralization in 1977. The Office of the Auditor General, established in 1878, was responsible mainly for actuarial assurance, but beginning in the 1960s more information on effectiveness was demanded that spoke to how well programs were delivering on governmental policy promises, rather than simply to financial and procedural efficiency. More than any other initiative, the Royal Commission on Government Organization (Glassco Commission; Canada, 1962) framed evaluation at this time.
Departments were not tracking spending and its effects under the existing centralized expenditure management system established in 1931. The commission recommended decentralizing financial management authority to departments in order to “let the managers manage” (Canada, 1990), and evaluation was regarded as a means to achieve greater program accountability. As such, the Treasury Board directed departments to better monitor and evaluate their programs (Office of the Auditor General of Canada [OAG], 1975) using the Program, Planning, and Budgeting System (PPBS) in 1968, and its successor, the Program Expenditure Management System (PEMS), in 1978. Evaluation was geared to support program accountability and control at the departmental level and to validate program expenditure data using new quantitative and qualitative research methods (Stokey & Zeckhauser, 1978).

In addition to the Glassco recommendations, the Auditor General at the time, Maxwell Henderson (1960–1973), was reporting on whether programs were effective, taking a more holistic approach and adding his own comments on the integrity of government programming using a government-wide lens (Henderson, 1984). Such comments drew considerable attention to the office and attracted the ire of government (Segsworth, 1990). In addition to these developments, the merits of proposals were assessed by the Cabinet as a whole, rather than individually, and formalized as memoranda to Cabinet. This system was created in response to the emergence of powerful departments from the Depression to the post-war period. Under this system, Cabinet almost always deferred to ministers and their departments in deliberating on program proposals. There appeared to be a symbiotic relationship between effectiveness evaluation and Cabinet decision-making. This symbiosis was based, to some extent, on confidence in a highly technocratic and professional public service.

Despite these advances, there were new calls for greater program accountability given the push to decentralization. These were early signs of a shift toward an evaluative function framed to support program accountability and performance. This was an important development, as precision was brought to bear on the meaning of “program.” Prior to 1969, a program was any activity of government on a small or large scale. With the Planning Programming and Budgeting Guide, a program was defined to mean a “collection of activities having the same objective or set of objectives” (Treasury Board Secretariat [TBS], 1969, p. 2). The addition of several new social programs at this time required a lens to assess innovation and effectiveness, thereby moving the function from a macro orientation to one that essentially focused on individual program initiatives, an orientation that has persisted to this day. In addition, evaluation was being asked to assess program accountability and performance of these new social programs (Jordan & Sutherland, 1979, p. 586), which set a new tone for micro-level evaluation. It also set high expectations to assess both effectiveness and program performance. The function struggled with both aspects, especially given that the evaluation community at this time was steadfast in its desire to separate these. Given these differences, calls were being made by departments to centralize and formalize evaluation as a corporate function in order to ensure a consistent approach.
The fear in the evaluation community was that centralization would move the orientation of evaluation from assisting strategic ministry decisions to supporting central concerns for accountability.

Experimenting Within the Field: 1970 to 1980

The 1970s were defined by a number of developments framing the accountability of programs. First, the Operational Performance Measurement System (OPM) was approved by the Treasury Board in 1973, which required departments to provide performance data to the Treasury Board before the 1978 forecast. The system was the foundation of the evaluation framework and the direct input to the performance measurement requirements (TBS, 1976, p. 4), which directed departments to improve their evaluation function in order that they would be able to use “adequate and reliable means, wherever feasible, for performance measurement” (TBS, 1976, p. 5). The function was directed to institute measures mainly of program efficiency related to resource usage and service quality, and of operational or procedural effectiveness for grants and contributions type programs. Finally, the Treasury Board was given the mandate to monitor the progress of evaluation units and the quality of their submissions as these supported the OPM.

While these changes were occurring at the central agency level, the Auditor General, J. J. Macdonell, released a blistering report on the government, stating in his 1976 report that “the Government … has lost, or is close to losing, effective control of the public purse … financial management and control in the Government of Canada is grossly inadequate” (OAG, 1976, p. 1.2). This report led in part to the creation of the Royal Commission on Financial Management and Accountability in 1976 (Lambert Commission; Canada, 1976), which reported in March 1979. Where Glassco recommended a strategy of letting the managers manage, Lambert concluded that central controls should be improved. Specifically, the commission proposed the creation of a fiscal plan for government, covering five-year periods, which would see the federal government allocate resources according to centrally derived priorities, but within the limits of revenues: the beginning of government concern for mainly fiscal prudence. These plans would be set jointly by the Financial Secretariat of the Board of Management (Treasury Board), the Privy Council Office, and the Department of Finance, which would assume the lead role for budgeting. The fiscal plan was expected to restore financial accountability by ensuring central guidance of funding priorities and the setting of spending ceilings at the departmental level (Sutherland, 1986, pp. 119–124).

Such events contributed to the creation of the first formalized evaluation policy in 1977, with earmarked resources attached from the Treasury Board (TBS, 1977). In this respect, Canada was second only to the United States in formally embracing centralized program evaluation within its machinery of government. Evaluation units were established, which reported directly to the heads of departments and agencies. The intent of this first policy was that all departments were to conduct “periodic evaluations in carrying out their responsibilities for the management of their programs” on a five-year cycle (coordinated with the fiscal plan) and that deputies were to make informed management decisions, demonstrate accountability, and provide advice to ministers on the strategic direction of programs (TBS, 1977, p. 2).
Of note in this first policy was that evaluations were to be “objective” according to three criteria: terms of reference were to be established for each evaluation project; evaluations were to be conducted independently; and reporting was to be clearly established and communicated to senior management. What could be described as the “directive” on evaluation clarified that evaluations were to assess the operation of programs (formative), clarify program objectives where necessary (formative), reduce or eliminate programs (strategic), and identify those programs that were held as high priorities by the government of the day (strategic). Evaluation would be concerned with supporting sound financial management and accountability in line with the Lambert recommendations. The evaluation function worked in concert with the Auditor General, who was limited to examining the efficiency and economy of government activities (Canada, 1977, p. 7[2]). Examination of effectiveness was left to the evaluation function.

However, this was deceptive. In fact, the policy was a departure from the 1969 Program, Planning and Budgeting Guide that preceded it.

Whereas the Guide assumed an “objective” and independent thermostat-like financial control system that constantly self-corrects and self-reports on itself according to objective, quantitative data and criteria, the Treasury Board’s policy directed departments and agencies to establish three-to-five year evaluation strategies entirely based on the best judgement of program managers [in concert with senior managers]. (Muller-Clemm & Barnes, 1997, p. 56)

Sutherland rightly concluded that the Treasury Board change was an “ad hoc analysis, only a judgemental kind of control … without providing any formulae for how the information should be weighted, evaluated and interpreted within the bureaucratic-political context” (Sutherland, 1990, p. 145). The significance of this conclusion cannot be overstated: it represented a departure from past experience and one that dogs the function today. The previous value of the function was that its products had a direct input into strategic decision-making. After the change, without either criteria-driven selection of the objects of evaluation or an unequivocal requirement that the findings of evaluation be used, those findings were easily lost within the system, with no required input into management decisions.

In 1978, the Office of the Comptroller General (OCG) was re-established under Harry Rogers, with a separate branch, the Program Evaluation Branch, to assist departments and agencies to institute and maintain their evaluation function as per the new policy. The contribution of the Comptroller General at the time was to distinguish between “big-P” and “little-P” programs. Big-P programs were major federal government-wide initiatives, which were the subject of attention in the annual Estimates. Little-P programs were departmental programs focused on particular or functional responsibilities. The OCG directed evaluation units to focus on little-P programs (Dobell & Zussman, 1981; Office of the Comptroller General of Canada [OCG], 1979) despite the recommendations of the Lambert Commission to concentrate on “big” programs on a five-year cycle. The combination of the new evaluation policy and the orientation undertaken by the OCG to focus on departmental programs essentially set the evaluation function on a path that has proven exceedingly difficult to adjust.
Despite this dramatic shift in policy, the evaluation community had grown considerably during the 1970s. The Canadian Evaluation Society (CES) was established in 1981, which added lustre to the new field. Textbooks were being written, academic journals were springing up, evaluation courses were becoming commonplace in public administration programs, and standardized methodologies were being agreed upon. Overall, the 1970s established a major direction for the field that remains firmly in place.

Engraining Concerns for Fiscal Prudence: 1980 to 1990

The 1980s are characterized as a decade of centralizing and decentralizing responsibilities in line with the rise of the New Public Management (NPM) in Canada. The concern raised by the NPM was that governments were becoming too unwieldy, that governments should behave more like the private sector to gain efficiencies in the delivery of public services, and that these services should be delivered at an appropriate point of subsidiarity (Aucoin, 1995, pp. 8–10). Several attempts were made during this period to create a performance management and data collection system that respected innovations in financial planning such as PEMS. In 1981, the TBS directed departments to include performance information in their annual Estimates to Parliament to judge program performance (TBS, 1982). The challenge for evaluation units was that judgement regarding the type of performance information to be collected was left mainly to program managers, without necessarily tying this information to larger public policy objectives that may involve more than one program. Such shifts only served to reinforce the fact that evaluation was tied to program-level considerations for process and performance (i.e., output production). Attempts were made to consider results-based outcomes, but without adequate performance management systems and performance monitoring, tracking results was challenging.

In 1984, amendments were made to the Financial Administration Act that required evaluation units to gauge the value-for-money of “little-P” programs. In 1988, the Treasury Board spearheaded the Increased Ministerial Authority and Accountability (IMAA) initiative, followed by guidance from the OCG, Working Standards, released in 1989. The guidance removed scheduling decisions for evaluations from program managers and created a negotiated agreement between departments and the TBS through memoranda of understanding. The idea was to reframe the function to support assessing results. It also emphasized the responsibility of senior managers to evaluate their programs on a regular basis.

In 1989, the Supreme Court ruled that the Auditor General must seek access to documents through the courts when access has been refused by Parliament or Cabinet. Access to information on the rationale and relevance of programs has long been a sticking point. The response to the ruling was greater use of the media by all Auditors General. The cost has been an erosion of the impartiality and neutrality of the Auditor General (Saint-Martin, 2004). Although evaluation has not relied on similar strong-arm approaches for data collection, accessing source documents on rationale and relevance has been an equal concern.
Although the Office of the Comptroller General was generally concerned with making evaluation a relevant part of government decision-making, it played a significant role in setting the function’s direction (Rutman, 1986, p. 20). There was consensus that a key intent of the new Evaluation Policy in 1977 was a focus on summative evaluation with a view to understanding results (CEE, 2005b; Foote, 1986, pp. 91–92; Maxwell, 1986; Raynor, 1986; Rutman, 1986), but the fact was that the OCG contributed to a different direction for the function. Under the leadership of Comptroller General Harry Rogers, comptrollers were introduced in all departments in 1983, responsible in part for the planning and coordination of management information. Rogers instituted a “challenge function” whereby financial officers would subject departmental proposals for programs (i.e., programs managed by mid-level managers) to systematic review before they proceeded to external actors such as the TBS and the Privy Council Office. The result was generally the creation of some unit in departments to review program proposals for operational integrity: a focus on formative evaluation with a view to serving line managers rather than the deputy head as intended (CEE, 2004a, 2005b; Mayne, 1986, p. 99; Rutman, 1986, p. 21).

Meanwhile, federal governments continued to make a concerted effort to reform their financial and program management regimes. In 1984, Prime Minister Brian Mulroney announced his intention to review the size of government, and in 1985 he established the Nielsen Task Force to review all federal programs. However, the task force did not use evaluations in any significant way. In fact, evaluation came under a great deal of skepticism as a result of this exercise because departmental evaluation units could supply little of the information the task force needed for a government-wide perspective (Mayne, 1986, pp. 98–100; Rutman, 1986, pp. 21–22). Likewise, departmental evaluations tended to offer little direction as to where and how programs could be improved, let alone inform other strategic decision-making exercises (Rist, 1990; Savoie, 1994). The function faces quite similar problems today as an input to the Strategic Review exercises of departments.

Once again the Auditor General examined the program evaluation function and, although it was found that progress was being made in implementing evaluation in departments, the greatest criticisms were, ironically, at the government-wide level. At the departmental level, the audit found that of the 86 evaluation studies assessed, “approximately half the studies which attempted to measure the effectiveness of programs were unable to adequately attribute outcomes to activities” (OAG, 1983, p. 3.14). At the government-wide level, the Auditor General concluded that

[a]lthough current policy and guidelines recognize the existence of interdepartmental programs, they fail to specify procedures to be followed in conducting evaluations of them. The consequence was that interdepartmental programs were not subjected to the same type of orderly review and evaluation as programs administered wholly within single departments and agencies. (OAG, 1983, p. 3.22)

That is, the capacity of evaluations to support larger decisions related to government-wide public policy outcomes remained a serious concern. Ultimately, the 1980s ended with growing signs of skepticism.
As Dobell and Zussman concluded about the experience of evaluation between 1969 and 1981, “a solid decade, almost two, has gone into changing the words and the forms” (1981, p. 406). In the 1980s, the function was marked by the lack of an agreed-upon theory and practice, despite some guidance from Canada (e.g., 1981 TBS guides; Zalinger, 1987) and the United States (e.g., Boruch, McSweeney, & Soderstrom, 1978; Campbell, 1975; Tyler, 1942); by resistance to addressing the evaluation needs of decision-makers; and by an overall resistance within the evaluation community to evaluation activities aimed at effectiveness (Sutherland, 1990, p. 163). These pathologies were no better addressed in the 1990s.

The Function under Criticism: 1990 to 2000

The 1990s did not begin well for the function, as the Comptroller General, Andy MacDonald, released a paper entitled “Into the 1990s: Government Program Evaluation Perspectives.” The paper questioned the usefulness of the evaluation function, arguing that although “federal evaluation clearly pays for itself … it is not a major player in resource allocation.” It further stated that the review of programs, rather than occurring on a five-year cycle, was much closer to a twelve-year one (OCG, 1991, pp. 3–6). It added that, between the IMAA and Public Service 2000 initiatives and the decentralizing of management authorities into the hands of departments and agencies, the evaluation function would continue to weaken. The authors of the report recommended that, in order to combat this problem, evaluators should be centralized under the authority of the OCG (Foote, 1986; OCG, 1991, p. 21). Although this was not a new idea, resistance was and continues to be high.

In addition, the Auditor General again reviewed the evaluation function in 1993. Three chapters were dedicated to this task: Chapter 8 reviewed the field and examined “the case” for maintaining evaluation; Chapter 9 audited the operation of evaluation units in departments and agencies; and Chapter 10 explored how best to make the evaluation function work better. In addition to these three chapters, the Auditor General identified in Chapter 1 some “intractable issues,” one of which was “ensuring that program evaluation asks the tough questions and assesses significant expenditures” (OAG, 1993a, p. 1.13). The general conclusions were significant:

Program evaluations frequently are not timely or relevant. Many large-expenditure programs have not been evaluated under the policy. Despite policy requirements for evaluating all regulatory measures, half have been evaluated, although other reviews have taken place in many departments. The development of program evaluation over the last ten years has been primarily in departments’ evaluation units. More progress is required in developing program evaluation systems that respond effectively to the interest in effectiveness information shown by Cabinet, Parliament and the public. (OAG, 1993c, pp. 8.5, 8.6)

The OAG found that most evaluations tended to be formative, focusing on program implementation and improving program design and management. The OAG also examined the performance of evaluation units. The results were not encouraging:

in 1991–92, only $28.5 million was spent on program evaluation across all federal government departments. Yet program evaluation is charged with considering for evaluation all the programs and activities of government.
It evaluated about one quarter of government expenditures from 1985–86 to 1991–92, far short of original expectations that all programs would be evaluated over five years. For example, only 53 percent of regulatory programs were evaluated over the required seven-year timeframe. The program evaluation capability established by the federal government in the early 1980s is still in place, but its strength is declining. Fewer resources tend to produce fewer studies. (OAG, 1993b, p. 9.1)

With respect to the governance of evaluation units, the OAG found that

[p]riority has been given to meeting the needs of departmental managers. As a result, evaluations examine smaller program units or lower-budget activities and focus on operational performance. They are less likely to challenge the existence of a program or to evaluate its cost-effectiveness. (OAG, 1993b, p. 9.2)

In September 1993, TBS concluded in a separate examination that “[f]usion into a single-function Review group carries potentially more serious negative consequences, and tends to be initiated for quite different reasons than the two preceding forms of linkage” (TBS, 1993, p. 22).

As a result of its major audit, the OAG made some recommendations to improve the performance and effectiveness of the evaluation function: evaluations should be subject to external review and assessment in order to ensure objectivity (OAG, 1993d, pp. 10.2, 10.3); timeliness and relevance of evaluations could be enhanced if linked to the major decisions of government, especially resource allocation, program and policy reviews, and accountability reporting (OAG, 1993d, p. 10.4); quality assurance could be improved with monitoring by the Comptroller General (OAG, 1993d, p. 10.5); and evaluation units should be made more responsive by linking the function to decision-makers’ needs (OAG, 1993d, p. 10.6). In short, evaluations must be timely, political interference must be decreased, co-optation by program and audit units must be curbed, and recommendations must have an effective entry point into decision-making (Muller-Clemm & Barnes, 1997).

The next iterative change to the federal evaluation policy emerged in 1994 under the “Review Policy,” which formally combined audit and evaluation despite advice to the contrary. The thought was that this could better serve the reporting needs of program managers. Despite this error in governance, evaluation was still expected

to produce timely, relevant, objective, practical and cost-effective evaluation products.… [and to review] the continued relevance of government policies and programs; the impacts they are producing; and on opportunities for using alternative and more cost-effective policy instruments or program delivery mechanisms to achieve objectives. (TBS, 1994, pp. 14–16)

This was reinforced by such initiatives as the 1994/95 Government-Wide Review of Year-End Spending, which called on deputy heads to consider whether “value-for-money was obtained in expenditures decisions” (TBS, 1994), and the extent to which expenditures met a defined program need. Supporting documentation, including program evaluations, was sought to validate decisions. This was one of several attempts to steer audits and evaluations in the direction of verifying value-for-money. The OAG assessed the function again in 1996 and found that little had changed.
In particular, it noted that “[e]valuations continue to emphasize the needs of departmental managers—focusing on smaller program components and operational matters” (OAG, 1996, p. 3.3), and found even less progress on value-for-money considerations. This indicated a continued misalignment between evaluation efforts and the needs of senior decision-makers.

The evaluation policy change also came on the heels of a 1993 study on public service organization launched by former Clerk of the Privy Council Gordon Osbaldeston, who recommended that the Treasury Board shift its role from functional oversight and control to creating the conditions necessary for improved financial management performance at the departmental level (Osbaldeston, 1989, p. 174). In effect, departments and agencies were to have greater responsibility for ensuring management control and oversight. In addition, Prime Minister Jean Chrétien initiated a government-wide program review to improve the overall “management” of government. It was intended, as with the Nielsen Task Force before it, that the evaluation function would play a significant role. In fact, a TBS-sponsored report in 2000 argued that evaluation units needed to do more to assist program managers to develop appropriate evaluation frameworks and performance measures. However, it found that the Program Review exercise “led to a serious undermining of the capacity of evaluation functions,” given that most of the effort of evaluation units was spent on performance measurement, not evaluation. Finally, it reiterated previous reports calling for a major review of the evaluation policy, given that the 1994 policy “muddies the distinction between audit and evaluation” (TBS, 2000b, p. 2). In short, the 1994 policy was met with great disappointment: the function had once again failed to deliver in the eyes of decision-makers (Gow, 2001; Muller-Clemm & Barnes, 1997; Pollitt, 2000).

The 1990s also saw the federal government considering performance measurement as a way of improving accountability for results (Lahey, 2010, p. 2). The PS-2000 initiative called for the reduction of red tape, the empowerment of staff, the devolution of authorities to departments, the decentralization of decision-making structures, and the elimination of unnecessary regulations. Improving the quality of the public service was the principal aim, with a focus on improving the management culture of the bureaucracy. In part, the initiative called on public servants to be more “entrepreneurial,” an idea later cited in Reinventing Government (Osborne & Gaebler, 1992). In this vein, and unlike previous reform efforts, public servants would be given more authority to make decisions, but under a system of “effective accountability for the use of the authorities” (Canada, 1990, p. 89). The report proposed the implementation of results and performance standards for managers (Canada, 1990, p. 90). Such an accountability structure was meant to create a culture of service to the public. Few of these ideas were new, but they did frame additional reform efforts in the 2000s, including a revision of the Evaluation Policy in 2001.

THE 2000s: MECHANIZING THE FUNCTION

The 2000s were a decade of what could be called the mechanization and reform of federal governance. The idea was that systems engineering could be brought to bear on rationalizing public management.
For program evaluation, it was assumed that with centralized and consistent application of organizational frameworks, oversight becomes simpler: auditors and evaluators simply check program performance against departmental and program “placemats” and “strategic objectives.” Although a key aim of results-based management is to understand results, such results were still conceived at the programmatic level, and it was assumed they could be aggregated into strategic-level outcomes.

The Antecedents of the Current Reforms: 2000 to April 2009

Public Management Reform: New Comptrollership

The root of current reforms to several federal functions, including evaluation, can be traced to the Modern Comptrollership Initiative (MCI) introduced in 1998 and the subsequent Results for Canadians Initiative established in 2000. Subsequent reforms emanated from the Federal Accountability Act, enacted in 2006.

An objective of the 1994 Program Review was to improve the overall management of government. Based on the recommendations of an independent review panel, Prime Minister Chrétien designated the Treasury Board as the federal government’s “management board” in 1997, which set out to improve the efficiency of resource management and bureaucratic decision-making. The MCI was a set of principles, rather than rules, driven by a commitment to generally accepted standards, values, and planned results achieved through flexible delivery models as opposed to centrally driven processes (Canada, 1997a; Library of Parliament, 2003). The review panel recommended ways to integrate private sector comptrollership practices into federal public management, and suggested that central agency and departmental financial analysts and program managers must collaborate to prioritize, plan, set goals and objectives, and participate in processes for defining and achieving results. It set out four key areas for effective departmental stewardship: integrating timely performance information, instituting sound risk management, effecting appropriate stewardship and control systems, and rallying public servants around a shared values and ethics code (Canada, 1997a, pp. 3–4).

The general principles of the MCI were formalized into practice with the Results for Canadians Initiative launched in 2000. This initiative focused “on results and value for the taxpayer’s dollar, and demonstrate[d] a continuing commitment to modern comptrollership” (TBS, 2000a, p. 3). It identified four areas critical to a well-performing public sector: a citizen focus in government activities, management guided by clear values and ethics, a focus on results in departments and agencies, and responsible spending (TBS, 2000a, pp. 5–6). It was in this spirit of stewardship that the next evaluation policy was crafted. Again, evaluation was regarded as a way to move government to results-based decision-making. However, the framework document was critical of the function, raising expectations of a repair: “Historically, governments have focused their attention on resource inputs (what they spend), activities (what they do), and outputs (what they produce). Accurate information at this level is important but insufficient in a results-focused environment” (TBS, 2000a, p. 11).

Evaluation Policy 2001

The 2001 Evaluation Policy, which came into effect on 1 April, was the result of several months of consultations within the federal evaluation community.
The preface of the policy lays out its objectives: “this policy supports the generation of accurate, objective, and evidenced-based information to help managers make sound, more effective decisions on their policies, programs, and initiatives and through this provide results for Canadians” (CEE, 2001, Preface). In effect, the function was regarded as a “management tool” (CEE, 2001, p. 1) intended to support program managers in their efforts at program monitoring. It outlined two purposes for evaluation:

• To help managers design or improve the design of policies, programs, and initiatives;
• To provide, where appropriate, periodic assessments of policy or program effectiveness, of impacts both intended and unintended, and of alternate ways of achieving expected results. (CEE, 2001, p. 2)

Although the policy promised emphasis on program effectiveness, it required program line managers to embed evaluation into the lifecycle management of policies, programs, and initiatives by

• Developing Results-based Management Accountability Frameworks (RMAFs) for new or renewed policies, programs, and initiatives;
• Establishing ongoing performance monitoring and performance measurement practices;
• Evaluating issues related to the early implementation and administration of the policy, program, or initiative, including those that are delivered through partnership arrangements (formative and mid-term evaluation); and
• Evaluating issues related to relevance, results, and cost-effectiveness. (CEE, 2001, p. 2)

A key priority for heads of evaluation was to “provide leadership and direction to the practice of evaluation in the department” by ensuring “strategically focused evaluation plans, working with managers to help them enhance the design, delivery and performance measurement of the organization’s policies, programs, and initiatives, and informing senior management and departmental players of any findings that indicate major concerns respecting the management or effectiveness of policies, programs or initiatives” (CEE, 2001, pp. 3–4). Likewise, departmental managers must “draw on the organization’s evaluation capacity … and ensure that they have reliable, timely, objective, and accessible information for decision-making and performance improvement” (CEE, 2001, p. 4). Interestingly, a TBS study conducted in April 2004 concluded that

the function has not lived up to the original policy expectations set out in 1997, measuring the effectiveness of policy and programs. In fact evaluations have resulted largely in the operational improvements to and monitoring of programs, rather than more fundamental changes. (CEE, 2004c, p. 3)

As such, the policy exhorted that “evaluation discipline should be used in synergy with other management tools to improve decision-making” (CEE, 2001, p. 2). However reasonable a goal, policy and practice were observed to be disconnected. The evaluation community was critical of the impact of the 2001 policy:

The environment of program evaluation practice today presents three key interrelated threats: the dominance of program monitoring, the lack of program evaluation self-identity, and insufficient connection with management needs.
Various events have propelled performance monitoring and short-term performance measurement to the front of the management scene; despite the intentions of the RMAF initiatives that aim to focus people on results, many managers are now content to possess a performance measurement framework that often focuses on the obvious outputs rather than providing a more in-depth assessment of program logic and performance to better understand why certain results are or are not observed. (Gauthier et al., 2004, p. 167)

This set of conclusions was supported by a Treasury Board study in 2004, which concluded that although the overall quality of evaluations had certainly improved, more work needed to be done with respect to producing high-quality effectiveness studies, connecting evaluation to performance and efficiency, and assisting strategic decisions (CEE, 2004c, 2005).

Management Accountability Framework

To provide more impetus to program performance support, the MCI would be formalized into a system of ideal management systems and practices with the creation of the Management Accountability Framework (MAF) in 2003. The intent of the MAF was “to develop a comprehensive system that would attempt to gauge and report on the quality of management of departments and agencies, and encourage improvement every year” (Lindquist, 2009, p. 51; OAG, 2002). MAF was an engineering spectacle designed to improve management performance in the areas of governance and strategic direction; public service values; policy and programs; people (HR management); citizen-focused service; risk management; stewardship; accountability, results, and performance; and learning, innovation, and change management (CEE, 2003). The role of evaluation was to support the achievement of program performance (CEE, 2004a, p. 4).

Federal Accountability Act 2006

The Federal Accountability Act was enacted on 12 December 2006. Prime Minister Stephen Harper identified 13 areas in his Action Plan that this legislation was to address for the purpose of “rebuilding the confidence and trust of Canadians” (Canada, 2006b, p. 1). One priority called for strengthening the accountability of departments by clarifying the responsibilities of deputy heads and bolstering internal audit units (Canada, 2006a). Section 16.1 of the Act makes “the deputy head or chief executive officer of a department responsible for ensuring an internal audit capacity appropriate to the needs of the department” (Canada, 2006c, p. 188). Section 16.4 of the Act designates the deputy head “the accounting officer of a department … accountable before the appropriate committees of the Senate and the House of Commons” (Canada, 2006c, p. 189) for such matters as “(a) the measures taken to organize the resources of the department to deliver departmental programs in compliance with government policies and procedures; (b) the measures taken to maintain effective systems of internal control” (Canada, 2006c, p. 189). Section 16.2 of the Act also required deputy heads to establish internal audit committees comprising external members to provide some external “functional oversight” over internal audit and over those internal systems requiring management attention under the MAF (TBS, 2003, 2006).
The combination of the MAF, the accounting officer provisions, the audit committee directives, and the new comptrollership reinforces that management of the department is “job 1” for deputy heads, as opposed to their traditional responsibilities mainly for policy development (Shepherd, 2011). The role of evaluation in this schema is significant. Although not referred to directly in the Federal Accountability Act, its role in the oversight functions of the department is fundamental, as it supports the accounting officer responsibilities of the deputy head. These are spelled out further in the new 2009 Evaluation Policy.

Expenditure Management System

Amendments to the Expenditure Management System (EMS), established in 2007, are responsible for driving much of the discussion of reforms in the 2009 Evaluation Policy. The EMS is built on three pillars: Managing for Results (benchmarking and evaluating programs and demonstrating results); Up-Front Discipline (providing critical information for Cabinet decision-making by ensuring all funding proposals have clear measures of success); and Ongoing Assessment (reviewing all direct program spending on an ongoing basis to ensure program efficiency and effectiveness) (TBS, 2007, pp. 1–2).

With respect to managing for results, evaluation is expected to support the Management, Resources and Results Structure (MRRS) Policy established in 2005, contributing toward a consistent government-wide approach to the collection, management, and reporting of financial and non-financial information on program objectives, performance, and results. As such, the evaluation policy was identified as a critical element in supporting improved reporting to Parliament. The TBS published the Performance Reporting: Good Practices Handbook in August 2007, which provided guidance on ways to produce effective reporting using the MAF. Reporting would be based on departmental Program Activity Architectures (PAA) that distinguish departmental from whole-of-government reporting. The PAA was intended to serve as the basis for all parliamentary reporting through departmental performance reports, essentially self-report cards on the plans and priorities (Report on Plans and Priorities) set at the beginning of each fiscal year. The idea appeared sound: set out a one-year plan at the beginning of the year and then report, at the end of the year, on how the organization fared against those plans. Driven by the TBS, departments were to inventory their program activities and validate whether they were contributing to their strategic outcomes. These PAAs were regarded as a departmental “logic model” against which results would be assessed.

The main challenge for the use of PAAs, however, was that while departments were asked to frame their plans and priorities against strategic objectives and results, funding from the centre continued to flow through individual programs. In essence, the challenge for evaluation was to assess individual programs—but against a mechanism interested in policy-level “strategic objectives.” The engineering was akin to asking evaluators to examine the performance of an automobile’s brakes when the driver is more interested in knowing whether the entire car is safe and reliable. The function continues to struggle with how to support corporate decisions about results when “programs” are the main lens.
With respect to up-front discipline, the TBS published a revised Guide to Preparing Treasury Board Submissions in July 2007. The purpose of the guide was to improve the quality of information in TB submissions by requiring departments to demonstrate the “linkages between policy, program and spending information by requiring that MRRS information be included” (TBS, 2007, p. 5). The guide also required that evaluation costs be separated from regular program costs—a key development that may serve to improve the independence of the function.

Finally, with respect to ongoing assessment, departments and agencies are required to carry out regular “Strategic Reviews.” These reviews are to be carried out every four years “to assess how and whether programs are aligned with priorities and core federal roles, whether they provide value-for-money, whether they are still relevant in meeting the needs of Canadians, and whether they are achieving results” (TBS, 2007, p. 5). Reviews are carried out according to the PAA framework using evaluations, audits, MAF assessments, and other sources as supporting evidence. The idea is to regularly align financial resources to government priorities by identifying high- and low-performing programs and assigning savings to new priorities. Overall, the idea behind the EMS was to carry out regular evaluation of programs and policies, followed by a financial alignment exercise every four years, essentially a formal institutionalization of the 1994 Program Review exercise.

The 2009 Evaluation Policy

The Requirements

The current Evaluation Policy took effect on 1 April 2009 and comprises three important elements: a policy (overall requirements), a directive (operational requirements), and a standard (minimum requirements for quality, neutrality, and utility). The principal objective of the policy “is to create a comprehensive and reliable base of evaluation evidence that is used to support policy and program improvement, expenditure management, Cabinet decision-making, and public reporting” (CEE, 2009b, p. 5.1). Under Section 6.1, Deputy Heads are now directly responsible for “establishing a robust, neutral evaluation function in their department” (CEE, 2009b).
Such responsibilities include

• The Head of Evaluation reports directly to the Deputy Head (6.1.1, 6.1.2);
• A departmental evaluation committee is established to advise the deputy head on all evaluation-related activities (6.1.3);
• Evaluation findings should be used to inform program, policy, resource allocation, and reallocation decisions (6.1.5);
• A rolling five-year departmental evaluation plan that aligns with the MRRS, supports the EMS (including Strategic Reviews), and includes all ongoing programs of grants and contributions is to be maintained and submitted to the TBS (6.1.7);
• Coverage: evaluation must include all direct program spending (excluding grants and contributions) every five years (6.1.8a); all ongoing grants and contributions programs must be evaluated every five years (6.1.8b); the administrative aspect of major statutory spending must be evaluated every five years (6.1.8c); and coverage must include programs set to terminate over a specified period of time (6.1.8d), specific programs requested by the Secretary of the Treasury Board in consultation with the deputy head (6.1.8e), and programs identified in the Government of Canada Evaluation Plan (6.1.8f);
• Ensure that ongoing performance measurement is implemented throughout the department in order to support the evaluation of programs (6.1.10); and
• The Secretary of the Treasury Board is responsible for functional leadership of the function, including monitoring the health of evaluation as a function (6.3.1a) and developing a government-wide Evaluation Plan (6.3.1b).

The Directive on the Evaluation Function sets out the responsibilities of the Head of Evaluation, including developing a rolling five-year evaluation plan (CEE, 2009a, p. 6.1.3a); ensuring the alignment of the plan, as described in the policy, with the MRRS (6.1.3b.i), the EMS (6.1.3b.ii), and appropriate coverage (6.1.3b.iii–viii); identifying and recommending to the deputy head and evaluation committee a risk-based approach for determining the evaluation approach and level of effort to be applied to individual evaluations (6.1.3c); submitting and implementing the evaluation plan annually (6.1.3d, e); and “ensuring that all evaluations that are intended to count toward the coverage requirements of subsections ‘a,’ ‘b,’ or ‘c’ of subsection 6.1.8 of the Policy on Evaluation, include clear and valid conclusions about the relevance and performance of programs” (6.1.3f).6 This final subsection in particular makes it clear that Heads of Evaluation must carry out the studies prescribed by the policy, and that other studies desired by the deputy head or evaluation committee are considered only if adequate resources remain in the budget.

The Directive also identifies program managers (as defined in the Evaluation Policy) as responsible for implementing and monitoring ongoing performance measurement strategies, and for ensuring that credible and reliable performance data are being collected to effectively support evaluation (CEE, 2009a, p. 6.2.1). A key requirement for program managers is also to develop and implement management responses and action plans for evaluation reports (6.2.2). This is a sound development in the Evaluation Policy that builds on similar requirements in the federal Audit Policy.

An issue of significance in the Evaluation Policy is the reduction in the number of core issue areas to be studied in evaluations, from four to two.
Specifically, the policy identifies relevance and performance of programs as the main concerns, as opposed to the 2001 policy issues of rationale/relevance, design/delivery, success/impacts, and cost-effectiveness/alternatives. Although many of these are covered under the new questions, there remains some frustration in the evaluation community about the specific focus on central agency concerns rather than on departmental ones attached to these value criteria. The specific value questions are summarized accordingly:

Relevance

Issue 1: Continued need for the program: Assessment of the extent to which the program continues to address a demonstrable need and is responsive to the needs of Canadians.

Issue 2: Alignment with government priorities: Assessment of the linkages between program objectives and (a) federal government priorities and (b) departmental strategic objectives.

Issue 3: Alignment with federal roles and responsibilities: Assessment of the role and responsibilities for the federal government in delivering the program.

Performance

Issue 4: Achievement of expected outcomes: Assessment of progress toward specified outcomes (including immediate, intermediate, and ultimate outcomes) with reference to performance targets, program reach, and program design, including the linkage and contribution of outputs to outcomes.

Issue 5: Demonstration of efficiency and economy: Assessment of resource utilization in relation to the production of outputs and progress toward expected outcomes.

There are several moving parts to the policy. There is a cacophony of supporting and ancillary policies relating to oversight and accountability, management, stewardship, planning and budgeting, and control. The final section attempts to make sense of the current requirements by focusing on some aspects that are positive additions, and on others that could further challenge the function’s ability to improve in the future.

A POLICY ASSESSMENT: INCREMENTAL IMPROVEMENT OR MORE EROSION?

In the fall of 2009, the Auditor General, Sheila Fraser, included a chapter titled “Evaluating the Effectiveness of Programs” in her annual report, which concluded that “departmental evaluations covered a relatively low proportion of its program expenses—between five and thirteen percent annually across the six departments [and that] the audited departments do not regularly identify and address weaknesses in effectiveness evaluation” (OAG, 2009, pp. 2, 11). Clearly, as this refrain has been repeated since at least 1978, one must ask what evaluation units have been doing all this time. As posited, either evaluators have not been providing a product that decision-makers want, or they have not been able to deliver on their commitments. Although the 2009 audit focused on evaluation coverage, it recognized that departmental capacity for evaluation has traditionally been hampered by shortages of evaluators, the addition of extra responsibilities placed on units, and general workload overload that requires the use of contractors (Cathexis Consulting Inc., 2010; OAG, 2009). Many of these observations were also identified at a meeting of the CES National Capital Chapter (Canadian Evaluation Society, 2009). The evaluation community is concerned about the changes in the evaluation policy and how departmental units will cope. In particular, members were deeply concerned about the level of support they receive from TBS to assist them with their responsibilities and expectations (Canadian Evaluation Society, 2009).
This section concludes with a few brief thoughts on the direction of the new policy. In particular, I am interested in whether it is likely to move the function back to its traditional roots, concerned with determining policy and program effect, which in my view would make the function more relevant, or whether it locks the function onto a critical path of supporting fiscal prudence that could lead to its continued decline.

Issues of Concern Moving Forward

Based on the several studies to date, the following issues appear to capture the challenges facing Canadian federal evaluators and evaluation units. There have been some shifts in policy emphasis from the 2001 version. For example, the TBS is attempting to harmonize some of the policy engineering around oversight, budgeting, planning, and performance measurement in hopes of making the function relevant by connecting its products and services to these areas more effectively. The challenge is that even if this were possible, it will take time to acclimatize the evaluation community and harmonize information and reporting systems. Some high-level challenges are

• the appropriate evaluand and target of evaluation products;
• the “governance” of the Policy on Evaluation;
• the appropriate areas of input for evaluation products to corporate decisions, including operational cycles (strategic reviews and evaluation plans); and
• the focus of the function (issues and questions) and coverage.

Appropriateness of the Evaluand and Target of Evaluation Products

Refocusing evaluation on strategic decision-making has been argued for throughout the function’s evolution (Gauthier et al., 2004; Mayne, 1986, 2001; Prieur, 2011). Indeed, evaluation is well placed to ask the larger questions of program, initiative, strategy, and policy effectiveness. That is, are departments (and indeed government as a whole) doing the right things in a way that addresses real public policy problems? In this respect, evaluation in Canada must effectively balance two principles: to provide objective and useful findings, conclusions, and recommendations relating to programs; and to support Parliament’s efforts to serve the public good, the principal target of public policy. Evaluation was centralized in 1977 with this purpose in mind, but it has gradually moved away from it. To be relevant, evaluation has to tackle the big questions that perplex departmental and government-wide decision-makers. In addition, it has to do better at examining not simply policies and programs but the instruments of policy as well, including regulation, exhortative instruments, guidelines, partnerships, contracts, and hybrid tools and processes. These are key contributors to effective policies and programs. The idea is not to evaluate the extent to which program results can be attributed to these instruments, but to understand whether these instruments, in combination with relevant policies and programs, contribute to resolving specified public problems (Mayne, 2001, 2008; McDavid & Huse, 2006). As shown, the evaluation function has evolved with a predisposition to examine small evaluands (or “small-p” programs), leading one to conclude that evaluation has not done well at giving deputy heads the information they need at the strategic level.
This observation was borne out in a recent study that consulted deputy heads on their evaluation functions, noting that performance measurement and evaluation “are not currently providing deputy heads with a complete picture of organization-wide performance” (Lahey, 2011, p. 4.4). Targeting errors persist: although these tools are adept at helping deputy heads manage programs “individually,” little effort is being made to turn evaluation into a strategic decision-making tool (Lahey, 2011, p. 4.4). This predisposition has limited the ability of senior departmental decision-makers to make judgements about departmental effectiveness in meeting their strategic or corporate public policy objectives. By extension, it is a reasonable conclusion that evaluation continues to support program managers (those who manage individual programs), and not strategic decision-makers, including deputy heads. Although program managers are an appropriate user of evaluation, the policy suggests implicitly that they are its principal “clients.” This remains a problem: evaluation must be more strategic than a focus on program processes allows. The question that must always be asked is whether public problems are being resolved, and how programs can be improved or replaced to achieve expected results (Peters, Baggett, Gonzales, DeCotis, & Bronfman, 2007). Focusing on the needs of program managers suggests that evaluation is beholden mainly to program units: programs are interested mainly in program operations, not usually government priorities or perspectives. This is not to suggest that evaluation ought to ignore such needs, but that they are better aligned with strategic results—the intent of the MRRS.

On the positive side, it seems reasonable that as deputy heads are required to identify savings in the Strategic Review process, gaps will be observed in the usefulness of evaluation information. If this argument bears out in reality, then it is reasonable to assume that deputy heads may become more involved in setting priorities for evaluation and more active in the evaluation planning process than perhaps they were in the past. The new policy attempts to correct these targeting errors by making Heads of Evaluation report directly to deputy heads. The implicit assumption is that both offices will coordinate their planning efforts so that the basket of programs being evaluated, and their scope, will be negotiated and coordinated with other oversight activities, such as audit. The challenge, however, is that evaluation is regarded under the policy as another deputy head responsibility among other oversight and stewardship responsibilities (Lahey, 2011, p. 7). The inclination is to consider evaluation another “ticky-box” exercise rather than to orient it toward strategic purposes.

The Governance of the Evaluation Policy

Although there have been some benefits to a standardized and centralized evaluation function, there have also been costs. An important benefit is that evaluation has contributed in some limited ways to understanding departmental fiscal program performance. The downside is that centralization stunts program creativity at the departmental level by, again, stressing central agency concerns for fiscal prudence over program effect. As such, evaluation is caught between serving two masters—the department and the centre. Each of these has quite different demands, which has contributed to some confusion in the function.
For departments, evaluation is best suited to assessing program effectiveness, whereas the centre is more interested in accountability and fiscal prudence. Evaluation units have been attempting to serve both with equal rigour, and this is not working out very well, as central agency concerns generally win out in this equation. If, indeed, stewardship of departmental resources is the primary responsibility of deputy heads, then it stands to reason that internal preferences for oversight, including evaluation, ought to be driven by deputy heads. The role of central agencies, including the Treasury Board, is to support, not dictate, matters of what to cover and how. Alternatively, if the objective of evaluation is to support government-wide accountability as the principal driver, then a strong case can be made to centralize evaluation, much like the audit function, and remove it from departmental control altogether. Serving both departmental and government-wide needs and objectives has not worked, especially given the accountability and resource-use focus of the evaluation criteria. A strong case can be made that if fiscal prudence is the driving factor behind evaluation, as is currently the case, then removing the function from departments is a viable option. This would leave departments the flexibility needed to evaluate their own priorities. In other words, evaluation activities could be divided between a corporate evaluation function and departmental functions. This would allow central agencies to focus on concerns of accountability, and departments to build capacity and carry out rigorous outcomes-based studies using questions that make sense in particular contexts.

Such observations were reinforced by deputy heads, who were concerned about the “one-size-fits-all” approach of the policy (Lahey, 2011, p. 4.2iv). They expressed a desire for more flexibility in the design, scoping, and conduct of evaluations, especially with respect to large versus small programs, and low- versus high-risk programs. In addition, the policy requires that all evaluation questions receive equal attention, when perhaps other questions that fall outside the TBS requirements are more pertinent. The fact that questions preferred by senior decision-makers are not given equal weight, attention, or credit raised some frustrations (Lahey, 2011, p. 4.2iv). In this respect, the addition of evaluation committees may be a positive innovation in the sense that appropriate scoping advice can be brought to bear on the design of evaluation projects. However, central agency requirements would appear to limit committee responsibilities mainly to checking departmental work, rather than assisting with appropriate scoping and other advice, which could be regarded as an added expense on individual projects. Another important conclusion is that as long as the TBS questions take precedence, there is limited opportunity for internal evaluators to learn from the studies, given the constant churn demanded under the coverage requirements.

Appropriate Areas of Input for Evaluation Products

If one accepts that the appropriate target of evaluation products is senior departmental and agency management, then it follows that evaluation products should support mainly departmental policy planning, budgeting, and feedback systems.
The most appropriate use of evaluation would be for senior management, in concert with heads of evaluation, to plan evaluations around a rotating cycle of examining designated strategic objectives in the PAA. Such plans would take into consideration senior management’s policy and planning concerns rather than the current focus on program-level expenditures. This would allow for a systematic but also strategic assessment of all departmental programs, thereby realigning evaluation with departmental priorities, not individual program managers’ needs. One must understand program-level concerns for operations, but not at the expense of effective departmental planning and policy decision-making. That being said, the PAA vehicle would have to be amended to include “expected results” that flow from each of the subactivities. At present, the PAA does not work: it identifies activities; evaluation examines results—the engineering is wrong. Such a planning cycle would facilitate more rational planning regarding ways to improve the effectiveness of a combination of programs toward understood policy goals, assuming that expected results have been defined appropriately.

With regard to Strategic Reviews, deputies noted new life being breathed into the evaluation function. However, they also noted that higher expectations of evaluation are inevitable insofar as credible information on fiscal prudence can be gleaned to support such reviews. As long as evaluation is regarded as a key input to these reviews, fiscal prudence becomes the overriding consideration. It is appropriate to use strategic reviews to calibrate and stunt the growth in A-base budgets by focusing on the identification of spending priorities. This may be another argument for a corporate evaluation function to support these particular purposes.

Focus of the Function and Coverage

It is not unreasonable for citizens to expect that evaluation will focus on whether government interventions into public problems are actually working, and that all such interventions are examined on a systematic basis. The problem resides in the emphasis that one places on accountability for spending versus actually resolving problems. With respect to the application of the evaluation questions, relevance (alignment with departmental and government-wide priorities, constitutionality) and performance (impact assessment) are valid areas of investigation. The challenge with respect to relevance is that the current focus is on finding ways to reduce federal expenditures by shifting program responsibilities to other jurisdictions. This serves federal accountability purposes, but does little to help understand the best ways to resolve pressing public problems that could involve a federal presence.

With respect to performance, current evaluation methodologies tend to favour the use of information provided by the program or program recipients. If evaluation efficacy is to improve, then alternative research methods and information sources will be needed to ensure attribution, and information validity and reliability. The principal assumption under the evaluation policy is to use multiple sources of evidence (CEE, 2009c, p. 6.2.2). This assumption should be revisited to the extent that valid quantitative and qualitative research methods can support findings without a disproportionate reliance on embedded sources of program data.
As long as this problem persists, it will be difficult to evaluate programs in a way that speaks to the appropriateness of the program’s theory of change, rather than simply to the management of program operations.

Coverage is a major concern under the new policy. This is not a new stipulation, but there are indications that the centre will place more emphasis on enforcement. There is some evidence to suggest that departments and agencies have begun to “cluster” like programs to achieve coverage. This is unlikely to work well, as the resulting reports will more than likely yield superficial findings, given the fiscal and accountability nature of the centralized TBS questions. Equally important with respect to coverage, there is very little strategic value in the 100% coverage requirement. Although the public can be assured that all spending is reviewed on a five-year cycle, there is little to suggest that, without a serious increase in evaluation resources, much will be contributed by way of understanding the actual value of these programs in the resolution of public policy problems. That is, evaluating everything means essentially evaluating nothing under this schema—all programs are subject to fiscal review, rather than to a strategic assessment of departmental action.

Perhaps the greater concern regarding coverage is that it reinforces an already short-term public policy focus. Programs are seen to have lifecycles of five years with demands for immediate results, rather than the long-term focus required in many cases, such as climate change. Aside from concerns of policy value, the coverage requirements are simply unsustainable, especially in times of fiscal restraint and limited numbers of trained evaluators to do the work (Lahey, 2011, p. 4.2). Few departments will be able to fulfill the requirement, and even if they can, it is questionable what value this will create for the department other than satisfying accountability concerns. The fact of the matter is that there is a confluence of accountability processes, including requirements generated under the strategic review process, the Transfer Payments Policy, the Audit Policy, the MAF, and hyper-partisan parliamentary committees. The logical question is whether any value is being generated through all of these oversight mechanisms for the increasing resources being injected into them. If there were some coordination of effort to actually learn from the products being generated, deputy heads might consider them more than ticky-box exercises or hoops to jump through in order to satisfy their performance accords.

SOME CLOSING THOUGHTS

The federal stimulus package has brought on unprecedented spending, amounting to a deficit of more than $50 billion in 2009/10. This is combined with the Web of Rules Action Plan, which in 2008/09 aimed to reduce the reporting requirements of Treasury Board policies by 25%, online human resources reporting across government by 85%, and Management Accountability Framework assessments by 50%. Such measures are aimed at improving federal fiscal performance, reducing inefficiency, protecting against key risks, and preserving accountability. At the same time, the role of audit is being bolstered, reinforced, and positioned as the front-line defence against waste. Where is the concern for program effectiveness?
Effective feedback on overall government effectiveness is being pushed aside by an obsessive predilection among parliamentarians for fiscal prudence. This has driven oversight mechanisms such as evaluation mainly toward concerns of accountability over effect. One cannot help but conclude that the federal government is diminishing its evaluation function to that of just another auditor, and could even be encouraging its swift demise at a time when the US and other governments are stepping up their commitment to effectiveness evaluation. My fear is that the stars are aligning in a way that puts evaluation at a disadvantage. The obsessive focus on austerity can only lead in one direction—governments more concerned with reducing spending than with finding creative solutions to public problems.

This article has made the argument that federal evaluation must refocus its efforts and return to the thinking that inspired PPBS and its concern for better aligning programs with strategic solutions to public problems. The MRRS is one vehicle that reminds departments, and indeed elected officials, that the principal aim of government spending is to support citizen demands for action on wicked problems. Although it is laudable that evaluation ought to support these purposes, it is fully realized that the appetite of elected officials for evidence-based concerns about the effectiveness of public resources is highly limited. This has always been the case, and the point could not be made more clearly than by the circumstances that inspired the Gomery Inquiry or by recent audits into stimulus spending. Indeed, no elected official wants to be questioned on their policy choices, nor on the means they bring to bear on resolving public problems. However, this does not mean that public servants should not aspire to a culture of inquiry into policy and program relevance, effective public action, and realistic and coordinated solutions.

The current public engineering for evaluation is predisposed to understanding grants and contributions programs. However, governments are increasingly depending on other instruments for intervention, including regulation, exhortation, and self-guided markets with government guidelines. At present, evaluation focuses very little on these main instruments for governing. As long as the predisposition to a grants and contributions approach persists, there can be no progress toward assessing government policy performance rather than individual program performance.

With the attention that federal evaluation is currently enjoying, opportunities to improve are being squandered on illogical concerns mainly for accountability at the expense of government-wide learning. Although austerity and fiscal prudence may be considered by some to be appropriate goals for evaluation to support, the fact is that these are short-sighted. Coverage, centrally driven value criteria, inflexible planning and scoping requirements, and hyper-accountability concerns limit what evaluation can do. A more supportive central agency focus on building well-trained and competent evaluators who focus on the right things in the right way is the key to moving forward. Remaining on the path of enforcement, while expedient in the short term, does not contribute to a sustainable and relevant evaluation function in the long term. Centralized federal evaluation has been around since 1977, and the concerns remain.
It is time for the community to come together and resolve this dynamic, or face an uncertain future.

Notes

1. With respect to ensuring accountability, Good (2003, p. 166) argues there are three main reasons for accountability within the Canadian model: control, assurance, and learning.

2. Henderson (1984) speaks about his often raucous relationship with Prime Ministers Pearson and Trudeau, describing how, for the first time, an Auditor General was using his annual report to embarrass the government. More importantly, he describes his interest in social science research methods to ascertain program effectiveness.

3. For example, Prime Minister Pierre Trudeau had introduced Bill C-190 in 1969 to limit the responsibilities of the Auditor General. The bill was dropped in response to media pressure.

4. Section 7(2)(e) states that the Auditor General under subsection (1) shall call attention to anything that he considers to be of significance and of a nature that should be brought to the attention of the House of Commons, including any cases in which he has observed that “satisfactory procedures have not been established to measure and report the effectiveness of programs, where such procedures could appropriately and reasonably be implemented” (Canada, 1977).

5. The responsibilities of the OCG were diffused among departments, Supply and Services, and the Treasury Board Secretariat between 1969 and 1978.

6. A risk-based approach is a method for considering risk when planning the extent of evaluation coverage of direct program spending pending full implementation of section 6.1.8 of the Policy. Risk criteria may include the size of the population(s) affected by non-performance of their individual programs, the probability of non-performance, the severity of the consequences that could result, the materiality of their individual programs, and their importance to Canadians (CEE, 2009b, Annex A).

REFERENCES

Alkin, M. (2004). Evaluation roots: Tracing theorists’ views and influences. Thousand Oaks, CA: Sage.

Alkin, M. (2012). Evaluation roots: A wider perspective of theorists’ views and influences. Thousand Oaks, CA: Sage.

Aucoin, P. (1995). The new public management: Canada in comparative perspective. Montreal, QC: IRPP.

Aucoin, P. (2005). Decision-making in government: The role of program evaluation (Discussion Paper). Ottawa, ON: Centre for Excellence for Evaluation, Treasury Board Secretariat.

Boruch, R., McSweeney, A., & Soderstrom, E. (1978). Randomized field experiments for program planning, development and evaluation. Evaluation Quarterly, 2(4), 655–695.

Campbell, D. (1957). Factors relevant to the validity of experiments in social settings. Psychological Bulletin, 54, 297–312.

Campbell, D. (1975). Assessing the impact of planned social change. In G. M. Lyons (Ed.), Social research and public policies (pp. 3–45). Hanover, NH: Dartmouth College, Public Affairs Center.

Canada. (1962). Royal Commission on Government Organization (Glassco Commission). Ottawa, ON: Supply and Services.

Canada. (1976). Royal Commission on Financial Management and Accountability, Final Report (Lambert Commission). Ottawa, ON: Supply and Services.

Canada. (1977). Auditor General Act 1977. Ottawa, ON: Justice Canada.

Canada. (1990). Public service 2000: The renewal of the public service of Canada. Ottawa, ON: Supply and Services.

Canada. (1997a). Report of the independent review panel on the modernization of comptrollership in the government of Canada. Ottawa, ON: Supply and Services.

Canada. (1997b). Results for Canadians: A management framework for the government of Canada. Ottawa, ON: Supply and Services.

Canada. (2006a). Accountability Act and Action Plan. Ottawa, ON: Supply and Services.

Canada. (2006b). Federal Accountability Act and Action Plan [brochure]. Ottawa, ON: Supply and Services.

Canada. (2006c). Statutes of Canada: An Act providing for conflict of interest rules, restrictions on election financing and measures respecting administrative transparency, oversight and accountability (Federal Accountability Act). Ottawa, ON: Parliament of Canada.

Canadian Evaluation Society. (2009). As was said, session report: Professional day 2009. Ottawa, ON: CES National Capital Chapter. Retrieved from www.evaluationcanada.ca

Cathexis Consulting Inc. (2010). Evaluator compensation: Survey findings. Ottawa, ON: Author. Retrieved from www.cathexisconsulting.ca

Centre for Excellence for Evaluation, Treasury Board Secretariat. (2001). Policy on evaluation 2001. Ottawa, ON: Author.

Centre for Excellence for Evaluation, Treasury Board Secretariat. (2003). Interim evaluation of the Treasury Board evaluation policy. Ottawa, ON: Author.

Centre for Excellence for Evaluation, Treasury Board Secretariat. (2004a). Evaluation function in the government of Canada. Ottawa, ON: Author.

Centre for Excellence for Evaluation, Treasury Board Secretariat. (2004b). Review of the quality of evaluation across departments and agencies. Ottawa, ON: Author.

Centre for Excellence for Evaluation, Treasury Board Secretariat. (2004c). Study of the evaluation function in the federal government. Ottawa, ON: Author.

Centre for Excellence for Evaluation, Treasury Board Secretariat. (2005). Case studies on the uses and drivers of effective evaluation in the government of Canada. Ottawa, ON: Author.

Centre for Excellence for Evaluation, Treasury Board Secretariat. (2009a). Directive on the evaluation function. Ottawa, ON: Author.

Centre for Excellence for Evaluation, Treasury Board Secretariat. (2009b). Policy on evaluation. Ottawa, ON: Author.

Centre for Excellence for Evaluation, Treasury Board Secretariat. (2009c). Standard on evaluation for the government of Canada. Ottawa, ON: Author.

Dobell, R., & Zussman, D. (1981). An evaluation system for government: If politics is theatre, then evaluation is (mostly) art. Canadian Public Administration, 24(3), 404–427.

Foote, R. (1986). The case for a centralized program evaluation function within the government of Canada. Canadian Journal of Program Evaluation, 1(2), 89–95.

Furubo, J. E., Rist, R., & Sandahl, R. (Eds.). (2002). International atlas of evaluation. New Brunswick, NJ: Transaction Press.

Gauthier, B., Barrington, G., Bozzo, S. L., Chaytor, K., Cullen, J., Lahey, R., … Roy, S. (2004). The lay of the land: Evaluation practice in Canada today. Canadian Journal of Program Evaluation, 19(1), 143–178.

Good, D. (2003). The politics of public management: The HRDC audit of grants and contributions. Toronto, ON: University of Toronto Press.

Gow, I. (2001). Accountability, rationality, and new structures of governance: Making room for political rationality. Canadian Journal of Program Evaluation, 16(2), 55–70.

Henderson, M. (1984). Plain talk: Memoirs of an auditor general. Toronto, ON: McClelland & Stewart.

Jordan, J. M., & Sutherland, S. L. (1979). Assessing the results of public expenditure: Program evaluation in the Canadian federal government. Canadian Public Administration, 22(4), 581–609.

Lahey, R. (2010). The Canadian M&E system: Lessons learned from 30 years of development (World Bank ECD Working Paper Series No. 23). Washington, DC: World Bank. Retrieved from www.worldbank.org/ieg/ecd

Lahey, R. (2011). Deputy head consultations on the evaluation function. Ottawa, ON: Centre for Excellence for Evaluation, Treasury Board Secretariat.

Library of Parliament, Canada. (2003). Modern comptrollership (PRB-0313E). Ottawa, ON: Author.

Lindquist, E. (2009). How Ottawa assesses departmental/agency performance: Treasury Board’s management accountability framework. In A. Maslove (Ed.), How Ottawa spends: Economic upheaval and political dysfunction (pp. 47–88). Montreal, QC: McGill-Queen’s University Press.

Maxwell, N. (1986). Linking ongoing performance measurement and program evaluation in the Canadian federal government. Canadian Journal of Program Evaluation, 1(2), 39–44.

Mayne, J. (1986). In defense of program evaluation. Canadian Journal of Program Evaluation, 1(2), 97–102.

Mayne, J. (2001). Addressing attribution through contribution analysis: Using performance measures sensibly. Canadian Journal of Program Evaluation, 16(1), 1–24.

Mayne, J. (2006). Audit and evaluation in public management: Challenges, reforms, and different roles. Canadian Journal of Program Evaluation, 21(1), 11–45.

Mayne, J. (2008). Contribution analysis: An approach to exploring cause and effect. Institutional Learning and Change, 16(May), 1–4.

McDavid, J. C., & Huse, I. (2006). Will evaluation prosper in the future? Canadian Journal of Program Evaluation, 21(3), 47–72.

McKinney, J., & Howard, L. (1998). Public administration: Balancing power and accountability (2nd ed.). Westport, CT: Praeger.

Muller-Clemm, W., & Barnes, M. P. (1997). A historical perspective on federal program evaluation in Canada. Canadian Journal of Program Evaluation, 12(1), 47–70.

Office of the Auditor General of Canada. (1975). Report of the Auditor General of Canada 1975. Ottawa, ON: Supply and Services.

Office of the Auditor General of Canada. (1976). Report of the Auditor General of Canada 1976. Ottawa, ON: Supply and Services.

Office of the Auditor General of Canada. (1983). Program evaluation (Chapter 3). In Report of the Auditor General of Canada 1983. Ottawa, ON: Supply and Services.

Office of the Auditor General of Canada. (1993a). Matters of special importance and interest (Chapter 1). In Report of the Auditor General of Canada 1993. Ottawa, ON: Supply and Services.

Office of the Auditor General of Canada. (1993b). Program evaluation in departments: The operation of program evaluation units (Chapter 9). In Report of the Auditor General of Canada 1993. Ottawa, ON: Supply and Services.

Office of the Auditor General of Canada. (1993c). Program evaluation in the federal government: The case for program evaluation (Chapter 8). In Report of the Auditor General of Canada 1993. Ottawa, ON: Supply and Services.

Office of the Auditor General of Canada. (1993d). The program evaluation system – Making it work (Chapter 10). In Report of the Auditor General of Canada 1993. Ottawa, ON: Supply and Services.

Office of the Auditor General of Canada. (1996). Evaluation in the federal government (Chapter 3). In Report of the Auditor General of Canada 1996. Ottawa, ON: Supply and Services.

Office of the Auditor General of Canada. (2002). Financial management and control in the government of Canada (Chapter 5). In Report of the Auditor General of Canada 2002. Ottawa, ON: Supply and Services.

Office of the Auditor General of Canada. (2009). Evaluating the effectiveness of programs (Chapter 1). In Report of the Auditor General of Canada 2009. Ottawa, ON: Supply and Services.

Office of the Comptroller General of Canada. (1979). Internal audit and program evaluation in the government of Canada. Ottawa, ON: Supply and Services.

Office of the Comptroller General of Canada. (1991). Into the 1990s: Government program evaluation perspectives. Ottawa, ON: Author.

Osbaldeston, G. (1989). Keeping deputy ministers accountable. Toronto, ON: McGraw-Hill Ryerson.

Osborne, D., & Gaebler, T. (1992). Reinventing government: How the entrepreneurial spirit is transforming the public sector. Reading, MA: Addison-Wesley.

Patton, M. Q. (1978). Utilization-focused evaluation: The new century text (3rd ed.). Thousand Oaks, CA: Sage.

Peters, J., Baggett, S., Gonzales, P., DeCotis, P., & Bronfman, B. (2007). How organizations implement evaluation results. Proceedings of the 2007 International Energy Program Evaluation Conference, Chicago, IL, 35–47.

Pollitt, C. (2000). How do we know how good public services are? In B. G. Peters & D. Savoie (Eds.), Governance in the 21st century: Revitalizing the public service (pp. 119–254). Montreal, QC: McGill-Queen’s University Press.

Pollitt, C., & Bouckaert, G. (2004). Public management reform: A comparative analysis. New York, NY: Oxford University Press.

Prieur, P. (2011). Evaluating government policy. Draft paper.

Raynor, M. (1986). Using evaluation in the federal government. Canadian Journal of Program Evaluation, 1(2), 1–10.

Rist, R. (1990). Program evaluation and the management of government. New Brunswick, NJ: Transaction Press.

Rossi, P., & Freeman, H. (1985). Evaluation: A systematic approach (3rd ed.). Beverly Hills, CA: Sage.

Rossi, P. H., Lipsey, M., & Freeman, H. (2004). Evaluation: A systematic approach (7th ed.). Thousand Oaks, CA: Sage.

Rutman, L. (1986). Some thoughts on federal level evaluation. Canadian Journal of Program Evaluation, 1(2), 19–27.

Saint-Martin, D. (2004). Managerialist advocate or ‘control freak’? The Janus-faced Office of the Auditor General. Canadian Public Administration, 47(2), 121–140.

Savoie, D. (1994). Thatcher, Reagan, Mulroney: In search of a new bureaucracy. Toronto, ON: University of Toronto Press.

Scriven, M. (1974). Evaluating program effectiveness or, if the program is competency-based, how come the evaluation is costing so much? ERIC Document No. SP008 235 (ED 093866).

Scriven, M. (1978). Merit vs. value. Evaluation News, 20–29.

Segsworth, R. V. (1990). Auditing and evaluation in the government of Canada: Some reflections. Canadian Journal of Program Evaluation, 5(1), 41–56.

Shepherd, R. (2011). Departmental audit committees and governance: Improving scrutiny or allaying public perceptions of poor management? Canadian Public Administration, 54(2), 277–304.

Stake, R. (2003). Standards-based and responsive evaluation. Thousand Oaks, CA: Sage.

Stokey, E., & Zeckhauser, R. (1978). A primer for policy analysis. New York, NY: W.W. Norton.

Stufflebeam, D. (1983). The CIPP model for program evaluation. In G. F. Madaus, M. S. Scriven, & D. Stufflebeam (Eds.), Evaluation models: Viewpoints in educational and human services evaluation (pp. 117–141). Boston, MA: Kluwer-Nijhoff.

Sutherland, S. L. (1986). The politics of audit: The Federal Office of the Auditor General in comparative perspective. Canadian Public Administration, 29(1), 118–148.

Sutherland, S. L. (1990). The evolution of program budget ideas in Canada: Does Parliament benefit from estimates reform? Canadian Public Administration, 33(2), 133–164.

Treasury Board Secretariat. (1969). Planning, programming and budgeting guide of the government of Canada. Ottawa, ON: Author.

Treasury Board Secretariat. (1976). Measurement of the performance of government operations (Circular 1976-25). Ottawa, ON: Author.

Treasury Board Secretariat. (1977). Evaluation of programs by departments and agencies (Circular 1977-47). Ottawa, ON: Author.

Treasury Board Secretariat. (1982). Circular 1982-8. Ottawa, ON: Author.

Treasury Board Secretariat. (1993). Linkages between audit and evaluation in federal departments. Ottawa, ON: Author.

Treasury Board Secretariat. (1994). Manual on review, audit and evaluation. Ottawa, ON: Author.

Treasury Board Secretariat. (2000a). Results for Canadians: A management framework for the government of Canada. Ottawa, ON: Author.

Treasury Board Secretariat. (2000b). Study of the evaluation function. Ottawa, ON: Author.

Treasury Board Secretariat. (2003). Management accountability framework. Ottawa, ON: Author. Retrieved from www.tbs-sct.gc.ca/maf-crg_e.asp

Treasury Board Secretariat. (2006). Directive on departmental audit committees. Ottawa, ON: Author.

Treasury Board Secretariat. (2007). Government response to the fourth report of the Standing Committee on Public Accounts: The expenditure management system at the Government Centre and the expenditure management system in departments. Ottawa, ON: Author.

Tyler, R. (1942). General statement on evaluation. Journal of Educational Research, 35, 492–501.

Walker, R. (2003, November 30). The guts of a new machine. The New York Times. Retrieved from http://www.nytimes.com/2003/11/30/magazine/30IPOD.html?ex=1386133200&en=750c9021e58923d5&ei=5007&partner=USERL

Weiss, C. (1972). Evaluation research: Methods of assessing program effectiveness. Englewood Cliffs, NJ: Prentice Hall.

Zalinger, D. (1987). Contracting for program evaluation resources. Canadian Journal of Program Evaluation, 2(2), 85–87.

Zussman, D. (2010). What ever happened to program evaluation? IT in Canada. Retrieved from www.itincanada.ca

Robert Shepherd is Associate Professor at the School of Public Policy and Administration, Carleton University. He assumed the role of supervisor for the Diploma in Policy and Program Evaluation in 2011. In 1986, he co-founded a management consulting firm in Ottawa focusing on public management and program evaluation research. Between 1986 and 2007, he also served in various capacities within government, taking on short-term assignments in departments including the Canada School of Public Service (CSPS), and most recently serving as Head of Evaluation for the Canadian Food Inspection Agency (2006–2007). His research interests lie mainly in the areas of public management reform efforts, primarily in Westminster countries.
In particular, he is concerned with the role of parliamentary agents and officers, the changing roles of the audit and evaluation functions, how various oversight bodies such as ombudsmen and other offices relate to their departments and carry out their work, and the changing relationships between central agencies and departments as they manage within an increasingly austere and oversight-laden system.