The Economics of Information in Human Capital Formation: Evidence from Two Randomized Experiments on Information Efforts via Formative Testing in Secondary Education¹

Joris Ghysels ξ ², Carla Haelermans ξ ψ and Fernao Prince ϕ

ξ Top Institute for Evidence-Based Education Research (TIER), Maastricht University, the Netherlands
ψ Centre for Innovations and Public Sector Efficiency Studies (IPSE Studies), Delft University of Technology, the Netherlands
ϕ Sophianum, Gulpen, the Netherlands

Abstract

This paper studies the effect of obtaining information about the knowledge expected from students, using regular tests (information efforts), as a substitute for obtaining knowledge via traditional class instruction and homework (knowledge efforts). The effect on student performance is evaluated in two randomized experiments that increase the information available to students through digital formative tests and feedback. Both were conducted in an 8th grade Biology class among 114 prevocational students in the Netherlands. The results show that educational outcomes improve substantially when information effort in the human capital formation function is increased at the expense of knowledge effort. Moreover, the results show that tests with extended, personalised feedback are more effective than similar weekly tests with little feedback. This implies that the efficiency of information effort can be further enhanced by incorporating feedback in regular formative tests.

JEL-Classification – I20, I21, C93.
Key words – Human Capital Formation; Information; Digital Formative Tests; Randomized Field Experiments; Secondary Education.

1 This paper has greatly benefited from the comments and feedback from Bart Golsteyn, Trudie Schils, Marieke Heers, Wim Groot and Henriette Maassen van den Brink.
2 Corresponding author: [email protected], PO Box 616, 6200 MD Maastricht, the Netherlands.

1. Introduction

Casual observation suggests that many students try to minimize their study effort and maximize their effort to obtain information about the exam, as becomes clear in an extreme way from a 2013 New York Times article on students arrested for stealing state exams (Teffer, 2013). These students tried to optimize their performance by an effort to obtain a maximum of information on the learning goals and the content of the exam, in order to minimize the effort of studying for the exam and thereby maximize their chances of success. Although these students obviously went too far in their effort to obtain information, the notion of information efforts (partly) substituting for the acquisition of skills and knowledge is definitely an interesting one that has not been explored in education economics so far. When Cunha and Heckman (2007), for example, discuss the 'technology of skill formation', they do not dwell on the nature of the technology, but rather discuss the timing and the level of investments made. More generally, in a traditional human capital formation setting, ability and effort are seen as the basic determinants of the learning outcome and most studies analyse the investment decision regarding effort. This study delves deeper into the components of learning efforts and elaborates on the form of the human capital formation function. In effect, one may distinguish learning efforts that are directly linked to the acquisition of skills and knowledge (i.e. studying, or "knowledge efforts") from efforts that aim at clarifying the learning goals and the extent to which students have reached those goals at a particular point ("information efforts").
In most education systems, clarifying learning goals is mainly done at the start of a course, after which students put in knowledge efforts to obtain the knowledge necessary to fulfil these goals. The extent to which students have reached those goals is typically only measured by tests at the end of an instruction period, so-called "summative tests", which are translated into grades. As the use of digital learning materials and other educational technology is increasing in the educational sector, it is becoming feasible to introduce intermediate tests that provide students (and teachers) with more information on both the learning goals and the extent to which students have reached them. These intermediate tests are called "formative", because they are meant to stimulate learning without serving as a qualifying instrument.

In this paper, we use two randomised experiments to explore the potential role of repeated digital formative tests, and of feedback during these tests, in increasing the information efforts of students relative to their knowledge efforts, such that student performance increases. More particularly, we look into the effect of trading the usual knowledge transfer activities, such as class instruction and self-study, for activities that clarify the expectations of teachers to students regarding the current levels of knowledge and skills they should aim for. In the two field experiments we studied two particular setups of information efforts: 1) weekly formative tests, and 2) giving extended feedback on these tests.

Educational psychologists have studied the effect of testing on the learning process for decades (Phelps, 2012). They identified both direct and indirect positive effects and started distinguishing summative from formative tests. The information value of the test, which is used for subsequent decision making (either at the student, the class or the school level), is a typical example of an indirect benefit of a test. Direct benefits of testing relate to the learning process itself. While taking a test, students (attempt to) retrieve information from their memory, which reinforces their understanding of the subject and facilitates remembering at a later stage ('retention'). These direct benefits of testing are shown not to depend on the stakes of the test and to be higher when extensive feedback is provided (Larsen & Butler, 2013). Moreover, research shows that repeated testing with feedback is more effective than spending an equal amount of time on additional studying of the same learning material (Roediger & Karpicke, 2006). Consequently, testing is shown to be an integral part of a highly productive learning environment.

The recent spread of computers used for instruction and practice in education has greatly facilitated the incorporation of tests in the learning process, making it easier for teachers to implement regular tests. Suppliers of educational software typically use tests to construct individualised learning trajectories, because individualisation is widely believed, and has been shown, to optimize learning (see e.g. Haelermans & Ghysels, 2013). The direct and indirect learning benefits of testing are less often stressed, but may be equally important.
Bringing the above elements together, we study the effect on student performance of obtaining information about the knowledge expected from students, using regular computer tests (information efforts), as a substitute for obtaining knowledge via traditional class instruction and (digital) homework (knowledge efforts). We evaluate this for 114 8th grade students in their Biology class in the Netherlands. The results show that formative testing is very effective, and that the efficiency of information effort can be further enhanced when students do not only take weekly formative tests, but are also provided with information about the nature of their mistakes and successes.

The main contribution of this paper is threefold. First, the separation of the effort part of human capital formation into knowledge efforts and information efforts provides a new insight into the theory on how to make the learning process more productive and increase student performance. Second, the use of randomized experiments to study the effect of information effort allows for causal analysis, which provides evidence on the effectiveness of substituting knowledge effort for information effort. Lastly, conducting two experiments allows us to separate the effect of information via formative testing only from the effect of information effort via direct feedback, indicating the order of importance of the two ways of obtaining information.

The remainder of this paper is structured as follows: Section 2 elaborates on the role of information in the human capital formation process and Section 3 describes the related educational literature. Section 4 presents the context of the experiments, i.e. a description of the field experiments and the content of each experiment. Section 5 presents the empirical model, methodology, and the data. In Section 6, we discuss the baseline results and the extended regression results. Section 7 concludes the paper and discusses the policy consequences of the findings.

2. The role of information in the human capital formation process

Ever since the seminal work on human capital of Gary Becker (1964) and Theodore Schultz (1961), the formation of skills has enjoyed the continuous attention of economic scholars. Among others, it has been studied which policy interventions are most effective in nurturing cognitive and non-cognitive skills that are valued in the labour market or in society at large. Carneiro and Heckman (2003), for example, put forward that interventions early in life are to be preferred over later interventions. Part of Becker's contribution relates to the fact that socially optimal human capital investment depends on optimal foresight of the individuals who are to make the investment decision (future earnings have to be taken into account) and on perfectly functioning capital markets (which have to generate student loans to facilitate the required investments). All the mentioned references link (acquired) levels of human capital to outcomes later in life (often labour income) and stress the importance of a free flow of correct information to assure well-informed decision making.3

In the human capital formation literature, however, much less attention is paid to the role of optimal information in the short run, i.e. during the process of human capital formation in early life, typically while children are part of the school system. A recent exception is the article by Azmat and Iriberri (2010), which discusses the learning effect of the provision of relative feedback, i.e. feedback on the rank of the student in her or his class.
Apart from the social comparison aspect, the article also differs from our approach in its focus on feedback through report cards, i.e. with grades of summative tests at the end of a study semester.

Children in school need to be informed about the required knowledge and skills. Teaching is not about knowledge transfer only, but it also more instrumentally deals with the clarification of expectations, with well-known examples like lists of learning goals in college text books or sample exam questions. Obviously, every teacher has to strike a balance between the goal of the transfer of content and the more instrumental goal of clarifying expectations. Likewise, a student needs to balance his or her effort with respect to studying the learning material. Retention in memory is important, but the selection of learning material is equally relevant. In essence, a balance needs to be reached in the educational production function. Teachers need to design a learning process which transfers knowledge, makes sure the students acquire the related skills and offers students opportunities to check whether their level of course mastery matches the expectations of their teacher.

Conceptually, one might separate the usual effort term in the human capital function into efforts relating to knowledge acquisition (knowledge effort)4 and efforts related to information gathering regarding the level reached and required (information effort). In the following formula we refer to those as eK and eI respectively,5 and put them alongside ability a as determinants of the human capital formation function f, which produces a desired outcome (success) S in a particular schooling process:

S = f(a, eI, eK)    (1)

In equation (1) we treat ability a as a compound measure incorporating both innate ability and skills acquired in the past,6 because we want to focus on one period of instruction and study the consequences of variation in the distribution between knowledge and information effort (eK and eI) within that period, while taking stock of earlier periods. In the Cunha and Heckman (2007) discussion of educational technology, the focus is on the timing and size of the total effort (which they refer to as 'investment'), which is distributed over various periods.

3 In this short introduction to the literature we do not dwell on the large body of literature on the relative effectiveness of school inputs. As summarised by Todd and Wolpin (2003), this strand of the literature tends to incorporate three inputs (ability, family endowments and school/teacher inputs), while ignoring the effort of the learning individual, which makes it evidently less relevant for our research question. Yet, because we incorporate family endowments as past inputs in the ability indicator (see footnote 6) and will focus on an experiment organised by the school regarding the learning process in class, many results of the educational production function literature still apply. We will return to this issue below.

4 In the remainder of this paper we will refer to this type of efforts as "knowledge efforts" to distinguish them from efforts that focus on the clarification of teachers' expectations rather than the content of the course. Evidently, these learning efforts are not limited to knowledge but also incorporate practical skills related to the particular field of study. In other words, "knowledge efforts" refer both to theory and practice.
In contrast, this paper looks at educational technology within one period and discusses a lower level of instructional design, namely the distribution of two types of learning activities in class. In effect, when designing the learning process, a teacher takes into account a fixed time constraint T (the instruction time available over a teaching period) and distributes the two types of learning activities so as to exhaust the available time:

eI + eK = T    (2)

For simplicity, we do not model potential spill-overs of the instructional design into other activities of students, like homework, and proceed in a ceteris paribus environment with, for example, students' motivation and their time allocation outside of class as constants. We will return to these aspects at a later stage, but assume for now that students are receptors of the teaching process and the teacher is the active agent who optimises the teaching process by maximising equation (1) subject to the time constraint in equation (2). Equally for simplicity, we equate the cost of effort to its time cost. Hence, a unit (e.g. an hour) of information effort is equally costly as a unit of knowledge effort. Moreover, we assume the human capital formation function f to exhibit the usual characteristics of a production function: to be twice differentiable and to have decreasing marginal returns in all its inputs (a, eI, eK). The optimality conditions of the simple maximisation problem of equations (1) and (2) give the usual result that in the optimum the teacher has to make sure that the marginal contributions of the effort variables are equal.

5 It may be immediately clear that all teaching activities involve both aspects, but for simplicity we assume that separation is possible. In effect, some activities focus more on knowledge transfer (e.g. traditional classroom teaching) while others are oriented towards updating expectations (e.g. feedback after a test).

6 See Cunha and Heckman (2007) for a discussion of this way of compounding. As mentioned in footnote 3, one may want to incorporate in ability also all non-school influences that have contributed to the current skill and knowledge level of the student (e.g. parental involvement).

In a traditional teaching context, intermediate tests are the main source of information updates. Teachers start an instruction period with the clarification of teaching goals and then start lecturing and provide exercises. After this time filled with almost exclusively knowledge efforts, teachers have their students take a test and give them a grade ('summative testing'). The fact that the test updates both the teacher's and the students' information on the students' current skill level is not immediately taken into account, as the test is in most cases the closing activity of an instruction period and after the test a new topic is addressed. Nevertheless, a test (and its grade) might serve to warn students (and teachers) about potential gaps in the skills and knowledge levels which they may have to remedy before the final exam. Thus, it generates new information for both actors.7 Evidently, a midterm or even a weekly test takes time to administer8 and thus consumes part of the available instruction time T (and homework time9). Therefore, these tests are often given the least possible time in class.
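To make the optimality condition above concrete, the following sketch solves the teacher's allocation problem of equations (1) and (2) numerically. It is purely illustrative: the paper does not specify or estimate a functional form for f, so the Cobb-Douglas-style function, the parameter values and the variable names below are assumptions chosen only to exhibit decreasing marginal returns in both effort types.

```python
# Illustrative only: the paper does not specify a functional form for f.
# We assume a Cobb-Douglas-style technology with decreasing marginal returns
# and solve  max S = f(a, e_I, e_K)  subject to  e_I + e_K = T  by grid search.

import numpy as np

def f(a, e_I, e_K, alpha=0.15, beta=0.55):
    """Hypothetical human capital formation function (not estimated in the paper)."""
    return a * (e_I ** alpha) * (e_K ** beta)

a, T = 1.0, 100.0                           # ability index and total lesson time per week
e_I_grid = np.linspace(0.1, T - 0.1, 100_000)
S_grid = f(a, e_I_grid, T - e_I_grid)

best = np.argmax(S_grid)
e_I_star, e_K_star = e_I_grid[best], T - e_I_grid[best]

# At the optimum the marginal contributions of both effort types are (approximately)
# equal, which is the optimality condition discussed in the text.
eps = 1e-4
mp_I = (f(a, e_I_star + eps, e_K_star) - f(a, e_I_star, e_K_star)) / eps
mp_K = (f(a, e_I_star, e_K_star + eps) - f(a, e_I_star, e_K_star)) / eps

print(f"optimal information effort: {e_I_star:.1f} of {T:.0f} time units")
print(f"marginal products at the optimum: {mp_I:.5f} (information) vs {mp_K:.5f} (knowledge)")
```

Raising the exponent on information effort in this toy function (the analogue of the move from f to the more productive f' discussed below) shifts the optimal allocation towards eI and raises the maximised outcome S, which is the comparative static derived formally in the Annex.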
Technically, this situation originates from the belief that the teaching process is characterised by the combination of, on the one hand, decreasing returns to all types of learning efforts and, on the other hand, a higher average return of knowledge effort relative to information effort. In this case, the marginal contributions of both types of effort are equalised by combining a large proportion of transfer efforts eK with a relatively small proportion of information efforts eI.

Recently, however, educationalists have come to agree upon a different understanding of the role of information efforts in the learning process. In the following section we will describe in some detail how the current consensus states that the learning process is more effective when students receive feedback on their progress during the instruction period. Moreover, information technology (IT) has facilitated the incorporation of self-tests which inform students (and teachers) of the progress they make ('formative tests'). Computer assisted tests look promising because they have a much lower learning cost (time to take and correct) and have the potential of conveying personalized messages in relative detail (i.e. linked to specific errors of the student). Combined, IT and a novel class of information activities may have transformed the original human capital formation process f into a new process f' with more productive information efforts:

∂f'/∂eI > ∂f/∂eI    (3)

If we assume knowledge effort to have remained equally productive, or

∂f'/∂eK = ∂f/∂eK    (4)

and there are no changes in the total instruction time T, then the new human capital formation process f' will lead to an optimum with a higher level of outcome, S' > S, and relatively more time devoted to information effort (see Annex for a mathematical derivation). Obviously, the specific parameterisation of the maximisation framework will determine the exact outcome and the ratio between information and knowledge effort.

As an application of the above framework, this paper studies three variations of the human capital formation function through experiments in a real-life setting. It evaluates the effect of intermediate testing, while holding other contextual factors constant. Before elaborating on the exact form of the three human capital formation functions and the experiment setup, we will look into results of educational and psychological studies to gather a further understanding of what determines the effectiveness of the human capital formation process.

7 See Azmat & Iriberri (2010) for an evaluation of the learning effect of this type of feedback.

8 And also require some of the teacher's time to correct, especially if they have to be made into a source of substantive feedback on the knowledge level of the student, but we do not dwell on the teacher's effort in this paper.

9 The time students use to study for the test is assumed to substitute part of the time they spend on homework.

3. Educational Literature

In previous research, scholars have investigated the effectiveness of information efforts in the form of so-called "formative tests". These are tests that students take during a particular learning period to provide them with a status update regarding their learning progress, but these tests in general do not have any consequences for a student's grade on the topic. The impact of formative testing on the learning process has been called the 'testing effect', which Roediger and Butler (2011, p. 20) describe as "the finding that retrieval from memory (i.e. as during a test) produces better retention than restudying the same information for an equivalent time".10
Roediger and Karpicke (2006), for example, found that an intervention group who experienced testing performed better than a control group who studied the topic three times (once in class, twice afterwards). The authors mention standardized effect sizes from other studies of around 0.54 when the control group writes no tests at all, which decreases to 0.15 when the control group writes at least one test. Very much in line with this, Phelps (2012) finds a 'bare bones' average effect size of 0.55 in his meta-analysis. So the testing effect seems to be of moderate size. Roediger and Karpicke (2006) also suggest that testing stimulates the learning center in the brain, because both active and more unconscious knowledge is retrieved during a test. Moreover, they observe indirect benefits of testing, which relate to increased learning activities of students (because they expect continuous testing), a deepening of the understanding by students and improved metacognitive skills, because tests also indicate students' potential flaws in their learning techniques.

More recently, Roediger and Butler (2011) analysed the effect of giving feedback on the correct answers in multiple choice questions. They concluded that feedback is effective, especially when the feedback is substantive, i.e. contains some explanation of the right answer (a short instruction text), instead of nothing but an indication of the correct option in the multiple choice question.

Another recent addition to knowledge about the testing effect refers to the type of tests given. To optimize learning, the retrieval activity stimulated through the formative tests should match the educational goals. If, for example, factual knowledge is the goal, then the tests should also ask students to reproduce facts and, likewise, if application is the goal, the intermediate tests should also contain applications, rather than, for example, ask for facts or concepts (Larsen & Butler, 2013). Finally, as applies to educational activities in general, the testing effect can benefit from adequate repetition and spacing. Retention has been shown to improve with repeated testing (as in deliberate practicing) and when the tests were spaced over time, instead of given immediately after the initial contact with the learning material (Larsen & Butler, 2013).

We will discuss below how our experiments evaluate human capital production functions with various combinations of these innovations in Dutch secondary education. We basically hold the technology for knowledge effort constant and vary the amount and setup of information effort, incorporating as treatment a set of formative, multiple choice tests and looking into the role of extended feedback. While doing so, we also take into account the latest findings of educational research, making the tests fit the educational goal of the course period and separating the tests from the instruction period (placing them systematically at the start of the next instruction moment).

10 Earlier research on the testing effect did not control for the number of times a student studied the material, which meant that tests added a supplementary learning opportunity. To avoid this type of explanation for the testing effect, all more recent studies compare testing to a situation with equal additional learning opportunities (in amount of time) for the control group.
4. Context of the Study

Dutch secondary education offers four different levels of education, which are concluded with a national examination in the last year. Students enter secondary school at age 12 and are directed into a specific level of education ('track') based on the results of a standardized primary school ability test and the recommendation given in primary school. As such, the Dutch educational system is, internationally speaking, among the "early tracking" systems. Most schools do not focus on one particular track, but offer a variety of levels of education, making it possible for students to switch between levels during their school career. The four different levels of secondary education are the following:
- practical prevocational school (four years);
- theoretical prevocational school (four years);
- higher general continued education (five years);
- pre-academic education (six years).
The study at hand focuses on the second year of both the practical and theoretical prevocational level in one school. The school under study offers all types of education. However, the experiment is conducted in the Biology classes of the second year (8th grade) of prevocational education. All students are taught at the same level of prevocational education, as this school only makes the distinction between practical and theoretical prevocational education from year 3 on. The school under study is a typical, average sized secondary school (about 1650 students in total) in a rural area.

4.1 The Setup of the Randomised Experiments

In this paper, we aim to analyse the causal effect of information efforts on student performance. Because the potential correlation of characteristics of the students and unobservable factors with both the treatment and the outcome variables could be problematic, we use exogenous variation in the treatment by means of a randomised experiment to ensure that this potential correlation is absent. Based on the findings of educational scientists, two types of information efforts were selected for evaluation in two randomised experiments: the incorporation of a weekly formative test, and feedback during that test, i.e. an indication of whether the answer is correct and, in addition to that, short, personalized instruction texts. The first experiment consists of formative tests with extensive feedback. The second experiment consists of weekly formative tests with only basic feedback (only an indication of whether the answer is correct). The second experiment allows comparing a more traditional human capital formation process without intermediate tests as information efforts, with a basic form of a test-enhanced process. The first experiment additionally looks at the productivity of receiving feedback.

Figure 1 – Overview of the Timeline of both Field Experiments
- January 2013: stratified assignment of students to treatment and control group
- Week 8 (T0): pretest – start of the experiment on direct feedback
- Weeks 8–15: weekly formative tests (treatment group receives direct feedback)
- Week 16 (T1): posttest – end of the experiment on direct feedback
- Week 20 (T0): pretest – start of the experiment on formative testing
- Weeks 20–25: weekly formative tests for the treatment group only
- Week 26 (T1): posttest – end of the experiment on formative testing
- Week 28: end of the school year

Figure 1 shows the timeline of both field experiments, of which the first was on direct feedback and the second on formative testing. These experiments took place during the Biology lessons, for which students got two lessons of 50 minutes each week.
Randomization took place at the beginning of 2013. Stratified randomization at the student level was applied, so as to make sure that the experiment and control group were equally distributed regarding the students’ primary school ability test score, the class they belong to, their gender and age and state of grade repetition. From the total of 114 students, 56 students were assigned to the treatment group and 58 to the control group. Both experiments had the same treated and control students. The first field experiment, on the information gathered from direct feedback during formative tests, consisted of 8 weeks in total, starting in week 8 and ending in week 16 of 2013. The topic of this first experiment was ‘sexuality and relationships’. The second experiment, on information provision via weekly formative tests, consisted of 6 weeks in total, from week 20 to week 25 of 2013. The topic of the course taught during these 6 weeks was ‘heredity and evolution’. As these topics had not been taught to the students before, and the primary school ability tests is based on general knowledge, and not focused on Biology, we decided to have the students write a simple multiple choice pretest at the start of each experiment as well, to get an indication on their starting level with respect to these specific topics. In between these two experiments, there were a couple of holiday weeks and a lesson free week. Next to that, the two topics which were discussed during the experiments are very different from one another. Therefore, we do not expect the order of the experiments to influence the outcome of the experiments. During the two experiments, one of the weekly lessons relied on computer driven instruction and took place in a computer room, whereas the other lesson was teacher directed and took place in a regular classroom, with some group work from time to time. This setup, of combined computer and traditional lessons, was chosen to be able to administer the digital weekly tests, to enable students to work in their own pace during the computer lesson, but to also enable the teacher to give class instruction on the topic. Therefore, all students used digital instruction and assignments, as well as their book, together with classroom instruction, to study the course content. The digital instruction was a combination of the digital instruction package from the book publisher, sections from the book that were digitalized by 13 the teacher and additional digital material from the internet (such as exercises, puzzles, short movies, etc.). For the experiments, the multiple choice formatives tests took place during the computerized lesson. At the end of the course, all students wrote a posttest in the first lesson in the week after the experiment, which mainly consisted of multiple choice questions, although there were a few open questions as well. The content and setup of each experiment is discussed separately in the next two paragraphs. The teacher was present in the computerized lesson to answer questions and explain things when necessary. Because of the computerized lesson, the teacher had more time to devote to individual questions during these lessons. It cannot be said whether there were more questions from treated than from control students, or vice versa, as they were in the same class and students were not openly labelled as being part of either the control or the experiment group, to avoid contamination of the effect. 
In any case, the additional time available to the teacher is likely not to have been devoted to students in the experimental group only. The teacher was the same for the complete research population, which avoids teacher bias in the measurement of the effect. 4.2 Experiment on information effort via direct feedback during formative testing During the 8 weeks of the first experiment, the students had two lessons of Biology per week, as do all students in this grade. In these 16 lessons, 8 subtopics of ‘sexuality and relationships’ were discussed. Every first lesson of the week was computerized, whereas the second lesson was taught in a traditional class room. In the first week, the pretest was written. It was a multiple choice test and took about 15 minutes. Each week, at the start of the computerized lesson, all students wrote a small formative test, consisting of only multiple choice questions, lasting for about 10 minutes. The students in the treatment group got specific feedback, depending on their answer, after each question. Feedback was provided on each question answered by students. The feedback focused on whether and why a certain answer was wrong, if the question was answered incorrectly, but feedback was also provided to repeat why this was indeed the correct answer, if the question was answered correctly. At the end of the formative test, treatment group students were provided with an overview of which questions they answered correctly and which questions they got wrong. 14 Control group students did not get specific feedback on their answers during the test, and they were only able to see which questions they answered correctly and which questions they got wrong at the end of the course. After the last lesson of the last week (week 15) of the experiment, all students wrote a posttest during the first (computerized) Biology lesson of week 16, which mainly consisted of multiple choice questions, but also contained a few open questions, covering all 8 topics of ‘sexuality and relationships’ that were discussed during the past 8 weeks. Hence, the treatment in this first experiment is the provision of specific and rich feedback as compared to hardly any feedback in an environment with already enhanced information effort, because all students experience the weekly testing11. 4.3 Experiment on information effort via formative testing During the 6 weeks of the experiment on information via formative testing the students had two lessons of Biology per week as well, similar to the previous experiment. During these 12 lessons, 8 subtopics of ‘heredity and evolution’ were discussed. The general setup of the experiment is similar to the previous experiment: Every first lesson of the week was computerized, whereas the second lesson was taught in a traditional class room. In the first week, the pretest was written at the start of the computerized lesson. The pretest was a multiple choice test and lasted about 15 minutes. Each week, at the start of the computerized lesson, the students in the treatment group wrote a small formative test, consisting of only multiple choice questions, lasting for about 10 minutes. At the end of this test, students were provided with an overview of which questions they answered correctly and which questions they answered wrong. The control group students worked in their own digitalized environment on the topic of that week and/or made (digital) homework assignments. In total, the treatment group wrote 5 formative tests during the experimental period. 
In the week after the experiment officially ended (week 26), all students wrote a posttest during the first (computerized) Biology lesson of the week, which mainly consisted of multiple choice questions, but also contained a few open questions, covering all 8 topics of 'heredity and evolution' that were discussed during the past 6 weeks. Hence, in this second experiment, the treatment consisted of the incorporation of information effort into the instruction time of students for about 10% of the time in class (estimated to cover 100 minutes of effective time per week), while the control group maintained the usual instruction path, which devotes a large part of the time to knowledge effort (content oriented study time).12

To sum up both experiments: the first experiment evaluates whether rich feedback is effective, on top of the effect of testing, whereas the second experiment evaluates the effect of formative testing in itself. Rather than varying the composition of the instruction time (knowledge versus information effort), as the second experiment does, the first experiment varies the human capital formation function in the sense that information effort is organised differently, which according to the literature (see Section 3) is hypothesized to provide better outcomes and thus to be more efficient.

11 As already discussed in Section 2, it is hard to measure the exact distribution between content oriented instruction and educational activities that clarify the required knowledge and skill level. Yet, during the eight weeks of the experiment, the students in the control group did not receive feedback, while the treated students did. All other educational activities were open and available to both groups. All students, for example, had to write digital formative tests and had to make the same (digital) homework.

12 Similar to the previous experiment, the only difference between treatment and control students is that the students in the control group did not take tests, while the treated students did. Again, all other educational activities were open and available to both groups.

5. Methodology and Data

5.1 Methodology

To identify the Average Treatment Effect (ATE) of additional information provision to students on test scores, we use the notation first used by Rosenbaum and Rubin (1983). We observe a student i's posttest score Y_i and the treatment D_i, which results in the following equation:

Y_i = D_i · Y_i(1) + (1 − D_i) · Y_i(0)    (5)

where Y_i(1) is the posttest score for treated students and Y_i(0) is the posttest score for untreated students. Since the randomization ensures the independence between the treatment and the potential outcomes, we identify the ATE, defined as τ, as follows:

τ = E[Y_i(1) − Y_i(0)] = E[Y_i | D_i = 1] − E[Y_i | D_i = 0]    (6)

We can estimate the ATE using either simple T-tests or a linear regression. The linear regression is estimated as follows:

Y_i = α + τ D_i + X_i'β + ε_i    (7)

where D_i is the treatment status of student i, X_i are the students' observable characteristics, such as ability and other student characteristics, which are independent of the treatment, and ε_i are the residuals at the student level. The ATE is determined for both experiments separately.
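As a concrete illustration of the estimators in equations (6) and (7), the sketch below computes the ATE first as a simple difference in means (with a t-test) and then by OLS with covariates. It is a minimal sketch and not the authors' code: the file name, column names and the use of pandas, statsmodels and scipy are assumptions made for illustration only.

```python
# A minimal sketch (not the authors' code) of the ATE estimators in equations (6) and (7).
# Column names (posttest, treated, pretest, age, female, ...) are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("students.csv")  # hypothetical file: one row per student

# Equation (6): under random assignment, the ATE is the difference in mean posttest
# scores between treated and control students (equivalently, a two-sample t-test).
treated = df.loc[df["treated"] == 1, "posttest"]
control = df.loc[df["treated"] == 0, "posttest"]
ate_raw = treated.mean() - control.mean()
t_stat, p_value = stats.ttest_ind(treated, control)
print(f"raw ATE = {ate_raw:.3f} (t = {t_stat:.2f}, p = {p_value:.3f})")

# Equation (7): the same effect estimated by OLS, adding covariates in the spirit of
# models 2-4 of Tables 4 and 5 (pretest score, student characteristics, class dummies).
model = smf.ols(
    "posttest ~ treated + pretest + age + female + grade_repetition"
    " + ability_score + C(class_id)",
    data=df,
).fit()
print(model.summary().tables[1])

# Standardized effect size: ATE divided by the standard deviation of the control group,
# which is how the effects of 0.45 and 0.67 SD in Section 6 are expressed.
print(f"standardized effect ≈ {ate_raw / control.std():.2f} SD")
```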
5.2. Data

Table 1 presents the descriptive statistics of the students involved in this study. Note that these experiments use the same student sample as was used in Haelermans et al. (2014). Table 1 shows that the 114 students are rather equally divided over the five classes, with each having between 18 and 26 students. The average age was a little over 13, although age ranges from 12 to 15, mainly due to grade repetition, which is quite common in the Netherlands, especially in prevocational education. Two percent of the students repeated a grade and 57 students (50%) are female. The average primary school ability test score is 530 (the test is scored in a potential range of 500 to 550), with a minimum of 516 and a maximum of 545. The average monthly income of the neighbourhood in which the student lives (an indicator of socio-economic status, SES) is 2129 euros, but ranges from about 1500 to 3400 euros per month. Lastly, Table 1 shows the average score (between 0 and 1, interpreted as the percentage answered correctly) on the small knowledge tests (pretests) that were written before each new topic started. They were developed to measure prior knowledge of that specific topic and consisted of very easy questions on the topic, based on knowledge students should have gathered in previous years. The average score on the small knowledge test on 'sexuality and relationships' is 0.56 (56%), whereas the average score on the small knowledge test on 'heredity and evolution' is 0.59 (59%).

Table 1 – Descriptive statistics

Variable | Obs. | Average | St. Dev. | Min | Max
Age | 114 | 13.21 | 0.52 | 12 | 15
Female | 114 | 0.50 | 0.50 | 0 | 1
Grade repetition | 114 | 0.02 | 0.13 | 0 | 1
Primary school ability test total score | 110 | 530.60 | 5.74 | 516 | 545
Average monthly income neighbourhood | 108 | 2129.93 | 314.09 | 1532 | 3358
Pretest prior knowledge test on 'heredity and evolution' | 114 | 0.59 | 0.10 | 0.3 | 0.83
Pretest prior knowledge test on 'sexuality and relationships' | 114 | 0.56 | 0.10 | 0.3 | 0.74

Table 2 – Verification of the equality of expectations: T-statistics and Mann-Whitney statistics of treatment and control group regarding student characteristics at the start of the experiments

Variable | Control group: n | Average | Std. Dev. | Treatment group: n | Average | Std. Dev. | T-statistic
Age | 58 | 13.18 | 0.51 | 56 | 13.23 | 0.54 | -0.43
Female | 58 | 0.53 | 0.50 | 56 | 0.46 | 0.50 | 0.74
Grade repetition | 58 | 0.00 | 0.00 | 56 | 0.04 | 0.19 | -1.45
Primary school ability test total score | 56 | 531.30 | 5.30 | 54 | 529.87 | 6.12 | 1.20
Average monthly income neighbourhood | 55 | 2107.60 | 319.86 | 53 | 2153.09 | 309.32 | -0.75
Pretest prior knowledge test on 'heredity and evolution' | 58 | 0.59 | 0.10 | 56 | 0.59 | 0.11 | 0.22
Pretest prior knowledge test on 'sexuality and relationships' | 58 | 0.55 | 0.10 | 56 | 0.56 | 0.10 | -0.34

Variable | Control group: n | Treatment group: n | χ²-statistic
Class | 58 | 56 | 7.2923
Place of residency | 58 | 56 | 2.5984
* = significant at the 10% level; ** = significant at the 5% level; *** = significant at the 1% level.

Table 2 presents the observable characteristics of the treatment and control group, as well as the T-statistics/χ²-statistics on the differences between the groups. A joint F-test on all these characteristics (F(30,74)=0.47, p=0.98) also does not show a significant difference. Although in some cases the point estimates differ considerably (e.g. for grade repetition and gender), the lack of difference in both pretest scores indicates that there is no difference in prior knowledge. Therefore, apart from the randomization, these statistics indicate that we can trust (with a significance level of 5%) the treatment and control group to represent the same population.
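For completeness, the balance statistics of Table 2 can be illustrated with a sketch along the same lines as the one above: per-covariate t-tests plus a joint F-test from a regression of treatment status on all covariates. The column names and the exact specification of the joint test are again illustrative assumptions rather than the authors' procedure.

```python
# A minimal sketch (not the authors' code) of the covariate balance checks behind Table 2:
# two-sample t-tests per covariate and a joint F-test. Column names are hypothetical.

import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

df = pd.read_csv("students.csv")  # hypothetical student-level file with a 0/1 'treated' column

covariates = ["age", "female", "grade_repetition", "ability_score",
              "income_neighbourhood", "pretest_heredity", "pretest_sexuality"]

# Per-covariate t-tests, as in the T-statistic column of Table 2
for var in covariates:
    t, p = stats.ttest_ind(df.loc[df.treated == 1, var],
                           df.loc[df.treated == 0, var], nan_policy="omit")
    print(f"{var:22s} t = {t:6.2f}  p = {p:.2f}")

# Joint test: regress treatment status on all covariates (plus class dummies) and test
# whether they are jointly zero, comparable in spirit to the reported F(30,74) = 0.47.
joint = smf.ols("treated ~ " + " + ".join(covariates) + " + C(class_id)", data=df).fit()
print(f"joint F-test: F = {joint.fvalue:.2f}, p = {joint.f_pvalue:.2f}")
```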
6. Results

6.1. Baseline results

The first results we present (Table 3) are the simple T-statistics of the difference between treatment and control group at the posttest of the two experiments in which we provided the students with additional information. In Table 3, we see that both experiments show highly significant positive effects of providing students with additional information. The effect of the expansion of information effort through intermediate tests is about 8 percentage points, which is a medium to large standardized effect of about two thirds of a standard deviation (0.08 / 0.12 ≈ 0.67). The effect of a switch in the technology of information transfer (extended feedback with the test) is about 5 percentage points, which corresponds with a medium standardized effect of 0.45 of a standard deviation (0.05 / 0.11 ≈ 0.45). Consequently, the basic effect of the incorporation of tests (a rebalancing of instruction time towards a larger proportion of information effort) has a considerably larger point estimate than extended feedback (an additional13 extension of the specificity of information). The former has about 1.5 times the effect of the latter.

13 Theoretically, the extended feedback is an effect that is added to the basic testing effect, but the additional nature of the effect is not tested, because we have consecutive experiments rather than parallel evaluations of two treatment conditions.

Table 3 – Baseline Results: T-tests of the two information experiments

Variable | Control group (n=58): Average | Std. Dev. | Treatment group (n=56): Average | Std. Dev. | Mean difference | Standardized effect size | T-statistic
Posttest Experiment Information via Formative Testing | 0.51 | 0.12 | 0.59 | 0.12 | 0.08 | 0.67 | -3.44 ***
Posttest Experiment Information via Feedback while Testing | 0.51 | 0.11 | 0.56 | 0.11 | 0.05 | 0.45 | -2.36 **
Test scores are between 0 and 1, 1 being the highest performance. * = significant at the 10% level; ** = significant at the 5% level; *** = significant at the 1% level.

6.2 The Returns to Additional Information Efforts

The next step is to analyse the returns to additional information provision using regression analysis, for both experiments separately. The results of these regression analyses are presented in Tables 4 and 5. In both tables, we present five models: Model 1 is the basic model in which no covariates are included. In model 2, we include the average score for the pretest T0. In model 3, we add student characteristics and variables that account for student ability and past education, such as age, gender, grade repetition and the total score for the standardized exit exam of primary education. Moreover, we add dummies reflecting to which class a pupil belonged, so as to control for potential class effects. In the fourth model we add the environmental variable "average income of the neighbourhood". This variable is taken from administrative records of the tax administration and reflects the average SES of all households within a particular neighbourhood. As such, model 4 combines measures of student characteristics (age, gender) and student ability (e.g. grade repetition and primary school ability test) with indicators of the environment (e.g. income) and current education (e.g. class). It thus avoids overestimating the direct impact of the treatment. Lastly, in model 5, we replace the environmental variable "average income of the neighbourhood" with a fixed effect for place of residency, which is a compound measure of all variance that can be attributed to the place of residency of the student.
The results presented in Table 4, on the returns to direct feedback, show that the difference in average posttest score between treatment and control group is 0.05 (0.45 of a standard deviation), which is the same as the difference shown in Table 3, and this is consistently the case for all five models presented in Table 4. The inclusion of covariates changes neither the magnitude nor the significance of the effect. As could be expected from a randomized experimental setup, the positive, medium effect of testing with direct feedback as an information effort is very robust across models, even when we include neighbourhood information. The results in Table 5, on the returns to formative testing without feedback, show a very similar story. Here we also see a very consistently significant coefficient, which is exactly the same as in the baseline results of Table 3. The medium to large effect of formative testing is also very robust and it is about 1.5 times as large as the effect of additional information via formative testing with feedback, similar to what was shown in Table 3. Note that it is possible that there are some spill-over effects, given that the students receive instruction and engage in group work during the non-computerized lesson of the week at the level of their class, hence in a mixed environment of experiment and control students. In case there are spill-over effects, the effects we find of 0.45 and 0.67 of a standard deviation are lower bounds.

Table 4 – The Effect of a Relative Increase of Information Effort via Enhanced Feedback during Formative Tests (coefficients with t-statistics in parentheses)

| Model 1 | Model 2 | Model 3 | Model 4 | Model 5
Number of obs. | 114 | 114 | 110 | 104 | 110
R-squared | 0.05 | 0.15 | 0.23 | 0.23 | 0.37
F-stat | (1,112) 5.57 | (2,111) 9.98 | (10,99) 2.94 | (11,92) 2.54 | (11,92) 2.54
Prob > F | 0.02 | 0.00 | 0.00 | 0.01 | 0.01
Treatment (qualitative deepening of information effort via feedback) | 0.05 (2.36) ** | 0.05 (2.57) ** | 0.05 (2.68) *** | 0.05 (2.37) ** | 0.05 (2.18) **
Pretest score | | 0.34 (3.71) *** | 0.27 (2.65) *** | 0.26 (2.37) *** | 0.26 (2.11) **
Age | | | -0.01 (-0.57) | -0.01 (-0.41) | -0.01 (-0.29)
Female | | | -0.02 (-1.11) | -0.02 (-1.01) | -0.04 (-1.85) *
Grade repetition | | | 0.02 (0.26) | 0.03 (0.33) | 0.02 (0.24)
Primary school ability test total score | | | 0.00 (1.67) * | 0.00 (1.87) ** | 0.00 (1.37)
Class 2 | | | 0.04 (1.27) | 0.04 (1.26) | 0.01 (0.40)
Class 3 | | | 0.06 (1.98) * | 0.06 (1.81) * | 0.01 (0.37)
Class 4 | | | -0.01 (-0.33) | -0.03 (-0.70) | -0.09 (-1.74) *
Class 5 | | | 0.07 (2.25) ** | 0.07 (2.23) ** | 0.04 (0.94)
Average monthly income neighbourhood | | | | 0.00 (0.57) |
Place of Residency fixed effects | | | | | yes
Constant | 0.51 (36.54) *** | 0.31 (5.49) *** | -1.45 (-1.21) | -1.88 (-1.46) | -1.12 (-0.84)
Reference is class 1. Test scores are between 0 and 1, 1 being the highest performance. * = significant at the 10% level; ** = significant at the 5% level; *** = significant at the 1% level.
Table 5 – The Effect of a Relative Increase of Information Effort via Formative Testing (coefficients with t-statistics in parentheses)

| Model 1 | Model 2 | Model 3 | Model 4 | Model 5
Number of obs. | 114 | 114 | 110 | 104 | 110
R-squared | 0.10 | 0.24 | 0.42 | 0.42 | 0.61
F-stat | (1,112) 11.84 | (2,111) 17.54 | (10,99) 7.02 | (11,92) 6.15 | (11,92) 6.15
Prob > F | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Treatment (enhancement of information effort by weekly tests) | 0.08 (3.44) *** | 0.07 (3.59) *** | 0.09 (4.35) *** | 0.08 (3.84) *** | 0.08 (4.16) ***
Pretest score | | 0.48 (4.60) *** | 0.44 (4.37) *** | 0.44 (4.22) *** | 0.44 (4.00) ***
Age | | | -0.06 (-3.02) *** | -0.06 (-2.87) *** | -0.05 (-2.66) **
Female | | | -0.04 (-1.81) * | -0.04 (-1.67) * | -0.03 (-1.40)
Grade repetition | | | -0.07 (-0.84) | -0.06 (-0.82) | -0.12 (-1.59)
Primary school ability test total score | | | 0.00 (1.66) | 0.00 (1.77) | 0.00 (0.81)
Class 2 | | | 0.01 (0.23) | 0.01 (0.42) | -0.01 (-0.26)
Class 3 | | | 0.06 (2.11) | 0.06 (2.05) | 0.02 (0.52)
Class 4 | | | -0.01 (-0.23) | 0.00 (-0.12) | -0.04 (-0.94)
Class 5 | | | 0.03 (0.85) | 0.03 (0.94) | 0.01 (0.41)
Average monthly income neighbourhood | | | | 0.00 (1.48) |
Place of Residency fixed effects | | | | | yes
Constant | 0.51 (32.06) *** | 0.24 (4.02) *** | -0.91 (-0.75) | -1.27 (-0.98) | 0.15 (0.12)
Reference is class 1. Test scores are between 0 and 1, 1 being the highest performance. * = significant at the 10% level; ** = significant at the 5% level; *** = significant at the 1% level.

6.3 Heterogeneity of the Effect

When Azmat and Iriberri (2010) analyze the effect of informing students about their relative performance (grade rank within their class), they test a more specific functional form of the human capital formation function than we have developed so far. In effect, they assume – as many others in the literature (see Todd & Wolpin, 2003) – that complementarities exist between the inputs. As a result, the effect of information differs according to the ability level of the student and, hence, one may expect the distribution of the outcomes to change form, i.e. to become wider as highly able students fare better than less able students. In other words, if complementarities between the inputs are a substantively important part of the human capital formation function, one may assume our experiment with information effort to exhibit heterogeneous effects. More generally, it may be of concern for the instructional designer (the teacher) and the school management whether the intervention favors some or rather all of the pupils.

To test the distributional effect of the experiment, we ran a set of quantile regressions, investigating the difference of the distribution of outcomes between the control and the treatment group. We summarize the results in Table 6. The table shows the coefficients of the treatment dummy for four positions of the distribution (20, 40, 60 and 80%) and, as a reference, repeats the OLS coefficients already reported for model 5 in Tables 4 and 5. The estimates basically confirm the general results we obtained earlier for the complete distribution. The impact of formative testing (experiment 2) is stronger than the additional impact of extensive feedback (experiment 1), but it is positive across the distribution. Moreover, additional statistical tests cannot reject the hypothesis that the coefficients are of equal size for all the quantiles tested. Hence, much like Azmat and Iriberri (2010) did in a Spanish high school setting, we find that the provision of information benefits all types of students. The full distribution of outcomes is shifted, which is also illustrated in a purely descriptive sense in Figure 2 and Figure 3.
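The quantile regressions summarised in Table 6 below can be sketched along the same lines as the earlier snippets. The control set mirrors specification 5 of Tables 4 and 5, but the column names (including the assumed residency variable) are hypothetical and the snippet is not the authors' code.

```python
# A minimal sketch (not the authors' code) of the quantile regressions behind Table 6:
# the treatment coefficient is estimated at several points of the posttest distribution.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("students.csv")  # hypothetical student-level file

formula = ("posttest ~ treated + pretest + age + female + grade_repetition"
           " + ability_score + C(class_id) + C(residency)")

for q in (0.2, 0.4, 0.6, 0.8):
    fit = smf.quantreg(formula, df).fit(q=q)
    print(f"quantile {q:.1f}: treatment coefficient = {fit.params['treated']:.3f}")

# In the paper, inference relies on bootstrapped standard errors; a simple version
# would re-estimate each quantile regression on data resampled with replacement.
```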
Table 6 – The distributional effect of the experiments: quantile regression coefficients

| Model 5 (Tables 4/5) | Q20 | Q40 | Q60 | Q80
Coefficient estimate for experiment 1 (Table 4) | 0.05 ** | 0.04 | 0.06 * | 0.06 * | 0.03
Coefficient estimate for experiment 2 (Table 5) | 0.08 *** | 0.06 ** | 0.06 ** | 0.08 *** | 0.10 ***
N = 110. * = significant at the 10% level; ** = significant at the 5% level; *** = significant at the 1% level. All regressions include the same set of control variables as our original specification 5 (see Tables 4 and 5 for an overview). Significance tests in the quantile regressions derive from bootstrapped standard errors. Though the point estimates of the quantile coefficients vary in size, they are not different in a statistically significant way.

Figure 2 – Kernel density estimate of the post-test of experiment 1 (densities of the post-test score for the control and the experiment group)

Figure 3 – Kernel density estimate of the post-test of experiment 2 (densities of the post-test score for the control and the experiment group)

7. Conclusions and discussion

7.1 Conclusions

In this paper, we study the technology of education and focus on the productivity of learning activities that provide students with information on their learning progress ('information efforts') relative to the more traditional instructional activities ('knowledge efforts'). More specifically, we wonder whether it pays to trade teaching time for tests and whether it matters how tests are organised. The answer is that it does pay, in both cases. Our first experiment confirms that educational outcomes are improved when teachers incorporate formative, multiple choice tests with extended, personalised feedback in their classes. The results are significantly higher than for students who got similar weekly tests with hardly any feedback (an indication of answers right and answers wrong at the end of the course).
Although we have not tested this, it is likely that the effect size of an experiment where the control group does not receive any information, from tests or feedback, generates an effect size of both our effect sizes added up, i.e. above an effect size of around 1. This again would be similar to what is found in literature (see e.g. Phelps, 2012). 26 Having said that, it is interesting to note that the effect of formative testing is so much larger than the effect of extended feedback, while the latter experiment was conducted in a larger time span (8 versus 6 weeks). Apparently, the switch between knowledge effort and information effort for about 10% of instruction time is more effective than the deepening of the information, a switch in the method (‘technology’) of information transfer, so to speak. However, it should be reiterated that the difference in effect size represents the difference between a comparison of some and no information versus a little and a lot of information, which makes the effects not directly comparable. Indeed, the feedback experiment was conducted in an environment which takes the formative testing experiment for granted, because all students were given the weekly, formative tests. Because the experiment on extended feedback produces significant increases in the outcome, future research should look into this matter more deeply. So far, the results suggest that it pays to sacrifice instruction time, by allowing students to spend time on tests during class. When, as we call it, knowledge effort is traded for information effort, the outcome of learning is improved and, hence, the efficiency of education is enhanced, because a better outcome is reached in the same time. Whether the latter is true from the viewpoint of the student is not entirely for certain. Earlier research (see Section 3) pointed out that testing may have indirect effects, as it stimulates students to put more effort into studying at home. The latter is not directly tested in this paper, but group discussions after the experiments revealed that treatment group students substitute some time they would otherwise spend on homework for time preparing he formative tests, in the second experiment. Therefore, they did not spend extra time on their homework during the experiment. Todd and Wolpin (2003) suggest that indirect effects may also be at play for parents, who may compensate for school inputs (substitution) or become motivated by them (complementarity). We have no data to investigate this part of the educational process, but assume the latter not to be a major part of the explanation, because the experiment is hardly visible to the parents. Furthermore, the additional effort of the teacher in developing the digital formative tests is a cost factor that needs to be taken into account when thinking of implementation. However, it was a one-time effort (as these tests can be used again next year). The question is if most teachers would have to prepare these tests at all, as they are often provided by the publisher when digital learning material is purchased by a school. The cost of answer-driven feedback is probably higher, as the teacher will have to develop the algorithm in the computerized learning environment. However, this also is a one-time cost. 27 Although the additional effect of providing feedback on wrong (and correct) answers is definitely present, providing information via formative tests on itself is very effective already, at a much lower cost. 
Especially in this era in which computerized education is becoming more common, additional information provision via these types of formative tests is relatively easy to implement at a low cost, as soon as schools start working with digital learning materials. In essence, the policy consequences of these findings are clear. Much as in earlier North-American research on the topic (Phelps, 2012; Larsen & Butler, 2013), we find in the context of prevocational education in the Netherlands that regular multiple-choice testing (with or without extended feedback) improves the learning outcomes of students. Educational policy makers may want to stimulate teachers (and teacher educators) to adopt the testing environment widely.

However, the theoretical framework we developed at the beginning of this paper advocates some caution. While the experiments substantiate that incorporating intermediate tests is likely to improve learning outcomes, they do not specifically prove the necessity of a particular instructional choice. It may well be that the original balance between information and knowledge effort was suboptimal and that the tilting of the balance introduced by the experiments, rather than the choice for formative tests, was the crucial element. In other words, it remains unclear whether information effort can be treated as a homogeneous class of instructional activities or whether the internal composition of the overall effort type also matters. To clarify the latter, one may think of experiments that hold the time distribution between information and knowledge effort constant and vary only the setup of the information activities. Moreover, the experiments only test three particular combinations of information and knowledge effort and hence provide hardly any information on the shape of the human capital formation function. To the extent that formative tests and feedback on these tests can be seen as homogeneous activities, our empirical results are in line with the assumption of decreasing marginal returns, as we observed the top-up of testing with extended feedback to yield positive, but smaller, gains than testing in itself. Over time, further experiments should therefore clarify what the optimal balance is between information effort and knowledge effort. In our experiments, about 10% of instruction time was spent on testing. The remainder was divided between classic classroom instruction, computer-assisted instruction and practising time. Future research may shed light on the form of the human capital formation function and clarify, for example, at which point the various educational activities are equal in their marginal contribution to the outcome.

References

Azmat, G., & Iriberri, N. (2010). The importance of relative performance feedback information: Evidence from a natural experiment using high school students. Journal of Public Economics, 94, 435-452.

Becker, G. S. (1964). Human Capital: A theoretical and empirical analysis with special reference to education (3rd ed.). Chicago: University of Chicago Press.

Carneiro, P. M., & Heckman, J. J. (2003). Human Capital Policy. NBER Working Paper 9495. Cambridge, MA: NBER.

Cunha, F., & Heckman, J. (2007). The technology of skill formation. American Economic Review, 97(2), 31-47.

Haelermans, C., & Ghysels, J. (2013). The Effect of an Individualized Online Practice Tool on Math Performance - Evidence from a Randomized Field Experiment. TIER Working Paper.

Haelermans, C., Ghysels, J., & Prince, F. (2014).
Increasing performance by differentiated teaching? Experimental evidence of the student benefits of digital differentiation. British Journal of Educational Technology, accepted for publication.

Larsen, D. P., & Butler, A. C. (2013). Test-enhanced learning. In K. Walsh (Ed.), Oxford Textbook of Medical Education (pp. 443-452). Oxford: Oxford University Press.

Phelps, R. P. (2012). The effect of testing on student achievement. International Journal of Testing, 12(1), 21-43.

Roediger, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15(1), 20-27.

Roediger, H. L., & Karpicke, J. D. (2006). The power of testing memory - Basic research and implications for educational practice. Perspectives on Psychological Science, 1(3), 181-210.

Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41-55. doi: 10.1093/biomet/70.1.41

Schultz, T. W. (1961). Investment in human capital. American Economic Review, 51(1), 1-17.

Teffer, P. (2013, June 16). 4 Dutch suspects arrested in widening exam scandal. New York Times. Retrieved from http://www.nytimes.com/2013/06/17/world/europe/17iht-educbriefs17.html

Todd, P. E., & Wolpin, K. I. (2003). On the specification and estimation of the production function for cognitive achievement. Economic Journal, 113(485), F3-F33.

Annex: Efficient human capital production with rising productivity of information effort

Given the fixed length of the lesson T and the assumption that only time counts in the lessons, the output maximisation problem can be written as in equations (A1) and (A2):

\max_{e_I, e_K} \; S = f(e_I, e_K)    (A1)
\text{subject to } e_I + e_K = T    (A2)

This leads, via the general optimality condition of efficient production (the marginal rate of technical substitution equals the price ratio, which equals one here because both efforts are measured in the same lesson time), equation (A3),

MRTS = \frac{\partial f / \partial e_I}{\partial f / \partial e_K} = 1    (A3)

to the optimality condition of equation (A4):

\frac{\partial f}{\partial e_I} = \frac{\partial f}{\partial e_K}    (A4)

This gives a particular, optimal time allocation (e_I^{*}, e_K^{*}) satisfying the property:

\frac{\partial f(e_I^{*}, e_K^{*})}{\partial e_I} = \frac{\partial f(e_I^{*}, e_K^{*})}{\partial e_K}, \qquad e_I^{*} + e_K^{*} = T    (A5)

If, subsequently, a more productive technology f' is introduced such that

\frac{\partial f'(e_I, e_K)}{\partial e_I} > \frac{\partial f(e_I, e_K)}{\partial e_I} \quad \text{for all } (e_I, e_K),    (A6)

while leaving all other aspects of the problem unaltered, i.e. both e_I and e_K exhibit strictly positive but marginally decreasing returns in either production function (f and f') and the productivity of knowledge effort e_K is the same in both production functions (equation (A7)),

\frac{\partial f'(e_I, e_K)}{\partial e_K} = \frac{\partial f(e_I, e_K)}{\partial e_K} \quad \text{for all } (e_I, e_K),    (A7)

and the teacher is still determined to use all time available T in order to maximise the learning output S, then it follows from the optimality condition in equation (A3), and specifically from the fact that the new and higher marginal productivity of information effort can only be equalled on the side of knowledge effort by reducing the amount of knowledge effort (i.e. decreasing marginal returns), that in the new equilibrium (e_I^{**}, e_K^{**}):

e_I^{**} > e_I^{*}    (A8)
e_K^{**} < e_K^{*}    (A9)
S^{**} = f'(e_I^{**}, e_K^{**}) > f(e_I^{*}, e_K^{*}) = S^{*}    (A10)

Or, in words: a higher level of learning output is reached with a relatively higher time allocation to information effort (and, automatically, a lower time allocation to knowledge effort).
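As an illustration of this result, consider an expository parametrisation (a functional form assumed here purely for illustration; the argument above does not depend on it): an additively separable technology with decreasing marginal returns, f(e_I, e_K) = a\sqrt{e_I} + b\sqrt{e_K} with a, b > 0. Equalising marginal products as in (A4) gives

\frac{a}{2\sqrt{e_I}} = \frac{b}{2\sqrt{e_K}} \;\Longrightarrow\; e_I^{*} = \frac{a^{2}}{a^{2}+b^{2}}\,T, \qquad e_K^{*} = \frac{b^{2}}{a^{2}+b^{2}}\,T .

A more productive information technology f' then amounts to raising a to a' > a while leaving b unchanged (so that condition (A7) holds exactly), which yields

e_I^{**} = \frac{a'^{2}}{a'^{2}+b^{2}}\,T > e_I^{*}, \qquad e_K^{**} < e_K^{*}, \qquad S^{**} = a'\sqrt{e_I^{**}} + b\sqrt{e_K^{**}} > S^{*},

exactly as in (A8)-(A10): more lesson time is devoted to information effort, less to knowledge effort, and the learning output rises.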