University of Massachusetts Amherst
ScholarWorks@UMass Amherst
Masters Theses 1911 - February 2014, Dissertations and Theses
1982

A construct validity study of the sentence verification technique as a method of measuring reading comprehension.
Douglas J. Lynch, University of Massachusetts Amherst

Lynch, Douglas J., "A construct validity study of the sentence verification technique as a method of measuring reading comprehension." (1982). Masters Theses 1911 - February 2014. 1745. http://scholarworks.umass.edu/theses/1745

A CONSTRUCT VALIDITY STUDY OF THE SENTENCE VERIFICATION TECHNIQUE AS A METHOD OF MEASURING READING COMPREHENSION

A Thesis Presented By DOUGLAS JAY LYNCH

Submitted to the Graduate School of the University of Massachusetts in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE
September 1982
Psychology

James M. Royer, Chairperson of Committee
Nancy A. Myers, Member
Ronald K. Hambleton, Member
Bonnie Strickland, Department Head, Psychology

TABLE OF CONTENTS

INTRODUCTION
Chapter
I. NORM REFERENCED READING COMPREHENSION TEST QUESTIONS
   Characteristics of norm referenced reading comprehension tests
   Do reading comprehension tests measure reading comprehension?
   Test performance is not highly associated with reading the passage
   Reading comprehension tests and intelligence tests
   Reading test performance varies with type of test question
II. AN EXPERIMENT INVESTIGATING THE CONSTRUCT VALIDITY OF THE SENTENCE VERIFICATION TECHNIQUE
   Method
   Results and discussion
   Final discussion
   Concluding remarks
TABLES and FIGURES
BIBLIOGRAPHY
APPENDIX

LIST OF TABLES

1. Mean Reading Comprehension Test Performance either with Passages or Without Passages
2. Structural Variables used by Drum et al. (1981) to Predict Test Performance
3. The Prediction of Reading or Aptitude Test Performance from Earlier Tests
4. Test Performance as a function of Type of Test Question and Deleted Text
5. Wide Range Reading Test and Metropolitan Reading Test scores from the Direct Instruction Project
6. Sample of Test Sentences used by Sachs (1974)
7. Sample of Test Sentences used in the SVT
8. Mean Proportion Correct SVT Scores by Content Type and Expertise of Subjects
9. Pairs of Tests Administered throughout the Semester
10. Mean Proportion Correct Scores by Passage and Time of Test Session
11. Mean Proportion Correct SVT Scores by Question Type and Time of Administration
12. Mean Proportion Correct SVT Scores by Question Type, Content of Passage, and Group
13. Mean Proportion Correct SVT Scores by Question Type, Content of Passage, and Passage Pair
14. Analysis of Variance Table of Proportion Correct SVT Performance
15. Mean Combined SVT and Confidence Rating by Content Type and Time of Administration
16. Mean Combined SVT and Confidence Rating by Question Type and Time of Administration
17. Mean Combined SVT and Confidence Rating by Question Type, Content of Passage, and Group
18. Mean Combined SVT and Confidence Rating by Question Type, Content of Passage, and Passage Pair
19. Analysis of Variance Table of Combined SVT and Confidence Rating Variable
20. Mean Confidence Ratings per SVT Test Sentence for Correct and Incorrect SVT Responses by Content Type and Time of Test Session
21. Mean d' Scores by Content Type and Time of Test Session
22. Analysis of Variance Table of d' Variable
23. Sample of Idea Units from one Passage
24. Mean Recall Scores by Content and Time of Test Session
25. Analysis of Variance Table of Recall Variable
26. Correlations between SVT and Recall Performance by Time of Test Session and Content Type for d' and Proportion Correct Scores
27. Conditional Probabilities of SVT Performance given Recall Performance by Content Type

LIST OF FIGURES

1. Mean SVT Proportion Correct, Combined SVT and Confidence Rating, and d' Scores by Content Type and Time of Test Session

INTRODUCTION

Overview of the Thesis

The primary purpose of this thesis is to investigate the construct validity of the sentence verification technique as a method of measuring reading comprehension. The thesis is organized into two chapters. The first chapter presents research suggesting that the two most common methods of assessing reading comprehension on norm referenced reading comprehension tests are inadequate. The two methods critiqued in chapter one are the multiple choice procedure and the cloze procedure. The second chapter presents an experiment investigating the construct validity of the sentence verification technique.

The first chapter reports research which suggests that cloze tests and multiple choice test questions are inadequate methods of measuring reading comprehension on norm referenced reading comprehension tests. There are three sections within the first chapter. Each section presents research investigating either the cloze or the multiple choice technique of assessing reading comprehension. The first section presents evidence that test performance on several major norm referenced reading comprehension tests using multiple choice questions is not dependent on reading the passages from which the questions were derived. The second section reports several studies demonstrating that norm referenced reading comprehension test performance is highly associated with intelligence test performance. This relationship suggests the possibility that the reading tests may be measuring intelligence rather than reading comprehension. The third section of the first chapter presents evidence that reading test performance varies with the type of test question.

The second chapter of this thesis investigates the construct validity of the sentence verification technique as a method of measuring reading comprehension. The chapter presents an example of previous research by other researchers which used a sentence verification task to assess memory of sentences. Two experiments which used the sentence verification technique to assess text comprehension are described in detail. The primary purpose of the chapter is the description of an experiment which extends these two previous studies, while supporting the argument that the sentence verification technique is a valid method of measuring reading comprehension.

In its most general form, the experiment assesses the reading comprehension performance of college students as measured by the sentence verification technique. The same students were tested early and late in the semester. All of the students were members of the same psychology laboratory course. The experiment investigates whether there was improvement in reading comprehension performance for those passages which are related to the knowledge the students presumably learned in the psychology course.

CHAPTER I

NORM REFERENCED READING COMPREHENSION TEST QUESTIONS

Characteristics of Norm Referenced Reading Comprehension Tests

Norm referenced tests are used in many school systems to assess reading comprehension. These tests use different types of questions. Two of the most common types of test questions on norm referenced reading comprehension tests are cloze and multiple choice test questions (cf. California Achievement Test, Reading; Iowa Test of Basic Skills, Reading; Metropolitan Achievement Test; Wide Range Reading Test).

Cloze tests are constructed by deleting every nth word from a text. Usually the deleted words are either every fifth word or a random function word. The subject's task is to supply or select the word which has been deleted. The test scores are usually reported as the absolute number of correct responses, or the percentage of correct responses out of the total possible responses.

The second type of test question which is often used in norm referenced reading comprehension tests is the multiple choice question. The multiple choice question can be divided into three components. The first part is the passage upon which the questions are based. The passage may be one or several sentences of text. The second part is the stem of the question. The stem "poses the question", specifying the information required to answer the question. The third component is the four or five answers. Usually one of these answers is keyed as the correct answer, and the other answers (commonly called distractors) are incorrect responses.

There are different types of multiple choice questions used in reading comprehension tests. The distinguishing feature of the different types is the relationship between the passage and the question. Johnston (Note 1) described three types of passage-question relationships from a typology developed by Pearson and Johnson (1978). The three types of multiple choice questions are textually explicit, textually implicit, and scriptally implicit. Textually explicit questions have both the information in the question stem and the correct answer stated in a single sentence in the passage. Textually implicit questions have the information in the question stem and the correct answer in different sentences of the passage. Scriptally implicit questions have some of the information required to answer the question in the passage, but the subject must supply additional information from world knowledge or "script" [1] of the topic.

These three types of multiple choice questions may be used in testing situations in which the passage is either available or not available for rereading as the subject selects an answer. Therefore, with the three types of test questions crossed with the two types of text availability conditions, there are six possible types of multiple choice questions used to measure reading comprehension.

Do reading comprehension tests measure reading comprehension?

This section of the chapter will review evidence suggesting that many reading comprehension tests may not be measuring reading comprehension. The first area of research demonstrates that examinees can perform at above chance levels on standardized tests using a multiple choice format without reading the passages. The second area of research presents evidence that performance on norm referenced reading comprehension tests is highly associated with performance on intelligence tests. The last section of this chapter reports research that indicates that reading comprehension performance varies as a function of the test question used to assess reading comprehension.

Test performance is not highly associated with reading the passage. A study by Tuiman (1973-1974) demonstrated that examinees who responded to multiple choice questions of several major norm referenced reading comprehension tests could receive above chance scores even if they didn't read the passage. He gathered test scores from an unusually large sample (n=600) of 4th, 5th and 6th grade students on six major standardized reading comprehension tests. Tuiman had subjects answer the multiple choice questions contained in the tests either after reading the passage or without reading the passage. Table 1 reports the mean scores for the six reading comprehension tests when subjects answered questions without the passage, compared to the mean scores of the tests when subjects answered the questions after reading the passage. The table also shows the mean test scores which would be expected if students answered on the basis of chance alone. It is clear from Tuiman's data that students can receive test scores considerably above chance without reading the passages.

Royer and Cunningham (1981) report a less formal study which indicates that adults can also perform above chance when they answer multiple choice questions without reading passages. Seventeen secretaries and staff members at the Center for the Study of Reading, University of Illinois, answered 45 questions from a fifth grade level reading comprehension test without reading the passages. For 36 of the 45 test items, the scores were significantly above chance.

Tuiman and Gray (1972) conducted a study demonstrating that 7th grade students could answer most of the multiple choice questions included in a reading comprehension test even if the text was incomplete. The researchers used three versions of the passages contained in the reading tests: 1) the passages in original form, 2) the passages with 30% of the non-function words deleted, and 3) the passages with 50% of the words deleted. The students read one of these three types of passages and answered multiple choice questions based on the passages.
The mean test performance of the group which read the 30% reduced passages was only 13% less than the group that read the unmutilated passages. The group that read the text which had half of the words missing had scores 23% less than the group that read the whole text.

The studies by Tuiman (1973-1974), Tuiman and Gray (1972), and Royer and Cunningham (1981) investigated test performance when examinees did not read the passages upon which the multiple choice questions were based. They found that test performance was not dependent upon reading the text. A study by Drum, Calfee, and Cook (1981) provides additional evidence that test performance is not dependent upon comprehending the passage part of multiple choice questions.

Drum, Calfee, and Cook (1981) investigated the relationship between several structural properties of reading comprehension tests and performance on norm referenced reading comprehension tests. They analysed 18 major tests. Test performance was indexed by the p value for each test question. A p value is the proportion of all examinees who get a particular test question correct. The p values used in the study were reported in norming statistics obtained from the test publishers. Drum et al. (1981) specified the structural components of the test by identifying sixteen predictor variables characterizing the passage, the stem of the question, the correct answer, and the incorrect answers. The sixteen variables mentioned above are listed in Table 2. There were four variables for each of the four components of the test question.

By using a multiple regression procedure, Drum et al. (1981) found the 16 predictor variables accounted for 72% of the variance in test performance. However, only 12% of this variance was associated with the predictor variables representing the structure of the passage. Structural variability in the stem of the question and in the correct and incorrect answers accounted for 59% of the variance in test performance.
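The p value used above as the index of test performance (the proportion of all examinees who get a particular question correct) can be illustrated with a minimal sketch. The function name `item_p_values` and the response data are hypothetical and serve only as an illustration; they are not taken from Drum et al. (1981), whose p values came from publishers' norming statistics.

```python
def item_p_values(responses):
    """Return the p value (proportion correct) for each test question.

    `responses` holds one record per examinee; each record is a list of
    0/1 scores, one per question (1 = correct). This data layout is an
    assumption made for the illustration.
    """
    if not responses:
        raise ValueError("at least one examinee is required")
    n_examinees = len(responses)
    n_items = len(responses[0])
    if any(len(record) != n_items for record in responses):
        raise ValueError("every examinee must answer the same questions")
    # For each question, count the correct answers and divide by the
    # number of examinees.
    return [
        sum(record[i] for record in responses) / n_examinees
        for i in range(n_items)
    ]


# Hypothetical scores for four examinees on three questions.
scores = [
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
]
p_values = item_p_values(scores)
```

In this made-up example the first question is the easiest (three of four examinees answer it correctly) and the second is the hardest, which is the sense in which a p value indexes item difficulty.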
The small amount of variability in test performance associated with passage characteristics poses a significant question about norm referenced tests which use multiple choice questions. Although the primary purpose of the test is to assess comprehension of the text, test performance appears to be more highly associated with reading the questions and answers.

Drum et al. (1981) address a second issue that suggests test performance is difficult to interpret. Passage difficulty and item difficulty are often inconsistent. For example, one passage may have easier words than the words in the stem or answer. Another question on the same test may have a similar passage, but comprehension of the stem of the question may require a difficult phrase. From their structural analysis of 18 norm referenced tests, Drum et al. (1981, p. 511) report that vocabulary, propositional density, syntax, and test-taking demands are all changed in a confounded manner.

One implication of the Drum et al. (1981) study is that reading test scores may be difficult to unambiguously interpret. Since the components of the test do not vary systematically and the structural characteristics of the passage are not highly related to test performance, one can rarely isolate the variables which account for test performance.

This section of the chapter has presented research on multiple choice test performance. Although much of the research suggests that test questions can be answered at above chance levels without reading the text, the relative independence of structural characteristics in the stem and answers from the text presents another possibility. Examinees could correctly comprehend the text, but find the question stem or answers incomprehensible. Given this possibility, and the evidence that subjects can answer the questions correctly without reading the passages, the test score could easily misrepresent reading comprehension ability. In the case where the student responded correctly without reading the passage, the test score might overestimate the student's reading comprehension performance. In another instance, a student might comprehend a passage, but choose an incorrect response because the question stem or answers were incomprehensible. This might lead to a test score which underestimates the student's reading comprehension ability when reading text. The next section of this chapter presents a different problem in interpreting test scores from norm referenced reading comprehension tests: reading comprehension performance may be confounded with intelligence.

Reading comprehension tests and intelligence tests. This section of the chapter will describe research which suggests that intelligence and reading comprehension as measured by standardized tests are closely related. The research that will be cited supports this contention by demonstrating high correlations between scores on intelligence tests and scores on reading comprehension tests. Initially evidence of these high correlations will be presented, followed by a discussion of two alternative interpretations of the high correlations between intelligence test performance and reading comprehension test performance.

Several studies report high correlations between intelligence test scores and reading comprehension test scores. Harootunian (1966), for example, assessed 7th and 8th grade students' performance on the California Test of Mental Maturity, the California Achievement Test (vocabulary and comprehension), the Iowa Every-Pupil Test of Basic Skills, and an assortment of 15 other tasks. These tasks were similar to those used by Guilford in isolating different intellectual abilities. For example, the "Critical Thinking" task required the subject to decide whether inferences from given information were logical, and the "Incomplete Words" task required subjects to identify whole words when they were given part of a word. Having calculated an intercorrelation matrix between scores on these 17 tests or tasks, he found reading test scores were highly correlated with IQ scores (r=.57) and with Critical Thinking scores (r=.53).

A different study by Guice (1969) found that reading comprehension scores of college students as measured by the Co-operative English Test of Reading Comprehension were correlated .64 with their scores from the Otis Quick Scoring Test of Mental Abilities.

Sassenrath (1972-1973) presents a different type of analysis of the relationship between intelligence test performance and reading comprehension test performance. He collected test scores of college, high school and 4th grade students. The test scores represented intelligence, reading comprehension, and many other reading subskills. The subskills were assessed by test questions used in previous research (Homes & Singer 1961; Singer 1964, 1965). These researchers used test questions to assess what they believed were all the conceivable reading subskills. A few examples of subskill tests are "vocabulary in isolation", "word analysis", "perception of verbal relationships", "verbal sounds", and "phoneme similarity". Sassenrath calculated an intercorrelation matrix using a single composite score for each of the subskill tests. He wanted to identify common reading factors which were shared by several of the tests. Therefore, he factor analysed the correlations between the test scores. The logic behind factor analysis is that the tests which load on the same factor may represent the same underlying cognitive trait. The result of interest here is that the reading comprehension test and the intelligence test loaded on the same factor for both college and high school students.

A very different study by Thorndike (1973-1974) found high positive correlations between intelligence test scores and reading comprehension test scores. He found such high correlations between norm referenced reading comprehension test scores and intelligence test scores that he concluded that reading beyond decoding is inseparable from reasoning. He gathered the test performance data from an elementary school district which used an aptitude test one year and a norm referenced reading comprehension test the next. Therefore, the data represents aptitude test scores and reading comprehension test scores from the same students over eight years. Thorndike reports that test performance in the early elementary grades will predict a substantial amount of the variance in the scores of a test given later in the elementary grades. Table 3 reports the seven tests sequentially to emphasize the amount of the variance in the 7th grade Otis Alpha and the 8th grade reading test scores that is accounted for by earlier reading or aptitude test performance. The grade two Metropolitan Reading test accounts for 62.4% of the variance of the grade seven Otis Alpha test performance. The grade three Otis Alpha aptitude test accounts for only 7.6% more grade seven aptitude test variance than the reading test at the second grade. Reading tests and aptitude tests are both very effective in predicting later test performance.

Several studies have been reported suggesting that reading comprehension and intelligence are closely related. The evidence for this assertion is the consistently high positive correlations between reading comprehension test scores and aptitude test scores. The discussion will now turn to the two alternative interpretations of these high correlations.
The first interpretation is that performance on reading comprehension tests and on intelligence tests is supported by essentially the same cognitive process. The second interpretation suggests that the high correlations may be due to similarities in the tests rather than cognitive processes.

Thorndike (1973-1974) argues that reading comprehension is a highly complex cognitive process which is inseparable from reasoning. If this is true, there are certain implications for how difficult it may be to improve the reading comprehension of students who perform poorly on norm referenced reading tests. These students may have general intellectual deficits which may be difficult to improve. Thorndike writes that a barrier "is set by the child's limited comprehension of what he reads, which we see now as not primarily a deficit in one or more specific and readily teachable reading skills but as a reflection of generally meager intellectual processes. And this barrier promises to stand in the way of a wide range of future learnings." (p. 147) In other words, Thorndike suggests that improvements in student reading comprehension may be limited by the student's intelligence.

An alternative interpretation of the high correlations between intelligence test scores and norm referenced reading comprehension test scores argues that there are similarities between the tests rather than the underlying cognitive processes of intelligence and reading comprehension. Norm referenced reading comprehension tests may be similar to intelligence tests in at least two ways: 1) performance on some of the multiple choice test questions used in reading comprehension tests may be more dependent on inferential reasoning than on language comprehension, and 2) there may be similar procedures used by the publishers of tests in selecting items for both types of tests, thereby creating tests which may be artificially measuring the same process.
One type of multiple choice question which may require inferential reasoning is the scriptally implicit test question. In this type of question, part of the information which is needed to answer the question correctly is in the text. Another part of the information must be supplied from the reader's world knowledge. Royer and Cunningham (1981) give an example of this type of question. The passage from which the question was derived was about a boy and his father going camping. The passage did not describe what instrument was used to pound the tent stakes. However, the question asked: "What did John's father probably use to pound the tent stakes?" The response may very well depend upon the reader's previous knowledge of camping. A child without camping experience may respond: "pound the stake with a rock", while a child with camping experience may mark the keyed response: "pound the stake with a hatchet." The scriptally implicit multiple choice question appears to require both reading skill and a rich store of general knowledge.

This general knowledge may also affect performance on intelligence tests. For example, the Wechsler Intelligence Scale for Children (WISC) has a subtest titled "information". In this test, test items assess "knowledge that an individual with an average opportunity may be able to acquire". (Ellias, Ellias & Ellias 1977, p. 57) The Stanford-Binet Intelligence test also has a "general information" subtest assessing world knowledge (Kagen & Lang 1978).

A second reason that performance on reading comprehension tests and aptitude tests are highly related may be due to item selection practices when the tests are written. Norm referenced reading comprehension tests are often called achievement tests. Popham (1978) comments upon the similarities between achievement tests and aptitude tests. He contends that norm referenced achievement tests are continually revised to achieve an appropriate distribution of scores. If a test question is consistently answered correctly by examinees, that question will be removed from the test. This suggests that if teachers are successful in teaching the particular skill assessed by the test item, the test item may be removed from the test. Popham contends that after a few years, the achievement test will become less sensitive to instructional effects if students do well on the tests. In time, the achievement test will measure just what intelligence tests were designed to measure. Intelligence tests assess cognitive processes which are "brought to school", rather than what is learned "in school". If Popham's contention is correct, the item selection procedure would contribute to the high positive correlation between intelligence test scores and norm referenced reading comprehension test scores.

This section of the thesis has presented evidence that reading comprehension test performance is highly related to intelligence test performance. Multiple choice questions which require the reader to generate inferences may use similar cognitive processes to those used in intelligence tests. Research needs to demonstrate whether intelligence and reading comprehension are inherently related or whether the association is merely due to test characteristics. One way this could be done is to show that readers can improve their reading comprehension performance without changing their intelligence.

If inferential multiple choice questions require students to use cognitive processes similar to the cognitive processes they would use on an intelligence test, then norm referenced reading comprehension test performance may be confounded with intelligence. It is possible that another type of test question would yield a different reading comprehension score for the students. This presents a problem for the teacher who wants the most accurate assessment of a student's reading comprehension performance. If two types of questions yield different scores, which one is more accurate? The third section of this chapter addresses this problem in greater detail, focusing on research that demonstrates that reading test performance varies with the type of test question used to assess comprehension.

Reading test performance varies with type of test question. This section of the chapter continues the argument that cloze and multiple choice test questions are inadequate methods for assessing reading comprehension. The major problem that will be emphasized is that reading test scores vary both with the type of test question used to measure reading comprehension, and with the subjects who take the test. In other words, reading performance appears to change depending upon the type of test question and on certain characteristics of the reader. Initially research will be reported investigating how cloze tests yield different performance indices from other reading comprehension tests. Then research will be presented demonstrating differences in reading test scores due to the type of multiple choice test question and the world knowledge that the reader brings to the test.

The cloze procedure reportedly measures reading comprehension by assessing whether subjects can supply the missing words from a partially deleted text. However, Carroll (1972) suggests that the cloze test is primarily a measurement of the syntactic redundancy of language rather than a semantic understanding of text. Tuiman, Blanton, and Gray (1975) designed an experiment to investigate Carroll's assertion. The Tuiman et al. (1975) study is a different analysis of the experiment of Tuiman and Gray (1972) reported earlier in this chapter. Two groups of subjects answered either a cloze test or a multiple choice test assessing their comprehension of the same text. Each test group was further divided into three conditions which differed in the amount of text which was read by the subject: 1) a complete text, 2) a text that had 30% of the words from the complete text deleted, and 3) a text that had 50% of the words from the complete text deleted. Table 4 reports the mean test performance for the six test conditions. Tuiman and Gray argue that the scores are so much lower on the cloze test for the reduced text (compared to the complete text cloze test) because the deletion of the words eliminated the redundancy of the text. Since they believe the poor performance was due to the inability of the subject to use text redundancy, they infer that performance in a regular cloze test is primarily a measure of syntactic redundancy rather than reading comprehension.

Table 4 also demonstrates that cloze test scores are considerably different from multiple choice test scores. This suggests that the two types of test questions may assess different reading processes.

A study by Weaver (1963) provides further evidence that cloze tests and multiple choice tests measure different skills. Weaver (1963) administered 18 tests to college students. Eight of these were cloze tests. The remaining tests assessed a variety of reading and language abilities. He factor analysed the scores from the tests, finding that the eight cloze tests loaded on the same factor. The cloze test scores were highly correlated with each other, but cloze test scores were not correlated with the scores from other tests of reading comprehension.

This research suggests that test scores from cloze tests may present a different estimate of reading comprehension performance than other reading comprehension tests. Test scores will also change depending on the type of multiple choice question and certain characteristics of subjects. Research showing varied scores with different types of multiple choice questions is cited below.
Johnston (Note 1) recently reported a study in which subjects received different scores on a reading comprehension test depending on the type of multiple choice test question and the subject's world knowledge. He manipulated the type of multiple choice question, the availability of the text, and the relative knowledge the readers had of the text. Eighth grade students from a rural school and from an urban school read three passages. Each passage had a different theme. The passages were about corn, urban transit problems, and the Civil War. According to their scores on a vocabulary test, rural readers had a greater knowledge of corn, while the urban readers had greater knowledge of urban transit problems. They had equivalent knowledge of the Civil War.

These texts were assessed with the three types of multiple choice questions mentioned earlier in the thesis: textually explicit, textually implicit, and scriptally implicit. In textually explicit questions, a single sentence in the passage contains both the question information and the answer information. Therefore the reader does not have to combine information across sentences. In a textually implicit question, the question information is in a different sentence from the answer information. In order to answer a textually implicit question, the reader must combine information across sentences. Scriptally implicit test questions have part of the answer in the text, but the reader must use his world knowledge to generate a reasonable answer.

These three question types were further varied to assess the centrality of the test question. Test questions either assessed central ideas (main ideas) of the text or peripheral ideas of the text.

The other manipulation in Johnston's experiment varied whether the subjects could reread the text. This condition presumably placed different demands upon long term memory (LTM) by altering the availability of the text when answering the multiple choice questions.
There were three memory demand conditions: no demand on LTM, slight demand on LTM, and greatest demand on LTM. Subjects could reread the text in the no demand condition. In the slight demand condition, subjects were not allowed to reread the text, but they answered the questions immediately after their initial reading. In this test condition, the readers may have used their short term memory of the text to answer some of the questions. The third test condition involved the greatest demand upon LTM, requiring subjects to complete an intervening task five minutes after their initial reading of the passage and then answer the questions without rereading the text.

In summary, Johnston assessed reading comprehension varying subject knowledge, question type, memory demand, and centrality of the test question. He reports three results which are relevant to this thesis: 1) Prior knowledge of the topic accounted for a significant amount of the variance in test performance. Subjects with greater knowledge of a theme had a higher percentage of correct answers than subjects with less knowledge of that theme. 2) The type of test question affected performance. The percentage of correct responses was highest on textually explicit questions, followed by textually implicit questions. Scriptally implicit test questions had the lowest percentage of correct responses. 3) Subjects scored higher when the text could be reread than when they had to rely on their memory of the text.

The basic conclusions which can be drawn from these results are that multiple choice reading performance varies with three conditions: the type of test question, whether subjects can reread the text, and whether subjects have previous knowledge of the text theme. This suggests that different norm referenced tests of reading comprehension might yield different assessments of reading performance. There may even be different types of questions on the same test.
It would be interesting to investigate whether the type of test question and the centrality of the questions are equally represented in the major norm referenced reading comprehension tests. Both the type of test question and the centrality of the question could be identified with considerable effort.

However, the suggestion that multiple choice test performance is affected by the world knowledge of the reader presents a special problem in interpreting test scores. The problem is particularly evident when subjects receive low scores on a norm referenced reading comprehension test. The scores may correctly suggest that the examinees have poor reading ability. However, the examinees may also have performed poorly on the test due to a mismatch between their world knowledge and the content of the passages on the multiple choice test. The same examinees might perform at a higher level if they answered another multiple choice reading test in which the passages matched their world knowledge. This reasoning suggests that examinees could have their reading comprehension ability underestimated if the passage content does not relate to their world knowledge. Or, if norm referenced reading comprehension tests were used to assess changes in reading comprehension performance over time, the tests might be insensitive to changes in reading ability.

There is some suggestion that norm referenced reading comprehension tests may be insensitive to gains in reading comprehension performance of lower socio-economic children. The evidence comes from the educational improvement program called Follow Through. Follow Through was a federal project authorized by Congress in 1967 to provide poor children in the primary grades with educational, health, and social services.
Appropriations did not allow all three of these services, so Follow Through was converted into a massive educational research project. The project established a variety of different programs to improve the education of poor children. Each of these programs attempted to improve the reading comprehension of poor children.

Reading comprehension performance, as measured by norm referenced tests, did not improve significantly over control groups in any of the programs (Becker, 1977). This has generally been interpreted as evidence that the children's reading comprehension did not improve. But it is also possible that the tests were insensitive to gains in reading comprehension. A more thorough examination of one of the Follow Through programs will suggest that reading comprehension might have improved, but the norm referenced test used to assess reading comprehension was insensitive to that gain.

The program which most clearly presents this pattern was the Direct Instruction program. In this program, reading was initially taught with an emphasis on decoding skills. Later, the program stressed reading comprehension. According to Becker (1977), comprehension "skills" were taught within a structured format, with children learning to extract and use information and "rules" of problem solving. These skills were instructed by using content areas which had considerable structure. For example, a few of the content areas were astronomy, muscle function, and measurement.

Reading performance was measured by two norm referenced tests. The Wide Range Reading Test (WRAT) was used to assess decoding performance. The Metropolitan Reading Test (MAT) was used to assess reading comprehension performance. Table 5 reports the total reading percentile scores for the WRAT in pre-kindergarten compared to after grade three.
These scores suggest decoding performance improved considerably due to the Direct Instruction experience. Table 5 also reports that the same children had less gain in the MAT reading comprehension scores. Direct Instruction did not assess reading comprehension of pre-kindergarten children. After grade three they scored at the 40th percentile. This percentile should be compared to a control mean percentile of 20. That is the percentile of lower socio-economic children after the third grade who did not participate in Follow Through (Becker & Carnine, 1978).

Given these test scores, one may arrive at different interpretations of the reading comprehension abilities of children participating in Direct Instruction. Becker (1977) argues that the children did not improve considerably in reading comprehension. He suggested that the vocabulary items on the MAT were too difficult for the children. He says they were too difficult because the children had little exposure to the appropriate words in their homes. Becker cites examples of vocabulary items from the MAT: "amazon ant, penicillin, disease-causing germs, et. al." (p. 530). He argues that these items were beyond the experience of the children enrolled in the Follow Through program.

Another interpretation of the Direct Instruction test scores is that the children's reading comprehension performance may have been underestimated by the MAT test. The test might have required them to know vocabulary terms as Becker noted. The test may also have had multiple choice test items which required the children to generate inferences from a knowledge base which they never developed. Since the children had greatly improved their decoding performance, they might have received scores higher than the 40th percentile if they answered different types of reading comprehension test questions.

The Direct Instruction reading comprehension results have provided an example of how difficult it is to interpret reading comprehension test scores of disadvantaged children.
The need for an alternative method of measuring reading comprehension stems naturally from the evidence presented in this chapter. The primary focus of the chapter has been to build the argument that norm referenced reading comprehension tests are inadequate when they consist of cloze and multiple choice test items. The second chapter of this thesis presents an experiment which used the sentence verification technique to measure reading comprehension. The sentence verification technique may provide a viable alternative to traditional means of assessing reading comprehension.

Footnotes

[1] The term "script" has been used by several researchers to refer to the structure of knowledge in memory (cf. Shank & Abelson, 1977).

CHAPTER II

AN EXPERIMENT INVESTIGATING THE CONSTRUCT VALIDITY OF THE SENTENCE VERIFICATION TECHNIQUE

This chapter presents an experiment investigating the construct validity of the sentence verification technique as a method of measuring reading comprehension. In its most general form, the sentence verification technique involves having subjects read or listen to a sentence or passage. The subjects are then presented with a sentence that they are asked to make a decision about. This decision can involve judging whether the sentence is true or false, or, more relevant to the present experiment, it can involve judging whether the sentence is the "same" or "different" from a sentence the subject was exposed to in the earlier phase of the experiment. Same judgments can be made on the basis of exact similarity between an original sentence and a test sentence, or on the basis of semantic similarity (i.e., paraphrase relationship) between original and test sentence.
Different judgments can be based on total dissimilarity between original and test sentence, or on the basis of altering one or more words in an original sentence so as to alter its meaning relative to the original sentence.

Researchers have frequently used a sentence verification task to assess memory of discourse. Sachs (1967, 1974), for example, conducted two studies using a sentence verification task to assess memory. The results of these two studies are also particularly relevant to the use of the sentence verification technique for measuring reading comprehension.

Sachs conducted two experiments in which subjects had to make a decision of whether sentences were changed based on their memory of the original sentence. The first study (1967) investigated sentence memory with a task in which subjects heard both the original passage and the test sentences. The second study (1974) replicated the listening task of the first study and extended the task by having subjects read a passage and respond to test sentences which were in print. The second study will be described in greater detail.

Sachs (1974) had college students either listen to a passage and respond to aural test sentences or read a passage and respond to written test sentences. In both cases, different forms of test sentences were used which were based on an original passage sentence. Sachs used five forms of the test sentences: identical, semantic, passive-active, formal, and lexical. Table 6 lists one example of each form of test sentence.

Having heard or read the passage, the subjects decided whether the test sentence was changed from the original sentence. They could not listen to the passage again, or reread the passage. The correct decision would be that all semantic, passive-active, formal, and lexical sentences were "changed" from the original sentence.

Sachs also varied the amount of material interpolated between the original sentence and when the subject heard or read the test sentence.
The range of interpolated material was from 0 to 80 syllables. In the condition with 0 interpolated syllables, the test sentence would be presented immediately after the subject read or heard the original sentence. Sentences were added between the original sentence and the test sentence to create the interpolated syllables. These additional text sentences provided a continuation of the passage, and were consistent with the theme of the original sentence.

Sachs measured the percentage of correct decisions for the different test sentences at various levels of intervening material. She found the pattern of the data was similar for both the listening and reading conditions. The accuracy for correctly identifying that the test sentence was changed from the original decreased with an increase in the amount of intervening material. After 80 syllables had intervened between the original sentence and the test sentence, the ability of these subjects to detect that formal and lexical test sentences were changed was considerably reduced from the accuracy at the 0 syllable level. Sachs suggested that these results provide evidence that the exact words of a sentence are not retained in memory when subjects hear or read discourse. Subjects had difficulty identifying changes in a few words when the overall meaning of the sentence remained the same. Given these results, Sachs contends that a meaning or "gist" of the sentence is retained in memory rather than the exact wording of the sentence.

The Sachs experiments illustrate the use of a sentence verification technique to assess memory of a sentence. Royer, Hastings, and Hook (1980) were the first researchers to use a sentence verification technique to assess reading comprehension. They reported two experiments in which a sentence verification technique was used to measure reading comprehension of elementary school students.

The sentence verification technique used by Royer (et al., 1980) requires the subject to read a text and then respond to a series of test sentences without rereading the text.
The basic decision for the subject is whether test sentences have the same meaning or a different meaning from the original text sentence. The sentence verification technique will be referred to as the SVT. The SVT consists of four test sentences for each sentence in the text. The four test sentences are an original sentence, a paraphrase sentence, a meaning change sentence, and a distractor sentence. Table 7 lists an example of each form of the SVT test sentence. These four test sentences will be described in greater detail in the method section of the thesis experiment. The procedure for writing the four kinds of sentences is the same for both the Royer (et al., 1980) study and the experiment reported in this thesis.

The subject's task in the Royer (et al., 1980) study was to decide whether a test sentence had an old meaning or a new meaning compared to the passage sentence. The subjects did not reread the passage, but made an old or new response on the basis of their memory of the text.

In the first experiment reported by Royer, Hastings, and Hook (1980), students from the 5th and 6th grades read three passages which varied in text difficulty. The students' teachers had selected texts which were two grades below, on grade level, and two grades above the reading level of the class. Therefore a fifth grade student read text which was at approximately grade three reading difficulty, grade five, and grade seven difficulty.

In the second experiment reported by Royer, Hastings, and Hook (1980), fourth grade and sixth grade students read passages which were either two grade levels below their reading grade level, on their grade level, or two grade levels above their current grade level. This range of text difficulty meant that the above grade text for the fourth grade students was the same text as the on grade level text for sixth grade students.
Likewise, the below grade text for sixth grade readers was the on grade text for fourth grade readers.

The procedure was the same for the two studies reported by Royer (et al., 1980). After a practice session in which the students learned how to respond to the sentence verification technique, the students read the three passages, responding to the SVT on each passage. Since the SVT was used in both studies, the results of the studies will be presented together.

Royer (et al., 1980) used the proportion of correct SVT responses and d' scores for dependent variables. They used d' scores because they are considered a criterion free measure of response accuracy. A d' score may be considered a z score representing the probability that the subject is responding on the basis of chance. A d' of 3.0 is a very high score, indicating very little probability that the subject's responses were made on the basis of chance.

Analysis of the proportion of correct responses and the d' scores for each passage showed that SVT performance declined with increasing text difficulty. In their comparisons of different grade levels who read the same text (experiment two), Royer (et al., 1980) found that the higher grade level students received higher SVT scores than the lower grade level students. This is what one would expect, given the greater experience of the older students.

Therefore, the Royer (et al., 1980) study contributed to the evidence that the SVT was a valid method of measuring reading comprehension in two ways: 1) the SVT was sensitive to text difficulty, and 2) the SVT was sensitive to expected differences in student reading ability.

A recent experiment by Royer, Lynch, and Bulgarelli (Note 2) extended the investigation of using the SVT as a method of assessing reading comprehension.
Their experiment investigated whether reading comprehension varied for readers with different degrees of subject matter expertise when they read passages within or outside their presumed knowledge area.

Royer, Lynch, and Bulgarelli (Note 2) had three groups differing in general educational experience and psychology expertise read passages about psychology and non psychology topics. Undergraduates with no previous university psychology courses were the group with the least expertise. Upper division undergraduate students with several previous psychology courses were the moderate expertise group. Psychology graduate students had the greatest amount of expertise. All of the subjects read non psychology passages and psychology passages.

Each subject in Royer (et al., Note 2) read three psychology passages and three non psychology passages. The passages were drawn from a pool of six psychology and six non psychology passages. The six tests were assembled into test packets. When the test packets were complete, there were two sets of six passages which were evenly distributed in the undergraduate, advanced undergraduate, and graduate groups.

Royer, Lynch, and Bulgarelli (Note 2) assessed reading comprehension performance by measuring the subject's responses to six sentence verification tests. Each test had 12 test sentences. They constructed each SVT so that it contained 3 original sentences, 3 paraphrase sentences, 3 meaning change sentences, and 3 distractor sentences. By using four different forms of the test for each passage, they were able to assess each passage sentence with an equal proportion of original, paraphrase, meaning change, and distractor test sentences.

Royer, Lynch, and Bulgarelli (Note 2) measured both the amount of time subjects used to read the passages and three variables assessing accuracy: proportion correct on the SVT, d' on the SVT, and a combination of SVT performance and a confidence rating.
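The two SVT accuracy measures named above, proportion correct and d', can be illustrated with a minimal sketch. The sketch makes several assumptions that are not stated in this thesis: original and paraphrase sentences are keyed "old" while meaning change and distractor sentences are keyed "new"; d' is computed with the standard signal detection formula z(hit rate) - z(false alarm rate); and a log-linear correction (add 0.5 to each count, 1 to each total) keeps the z scores finite when a rate would be 0 or 1. The function name score_svt is hypothetical.

```python
from statistics import NormalDist

# Assumed scoring key: these two sentence types preserve the passage meaning.
OLD_TYPES = {"original", "paraphrase"}


def score_svt(sentence_types, responses):
    """Return (proportion correct, d') for one SVT answer sheet.

    sentence_types: list of "original", "paraphrase",
                    "meaning change", or "distractor"
    responses:      list of "old" / "new" answers, in the same order
    """
    hits = false_alarms = n_old = n_new = correct = 0
    for sentence_type, response in zip(sentence_types, responses):
        is_old = sentence_type in OLD_TYPES
        said_old = response == "old"
        if is_old:
            n_old += 1
            hits += said_old            # "old" response to an old sentence
        else:
            n_new += 1
            false_alarms += said_old    # "old" response to a new sentence
        correct += said_old == is_old
    # Log-linear correction (an assumption) avoids infinite z scores.
    hit_rate = (hits + 0.5) / (n_old + 1)
    fa_rate = (false_alarms + 0.5) / (n_new + 1)
    z = NormalDist().inv_cdf
    return correct / len(sentence_types), z(hit_rate) - z(fa_rate)


types = ["original", "paraphrase", "meaning change", "distractor"] * 3
all_old = ["old"] * 12                  # a subject who always answers "old"
print(score_svt(types, all_old)[0])     # 0.5
```

A subject who answers "old" to every sentence is correct on half of the items but earns a d' of zero, which shows why d' is treated as free of the subject's response criterion.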
Since the d' data and the combined SVT-confidence rating data were similar to the proportion correct data, only the proportion correct data will be described here.

The results of the study were consistent with the interpretation that the SVT was measuring reading comprehension. Table 8 reports the mean proportion correct SVT scores for the three expertise groups. An analysis of variance indicated that performance improved with greater educational experience. Another result was that reading comprehension scores were significantly higher for upper level undergraduates than for the undergraduates, even though they read the passages in the same amount of time. This suggests that the SVT was sensitive to reading comprehension differences which were not related to the reading time for a passage.

Royer (et al., Note 2) found that SVT performance improved with greater educational experience. They had initially expected SVT performance to improve more for the psychology passages compared to the non psychology passages across the three Groups. They failed to find a significant Content of passage X Group interaction. These results could mean that overall reading "ability" increased in accordance with increasing educational experience, thereby producing better SVT performance, or they could mean that the knowledge required to comprehend both the psychology and the non-psychology passages increased with advancing educational experience, and it was the knowledge gain rather than general ability that produced the pattern of results. The results seem to suggest that the non psychology passages had such wide ranging topics that students with greater education may have increased general knowledge and utilized their world knowledge to comprehend the non psychology passages as well as the psychology passages.

There are numerous experiments which have found that previous knowledge affects the comprehension of text. Previous knowledge may affect the comprehension of text in several ways.
Johnston (Note 1) reported that subjects who had greater knowledge of a topic (compared to subjects with less knowledge of the topic) had higher scores on his reading comprehension test. These subjects did not receive higher test scores when they read text outside their high knowledge area. Subjects also recall greater amounts of a previously heard text if they have more knowledge of the theme of the text (Voss, Vesonder, & Spilich, 1980). The pattern of recall is also similar to previous knowledge of a particular topic. The pattern of recalled ideas may be influenced by the subject's schemata of the topic (Anderson, 1977; Bransford & Franks, 1971; Dooling & Lachman, 1971).

Subjects may also develop previous knowledge of certain types of text. For example, through repeated experience with stories, subjects may learn that stories consist of parts such as: beginnings, settings, reactions, outcomes, and endings. Several experiments have found subjects have better recall for particular parts of stories. The parts of the stories appear to be based upon their previous knowledge of story "scripts" (Mandler & Johnson, 1977; Stein & Glenn, 1979).

The second alternative interpretation of the Royer, Lynch, and Bulgarelli (Note 2) results suggests that the groups may have differed in overall "intelligence" or reading ability. The Royer (et al., Note 2) study could not distinguish between the two alternative interpretations of the results.

The experiment in this thesis stems naturally from the Royer, Lynch, and Bulgarelli (Note 2) study. The thesis experiment is essentially a construct validity experiment investigating whether the SVT is a valid method of assessing reading comprehension. Therefore, it extends the research from both the Royer, Lynch, and Bulgarelli (Note 2) and the Royer, Hastings, and Hook (1980) experiments.

There are three areas of investigation in which the thesis experiment extends previous research on the SVT.
The first area of investigation addressed the association between SVT performance and general ability. The current experiment is designed to reduce the effects of ability on reading comprehension performance. The experiment uses reading comprehension measurements on the same subjects at two different time periods. If there are improvements in reading comprehension performance, it is not likely that the subjects' general ability has also improved. In the Royer, Lynch, and Bulgarelli (Note 2) research, the three groups were at such different academic levels that it was plausible that the subjects could differ in ability. The current experiment uses subjects with a much narrower range of academic experience.

The second extension from previous research is that the current experiment will utilize both SVT and free recall performance as indices of reading comprehension. Both of the previous studies used the SVT as the only method of measuring reading comprehension.

The third extension is that readers in the thesis experiment may develop a knowledge base or schemata which is relevant to interpreting the psychology passages they will be reading. Previous research has shown that subjects develop schemata based upon the organization of text. Most of this research has demonstrated that story schemas affect the recall of story text (Mandler & Johnson, 1977; Stein & Glenn, 1979). The thesis experiment is designed to assess reading comprehension of college students during the time they may be developing schemata of the structure of text from psychology journals. Schemata of psychology text may develop as the students conduct psychology experiments and write lab reports which conform to the general guidelines of psychology journal articles.

The hypotheses of interest in the current experiment build from the previous research areas reported in this chapter.
These research areas demonstrated that the world knowledge of the reader influences their reading comprehension, and the SVT has received support as a valid technique for measuring reading comprehension. Therefore, the experiment reported below addresses two basic hypotheses: 1) There should be improvement in the SVT scores of psychology text (between early and late tests) which is not matched by an improvement in SVT scores of non psychology text. 2) There should be a positive correlation between SVT scores and recall scores. If the results of this experiment are as predicted, the results will provide additional evidence supporting the interpretation that the SVT is a valid method of measuring reading comprehension.

METHOD

Subjects and Design. The 82 subjects in this experiment were students enrolled in an upper division undergraduate psychology course entitled Methods of Inquiry in Psychology. Almost all of the students were psychology majors. The curriculum of the course required students to conduct experiments and write laboratory reports which followed a style similar to the American Psychological Association format for journal articles (cf. Publications Manual, 1974). The students attended two and one-half hours per week of lecture and another two and one-half hours per week of laboratory experiments.

The reading comprehension tests were taken during the regularly scheduled lecture time. There was an early semester test session and a late test session. The students received the tests in a counterbalanced fashion, as noted in Table 9. Students received both a psychology passage and a non psychology passage at each test session.

Materials. All of the materials were selected from a larger pool of passages and SVT tests used in the Royer, Lynch, and Bulgarelli (Note 2) study. Each subject read four passages that were twelve sentences in length.
Two of the passages were concerned with psychology topics and two were concerned with non psychology topics. The passages differed from each other in total number of words and in topic. The psychology passages were re-written psychological abstracts taken from psychology journals published in the late 1960's. The non psychology passages were re-written book reviews from the Non-Fiction in Brief section of the New York Times Sunday Book Review Section appearing in the late 1960's. All of the passages are reproduced in the Appendix.

There were four types of SVT test sentences: original, paraphrase, meaning change, and distractor. An example of each type of test sentence is listed in Table 7. Original sentences were identical to a sentence from the passage. Paraphrase sentences had the same meaning as the original sentence, but the meaning was expressed with different words. Meaning change sentences preserved most of the words of the original sentence, but one or a few words were changed to alter the meaning of the sentence. Distractor sentences were consistent with the general theme of the passage, but unrelated to any original sentence. The distractor sentences were written (intuitively) to be the same as the original sentence in length, difficulty, and syntactical structure.

There were four test forms of the SVT based on each passage. Each form had 12 test sentences assessing comprehension of the passage. The forms were written in several steps. The first step was to write Form 1 so that the first six test sentences were based on the initial six passage sentences, and the next six test sentences were based on the last six passage sentences. The purpose of this ordering was to increase the amount of time intervening between the appearance of an original passage sentence and the appearance of a test sentence based on that original sentence, thereby reducing the possibility that the response to the test sentence would be based on the contents of short term memory.
The second step was to randomize the test sentences within the first six sentences and the second six sentences. The result of this procedure was a test in which the order of test questions did not correspond to the order of sentences in the passage. For example, the first test sentence might be derived from the fourth passage sentence.

The third step was to randomly select one type of test sentence (original, paraphrase, meaning change, or distractor) to appear in the test. The restriction on this procedure was that the total test form had to have an equal proportion of each type of test sentence (i.e., 3 original, 3 paraphrase, 3 meaning change, and 3 distractor). These three steps completed the construction of Form 1.

The order of test sentences on Forms 2, 3, and 4 was based on Form 1. If the first test sentence on Form 1 was derived from the fourth passage sentence, the first test sentence on Forms 2, 3, and 4 also assessed the fourth passage sentence. Therefore, the 1st, 2nd, 3rd .... 12th test sentences on all test forms assessed the same passage sentences as the 12 sentences on Form 1. However, the type of test question varied from one test form to another.

The following example describes how the types of test sentences were chosen for Forms 2, 3, and 4. If the first test sentence of Form 1 was a paraphrase sentence, the first test sentence of Form 2 was randomly selected from one of the remaining types of test sentences. If a distractor sentence was selected from this set for Form 2, the first test sentence on Form 3 was randomly selected from the remaining test sentence types: original or meaning change. If a meaning change sentence was selected on Form 3, the first test sentence on Form 4 would be an original test sentence. All test sentences on each of the test forms were selected in the same manner. By following this procedure, each sentence of the original passage was assessed with each type of test sentence across the four test forms.
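The form construction steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used to write the forms: the thesis describes random selection without replacement for the sentence types on Forms 2, 3, and 4, whereas the sketch swaps in a cyclic rotation of the Form 1 types, which satisfies the same restrictions. The names build_forms and TYPES are hypothetical.

```python
import random

TYPES = ["original", "paraphrase", "meaning change", "distractor"]


def build_forms(n_sentences=12, seed=0):
    """Return four test forms as lists of (passage sentence index, type).

    Assumption: a cyclic rotation of types stands in for the random
    selection without replacement described in the text.
    """
    rng = random.Random(seed)
    half = n_sentences // 2
    # Steps 1 and 2: test sentences for the first six passage sentences
    # come first, and order is randomized within each half.
    first, second = list(range(half)), list(range(half, n_sentences))
    rng.shuffle(first)
    rng.shuffle(second)
    order = first + second
    # Step 3: assign Form 1 types at random, three of each type.
    base = [k % len(TYPES) for k in range(n_sentences)]
    rng.shuffle(base)
    # Forms 2-4 keep the Form 1 order; only the sentence type changes.
    return [
        [(order[pos], TYPES[(base[pos] + shift) % len(TYPES)])
         for pos in range(n_sentences)]
        for shift in range(len(TYPES))
    ]


forms = build_forms(seed=1)
for number, form in enumerate(forms, start=1):
    print("Form", number, form[0])  # first test sentence on each form
```

Because the rotation shifts every position by the same amount on each form, every form keeps three sentences of each type, and every passage sentence is paired with each of the four types exactly once across the forms.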
Another aspect of the materials which is important is the type of answer sheet. After the subjects read a passage, they responded to two answer sheets. The first was a recall answer sheet where the subject wrote his/her best recollection of the passage. The second answer sheet had twelve old or new responses and was used for the sentence verification task. Next to each response was a five point confidence scale to record the degree of confidence the subject had in each old or new response.

Procedure. The subjects were tested early in the semester and late in the semester. The early semester test session was during the 5th week of the course. The late semester test session was during the 12th week of the course. Subjects read the passages and responded to the free recall tasks and SVT tests as a group. The testing was done in the same lecture room, and at the same time, as their regular course lecture.

After the subjects were seated, one test envelope was distributed to each student. Subjects were instructed to write their mother's maiden name on the test envelope. This was done for two reasons: 1) to assure the student that performance on the reading test would not affect their course grade, and 2) so that two different passages could be conveniently distributed to subjects at the late semester test session. The test envelopes had been previously arranged so that there was a balanced distribution of passages and test forms. The test envelopes were stacked so that every other envelope had different passages and SVT tests. For example, the first envelope contained passages 1 & 4 with SVT Form 1, and the second envelope had passages 2 & 3 with SVT Form 1. The third and fourth envelopes repeated the alternating passages, but used SVT test Form 2.
This pattern was repeated through the eighth envelope. The ninth envelope would start the sequence again with passages 1 & 4, SVT Form 1.

The order of materials within each envelope was: 1) directions, 2) first passage, 3) recall answer sheet, 4) SVT for the first passage, 5) second passage, 6) recall answer sheet, 7) SVT for the second passage.

After the subjects read the directions, the experimenter explained how to complete the tests. He stressed that passages could not be reread, and demonstrated on the blackboard at the front of the room how to respond to the four types of SVT test sentences. After questions were answered, he instructed the subjects to start the test. While the subjects were taking the test, the experimenter watched for evidence that the subjects were rereading passages. He did not witness any rereading of the passages.

Most of the subjects completed the test within an hour. The subjects placed all the materials back in the envelopes and returned the envelopes to the experimenter. Subjects left the room after they returned the envelopes. Before the second test session, the experimenter removed the test contents from each envelope, and replaced the contents with different psychology and non psychology passages and tests.

The late semester test session was conducted in essentially the same manner as the early semester test session. The two differences from the early semester session were: 1) subjects selected the envelopes with their mother's maiden name, and 2) after completing the test, each subject received an "experimental credit form". This form was given to their course instructor. Each subject received course credit which was added to their semester average in the laboratory course.

RESULTS and DISCUSSION

The data were analysed to determine whether the following hypotheses were supported.
The first hypothesis was that there should be improvement in psychology text SVT scores between the early and the late test. This improvement in psychology SVT scores should not be matched by improvement on the SVT scores on the non psychology text. The second hypothesis was that there should be a positive correlation between SVT scores and recall scores.

Three dependent variables were analysed assessing sentence verification test performance. The first variable was the proportion of correct test responses. The second variable combined SVT performance with confidence estimates for each test sentence. The third variable was SVT scores converted to d' scores according to the procedure of signal detection analysis (Swets, Tanner, & Birdsall, 1961; Banks, 1970).

Proportion Correct. Mean psychology passage proportion correct scores improved relative to non psychology passages from the early test session to the late test session. Mean psychology scores increased from .750 to .777 whereas non psychology means decreased from .747 to .726. Table 10 indicates mean and difference scores for both individual passages and type of passage for the two test sessions. Panel A of Figure 1 shows the proportion correct SVT scores by passage content at early and late administrations.

Performance was higher on psychology passages (.763) compared to non psychology passages (.737). Performance varied with the type of test sentence, with the highest mean performance on distractor (.857) and original test sentences (.839). Meaning change mean performance was .693, followed by mean performance on paraphrase test sentences (.610). As noted in Table 11, performance improved from the early test session to the late test session on distractor and meaning change test sentences, and performance declined for paraphrase test sentences. The mean performance remained stable for original test sentences across the two test times.
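The first and third dependent variables defined at the start of this section can be made concrete with a short Python sketch. It rests on assumptions that the text does not spell out: that original and paraphrase sentences call for an "old" response while meaning change and distractor sentences call for a "new" response, that a hit is an "old" response to an old-type sentence and a false alarm an "old" response to a new-type sentence, and that rates of 0 or 1 are clamped by 1/(2N) so the z transform stays finite. The function names are hypothetical, and the thesis's own computation may have differed.

```python
from statistics import NormalDist

# Assumption: these two sentence types call for an "old" response.
OLD_TYPES = {"original", "paraphrase"}

def proportion_correct(items):
    """items: list of (sentence_type, response), response is 'old' or 'new'."""
    correct = sum((resp == "old") == (stype in OLD_TYPES) for stype, resp in items)
    return correct / len(items)

def d_prime(items):
    """d' = z(hit rate) - z(false alarm rate) for one SVT answer sheet."""
    old_items = [resp for stype, resp in items if stype in OLD_TYPES]
    new_items = [resp for stype, resp in items if stype not in OLD_TYPES]
    hits = sum(resp == "old" for resp in old_items)
    false_alarms = sum(resp == "old" for resp in new_items)

    def rate(k, n):
        # Assumed correction: keep rates away from 0 and 1 (infinite z scores).
        return min(max(k / n, 1 / (2 * n)), 1 - 1 / (2 * n))

    z = NormalDist().inv_cdf
    return z(rate(hits, len(old_items))) - z(rate(false_alarms, len(new_items)))

# One hypothetical 12-item answer sheet: 5 of 6 hits, 1 of 6 false alarms.
sheet = (
    [("original", "old")] * 3
    + [("paraphrase", "old")] * 2
    + [("paraphrase", "new")]
    + [("meaning change", "new")] * 3
    + [("distractor", "new")] * 2
    + [("distractor", "old")]
)
print(round(proportion_correct(sheet), 3), round(d_prime(sheet), 3))
```

Because d' is computed from the whole distribution of responses on a sheet, it yields one score per SVT rather than one score per test sentence.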
Performance on the sentence types interacted with other factors in the experiment, such as the pair of passages and the order of the test set. The mean performance data for these interactions is shown in Tables 12 and 13.

A hierarchical repeated measures analysis of variance was utilized to test for the Content type (psychology or non psychology passage) X Time of test session interaction. The analysis used the following factors: 2 (Content type) X 2 (Time of administration) X 2 (Pairs of tests) X 4 (Question type on the SVT) X 2 (Groups). Pairs of tests consisted of the psychology and non psychology tests given to the same subject at either the early or late test session. Question type is nested within passage, and paired tests are nested within time of administration. Group, which consisted of the order in which subjects received a particular pair of tests, was a between subject factor.

The Content type X Time of administration interaction was marginally significant at the p=.05 level, F(1, 78)=3.928, while the critical F at p=.05 is 3.976. With this analysis of variance design, there were many sources of variance. The sources of variance which were less relevant to the experimental hypothesis, but which were statistically significant, were as follows: 1) Content of passage, 2) Question type, 3) Question type X Time of test, 4) Question type X Pairs of passage, 5) Question type X Group, 6) Question type X Content X Group, 7) Question type X Content X Pairs. The F value of each effect and the appropriate level of significance are reported with the complete analysis of variance table listed in Table 14.

Combination Variable. Following each SVT response, subjects rated their degree of confidence in that response on a 5 point scale. (A mark of "5" indicated that the subject was "very sure" of their old/new response, while "1" indicated they were "not at all sure" of the response.)
The second dependent variable assessing SVT performance was the product of the confidence rating and the SVT performance. (Subjects received a "1" for a correct response and a "-1" for an incorrect response to a SVT test sentence.) The possible range of scores of this combination variable was from -5 to +5 with no zero point.

As shown in Table 15, mean performance for psychology passages increased from a mean of 2.20 at the early test session to a mean of 2.46 at the late test session. Mean performance for non psychology passages declined from a mean of 2.19 at the early test session to a mean of 2.03 at the late test session. Panel B of Figure 1 displays the pattern of the combined variable data.

The mean combined variable score for psychology passages (2.33) was greater than the mean combined variable score for non psychology passages (2.11). Mean performance was highest on original test sentences (3.10) and distractor test sentences (3.02). Meaning change test sentences had a mean score of 1.76, while paraphrase test sentences had a mean score of 1.00.

The mean performance on the sentence types varied with the time of test session. Table 16 shows that mean scores increased between the early and late test sessions for meaning change and distractor test sentences, while mean scores declined for original and paraphrase test sentences. Additional interactions between sentence type and other factors are shown in Tables 17 and 18.

The combination variable was analysed using the same hierarchical analysis as the proportion correct variable. The interaction of Content type X Time of test session was not significant at the p<.05 level. Several other effects were statistically significant, even though they are less relevant to the experimental hypotheses. The following sources of variance were statistically significant: 1) Question type, 2) Question type X Time, 3) Question type X Group, 4) Question type X Pairs of passages, 5) Question type X Content X Pairs of passages.
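The combination variable defined at the start of this subsection (a score of 1 or -1 for the SVT response, multiplied by the 1 to 5 confidence rating) is simple enough to sketch directly. The function names below are hypothetical and the sketch only illustrates the scoring rule as described; it is not the scoring program used in the study.

```python
def combined_score(correct: bool, confidence: int) -> int:
    """Product of SVT accuracy (+1 or -1) and the 1-5 confidence rating."""
    if confidence not in range(1, 6):
        raise ValueError("confidence must be an integer from 1 to 5")
    return confidence if correct else -confidence

def mean_combined(responses):
    """responses: list of (correct, confidence) pairs for one answer sheet."""
    scores = [combined_score(correct, conf) for correct, conf in responses]
    return sum(scores) / len(scores)

# Hypothetical responses: a confident hit, an unsure error, a moderate hit.
print(mean_combined([(True, 5), (False, 1), (True, 3)]))
```

Since the rating never takes the value 0, the product can never be 0, which is why the scale runs from -5 to +5 with no zero point. A confident error is penalised more heavily than an unsure one.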
The F values and levels of statistical significance are listed with the complete source of variance table in Table 19.

Table 20 shows the mean confidence ratings per test sentence for correct or incorrect SVT responses within psychology and non psychology tests at the two test times. Confidence ratings increased from early to late test sessions for both psychology and non psychology passages. However, confidence ratings increased more for incorrect sentences than they did for correct sentences on both psychology and non psychology tests. There is no obvious explanation for why subjects would become more confident in their responses which were scored incorrect compared to responses which were scored correct.

d' scores. The d' scores improved for both psychology and non psychology passages from the early semester test to the late semester test. Table 21 reports the mean d' scores summed across the two passages within each content area for the two test sessions. Panel C of Figure 1 presents a graphic pattern of the d' scores.

The analysis of d' scores required a different analysis of variance design from the design used with the proportion correct and combination variables. The d' scores are based upon the distribution of correct responses to the SVT test sentences, whereas the proportion correct variable and the combination SVT and confidence rating variable are based upon each test sentence from the SVT. The restriction of 1 d' score per SVT required an ANOVA design with fewer factors than the ANOVA design used with the proportion correct variable. The analysis by necessity could not have a sentence type factor. The d' variable was analysed with a 2 (Content of passage) X 2 (Time of test session) analysis of variance. The interaction of Content type X Time of test was not significant at the p<.05 level. The complete sources of variance table is listed in Table 22.

Free Recall.
The dependent variable assessing free recall was the proportion of correct idea units in the subject's recall protocol. Idea units were intuitively derived for each passage. The idea units for one passage are listed in Table 23.

The mean proportion of correct idea units remained stable for non psychology passages (.33), but decreased for the psychology passages from .36 at the early test session to .32 at the late test session. Table 24 reports the mean recall scores for psychology and non psychology passages at the early and late test sessions.

The recall variable was analysed with the same ANOVA design as the d' variable. This analysis found a significant Time X Content of passage interaction, F(1,81)=4.46, p<.05. No other effects were significant. The complete analysis of variance table is listed in Table 25.

The recall protocols were initially analysed for the proportion of correct idea units by the experimenter. In order to check the reliability of idea unit scoring, an independent evaluator rescored 112 free recall passages. The overall correlation between the two scorers was r=.84 (p<.001).

The second major hypothesis of interest predicted a positive correlation between SVT performance and free recall performance. Correlations between proportion correct SVT scores, d' scores and recall scores indicated significant relationships (p<.01) for both combinations. Table 26 reports these correlations. The overall correlation between SVT proportion correct scores and recall scores was r=.37 (p<.001). This correlation is based upon 328 observations (82 subjects responding to 4 passages).

Conditional Probability. The subjects free recalled the passages prior to responding to the SVT. Conditional probabilities were calculated to assess the degree of relationship between recalling particular information and the assessment of that information on the SVT. In order to calculate the conditional probability of responding to the SVT given recall of idea units, the tests of 64 subjects were randomly selected for analysis.
This selection allowed for 32 subjects from the early test session and 32 subjects from the later test session. Within the early or late test session, there was an equal proportion of tests (8) for each passage. Pooling across the two test times, the analysis of conditional probabilities was based upon 16 subjects per passage.

In most cases the number of idea units in a sentence was greater than one. A subject did not receive credit for recalling an entire sentence if they recalled less than 50% of the idea units in a sentence. If they recalled 50% or more of the idea units in that sentence, they were credited with correct recall of the sentence.

Table 27 lists the conditional probabilities of SVT performance by content type given previous recall performance. As can be seen, the likelihood of correct performance on the SVT is greater if the subject had correctly recalled at least 50% of the idea units in the sentence.

The pattern of the conditional probability data is reasonable. The overall mean proportion correct SVT score (.757) may be considered in examining the conditional probability data. The probability of correctly responding to the SVT task once the subject had correctly recalled at least 50% of the idea units was larger (.81) than the overall mean performance on the SVT. However, if the subjects did not recall at least 50% of the idea units of the sentence, they had a probability of .72 that they would respond correctly to the SVT test sentence. This conditional probability is lower than the overall mean performance on the SVT. This relationship is reasonable because it is what one would expect if subjects either comprehended or failed to comprehend the passages. Those subjects who comprehend the text well should receive higher scores on both the recall task and the SVT compared to overall mean performance on the SVT.
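The 50% crediting rule and the two conditional probabilities discussed above can be illustrated with a brief Python sketch. The record layout (idea units recalled, idea units in the sentence, SVT response correct) and the function names are assumptions made for the illustration, and the sample records are invented to exercise the code; they are not data from the study.

```python
def sentence_recalled(units_recalled, units_total, criterion=0.5):
    """Credit a sentence as recalled if at least `criterion` of its idea units were recalled."""
    return units_recalled / units_total >= criterion

def conditional_probabilities(records, criterion=0.5):
    """records: iterable of (units_recalled, units_total, svt_correct).

    Returns (P(SVT correct | recalled), P(SVT correct | not recalled));
    an entry is None if no sentence falls in that condition.
    """
    recalled, not_recalled = [], []
    for units_recalled, units_total, svt_correct in records:
        bucket = recalled if sentence_recalled(units_recalled, units_total, criterion) else not_recalled
        bucket.append(svt_correct)

    def proportion(flags):
        return sum(flags) / len(flags) if flags else None

    return proportion(recalled), proportion(not_recalled)

# Invented records for five test sentences.
sample = [(2, 4, True), (1, 4, True), (3, 3, False), (0, 2, False), (1, 2, True)]
print(conditional_probabilities(sample))
print(conditional_probabilities(sample, criterion=0.75))
```

Passing a stricter `criterion` (for example 0.75) shows how the split between recalled and not recalled sentences, and therefore both probabilities, depends on the crediting rule.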
Following the same reasoning, those subjects who do not comprehend the text as well should receive lower scores on both the recall task and the SVT test compared to overall mean performance on the SVT.

The conditional probabilities reported here were based upon less than perfect recall (i.e., 50% of the idea units in a sentence). If a more stringent criterion had been adopted (i.e., 75% of the idea units in a sentence), the pattern reported above may have been quite different.

FINAL DISCUSSION

This section is organized into two general areas of discussion corresponding to the two experimental hypotheses. The first section considers the performance on the SVT tests. The second section discusses the recall data and the correlations between recall and SVT performance.

Pattern of SVT scores. The first hypothesis was that there should be improvement in the SVT scores for the psychology passages between the early and late test sessions which was not matched by change in the SVT scores for the non psychology passages. All three dependent variables indicated performance in the predicted direction, but only the proportion correct dependent variable interaction neared significance. The mean scores of the other two variables assessing SVT accuracy present a pattern which supports the hypothesis, although not clearly. The d' scores (Table 21) and mean combined SVT and confidence rating scores (Table 15) showed greater differences between psychology and non psychology passage comprehension at the late test session compared to the early test session. The mean d' scores and the mean combined SVT and confidence rating scores improved more for the psychology passages than the non psychology passages.
Figure 1 illustrates the greater differences at the late test session compared to the early test session for all three variables.

Although the SVT performance was in the direction predicted by the hypothesis, the interaction was only marginally significant when proportion correct was the dependent variable. There was no significant Content type X Time of test interaction for either the d' variable or the combined SVT and confidence rating variable. The failure to find a statistically significant interaction may be due to several factors. The most plausible will be discussed below.

One of the issues raised earlier in this chapter was that reading comprehension is aided by the reader's schemata of text organization (cf. Mandler & Johnson, 1977; Stein & Glenn, 1979). The students in this psychology Methods class may very well have acquired knowledge of the schematic organization of psychology articles. The laboratory experiences and written assignments required them to learn the format of major sections of psychology articles: introduction, method, results, and discussion. The APA format was repeatedly stressed throughout the course.

However, the psychology passages which were used in this experiment did not strictly conform to this APA format. The passages did not have headings indicating different research journal sections. Neither did the passages necessarily follow the APA sequence of introduction, method, results, and discussion sections. It is conceivable that the subjects' performance would have improved if there were schematic markers in the passage corresponding to the APA format.

Another related factor concerns the type and amount of "psychology knowledge" acquired by the subjects. The psychology passages represent research areas which were not in the Methods course curriculum or laboratory experiences.
The two psychology passages in this experiment concerned "teaching young children bilingual abilities" and a "project utilizing correlational analysis" for the "identification of superior students" as first born children. Whereas their Methods course dealt with "operant conditioning", "transfer of learning tasks", and an "independent project". Because the experiment passages did not match lab content, the students may not have gained enough functional knowledge from their course which was relevant to the experimental test.

Free Recall and SVT performance. The second hypothesis of this experiment was that there should be a positive correlation between recall scores and SVT scores. Consistent positive correlations were found between recall performance and both proportion correct and d' scores for both psychology and non psychology passages (Table 26).

Free recall performance is a commonly accepted method of measuring the subject's memory of text. Kintsch and his colleagues (Kintsch & van Dijk, 1978; Kintsch & Vipond, 1979; 1980) utilize free recall protocols in assessing reading comprehension. Further evidence supporting the relationship between free recall and reading comprehension was found by Bransford and Johnson (1972). They found that readers' subjective estimates of their level of reading comprehension after reading a passage corresponded to their level of recall performance. The positive correlations between free recall scores and SVT scores contribute evidence to the validity of the SVT as a method of measuring reading comprehension.

There was a significant Time X Content of passage interaction with recall scores. Mean recall performance declined for psychology passages between the early test session and the late test session, but the mean recall scores remained stable for the non psychology passages (Table 24). There is no clearly evident explanation for this pattern of results.
However, one explanation which might account for the pattern is that subjects may have been less motivated to write complete psychology protocols at the late test session than they were at the early test session. The students may have been more highly motivated to complete the task at the early test session for two reasons. First, the task was a novel experience. And second, some students may have believed performance on the tasks would affect their course grade. By the late test session the experiment may not have been intriguing, and students may have decided their course grades would not be affected by their performance in the experiment.

CONCLUDING REMARKS

This chapter of the thesis has presented evidence which contributes to the conclusion that the sentence verification technique is a valid method of measuring reading comprehension. The experiment presented here is important because it complements previous research which supported the construct validity of the SVT. At this date, experimental research has demonstrated that the SVT is sensitive to 1) increased difficulty level of text and reading skill (Royer, Hastings, & Hook, 1980), and 2) ability differences in reading comprehension of readers with different degrees of subject matter expertise when they read passages within or outside their presumed knowledge area (Royer, Lynch, & Bulgarelli, Note 2).

The experiment in this chapter has demonstrated two more points which suggest the SVT is a valid method of measuring reading comprehension: 1) The SVT is sensitive to changes in reading comprehension matched with subjects who increase their knowledge that applies to text in a particular subject matter. This increase in knowledge is presumably not an increase in general intellectual ability. 2) Responses to the SVT are positively correlated with free recall of the same text. These studies
These are building an increasingly stronger argument that the SVT is comprehension a valid method of measuring reading TABLES AND FIGURES i 76 76 TABLE M A p™p tS EITHER St WITH : 1 COMPREHENSION TEST PERFORMANCE PASSAGES OR WITHOUT PASSAGES a With Passages Name of Test Without Passage Chance b Nelson Reading 45. 96 29. 36 18. 75 California Achievement Test 26. 66 14. 36 10. 10 SRA Achievement Test 37. 17 22. 17 15. 00 29. 54 22. 27 11. 25 Metropolitan Achievement Test (Intermediate) 28. 82 20. 27 11. 25 Iowa Test Basic Skills 27. 05 19. 29 11. 50 Metropolitan Achievement Test (Elementary) 'Cited in Tuiman 1 (1973-1974). 'Mean test score estimated on the basis of chance. Chance is defined as n/4 where n is the number of test items. This test contained a few * 5 choice test items. c 77 TABLE 2 STRUCTURAL VARIABLES USED BY DUNN fET AL " TO PREDICT TEST PERFORMANCE Passage Components Percent Content Words Unique Information Percent Content-Function Words Average Sentence Length Stem Components Percent Percent Percent Percent Content Words New Content Words Non-Dale-Chall Words Content Function Words Correct Choice Components Percent Content Words Percent New Content Words Percent Non-Dale-Chall Words External Information Incorrect Choice Components Percent Content Words Percent New Content Words Percent Non-Dale-Chall Words Plausibility 78 TABLE 3 THE PREDICTION OF READING OR APTITUDE TEST PERFORMANCE FROM EARLIER TESTS a Variable Being Predicted 7th Grade Otis Added Predictor Gr. Gr. Gr. Gr. Gr. Gr. Gr. 1: 2: 3: 4: 5: 6: 7: California Mental Maturity 8th Grade Reading Increase . 375 . 624 . R Increase . 334 .251 . 522 . 188 700 .076 . 616 . 094 . 784 . 084 . 708 . 092 Otis Beta . 823 .039 . 726 . 018 Stanford Reading . 843 . 020 . 778 . 052 . 808 . 
030 Metropolitan Reading Otis Alpha Metropolitan Reading Otis Beta Cited in Thorndike (1973-1974) 79 TABLE 4 TEST PERFORMANCE AS A FUNCTION OF TYPE OF TEST QUESTION AND DELETED TEXT Type of Test Question Type of Text Complete Cloze a Multiple Choice b 26. 33 25.17 30% reduced 5.98 21.90 50% reduced 6.64 19. 33 Tuiman, Blanton, Gray (1975) b- Tuiman, Gray (1972) * 80 TABLE 5 WIDE RANGE READING TEST AND METROPOLITAN READING FROM THE DIRECT INSTRUCTION PROJECT TEST SCORES Grade Level Pre K Wide Range Test Metropolitan Test Test scores are percentiles. 18 Post 83 40 3 81 TABLE 6 SAMPLE OF TEST SENTENCES USED BY SACHS a BaSG: Immoral Semantic: Lexical: fatheTS consider ed owning slaves to b The founding fathers didn't consider owningg slaves to be immoral. Passive/Active: Formal: 1112 Owning slaves was considered to be immoral by the founding fathers. The founding fathers considered owning slaves immoral. The founding fathers thought owning slaves to be immoral. Cited in Sachs (1974) 82 TABLE 7 SAMPLE OF TEST SENTENCES USED IN THE SVT Original: Then suddenly, one windy, cold day, the bright § leaves tumble to the ground in a goldenshower Paraphrase Then abruptly, on some gusty, brisk day leaves fall from the trees like a colorful : the rain.' Meaning Change: Then suddenly, one windy, cold day, the dead branches tumble to the ground in a dangerous shower Distractor: Jerry collects the brightest, mo st colorful leaves for his mother to use in her Fall decorations 83 TABLE 8 MEAN PROPORTION CORRECT SVT SCORES BY CONTENT TYPE AND EXPERTISE OF SUBJECTS 3 Group Non-Psychology Content Type Maj or Undergrads Psychology Non-Psychology Cited in Royer, Lynch Advanced Psychology Undergrads Psychology Graduate Students .78 .81 .88 75 .80 .85 . $ Bulgarelli (Note 2) 84 TABLE 9 PAIRS OF TESTS ADMINISTERED THROUGHOUT THE SEMESTER Time of Test Session Group Early A B Late 1 § 4 2 § 3 2 § 3 1 $ 4 re£lec Psychology tests, while tests J reflect non-psychology tests. 2 4 ! 
TABLE 10
MEAN PROPORTION CORRECT SVT SCORES BY PASSAGE AND TIME OF ADMINISTRATION

                    Time of Administration
Passage Type        Early        Late
Non-Psychology

Note. Cell means are based upon 41 subjects.

Figure 1. (A) Mean SVT Proportion Correct, (B) Combined SVT and Confidence Rating, and (C) d' Scores by Content Type and Time of Test Session. Each panel plots Psychology and Non-Psychology passages at the Early and Late test sessions.

TABLE 11
MEAN PROPORTION CORRECT SVT SCORES BY QUESTION TYPE AND TIME OF ADMINISTRATION

                    Time of Administration
Question Type       Early        Late
Original            .839         .840
Paraphrase          .650         .571
Meaning Change      .663         .724
Distractor          .843         .873

Note. All cells are based upon 164 responses.

TABLE 12
MEAN PROPORTION CORRECT SVT SCORES BY QUESTION TYPE, CONTENT OF PASSAGE AND GROUP

                    Group 1 a    Group 2 b
Psychology
  Original            .847         .844
  Paraphrase          .701         .579
  Meaning Change      .734         .680
  Distractor          .846         .874
Non-Psychology
  Original            .809         .855
  Paraphrase          .680         .479
  Meaning Change      .651         .709
  Distractor          .863         .847

a Group 1 received passages 1 & 4 in the early test session and passages 2 & 3 in the late test session.
b Group 2 received passages 2 & 3 in the early test session and passages 1 & 4 in the late test session.
Note. All cells are based upon 82 responses.

TABLE 13
MEAN PROPORTION CORRECT SVT SCORES BY QUESTION TYPE, CONTENT OF PASSAGE, AND PASSAGE PAIR

                    Pair 1 a     Pair 2 b
Psychology
  Original            .840         .851
  Paraphrase          .596         .687
  Meaning Change      .684         .730
  Distractor          .853         .867
Non-Psychology
  Original            .826         .838
  Paraphrase          .517         .643
  Meaning Change      .746         .613
  Distractor          .859         .850

a Passages 1 & 4.
b Passages 2 & 3.
Note. All cells are based upon 82 responses.
' 2 91 TABLE 14 ANALYSIS OF VARIANCE TABLE OF PROPORTION CORRECT SVT PERFORMANCE Mean Square Time (T) Subjects (S) Group (G) S:G Pairs (P) Error a Contents 79 1 78 1 78 (C) 1 C x T C x S C x G 1 79 1 CS:G Error b Question Type Q x T Q x S Q x G QS:G Error 78 78 (Q) 3 3 237 3 c Q x C Q x C x T Q x C x S Q x C x G 234 3 3 237 3 117 1.939 . 3. 720 1. 153 .26450 .20910 .06460 00226 .06275 05323 4.969 3.928 1.214 4.44272 26046 08376 .55146 .07776 .29332 .05974 74. 368 .05607 06820 05947 .17203 05803 19810 05885 .953 . .05 .042 1.179 . . 3 . . . 234 Q x P .00957 15898 30505 15712 .09453 .08201 1 . . QCS:G 234 Q x C x P Error d 3 . 234 . . 360 402 231 1.302 .001 4. 1. 9. .001 4.910 .01 1.159 1.011 2. 923 .986 3. 366 .01 .05 .05 92 TABLE 15 MEAN COMBINED SVT AND CONFIDENCE RATING BY CONTENT TYPE AND TIME OF ADMINISTRATION Time of Administration Note_. Content Type Early Late Psychology 2. 20 2. 46 Non-Psychology 2. 19 2.03 All cells are based on 82 subjects. TABLE 16 MEAN COMBINED SVT AND CONFIDENCE RATING BY QUESTION U lim TYPE AND TIME OF ADMINISTRATION ^ ^ Time of Administration Question Type Paraphrase Note_. All cells are based upon 164 responses. TABLE 17 MEAN COMBINED SVT AND CONFIDENCE RATING BY QUESTION TYPE CONTENT OF PASSAGE, AND GROUP Content Type Psychology Original Paraphrase Meaning Change Distractor 3.18 1.74 2.10 3.02 3.01 2.97 3.23 . 75 1.69 3.18 Non-Psychology Original Paraphrase Meaning Change Distractor 1 .64 1.46 2.99 - .13 1.83 2.89 Group 1 received passages 1 § 4 in the early test session and passages 2 § 3 in the late test session. Group 2 received passages 2 § 3 in the early test session and passages 1 § 4 in the late test session. Note. All cells are based upon 82 responses. 95 TABLE 18 MEAN COMBINED SVT AND CONFIDENCE RATING BY QUESTION TYPE CONTENT OF PASSAGE, AND PASSAGE PAIR Content Type Pair 1 Pair Psychology Original Paraphrase Meaning Change Distractor 3.04 2.93 3.15 1.60 2.07 3.26 3.06 3. 14 .14 2.29 1.37 1.00 3. 14 2 .90 1. 
72 Non- Psychology Original Paraphrase Meaning Change Distractor Passages 1 § 4 Passages 2 $ 3 Note_. All cells are based upon 82 responses . 74 2 ' 96 TABLE 19 ANALYSIS OF VARIANCE TABLE OF COMBINED SVT AND CONFIDENCE RATING Mean Square Time (T) Subjects (S) Group (G) S:G Pairs (P) Error a Contents 1 79 78 1 78 1 79 Q x T Q x S Q x G QS:G Q x C Q x C x T Q x C x S Q x C x G . 6.43 3. 43 1 . . . . 75 62 46 01 47 330. 60247 82. 27 3 25.91850 6. 50292 39. 28163 6. 08268 29. 05354 4. 01849 6.45 1.63 237 3 3 c .53 6 59 19. 03 3 234 Q x P Error 78 78 (Q) 24496 .12403 4. 29779 9. 16050 4. 1 CS:G Error b . 16. 02498 15. 63797 1 C x T C x S C x G Question Type . 1 (C) .98235 12 17908 35 15889 11. 88447 6.33939 1. 18480 234 3 3 237 3 QCS:G 234 Q x C x P Error d 3 234 3.42911 4.39724 4.14361 9. 88652 4.06998 17.38122 4. 08322 .001 .005 9. 78 1. 51 .001 7.23 .001 .84 1.08 1.01 2.42 1.00 4.26 .01 97 TABLE 2 0 MEAN CONFIDENCE RATINGS PER SVT TEST SENTENCE AND INCORRECT SVT RESPONSES BY CONTENT FOR CORRECT TYPE AND TIME OF TEST SESSION Time of Test Sessi on Content Type Early- Late 4.23 3.77 4.29 3.95 Psychology Non-Psychology Correct Incorrect TABLE 21 MEAN D PRIME SCORES BY CONTENT TYPE AND TIME OF ADMINISTRATION Time of Administration Note. Content Type Early Late Psychology 1.56 1.71 Non-Psychology 1.56 1.69 All cells are based on 82 subjects. TABLE 2 2 ANALYSIS OF VARIANCE TABLE d' SCORES ON SVT TESTS Source Time (T) Content (C) Subject X T Subject X C Time X Content Subject X T X C df Mean Squar 1 1. 76 1 81 81 1.01 1.57 1.42 1 .44 81 1.07 100 IDEA UNITS FOR PSYCHOLOGY TEST #2 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. Recently, an experimental method was developed for teaching young children bilingual abilities they were instructed in French (exclusively! they used English at home. 
4. There were two monolingual control groups
5. each was instructed in their maternal language
6. the control groups/experimental were matched by environment and socio-economics
7. also matched for IQ
8. Instruction was for 2 years
9. They were tested for communication skills
10. Controls were tested in their maternal language
11. Experimentals were tested in both French and English
12. One experiment measured abilities as decoders of novel information
13. the other experiment measured encoding
14. Experimentals did as well as controls in both experiments
15. The apparent result of the experiment was that young children instructed exclusively in a foreign language
16. could apply abilities developed mainly through teacher pupil interaction
17. (apply to) non-academic, peer to peer communications
18. there was no decrement of maternal language performance
19. This contradicts the idea that learning a second language will handicap the first
20. The handicap may become evident over time as increasing complexity of the language requires more complex communication skills.

TABLE 24

MEAN RECALL SCORES BY CONTENT TYPE AND TIME OF TEST SESSION

                     Time of Test Session
Type of Passage      Early      Late
Psychology           36         32
Non-Psychology       33         33

TABLE 25

ANALYSIS OF VARIANCE TABLE OF RECALL SCORES

Source               df     Mean Square      F       p
Time (T)              1       228.890       1.78    n.s.
Content (C)           1        36.890        .39    n.s.
Subject X T          81       128.841
Subject X C          81        95.606
Time X Content        1       487.805       4.46    .05
Subject X T X C      81       109.447

TABLE 26

CORRELATIONS BETWEEN SVT AND RECALL PERFORMANCE BY TIME OF ADMINISTRATION AND CONTENT TYPE FOR D PRIME AND PROPORTION CORRECT SCORES

Columns: Time of Administration (Early: Psych., Non-Psych.; Late: Psych., Non-Psych.).
Rows (Variable): Free recall and Proportion Correct; Free recall and D Prime Scores.

Cell values as extracted (row and column alignment lost in the scanned source): 12 32 13 14 (n.s.) .49 .43* .46* 56 .38* .33*

*p < .01

Note. All cells are based on 82 subjects.

TABLE 27

CONDITIONAL PROBABILITIES OF SVT PERFORMANCE BY CONTENT TYPE GIVEN RECALL PERFORMANCE

Row heading: Content Type. Columns: Correct/Correct (a); Correct/Incorrect (b). The cell values are missing from the scanned source.

a The probability of answering correctly on the SVT having correctly recalled at least 50% of the idea units for that sentence.
b The probability of answering correctly on the SVT having failed to recall at least 50% of the idea units for that sentence.

Note 1. The mean proportion correct performance on psychology passages was .768. Mean proportion correct performance on non-psychology passages was .744. The mean proportion of correctly recalled idea units on psychology passages was .34. The mean proportion of correctly recalled idea units on non-psychology passages was .33.

Note 2. Each of the cells in the above table is based upon 32 subjects.

BIBLIOGRAPHY

Anderson, R.C., Schema-directed processes in language comprehension. Center for the Study of Reading, Technical Report 50, University of Illinois, 1977.

Banks, W.P., Signal detection theory and human memory. Psychological Bulletin, 1970, 74 (2), 81-99.

Becker, W., Teaching reading and language to the disadvantaged: What we've learned from field research. Harvard Educational Review, November 1977, 47 (4), 518-543.

Becker, W.C., Carnine, D.W., Direct instruction: A behavior theory model for comprehensive educational intervention with the disadvantaged. Paper presented at the Eighth Symposium on Behavior Modification, Caracas, Venezuela, February 1978.

Bransford, J.D., Franks, J.J., The abstraction of linguistic ideas. Cognitive Psychology, 1971, 2, 331-350.

Bransford, J.D., Johnson, M.K., Contextual prerequisites for understanding: Some investigations of comprehension and recall. Journal of Verbal Learning and Verbal Behavior, 1972, 11, 717-726.

Carroll, J.B., Defining language comprehension: Some speculations. In J.B. Carroll and R.O.
Freedle (Eds.), Language comprehension and the acquisition of knowledge. Washington, D.C.: Winston & Sons, 1972.

Dooling, D.J., Lachman, R., Effects of comprehension on retention of prose. Journal of Experimental Psychology, 1971, 88, 216-222.

Drum, P.A., Calfee, R.C., Cook, L.K., The effects of surface structure variables on performance in reading comprehension tests. Reading Research Quarterly, 1981, 16 (4), 486-514.

Elias, M.F., Elias, P.K., Elias, J.W., Basic processes in adult developmental psychology. St. Louis: C.V. Mosby, 1977.

Guice, B.M., The use of the cloze procedure for improving reading comprehension of college students. Journal of Reading Behavior, 1969, 1 (3), 81-92.

Harootunian, B., Intellectual abilities and reading achievement. The Elementary School Journal, 1966, 66, 386.

Holmes, J.A., Singer, H., The substrata-factor theory: Substrata factor differences underlying reading ability in known groups at the high school level. U.S.O.E. Cooperative Research Project No. 538, S.A.E. 8176, 1961.

Kagan, J., Lang, C., Psychology and education: An introduction. New York: Harcourt Brace Jovanovich, 1978.

Kintsch, W., On modeling comprehension. Educational Psychologist.

Kintsch, W., van Dijk, T., Toward a model of text comprehension and production. Psychological Review, 1978, 85, 363-394.

Mandler, J.M., Johnson, N.S., Remembrance of things parsed: Story structure and recall. Cognitive Psychology, 1977.

Pearson, P.D., Johnson, D.D., Teaching reading comprehension. New York: Holt, Rinehart & Winston, 1978.

Popham, W.J., Criterion-referenced measurement. Englewood Cliffs, N.J.: Prentice-Hall, 1978.

Publication Manual of the American Psychological Association (2nd Edition). Washington, D.C.: American Psychological Association, 1974.

Royer, J.M., Cunningham, D.J., On the theory and measurement of reading comprehension. Contemporary Educational Psychology, 1981, 6, 187-216.

Royer, J.M., Hastings, N., & Hook, C., A sentence verification technique for measuring reading comprehension. Journal of Reading Behavior, 1979, 11, 355-363.

Sachs, J.S., Memory in reading and listening to discourse. Memory and Cognition, 1974, 2, 95-100.

Sachs, J.S., Recognition memory for syntactic and semantic aspects of connected discourse. Perception and Psychophysics, 1967, 2, 437-442.

Sassenrath, J.M., Alpha factor analysis of reading measures at the elementary, secondary, and college levels. Journal of Reading Behavior, 1972-1973, 5 (4), 304-316.

Schank, R., Abelson, R., Scripts, plans, goals, and understanding. Hillsdale, N.J.: Erlbaum, 1977.

Singer, H., Substrata-factor patterns accompanying development of power of reading, elementary through college level. The Philosophical and Sociological Bases of Reading, Fourteenth Yearbook of the National Reading Conference, 1964.

Singer, H., Substrata-factor reorganization accompanying development in speed and power of reading at the elementary school level. U.S.O.E. Cooperative Research Project No. 2001, 1965.

Stein, N.S., Glenn, C.G., An analysis of story comprehension in elementary school children. In R. Freedle (Ed.), New directions in discourse processing. Hillsdale, N.J.: Ablex, 1979.

Swets, J.A., Tanner, W.P., Birdsall, T.G., Decision processes in perception. Psychological Review, 1961, 68, 301-340.

Thorndike, R.L., Reading as reasoning. Reading Research Quarterly, 1973-1974, 9, 135-147.

Tuinman, J.J., Determining the passage dependency of comprehension questions in five major tests. Reading Research Quarterly, 1973-1974, 9 (2), 206-223.

Tuinman, J.J., Blanton, W., Gray, G., A note on cloze as a measure of comprehension. Journal of Psychology, 90 (20), 159-162.

Tuinman, J.J., Gray, G., The effect of reducing the redundancy of written messages by deletion of function words. Journal of Psychology, 1972, 82, 299-306.

Vipond, D., Micro- and macroprocesses in text comprehension. Journal of Verbal Learning and Verbal Behavior, 1980, 19, 276-296.

Voss, J.F., Vesonder, G.T., Spilich, G.J., Text generation and recall by high knowledge and low knowledge individuals. Journal of Verbal Learning and Verbal Behavior, 1980, 19, 651-667.

Weaver, W.W., Kingston, A.J., A factor analysis of the cloze procedure and other measures of reading and language. Journal of Communication, 1963, 13 (Dec), 252-261.

REFERENCE NOTES

1. Johnston, P., Question type and the assessment of reading comprehension. Paper presented at the annual meeting of the American Educational Research Association, New York, March 1982.

2. Royer, J.M., Lynch, D.J., Bulgarelli, C., Using the sentence verification technique to assess the comprehension of technical text. Paper presented at the annual meeting of the American Educational Research Association, New York, March 1982.

APPENDIX

Psychology Passage 1

Between 1956 and 1965, teachers and counselors in 90 schools used a list of 14 behavioral criteria to select 1,503 ninth-grade students to participate in a special counseling program for superior students. Students who were selected generally ranked in the top 5% of their class and above the 95th percentile on standard measures of academic performance. The selected students' birth orders were compared with census figures and with chance expectancies based on the number of children in their families. Significant over-representations of firstborns were found for 9 of the 10 years and for every family size. Because of this consistency over a ten-year period, variability due to cultural change could be practically disregarded. Furthermore, the selected students represented only about 1 out of 5 possible students who ranked in the top 5% of their class. Although the study did not assess the possibility, it seems quite likely that firstborns would not be significantly over-represented in the entire top 5% of their class.
The excess of firstborns among the selected students may have reflected their teachers' judgments of their academic performance in ways other than conventional measures of academic performance or ranking. It is known that a strong relationship exists between being firstborn and having high levels or drive states on several traits, such as achievement motivation, seriousness, and adult orientation. Behavioral differences such as these between firstborns and other offspring could have been instrumental in the teachers' selection process. Thus, behavioral differences specific to firstborns, rather than other factors such as superior intelligence, may have accounted for the significant over-representation of firstborns among the students selected for participation in the laboratory. These factors may also have accounted for the striking similarities which were found between the over-representation of firstborns in the selected population reported in this study, and the over-representation of firstborns in populations of eminent persons reported in previous studies.

Psychology Passage 2

Recently, an experimental method was developed for teaching young children bilingual abilities by instructing them exclusively in a second language (French), while having them use their native language (English) at home and outside the school. The children in this experimental group were compared with two monolingual control groups who were instructed only in French or English depending on which was their maternal language. The experimental group and the control groups were matched by socio-economic, environmental, and IQ criteria to avoid confounding factors. After two years of instruction for all groups, the groups were tested for communication skills. The monolinguals, of course, were tested only in their maternal language, while the experimental bilinguals were tested in both French and English. One experiment examined their abilities as decoders of novel information.
The other experiment tested their proficiency of encoding. In both instances, the experimental bilinguals were found to be as capable as the matched monolingual control groups. The apparent result of the experiment was that young children instructed exclusively in a foreign language could apply abilities developed mainly through teacher-pupil interaction to non-academic peer-to-peer communications. This occurred with no decrement in maternal language performance. This evidence contradicts the notion that a bilingual's progress in one language will be balanced or offset by a handicap in the other. Of course, the handicap may become more evident over time as increasing complexity of the languages requires higher levels of mastery in communication skills.

Non-Psychology Passage 3

Edmund Halley (apparently pronounced "Haw-lee") was one of the greatest of the seventeenth century astronomers. According to his biographer, Mr. Ronan (a British science writer and editor), the name Halley is familiar, of course, because of the comet named after him. The comet, sighted as far back as 239 B.C., was most recently seen in 1910. However, most people do not realize that in addition to his work on the comet, it was Halley who first made use of Newton's mechanical equations--published in 1687--to predict that the comet would return at Christmastime, 1758. And return it did--it was first sighted on Christmas Day of that year--seventeen years after Halley's death at eighty-six. Halley was a polyhistor--at home in all the sciences, arts, and letters of his time. He made several daring sea voyages of exploration. He also plotted the earth's winds and magnetic fields in North and South America. He was a man of great charm and tact, and was, perhaps, the only person who was in a position to persuade the neurotic Newton to publish his greatest work, the "Principia."
In fact, Halley supervised the printing of the book, read print, and even paid the printer out of his own pocket. Mr. Ronan's biography of Halley makes delightful reading. From it one gets a sense of how science functioned in one of its greatest epochs.

Non-Psychology Passage 4

Mrs. Elizabeth A Memoir was written by Elizabeth Anderson with help from Gerald R. Kelly. Mrs. Anderson, now eighty-four, was Sherwood Anderson's third wife. She met him in New York (where she was managing the Doubleday Doran bookstore) and lived with him in New Orleans, Paris, and rural Virginia until 1929. At that time, he sent her to visit her parents and then wrote her a one-line letter: "I just wish you would not come back." Mrs. Anderson then moved to Taxco, Mexico, renewed a friendship with William Spratling, whom she had known in New Orleans, and opened what became a successful dress shop. Her book ends with Spratling's death in an automobile accident in 1967, of which she comments: "I miss Bill Spratling so very much more than I ever missed Sherwood Anderson." It is a curious book, bland in describing her early years, dutiful and matter-of-fact about the Anderson years, and chatty about the Mexican years that followed. The writing is clearly that of Mr. Kelly, a professional journalist. But Mrs. Anderson's observations on her celebrated friends are just as clearly her own. "Others might eat an apple, Sherwood experienced it," she says. And, "Edna St. Vincent Millay always had a coterie of followers but did not care about them one way or the other." Or, "Bill Faulkner's studied courtesies and Southern mannerisms were a pose."