International Journal of Educational Investigations
Available online @ www.ijeionline.com
Vol.2, No.5: 128-137, 2015 (May)
ISSN: 2410-3446

Computer Adaptive Test (CAT): Advantages and Limitations

Marzieh Rezaie 1*, Mohammad Golshan 1
1. Department of English, Maybod Branch, Islamic Azad University, Maybod, Iran.
* Corresponding Author's Email: [email protected]

Abstract – Today, the use of computers and electronic devices has become widespread all around the world. Moreover, in the field of language testing, technology has been an instrument of expansion and innovation. For example, the computer adaptive test (CAT) is a kind of computer-based test which adapts to the ability levels of the participants. Given the effects of computer technology on language assessment and testing, this study investigates different aspects of the computer adaptive test (CAT). The results of this study show that the use of CATs in language testing has both advantages and disadvantages. Therefore, authorities should be aware of the positive and negative aspects of a computer adaptive test when they want to administer a test in adaptive format.

Keywords: computer, adaptive, test, testing, technology

I. INTRODUCTION

The use of computers and electronic devices has become widespread all around the world; in particular, computers and online processes are increasingly used for evaluating the language proficiency of English learners (Fleming & Hiple, 2004). These improvements in computer technology have affected many parts of educational settings, such as learning, testing, and assessment (Bennett, 2002; Pommerich, 2004). As a result, students can learn their lessons with the aid of computers (CALL), and computerized tests such as CBE (Computer Based Education), CBT (Computer Based Test), CBELT (Computer Based English Language Test), CAA (Computer Aided/Assisted Assessment), CAT (meaning either Computer Aided/Assisted Test or Computer Adaptive Test), and CALT (Computer Adaptive Language Testing) can be used for testing purposes.

As mentioned above, the emergence of computers and Information and Communication Technology (ICT) has affected different areas of the educational setting. The realm of computerized tests is growing and revolutionizing language assessment and testing; as a result, standard language proficiency tests such as TOEFL and IELTS can be administered in computerized formats. Given the effects of computer technology on language assessment and testing, this study investigates different aspects of the computer adaptive test (CAT).

II. COMPUTER BASED TESTS

Computer based tests are tests in which computers are used in the testing process. Computer based testing can have a number of advantages. For example, the use of computers in testing promotes productivity and innovation in this area (Al-Amri, 2009). The standardization of test administration conditions is another advantage of computer-based testing (CBT) (Al-Amri, 2009). Besides, CBT helps test developers provide the same test conditions for all examinees, regardless of the size of the test population (Al-Amri, 2009). Moreover, CBT can give immediate feedback to students and testers, and examinees can take tests at any time (Mojarrad et al., 2013). Additionally, different kinds of performance data, such as latency information, can be collected through computer based testing (Olsen et al., 1989). However, computer based testing may also have some drawbacks.
For example, examinees need computer literacy in order to eliminate the mode effect in computer-based testing (Alderson, 2000). Additionally, administering computer based tests requires an equipped computer lab (Rudner, 1998). Furthermore, some students may become anxious when tests are presented on a computer (Young et al., 1996). Besides, open-ended questions are not usually presented in computerized formats because these kinds of questions are usually scored by humans (Brown, 2003). Moreover, human interaction does not exist in these tests (Brown, 2003).

Due to the important role of computers in language testing, several scholars have been motivated to explore the comparability of paper-based tests and computer-based tests. For example, Hosseini et al. (2014) studied the comparability of the results of computer based tests (CBT) and paper and pencil tests (PPT) among English language learners in Iran. A total of 106 Iranian English students, randomly selected from Azad University of Tehran, participated in this study; all of them were second-year students. The participants were administered two parallel multiple-choice tests, one in paper-based format and the other in computerized format, selected from their General English book. First, the paper-based format of the achievement test was administered to the participants; after two weeks, the computerized equivalent was given to them. Moreover, a questionnaire drawn from the Computer Attitude Scale (CAS) developed by Loyd and Gressard (1984) and validated by Berberoglu and Calikoglu (1992) was given to the participants to explore their familiarity with and attitudes towards computer tests. The results of this investigation indicated that participants performed better on the paper-based test than on the computer-based test. Additionally, the findings of some other research showed that examinees performed better on paper-based tests than on computer-based tests (Coniam, 2006; Cumming et al., 2006; Salimi et al., 2011). Besides, Mazzeo et al. (1991) reported that their examinees' mathematics and English scores were higher on paper-based tests than on computer-based tests. However, De Angelis (2000), in a study of students taking a dental hygiene course, found that students' midterm scores were higher on the computer-based test than on the paper-based test.

Moreover, the results of several studies showed that, under equivalent test administration conditions, there was no significant difference between the performance of examinees on CBT and PPT, except that examinees preferred the computer-based mode to the paper-based mode (Bachman, 2000; Jamieson, 2005; Chapelle, 2007; Douglas & Hegelheimer, 2007). The findings of Mason, Patry, and Bernstein (2001) also revealed no significant difference between the results of computer-based and paper-based tests. Nordin et al. (2010) likewise studied the effectiveness of the online and conventional modes of the Malaysian English Competency Test (MECT) among undergraduates of Universiti Kebangsaan Malaysia (UKM). The results showed that students' marks on the paper-based MECT and the Electronic Malaysian English Competency Test (E-MECT) were not significantly different. Moreover, user-friendliness, interface design, interactivity, and time management are features which can affect the success of an online test (Nordin et al., 2010).
Additionally, Boo and Vispoel (2012) explored the comparability of scores obtained from computer and paper-and-pencil versions of the Iowa Tests of Educational Development, as well as the examinees' attitudes towards the two modes. The results of this study revealed that the scores obtained from the computer and paper-and-pencil tests were comparable in terms of scaling (means and standard deviations), internal consistency, and criterion- and construct-related validity. However, examinees preferred the computerized tests to the paper and pencil tests. Besides, Mojarrad et al. (2013) compared paper and pencil reading comprehension assessment to computer-based assessment. A total of 66 male English as a Foreign Language (EFL) learners from Iran, aged 8 to 12 years, participated in this study. Two reading comprehension tests of the same difficulty level, one on paper and one on a computer screen, were administered to the examinees, and an attitude questionnaire was given to them to investigate their attitudes towards computerized testing. The findings indicated that examinees' reading comprehension scores across the two testing modes were not significantly different. Furthermore, most of the students preferred to take the test on a computer.

III. COMPUTER ADAPTIVE TEST (CAT)

Language performance can be assessed through different procedures, and adaptive testing is one of them. Computer adaptive testing is also called "tailored testing" (Madsen, 1991). Noijons (1994, p. 38) defines adaptive testing as "an integrated procedure in which language performance is elicited and assessed with the help of a computer, consisting of three integrated procedures including: 1. Generating the test; 2. Interaction with candidate; 3. Evaluation of response".

Furthermore, a computer adaptive test (CAT) is a kind of computer-based test which adapts to the ability levels of the participants. In computer adaptive tests, participants' responses to earlier items determine which subsequent items are administered to them. If an examinee answers an item correctly, the next item will be more difficult; if the examinee answers incorrectly, the subsequent item will be easier. This process continues until the examinee's ability level has been estimated. Generally, computer adaptive testing uses an item bank: a large pool of test items at different difficulty levels from which the administered items are selected (Larson & Madsen, 1989). The key point is that the quality of each item should be verified; item discrimination and item difficulty are two important variables which should be calculated in this regard. Item response theory (IRT) is the method suggested for the analysis of computer adaptive testing (Baker, 1983).

IV. THE ORIGIN OF COMPUTER ADAPTIVE TEST (CAT)

Generally, computerized testing originated in the early 1970s (Drasgow, 2002; Wainer, 1990). The emergence of new technologies resulted in the development and implementation of computerized testing in large-scale testing programs such as licensure, certification, admissions, and psychological tests (Kim & Huynh, 2007). Specifically, the first computer adaptive test (CAT) was created by Larson and Madsen (1985) at Brigham Young University in the USA. After Larson and Madsen (1985), several scholars (e.g., Kaya-Carton, Carton & Dandonoli, 1991; Burston & Monville-Burston, 1995; Brown & Iwashita, 1996; Young, Shermis, Brutten & Perkins, 1996) were motivated to construct and develop more computer adaptive tests throughout the 1990s.
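At their core, such systems implement the adaptive loop described in Section III: select an item matched to the current ability estimate, score the response, and update the estimate. The sketch below is a minimal illustration of that loop under an assumed one-parameter (Rasch) IRT model; the randomly generated item bank, the Newton-Raphson ability update, and all function names are hypothetical simplifications for exposition, not the implementation of any system cited here.

```python
import math
import random

# Hypothetical item bank: each item carries a calibrated Rasch
# difficulty (b) on the same logit scale as examinee ability (theta).
random.seed(0)
ITEM_BANK = [{"id": i, "b": random.uniform(-3.0, 3.0)} for i in range(200)]

def p_correct(theta, b):
    """Rasch (one-parameter) probability of a correct response."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def next_item(theta, used):
    """Select the unused item whose difficulty is closest to the current
    ability estimate; under the Rasch model this is the item carrying
    the most Fisher information."""
    pool = [it for it in ITEM_BANK if it["id"] not in used]
    return min(pool, key=lambda it: abs(it["b"] - theta))

def update_theta(theta, responses):
    """One Newton-Raphson step toward the maximum-likelihood ability
    estimate; `responses` holds (difficulty, was_correct) pairs."""
    num = sum((1.0 if u else 0.0) - p_correct(theta, b) for b, u in responses)
    den = sum(p_correct(theta, b) * (1.0 - p_correct(theta, b)) for b, u in responses)
    theta += num / max(den, 1e-3)         # guard against a near-zero slope
    return max(-4.0, min(4.0, theta))     # keep the estimate on-scale

def run_cat(answer, max_items=20):
    """Administer up to max_items items adaptively; answer(item) must
    return True if the examinee responds correctly."""
    theta, used, responses = 0.0, set(), []
    for _ in range(max_items):
        item = next_item(theta, used)
        used.add(item["id"])
        responses.append((item["b"], answer(item)))
        # After a correct answer theta rises, so the next selected item
        # is harder; after an incorrect answer it falls, so the next is
        # easier, exactly as described in Section III.
        theta = update_theta(theta, responses)
    return theta

# Example: simulate an examinee whose true ability is 1.2 logits.
true_theta = 1.2
estimate = run_cat(lambda item: random.random() < p_correct(true_theta, item["b"]))
print(f"estimated ability: {estimate:+.2f}")
```

Operational systems use larger calibrated banks, exposure control, and more refined stopping rules, but the select-score-update cycle is the same.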
As a result of these developments, some standardized tests came to be administered in a computer-adaptive format. In 1998, the Test of English as a Foreign Language (TOEFL) began to use a computer-adaptive testing format (Mojarrad et al., 2013), and the Graduate Record Examinations (GRE) has been taken in computer-adaptive format for several years (Mojarrad et al., 2013). Today, different testing programs such as the Graduate Record Exam, the Graduate Management Admission Test, the Scholastic Aptitude Test, and Microsoft's qualifications use adaptive testing as their testing method (Giouroglou & Economides, 2004).

V. ADVANTAGES OF CATs

Scholars have mentioned a number of advantages regarding the use of CATs: for example, test management is flexible, scores are immediately available, and CATs may motivate examinees (Linacre, 2000; Rudner, 1998) because the items are appropriate for their level, which may reduce test anxiety (Mulkern, 1998). Besides, efficiency is the main advantage of CATs (Weiss, 1990; Straetmans & Eggen, 1998). CATs save costs compared to conventional methods, in which different tests must be prepared for different groups of students; preparing such tests is very time-consuming, and so is marking them. The use of CATs decreases the amount of time needed for test preparation and marking, and it can increase the consistency of the results (Callear & King, 1997). The administration of computer adaptive tests also takes less time than that of paper and pencil tests. Moreover, computer-adaptive tests use fewer items, and require less time to answer them, than traditional paper and pencil tests (Madsen, 1991; Wainer, 1990). Additionally, the use of CATs can be very beneficial where a large number of learners must be placed into different classes immediately (Weiss, 1990), because through the flexi-level strategy, examinees do not need to answer a large number of questions which are too difficult or too easy for them. In fact, in computer adaptive tests, examinees can be given different tests which are appropriate for their own specific level (Larson & Madsen, 1985). Because each test is adapted to each examinee's level, more information can be gathered from computer adaptive tests than from traditional tests (Young et al., 1996). Additionally, one of the advantages of CATs is "greater precision of measurement," which allows for "more accurate mastery classification" (Weiss, 1990, p. 454). This greater precision of measurement is due to using items that are at the maximum discrimination level. Besides, the score of each examinee is determined based on both the percentage of questions answered correctly and the difficulty level of those questions; as a result, if two examinees answer an equal percentage of questions correctly, the one who answers the more difficult questions gets the higher score (Economides & Roupas, 2007).

VI. DRAWBACKS OF CATs

Like other types of tests, computer adaptive tests have their own limitations, and test administrators should be aware of these constraints when they want to use them. Researchers have mentioned some of these drawbacks. For example, the test administration procedures of CATs differ from those of paper and pencil tests, and this can be problematic for some examinees (Rudner, 1998). Besides, CATs are based on the item response theory (IRT) model, which cannot be used for all item types.
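To make the role of the IRT model concrete, consider the scoring behavior noted in Section V: two examinees who answer the same proportion of items correctly receive different ability estimates when their items differ in calibrated difficulty (Economides & Roupas, 2007). The following sketch is illustrative only, again assuming a Rasch model; the item difficulties are hypothetical values chosen for the example.

```python
import math

def p_correct(theta, b):
    """Rasch probability of answering an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def mle_theta(responses, iters=25):
    """Newton-Raphson maximum-likelihood ability estimate under the
    Rasch model; `responses` holds (difficulty, was_correct) pairs."""
    theta = 0.0
    for _ in range(iters):
        num = sum((1.0 if u else 0.0) - p_correct(theta, b) for b, u in responses)
        den = sum(p_correct(theta, b) * (1.0 - p_correct(theta, b)) for b, u in responses)
        theta = max(-4.0, min(4.0, theta + num / max(den, 1e-3)))
    return theta

# Both examinees answer 3 of 4 items correctly, but the (hypothetical)
# calibrated difficulties of their items differ by 2.5 logits.
easy_items = [(-2.0, True), (-1.5, True), (-1.0, True), (-0.5, False)]
hard_items = [(0.5, True), (1.0, True), (1.5, True), (2.0, False)]

print(f"theta on easier items: {mle_theta(easy_items):+.2f}")  # about -0.1
print(f"theta on harder items: {mle_theta(hard_items):+.2f}")  # about +2.4
```

Both response patterns are 75% correct, yet the maximum-likelihood estimates differ by roughly two and a half logits. This dependence on calibrated difficulties is also why the calibration and dimensionality issues discussed below matter.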
Because such models require items that can be objectively scored and calibrated, CATs are not applicable to open-ended questions or items which cannot be calibrated easily (Rudner, 1998). Another important drawback of CATs is that an examinee is not allowed to go back and change answers, because the next items are selected on the basis of the previously answered items (Rudner, 1998). Moreover, item calibration is an important factor which affects the success of a CAT: if the items are not appropriately calibrated on the difficulty/ability scale, the test will be neither valid nor reliable. Additionally, several items should be available at each difficulty level to make repeated measures at that level possible; consequently, a large bank of items must be created (Larson, 1987). Furthermore, the determination of performance cutoff points for grading, placement, advancement, remediation, and so on is one of the limitations of computer adaptive tests, since different departments may have different parameters regarding cutoff scores (Larson, 1987). Besides, other scholars consider unidimensionality a drawback of CATs (Madsen, 1991; Henning, 1991; Canale, 1986; Tung, 1986; as cited in Starr-Egger, 2001). Generally, computer adaptive tests are based on item-response-theory (IRT) or latent-trait models, and the assumption which must be met in order to use these models is unidimensionality: all test items must measure one trait (Larson, 1987). Some researchers claim that latent-trait models are not suitable for tests which measure more than one trait, and, moreover, that developing a CAT is time-consuming (Madsen, 1991; Henning, 1991; Canale, 1986; Tung, 1986; as cited in Starr-Egger, 2001). Furthermore, security can be one of the limitations of CATs. In CATs, an item pool is used to test the examinees' knowledge; in this process, some items may be chosen more frequently than others, and such items may be memorized and passed on to other examinees (Wainer & Eignor, 2000).

VII. CONCLUSION

To sum up, technological advancements have moved very rapidly since the last century, and computers have become the most useful facilitator in achieving the majority of our goals. Moreover, in the field of language testing, technology has been an instrument of expansion and innovation. Today, computer adaptive tests (CATs) bring the testing process in line with the needs of the e-generation of second language learners by making it innovative, flexible, individualized, efficient, and fast.

The use of CATs can have several implications. For example, in academic institutions it may take a long time to assess a large number of students, and it may be difficult to provide accurate and fast results to the examinees; CATs can help academic institutions in this regard. Through the use of CATs, the results of the assessment can be available very quickly, the need for human resources can be reduced, and assessment can be more efficient. Besides, the use of CATs can also have implications for teachers and students. Generally, successful teachers and students should be aware of new advancements in the teaching and testing fields and upgrade their skills; computer-literate teachers and students can use computer technology and be innovative. Specifically, through the use of CATs, students can assess themselves at any time and at their own pace; consequently, they can become more autonomous.
Moreover, through the use of CATs, scoring and assessment need not be time-consuming processes. However, the use of technology in language testing may have its own caveats; therefore, authorities should be aware of the positive and negative aspects of a computer adaptive test when they want to administer a test in adaptive format. An important question regarding the use of computer adaptive tests is to what extent the scores obtained from a computer adaptive test are comparable to the scores obtained from a paper-and-pencil test or a computer-based test (CBT). In this regard, Wang and Kolen (2001) argue that it is challenging to establish score comparability between CAT and paper versions of a test for two reasons: CATs are given online rather than on paper, and CATs are adaptive whereas paper-and-pencil administrations are not. However, in some CAT programs, comparability between CAT and paper versions of a test has been established through score adjustments like those used in test equating (Segall, 1993; Eignor, 1993). Furthermore, Schaeffer et al. (1995) studied the comparability of scores obtained from linear computer-based (CBT) and computer adaptive (CAT) versions of the three GRE General Test measures. The results showed that the scores of the verbal and quantitative CATs were comparable to their CBT counterparts, whereas the scores of the analytical CAT were not comparable to the analytical CBT scores. As mentioned above, several scholars have studied the comparability of computer adaptive tests with paper-and-pencil or computer based tests; however, this is an area which requires further investigation.

REFERENCES

Al-Amri, S. (2009). Computer-based testing vs. paper-based testing: Establishing the comparability of reading tests through the evolution of a new comparability model in a Saudi EFL context. Unpublished doctoral dissertation, University of Essex, England.

Alderson, J. C. (2000). Technology in testing: The present and the future. System, 28(4), 593-603.

Bachman, L. F. (2000). Modern language testing at the turn of the century: Assuring that what we count counts. Language Testing, 17(1), 1-42.

Baker, F. B. (1983). The basics of item response theory. Portsmouth, NH: Heinemann.

Bennett, R. E. (2002). Inexorable and inevitable: The continuing story of technology and assessment. The Journal of Technology, Learning, and Assessment, 1(1), 1-23.

Bennett, R. E., Braswell, J., Oranje, A., Sandene, B., Kaplan, B., & Yan, F. (2008). Does it matter if I take my mathematics test on computer? A second empirical study of mode effects in NAEP. Journal of Technology, Learning, and Assessment, 6(9), 1-39.

Berberoglu, G., & Calikoglu, G. (1992). The construction of a Turkish computer attitude scale. Studies in Educational Evaluation, 24(2), 841-845.

Boo, J., & Vispoel, W. (2012). Computer versus paper-and-pencil assessment of educational development: A comparison of psychometric features and examinee preferences. Psychological Reports, 111, 443-460.

Brown, A., & Iwashita, N. (1996). The role of language background in the validation of a computer-adaptive test. System, 24(2), 199-206.

Brown, H. D. (2003). Language assessment: Principles and classroom practices. USA: Longman.

Burston, J., & Monville-Burston, M. (1995). Practical design and implementation considerations of a computer-adaptive foreign language test: The Monash/Melbourne French CAT. CALICO Journal, 13(1), 26-46.

Callear, D., & King, T. (1997). Using computer based tests for information science. ALT-J, 5(1), 27-32.
Chapelle, C. A. (2007). Technology and second language acquisition. Annual Review of Applied Linguistics, 27, 98-114.

Coniam, D. (2006). Evaluating computer-based and paper-based versions of an English-language listening test. ReCALL, 18, 193-211.

Cumming, A., Kantor, R., Baba, K., Erdosy, U., & James, M. (2006). Analysis of discourse features and verification of scoring levels for independent and integrated prototype written tasks for next generation TOEFL (TOEFL Monograph 30). Princeton, NJ: Educational Testing Service.

De Angelis, S. (2000). Equivalency of computer-based and paper-and-pencil testing. Journal of Allied Health, 29, 161-164.

Douglas, D., & Hegelheimer, V. (2007). Assessing language using computer technology. Annual Review of Applied Linguistics, 27, 115-132.

Drasgow, F. (2002). The work ahead: A psychometric infrastructure for computerized adaptive tests. In C. N. Mills, M. T. Potenza, J. J. Fremer, & W. C. Ward (Eds.), Computer-based testing: Building the foundation for future assessments (pp. 67-88). Hillsdale, NJ: Lawrence Erlbaum.

Economides, A. A., & Roupas, C. (2007). Evaluation of computer adaptive testing systems. International Journal of Web-Based Learning and Teaching Technologies, 2(1).

Eignor, D. R. (1993). Deriving comparable scores for computer adaptive and conventional tests: An example using the SAT (Research Report No. 93-55). Princeton, NJ: Educational Testing Service.

Fleming, S., & Hiple, D. (2004). Foreign language distance education at the University of Hawai'i. In C. A. Spreen (Ed.), New technologies and language learning: Issues and options (Tech. Rep. No. 25) (pp. 13-54). Honolulu, HI: University of Hawai'i, Second Language Teaching & Curriculum Center.

Grigoriadou, M., Papanikolaou, K., Kornilakis, H., & Magoulas, G. (2001). INSPIRE: An intelligent system for personalized instruction in a remote environment. Proceedings of the 3rd Workshop on Adaptive Hypertext and Hypermedia, Sonthofen, Germany, 13-24.

Hosseini, M., Zainol Abidin, M., & Baghdarnia, M. (2014). Computer-based tests (CBT) and paper and pencil tests (PPT) among English language learners in Iran. Procedia - Social and Behavioral Sciences, 98, 659-667.

Jamieson, J. M. (2005). Trends in computer-based second language assessment. Annual Review of Applied Linguistics, 25, 228-242.

Kaya-Carton, E., Carton, A. S., & Dandonoli, P. (1991). Developing a computer-adaptive test of French reading proficiency. In P. Dunkel (Ed.), Computer-assisted language learning and testing: Research issues and practice (pp. 259-284). New York: Newbury House.

Kim, D. H., & Huynh, H. (2007). Comparability of computer and paper-and-pencil versions of algebra and biology assessments. Journal of Technology, Learning, and Assessment, 6(4).

Larson, J. W. (1987). Computer-assisted language testing: Is it portable? ADFL Bulletin, 18(2), 20-24.

Larson, J. W., & Madsen, H. S. (1985). Computer-adaptive language testing: Moving beyond computer-assisted testing. CALICO Journal, 2(3), 32-36.

Larson, J. W., & Madsen, H. S. (1989). S-CAPE: A Spanish Computerized Adaptive Placement Exam. In F. Smith (Ed.), Modern technology in foreign language education: Applications and projects. Lincolnwood, IL: National Textbook.

Linacre, J. M. (2000). Computer-adaptive testing: A methodology whose time has come. In S. Chae, U. Kang, E. Jeon & J. M. Linacre (Eds.), Development of computerized middle school achievement test (in Korean). Seoul, South Korea: Komesa Press.

Loyd, B. H., & Gressard, C. (1984). Reliability and factorial validity of computer attitude scales. Educational and Psychological Measurement, 44, 501-505.
Madsen, H. S. (1991). Computer-adaptive testing of listening and reading comprehension. In P. Dunkel (Ed.), Computer-assisted language learning and testing: Research issues and practice (pp. 237-257). New York: Newbury House.

Mason, B. J., Patry, M., & Bernstein, D. J. (2001). An examination of the equivalence between non-adaptive computer-based and traditional testing. Journal of Educational Computing Research, 24, 29-39.

Mazzeo, J., Druesne, B., Raffeld, P., Checketts, K., & Muhlstein, A. (1991). Comparability of computer and paper-and-pencil scores for two CLEP general examinations (Report No. 91-5). Princeton, NJ: ETS.

Mojarrad, H., Hemmati, F., Jafari Gohar, M., & Sadeghi, A. (2013). Computer-based assessment (CBA) vs. paper/pencil-based assessment (PPBA): An investigation into the performance and attitude of Iranian EFL learners' reading comprehension. International Journal of Language Learning and Applied Linguistics World, 4(4), 418-428.

Mulkern, A. E. (1998). Frequently asked questions about computer-adaptive testing (CAT). Retrieved April 11, 2015 from http://carla.acad.umn.edu/CATFAQ.html

Noijons, J. (1994). Testing computer assisted language tests: Towards a checklist for CALT. CALICO Journal, 12(1), 37-58.

Nordin, N. M., Arshad, S. R., Razak, N. A., & Jusoff, K. (2010). The validation and development of electronic language test. Studies in Literature and Language, 1(1), 1-7.

Olsen, J., Maynes, D., Slawson, D., & Ho, K. (1989). Comparison of paper-administered, computer-administered and computerized achievement tests. Journal of Educational Computing Research, 5, 311-326.

Pommerich, M. (2004). Developing computerized versions of paper-and-pencil tests: Mode effects for passage-based tests. The Journal of Technology, Learning, and Assessment, 2(6), 3-44.

Rudner, L. M. (1998). An online, interactive, computer adaptive testing tutorial. Retrieved April 16, 2015, from http://EdRes.org/scripts/cat

Salimi, H., Rashidy, A., Salimi, A. H., & AminiFarsani, M. (2011). Digitized and non-digitized language assessment: A comparative study of Iranian EFL language learners. International Conference on Languages, Literature and Linguistics, IPEDR Vol. 26. Singapore: IACSIT Press.

Schaeffer, G. A., Steffen, M., Golub-Smith, M. L., Mills, C. N., & Durso, R. (1995). The introduction and comparability of the computer adaptive GRE General Test (GRE Board Professional Report No. 88-08aP). Princeton, NJ: Educational Testing Service.

Segall, D. O. (1993). Score equating verification analyses of the CAT-ASVAB. Briefing presented to the Defense Advisory Committee on Military Personnel Testing, Williamsburg, VA.

Starr-Egger, F. (2001). German computer adaptive language tests - better late than never. CALICO Journal, 11(4).

Straetmans, G. J. M., & Eggen, T. J. H. M. (1998, January). Computerized adaptive testing: What it is and how it works. Educational Technology, 82-89.

Wainer, H. (1990). Introduction and history. In H. Wainer (Ed.), Computerized adaptive testing: A primer (pp. 1-22). Hillsdale, NJ: Lawrence Erlbaum.

Wainer, H. (1990). Computerized adaptive testing: A primer. Hillsdale, NJ: Lawrence Erlbaum.

Wainer, H., & Eignor, D. (2000). Caveats, pitfalls and unexpected consequences of implementing large-scale computerized testing. In H. Wainer et al. (Eds.), Computerized adaptive testing: A primer (pp. 271-299). Hillsdale, NJ: Lawrence Erlbaum Associates.
Wang, T., & Kolen, M. J. (2001). Evaluating comparability in computerized adaptive testing: Issues, criteria and an example. Journal of Educational Measurement, 38(1), 19-49.

Weiss, D. J. (1990). Adaptive testing. In H. J. Walberg & G. D. Haertel (Eds.), The International Encyclopedia of Educational Evaluation (pp. 454-458). Oxford: Pergamon Press.

Young, F., Shermis, M. D., Brutten, S., & Perkins, K. (1996). From conventional to computer adaptive testing of ESL reading comprehension. System, 24(1), 32-40.