Accting., Mgmt. & Info. Tech., Vol. 3, No. 2, pp. 79-95, 1993. Printed in the USA. All rights reserved. Copyright © 1993 Pergamon Press Ltd.

AN EMPIRICAL STUDY OF SPREADSHEET ERROR-FINDING PERFORMANCE

Dennis F. Galletta*
Dolphy Abraham**
Mohamed El Louadi†
William Lekse*
Yannis A. Pollalis‡
Jeffrey L. Sampler§

*University of Pittsburgh
**Loyola Marymount University
†Université du Québec à Trois-Rivières
‡Syracuse University
§London Business School

Abstract — Several well-founded concerns exist about the integrity and validity of electronic spreadsheets. Thirty CPAs and 30 MBA students volunteered to seek up to two errors planted in each of six spreadsheets, to discover whether expertise in the domain of accounting and in the spreadsheet program would facilitate error-finding performance. On average, subjects found only about 55% of the simple, conspicuous errors on the small spreadsheets. As expected, both accounting and spreadsheeting expertise contributed to the subjects' error-finding rate, and those who performed this task with both types of expertise found the largest number of errors in the shortest time. Interestingly, while CPAs were more accurate than non-CPAs in finding accounting-related errors, spreadsheet experts did not outperform novices in finding spreadsheet formula errors. Also, while spreadsheet expertise contributed to greater speed, accounting expertise did not. Future research should further investigate the contribution of spreadsheet expertise to the error-finding task. Practitioners should be aware of the difficulties in finding even simple errors, and develop training programs to facilitate spreadsheet auditors' performance.

Keywords: Spreadsheets, Spreadsheet errors, Errors, Auditing, Expertise, Experiment.

INTRODUCTION

Electronic spreadsheet software has completely altered the face of computing practice in the past decade.
A recent study (AMA, 1988) reveals that spreadsheet software has become the most common computer application, employed by over 91% of a sample of end users. Indeed, one can scarcely browse a trade publication published in the last decade without encountering scores of advertisements that describe software for building or enhancing spreadsheets. Also, a subscription to a computer-related publication often earns the reader frequent mailings that announce materials or training sessions for enhancing spreadsheet usage.

Concurrent with the growth in popularity of electronic spreadsheet software is a growth in concern about the high frequency and impact of errors in spreadsheet models. Although researchers have built some understanding of the conditions under which errors are made, the possible effects of those errors, and useful tips for their avoidance, they have provided no assurance of an imminent, complete eradication of spreadsheet errors. It appears imperative to devote significant research efforts to error detection while we await technology that will eliminate errors altogether.

Error detection can be affected by several factors, not the least of which is the background of the persons attempting to find the errors. In this experimental study, two domains of expertise are explored for their contribution to the error-finding performance of laboratory subjects; experts and novices in accounting and spreadsheet usage were asked to find and circle errors planted in spreadsheets provided to them.

To request a copy of the experimental materials, please contact Dennis F. Galletta, Joseph M. Katz Graduate School of Business, University of Pittsburgh, Pittsburgh, PA 15260.
ERRORS AND EXPERTISE

Difficulties in the construction and usage of spreadsheets can be considered symptoms of how end-user computing technology is managed: In concert with a general lack of planning (Alavi, Nelson, & Weiss, 1987-88; Munro, Huff, & Moore, 1987-88) and control (Galletta & Hufnagel, 1992), tasks are transferred from trained system developers to large numbers of users who are already fully employed performing their own tasks (Galletta & Heckman, 1990). Users therefore face a confusing environment (Brancheau, Vogel, & Wetherbe, 1985; Lederer & Sethi, 1986; Rockart & Flannery, 1983), employing tools that are in many ways more difficult and error-prone than are languages for developing computer programs (Xenakis, 1987). It therefore seems likely that a staggering number of untrained end users embark on formidable software development tasks, including design, construction, and ongoing maintenance (AMA, 1988; Sumner & Klepper, 1987), and even training and support for others (Brancheau, Vogel, & Wetherbe, 1985). Unfamiliar, difficult tasks are a breeding ground for errors, and firms should exercise caution in the management of end-user computing (Davis, 1984).

There is evidence from both the laboratory (Brown & Gould, 1987) and the field (Creeth, 1985) that somewhere between one-third and one-half of all spreadsheets contain errors. In one study, only 5 of 19 spreadsheets sampled from actual practice were "totally error free" (Davies & Ikin, 1987, p. 55). Error severity appears to be quite high; errors in actual spreadsheet usage are known to have caused losses in the hundreds of thousands (Kee & Mason, 1988) to millions (Davies & Ikin, 1987; Ditlea, 1987) of dollars.
Much of the effort expended by other researchers (e.g., Ballou, Pazer, Belardo, & Klein, 1987; Olson & Nilsen, 1987-88) and practitioners (e.g., Anderson & Bernard, 1988; Pearson, 1988; Stang, 1987) has been devoted to describing how errors are made while spreadsheets are being built, estimating their effects, and prescribing techniques or software to prevent them. Error prevention is perhaps the most desirable goal, but there is presently no "silver bullet" (Brooks, 1987) that will eradicate all errors made while constructing computer algorithms or models. Efforts in error prevention range from automating portions of the spreadsheet construction process to offering well-designed training programs. While the market potential of work-saving software constantly leads developers to chip away at the former goal, progress has been very slow due to wide variations in how spreadsheets are used. Also, each time difficult economic conditions become an imminent threat, many organizations cut back on training programs.

We should therefore turn our attention to detection of spreadsheet errors for the foreseeable future; there appears to be a paucity of research on this topic in the empirical literature. Although computer software is available that checks for the existence of several common errors, users are warned by the vendors that there are significant limitations to such software. Advances will undoubtedly be made in spreadsheet audit software, but automatic detection of all possible errors is not yet on the horizon. Therefore, there is presently a need to better understand how humans detect errors in electronic spreadsheets.

The role of expertise

There are several approaches one can take in investigating spreadsheet error detection.
As examples of the many studies that could be performed as part of a complete research program, investigators could focus on the people now performing the error-detection task, the errors themselves, the conditions under which errors are and are not found, and the impact of the spreadsheet software itself on how the task should be approached. Attempting to investigate the latter presents a problem because there is neither coursework nor certification available to demonstrate expertise in spreadsheet error-finding; it is not obvious who should perform this task at the present time.

Two territories covered by persons engaged in spreadsheet error-finding are the domain area of the task (e.g., accounting) and the device used (e.g., Lotus 1-2-3®). These areas differ markedly, and therefore must be considered separate dimensions of expertise. Such a differentiation has already been employed successfully by Mackay and her colleagues (Mackay, Barr, & Kletke, 1992; Mackay & Elam, 1992; Mackay & Lamb, 1991). Likewise, the errors contain two possible dimensions: They are based on the domain of accounting, or they involve the use of the device (the spreadsheet software) on which the spreadsheet model is constructed. Hypotheses are presented below that suggest a correspondence between the areas of expertise and the types of errors that can be found by spreadsheet auditors.

We address pragmatic and theoretical issues simultaneously. A key point is pragmatic: Who should be assigned to such tasks, and how can we explain differences in their performance? Theoretical issues are also important: Is the accounting expertise implied by the CPA certificate valuable in error-finding? Is there a correspondence between spreadsheet auditors' experience and the types of errors they find? In this study, we seek to determine the value of both kinds of expertise in spreadsheet error-finding.
Accounting (domain) expertise

Many researchers have investigated accounting expertise, especially in the context of auditing. One of the main measures of this expertise has been years of audit experience. Although such a surrogate has proven successful in cognitive psychology research, the empirical evidence for using experience to explain differences in audit (and other judgemental) performance is mixed (Ashton, 1991; Johnson, 1988). Explanations offered by several researchers for this lack of success include auditors' lack of task feedback (Ashton, 1991), researchers' neglect of the critical variables of knowledge and ability (Bonner, Lewis, & Marchant, 1990; Frederick & Libby, 1986), the absence of a relationship between experimental tasks and subjects' areas of expertise (Bonner, 1990), and the unstructuredness and frequent absence of an objective answer when auditors perform such a task (Davis & Solomon, 1989; Hamilton & Wright, 1982; Johnson, 1988; Messier, 1983).

There is no conceptual disagreement that audit expertise includes audit experience (Ashton, 1991; Davis & Solomon, 1989). The above explanations of the lack of empirical evidence of the connection between the two constructs seem to advise us that the experience must cover the target task, that auditors must have been given adequate feedback while gaining the experience, and that subjects must possess not only experience, but knowledge and ability as well. If these criteria are taken into account in studying auditor expertise, it would seem likely that expert performance exceeds that of novices, in a manner similar to that generally found in the cognitive psychology literature (for example, see Chase & Simon, 1973; Chi, Glaser, & Rees, 1982; Larkin, McDermott, Simon, & Simon, 1980). That is, accounting experts will accurately identify more errors, more quickly, than will novices.
Because this study employs basic accounting material found in elementary accounting courses, it is expected that CPAs have the requisite experience, knowledge, ability, and feedback to perform well on the spreadsheet error-detection task compared to novices who have only been exposed to the concepts through basic coursework. Stated more formally:

H1: CPAs will locate more errors than will non-CPAs.

H2: CPAs will locate errors more quickly than will non-CPAs.

Consistent with the semantic-syntactic differentiation in human-computer interaction presented by Shneiderman (1992), some errors can involve accounting principles (semantics), and others can involve the spreadsheet commands (syntactics) used to capture the accounting principles. It is reasonable to expect that semantic (accounting domain) errors will be captured more thoroughly and more quickly by CPAs than by non-CPAs because their training and experience addressed only the accounting domain.

H3: CPAs will locate accounting domain errors more thoroughly and quickly than will non-CPAs.

Computer program debugging (device) expertise

There is a great deal of research investigating differences between experts and novices in a variety of computer-related domains, including system design and programming (Adelson & Soloway, 1985; Bonar & Soloway, 1985; Soloway, Ehrlich, Bonar, & Greenspan, 1982; Vitalari & Dickson, 1983) and computer program debugging (Gould, 1975; Gugerty & Olson, 1986; Oman, Curtis, & Nanja, 1989; Soloway & Ehrlich, 1984; Vessey, 1985). Although there have been empirical studies that focused on spreadsheet creation in general (e.g., Brown & Gould, 1987; Olson & Nilsen, 1987-88) and errors made by beginners (e.g., Floyd & Pyun, 1987), there is virtually nothing in the archival literature on spreadsheet error-finding. The similarity of spreadsheet models to computer programs is strong enough to warrant investigation of some of the other literature on computer-related expertise.
Generally, in computer-related studies, empirical support for claims of performance superiority of experts appears much stronger than empirical support for similar claims in the accounting domain. The strength of results in the computer-related area is perhaps due to the amount of feedback provided by the computer, the appropriateness of experimental tasks, and the selection of subjects with experience as well as knowledge and ability. Consistent with results in the cognitive psychology literature, studies show that experts generally solve problems faster (Larkin, 1981; Simon & Simon, 1978) and commit fewer errors (Soloway et al., 1982) than novices. Expert/novice studies have been performed in the domain of computer programming for nearly two decades. Studies have addressed program recall, comprehension, and debugging.

Program recall and comprehension are related tasks. Performance advantages for expert programmers in recalling program segments are important indicators that their greater understanding makes it easier for them to encode and decode program contents. Shneiderman (1976) provided evidence for such a process, finding that there is a performance effect for experts only when the lines of code are arranged in a sensible sequence. A nonsense sequence provides little for the experts to encode and decode. Further evidence is that experts recall program functions while novices focus on syntax (Adelson, 1981).

Several researchers have also studied expertise and program comprehension. Experts appear to comprehend programs using some of the same mechanisms they use to recall them. Expert programmers search for key lines to verify their hypotheses about a program's function (Brooks, 1977), while novices appear to study programs on a line-by-line basis (Weidenbeck, 1986).
One reason for this might be that experts possess a large bank of generic, prototypical program fragments, or programming plans, that relate to control flow and to variables (Ehrlich & Soloway, 1984; Soloway, Adelson, & Ehrlich, 1988; Soloway & Ehrlich, 1984), and tap this bank to infer what is occurring in a program. Another reason might be that experts know which assumptions to make about how variables are named and how code is used (Soloway et al., 1988; Soloway & Ehrlich, 1984). Indeed, novice programmers appear to spend much time reading comments and comparison statements, while experts do not seem to need them to such an extent (Crosby & Stelovsky, 1990).

Studies in program debugging have also confirmed that experts are able to locate and remove bugs faster than novices. Experts can dispense with more superficial errors quickly, then move on to more subtle logic errors (Youngs, 1974). Experts use a breadth-first approach to gain an overall understanding of a program, while novices appear to be more anxious to attend to details, resulting in more frequently changed hypotheses (Vessey, 1985). Experts also seem more flexible than novices in shifting from breadth-intensive (comprehension) to depth-intensive (isolation) strategies (Gould, 1975; Oman et al., 1989).

In summary, expertise in computer programming appears to contribute significantly to program recall, comprehension, and, most importantly, debugging. These findings are important because the experts are better able to examine sequences of computer commands, understand them, and find errors. The empirical findings in the debugging literature would lead one to hypothesize that expertise in spreadsheet construction and usage would contribute to skill in finding spreadsheet errors:

H4: Spreadsheet experts will locate more errors than will novices.

H5: Spreadsheet experts will locate errors more quickly than will novices.
Consistent with our expectations for CPAs in the accounting domain, spreadsheet experts will be likely to find more spreadsheet device (syntactic) errors, and to find them more quickly, than will spreadsheet novices, due to the nature of their training and experience.

H6: Spreadsheet experts will locate spreadsheet device errors more thoroughly and quickly than will spreadsheet novices.

Finally, it would be expected that expertise in both dimensions will result in the highest performance of all, considering both accuracy and speed in detecting errors, suggesting an interaction effect between accounting and spreadsheet expertise:

H7: The highest performers overall will possess expertise in both dimensions.

Table 1. Division of subjects into four cells

                       CPAs    non-CPAs
Spreadsheet novices    II      IV
Spreadsheet experts    I       III

METHOD

An experiment was conducted to identify differences between individuals with varying levels of expertise in a spreadsheet error-finding setting.

Subjects

Seventy subjects volunteered to participate in the spreadsheet error-finding study. Responses to background questionnaire items resulted in subjects being categorized as CPAs versus non-CPAs, and spreadsheet experts versus spreadsheet novices. This created four cells, as illustrated in Table 1. After 10 subjects who performed the task were disqualified,¹ 15 subjects remained in each of the four cells described above. Most of the subjects in cells I and II (the CPAs) were obtained by using attendees at local CPA continuing education courses. The subjects in cells III and IV (the non-CPAs) were obtained by soliciting volunteers from MBA classes at a large university in the northeastern U.S. Most of the CPA subjects were enrolled in a basic spreadsheet skills course, and others were enrolled in more advanced spreadsheet topics courses.
All subjects were paid a $5 incentive for their participation and offered a chance at a $20 prize for the highest speed and accuracy in finding spreadsheet errors. Permission was granted to approach the CPA attendees in the morning, and to solicit their participation during lunchtime. In exchange for their loss of opportunity to obtain lunch, sandwiches were offered in addition to the flat incentive and the chance for the prize. On average, about 50% of the CPA attendees elected to participate; most of the others had previous lunchtime commitments. Some of them performed the task at their offices at a later date.

CPAs were considered to be accounting experts for the purposes of this task. A CPA must complete a number of accounting courses and pass a difficult exam. This exam implicitly requires basic ability and explicitly tests knowledge of accounting facts. Beyond the CPA exam, a Pennsylvania candidate must obtain 2 years of public accounting experience (1 year if he or she possesses a graduate degree). Thus, all of the components necessary for accounting expertise appear present in the CPA subjects. The non-CPAs were exposed to the basic material in an introductory accounting course, but did not enroll in any courses beyond the introductory level.

¹Five non-CPAs were disqualified because their questionnaire responses revealed that while they were not CPAs, they were accounting majors, were experienced in developing accounting software, or were Chartered Accountants from other countries. Four had spreadsheet experience above the novice threshold and below the expert threshold. One was disqualified when an error in setting up the computer prevented accurate capture of timing data.

The spreadsheet experts in cells I and III were somewhat arbitrarily identified as those who had spent over 250 hours in spreadsheet creation and/or modification (but not data entry or other superficial manipulation).
Subjects who reported fewer than 150 hours of such experience were classified as spreadsheet novices (cells II and IV). The mean (median) number of hours for spreadsheet experts was 1,548 (925), and for novices was 54 (44). Spreadsheet experience was used as an indicator of expertise for two reasons. First, the subjects were assumed to have the requisite basic knowledge and ability in the level of arithmetic used in the spreadsheet tasks, because of the rigorous screening for entrance into the MBA program or requirements for CPA candidacy. Second, the key for building expertise in interactive software usage is to build on basic abilities and knowledge through experience; the feedback requirement appeared to be fulfilled by the highly experienced subjects. Because of the strikingly interactive nature of spreadsheet software, user errors are very often found immediately or as the result of further manipulation of spreadsheets. This experience will provide performance feedback because of the immediate error messages or obviously unreasonable results provided by the software. It is important for users to experience many of these errors. Extrapolations of the results of Floyd and Pyun (1987) indicate that the expert users in this study were likely to have been given what might be considered adequate feedback for building expertise. If users indeed make about 72 errors per hour during intensive spreadsheet use, only discover a third of them, and on a long-term basis work a third as intensively as they do during short laboratory bursts, then the 250-hour threshold used to indicate spreadsheet expertise involves more than 2,000 errors involving feedback. The mean number of hours of usage (1,548) reported by spreadsheet experts implies over 12,000 feedback-related errors. In contrast, the mean number of hours of novice usage (54) implies about 450 feedback-related errors, on average. 
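The extrapolation above reduces to simple arithmetic, and can be checked directly (a sketch; the 72 errors-per-hour figure and the two one-third factors are the text's reading of Floyd and Pyun, 1987, not new data):

```python
# Check of the feedback-error extrapolation described above.
# Assumed figures (from the text): 72 errors per intensive hour of use,
# one-third of errors discovered, long-term work at one-third of burst intensity.
ERRORS_PER_INTENSIVE_HOUR = 72
DISCOVERY_RATE = 1 / 3
LONG_TERM_INTENSITY = 1 / 3

def feedback_errors(hours_of_use):
    """Estimated number of errors that produced feedback over a user's history."""
    return (hours_of_use * ERRORS_PER_INTENSIVE_HOUR
            * LONG_TERM_INTENSITY * DISCOVERY_RATE)

print(round(feedback_errors(250)))    # expert threshold -> 2000
print(round(feedback_errors(1548)))   # expert mean -> 12384 ("over 12,000")
print(round(feedback_errors(54)))     # novice mean -> 432 ("about 450")
```

The three outputs reproduce the 2,000, "over 12,000," and "about 450" figures quoted in the text.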
All novice subjects had, prior to the experiment, been presented formally with at least the minimal level of material needed to perform the task. All of the non-CPAs had completed courses in introductory financial and managerial accounting about 6 months earlier. All of the spreadsheet novices received introductory training that covered all of the commands used. Only very basic accounting fundamentals and very basic spreadsheet commands were used. For example, the only Lotus commands used were addition, subtraction, multiplication, division, and summing over several cells (using the "@SUM" command). This requirement of basic skills provided a more stringent, meaningful, and realistic test of the dimensions of expertise than would be afforded by employing novices without even basic skills.

Materials

Ten spreadsheets (see Appendix for an example), adapted from examples that appeared in various introductory accounting textbooks, were created. Exactly two errors were introduced into each spreadsheet: one was an error in an accounting concept or principle (domain error), and one was an error in usage of the spreadsheet program (device error). Each error was carefully constructed to belong to only one of those two categories; formulas associated with domain error cells were otherwise correctly constructed, and formulas associated with device error cells otherwise employed proper accounting principles. Examples of the domain errors included presenting accumulated depreciation as a debit on a trial balance, classification of "expenses prepaid this year for next year" in a list of non-cash expenses, and subtraction of scrap value from the cost of an asset before computing double-declining balance depreciation for book purposes.
Examples of the device errors include omission of a character in a formula for a column total (a formula that should have been +G13+G14+G15 was entered as +G13+G4+G15), inclusion of an intermediate subtotal in a range for calculating the sum of all items in a column, and improper placement of parentheses in a financial ratio formula where the numerator and denominator were defined correctly in words on the spreadsheet.

The impact of each error was tightly constrained to enable scoring of the results. Had this not been done, it would have been impossible to quantify the number of errors on any given spreadsheet. If an item was in error, it did not carry through to further calculations on the spreadsheet.

An initial pilot test was performed on a CPA who was a spreadsheet expert, using a personal computer attached to an overhead projection screen in front of the investigators. This pilot subject was also encouraged to think aloud while performing the task so that the experimenters could take notes of any difficulties he encountered with the task. Six final spreadsheets were chosen from the 10, and were refined on the basis of this subject's experience. A follow-up pilot test was then performed using PhD students and other CPAs, and any remaining misleading or ambiguous aspects of the materials were corrected.

Experimental procedures

Subjects were asked to find and circle (but not fix) as many of the spreadsheet errors as they could find. The spreadsheets were presented both on paper (see the Appendix for an example) and on a personal computer running Lotus 1-2-3 (trademark of Lotus Development Corporation), version 2.01. All subjects were free to move the cursor about the screen to investigate the contents of any cell. Each spreadsheet was small enough to fit on a single display screen, to avoid any variability caused by differences in subjects' screen scrolling methods.
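The first device error described in the Materials section — a single mistyped character in a column-total formula — can be mimicked outside the spreadsheet with a toy cell model (a hypothetical sketch; the cell values are invented for illustration, not taken from the experimental materials):

```python
# Toy model of the mistyped-reference device error described above:
# a column total that should read +G13+G14+G15 was entered as +G13+G4+G15.
# All cell values here are invented for illustration.
cells = {"G4": 100.0, "G13": 250.0, "G14": 375.0, "G15": 125.0}

correct_total = cells["G13"] + cells["G14"] + cells["G15"]    # +G13+G14+G15
erroneous_total = cells["G13"] + cells["G4"] + cells["G15"]   # +G13+G4+G15

print(correct_total)    # 750.0
print(erroneous_total)  # 475.0
```

The erroneous total is still a plausible-looking number, which is precisely why such a one-character slip is easy for an auditor to overlook unless the formula itself is inspected.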
Subjects were told that there were either zero, one, or two errors on each spreadsheet, and were instructed to circle the errors they found on the paper copy, indicating which they found first or second. After they were satisfied either that no errors existed, or that they had found all errors in each spreadsheet, they were instructed to activate a Lotus macro² that captured the computer's internal clock time, recorded the results on the diskette, and loaded the next spreadsheet. After completing the last spreadsheet, a macro available from an exit screen automatically summarized all of the timing information while subjects completed a postexperiment questionnaire. Subjects began their tasks at different times to minimize any confounding effects of direct competition.

Virtually all of the computers used by CPA subjects were IBM PCs, using CGA-level 13" color monitors, and virtually all of the computers used by non-CPA subjects were IBM PS/2 model 30s with 12" monochrome VGA monitors. This difference was unavoidable because of the differences in the computer laboratories available to those subjects. Readability was actually very similar in each setting, because both kinds of screens were used in text mode, and any incremental sharpness of the monochrome VGA's image (which could enhance performance) was probably compromised by its smaller screen size (which could degrade performance). Although studies have revealed reading and comprehension differences between users of conventional CRT screens and paper (e.g., see Gould et al., 1987a; 1987b), neither theory nor data exist to suggest that performance would be markedly different for either type of screen, especially with each placed in text mode. Nevertheless, generalizability of the results might be affected by screen differences, and appropriate cautions must be given for interpreting the results.

²A Lotus macro is a program embedded into a spreadsheet, using the standard spreadsheet commands.

Coding of outcomes

Scoring of subjects' correctness was performed by two judges; both assigned initial scores for each subject without knowledge of the subject's category³ or the other judge's score. Out of 720 judgments, the raters initially disagreed on only 23 occasions, but resolved their differences after discussing the results; most were simply errors in coding. Initial intercoder agreement was assessed with Cohen's Kappa using a procedure outlined by Bishop (1975). An overall Kappa value of .94 was obtained for all initial judgments, abstracting across error types and occurrences. Kappa scores for each error type for each spreadsheet ranged from .819 to 1.000; the evidence appears to suggest that there was substantial agreement between the coders even before the differences were found and resolved. Except where noted, hypothesis testing employed a 2 x 2 ANOVA approach, with two levels (novice and expert) in each dimension (accounting domain and spreadsheet device).

RESULTS

Overall, the subjects found only 56% of the errors (46% of the domain errors and 65% of the device errors). Because performance at a 50% level would enable maximum variability in either direction in the dependent variable, the task difficulty appeared to be nearly ideal.

Table 2. Mean number of errors found (H1, H4, H7; n = 15 per cell)

                       Spreadsheet novices   Spreadsheet experts   Overall
CPAs                   6.8 (2.2)             7.9 (1.8)             7.3 (2.1)
non-CPAs               6.3 (2.1)             5.7 (2.3)             6.0 (2.2)
Overall                6.6 (2.1)             6.8 (2.3)             6.7 (2.2)

Note. 12 would be a perfect score; standard deviations are shown in parentheses.

Hypotheses H1 and H4 assert that accounting and spreadsheeting expertise, respectively, would enable subjects to locate more errors in the spreadsheets.
Further, Hypothesis H7 predicts that an interaction between the two types of expertise enables subjects with expertise in both areas to find the largest number of errors. Table 2 depicts the mean number of errors found in each cell.

The means in Table 2 were subjected to ANOVA, and only the main accounting expertise effect was significant (F = 5.67; 1,56 df; p < .021), supporting Hypothesis H1. Neither the main spreadsheet expertise effect (H4; F = .18; 1,56 df; p < .671) nor the interaction effect (H7; F = 2.33; 1,56 df; p < .133) was significant. The Bartlett-Box statistic was not significant, signifying homogeneity of variance among the cells (F = .22; 3,5645 df; p < .881). Therefore, an important assumption of ANOVA was not violated.

Hypotheses H2 and H5 predict that the amount of time necessary to perform the error-finding task would be lower for subjects with expertise in at least one dimension. Hypothesis H7 predicts that CPAs with spreadsheet expertise would be fastest of all. Table 3 presents the mean time for subjects in each cell.

Table 3. Mean number of minutes taken (H2, H5, H7; n = 15 per cell)

                       Spreadsheet novices   Spreadsheet experts   Overall
CPAs                   26.4 (10.1)           20.1 (4.3)            23.3 (8.3)
non-CPAs               24.6 (7.4)            21.7 (4.4)            23.1 (6.2)
Overall                25.5 (8.7)            20.9 (4.4)            23.2 (7.2)

Note. Standard deviations are shown in parentheses.

³One of the two judges was aware of the experimental hypotheses, and thus the subjects' expertise categories were concealed.

The means in Table 3 were examined using ANOVA, and it was found that only the spreadsheet expertise main effect was significant in differentiating subjects' time performance (H5; F = 6.67; 1,56 df; p < .012). Neither the domain expertise main effect (H2; F = .01; 1,56 df; p < .937) nor the interaction effect (H7; F = .88; 1,56 df; p < .353) was significant. Therefore, H5 receives support but H2 and H7 do not.
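The 2 x 2 fixed-effects ANOVA used for Tables 2 and 3 can be sketched for a balanced design in plain Python (a hypothetical illustration with invented scores, not the study's data; a statistics package would normally be used instead):

```python
# Minimal balanced two-way (2 x 2) fixed-effects ANOVA of the kind used above.
def two_way_anova(cells):
    """cells maps (factor_a_level, factor_b_level) -> equal-sized score lists."""
    def mean(xs):
        return sum(xs) / len(xs)

    n = len(next(iter(cells.values())))              # per-cell sample size
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    grand = mean([x for v in cells.values() for x in v])

    a_means = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_means = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}

    # Sums of squares: main effects, interaction (by subtraction), and error.
    ss_a = n * len(b_levels) * sum((m - grand) ** 2 for m in a_means.values())
    ss_b = n * len(a_levels) * sum((m - grand) ** 2 for m in b_means.values())
    ss_cells = n * sum((mean(v) - grand) ** 2 for v in cells.values())
    ss_ab = ss_cells - ss_a - ss_b
    ss_err = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

    df_err = len(cells) * (n - 1)                    # (a x b)(n - 1)
    ms_err = ss_err / df_err
    # Each effect has 1 df in a 2 x 2 design, so MS_effect = SS_effect.
    return {"F_A": ss_a / ms_err, "F_B": ss_b / ms_err,
            "F_AB": ss_ab / ms_err, "df_err": df_err}

# Invented demo: 4 subjects per cell, CPA status x spreadsheet expertise.
demo = {("CPA", "expert"): [8, 9, 7, 8], ("CPA", "novice"): [7, 6, 7, 8],
        ("nonCPA", "expert"): [6, 5, 6, 7], ("nonCPA", "novice"): [6, 7, 5, 6]}
print(two_way_anova(demo))
```

With the study's 15 subjects per cell, the error degrees of freedom come out to 4 x 14 = 56, matching the "1,56 df" reported for every F test above.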
One difficulty with this analysis was a lack of homogeneity of variance (Bartlett-Box = 4.5; 3, 5645 df; p < .004). A natural log transformation of the time measure rectified this difficulty (Bartlett-Box = 1.7; 3, 5645 df; p < .165), and the statistical analysis was virtually unchanged (for H5, minutes taken, F = 5.98; 1,56 df; p < .018).

Because it appears that accounting expertise assists in locating errors while spreadsheet expertise assists in speeding up performance, and because there is likely to be a trade-off between speed and accuracy, an analysis was performed that takes both speed and accuracy into account. A ratio of errors found per hour⁴ was computed for each subject and is presented in Table 4. ANOVA results indicate that both the domain main effect (F = 4.5; 1,56 df; p < .038) and the device main effect (F = 4.7; 1,56 df; p < .035) were significant, suggesting that there is indeed a trade-off between time and number of errors found that affected the analyses of means in Tables 2 and 3. Furthermore, the predicted interaction effect was significant (F = 4.9; 1,56 df; p < .032), lending support to H7. The Bartlett-Box statistic was not significant (F = .21; 3, 5645 df; p < .886), signifying that the ANOVA homogeneity assumption was not violated.

Table 4. Mean total errors found per hour (H1, H2, H4, H5, H7; n = 15 per cell)

                          CPAs         non-CPAs     Overall
Spreadsheet novices       16.5 (6.1)   16.7 (7.6)   16.6 (6.8)
Spreadsheet experts       24.2 (6.6)   16.6 (7.0)   20.4 (7.7)
Overall                   20.4 (7.4)   16.6 (7.2)   18.5 (7.5)

Note. Standard deviations shown in parentheses.

Hypotheses H3 and H6 predict that the performance effects can be explained by looking deeper into the types of errors found by subjects with expertise in each dimension. Tables 5 and 6 depict cell means taking into account only the six errors that match the expertise of the subjects. Table 5 compares the domain error-finding performance of CPAs versus non-CPAs, and Table 6 compares the device error-finding performance of spreadsheet experts versus novices. Results of t-tests performed on the data in Tables 5 and 6 are shown to the right of each table, and indicate that there is indeed some evidence that the performance of CPAs was closely related to the nature of the accounting errors (H3). The results in Table 6 do not support H6; no evidence was found for the prediction that spreadsheet experts would perform better than spreadsheet novices because of their skill in finding device errors.

The variances in each cell of each t-test were also compared. In every case except one, there were no differences between the variances. The variance of total time taken, in Table 6, was significantly different for spreadsheet experts and novices (F = 3.95; p < .001). The separate variance estimate was used in determining the statistical significance of the two means in question, hence the calculated degrees of freedom.

The analysis above is somewhat incomplete because of the inability to isolate how much time was spent seeking each type of error.⁵ Because subjects were asked to identify which error they identified first, there is an alternative method for examining time-related results for each type of error. Table 7 presents the mean number of domain errors found first by CPAs and non-CPAs, and Table 8 presents the mean number of device errors found first by spreadsheet experts and novices. T-test results are again shown to the right of each table.

⁴Although all subjects performed the complete task in less than one hour, using hours as a base provides ratios that are more interpretable. For example, knowing that the mean number of errors found per hour by all subjects was 18.5 appears more meaningful than .308 per minute.
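Where the cell variances differed (the total-time comparison for spreadsheet experts versus novices), the separate variance estimate with its calculated, non-integer degrees of freedom was used instead of the pooled t-test. A minimal sketch of that computation (the Welch-Satterthwaite approximation; the sample data below are hypothetical, not the study's):

```python
import math

def welch_t(x, y):
    """Two-sample t statistic using separate variance estimates, with
    the Welch-Satterthwaite approximation for degrees of freedom
    (which is why a fractional df such as 42.81 can be reported)."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((v - m) ** 2 for v in xs) / (len(xs) - 1)  # sample variance
    n1, n2 = len(x), len(y)
    v1, v2 = var(x) / n1, var(y) / n2          # squared standard errors
    t = (sum(x) / n1 - sum(y) / n2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Hypothetical samples with equal variances and n = 3 each,
# so df reduces to 2(n - 1) = 4.
t, df = welch_t([1, 2, 3], [4, 5, 6])
print(round(t, 3), round(df, 3))  # -3.674 4.0
```

When the two sample variances are unequal, the calculated df falls below n1 + n2 - 2 (here, below 58), making the test appropriately more conservative.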
Once again, it is evident that accounting expertise has a specific effect on subjects' performance in finding accounting domain errors (H3), and that spreadsheet expertise has no discernible effect on subjects' performance in finding spreadsheet device errors (H6).

⁵A qualitative analysis of protocol subjects is necessary to ascertain the types of errors sought by subjects. Such analysis is currently under way in a separate study by the first two authors.

Table 5. Mean accounting error-finding performance of subjects of varying accounting expertise (H3; N = 30 per cell)

                              non-CPAs     CPAs         t-test
Accounting errors found       2.3 (1.4)    3.2 (1.3)    * (t = -2.55; 58 df; p < .013)
Total time taken, minutes     23.1 (6.2)   23.3 (8.3)   ns (t = -.08; 58 df; p < .94)
Ratio, computed per hour      6.3 (3.8)    9.1 (4.6)    * (t = -2.54; 58 df; p < .014)

Note. A perfect score would be six accounting errors found; standard deviations are shown in parentheses.

Table 6. Mean spreadsheet error-finding performance of subjects of varying spreadsheet expertise (H6; N = 30 per cell)

                              Spreadsheet   Spreadsheet
                              novices       experts      t-test
Spreadsheet errors found      3.9 (1.6)     3.9 (1.5)    ns (t = .17; 58 df; p < .868)
Total time taken, minutes     25.5 (8.7)    20.9 (4.4)   * (t = 2.61; 42.81 df; p < .013)
Ratio, computed per hour      10.0 (4.9)    11.6 (4.8)   ns (t = -1.22; 58 df; p < .228)

Note. A perfect score would be six spreadsheet errors found; standard deviations are shown in parentheses; the separate variance estimate was used for the total time taken statistics.

DISCUSSION

While most of the hypotheses were fully or partially supported by the data, there are several limitations of this study. First, due to the requirement for choosing subjects with high and low levels of expertise in accounting and spreadsheet software usage, subjects could not be chosen at random from a single population; no single sampling frame includes subjects from all four cells.
A related problem was that the effects of using a variety of equipment were undefined. The use of a single computer for all subjects would have solved this problem, but the CPAs' computer lab was not available to the MBAs and vice-versa. A third problem was that the errors planted in the spreadsheets had to be localized to a few cells, and thus might not have been representative of all errors found in real spreadsheets, where the effects of a single error can propagate to dozens of other cells. This localization was necessary to enable meaningful counting of errors. A fourth problem is the employment of self-reported measures to establish spreadsheet expertise, as opposed to the more objective means of establishing accounting expertise; this limitation might partially explain the different speed and accuracy results for each type of expertise. Finally, none of the CPA subjects were members of national firms, and results might have been attenuated somewhat; Messier (1983) found that CPAs from national firms outperformed CPAs from local firms. There were nevertheless some useful findings.

Findings

This study provided some consistent evidence of striking overall performance advantages (i.e., the number of errors found and/or the speed with which they are found) when spreadsheet auditors possess accounting expertise as well as spreadsheet expertise. Subjects possessing both kinds of expertise far exceeded the overall performance of subjects with expertise in only one or neither area. These overall results appear to support the notions that the CPA certificate indicates domain expertise, and that spreadsheet experience provides effective feedback for forming device expertise.

Table 7. Mean number of accounting errors found first by subjects with varying accounting expertise (H3; N = 30 per cell; standard deviations are in parentheses)

                                  non-CPAs     CPAs         t-test
Accounting errors found first     1.10 (1.1)   2.23 (1.3)   *** (t = -3.6; 58 df; p < .001)

Note. Purely by chance, subjects earning perfect scores would be expected to find 3 of the accounting errors first and 3 of the spreadsheet errors first. On average, the subjects in this sample would be expected to find 1.4 of the accounting errors first and 2.0 of the spreadsheet errors first, given the overall 46% and 65% success rates of finding the 6 domain and 6 device errors, respectively.

Table 8. Mean number of spreadsheet errors found first by subjects with varying spreadsheet expertise (H6; N = 30 per cell; standard deviations are in parentheses)

                                  Spreadsheet   Spreadsheet
                                  novices       experts      t-test
Spreadsheet errors found first    3.4 (1.7)     3.2 (1.4)    ns (t = .41; 58 df; p < .685)

Note. Purely by chance, subjects earning perfect scores would be expected to find 3 of the accounting errors first and 3 of the spreadsheet errors first. On average, the subjects in this sample would be expected to find 1.4 of the accounting errors first and 2.0 of the spreadsheet errors first, given the overall 46% and 65% success rates of finding the 6 domain and 6 device errors, respectively.

Domain-related analysis revealed that accounting expertise provides clear performance advantages in the number of accounting errors found (effectiveness). These advantages were present when considering the number of accounting errors found, the number of accounting errors found per unit of time, and the number of times that the accounting error was found before the spreadsheet usage error in each spreadsheet. Device-related analysis revealed that spreadsheet expertise provides clear speed advantages in finding spreadsheet errors (efficiency). However, contrary to what might have been expected, there was no evidence to suggest that spreadsheet expertise is crucial for discovering errors strictly affecting spreadsheet formulas and structure.
For these subjects and this task, one spreadsheet training session provided CPAs with enough spreadsheet background to find errors that were strictly device-related, given the extra time they took for the task. These results appear on the surface to contradict previous research in two ways: (a) accounting expertise appears to be a meaningful construct for an error-finding (i.e., auditing) task, and (b) experience appears to be an adequate surrogate for computer-related expertise in an error-finding task. The success of the accounting expertise measure might have resulted from an appropriate level of problem difficulty; the task chosen in this study probably did not involve errors that auditors seldom encounter. The computer-related experience surrogate for expertise might have succeeded not only due to the appropriateness of the task, but also due to the amount of feedback spreadsheet builders probably receive while they work.

Implications

Both researchers and practitioners can benefit from several implications of this experiment. In general, this study has provided evidence that researchers should consider expertise and/or experience in several domains when studying certain aspects of end-user computing performance. Besides the accounting domain, software is employed by end users of a variety of backgrounds, including engineers, architects, managers, and other professionals. Further, there is abundant opportunity for researchers to discover why even obvious, elementary errors in very simple, clearly documented spreadsheets are so difficult to find.

Practitioners need to recognize the sheer difficulty of the spreadsheet error-finding task, especially because these errors, shown elsewhere to be abundant, can have significant effects on decision-making. CPA firm managers might need to ascertain that any client spreadsheets are examined (or at least reviewed) for errors by CPAs with an abundance of spreadsheet experience.
Perhaps teams should meet to perform spreadsheet reviews, in a manner similar to programming code walkthroughs. Training programs might be developed to increase the levels of performance of spreadsheet reviewers.

This study provides some evidence that effectiveness of the task of spreadsheet error-finding is improved via accounting domain expertise, while efficiency of that task is improved via spreadsheet device expertise. Although prevention of errors would be much more effective than attempting to find them, it would be unrealistic to expect to find permanent preventative measures. As long as spreadsheet builders remain fallible, the spectre of enormous but latent spreadsheet errors will remain. Increased understanding of the detection of spreadsheet errors is an important step in minimizing their effects.

Acknowledgements-The authors would like to thank Kirk Fiedler for his abundant assistance in material refinement and scoring validation, as well as the Pennsylvania Institute of CPAs Foundation for Education and Research and Alternative Computer Corporation for allowing CPA professional development course enrollees to serve as subjects in this study. This study was supported by a Faculty Research Grant to the first author from the Joseph M. Katz Graduate School of Business.

REFERENCES

Adelson, B. (1981). Problem solving and the development of abstract categories in programming languages. Memory and Cognition, 9, 422-433.
Adelson, B., & Soloway, E. (1985). The role of domain experience in software design. IEEE Transactions on Software Engineering, SE-11, 1351-1360.
Alavi, M., Nelson, R.R., & Weiss, I.R. (1987-88). Strategies for end-user computing: An integrative perspective. Journal of Management Information Systems, 4, 28-49.
AMA (1988). Report on end-user and departmental computing. New York: American Management Association.
Anderson, K., & Bernard, A. (1988). Spreading mistakes. Journal of Accounting and EDP, 42-45.
Ashton, A.H. (1991).
Expertise and error frequency knowledge as potential determinants of audit expertise. The Accounting Review, 66, 218-239.
Ballou, D.P., Pazer, H.L., Belardo, S., & Klein, B. (1987). Implications of data quality for spreadsheet analysis. Data Base, 18, 13-19.
Bishop, Y.M.M., Fienberg, S.E., & Holland, P.W. (1975). Discrete multivariate analysis: Theory and practice. Cambridge, MA: MIT Press.
Bonar, J., & Soloway, E. (1985). Programming knowledge: A major source of misconceptions in novice programmers. Human-Computer Interaction, 1, 133-161.
Bonner, S.E. (1990). Experience effects in auditing: The role of task-specific knowledge. The Accounting Review, 65, 72-92.
Bonner, S.E., Lewis, B.L., & Marchant, G. (1990). Determinants of auditor expertise. Journal of Accounting Research, 28(Suppl.), 1-28.
Brancheau, J.C., Vogel, D.R., & Wetherbe, J.C. (1985). An investigation of the information center from the user's perspective. Data Base, 17, 4-18.
Brooks, F.P., Jr. (1987). No silver bullet: Essence and accidents of software engineering. Computer, 20, 10-19.
Brooks, R. (1977). Towards a theory of the cognitive processes in computer programming. International Journal of Man-Machine Studies, 9, 737-751.
Brown, P.S., & Gould, J.D. (1987). An experimental study of people creating spreadsheets. ACM Transactions on Office Information Systems, 5, 258-272.
Chase, W.G., & Simon, H.A. (1973). Perception in chess. Cognitive Psychology, 4, 55-81.
Chi, M.T.H., Glaser, R., & Rees, E. (1982). Expertise in problem solving. In R.J. Sternberg (Ed.), Advances in the psychology of human intelligence, 1 (pp. 7-75). Hillsdale, NJ: Erlbaum.
Creeth, R. (1985). Microcomputers' spreadsheets: Their use and abuse. Journal of Accountancy, June, 92.
Crosby, M.A., & Stelovsky, J. (1990). How do we read algorithms? A case study. Computer, January, 24-35.
Davies, N., & Ikin, C. (1987). Auditing spreadsheets. Australian Accountant, December, 54-56.
Davis, G.B. (1984).
Caution: User-developed systems can be dangerous to your organization. Working Paper 8204 (rev. ed.), Management Information Systems Research Center, University of Minnesota.
Davis, J.S., & Solomon, I. (1989). Experience, expertise, and expert-performance research in public accounting. Journal of Accounting Literature, 8, 150-164.
Ditlea, S. (1987). Spreadsheets can be hazardous to your health. Personal Computing, January, 60-69.
Ehrlich, K., & Soloway, E. (1984). An empirical investigation of the tacit plan knowledge in programming. In J.C. Thomas & M.L. Schneider (Eds.), Human factors in computer systems (pp. 113-133). Norwood, NJ: Ablex Publishing Company.
Floyd, B.D., & Pyun, J. (1987). Errors in spreadsheet use. Working Paper #CRIS-167, NYU Center for Research on Information Systems.
Frederick, D.M., & Libby, R. (1986). Expertise and auditors' judgements of conjunctive events. Journal of Accounting Research, 270-290.
Galletta, D.F., & Heckman, R. (1990). A role theory perspective on end-user development. Information Systems Research, 1, 168-187.
Galletta, D.F., & Hufnagel, E. (1992). A model of end-user computing policy: Determinants of organizational compliance. Information and Management, 22, 1-18.
Gould, J.D. (1975). Some psychological evidence on how people debug computer programs. International Journal of Man-Machine Studies, 7, 151-182.
Gould, J.D., Alfaro, L., Barnes, V., Finn, R., Grischkowsky, N., & Minuto, A. (1987a). Reading is slower from CRT displays than from paper: Attempts to isolate a single-variable explanation. Human Factors, 29, 269-299.
Gould, J.D., Alfaro, L., Finn, R., Haupt, B., & Minuto, A. (1987b). Reading from CRT displays can be as fast as reading from paper. Human Factors, 29, 497-517.
Gugerty, L., & Olson, G.M. (1986). Comprehension differences in debugging by expert and novice programmers. In E. Soloway & S. Iyengar (Eds.), Empirical studies of programmers. Norwood, NJ: Ablex Publishing Company.
Hamilton, R.E., & Wright, W.F.
(1982). Internal control judgements and effects of experience: Replications and extensions. Journal of Accounting Research, 756-765.
Johnson, E.J. (1988). Expertise and decision under uncertainty: Performance and process. In M.T.H. Chi, R. Glaser, & M.J. Farr (Eds.), The nature of expertise (pp. 209-228). Hillsdale, NJ: Erlbaum.
Kee, R.C., & Mason, J.O., Jr. (1988). Preventing errors in spreadsheets. Internal Auditor, February, 42-47.
Larkin, J.H. (1981). Enriching formal knowledge: A model for learning to solve textbook physics problems. In J.R. Anderson (Ed.), Cognitive skills and their acquisition (pp. 311-334). Hillsdale, NJ: Erlbaum.
Larkin, J.H., McDermott, J., Simon, D.P., & Simon, H.A. (1980). Models of competence in solving physics problems. Cognitive Science, 4, 317-345.
Lederer, A.L., & Sethi, V. (1986). Planning for end-user computing: Top-down or bottom-up? Information Management Review, 2, 25-34.
Mackay, J.M., Barr, S.H., & Kletke, M.G. (1992). An empirical investigation of the effects of decision aids on problem-solving processes. Decision Sciences, 23, 648-672.
Mackay, J.M., & Elam, J.J. (1992). A comparative study of how experts and novices use a decision aid to solve problems in complex knowledge domains. Information Systems Research, 3, 150-172.
Mackay, J.M., & Lamb, C.W., Jr. (1991). Training needs of novices and experts with referent experience and task domain knowledge. Information and Management, 20, 183-189.
Messier, W.F., Jr. (1983). The effect of experience and firm type on materiality/disclosure judgements. Journal of Accounting Research, Autumn, 611-618.
Munro, M.C., Huff, S.L., & Moore, G. (1987-88). Expansion and control of end-user computing. Journal of Management Information Systems, 4, 5-27.
Olson, J.R., & Nilsen, E. (1987-88). Analysis of the cognition involved in spreadsheet software interaction. Human-Computer Interaction, 3, 309-349.
Oman, P.W., Curtis, R.C., & Nanja, M. (1989).
Effects of programming experience in debugging semantic errors. The Journal of Systems and Software, 9, 197-207.
Pearson, R. (1988). Lies, damned lies, and spreadsheets. Byte, December, 299-304.
Rockart, J.F., & Flannery, L.S. (1983). The management of end-user computing. Communications of the ACM, 26, 176-184.
Shneiderman, B. (1976). Exploratory experiments in programmer behavior. International Journal of Man-Machine Studies, 5, 123-143.
Shneiderman, B. (1992). Designing the user interface: Strategies for effective human-computer interaction (2nd ed.). Reading, MA: Addison-Wesley.
Simon, D.P., & Simon, H.A. (1978). Individual differences in solving physics problems. In R. Siegler (Ed.), Children's thinking: What develops? (pp. 325-348). Hillsdale, NJ: Erlbaum.
Soloway, E., Adelson, B., & Ehrlich, K. (1988). Knowledge and processes in the comprehension of computer programs. In M.T.H. Chi, R. Glaser, & M.J. Farr (Eds.), The nature of expertise (pp. 129-152). Hillsdale, NJ: Erlbaum.
Soloway, E., & Ehrlich, K. (1984). Empirical studies of programming knowledge. IEEE Transactions on Software Engineering, SE-10, 595-609.
Soloway, E., Ehrlich, K., Bonar, J., & Greenspan, J. (1982). What do novices know about programming? In A. Badre & B. Shneiderman (Eds.), Directions in human/computer interaction (pp. 27-55). Norwood, NJ: Ablex Publishing Company.
Stang, D. (1981). Spreadsheet disasters I have known. Information Center, 3, 26-30.
Sumner, M., & Klepper, R. (1981). Information systems strategy and end-user application development. Data Base, Summer, 19-30.
Vessey, I. (1985). Expertise in debugging computer programs: A process analysis. International Journal of Man-Machine Studies, 23, 459-494.
Vitalari, N.P., & Dickson, G.W. (1983). Problem-solving for effective systems analysis: An exploration. Communications of the ACM, 26, 948-956.
Weidenbeck, S. (1986). Cognitive processes in program comprehension. In E. Soloway & S. Iyengar (Eds.), Empirical studies of programmers (pp. 48-51). Norwood, NJ: Ablex Publishing Company.
Xenakis, J.W. (1981). 1-2-3: A dangerous language. Computerworld, March 9, 31.
Youngs, E.A. (1974). Human errors in programming. International Journal of Man-Machine Studies, 6, 361-376.

APPENDIX

Example of Spreadsheet #1, as seen and marked by subjects: a "NONCASH AND OTHER EXPENSES" statement with January and February columns, listing noncash expenses (depreciation on a delivery van, amortization of patents, and amounts prepaid this year for next year) and other expenses (office salaries, officers' salaries, advertising, office expenses, rent, electricity, and interest), with the two planted errors circled and labeled Error 1 and Error 2, and the on-screen prompt "PRESS ALT-X TO MOVE TO THE NEXT SPREADSHEET."

Example of Spreadsheet #1, with formulas illustrated: the same statement shown with its cell formulas visible, for example +C4+C5+C6 for total noncash expenses and @SUM(C4..C16) for total noncash and other expenses.