A BAYESIAN APPROACH TO PREDICT PERFORMANCE OF A STUDENT (BAPPS): A Case with Ethiopian Students

Rahel Bekele & Professor Dr. Wolfgang Menzel
Department of Informatik, University of Hamburg, Germany

Abstract

Predicting a student's future performance is an essential step towards giving adequate assistance. When instructors predict performance themselves, the task becomes complex because different students have different preferences and personalities, which makes it a time-consuming, often tedious and notoriously error-prone process. Predicting the performance of a student can instead be achieved by building a statistical model of the student. The Bayesian Approach to Predict Performance of a Student (BAPPS) is a system that uses a Bayes' net to model the student and Bayes' rule to infer the student's performance and update the model accordingly.

Key Words

Bayesian approach, Modelling, Machine Learning, Student Performance, Stochastic Modelling

1. Introduction

1.1 Some factors for low performance of Ethiopian Students

Various studies (Atkinson (2000); Caplan (2002); Georgiou (2002); Gonzalez (2002); Diaz (2003)) attempt to explain low performance of students by identifying elements that interfere with educational goals: teaching/learning strategy (system causal factor), parents (family causal factors), teachers (academic causal factor), and students (personal causal factor). The following are brief accounts of factors for low performance that relate to system, family, academic or personal causes.

English Language

In Ethiopia, although English is taught as a subject beginning in grade 1 and is used as the medium of instruction starting in the seventh grade, students' command of it is not adequate to let them easily understand spoken English or written text. Almost all of the reference materials in secondary schools are in English.
Researchers in this area indicate that students have difficulty in understanding and using English. A study by Zaudneh, Darge and Nardos (1989), for instance, revealed that 58% of a sample of first-year students of Addis Ababa University had difficulty in understanding their instructors' communication in English.

Class Size

Although interactive class discussion is very important in the learning process, it is almost non-existent for various reasons, among them the large size of classes. Moreover, when a class becomes very large, teachers find it difficult to make arrangements to evaluate students in practical situations.

Class Participation

Studies in this area indicate that parents of some students discourage question asking at home and use corporal punishment instead of reasoning in matters of discipline (Ringness, 1971; Darge, 1989). As a result, many students appear to have developed attitudes and habits which may interfere with instructional processes. In class situations, for example, it is usually difficult to get students to ask questions or participate in discussions. Students' respect for instructors is sometimes mixed with fear. Some of them hesitate to question their instructor's ideas, even when the ideas are clearly debatable.

Individual Differences

In relation to individual differences, studies (Scharff (n.d.); Huitt (1997)) reveal that students' backgrounds as well as natural behavioral differences may not allow them equal perception and understanding. Accordingly, it is inevitable that some students retain faster and learn better than others. The way of taking lecture notes, solving problems and studying, and the level of performance in examinations, may vary from student to student. Students may also lack a grasp of elementary concepts or skills, such as quantitative or language skills, or show a decline in achievement motivation for adequate practice, presumably due to habits established at lower educational levels.
Inadequate Assistance

Lack of assistance by instructors to individual students or groups of students is also a major factor for low performance. Tekeste (1990) indicated that teachers carry a weekly load of thirty periods in classes exceeding 70 students each, and that they may have instructional responsibility for as many as 1000 students in a given term. For such instructors, identifying the students who are in need (i.e. low achievers) and providing assistance is difficult.

1.2 Statement of the Problem

In a conventional setup characterized by large class sizes, teachers or instructors can only tell the performance of a student after he/she has taken tests. It is usually difficult to use other personal information (such as gender, interest, achievement motivation, confidence, etc.) to predict the performance of an individual student. To manage the large amount of information and the many possible combinations of students' characteristics, assistance is required from a decision support system.

Systems that require instructors to hand-build a rule set to predict performance assume that their students' characteristics do not change over time. As the students' characteristics change, these rule sets must be constantly tuned and refined by the instructor. This is a time-consuming and often tedious process which can be notoriously error-prone.

The problems with the manual construction of rule sets point to the need for adaptive methods. It becomes important, therefore, to automatically analyze student characteristics and to classify students based on observed data. This research has looked specifically at personal, social and cultural features that may be indicative of performance.

In relation to the foregoing discussion, the following research questions provided the specific focus of our study.
What are the major characteristics that may interfere with performance?
How can one measure those characteristics?
How can we automatically decide whether a student is a low, high or average performer from the values of the identified major characteristics?

In practice, it is often difficult to model the exact characteristics of a student with respect to some variables, so the procedure may involve considerable uncertainty. If the characteristics are uncertain, this uncertainty carries over to the prediction, which may in turn result in a poorly adapted categorization of performance. The Bayesian approach was employed to tackle this problem since it was found to be a clear and manageable language for expressing what we are certain and uncertain about.

The research aimed at applying the Bayes' net methodology in order to predict the performance of a student based on the values of identified characteristics. The study is expected to indicate some important characteristics or personality variables to be considered in determining performance. The outcomes are helpful for further extending the functionality of the student model component of an Intelligent Tutoring System, i.e. the student model will contain aspects of the social and personal characteristics as well as the predicted performance of the student, in addition to profiles regarding the student's level of knowledge.

2. Related Works

There is a good deal of work on automatically generating probabilistic models [4, 5, 10, 16, 17, 21]. The more domain-specific work has focussed on probabilistic student models. ANDES [Conati, Larkin, and Van Lehn, 97] uses a belief network to represent alternate plans that may be used to solve physics problems. Student actions are used to update the probabilities of the respective plans.
Martin and VanLehn (1995) presented On-Line Assessment of Expertise (OLAE), a system that collects data from students solving problems in introductory college physics, analyzes the data with probabilistic methods to determine what knowledge the student is using, and presents the results of the analysis. For each problem, the system automatically creates a Bayesian net that relates knowledge, represented as first-order rules, to particular actions, such as written questions. Using the resulting Bayesian network, OLAE observes a student's behavior and computes the probabilities that the student knows and uses each of the rules.

The research presented by Murray (1998) inferred a student model from performance data using a Bayesian belief network. The belief network modeled the relationship between knowledge and performance for either test items or task actions. The measure of how well a student knows a skill is represented as a probability distribution over skill levels. Questions or expected actions are classified according to the same categories by the expected difficulty of answering them correctly or selecting the correct action.

Continuing in this vein, we seek to apply the belief network modeling technique to the problem of predicting student performance.

3. Approach / Methodology

(i) Identification and measurement of characteristics

In order to contextualize the research problem, mathematics was selected as the subject area. Major social and cultural attributes likely to determine performance were obtained from discussions with experts and a number of surveys. The attributes were grouped into two. Group 1 contained attributes such as the student's performance in mathematics, English language ability and gender, which were obtained directly from the student's record. Group 2 contained attributes such as attitude towards group work, interest for mathematics, achievement motivation, self-confidence, and shyness.
The attributes in group 2 could not be measured directly, so questionnaires were developed for the purpose of measurement. The questionnaires were developed in consultation with psychologists and from the available literature. The Likert¹ scale was reduced to three categories, with numbers assigned as follows: 3: Strongly Agree, 2: Agree to some extent, 1: Strongly Disagree. The items were then checked against the definition of the scale for which they were written. Two consecutive pilot surveys were carried out before the final questionnaire was developed.

With regard to scoring of items, scores from 3 to 1 were given for the positively worded items (high value) and from 1 to 3 for negatively worded items (low value). In the second pilot survey, the items for measuring each variable were found to be reliable, with coefficient of alpha² >= 0.77.

The necessary sample size was determined with respect to the probability of rare events. For example, getting a high value for all the personality variables is considered to be a rare event. In the pilot survey, the probability of getting a student with high achievement motivation was the lowest (4/64). Therefore, this probability was used to calculate the sample size. The following statistical formula was used to get the size of the sample.

\[ n = \frac{t^{2}\,p\,q}{d^{2}} = 514 \]

where
t = 1.96 for alpha = 0.05 (95% confidence level)
p = 4/64 (the probability of the rare event, i.e. probability of success)
q = 1 - p (probability of failure)
d = .015 (the interval within which the probability of the rare event is expected to fall)

¹ The Likert scale is a five-point scale in which the interval between each point on the scale is assumed to be equal.
² Coefficient of alpha (Cronbach alpha coefficient): a measure of squared correlation between observed scores and true scores. It is a value which shows the items' reliability of measurement.
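As a quick check of the sample-size formula, the following Python sketch evaluates n = t²pq/d² directly and also inverts it to recover the margin d implied by a given n. The function names `sample_size` and `margin_for` are illustrative and not part of the original study. With the values as printed (t = 1.96, p = 4/64, d = 0.015) the expression comes out near 1000, whereas n = 514 corresponds to a margin of roughly 0.021, so the printed value of d should be read with some caution.

```python
import math


def sample_size(t: float, p: float, d: float) -> float:
    """Sample size for estimating a proportion p within margin d.

    Implements n = t^2 * p * q / d^2 with q = 1 - p.
    """
    q = 1.0 - p
    return (t ** 2) * p * q / (d ** 2)


def margin_for(n: float, t: float, p: float) -> float:
    """Invert the same formula: the margin d implied by a sample size n."""
    return t * math.sqrt(p * (1.0 - p) / n)


if __name__ == "__main__":
    t, p = 1.96, 4 / 64  # values stated in the text
    print("n for d = 0.015:", round(sample_size(t, p, 0.015)))
    print("d for n = 514:  ", round(margin_for(514, t, p), 4))
```

Keeping the inverse function next to the forward one makes it easy to see which margin a collected sample actually supports.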
The test subjects were students in the last year of their secondary schooling at one government senior secondary school. A total of 514 data records were collected. While English and mathematics marks of three consecutive semesters were collected from the record office, the other variables were measured by distributing the questionnaire directly to students. The following are the possible values (outcomes) for each variable.

Personality variable          Possible values (outcomes)
Gender                        (Male, Female)
Attitude towards group work   (Positive, Indifferent, Negative)
Interest for Mathematics      (Interested, Indifferent, Uninterested)
Achievement Motivation        (High, Medium, Low)
Self Confidence               (High, Medium, Low)
Shyness                       (Extrovert, Medium, Introvert)
English performance           (Above satisfactory, Satisfactory, Below satisfactory)
Math performance              (Above satisfactory, Satisfactory, Below satisfactory)

Table 1. Possible outcomes of identified variables

(iii) Data Preparation

The bulk of the effort was invested in preparing the input for the belief network investigation. The data was assembled, integrated and cleaned up. Typographical errors in the data were avoided because each attribute value was generated by SPSS. Assuming a normal distribution, the interval of one standard deviation around the mean was used to group the personality variables into three categories. The following table is a summary.

Personality variable          Interval (X±1S)   High valued   Average valued        Low valued
                                                category      category              category
                                                (> x+s)       (btn x-s and x+s)     (< x-s)
Attitude towards group work   (30.51, 40.69)    > 40          30-40                 < 30
Interest for Mathematics      (24.45, 38.95)    > 38          24-38                 < 24
Achievement Motivation        (30.2, 39.6)      > 39          30-39                 < 30
Self Confidence               (32.22, 39.38)    > 39          32-39                 < 32
Shyness                       (22.79, 35.61)    > 35          22-35                 < 22
Mathematics mark³             (0.0, 2.45)       >= 2.45       0-2.44                < 0
English mark                  (0.0, 2.47)       >= 2.47       0-2.47                < 0

Table 2: category of values

Each student was assigned, for each personality variable, to one of these categories.
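The grouping rule behind Table 2 can be sketched in a few lines of Python. The helper `categorize` and the dictionary `INTERVALS` are hypothetical names. The bounds are the X±1S intervals from Table 2 for the five questionnaire variables, and treating a score that falls exactly on a bound as Average is an assumption, since the table does not state how ties were handled.

```python
# X±1S intervals for the questionnaire variables, copied from Table 2.
INTERVALS = {
    "Attitude towards group work": (30.51, 40.69),
    "Interest for Mathematics": (24.45, 38.95),
    "Achievement Motivation": (30.2, 39.6),
    "Self Confidence": (32.22, 39.38),
    "Shyness": (22.79, 35.61),
}


def categorize(variable: str, score: float) -> str:
    """Return 'Low', 'Average' or 'High' for a raw questionnaire score."""
    low, high = INTERVALS[variable]
    if score < low:
        return "Low"       # below mean - 1 SD
    if score > high:
        return "High"      # above mean + 1 SD
    return "Average"       # assumption: boundary scores count as Average


if __name__ == "__main__":
    for name, score in [("Shyness", 20), ("Self Confidence", 36)]:
        print(name, score, "->", categorize(name, score))
```

The generic labels returned here would still have to be mapped to the variable-specific outcomes of Table 1 (for example Extrovert, Medium, Introvert for shyness); that mapping is not spelled out in the text and is left open in this sketch.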
³ Z scores for Mathematics and English marks were computed for the purpose of standardization.

(iv) Identification of Dependencies

The 514 observed data records were ordered in such a way that math performance is a function of gender, group work attitude, interest for math, achievement motivation, self confidence, shyness and English performance. The format of this data is also used later for the Naive Bayes algorithm. A sample of the observed data is given below (GWA: attitude towards group work; IM: interest for mathematics; ACHM: achievement motivation; SC: self confidence).

Gender   GWA          IM           ACHM     SC       Shyness     Eng. Perf.           Math Perf.
Male     Positive     Indifferent  Medium   Medium   Medium      Satisfactory         Below Satisfactory
Male     Indifferent  Indifferent  Medium   Medium   Introvert   Below Satisfactory   Below Satisfactory
Male     Indifferent  Interested   Medium   Low      Medium      Below Satisfactory   Satisfactory
Male     Indifferent  Indifferent  Medium   Medium   Medium      Below Satisfactory   Below Satisfactory
Male     Negative     Indifferent  Medium   Low      Medium      Satisfactory         Above Satisfactory
Male     Indifferent  Indifferent  Medium   Medium   Extrovert   Below Satisfactory   Satisfactory
Male     Positive     Interested   Medium   Medium   Extrovert   Above Satisfactory   Satisfactory
Male     Negative     Indifferent  Medium   Low      Introvert   Below Satisfactory   Below Satisfactory
Female   Positive     Interested   High     Medium   Extrovert   Satisfactory         Above Satisfactory
Male     Indifferent  Indifferent  Medium   High     Medium      Below Satisfactory   Below Satisfactory
Female   Indifferent  Indifferent  Low      Low      Introvert   Below Satisfactory   Below Satisfactory
Male     Indifferent  Indifferent  Low      Low      Medium      Above Satisfactory   Above Satisfactory
Female   Indifferent  Interested   Medium   Medium   Medium      Below Satisfactory   Above Satisfactory

Table 4: Sample of observed data

As the observations show, not all students whose math marks are above satisfactory have favourable values for all the associated variables. For example, a student might have a math mark above satisfactory and yet be an introvert, or study hard for achievement without having an interest in the subject. Out of the 4374 possible combinations there are 514 observed data records.

Based on extensive discussion with domain experts, the relationship between characteristics is summarized in the table below.

[Table 5: Dependencies. A matrix whose rows and columns are the eight variables (Gender, Attitude towards group work, Interest for Mathematics, Achievement Motivation, Self Confidence, Shyness, English Performance, Mathematics Performance); an X in a cell marks that the row variable affects the column variable.]

(Read as: Gender affects attitude towards group work, gender affects interest for math, etc.)

4. Experiment

4.1. Belief Network Modelling

(i) Probability values

Bayesian networks are directed acyclic graphs in which nodes represent random variables and arcs represent direct probabilistic dependencies among them. The structure of a Bayesian network is a graphical illustration of the interactions among the set of variables that it models. Nodes of a Bayesian network are usually drawn as circles or ovals.

In this work, after identification and ordering of the variables, prior and posterior probability values were computed. Each variable was inserted into a network as a node. An attempt was made to keep the number of parents as small as possible. The conditional probabilities were then fed to the network. The following illustrates the simplest network drawn.

[Figure 1: Belief network modelling]

Each node is described by a probability distribution conditional on its direct predecessors. Nodes with no predecessors are described by prior probability distributions.
For example, node math in the network above is described by the prior probability distribution over its three outcomes: Above satisfactory, Satisfactory and Below satisfactory. The other nodes are described by a probability distribution over their outcomes (e.g. Positive, Negative, Indifferent for attitude) conditional on the outcomes of their predecessor (node math). Both the structure and the numerical probabilities are a mixture of expert knowledge and measurements of objective frequency data.

In more technical terms, if math performance has 3 outcomes, then we need the prior probability of each outcome along with conditional probabilities for each of the other variables. For instance, for the variable "interest in math", we need to supply the following conditional probability values.

               Above satisfactory   Satisfactory              Below satisfactory
Interested     P(int | above)       P(int | satisfactory)     P(int | below)
Uninterested   P(unint | above)     P(unint | satisfactory)   P(unint | below)
Indifferent    P(indiff | above)    P(indiff | satisfactory)  P(indiff | below)

Table 5: Conditional Probability Table

The network allows for performing Bayesian inference, i.e. computing the impact of observing values of a subset of the model variables on the probability distribution over the remaining variables. For example, by observing values of variables captured from student information, the model allows us to compute the probability of performance, i.e. P(math performance | gender, group work attitude, interest for math, achievement motivation, self confidence, shyness, English performance). A more realistic model of the conditional probabilities is shown below.

[Figure 2: More conditional probabilities]

After the belief network was drawn, probability values were updated by entering evidence. The first experiment was carried out using the Genie belief network tool; later, the Bayesian Network in Java classes were used.
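To make the inference step concrete, here is a minimal Python sketch for the simple structure of Figure 1, in which math performance is the only parent of every other node. Under that structure the posterior over math performance is proportional to the prior times the product of the conditional probabilities of the observed values. All probability values below are made-up placeholders, not estimates from the study's data; the function name `posterior` is hypothetical; and only two evidence variables are included to keep the example short. The study itself relied on belief network tools rather than hand-written enumeration like this.

```python
from math import prod

OUTCOMES = ["Above satisfactory", "Satisfactory", "Below satisfactory"]

# Placeholder prior P(math). NOT taken from the study.
PRIOR = {"Above satisfactory": 0.2, "Satisfactory": 0.3, "Below satisfactory": 0.5}

# Placeholder tables P(value | math), laid out like the conditional
# probability table in the text. For each math outcome the values sum to 1.
CPT = {
    "interest": {
        "Interested":   {"Above satisfactory": 0.6, "Satisfactory": 0.4, "Below satisfactory": 0.2},
        "Indifferent":  {"Above satisfactory": 0.3, "Satisfactory": 0.4, "Below satisfactory": 0.5},
        "Uninterested": {"Above satisfactory": 0.1, "Satisfactory": 0.2, "Below satisfactory": 0.3},
    },
    "gender": {
        "Male":   {"Above satisfactory": 0.5, "Satisfactory": 0.6, "Below satisfactory": 0.6},
        "Female": {"Above satisfactory": 0.5, "Satisfactory": 0.4, "Below satisfactory": 0.4},
    },
}


def posterior(evidence: dict) -> dict:
    """P(math | evidence) by direct enumeration over the three outcomes."""
    unnormalized = {
        m: PRIOR[m] * prod(CPT[var][val][m] for var, val in evidence.items())
        for m in OUTCOMES
    }
    total = sum(unnormalized.values())
    return {m: v / total for m, v in unnormalized.items()}


if __name__ == "__main__":
    result = posterior({"interest": "Interested", "gender": "Female"})
    for outcome, prob in result.items():
        print(f"{outcome}: {prob:.3f}")
```

With no evidence the function simply returns the prior, which is a convenient sanity check before plugging in frequencies estimated from real records.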
(http://www2.sis.pitt.edu/~genie/, http://sourceforge.net/projects/bnj/)

(ii) Algorithm Selection and Evaluation

Many researchers state that belief updating in Bayesian networks is computationally complex and calls for intelligent algorithms. There exist several efficient algorithms, however, that make belief updating tractable in graphs consisting of tens or hundreds of variables. Pearl (1986) developed a message-passing scheme that updates the probability distributions for each node in a Bayesian network in response to observations of one or more variables. Lauritzen and Spiegelhalter (1988), Jensen et al. (1990), and Dawid (1992) proposed an efficient algorithm that first transforms a Bayesian network into a tree in which each node corresponds to a subset of variables in the original graph. The algorithm then exploits several mathematical properties of this tree to perform probabilistic inference. Several approximate algorithms based on stochastic sampling have also been developed. Of these, the best known are probabilistic logic sampling (Henrion 1998), likelihood sampling (Shachter & Peot 1990, Fung & Chang 1990), and backward sampling (Fung & del Favero 1994). In most practical networks with tens or hundreds of nodes, Bayesian updating is rapid and takes between a fraction of a second and a few seconds.

In our work, a number of algorithms that are already incorporated in the Genie belief network tool were tested. The testing was done by comparing the inference made by the belief network with the already observed real data, using a small sample of observations. It was found that the Probabilistic Logic Sampling algorithm predicts better (observed and inferred values are more or less the same), and hence it was adopted in the prototype development.

4.2 Naive Bayes Algorithm

Naive Bayesian classification is a simple probabilistic classification method.
The term naive Bayes refers to the fact that the probability model can be derived using Bayes' Theorem and that it incorporates strong independence assumptions that often have no bearing in reality and hence are (deliberately) naive. Depending on the nature of the probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting.

The Naive Bayes algorithm in the Weka⁴ data mining tool was also used in the prediction of student performance. The learning scheme takes a set of classified examples (above satisfactory, below satisfactory and satisfactory), from which it is expected to learn a way of classifying unseen examples. After converting the data into the format required by the software, the algorithm was run. We found that 63% of the data were correctly classified, with the following confusion matrix.

                     classified as
                     Above Satisfactory   Satisfactory   Below Satisfactory
Above_Satisfactory   34                   22             24
Satisfactory         22                   27             93
Below_Satisfactory   11                   27             254

Table 6: confusion matrix

With respect to significant association between the 7 variables and performance, the best selected attributes were interest in math and English performance. A detailed experiment on the available data is currently being carried out.

4.3 Prototype development

An interface was developed in Java, and programs were written to automatically analyze the personal characteristics of students and predict their performance by consulting the belief network. A total of 12 questions were selected for each of the variables from the already developed questionnaires and were entered into the computer for use by the program. During the question answering session, the system gathers answers from the student and automatically calculates the values of each of the variables. The program then consults the belief network tool for the probability of the student having high, low or average performance.
The system takes the outcome with the highest probability and stores the information for use by the instructor.

Some screen dumps that indicate the features of the interface developed are described below. When a new student starts the system, he/she is asked to enter name, id and gender, as shown in the following screen shot.

[Figure 3: A student enters general information here]

After the student fills in the above data, a second screen is displayed where the student fills in answers to questions.

[Figure 2: Question and answers...]

⁴ http://www.cs.waikato.ac.nz/~ml/weka/

5. Conclusions and Future Work

In examining the problem of prediction of performance, we have found that it is possible to automatically predict students' performance and give the necessary help. Moreover, by using an extensible classification formalism such as Bayesian networks, it becomes possible to easily and uniformly integrate such knowledge into the learning task. Our experiments also show the need for methods aimed at predicting performance and for exploration of more learning algorithms. Finally, we are also interested in extending this work to automatically recommend learning group assignment.

6. Acknowledgement

We would like to thank Professor Dr. Christiane Floyd and Professor Dr. Darge Wole for their unreserved assistance during the course of this work.

7. References

1. Amberber Mengesha (1981). "A Survey of the Problems and Prospects of the Shift System as Applied to Ethiopian Schools", Ethiopian Journal of Education, 9(1).
2. Atkinson, E. (2000). An Investigation into the relationship between teacher motivation and pupil motivation. Educational Psychology, 20(1), 45-57.
3. Caplan, S. et al. (2002). Socioemotional factor contributing to adjustment among early entrance to college students. Gifted Child Quarterly, 46(2), 124-134.
4. Collins, J.A. et al. (n.d.). Adaptive Assessment using Granularity Hierarchies and Bayesian Nets.
5. Conati and VanLehn (1996). A student modeling framework for probabilistic on-line assessment of problem solving performance. Proceedings of UM-96, Fifth International Conference on User Modeling.
6. Darge Wole (1989). "The Reactions of Social Sciences First Year Students in Addis Ababa University to Moral Dilemmas related to Academic Matters", unpublished document.
7. Diaz (2003). Personal, Family, and Academic Factors Affecting Low Achievement in Secondary School. Electronic Journal of Research in Educational Psychology and Psychopedagogy, 1(1), 43-66.
8. Dawid, A. Philip (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society, Series B (Methodological), 41, pp. 1-31.
9. Fung, Robert & Kuo-Chu Chang (1990). Weighting and integrating evidence for stochastic simulation in Bayesian networks. In Henrion, M., Shachter, R.D., Kanal, L.N. & Lemmer, J.F. (eds.), Uncertainty in Artificial Intelligence 5, pp. 209-219.
10. Fung, Robert & Brendan del Favero (1994). Backward simulation in Bayesian networks. Proceedings of the Tenth Conference on Uncertainty in Artificial Intelligence, pp. 227-234.
11. Georgiou, S. et al. (2002). Teachers attributions of student failure and teacher behaviour toward the failing student. Psychology in the Schools, 39(5), 583-596.
12. Gonzalez, J.A. et al. (2002). A Structural Equation Model of Parental Involvement, Motivational and Aptitudinal Characteristics, and Academic Achievement. Journal of Experimental Education.
13. Henrion, Max (1989). Some practical issues in constructing belief networks. In Kanal, L.N., Levitt, T.S. & Lemmer, J.F. (eds.), Uncertainty in Artificial Intelligence 3, pp. 161-173.
14. Jensen, Finn V., Steffen L. Lauritzen & Kristian G. Olsen (1990). Bayesian updating in recursive graphical models by local computations. Computational Statistics Quarterly, 4:269-282.
15. Lauritzen, Steffen L. & David J. Spiegelhalter (1988). Local computations with probabilities on graphical structures and their application to expert systems (with discussion). Journal of the Royal Statistical Society, Series B (Methodological), 50(2):157-224.
16. Olga Goubanova (n.d.). Predicting Segmental Duration Using Bayesian Belief Networks. http://www.ssw4.org/papers/139.pdf [last visited on 9 Sept. 2004].
17. Sahami, Mehran et al. (n.d.). A Bayesian Approach to Filtering Junk E-Mail. http://research.microsoft.com/~horvitz/junkfilter.htm
18. Shachter, Ross D. & Mark A. Peot (1990). Simulation approaches to general probabilistic inference on belief networks. In Henrion, M., Shachter, R.D., Kanal, L.N. & Lemmer, J.F. (eds.), Uncertainty in Artificial Intelligence 5. Elsevier Science Publishers B.V. (North Holland), pp. 221-231.
19. Tekeste Negash (1990). The Crisis of Ethiopian Education: Some Implications for Nation Building. Uppsala: Uppsala University.
20. VanLehn, Kurt & Martin, Joel (1995). Student Assessment using Bayesian Nets. International Journal of Human Computer Studies, Vol. 42, 575-591.
21. Ye Chen and Yun Peng (n.d.). An Extended Bayesian Belief Network Model of Multiagent Systems for Supply Chain Management.
22. Murray, William R. (1998). A Practical Approach to Bayesian Student Modeling. Proceedings of IST'98 4th International Conference on Intelligent Tutoring System, 424-433.
23. Zaudneh Y., Darge Wole and Nardos A. (1989). "A Survey of the Teaching - Learning Situation in Institutions of Higher Learning in Ethiopia", unpublished document.
24. Scharff (n.d.). http://hubel.sfasu.edu/courseinfo/TS/indivdiff.html [last visited on 11 Sept. 2004].
25. Huitt, W. (1997). Individual differences. Educational Psychology Interactive. Valdosta, GA: Valdosta State University. http://chiron.valdosta.edu/whuitt/col/instruct/indiff.html [last visited on 11 Sept. 2004].