A Bayesian Approach to predict performance of a student

A BAYESIAN APPROACH TO PREDICT PERFORMANCE OF A STUDENT (BAPPS) : A Case with Ethiopian
Students
Rahel Bekele
&
Professor Dr. Wolfgang Menzel
Department of Informatik, University of Hamburg
Germany
Abstract
Predicting a student’s future performance is an essential step towards giving adequate assistance. When
instructors make this prediction themselves, the task becomes complex because students differ in
preferences and personality, which makes it a time-consuming, often tedious and notoriously error-prone process.
Predicting the performance of a student is achieved by building a statistical model of the student. The Bayesian
approach to predict student performance (BAPPS) is a system that uses a Bayes’ net to model the student and Bayes’
rule to infer the student’s performance and update the model accordingly.
Key Words
Bayesian approach, Modelling, machine learning, Student Performance, Stochastic Modelling
1.
Introduction
1.1 Some factors for low performance of Ethiopian Students
Various studies (Atkinson (2000); Caplan (2002); Georgiou (2002); Gonzalez (2002); Diaz (2003)) attempt
to explain low performance of students by identifying elements that interfere with educational goals:
teaching/learning strategy (system causal factor), parents (family causal factor), teachers (academic causal factor),
and students (personal causal factor). The following are brief accounts of factors for low performance that are
related to system, family, academic or personal causes.
English Language
In Ethiopia, English is taught as a subject from grade 1 and is used as the medium of instruction from the
seventh grade. Even so, this exposure is not adequate for students to easily understand spoken English and
written text. Almost all of the reference materials in secondary schools are in English. Researchers in this area
indicate that students have difficulty in understanding and using English. A study by Zaudneh, Darge and Nardos
(1989), for instance, revealed that 58% of a sample of first-year students of Addis Ababa University had difficulty in
understanding their instructors’ communication in English.
Class Size
Although interactive class discussion is very important in the learning process, it is almost non-existent
for various reasons, among them the large size of the class. Moreover, when the class becomes very large,
teachers find it difficult to make arrangements to evaluate students in practical situations.
Class Participation
Studies in this area indicate that parents of some students discourage question asking at home and use corporal
punishment instead of reasoning in matters of discipline (Ringness, 1971; Darge, 1989). As a result, many students
appear to have developed attitudes and habits which may interfere with instructional processes. In class
situations, for example, it is usually difficult to get students to ask questions or participate in discussions. Students’
respect for instructors is sometimes mixed with fear. Some of them hesitate to question their instructor’s ideas, even
when the ideas are clearly debatable.
Individual Differences
In relation to individual differences, studies (Scharff (n.d.), Huitt (1997)) reveal that students’ backgrounds as well as
natural behavioral differences may not allow for equal perception and understanding. Accordingly, it is
inevitable that some students retain faster and learn better than others. The way of taking lecture notes, solving
problems and studying, as well as the level of performance in examinations, may vary from student to student.
Students may also lack a grasp of elementary concepts or skills, such as quantitative or language skills, or show a
decline in achievement motivation for adequate practice, presumably due to habits established at lower educational levels.
Inadequate Assistance
Lack of assistance by instructors to individual students or groups of students is also a major factor for low
performance. Tekeste (1990) indicated that teachers carry a weekly load of thirty periods in class sizes exceeding 70
students each and that they may have instructional responsibility for as many as 1000 students in a given term. For
such instructors, identifying the students who are in need (i.e. low achievers) and providing assistance is
difficult.
1.2 Statement of the Problem
In a conventional setup that is characterized by large class sizes, teachers or instructors can only tell the performance
of a student after he/she has taken tests. It is usually difficult to use other personal information (such as gender,
interest, achievement motivation, confidence, etc.) to predict the extent of performance of an individual student.
To manage the large amount of information and the various possible combinations of students’ characteristics, assistance
is required from a decision support system. Systems that require instructors to hand-build a rule set to predict
performance assume that their students’ characteristics do not change over time. As the students’ characteristics
do change over time, these rule sets must be constantly tuned and refined by the instructor. This is a time-consuming
and often tedious process which can be notoriously error-prone.
The problems with the manual construction of rule sets to predict performance point to the need for adaptive
methods for dealing with the task. It becomes important, therefore, to automatically analyze student characteristics
and to categorize/classify students based on observed data. This research has looked specifically at personal, social
and cultural features that may be indicative of performance.
In relation to the foregoing discussion, the following research questions provided the specific focus of our study.
• What are the major/strong characteristics that may interfere with performance?
• How can one measure those characteristics?
• How can we automatically decide whether a student is a low, high or average performer from the
values of the identified major characteristics?
In practice, it is often difficult to model the exact characteristics of a student with respect to some variables, so
the procedure may involve considerable uncertainty. If those characteristics are uncertain, the uncertainty
transfers to the prediction, which may result in a poorly adapted categorization of performance.
The Bayesian approach was employed to tackle this problem since it was found to be a clear and manageable
language for expressing what we are certain and uncertain about. The research aimed at applying the Bayes’ net
methodology in order to predict the performance of a student based on the values of identified characteristics.
The study is expected to indicate some important characteristics or personality variables to be considered in
determining performance. The outcomes are helpful for further extending the functionality of the student model
component of an Intelligent Tutoring System, i.e. the student model will contain aspects of the social and personal
characteristics as well as the predicted performance of the student, in addition to profiles regarding the student’s
level of knowledge.
2.
Related Works
There is a good deal of work on automatically generating probabilistic models [4, 5, 10, 16, 17, 21].
The more domain-specific work has focussed on probabilistic student models. ANDES [Conati, Larkin, and
VanLehn, 97] uses a belief network to represent alternative plans that may be used to solve physics problems. Student
actions are used to update the probabilities of the respective plans. Martin and VanLehn (1995) presented On-Line
Assessment of Expertise (OLAE), which collects data from students solving problems in introductory college physics,
analyzes the data with probabilistic methods that determine what knowledge the student is using, and presents the
results of the analysis. For each problem, the system automatically creates a Bayesian net that relates knowledge,
represented as first-order rules, to particular actions, such as written questions. Using the resulting Bayesian
network, OLAE observes a student’s behavior and computes the probabilities that the student knows and uses each
of the rules.
The research presented by Murray (1998) inferred a student model from performance data using a Bayesian belief
network. The belief network modeled the relationship between knowledge and performance for either test items or
task actions. The measure of how well a student knows a skill is represented as a probability distribution over skill
levels. Questions or expected actions are classified according to the same categories by the expected difficulty of
answering them correctly or selecting the correct action.
Continuing in this vein, we seek to apply the belief network modeling technique to the problem of predicting
student performance.
3.
Approach / Methodology
(i)
Identification and measurement of characteristics
In order to contextualize the research problem, mathematics was selected as the subject area. Major social and cultural
attributes likely to determine performance were obtained from discussions with experts and a number of surveys.
The attributes were grouped into two. Group 1 contained attributes such as the student’s performance
in mathematics, English language ability and gender, which were obtained directly from the student’s record.
Group 2 contained attributes such as attitude towards group work, interest for mathematics, achievement motivation,
self-confidence, and shyness. The attributes in group 2 could not be measured directly, so questionnaires were
developed to measure them.
The questionnaires for measuring the attributes in group 2 were developed in consultation with psychologists and
from the available literature. The Likert¹ scale was reduced to three categories and numbers were assigned as
follows. 3 : Strongly Agree, 2 : Agree to some extent, 1 : Strongly Disagree.
The items were then checked against the definition of the scale for which they were written. Two consecutive pilot
surveys were carried out before the final questionnaire was developed. With regard to scoring of items, scores from 3
to 1 were given for the positively worded items (high value) and 1 to 3 for the negatively worded items (low value).
In the second pilot survey, the items for measuring each variable were found to be reliable, with a coefficient of alpha²
>= 0.77.
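The scoring rule and the reliability check described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function names, the four hypothetical respondents and the use of the standard Cronbach's alpha formula (k/(k-1) times one minus the ratio of summed item variances to total-score variance) are assumptions introduced here.

```python
from statistics import pvariance


def score_item(response: int, positively_worded: bool) -> int:
    """Score one answer coded 3 = Strongly Agree ... 1 = Strongly Disagree."""
    if response not in (1, 2, 3):
        raise ValueError("response must be 1, 2 or 3")
    # Negatively worded items are reverse-scored: 3 -> 1, 2 -> 2, 1 -> 3.
    return response if positively_worded else 4 - response


def cronbach_alpha(scores: list[list[int]]) -> float:
    """scores[i][j] is the score of respondent i on item j of one scale."""
    k = len(scores[0])
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of four respondents to a three-item scale
# (items 1 and 2 positively worded, item 3 negatively worded).
raw = [[3, 3, 1], [2, 3, 2], [2, 1, 3], [1, 1, 3]]
wording = [True, True, False]
scored = [[score_item(r, w) for r, w in zip(row, wording)] for row in raw]
alpha = cronbach_alpha(scored)
print(f"alpha = {alpha:.3f}")
```

A scale would pass the reliability threshold quoted above when the computed alpha is at least 0.77.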
1
The Likert scale is a five-point scale in which the interval between each point on the scale is assumed to be equal.
2
Coefficient of alpha (Cronbach alpha coefficient) is a measure of squared correlation between observed scores and
true scores. It is a value which shows the items’ reliability of measurement.
The necessary sample size was determined with respect to the probability of rare events. For example, getting a high
value for all the personality variables is considered to be a rare event. From the pilot survey, the probability of getting a
student with high achievement motivation was the lowest (4/64). Therefore, this probability was used to calculate the
sample size. The following statistical formula was used to get the size of the sample.
$$ n = \frac{t^2 \, p \, q}{d^2} = 514 $$
where t = 1.96 for alpha = 0.05 (95% confidence level)
p = 4/64 (the probability of the rare event, i.e. the probability of success)
q = 1 - p (probability of failure)
d = .015 (the interval within which the probability of the rare event is expected to fall)
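As a worked check of the sample-size formula, the sketch below evaluates n = t²pq/d² and also inverts it to find the margin d implied by a given n. The function names are illustrative assumptions. With the values exactly as printed (t = 1.96, p = 4/64, d = .015) the expression evaluates to roughly 1000, whereas n = 514 corresponds to d of about 0.021. The text does not say which of the printed figures is the intended one.

```python
import math


def sample_size(t: float, p: float, d: float) -> float:
    """n = t^2 * p * q / d^2 with q = 1 - p."""
    q = 1.0 - p
    return t ** 2 * p * q / d ** 2


def margin_for(n: float, t: float, p: float) -> float:
    """Invert the formula: the margin d that yields sample size n."""
    return t * math.sqrt(p * (1.0 - p) / n)


n_printed = sample_size(t=1.96, p=4 / 64, d=0.015)
d_for_514 = margin_for(n=514, t=1.96, p=4 / 64)
print(f"n with d = .015: {n_printed:.1f}")
print(f"d that gives n = 514: {d_for_514:.4f}")
```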
The test subjects were students in the last year of their secondary schooling in one government senior
secondary school. A total of 514 data records were collected. While English and mathematics marks of three
consecutive semesters were collected from the record office, the other variables were measured by distributing
the questionnaire directly to the students.
The following are the possible values (outcomes) for each variable.
Personality variable            Possible values (outcomes)
Gender                          (Male, Female)
Attitude towards group work     (Positive, Indifferent, Negative)
Interest for Mathematics        (Interested, Indifferent, Uninterested)
Achievement Motivation          (High, Medium, Low)
Self Confidence                 (High, Medium, Low)
Shyness                         (Extrovert, Medium, Introvert)
English performance             (Above satisfactory, Satisfactory, Below satisfactory)
Math performance                (Above satisfactory, Satisfactory, Below satisfactory)
Table 1. Possible outcomes of identified variables
(iii)
Data Preparation
The bulk of the effort was invested in preparing the input for belief network investigation. The data was assembled,
integrated and cleaned up. Typographical errors in the data were avoided because every attribute value was
generated by SPSS. Assuming a normal distribution, the interval of one standard deviation around the mean was used
to group the personality variables into three categories. The following table is a summary.
Personality variable           Interval (X±1S)    High valued    Average valued       Low valued
                                                  category       category             category
                                                  (> x+s)        (btn x-s and x+s)    (< x-s)
Attitude towards group work    (30.51, 40.69)     >40            30-40                <30
Interest for Mathematics       (24.45, 38.95)     >38            24-38                <24
Achievement Motivation         (30.2, 39.6)       >39            30-39                <30
Self Confidence                (32.22, 39.38)     >39            32-39                <32
Shyness                        (22.79, 35.61)     >35            22-35                <22
Mathematics mark               (0.0, 2.453)       >=2.45         0-2.44               <0
English mark                   (0.0, 2.47)        >=2.47         0-2.47               <0
Table 2: category of values
Each student was categorized for each personality variable into one of those values.
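The grouping rule can be sketched as below: scores under the mean minus one standard deviation fall in the low category, scores over the mean plus one standard deviation fall in the high category, and the rest are average. The function names and the three hypothetical scores are assumptions, and the use of the sample standard deviation is a guess, since the text does not say which estimator SPSS was asked for.

```python
from statistics import mean, stdev


def cut_points(scores: list[float]) -> tuple[float, float]:
    """Return (mean - s, mean + s) for a list of raw scale scores."""
    m, s = mean(scores), stdev(scores)  # sample standard deviation (assumption)
    return m - s, m + s


def categorize(score: float, low: float, high: float) -> str:
    """Map a raw score to one of the three categories of Table 2."""
    if score < low:
        return "Low"
    if score > high:
        return "High"
    return "Average"


# Interval reported in Table 2 for attitude towards group work.
LOW, HIGH = 30.51, 40.69
for raw_score in (28, 35, 42):
    print(raw_score, categorize(raw_score, LOW, HIGH))
```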
3
Z scores for Mathematics and English marks were computed for the purpose of standardization.
(iv)
Identification of Dependencies
The 514 observed records were arranged so that math performance is a function of gender, group work
attitude, interest for math, achievement motivation, self confidence, shyness and English performance. The same
data format is also used later for the Naive Bayes algorithm. A sample of the observed data is given below.
Gender   GWA           IM            ACHM     SC       Shyness     Eng. Perf            Math Perf.
Male     Positive      Indifferent   Medium   Medium   Medium      Satisfactory         Below Satisfactory
Male     Indifferent   Indifferent   Medium   Medium   Introvert   Below Satisfactory   Below Satisfactory
Male     Indifferent   Interested    Medium   Low      Medium      Below Satisfactory   Satisfactory
Male     Indifferent   Indifferent   Medium   Medium   Medium      Below Satisfactory   Below Satisfactory
Male     Negative      Indifferent   Medium   Low      Medium      Satisfactory         Above Satisfactory
Male     Indifferent   Indifferent   Medium   Medium   Extrovert   Below Satisfactory   Satisfactory
Male     Positive      Interested    Medium   Medium   Extrovert   Above Satisfactory   Satisfactory
Male     Negative      Indifferent   Medium   Low      Introvert   Below Satisfactory   Below Satisfactory
Female   Positive      Interested    High     Medium   Extrovert   Satisfactory         Above Satisfactory
Male     Indifferent   Indifferent   Medium   High     Medium      Below Satisfactory   Below Satisfactory
Female   Indifferent   Indifferent   Low      Low      Introvert   Below Satisfactory   Below Satisfactory
Male     Indifferent   Indifferent   Low      Low      Medium      Above Satisfactory   Above Satisfactory
Female   Indifferent   Interested    Medium   Medium   Medium      Below Satisfactory   Above Satisfactory
Table 4: Sample of observed data
As the observations show, not all students whose math marks are above satisfactory have all the associated
variables in excellent condition. For example, a student might have a math mark above satisfactory but be an
introvert, or might study hard for achievement without having interest. Out of the 4374 possible
combinations there are 514 observed data records. Based on extensive discussion with domain experts, the
relationship between characteristics is summarized in the table below.
[Table 5: Dependencies. A matrix in which an X marks each pair of variables where one affects the other. Variables:
Gender, Attitude towards group work, Interest for Mathematics, Achievement Motivation, Self Confidence, Shyness,
English Performance, Mathematics Performance.]
(Read as : Gender affects attitude towards group work, gender affects interest for math, etc.)
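The figure of 4374 possible combinations follows from the outcome counts in Table 1: one variable with two outcomes and seven variables with three outcomes each. The short sketch below reproduces it and shows how sparsely the 514 records cover that space; the dictionary name is an illustrative assumption.

```python
from math import prod

# Number of outcomes per variable, as listed in Table 1.
OUTCOMES = {
    "Gender": 2,
    "Attitude towards group work": 3,
    "Interest for Mathematics": 3,
    "Achievement Motivation": 3,
    "Self Confidence": 3,
    "Shyness": 3,
    "English performance": 3,
    "Math performance": 3,
}

total_combinations = prod(OUTCOMES.values())
coverage = 514 / total_combinations  # upper bound: assumes all records are distinct
print(total_combinations, f"{coverage:.1%}")
```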
4.
Experiment
4.1.
Belief Network Modelling
(i)
Probability values
Bayesian networks are directed acyclic graphs in which nodes represent random variables and arcs represent direct
probabilistic dependencies among them. The structure of a Bayesian network is a graphical illustration of the
interactions among the set of variables that it models. Nodes of a Bayesian network are usually drawn as circles or
ovals.
In this work, after identification and ordering of the variables, prior and posterior probability values were
computed. Each variable was inserted into a network as a node. An attempt was made to keep the number of parents
of each node as small as possible. The conditional probabilities were then fed to the network. The following
illustrates the simplest network drawn.
Figure 1 : Belief network modelling
Each node is described by a probability distribution conditional on its direct predecessors. Nodes with no
predecessors are described by prior probability distributions. For example, the node math in the network above is
described by the prior probability distribution over its three outcomes: Above satisfactory, Satisfactory and Below
satisfactory. The other nodes are described by a probability distribution over their outcomes (e.g. Positive, Negative,
Indifferent for attitude) conditional on the outcomes of their predecessor (node math).
Both the structure and the numerical probabilities are a mixture of expert knowledge and measurements of objective
frequency data. In more technical terms, since math performance has 3 outcomes, we need the prior probability
of each outcome along with conditional probabilities for each of the other variables. For instance, for the variable
“interest in math”, we need to supply the following conditional probability values.
                Above satisfactory    Satisfactory             Below satisfactory
Interested      P(int/above)          P(int/satisfactory)      P(int/below)
Uninterested    P(unint/above)        P(unint/satisfactory)    P(unint/below)
Indifferent     P(indiff/above)       P(indiff/satisfactory)   P(indiff/below)
Table 5 : Conditional Probability Table
The network allows for performing Bayesian inference, i.e., computing the impact of observing values of a subset of
the model variables on the probability distribution over the remaining variables. For example, by observing values of
variables captured from student information, the model allows us to compute the probability of performance, i.e.
P(math performance | gender, group work attitude, interest for math, achievement motivation, self confidence,
shyness, English performance).
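For the simplest network, in which math performance is the only parent of every other node, this posterior follows directly from Bayes' rule: it is proportional to the prior of each math outcome multiplied by the conditional probability of each observed value. The sketch below illustrates that calculation for two evidence variables. All probability values are hypothetical placeholders chosen for illustration, not figures from the study, and the function name is an assumption.

```python
# Hypothetical prior over math performance (not taken from the study).
PRIOR = {"Above": 0.2, "Satisfactory": 0.3, "Below": 0.5}

# Hypothetical conditional probability tables: CPT[variable][math outcome][value].
CPT = {
    "interest": {
        "Above": {"Interested": 0.6, "Indifferent": 0.3, "Uninterested": 0.1},
        "Satisfactory": {"Interested": 0.4, "Indifferent": 0.4, "Uninterested": 0.2},
        "Below": {"Interested": 0.2, "Indifferent": 0.5, "Uninterested": 0.3},
    },
    "english": {
        "Above": {"Above": 0.5, "Satisfactory": 0.3, "Below": 0.2},
        "Satisfactory": {"Above": 0.2, "Satisfactory": 0.4, "Below": 0.4},
        "Below": {"Above": 0.1, "Satisfactory": 0.3, "Below": 0.6},
    },
}


def posterior(evidence: dict[str, str]) -> dict[str, float]:
    """P(math | evidence) by Bayes' rule for a network with math as sole parent."""
    unnormalized = {}
    for outcome, p in PRIOR.items():
        for variable, value in evidence.items():
            p *= CPT[variable][outcome][value]
        unnormalized[outcome] = p
    z = sum(unnormalized.values())  # probability of the evidence
    return {outcome: p / z for outcome, p in unnormalized.items()}


result = posterior({"interest": "Interested", "english": "Above"})
predicted = max(result, key=result.get)
print(result, predicted)
```

Taking the outcome with the largest posterior probability gives the predicted category for the student.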
A more realistic model of the conditional probabilities is shown below.
Figure 2 : More conditional probabilities
After the belief network was drawn, probability values were updated by entering evidence. The first experiment was
carried out using the Genie belief network tool and later the Bayesian Network in Java classes were used
(http://www2.sis.pitt.edu/~genie/, http://sourceforge.net/projects/bnj/).
(ii)
Algorithm Selection and Evaluation
Many researchers state that belief updating in Bayesian networks is computationally complex and calls for
intelligent algorithms. There exist several efficient algorithms, however, that make belief updating in graphs consisting
of tens or hundreds of variables tractable. Pearl (1986) developed a message-passing scheme that updates the probability
distributions for each node in a Bayesian network in response to observations of one or more variables. Lauritzen
and Spiegelhalter (1988), Jensen et al. (1990), and Dawid (1992) proposed an efficient algorithm that first transforms
a Bayesian network into a tree where each node in the tree corresponds to a subset of variables in the original graph.
The algorithm then exploits several mathematical properties of this tree to perform probabilistic inference.
Several approximate algorithms based on stochastic sampling have also been developed. Of these, the best known are
probabilistic logic sampling (Henrion 1989), likelihood sampling (Shachter & Peot 1990, Fung & Chang 1990), and
backward sampling (Fung & del Favero 1994).
In most practical networks of the size of tens or hundreds of nodes, Bayesian updating is rapid and takes between a
fraction of a second and a few seconds.
In our work, a number of algorithms already incorporated in the Genie belief network tool were tested.
The testing was done by comparing the inference made by the belief network with the observed real data, using a
small sample of observations. It was found that the probabilistic logic sampling algorithm predicts better (observed
and inferred are more or less the same) and hence it was adopted in the prototype development.
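The idea behind probabilistic logic sampling can be sketched in a few lines: draw complete samples from the network in parent-before-child order, discard those that contradict the evidence, and estimate the posterior from the samples that remain. The code below is a minimal illustration on a small network with math performance as the only parent. It is not the Genie implementation, and the probabilities, names and sample count are hypothetical.

```python
import random

# Hypothetical prior and conditional tables (not taken from the study).
PRIOR = {"Above": 0.2, "Satisfactory": 0.3, "Below": 0.5}
CPT = {
    "interest": {
        "Above": {"Interested": 0.6, "Indifferent": 0.3, "Uninterested": 0.1},
        "Satisfactory": {"Interested": 0.4, "Indifferent": 0.4, "Uninterested": 0.2},
        "Below": {"Interested": 0.2, "Indifferent": 0.5, "Uninterested": 0.3},
    },
    "shyness": {
        "Above": {"Extrovert": 0.4, "Medium": 0.4, "Introvert": 0.2},
        "Satisfactory": {"Extrovert": 0.3, "Medium": 0.5, "Introvert": 0.2},
        "Below": {"Extrovert": 0.2, "Medium": 0.4, "Introvert": 0.4},
    },
}


def draw(dist: dict[str, float], rng: random.Random) -> str:
    """Draw one value from a discrete distribution."""
    return rng.choices(list(dist), weights=list(dist.values()))[0]


def logic_sampling(evidence: dict[str, str], n_samples: int, seed: int = 0) -> dict[str, float]:
    """Estimate P(math | evidence) by forward sampling with rejection."""
    rng = random.Random(seed)
    counts = {outcome: 0 for outcome in PRIOR}
    for _ in range(n_samples):
        math_outcome = draw(PRIOR, rng)  # sample the parent first
        sample = {var: draw(CPT[var][math_outcome], rng) for var in CPT}
        # Keep the sample only if it agrees with every observed value.
        if all(sample[var] == value for var, value in evidence.items()):
            counts[math_outcome] += 1
    kept = sum(counts.values())
    if kept == 0:
        raise ValueError("no sample matched the evidence; increase n_samples")
    return {outcome: c / kept for outcome, c in counts.items()}


estimate = logic_sampling({"interest": "Interested"}, n_samples=50_000)
print(estimate)
```

Because samples that disagree with the evidence are thrown away, the estimate becomes noisy when the observed combination is rare, which is one reason the sampling variants cited above exist.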
4.2 Naive Bayes Algorithm
Naive Bayesian classification is a simple probabilistic classification method. The term naive Bayes refers to the
fact that the probability model can be derived using Bayes’ theorem and that it incorporates strong independence
assumptions that often have no bearing in reality, and hence are (deliberately) naive. Depending on the nature of the
probability model, naive Bayes classifiers can be trained very efficiently in a supervised learning setting.
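A categorical naive Bayes classifier can be sketched directly from counts. The example below trains on the thirteen sample records of Table 4 (performance labels shortened to Above, Satisfactory and Below) and classifies one record. It is an illustration only and not the Weka implementation: the add-one (Laplace) smoothing, the log-space scoring and all function names are assumptions made here.

```python
import math
from collections import Counter, defaultdict

# Outcomes per feature: Gender has 2, the other six features have 3 (Table 1).
DOMAIN_SIZE = [2, 3, 3, 3, 3, 3, 3]

# (Gender, GWA, IM, ACHM, SC, Shyness, Eng. Perf, Math Perf.) from Table 4.
ROWS = [
    ("Male", "Positive", "Indifferent", "Medium", "Medium", "Medium", "Satisfactory", "Below"),
    ("Male", "Indifferent", "Indifferent", "Medium", "Medium", "Introvert", "Below", "Below"),
    ("Male", "Indifferent", "Interested", "Medium", "Low", "Medium", "Below", "Satisfactory"),
    ("Male", "Indifferent", "Indifferent", "Medium", "Medium", "Medium", "Below", "Below"),
    ("Male", "Negative", "Indifferent", "Medium", "Low", "Medium", "Satisfactory", "Above"),
    ("Male", "Indifferent", "Indifferent", "Medium", "Medium", "Extrovert", "Below", "Satisfactory"),
    ("Male", "Positive", "Interested", "Medium", "Medium", "Extrovert", "Above", "Satisfactory"),
    ("Male", "Negative", "Indifferent", "Medium", "Low", "Introvert", "Below", "Below"),
    ("Female", "Positive", "Interested", "High", "Medium", "Extrovert", "Satisfactory", "Above"),
    ("Male", "Indifferent", "Indifferent", "Medium", "High", "Medium", "Below", "Below"),
    ("Female", "Indifferent", "Indifferent", "Low", "Low", "Introvert", "Below", "Below"),
    ("Male", "Indifferent", "Indifferent", "Low", "Low", "Medium", "Above", "Above"),
    ("Female", "Indifferent", "Interested", "Medium", "Medium", "Medium", "Below", "Above"),
]


def train(rows):
    """Count class frequencies and per-class feature value frequencies."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    for *features, label in rows:
        for j, value in enumerate(features):
            value_counts[(j, label)][value] += 1
    return class_counts, value_counts


def predict(model, features):
    """Return the class with the highest log posterior score."""
    class_counts, value_counts = model
    n = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / n)
        for j, value in enumerate(features):
            # Add-one smoothing keeps unseen values from zeroing the product.
            smoothed = (value_counts[(j, label)][value] + 1) / (count + DOMAIN_SIZE[j])
            score += math.log(smoothed)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(ROWS)
query = ("Male", "Indifferent", "Indifferent", "Medium", "Medium", "Introvert", "Below")
print(predict(model, query))
```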
The Naive Bayes algorithm in the Weka⁴ data mining tool was also used in the prediction of student performance.
The learning scheme takes a set of classified examples (above satisfactory, satisfactory and below satisfactory),
from which it is expected to learn a way of classifying unseen examples.
After putting the data into the format required by the software, the algorithm was run. We found that 63% of
the data were correctly classified, with the following confusion matrix.
                      classified as
                      Above Satisfactory    Satisfactory    Below Satisfactory
Above_Satisfactory    34                    22              24
Satisfactory          22                    27              93
Below_Satisfactory    11                    27              254
Table 6 : confusion matrix
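Summary figures can be read off the confusion matrix as sketched below. The sketch assumes that rows are the actual classes and columns the predicted classes, which is how Weka normally prints its "classified as" matrix, and the variable names are illustrative. The diagonal sums to 315 of 514 records, about 61%, slightly below the 63% quoted in the text. The per-class figures suggest that most of the correct decisions fall in the Below_Satisfactory class.

```python
LABELS = ["Above_Satisfactory", "Satisfactory", "Below_Satisfactory"]

# Rows: actual class, columns: predicted class (assumed orientation), from Table 6.
MATRIX = [
    [34, 22, 24],
    [22, 27, 93],
    [11, 27, 254],
]

total = sum(sum(row) for row in MATRIX)
correct = sum(MATRIX[i][i] for i in range(len(LABELS)))
accuracy = correct / total

# Recall per class: share of each actual class that was classified correctly.
recall = {label: MATRIX[i][i] / sum(MATRIX[i]) for i, label in enumerate(LABELS)}

print(f"accuracy = {accuracy:.3f}")
for label, value in recall.items():
    print(f"{label}: recall = {value:.3f}")
```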
With respect to significant association between the 7 variables and performance, the best selected attributes were
interest in math and English performance. A detailed experiment on the available data is currently being carried out.
4.3 Prototype development
An interface was developed in Java, and programs were written to automatically analyze the personal characteristics
of a student and predict the student’s performance by consulting the belief network.
A total of 12 questions were selected for each of the variables from the already developed questionnaires and were
entered in the computer for use by the program. During the question answering session, the system
gathers answers from the student and automatically calculates the value of each variable. The program then
consults the belief network tool for the probability of the student having high, low or average performance. The
system takes the outcome with the highest probability and stores the information for use by the instructor.
Some screen dumps that illustrate the features of the interface are described below.
When a new student starts the system, he/she is asked to enter name, id and gender, as shown in the following
screen shot.
Figure 3: The student enters general information here
4
http://www.cs.waikato.ac.nz/~ml/weka/
After the student fills in the above data, a second screen is displayed where the student enters answers to the questions.
Figure 2 : Question and answers...
5.
Conclusions and Future Work
In examining the problem of prediction of performance, we have found that it is possible to automatically predict
students’ performance and give the necessary help. Moreover, by using an extensible classification formalism such as
Bayesian networks, it becomes possible to easily and uniformly integrate such knowledge into the learning task.
Our experiments also show the need for methods aimed at predicting performance and for the exploration of more
learning algorithms. Finally, we are also interested in extending this work to automatically recommend learning group
assignment.
6.
Acknowledgement
We would like to thank Professor Dr. Christiane Floyd and Professor Dr. Darge Wole for their unreserved assistance
during the course of this work.
7.
References
1. Amberber Mengesha (1981). “A Survey of the Problems and Prospects of the Shift System as Applied to
Ethiopian Schools”, Ethiopian Journal of Education, 9(1).
2. Atkinson, E. (2000). An investigation into the relationship between teacher motivation and pupil motivation.
Educational Psychology, 20(1), 45-57.
3. Caplan, S. et al. (2002). Socioemotional factors contributing to adjustment among early entrance to college
students. Gifted Child Quarterly, 46(2), 124-134.
4. Collins, J.A. et al. (n.d.). Adaptive Assessment using Granularity Hierarchies and Bayesian Nets.
5. Conati and VanLehn (1996). A student modeling framework for probabilistic on-line assessment of problem
solving performance. Proceedings of UM-96, Fifth International Conference on User Modeling.
6. Darge Wole (1989). “The Reactions of Social Sciences First Year Students in Addis Ababa University to Moral
Dilemmas related to Academic Matters”, unpublished document.
7. Diaz (2003). Personal, Family, and Academic Factors Affecting Low Achievement in Secondary School.
Electronic Journal of Research in Educational Psychology and Psychopedagogy, 1(1), 43-66.
8. Dawid, A. Philip (1979). Conditional independence in statistical theory. Journal of the Royal Statistical Society,
Series B (Methodological), 41, pp. 1-31.
9. Fung, Robert & Kuo-Chu Chang (1990). Weighting and integrating evidence for stochastic simulation in
Bayesian networks. In Henrion, M., Shachter, R.D., Kanal, L.N. & Lemmer, J.F. (eds.), Uncertainty in Artificial
Intelligence 5, pp. 209-219.
10. Fung, Robert & Brendan del Favero (1994). Backward simulation in Bayesian networks. Proceedings of the
Tenth Conference on Uncertainty in Artificial Intelligence, pp. 227-234.
11. Georgiou, S. et al. (2002). Teachers’ attributions of student failure and teacher behaviour toward the failing
student. Psychology in the Schools, 39(5), 583-596.
12. Gonzalez, J.A. et al. (2002). A Structural Equation Model of Parental Involvement, Motivational and Aptitudinal
Characteristics, and Academic Achievement. Journal of Experimental Education.
13. Henrion, Max (1989). Some practical issues in constructing belief networks. In Kanal, L.N., Levitt, T.S. &
Lemmer, J.F. (eds.), Uncertainty in Artificial Intelligence 3, pp. 161-173.
14. Jensen, Finn V., Steffen L. Lauritzen & Kristian G. Olsen (1990). Bayesian updating in recursive graphical
models by local computations. Computational Statistics Quarterly, 4, 269-282.
15. Lauritzen, Steffen L. & David J. Spiegelhalter (1988). Local computations with probabilities on graphical
structures and their application to expert systems (with discussion). Journal of the Royal Statistical Society,
Series B (Methodological), 50(2), 157-224.
16. Olga Goubanova (n.d.). Predicting Segmental Duration Using Bayesian Belief Networks.
http://www.ssw4.org/papers/139.pdf [last visited on 9 Sept. 2004].
17. Sahami, Mehran et al. (n.d.). A Bayesian Approach to Filtering Junk E-Mail.
http://research.microsoft.com/~horvitz/junkfilter.htm
18. Shachter, Ross D. & Mark A. Peot (1990). Simulation approaches to general probabilistic inference on belief
networks. In Henrion, M., Shachter, R.D., Kanal, L.N. & Lemmer, J.F. (eds.), Uncertainty in Artificial
Intelligence 5. Elsevier Science Publishers B.V. (North Holland), pp. 221-231.
19. Tekeste Negash (1990). The Crisis of Ethiopian Education: Some Implications for Nation Building. Uppsala:
Uppsala University.
20. VanLehn, Kurt & Joel Martin (1995). Student Assessment using Bayesian Nets. International Journal of
Human Computer Studies, 42, 575-591.
21. Ye Chen and Yun Peng (n.d.). An Extended Bayesian Belief Network Model of Multiagent
Systems for Supply Chain Management.
22. Murray, William R. (1998). A Practical Approach to Bayesian Student Modeling. Proceedings of ITS’98, 4th
International Conference on Intelligent Tutoring Systems, 424-433.
23. Zaudneh Y., Darge Wole and Nardos A. (1989). “A Survey of the Teaching-Learning Situation in Institutions
of Higher Learning in Ethiopia”, unpublished document.
24. Scharff (n.d.). http://hubel.sfasu.edu/courseinfo/TS/indivdiff.html [last visited on 11 Sept. 2004].
25. Huitt, W. (1997). Individual differences. Educational Psychology Interactive. Valdosta, GA: Valdosta State
University. http://chiron.valdosta.edu/whuitt/col/instruct/indiff.html [last visited on 11 Sept. 2004].