Professional Documents
Culture Documents
consideration like classification, clustering association available in the market. Moreover, with the help of it
rule mining and regression analysis. In almost every one can understand the importance of accurate
research paper, the only classification algorithm is information.
taken into consideration for predicting student
academic performance. Limitations of proposed system
There are so many classification techniques available i) It violates user privacy:
for prediction but we are taking into consideration It is a known fact that data mining collects information
only decision tree, Naive Bayes, Support Vector about people using some market base techniques and
Machine (SVM), Artificial Neural Networks (ANN), information technology. And these data mining process
K-Nearest Neighbor, SMO, Linear Regression, involves several numbers of factors. But while
Random Forest, Random Tree, REPTree, LAD Tree, involving those factors, data mining system violates
J48 etc gave a brief finding of different research the privacy of its user and that is why it lacks in the
papers with their author’s name, main attributes matters of safety and security of its users. Eventually, it
helpful for prediction accuracy with different data creates Mis-communication between people.
mining algorithm used. Mostly used data mining ii)Additional irrelevant information
algorithm for SAP is Decision Tree (DT), Naive The main functions of the data mining systems create a
Bayes (NB), Artificial Neural Networks (ANN), relevant space for beneficial information. But the main
Rule-based (RB) and K-Nearest Neighbor (KNN). In problem with the information collection is that there is
Decision tree algorithm the maximum and minimum a possibility that the collection of information process
accuracy for predicting student’s academic can be little overwhelming for all. Therefore, it is very
performance are 99.9% and 66.8% respectively. To much essential to maintain a minimum level of limit
find the maximum prediction accuracy Maria Goga, for all the data mining techniques.
Shade Kuyoro and Nicolae Goga used the iii)Misuse of information
combination of student’s attribute like family, PEP, As it has been explained earlier that in the data mining
EES, end of first session result. In Naive Bayes system the possibility of safety and security measure
algorithm, the maximum and minimum accuracy for are really minimal. And that is why some can misuse
predicting student's academic performance are 100% this information to harm others in their own way.
and 63.3% respectively. Maria Koutina et. al. used Therefore, the data mining system needs to change its
the different combination of student's attribute like course of working so that it can reduce the ratio of
Gender, Age, Marital Status, Number of children, misuse of information through the mining process.
Occupation, Job associated with the computer. iv)An accuracy of data
Most of the time while collecting information about
Advantages of the proposed system certain elements one used to seek the help from their
clients, but nowadays everything has changed. And
i) It is helpful to predict future trends now the process of information collection made things
Most of the working nature of the data mining easy with the mining technology and their methods.
systems carries on all the informational factors of the One of the most possible limitation of this data mining
elements and their structure. One of the common system is that it can provide accuracy of data with its
benefits that can be derived with these data mining own limits. Finally the bottom line is that all the
systems is that they can be helpful while predicting techniques, methods and data mining systems help in
future trends. And that is quite possible with the help discovery of new creative things. And at the end of this
of technology and behavioral changes adopted by the discussion about the data mining methodology, one can
people. clearly understand the feature, elements, purpose,
ii)Helps in decision making characteristics and benefits with its own limitations.
There are some people who make use of these data
mining techniques to help them with some kind of Predicting students' performance in distance
decision making. Nowadays, all the information about learning using machine learning techniques
anything can be determined easily with the help of
technology and similarly, with the help of such The ability of prediction of a student’s performance
technology one can make a precise decision about could be useful in a great number of different ways
something unknown and unexpected. associated with university-level distance learning.
iii) Quick fraud detection Students’ key demographic characteristics and their
Most parts of the data mining process is basically marks in a few written assignments can constitute the
from information gathered with the help of marketing training set for a supervised machine learning
analysis. With the help of such marketing analysis one algorithm. The learning algorithm could then be able to
can also find out those fraudulent acts and products predict the performance of new students thus becoming
a useful tool for identifying predicted poor performers. Regarding the INF10 module of HOU during an
The scope of this work is to compare some of the state academic year the students have to hand in 4 written
of art learning algorithms. Two experiments have been assignments, participate in 4 optional face to face
conducted with six algorithms, which were trained meetings with their tutor and sit for final examinations
using data sets provided by the Hellenic Open after an 11-month period. The students’ marking
University. Among other significant conclusions it was system in Hellenic Universities is the 10-grade
found that the Naive Bayes algorithm is the most system. A student with a mark >=5 ‘passes’ a lesson
appropriate to be used for the construction of a or a module while a student with a mark <5 ‘fails’ to
software support tool, has more than satisfactory complete a lesson or a module. Key demographic
accuracy, its overall sensitivity is extremely characteristics of students (such as age, sex, residence
satisfactory and is the easiest algorithm to implement etc) and their marks in written assignments
this method uses existing ML techniques in order to constituted the initial training set for a supervised
predict the students’ performance in a distance learning algorithm in order to predict if a certain
learning system. It compares some of the state of art student will eventually pass or not a specific module.
learning algorithms to find out which algorithm is A total of 354 instances (student’s records) have been
more appropriate not only to predict student’s collected out of the 498 who had registered with
performance accurately but also to be used as an INF10. Two separate experiments were conducted.
educational supporting tool for tutors. To the best of The first experiment used the entire set of 354
our knowledge, there is no similar publication in the instances for all algorithms while the second
literature. Whittington (1995) studied only the factors experiment used only a small set of 28 instances
that impact on the success of distance education corresponding to the number of students in a tutor’s
students of the University of the West Indies. For the class.
purpose of our study the ‘informatics’ course of the The application of Machine Learning Techniques in
Hellenic Open University (HOU) provided the training predicting students’ performance proved to be useful
set for the ML algorithms. for identifying poor performers and it can enable
The basic educational unit at HOU is the module and tutors to take remedial measures at an earlier stage,
a student may register with up to three modules per even from the beginning of an academic year using
year. The ‘informatics’ course is composed of 12 only students’ demographic data, in order to provide
module and that leads to Bachelor Degree. The total additional help to the groups at risk. The probability
number of registered students with the course of of more accurate diagnosis of students’ performance
informatics in the academic year 2000-1 was 510. Of is increased as new curriculum data has entered
those students 498 (97.7%) selected the module during the academic year, offering the tutors more
Introduction to Informatics (INF10). This fact enabled effective results. Some very basic Machine Learning
the authors to focus on INF10 and collect data only definitions are given in discusses the data collection
from the tutors involved in this module. The tutor in a process, the attribute selection, the algorithm selection
distance-learning course has a specific role. Despite and the research design. Section 4 presents the
the distance between him/her and his/her students, experiment results for both experiments and all six
he/she has to teach, evaluate and continuously support algorithms and at the same time compares these
them. The communication between them by post, results using as criteria the overall sensitivity,
telephone, e-mail, through the written assignments or accuracy and specificity of the algorithms. Finally,
at optional consulting meetings helps the tutor to section 5 discusses the conclusions and some future
respond to this complex role. In all circumstances, the research directions.
tutor should promptly solve students’ educational
problems, discuss in a friendly way the issues that Description of the used machine learning techniques
distract them, instruct their study, but most of all
encourage them to continue their studies, Murthy (1998) provides a recent overview of existing
understanding their difficulties and effectively work in decision trees. Decision trees are trees that
supporting them. classify instances by sorting them based on attribute
Furthermore, tutors have to give them marks, values. Each node in a decision tree represents an
comments and advice on the written assignments and attribute in an instance to be classified, and each
they have to organize and carry out the face-to-face branch represents a value that the node can take.
consulting meetings. For all above mentioned reasons, Instances are classified starting at the root node and
it is important for the tutors to be able to recognize sorting them based on their attribute values. The main
and locate students with high probability of poor advantage of decision trees in particular and
performance (students at risk) in order to take hierarchical methods in general, is that they divide the
precautions and be better prepared to face such cases. classification problem into a sequence of sub problems
which are, in principle, simpler to solve than the data classes. Maximizing the margin, and thereby
original problem. The attribute that best divides the creating the largest possible distance between the
training data would be the root node of the tree. The separating hyperplane and the instances on either side
algorithm is then repeated on each partition of the of it, is proven to reduce an upper bound on the
divided data, creating sub trees until the training data expected generalization error (Burges, 1998).
are divided into subsets of the same class. Artificial Nevertheless, most real- world problems involve non-
Neural Networks (ANNs) are another method of separable data for which no hyperplane exists that
inductive learning based on computational models of successfully separates the positive from negative
biological neurons and networks of neurons as found in instances in the training set. One solution to the
the central nervous system of humans (Mitchell, 1997). inseparability problem is to map the data into a higher-
A multi-layer neural network consists of large number dimensional space and define a separating hyperplane.
of units (neurons) joined together in a pattern of This higher-dimensional space is called the feature
connections. Units in a net are usually segregated into space, as opposed to the input space occupied by the
three classes: input units, which receive information to training instances. Generally, with an appropriately
be processed, output units where the results of the chosen feature space of sufficient dimensionality, any
processing are found, and units in between called consistent training set can be made separable.
hidden units. Classification with a neural network takes
place in two distinct phases. First, the network is Analysis
trained on a set of paired data to determine the input-
output mapping. The weights of the connections This project aims to fill the gap between empirical
between neurons are then fixed and the network is used prediction of student performance and the existing
to determine the classifications of a new set of data. ML techniques. To this end, six ML algorithms have
Naive Bayes classifier is the simplest form of Bayesian been trained and found to be useful tools for
network (Domingos and Pazzani, 1997). This identifying predicted poor performers in an open and
algorithm captures the assumption that every attribute distance learning environment. With the help of
is independent from the rest of the attributes, given the machine learning methods the tutors are in a position
state of the class tribute. Naive Bayes classifiers to know which of their students will complete a
operate on data sets where each example x consists of module or a course with sufficiently accurate
attribute values <a1, a2 ... ai> and the target function precision. This precision reaches the 62% in the initial
f(x) can take on any value from a pre-defined finite set forecasts, which are based on demographic data of the
V= (v1, v2 ... vj). Instance-based learning algorithms students and exceeds the 82% before the final
belong in the category of lazy-learning algorithms examinations. Our data set is from the module
(Mitchell, 1997), as they defer in the induction or ‘Introduction in informatics but most of the
generalization process until classification is performed. conclusions are wide-ranging and present interest for
One of the most straightforward instance based the majority of programs of study of the Hellenic
learning algorithms is the nearest neighbour algorithm Open University. It would be interesting to compare
(Aha, 1997). our results with those from other open and distance
K-Nearest Neighbour (KNN) is based on the principal learning programs offered by other open Universities.
that the instances within a data set will generally exist So far, however, we have not been able to locate such
in close proximity with other instances that have results.
similar properties. If the instances are tagged with a Two experiments were conducted using data sets of
classification label, then the value of the label of an 354 and 28 instances respectively. The above
unclassified instance can be determined by observing accuracy was the result of the first experiment with
the class of its nearest neighbors. Logistic regression the large data set; however, the overall accuracy of the
analysis (Long, 1997) extends the techniques of second experiment for all algorithms was less
multiple regression analysis to research situations in satisfactory than the accuracy in the 1st experiment.
which the outcome variable (class) is categorical. The The 28 instances are probably few if we want more
relationship between the classifier and attributes. The accurate precision. After a number of experiments
dependent variable (class) in logistic regression is with different number of instances as training set, it
binary, that is, the dependent variable can take the seems that at least 70 instances are needed for a better
value 1 with a probability of success pi, or the value 0 predictive accuracy (70.51% average prediction
with probability of failure 1- pi. Comparing these two accuracy for the Naive Bayes algorithm).Besides the
probabilities, the larger probability indicates the class overall accuracy of the algorithms, the differences
label value that is more likely to be the actual label. between sensitivity and specificity are quite
The SVM technique revolves around the notion of a reasonable since ‘pass’ represents students who
‘margin’, either side of a hyperplane that separates two completed the INF10 module getting a mark of 5 or
more in the final test, while ‘fail’ represents students code. System design is the process of defining the
that suspended their studies during the academic year architecture, modules, interfaces, and data for a system
(due to personal or professional reasons or due to to satisfy specified requirements. Systems design could
inability to hand in 2 of the written assignments), as be seen as the application of systems theory to product
well as students who did not show up in the final development. There is some overlap with the
examination. Furthermore, ‘fail’ also represents disciplines of systems analysis, systems architecture
students who sit for the final examination and get a and systems engineering. The system includes
mark less than 5. Furthermore, the analysis of the Architectural design, logical design and physical
experiments and the comparison of the six algorithms design.
has demonstrated sufficient evidence that the Naive The various modules in our system is user, training,
Bayes algorithm is the most appropriate to be used for testing, ensembling and data visualization. The
the construction of a software support tool. The hardware requirements of our stem is a processor with
overall accuracy of the Naive Bayes algorithm was 4 GB RAM and the software requirements are Python
more than satisfactory (72.48% for the first is the preferred language for coding and for data
experiment) and the overall sensitivity was extremely visualization we use math plot lib. The python libraries
satisfactory (78.00% for the first experiment). are available in module named flask. PHP is used for
Moreover, the Naive Bayes algorithm is the easiest to server-side scripting. Software named pandas is used
implement among the tested algorithms. for developing the artificial neural networks (ANN).
Future work we intend to study if the use of more JSON modules are also linked with python libraries.
sophisticated approaches for discretization of the For database creation My SQL is used. Prototyping
marks of WRIs such as the one suggested by Fayyad model is followed.
and Irani (1993) could increase the classification
accuracy. In addition, because of FTOFs did not add V.DESIGN AND IMPLEMENTATION
accuracy and the run time of inductive algorithms
grows with the number of attributes, we will examine Training
if the selection of a subset of attributes could be useful.
Finally, since with the present work we can only Here we provide the data set which contains internal
predict if a student passes the module or not we intend marks, assignment marks, results of previous semester
to try to use regression methods in order to predict the and attendance. We train our model according to our
student’s marks instead. input data by supervised learning technique. Here the
output/result is already provided and is known. And
IV. METHODOLOGY we have to check whether the model is giving correct
result based on the training data. For example we have
The main aim of our project is to produce a model with to predict the semester 5 result which is already
better efficiency and to ensure correctness. We try to known. So we have results from semester 1 to 5 and
implement the system in such a way that error rate is the results from semester 1 to 4 is used for training
very less and also to make the model flexible to make and 5 is used for validation and hence we can check
necessary corrections in case of any error identified. the correctness.
Also we use ensembling method to produce improved
machine learning results .Thus ensuring the correctness Testing
of the model.
Testing is done based on various machine learning
System Study algorithms. This module is intented to check and
ensure the correctness of the model that is how correct
System study of a project includes system analysis and the prediction was and the error rate is also calculated.
system design. System analysis and design includes all Modifications are done based on the error rates. Error
activities, which help the transformation of comparisons with using various algorithms is done in
requirement specification into implementation. this module. For example we have to check the
Requirement specifications specify all functional and semester 6 result which is already known by us but
non-functional expectations from the software. These not provided in training set. We provide results till 5th
requirement specifications come in the shape of human semester only. So to do this we’ll give results from
readable and understandable documents, to which a semester 1 to 4 as training and 5th semester result will
computer has nothing to do. System analysis and be used for validation. Here system will predict a
design is the intermediate stage, which helps human- whether a student passed or failed. So classification
readable requirements to be transformed into actual task is carried out and classification algorithm is
Fig 4.1 A Sample Visualization iv) Probabilities of the item belonging to all classes are
compared and the class with the highest probability if
selected as a result.
The advantages of using this method include its
simplicity and easiness of understanding. In addition to
that, it performs well on the data sets with irrelevant
features, since the probabilities of them contributing to
the output are low. Therefore they are not taken into
account when making predictions. Moreover, this
algorithm usually results in a good performance in
terms of consumed resources, since it only needs to
calculate the probabilities of the features and classes,
there is no need to find any coefficients like in other
algorithms. As already mentioned, its main drawback
is that each feature is treated independently, although
in most cases this cannot be true.
margin
Algorithm
Complexity term. So, we choose the set of organization to weigh possible actions against one
hyperplanes, so f(x) = (w ⸱ x) +b: another based on their costs, probabilities, and
benefits. They can be used either to drive informal
discussion or to map out an algorithm that predicts the
best choice mathematically. A decision tree typically
starts with a single node, which branches into possible
SVMs are generally able to result in good accuracy,
outcomes. Each of those outcomes leads to additional
especially on “clean” datasets. Moreover, it is good
nodes, which branch off into other possibilities. This
with working with the high-dimensional datasets, also
gives it a treelike shape. There are three different
when the number of dimensions is higher than the
types of nodes: chance nodes, decision nodes, and end
number of the samples. However, for large datasets
nodes. A chance node, represented by a circle, shows
with a lot of noise or overlapping classes, it can be
the probabilities of certain results. A decision node,
more effective. Also, with larger datasets training time
represented by a square, shows a decision to be made,
can be high.
and an end node shows the final outcome of a
decision path.
Decision Trees
2) Split the dataset and calculate the entropy of each
branch. Then calculate the information gain of the
split that is the differences in the initial entropy and
the proportional sum of the entropies of the branches.
3) The attribute with the highest Gain value is selected
as the decision node.
4) If one of the branches of the selected decision node
has an entropy of 0, it becomes the leaf node. Other
branches require further splitting.
5) The algorithm is run recursively until there is
nothing to split anymore.
VI. CONCLUSION
The success of machine learning in predicting student
performance relies on the good use of the data and
machine learning algorithms. Selecting the right
machine learning method for the right problem is Fig 5.1 Visualization of Decision tree
necessary to achieve the best results.However, the
algorithm alone cannot provide the best prediction
results. Results of both data sets show similarities and
differences with their use in the original studies. In the
first data set, similarity is that recall values were
consistently higher than precision values. Difference
was in the accuracy values. The accuracy reached in
this thesis was higher than in the original research.
This can be attributed to the difference in dependent
variables. In this project the upcoming semester exam
result of student is predicted based on certain features
like attendance, internal marks and previous semester
results. The proposed system will give the real time
implementation of result prediction system we have
used real time data in training and testing stages.Data Fig 5.2 Result of SVM Algorithm
Visualization was used to do the analysis graphically
also it made comparison easier. Correlation was also
done by using Data Visualization. This project can also
be extended for similar applications in future.
REFERENCES
[1] Using Machine Learning to Predict Student
Performance, University of Tampere Faculty of
Natural Sciences Software Development M. Sc.
Thesis Supervisor: Jorma Laurikkala June 2017
[2] Student Performance Prediction via Online
Learning Behavior Analytics Wei Zhang, Xujun
Huang*, Shengming Wang, Jiangbo Shu, Hai
Liu, Hao Chen National Engineering Research
Center for E-Learning, Central China Normal
University, Wuhan 430079, China,2017
[3] Analysis of student performance based on
classification and mapreduce approach in bigdata,
Dr.R Senthil Kumar1, Jithin Kumar.K.P2
1,2Department of Computer Science, Amrita
School of Arts and Sciences, Amrita Vishwa
Vidyapeetham, MysuruCampus Mysuru,
India sen07mca@gmail.com1,
jithinkumarkp18
[4]http://www.academia.edu/24311300/Predictin
g_Students_Aca
demic_Perfomace_using_Naive_Bayes_Algorith
m
[5] Student’s Performance Prediction in Education