Predict Students' Performance Using Educational Data Mining

Prediction of Students' Performance using Educational Data Mining

Ms. Tismy Devasia, Ms. Vinushree T P, Mr. Vinayak Hegde
Department of Computer Science
Amrita Vishwa Vidyapeetham University, Mysuru Campus
Mysuru, Karnataka, India
tismydevasia1993@gmail.com, vinushree7792@gmail.com, vinayakhegde92@gmail.com
Abstract—Data mining plays an important role in the business world, and it helps educational institutions predict and make decisions related to students' academic status. In higher education, student dropout has been increasing in recent years; it affects not only the students' careers but also the reputation of the institute. The existing system maintains student information in the form of numerical values and merely stores and retrieves the information it contains, so it has no intelligence to analyze the data. The proposed system is a web-based application that uses the Naive Bayesian mining technique to extract useful information. The experiment is conducted on 700 students with 19 attributes at Amrita Vishwa Vidyapeetham, Mysuru. The results show that the Naive Bayesian algorithm provides higher accuracy than other methods such as Regression, Decision Tree and Neural Networks for comparison and prediction. The system aims to improve students' success rate using Naive Bayes, and it maintains all student admission details, course details, subject details, marks details, attendance details, etc. It takes a student's academic history as input and predicts the student's upcoming performance semester by semester.

Keywords—Educational Data Mining, Predicting student performance, Naive Bayes, Dropout, Classification.

I. INTRODUCTION

Analyzing huge amounts of data to produce summarized, useful information is a tedious task for humans. Data mining is the field that analyzes large data repositories to extract necessary or useful information. Computers can process many kinds of data, such as numbers, text, images and facts, and data mining analyzes the patterns, associations and relations among these data to obtain information.

Predicting students' performance with high accuracy is beneficial because it helps identify students with low academic achievement at an early stage. In universities, student retention is related to academic performance and to the enrollment system. The steps to assist low academic performers with better education are:

(a) Generation of a data source of predictive variables.
(b) Identification of the various features or factors that affect students' learning performance during their academic career.
(c) Construction of a prediction model using classification data mining techniques on the basis of the identified predictive variables.
(d) Validation of the developed model on university students' performance data.

Data mining can be applied in the educational field to extend our understanding of the learning process by identifying and evaluating the relevant variables. Mining in an educational environment is known as Educational Data Mining (EDM). Students' attendance in class, hours spent daily after college, family income, mother's age and mother's education are significantly related to student performance. Using a naive Bayes model, it has been found that factors such as the mother's qualification and family income are highly correlated with student performance. Data mining techniques economically offer more customized education and improved system efficiency, and they reduce the cost of the education process for universities. This helps to raise the student retention rate and increase academic achievement in terms of learning outcomes. Data mining prediction techniques help to spot the factors that most affect students' test scores, so that these factors can then be tuned to improve test performance. This provides a new way of looking at the education system, revealing what was previously hidden.

II. RELATED WORK

J. K. Jothi and K. Venkatalakshmi conducted a performance analysis on graduate students' data collected from the Villupuram College of Engineering and Technology. The data covered a five-year period, and they applied clustering methods to it to overcome the problem of low scores among graduate students and to raise students' academic performance [1].
Shelke and Gadage analyzed student learning behavior using different data mining models, namely classification, clustering, decision trees, sequential pattern mining and text mining. They used open-source tools such as KNIME (Konstanz Information Miner), RapidMiner, WEKA, Carrot, Orange, R programming and iDA. These tools have different capabilities, and the survey provided insight into prediction and evaluation [2].

Mythili and Shanavas applied classification algorithms to analyze and evaluate school students' performance using WEKA. They evaluated various classification algorithms, namely J48, Random Forest, Multilayer Perceptron, IB1 and Decision Table, on data collected from the student management system [3].

Dinesh Kumar and Radhika focused on the techniques and strategies of educational data mining for knowledge discovery from data collected from various universities. Their paper states that relationship mining was the leading category between 1995 and 2005 but slipped to fifth place in 2008 to 2009, and that during the period 2008 to 2015, 45% of papers moved to prediction. The prediction model acts as a warning system that helps students improve their performance [4].

Osmanbegović and Suljić conducted a study investigating students' future performance in the end-semester results at the University of Tuzla. They considered 11 factors and used classification models, of which naive Bayes gave the highest accuracy [5].

Suyal and Mohod applied association and classification rules to identify students' performance. They mainly focused on finding the students who need special attention in order to reduce the failure rate [6].

Noah, Barida and Egerton conducted a study to evaluate students' performance by grouping grades into various classes using CGPA. They used different methods, such as neural networks, regression and K-means, to identify weak performers for the purpose of performance improvement [7].

Baradwaj and Pal described data mining techniques that help in the early identification of student dropouts and of students who need special attention. They used a decision tree built on information such as attendance, class test, semester and assignment marks [8].

Jeevalatha, Ananthi and Saravana Kumar presented a case study on performance analysis for the placement selection of undergraduate students. They applied decision tree algorithms considering factors such as HSC marks, UG marks and communication skills [9].

Baker and Yacef conducted a study to identify the most appropriate models for EDM. They analyzed the data and reached the conclusion that most papers adopt prediction rather than relationship mining [10].

ElGamal presented a study for predicting student performance in a programming course. The data were collected from the Department of Computer Science at Mansoura University, and rules were extracted from them to predict students' performance in the programming course [11].

Angeline conducted a study on students' performance using the Apriori algorithm, which extracts the set of rules specific to each category and analyzes the given data to classify students based on their involvement in assignments, internal assessment tests, attendance, etc. It helps to identify each student's performance range, such as below average, average or good [12].

Bhise, Thorat and Supekar presented a method using the K-means clustering algorithm, describing it step by step. The paper mainly focused on reducing the students' dropout ratio by considering evaluation factors such as midterm and final exam assignments. They considered different clustering techniques, namely hierarchical, partitioning and categorical [13].

Ramesh, Parkavi and Yasodha conducted a study on placement chance prediction by comparing the accuracy of different techniques such as Naive Bayes Simple, MultilayerPerceptron, SMO, J48 and REPTree. From the results they concluded that the MultilayerPerceptron technique is more suitable than the other algorithms [14].

Abu Tair and El-Halees presented a case study on a set of data collected from graduates of the College of Science and Technology, Khanyounis, during the period 1993 to 2007. They used two classification methods, Rule Induction and the Naive Bayesian classifier, to predict the students' grades [15].

III. METHODOLOGY

Data mining is the process of knowledge discovery from huge volumes of data. Here it is applied to a large data set in which students' performance in the end-semester examination is evaluated.

A. Data preparation

Student-related data were collected from the Computer Science department of Amrita School of Arts and Sciences, Mysuru, using a sampling method, for the sessions 2013 to 2016. In this step, data stored in different tables were joined into a single set.

B. Data selection and transformation

In this step, only the fields required for the data mining process were selected. The student register number, 10th and 12th marks, semester-wise degree marks, assignment, gender, parents' education and income were taken as the attribute values for prediction. These are shown in Table I.

C. Naive Bayes Algorithm
Step 1: Scan the student data set.
Step 2: Calculate the probability of each attribute value using the quantities $n$, $n_c$, $m$ and $p$.
Step 3: Apply the m-estimate formula
$$P(a_i \mid v_j) = \frac{n_c + m\,p}{n + m}$$
where:
$n$ = the number of training examples for which $v = v_j$
$n_c$ = the number of training examples for which $v = v_j$ and $a = a_i$
$p$ = the a priori estimate for $P(a_i \mid v_j)$
$m$ = the equivalent sample size
Step 4: Multiply the estimated conditional probabilities of the record's attribute values together and by the class prior $P(v_j)$.
Step 5: Compare the resulting values across the predefined classes and assign the record to the class with the highest value.

TABLE I. STUDENT RELATED VARIABLES

Sl. No. | Description | Possible Values
1 | Students' Gender | Male, Female
2 | Students' Category | GM, SC, ST, OBC
3 | Medium of Teaching | KAN, HIN, ENG
4 | Students' Food Habit | Veg, Non-Veg
5 | Students' Other Habit | Drinking, Smoking, NA
6 | Living Location | Village, City
7 | Where do you stay? | Hostel, Room, PG
8 | Number of Members in the Family | 2, 3, >3
9 | Students' Family Status | Joint, Nuclear
10 | Family Annual Income Status | BPL, Poor, Medium, High
11 | Students' Grade in 10th / SSLC | <40, 40-59, 60-80, >80
12 | Students' Grade in 12th / PUC | <40, 40-59, 60-80, >80
13 | Students' College Type | Boys, Girls, Combined
14 | Father's Qualification | No-Education, Elementary, Secondary, Graduate, Postgraduate, Doctorate, NA
15 | Mother's Qualification | No-Education, Elementary, Secondary, Graduate, Postgraduate, Doctorate, NA
16 | Father's Occupation | Farmer, Business, Service, Retired, Not-Applicable
17 | Mother's Occupation | Housewife, Business, Service, Retired, Not-Applicable
18 | Student Interested in Higher Education | Yes, No
19 | Do you use a mobile phone? If yes, since how many months/years? | Yes, No
20 | Do you use the Internet? If yes, since how many months/years? | Yes, No
21 | Do you use social networks? If yes, since how many months/years? | Yes, No
22 | How many siblings, and their qualification | 1, 2, >3
23 | Reading Habit | Early morning, Night
24 | How many hours do you spend on studies per day? | 1, 2, >3
25 | Do you use a vehicle? | Bike, Car, Bicycle, NA

IV. EXPERIMENTAL RESULTS

Fig.1. Student Data Set

Fig.1 represents the students' data set, collected from a database as well as from a survey of approximately 60 students at Amrita School of Arts and Sciences, Mysuru.
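Steps 1 to 5 of Section III-C can be illustrated on categorical records such as those in Fig.1 with the minimal Python sketch below. It is not the authors' implementation. The toy records, the three attributes chosen from Table I, and the Pass/Fail labels are hypothetical; the prior estimate p = 1/k and equivalent sample size m = k (k = number of distinct observed values of an attribute) are assumptions, as is reading the "p" in Step 4 as the class prior P(v_j).

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (12th grade band, family income, study hours per day).
# Attribute values follow Table I; the records and labels are invented for illustration.
RECORDS = [
    ("60-80", "Medium", "2"),
    (">80", "High", ">3"),
    ("<40", "Poor", "1"),
    ("40-59", "Poor", "1"),
    (">80", "Medium", "2"),
    ("60-80", "High", "2"),
]
LABELS = ["Pass", "Pass", "Fail", "Fail", "Pass", "Pass"]


def train(records, labels):
    """Steps 1-2: scan the data set and count n (per class) and n_c (per class and value)."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    attribute_values = defaultdict(set)   # attribute index -> distinct values seen
    for record, label in zip(records, labels):
        for i, value in enumerate(record):
            value_counts[(i, label)][value] += 1
            attribute_values[i].add(value)
    return class_counts, value_counts, attribute_values


def m_estimate(n_c, n, m, p):
    """Step 3: P(a_i | v_j) = (n_c + m*p) / (n + m)."""
    return (n_c + m * p) / (n + m)


def predict(model, record):
    """Steps 4-5: score every class and return the best one with all scores."""
    class_counts, value_counts, attribute_values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # class prior P(v_j) (assumed meaning of "p" in Step 4)
        for i, value in enumerate(record):
            k = len(attribute_values[i])
            p = 1.0 / k  # assumed prior estimate: uniform over the observed values
            m = k        # assumed equivalent sample size
            n_c = value_counts[(i, label)][value]
            score *= m_estimate(n_c, n, m, p)
        scores[label] = score
    # Step 5: the class with the highest product wins
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    model = train(RECORDS, LABELS)
    best, scores = predict(model, ("60-80", "Medium", "1"))
    print(best, scores)
```

With these toy records the query is labelled Pass (score 3/196, about 0.0153, against 1/150, about 0.0067, for Fail). Choosing m = k and p = 1/k makes m·p = 1, which is Laplace (add-one) smoothing: an attribute value never seen with a class gets a small non-zero probability instead of zeroing the whole product.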
Fig.2. Students' 12th Mark Levels

Fig.2 shows the tenth-grade performance of students. The graph shows that, among the sixty students, six belong to the category below 40, eleven to the 40 to 59 category, thirty-one to the 60 to 80 category, and twelve to the category above 80. The minimum score is 37, the maximum is 98, the average is 79, and the variance of the students' performance is 15.
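The banding used in Fig.2 and Table I (<40, 40-59, 60-80, >80) and the kind of summary statistics quoted above can be computed with the short Python sketch below. The marks are hypothetical placeholders, not the study's data, and whole-number percentage marks are assumed. The text does not say which variance definition was used, so `statistics.pvariance` (population variance) is an assumption; `statistics.variance` or `statistics.pstdev` could be substituted.

```python
import statistics
from collections import Counter

# Band labels from Table I (rows 11 and 12).
BANDS = ("<40", "40-59", "60-80", ">80")


def band(mark):
    """Map a percentage mark to its Table I band (whole-number marks assumed)."""
    if mark < 40:
        return "<40"
    if mark < 60:
        return "40-59"
    if mark <= 80:
        return "60-80"
    return ">80"


def summarize(marks):
    """Count students per band and compute min, max, mean and variance."""
    counts = Counter(band(mark) for mark in marks)
    return {
        "counts": {b: counts[b] for b in BANDS},
        "min": min(marks),
        "max": max(marks),
        "mean": statistics.mean(marks),
        "variance": statistics.pvariance(marks),  # population variance (assumed)
    }


if __name__ == "__main__":
    hypothetical_marks = [37, 45, 62, 75, 80, 81, 98]  # placeholder data, not the study's
    print(summarize(hypothetical_marks))
```

For the placeholder marks, the band counts are 1, 1, 3 and 2 for <40, 40-59, 60-80 and >80 respectively.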
Fig.3. Student Attributes for Prediction

Fig.3 shows the attribute values of a student. When a new student enters his or her attributes, the values are compared with the existing students' attributes to evaluate and predict the student's performance. The performance is predicted using the Naive Bayesian algorithm, which helps the organization predict and analyze student performance more effectively.

Fig.4. Students' Performance Prediction Graph

Fig.4 shows the performance evaluation graph of students in percentage. The graph shows that most of the students perform well.

VI. CONCLUSION

In this paper, classification is applied to student data to predict students' division on the basis of previous data. Of the several approaches available for data classification, the Naive Bayes classifier is employed here. Information such as attendance, class test, seminar and assignment marks was collected from the students' previous records to predict performance at the end of the semester.

This study can help students and lecturers improve the performance of students in every category. It helps to identify the students who require special attention, minimize the failure ratio and take appropriate action for the upcoming semester examination.

Future work includes applying data mining techniques to an extended data set with additional attributes to obtain accurate and efficient results.

References

[1] J. K. Jothi and K. Venkatalakshmi, “Intellectual performance analysis of students by using data mining techniques”, International Journal of Innovative Research in Science, Engineering and Technology, vol. 3, Special Issue 3, March 2014.
[2] Nikitaben Shelke and Shriniwas Gadage, “A survey of data mining approaches in performance analysis and evaluation”, International Journal of Advanced Research in Computer Science and Software Engineering, vol. 5, iss. 4, 2015.
[3] M. S. Mythili and A. R. Mohamed Shanavas, “An analysis of students' performance using classification algorithms”, IOSR-JCE, vol. 16, iss. 1, Jan. 2014.
[4] A. Dinesh Kumar and V. Radhika, “A survey on predicting student performance”, International Journal of Computer Science and Information Technologies, vol. 5, 2014.
[5] E. Osmanbegović and M. Suljić, “Data mining approach for predicting student performance”, Economic Review, vol. 10, iss. 1, 2012.
[6] Sayali Rajesh Suyal and Mohini Mukund Mohod, “Quality improvisation of student performance using data mining techniques”, International Journal of Scientific and Research Publications, vol. 4, iss. 4, April 2014.
[7] Otobo Firstman Noah, Baah Barida and Taylor Onate Egerton, “Evaluation of student performance using data mining over a given data space”, International Journal of Recent Technology and Engineering (IJRTE), ISSN: 2277-3878, vol. 2, iss. 4, September 2013.
[8] Brijesh Kumar Baradwaj and Saurabh Pal, “Mining educational data to analyze students' performance”, International Journal of Advanced Computer Science and Applications (IJACSA), vol. 2, no. 6, 2011.
[9] T. Jeevalatha, N. Ananthi and D. Saravana Kumar, “Performance analysis of undergraduate students placement selection using decision tree algorithms”, International Journal of Computer Applications (0975-8887), vol. 108, December 2012.
[10] Ryan S. J. d. Baker and Kalina Yacef, “The state of educational data mining in 2009: A review and future visions”, Journal of Educational Data Mining, vol. 1, no. 1, February 2009.
[11] A. F. ElGamal, “An educational data mining model for predicting student performance in programming course”, International Journal of Computer Applications (0975-8887), vol. 70, no. 17, May 2013.
[12] D. Magdalene Delighta Angeline, “Association rule generation for student performance analysis using Apriori algorithm”, The SIJ Transactions on Computer Science Engineering and Applications (CSEA), vol. 1, March-April 2013.
[13] R. B. Bhise, S. S. Thorat and A. K. Supekar, “Importance of data mining in higher education system”, IOSR Journal of Humanities and Social Science, ISSN: 2279-0837, vol. 6, iss. 6, January-February 2013.
[14] V. Ramesh, P. Parkavi and P. Yasodha, “Performance analysis of data mining techniques for placement chance prediction”, International Journal of Scientific and Engineering Research, vol. 2, iss. 8, August 2011.
[15] Mohammed M. Abu Tair and Alaa M. El-Halees, “Mining educational data to improve students' performance: A case study”, International Journal of Information and Communication Technology Research, ISSN: 2223-4985, vol. 2, no. 2, February 2012.
