An Intelligent Model for Predicting the Students' Performance with Backpropagation Neural Network Algorithm Using Regularization Approach
  • Muhammad Mazhar Bukhari1, Syed Sajid Ullah2,3, Mueen Uddin4, Saddam Hussain5,*, Maha Abdelhaq6, and Raed Alsaqour7

Human-centric Computing and Information Sciences volume 12, Article number: 44 (2022)


Higher education regulatory authorities, institutions, and students all value the ability to predict students' performance. Several colleges and universities use student data analytics to predict student performance. Several endeavors have been conducted to classify student results using well-known algorithms to attain the required accuracy. This article uses an artificial neural network (ANN) to examine and predict the academic characteristics and performance of students based on certain criteria such as prior academic records, family background, and attitudinal information, among others, so that educators can provide solutions in the event of high-risk students failing. Predicting student performance in school has been made possible using ANN, a machine learning model that has been shown to be dependable and effective for a wide range of functions and applications. When compared to other techniques, the results reveal a higher prediction accuracy of 89.72%. It is vital to compare the predictive outcomes with previous solutions in order to examine their fairness, validity, and reliability. When compared to other algorithms on a given dataset, we found that ANN predicts better due to model complexity control and reduction, as well as successive adjustments achieving the highest accuracy, allowing an ANN to produce target output that is more precisely similar to the actual output.


Artificial Neural Network, Classifiers, Model Complexity, Students’ Performance and Characteristics, Higher Education Commission, Regularization, Educational Dataset


The academic performance of students is a critical element of the educational system. It is one of the most important indicators of educational quality, and its significance is recognized at all levels. In a previous study, learning evaluation and academic-related activities were used to measure the academic achievement of students. The majority of experts agree, however, that combining prior data on academic and non-academic activities can forecast a student's success ratio at an earlier stage. With such an early prediction, the academic programme can be planned and designed around customized requirements that bring weak and average students up to the accepted criteria [1]. Much research has been conducted to determine the factors that contribute to predicting students' academic progress. The scope of most such studies is confined to predicting students' success in a single course, specifically at the semester or year-end, using some academic and non-academic factors. Non-academic factors include gender, internet access, age, and extracurricular activities, among others [2]. Academic factors include cumulative GPA, attendance status, and grades in any pre-requisite subjects, among others. Despite the contributions of these researchers, prediction quality suffers when previous academic indicators and performance status are not incorporated. To develop prediction models for students' learning, a key requirement is to examine both the negative and positive effects of employing past statistics and patterns. Several prediction methods have been used in the education sector for diverse purposes.
Machine learning methods such as logistic regression, decision trees, support vector machine (SVM), artificial neural network (ANN), and deep neural networks have produced different success ratios under different parameter settings. In [3, 4], the chosen methods proved well suited to predicting students' academic performance with high accuracy because they were applied in their relevant context. Consequently, to obtain the highest possible accuracy and prediction ratio, both the modelling method and its parameters must be selected carefully [5].
The applications of ANN in the education sector have been increasing gradually over the last decade, reflecting the fact that ANN helps to extract informative and valuable knowledge about students from their academic records. From an educational perspective, ANN is an emerging discipline concerned with developing techniques to explore and analyze data [6]. A good classification model should uncover hidden patterns in the data and the mutual associations among data attributes, and ANN has this capability. This article therefore proposes applying ANN to predict students' performance during their studies and course work. Because of the critical situation created by coronavirus disease 2019 (COVID-19) [7], the future success of students depends not only on the conventional education system but also on e-learning, online programmes, and distance learning, where it is not easy to observe and evaluate students' behavior, their interest in their studies, and their participation in academic activities [8]. The article therefore focuses on finding the best algorithm for training the network in order to build a machine learning-based prediction model [9]. Furthermore, the correctness and accuracy of the ANN-based approach have been assessed by comparing it with other well-known algorithms, such as SVM and k-nearest neighbor (KNN), on the same dataset [10, 11].
The objective of these algorithms is to predict whether or not a student needs more educational support, that is, whether the model predicts that he or she is expected to fail. If students are informed about their poor performance in the middle of their studies, both the student and the institution benefit: the institution can arrange additional coaching and so help the student join the list of successful students [12]. Providing such additional academic activities, learning mechanisms, and adequate resources enhances the quality of that student's education. Techniques are therefore required to identify and anticipate the performance and characteristics of students who have learning problems.
The objectives of this paper include highlighting the variables that influence students' performance, identifying the classification algorithms that perform best, and anticipating the final semester grades of the students. Classification is one of the most important dimensions of supervised learning, which is defined as learning that takes place in the presence of a training set of correctly classified observations. A classifier is an algorithm that carries out classification in a concrete implementation and is used to distinguish between different classes. It is a supervised function (machine learning model) whose output attribute is categorical (nominal). Once the learning process is complete, it is used to classify new records and provide the most realistic prediction [13]. The features included in this research are previous academic grades and demographic and social data. The data was tested with several methods, and the results demonstrated the significance of students' previous grades. Researchers have also explored other features that play a vital role in predicting performance, such as students' attendance, the reason for studying at the school, demographic information, and the time students spend on their studies.
Developing such a prediction model is complex, yet it can predict students' performance efficiently. The challenge is to produce consistent and accurate predictions of students' performance based on key patterns in the data. Building a prediction model is a demanding task because educational institutions generate a large and continuously growing volume of data. Because this data is heterogeneous, obtained from various sources, and to some extent unstructured, it is difficult to use directly. It must first be transformed into a state in which it can be used by an accurate model [14]. To build an accurate model that can predict performance, the paper identifies the best training algorithm. The performance of the ANN methodology has also been evaluated against other well-known algorithms.
A classifier is a machine learning algorithm or mathematical function that maps inputs to class categories; it works on a set of features that make up the data needed to train the network. Different classifiers use different algorithms to optimize this process [15]. For instance, ANNs are a significant class of parallel processing architectures that provide solutions to particular types of complicated problems. ANNs are inspired by biology, even though research to date has produced only a partial understanding of how biological neural networks work. Researchers are trying to combine the approaches of both biology and engineering to interpret the key mechanisms by which humans learn and respond to their day-to-day experiences. Improved knowledge of neural processing helps to produce better and more successful artificial networks [16].
Since the article identifies students together with their corresponding grades, educators can help students with lower grades to improve their academic performance in the future. The study also examines the correctness of classification techniques, specifically neural networks, in predicting students' performance. To learn the neural network weights, the proposed model uses the backpropagation algorithm, which performs gradient descent to reduce the error between the output value of the network and the expected target value.
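As a rough illustration of how backpropagation applies gradient descent to the error between network output and target, the following minimal sketch trains a tiny network with one hidden layer on a single example. The network size, the squared-error loss, the sigmoid activation, the learning rate, and all function names are assumptions made for illustration; the article does not specify these details.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2):
    """Forward pass: one sigmoid hidden layer, one sigmoid output unit."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y

def loss(y, t):
    """Squared error between output y and target t (assumed loss)."""
    return 0.5 * (y - t) ** 2

def backprop_step(x, t, W1, b1, W2, b2, lr=0.5):
    """One gradient-descent update of all weights via backpropagation."""
    h, y = forward(x, W1, b1, W2, b2)
    # Error signal at the output unit: dE/dz_out.
    delta_out = (y - t) * y * (1.0 - y)
    # Error signals at the hidden units, propagated back through W2.
    delta_hid = [delta_out * w * hi * (1.0 - hi) for w, hi in zip(W2, h)]
    # Move every parameter against its gradient.
    new_W2 = [w - lr * delta_out * hi for w, hi in zip(W2, h)]
    new_b2 = b2 - lr * delta_out
    new_W1 = [[w - lr * d * xi for w, xi in zip(row, x)]
              for row, d in zip(W1, delta_hid)]
    new_b1 = [b - lr * d for b, d in zip(b1, delta_hid)]
    return new_W1, new_b1, new_W2, new_b2

# Hypothetical starting weights and a single made-up training example.
W1, b1 = [[0.1, -0.2], [0.4, 0.3]], [0.0, 0.0]
W2, b2 = [0.5, -0.5], 0.0
x, t = [1.0, 0.0], 1.0

_, y_before = forward(x, W1, b1, W2, b2)
params = backprop_step(x, t, W1, b1, W2, b2)
_, y_after = forward(x, *params)
print("loss before:", loss(y_before, t), "loss after:", loss(y_after, t))
```

Each call to `backprop_step` moves the weights a small distance against the gradient, so repeated calls drive the output toward the target for this example.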

Related Work

This section reviews the literature related to the role of DP in preserving the privacy of the datasets containing dependent tuples.
The research community suggested various techniques and models to preserve the privacy of individuals in datasets containing sensitive information, e.g., healthcare data privacy [9, 10]. These mechanisms include DP-based privacy [2, 11, 12], homomorphic encryption-based privacy [13], privacy-enhancing technologies [14–16] etc. Among these, DP is the most effective framework in ensuring the privacy of individuals [17]. It has received much attention from researchers in the last few years. The remarkable success of DP lies in the belief that datasets usually contain independent tuples [18]. However, it is not the case in most real-world datasets [19]. Kifer and Machanavajjhala [7] fundamentally investigated the privacy guarantees offered by DP in the dependent tuples. They utilized a no-free lunch hypothesis characterizing non-privacy as a game to argue that privacy can hardly be expected without considering the dependencies among the tuples of a dataset. They noted that DP might not give fruitful results in such situation.
Later, Kifer and Machanavajjhala [20] developed the Pufferfish framework that generalized DP for correlated datasets. They checked the impact of dependent tuples on the performance of DP more comprehensively than previously described and found that the background information gained by the adversary and the information extracted from the query results differ significantly; thus, making DP ineffective in the case of correlated datasets. Furthermore, He et al. [21] formalized Blowfish, the subclass of the Pufferfish model that extended the implementation of DP by formalizing the policy constraints for the databases. They divided the information into secrets and constraints that may be known to the adversary. The utility privacy tradeoff was measured by injecting less noise in secrets to maximize the utility. In [22], another DP-based mechanism was derived from the Pufferfish and Blowfish techniques to address the data dependency issue. The authors argued that accessibility compromises privacy while scrutinizing the deficiencies of the baseline approach. Another model was designed by combining the Pufferfish framework with the Bayesian theory [23]. They addressed the privacy issues in high-dimension correlated datasets. The Bayesian theory was implemented to examine the protection provided by the DP in the case of dependent tuples.
Jiang et al. [24] comprehensively investigated the privacy assurance of social network datasets in correlated scenarios and proved that DP and its major variants could guarantee provable privacy. Chen et al. [25] mitigated the dependent tuples problem by multiplying the number of correlated tuples by the computed global sensitivity of the dataset. However, they introduced much noise to deal with correlation problems that spoil the query result; thus, reducing their utility. In [26], the concept of correlated sensitivity was introduced to be used in place of global sensitivity to minimize the amount of noise and enhance utility. Zhao et al. [27] formalized the concept of dependent differential privacy (DDP) to avoid the leak of users' sensitive information that may occur in correlated datasets. This DDP guarantee can be applied to any information relationship and is free of background knowledge. Wang and Wang [28] presented the notion of correlated tuple differential privacy (CTDP) and achieved its privacy guarantees by the generalized Laplace mechanism (GLM). The aforesaid techniques overcome the impediments of DP under dependent tuples. These mechanisms were computationally efficient, accomplishing higher utilities than earlier work.
Lv and Zhu [29] tackled the correlation problem in big data by introducing the r-correlated block differential privacy (r-CBDP) mechanism. Their presented mechanism uses machine learning and MIC to verify the dependencies between the dependent tuples to preserve their privacy efficiently. Drakonakis et al. [30] designed LPAuditor, a tool for performing privacy loss evaluation on a public location dataset. They preserved the privacy of a dataset by adding noise to its histogram under the privacy budget. A correlation reduction scheme was introduced to deal with the privacy loss issue in machine learning algorithms trained on the correlated dataset [32].
The authors employed a differentially private feature selection method to alleviate the privacy leak due to correlated tuples. Chen et al. [27] developed a differentially private quasi-identifier classification (DP-QIC) technique for big data publication. The presented approach conceals the correlated attributes after evaluating the data set vulnerability by employing quasi-identifier classification based on the privacy ratio of attributes. In [19], the authors used Gibbs dependence to measure the correlation among the dataset records. However, the authors theoretically proved the result and did not implement the developed method on a real-world dataset. Most of the aforesaid techniques handle the privacy issue in correlated datasets by using either global sensitivity, correlated sensitivity, or correlation coefficient. Despite these efforts, such amendments either inject a lot of noise that degrades their utility or inefficiently provide the privacy guarantee of the query result. Consequently, there is a need for effectively measuring the sensitivity to tackle its impact on the correlated datasets. Therefore, this work uses MIC to discover accurately the dependencies among tuples of the dataset so that the correlated sensitivity can be precisely calculated. This paper focuses on providing maximum utility while preserving privacy by adding less noise according to the degree of dependencies of the correlated datasets.

Fig. 1. Artificial neural network architecture.

The elements of the neural network are represented in Fig. 1 and have an effect on specific weight changes. Every neuron is directly attached to a neuron in the other layer. Neurons are used in input and hidden layers, and finally, the output layer demonstrates the output result that exhibits the resultant nodes. The ANN mimics the mutual interaction among neurons by using summation functions based on a network model. The parameter "x" of the function is a vector of size n, say, x = [x1, x2,…, xn].
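For reference, the summation function of a single neuron can be written as follows. This is a sketch under the usual textbook assumptions: the weights $w_i$, the bias $b$, and the activation function $f$ are symbols introduced here for illustration and are not defined in the source.

```latex
\[
  z = \sum_{i=1}^{n} w_i x_i + b, \qquad y = f(z)
\]
```

Here $x = [x_1, x_2, \ldots, x_n]$ is the input vector described above, $z$ is the weighted sum, and $y$ is the value the neuron passes on to the next layer.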

Research Contributions
This research article concerns the performance of students in educational institutions. Despite several efforts over the last decades to improve students' success ratio in their academic activities, the desired results have not yet been achieved. Many students leave the programme, and even the institute, because of poor results, particularly in the first year of their studies. To improve student academic performance, the first step is to identify students who are at risk. Predictive modelling techniques make it possible to identify these high-risk students in advance and to take the measures needed to move them from the danger zone to the low-risk group. Much research has been conducted on this matter, but some problems remain that hinder the adoption of a successful solution. Some research focuses on students' academic activities but neglects other factors such as their demographic and family information, extra-curricular activities, and study time. Moreover, data mining techniques like clustering, classification, and prediction have been used. The proposed research work focuses on classifying students into binary categories and predicting their grades (labeled data), treated as the dependent variable, from previous academic grades and family, demographic, and social data (feature data), treated as independent variables. To the best of the authors' knowledge and according to the literature review, the minimization of model complexity and execution time has been overlooked in previous research. It is important to control and monitor the impact of these two critical factors so that the highest accuracy can be obtained with the lowest possible loss.
This research article's contributions can be summarized as:
(1) On the basis of the input features, an intelligent model is proposed to forecast student performance.
(2) Given a training dataset, an ANN is trained to map the input features to the output label variables.
(3) Using the regularization approach, a prominent gain of 9% in overall testing accuracy and a reduction of 0.07 in loss (0.4 in the first session and 0.47 in the second session) have been achieved.
(4) Overfitting occurs when the model under consideration learns the detail and noise in the training data, which harms its performance, especially on the test data. The regularization approach is therefore implemented to control the model's variance and bias, keeping training and testing errors close, and so prevent the model from overfitting.
The rest of the paper is divided into four sections: Section 2 describes the literature review where the background and previous work is discussed. The proposed methodology is presented in Section 3. Simulation and experimental results are explained in Section 4 and finally, Section 5 presents the concluding remarks followed by future research work.

Literature Review and Related Work

Several researchers have examined distinct attributes that may affect students' performance, for example demographic information, school-related social features, and psychological, socio-economic, and other relevant factors. In addition, several models are available for predicting students' performance.
To analyze the study environment of students, Sharma et al. [1] propose a model that considers factors such as family background, gender, academic and health-related reports, age, and extracurricular activities to help predict student performance. The model is based on linear regression, decision trees, naïve Bayes classification, and KNN. Improvements are reported in model accuracy and in predicting whether a student passes or fails. The authors highlight risk factors in the proposed model that arise from the critical values of some key features.
Marbouti et al. [2] examine prediction methods for identifying students who are at high risk in a particular course at an early stage of the semester. A naïve Bayes classifier identifies 86.2% of the students who are at high risk. The results on the test dataset, using 14 variables selected by the correlation method, are reported as accuracy, true positives, true negatives, false positives, and false negatives, which strengthen the interoperability of the employed dataset with the proposed model. Imran et al. [5] propose a decision tree algorithm for predicting students' performance. The feature selection process was emphasized, with data integration, cleaning, and discretization operations performed on the data. Using a filter-based methodology, the relevant features were carefully selected and the model was trained, reaching 90.13% accuracy. The model could not be improved further because some important data, such as the syllabus of quizzes, homework, and mid-term exams, was unavailable, which prevented the researchers from identifying the success ratio in the course.
Bhutto et al. [8] employ classification algorithms, i.e., SVM using sequential minimal optimization (SMO) and logistic regression, to predict the future behavior of the students so that their academic performance can be classified into good, average, and bad categories. Using the WEKA tool, the logistic regression model exhibited 73% accuracy, while SVM (SMO) demonstrated 79% accuracy. The models are evaluated using recall, precision, F1-score, and accuracy as evaluation measures.
To enhance the satisfaction level and serenity of students, Kaur et al. [10] used a social media technique to obtain information about the university as well as formal feedback from the students. The information covers students' academics, extra-curricular activities, and their interest in contemporary technologies. To predict the students' contentment scores, they used a stacked ensemble machine learning algorithm. A meta-heuristic-based wrapper method was employed for feature selection and dimensionality reduction. The proposed ensemble model exhibited an RMSE of 0.373, the lowest among the algorithms compared. The work is limited by its feedback mechanism, as only university-wide local data was used. Since students commonly use and share information on social media, the dataset needs to be enhanced with data from those sites, which are abundant sources of information relevant to students' interests; this is left for future work.
Cortez and Silva [17] predict secondary school student performance using business intelligence and data mining techniques, including decision trees and ANN. The article predicts secondary student grades in two core classes, mathematics and Portuguese, from first- and second-period grades. The whole procedure was based on in-campus studies, and the results were processed accordingly. However, online data should be incorporated, if possible, to capture the full potential of the learning environment. Some new features may also need to be added to the dataset so that results are based on the most recent data and real-time prediction can be maintained. The results of the two algorithms, decision tree and ANN, are reasonable at 94% and 72%, respectively, but might be improved.
Al-Shehri et al. [18] present two prediction models (SVM and KNN) to anticipate students' performance in their examinations based on certain input variables. The two algorithms achieved 91% and 92% accuracy; SVM gave slightly better results, with a correlation coefficient of 0.96, whereas KNN achieved a correlation coefficient of 0.95. Rachakatla et al. [19] propose a framework that incorporates statistical methods based on performance and data analysis, decision making, and the association and correlation among the variables of the dataset. It focuses on analyzing the acquired data by means of Python, crawlers, and other database exploration and analysis tools. To support various kinds of analyses and create a knowledge base for further prediction, it establishes a data warehouse to which data mining techniques can be applied.
Iatrellis et al. [20] developed a two-phase machine learning approach, exercising both supervised and unsupervised models that claimed to produce predictions with relatively higher accuracy. Following the higher education programmes of study, these machine learning models were trained to achieve relatively high accuracy, according to the author. If more information could be provided from the faculty members’ perspective, i.e., academic advisor diagnosis, it may be possible to predict the students’ performance in better ways and lead to more productive results.
Kumar et al. [21] apply clustering and classification techniques to a dataset to predict the results of the students' recruitment process in a technical examination. Of the 200 instances, 50% were used for training and the remaining 50% for testing. Among the three algorithms (K-means clustering, naïve Bayes, and decision tree), the decision tree recorded the highest accuracy, 81%.

Proposed Methodology

The research problem of predicting students' performance can be examined by means of various factors. In the literature, numerous well-regarded approaches serve as a starting point and reference for this research topic. In a standard situation, a dataset comprising students' personal details and several other factors would be employed in their learning analytics. For several reasons, however, complete information about these instances is often not obtained. Consequently, an independent data analysis cannot be initiated, which leads to a deviation from the original target of successfully predicting students' performance. To predict students' academic performance while overcoming these issues, an intelligent predictive model based on ANN is proposed.

Problem Statement
One of the biggest challenges in educational institutes is that academic data is growing at an exponential rate. This raises the problem of how to employ the data to suggest improvements in educational quality. To address these challenges, a formal data exploration and analysis process model needs to be developed that assists an educational institution in finding, collating, and examining data obtained from heterogeneous sources. Using data mining and machine learning techniques, this model can eventually be applied to predict students' academic performance [20]. The conventional education system is thereby transformed into a system in which results can be identified in advance. Likewise, the proposed model will help academic stakeholders to compare their data and results with machine learning outcomes and to learn how to assess and evaluate machine learning-based prediction models [21].
The following research questions and objectives describe the rationale of the research work.

Research Questions
(1) What are the key factors that affect students' academic performance?
(2) Which strategy can be adopted to measure the current progress of students and identify future success and failure?
(3) How does a general solution introduce model complexity that results in low prediction accuracy, and how can this complexity be minimized so that accuracy increases and loss decreases?

Research Objectives
(1) To investigate students' perceptions of the assessment of their academic performance.
(2) To explore the parameters that contribute towards demonstrating the students' performance.
(3) To predict the grades on the basis of current academic activities.
(4) To predict whether or not a student needs more educational support, that is, whether he or she is expected to fail according to the prediction.
After a wide literature search and coordination with researchers in the education field, a formal methodology for the research process is defined. It comprises a series of steps covering data analysis, feature engineering [22], and dimensionality-related operations to select the best model for predicting students' academic performance. Using pattern classification, the identified parameters are used to train the selected model, which relates input and output variables to indicate the probable performance level corresponding to the students' grades. As data collection and representation are mostly problem-specific, it is difficult to make general statements about this step of the process. Since an ANN detects the interactions among all underlying independent variables, it is employed in this article to model the relationships among the characteristics and attributes of the dataset. The ability of an ANN to capture the complex association between independent and dependent variables makes it an adequate and powerful tool for predicting students' performance. The most significant features used in the dataset for anticipating students' performance with ANN include parents' cohabitation and education status, study and leisure time, extra and family educational support, punctuality and regularity, and previous grades. With the regularization method, the model achieves a prediction accuracy of up to 89.72%; without it, the accuracy is only 80.37%.
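The article does not state which form of regularization is used. As one common possibility, the sketch below shows an L2 (weight decay) penalty added to a data loss, together with the corresponding gradient-descent step. The choice of L2, the coefficient `lam`, the learning rate, and all function names are assumptions for illustration and should not be read as the authors' method.

```python
def l2_penalty(weights, lam):
    """Return (lam / 2) * sum of squared weights."""
    return 0.5 * lam * sum(w * w for w in weights)

def regularized_loss(data_loss, weights, lam):
    """Total loss = loss on the data + L2 penalty on the weights."""
    return data_loss + l2_penalty(weights, lam)

def regularized_step(weights, data_grads, lam, lr):
    """One gradient-descent step on the penalized loss.

    The penalty adds lam * w to each gradient component, so every
    update also shrinks the weight slightly toward zero.
    """
    return [w - lr * (g + lam * w) for w, g in zip(weights, data_grads)]

# Hypothetical values, chosen only to make the arithmetic easy to follow.
weights = [1.0, -2.0]
print(regularized_loss(0.5, weights, lam=0.1))
print(regularized_step(weights, [0.0, 0.0], lam=0.1, lr=0.5))
```

Penalizing large weights limits the effective complexity of the model, which is the usual way regularization trades a small amount of training fit for better behaviour on unseen data.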

Simulation Process
An ANN model is implemented that predicts whether each student will pass or fail at the year-end. The final grade is labelled G3 in the dataset, and the remaining independent input variables are treated as features. Initially, the data is explored to understand the details of the students so that a constructive line of action can be devised. Apart from obtaining the prediction accuracy, it is also important to analyze the data in order to identify visualization requirements, data anomalies, and alternative ways of handling complexity. Once the dataset has been selected, the simulation process proceeds through the steps described briefly below.

3.4.1 Exploratory Data Analysis
This step incorporates the acquisition of a dataset from two schools in the Alentejo region of Portugal [17], followed by data analysis and pattern recognition, data preprocessing, integration, and the application of feature engineering and feature selection strategies [23]. School reports and in-house surveys were used to compile the attributes, which include student grades and demographic, social, and school-related data. Owing to its open-ended nature, exploratory data analysis (EDA) revealed abnormalities in the data, hidden trends and patterns, and connections between features. There are 1,044 instances with 33 variables [24]. The target variable is binary: a value of 1 denotes "pass" and a value of 0 denotes "fail." There is a considerable association between the label G3, which represents the student's final grade, and the first- and second-period grades (G1 and G2). Fig. 2 depicts the split of the output (label) variable between the rows labelled 1 and the 230 instances labelled 0. The characteristics relating to students’ performance, including the grades from the corresponding periods, are shown in Table 1. Depending on the input features, students were classified as "passed" or "failed."
Exploring the dataset shows that students at the GP school attain better grades (81%) than students at the MS school. Female students attain slightly better grades (78%) than male students (76%). The performance of the students aged 15–20 shows remarkable progress. Students from urban areas perform better than students from rural areas. Students whose parents are educated and working, particularly in the health or education sectors, are more likely to achieve good academic results, as illustrated in Fig. 3.

Fig. 2. Bifurcation of final grade.

Table 1. Characteristics of dataset for simulation process

| Attribute | Description | Type | Range |
|---|---|---|---|
| school | School of student | Binary | 'GP' (Gabriel Pereira) or 'MS' (Mousinho da Silveira) |
| sex | Sex of student | Binary | 'F' (female) or 'M' (male) |
| age | Age of student | Numeric | 15 to 22 |
| address | Address of student | Binary | 'U' (urban) or 'R' (rural) |
| Medu | Mother's education | Numeric | 0 (none), 1 (primary education, 4th grade), 2 (5th–9th grade), 3 (secondary education) or 4 (higher education) |
| studytime | Weekly study time | Numeric | 1 (≤2 hours), 2 (2–5 hours), 3 (5–10 hours) or 4 (≥10 hours) |
| failures | Past class failures | Numeric | n if 1≤n<3, else 4 |
| schoolsup | Extra educational support | Binary | Yes/No |
| famsup | Family educational support | Binary | Yes/No |
| higher | Wants to take higher education | Binary | Yes/No |
| Walc | Weekend alcohol consumption | Categorical | From 1 (very low) to 5 (very high) |
| health | Present health status | Categorical | From 1 (very bad) to 5 (excellent) |
| absences | School absences | Numeric | From 0 to 93 |
| G1 | First-period grade | Numeric | From 0 to 20 |
| G2 | Second-period grade | Numeric | From 0 to 20 |
| G3 | Final grade | Binary | 0 (fail) or 1 (pass) |

Fig. 3. Feature status with label

Students who have few past failures, devote more time to study, and avoid unnecessary travel time tend to achieve good grades (more than 80%). There is a marked difference between students who intend to pursue higher education and those who do not: approximately 80% of the former achieve a higher grade, compared with 48% of the latter. Students who consume alcohol at weekends (75% to 80%) prove to be better performers than students who consume alcohol on workdays. Students in good health and with regular attendance (few absences) achieve good grades, as shown in Fig. 4.

Fig. 4. Categorical-numeric feature association.

3.4.2 Pre-processing the data
Pre-processing the data involves data exploration and analysis, feature engineering, and feature selection, which together prepare the data for model training, testing, and validation [25]. Feature engineering involves data transformation, dummy variables, outlier control, binning (which helps to prevent overfitting), and standard scaling, along with methods for handling values that are absent from the data. For feature selection, the correlation coefficient method is applied; it checks how strongly each individual feature is associated with the label (output) variable so that features can be selected accordingly. To deal with the complexity of the ANN model, regularization techniques are applied at the end of the simulation to reduce model complexity. Data analysis here means the process of understanding the data, finding patterns, and drawing inferences from the underlying patterns observed [26].
Feature engineering and feature reduction techniques are used in the simulation process as follows.
Data transformation: This is one of the important steps in data preprocessing and concerns the output (label) column of the dataset, i.e., G3. The output column contains the marks as numerical values, which need to be transformed into nominal values representing the class label for classification and prediction purposes. The marks of the students are categorized into binary classes (0 and 1) as shown in Table 2.
Standard scaling: Transforms each feature so that its distribution has a mean of 0 and a standard deviation of 1. Because the dataset is multivariate, each feature is transformed independently, and a marked difference in accuracy can be observed before and after applying the scaling transform (referred to here as the MinMaxScaler transform).
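As a minimal sketch of the two scaling transforms named above, the following pure-Python functions standardize a single feature column to zero mean and unit standard deviation, and alternatively rescale it to the range [0, 1]. The function names and the sample values are illustrative assumptions; the paper does not show its scaling code.

```python
import statistics


def standardize(values):
    """Scale one feature column to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def min_max_scale(values):
    """Rescale one feature column linearly to the range [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


# Hypothetical column of values, used only for illustration.
column = [2, 4, 4, 4, 5, 5, 7, 9]
scaled = standardize(column)
```

Each feature is transformed on its own, which matches the per-feature treatment described for multivariate data.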

Table 2. Data transformation (numeric to nominal) of the G3 output (label) column

| Marks range | Class label |
|---|---|
| 0–9 | Fail (0) |
| 10–20 | Pass (1) |
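The mapping in Table 2 can be sketched as a small function. The name `to_class_label` and the `pass_mark` parameter are illustrative assumptions; only the 0–9 and 10–20 ranges come from the table.

```python
def to_class_label(marks, pass_mark=10):
    """Map a numeric G3 mark (0-20) to the binary class label of Table 2."""
    if not 0 <= marks <= 20:
        raise ValueError("G3 marks are expected to lie between 0 and 20")
    return 1 if marks >= pass_mark else 0


# Hypothetical G3 marks, used only for illustration.
g3_marks = [0, 9, 10, 15, 20]
labels = [to_class_label(m) for m in g3_marks]
```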

Handling missing/null values: A real-world dataset is rarely perfect, i.e., complete and free from missing/null and duplicate values. For example, it may be in an unstructured format and contain noisy values that cannot be used for training models [27]. A prior step of handling missing values and noisy data is therefore necessary to clean the data and make it appropriate for training. For the dataset used here, the correlation process recommends dropping the attributes containing missing/null values, as explained under the Feature selection and Reduction heading below.
Removing outliers: Outliers are extreme values in an attribute that fall outside the range of normal values present in that attribute. Outliers can be identified in a dataset by looking for values that do not fit the normal distribution. This process of looking for and removing abnormal values is known as "cleaning up the dataset." Fig. 5 depicts some of the attributes containing outliers, which need to be removed.
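The text does not state which rule was used to detect outliers. As one common possibility, the sketch below applies the interquartile-range (IQR) rule, dropping values more than 1.5 IQR beyond the quartiles. The rule, the factor 1.5, and the sample values are assumptions made for illustration.

```python
import statistics


def remove_outliers_iqr(values, factor=1.5):
    """Keep only the values inside [Q1 - factor*IQR, Q3 + factor*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical absence counts with one extreme value.
absences = [0, 2, 4, 4, 6, 8, 10, 93]
cleaned = remove_outliers_iqr(absences)
```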

Fig. 5. Features outlier status.

Encoding categorical data: The features school, age, sex, address, Pstatus, Medu, famsize, Fedu, Fjob, Mjob, reason, traveltime, studytime, guardian, failures, famsup, schoolsup, activities, paid, higher, nursery, romantic, and internet are treated as categorical in nature. The simulation process cannot be initiated unless all the data are in numeric form, as machine learning models operate on numbers, not text, so a categorical encoding technique is needed to transform these values into numeric values. One-hot encoding is used to create the dummy variables. Care is needed to avoid the dummy variable trap, in which the dummy variables are correlated with each other so that the value of one variable can be predicted from the others.
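A minimal sketch of one-hot encoding for a single categorical column is given below. Dropping the first category is one standard way to avoid the dummy variable trap; whether the authors dropped a category is not stated, so the `drop_first` option and the generated column names are assumptions.

```python
def one_hot(values, prefix, drop_first=True):
    """Return a dict of 0/1 dummy columns for one categorical feature.

    With drop_first=True the first category (in sorted order) is omitted,
    so the remaining dummies are not perfectly predictable from each other.
    """
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    return {
        f"{prefix}_{category}": [1 if v == category else 0 for v in values]
        for category in kept
    }


# Hypothetical values of the 'school' feature.
school = ["GP", "MS", "GP"]
dummies = one_hot(school, "school")
```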
Feature selection and reduction: The correlation coefficient method is used to retain a moderate number of necessary features without compromising the correctness of the model [28]. During the simulation procedure, the Pearson correlation coefficient is used to choose the most relevant features. The coefficient ranges from -1 to +1, where 0 indicates that no association is present, and -1 and +1 represent a perfect negative and a perfect positive association, respectively. As a result, the features chosen are those that are associated with the label variable.
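The selection step can be sketched as follows: compute the Pearson coefficient between each feature and the label, and keep the features whose absolute coefficient reaches a threshold. The threshold of 0.1, the function names, and the toy data are assumptions; the paper does not report its cut-off.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / math.sqrt(var_x * var_y)


def select_features(features, label, threshold=0.1):
    """Return the names of features with |r| >= threshold against the label."""
    return [
        name for name, column in features.items()
        if abs(pearson(column, label)) >= threshold
    ]


# Hypothetical toy data: one informative feature and one unrelated feature.
features = {"G2": [8, 12, 15, 5, 18], "noise": [1, -1, 1, -1, 0]}
label = [0, 1, 1, 0, 1]
selected = select_features(features, label)
```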

3.4.3 Training, testing, and splitting the data
The students’ dataset consists of 1,044 instances. Using the ANN, classification is carried out in two phases. In the first phase, the network is trained to learn the input/output mapping using 70% of the instances (n=730) as the training dataset. In the second phase, a testing dataset of 313 instances is presented to the ANN for classification, and the neuron weights are updated accordingly if required. The output is produced by feed-forward propagation using the sigmoid function, and the weights need to be updated to achieve the desired result.
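A 70/30 split of this kind can be sketched with the standard library alone. The shuffling, the seed, and the function name are assumptions, since the exact splitting procedure is not described; with 1,044 rows this sketch keeps 730 rows for training and leaves the remaining rows for testing.

```python
import random


def train_test_split(rows, train_fraction=0.7, seed=42):
    """Shuffle row indices and split them into training and testing sets."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    cut = int(len(rows) * train_fraction)
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


# Placeholder rows standing in for the 1,044 student records.
rows = list(range(1044))
train_rows, test_rows = train_test_split(rows)
```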

3.4.4 ANN modelling
An ANN is a hierarchical model composed of several layers, where each layer contains a number of neurons (nodes) that are connected to all the nodes of the next layer [29]. The proposed model is made up of four layers: an input layer denoted by "x," two hidden layers denoted by "j" and "k," respectively, and an output layer denoted by "y." The ANN calculates the gradient of the loss function with respect to a single weight by the chain rule, as shown in Fig. 6, and does so efficiently one layer at a time. As the figure shows, 13 input features arrive through the input layer; the weights w are initialized randomly and updated in the backpropagation process. The output of each neuron is calculated starting from the input layer "x" and passed first to the first hidden layer "j," then to the second hidden layer "k," and finally to the output layer, where the error is calculated. Backpropagation then proceeds backwards from the output layer through the hidden layers to adjust and update the weights so that the error is minimized.
The purpose of backpropagation is to optimize the weights so that the ANN learns to map the presented input to the correct output. The mathematical model for the forward pass is explained first; the backward pass is then described along with the loss calculation.

Fig. 6. Artificial neural network model.

1) The Forward Pass
The input is fed forward through the network together with the weights and biases. The total net input to each hidden-layer neuron is calculated first and is then passed through the activation function, here the sigmoid, to obtain the neuron's output. The process is then repeated for the output-layer neurons.
Calculating the total net input for the first hidden layer j:


In order to get the output of jm, an activation process is performed using the sigmoid function:


Performing the same process for kn:


Taking the outputs of the first and second hidden layer neurons j and k as inputs and repeating this process for the output layer neurons, the output would be:




The error can now be calculated for each output neuron using the “squared error function,” and the total error is obtained by adding the individual errors:


Target refers to the actual value, while output is the predicted value. The errors can be written separately as:



By combining both errors, the total error for the neural network is obtained.
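The expressions for the forward pass are not reproduced in the text above. As a hedged reconstruction, assuming the standard formulation of a fully connected sigmoid network with a squared-error loss, the quantities described would take the following form. The symbols $x_i$, $w_{im}$, $b_j$, $net_{jm}$, and $target_{y1}$ are notational assumptions and may differ from the authors' own.

```latex
\begin{align*}
net_{jm} &= \sum_{i} w_{im}\, x_i + b_j \\
out_{jm} &= \frac{1}{1 + e^{-net_{jm}}} \\
E_{y1} &= \tfrac{1}{2}\,\bigl(target_{y1} - out_{y1}\bigr)^2, \qquad
E_{y2} = \tfrac{1}{2}\,\bigl(target_{y2} - out_{y2}\bigr)^2 \\
E_{total} &= E_{y1} + E_{y2}
\end{align*}
```

The same net-input and sigmoid steps apply to the second hidden layer k and to the output layer y, with the outputs of the preceding layer taking the place of $x_i$.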


2) Backward Pass
The purpose of backpropagation is to update the weights during the learning phase so that the actual output moves close to the target output, minimizing the error for each individual neuron. It is required to know how much a change in the weight $w_n$ affects the total error, i.e., $\frac{\partial E_{total}}{\partial w_n}$, the partial derivative or gradient of $E_{total}$ with respect to $w_n$. By applying the chain rule,





Each term in this equation must be calculated. First, how much does the total error change with respect to the output?




When taking the partial derivative of $E_{total}$ with respect to $out_{y1}$, the term belonging to the other output neuron becomes zero: $out_{y1}$ does not affect it, so it is treated as a constant, and the derivative of a constant is zero.



Next, how much does the total net input of y1 change with respect to the weight?


Since the derivative of a constant is zero,


Putting it together,


In order to reduce the error, this value, multiplied by the learning rate η, is subtracted from the current weight.


As $out_{kn}$ affects both $out_{y1}$ and $out_{y2}$, its effect on both output neurons needs to be taken into consideration.
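The individual terms of the output-layer update are not shown in the text. Under the same assumptions as before (sigmoid activation, squared-error loss), and taking $w_n$ to be the weight that connects the hidden neuron $k_n$ to the output neuron $y_1$, the chain rule would expand as sketched below. This is a reconstruction from the standard derivation, not a copy of the authors' equations.

```latex
\begin{align*}
\frac{\partial E_{total}}{\partial w_n}
  &= \frac{\partial E_{total}}{\partial out_{y1}}
     \cdot \frac{\partial out_{y1}}{\partial net_{y1}}
     \cdot \frac{\partial net_{y1}}{\partial w_n} \\
\frac{\partial E_{total}}{\partial out_{y1}} &= -\bigl(target_{y1} - out_{y1}\bigr) \\
\frac{\partial out_{y1}}{\partial net_{y1}} &= out_{y1}\,\bigl(1 - out_{y1}\bigr) \\
\frac{\partial net_{y1}}{\partial w_n} &= out_{kn} \\
w_n^{new} &= w_n - \eta\,\frac{\partial E_{total}}{\partial w_n}
\end{align*}
```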
3) Hidden Layers
In order to update the weight values while continuing the backwards pass, it is required to calculate:



The output of each hidden layer neuron contributes to the output and error of the output neurons. Since $out_k$ affects both $out_{y1}$ and $out_{y2}$, $\frac{\partial E_{total}}{\partial out_k}$ needs to take into account its impact on both output neurons.






Calculating error at y1


Following the same process for the error at y2, therefore,


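Having $\frac{\partial E_{total}}{\partial out_{k1}}$, it is required to solve $\frac{\partial out_{k1}}{\partial net_{k1}}$ and $\frac{\partial net_{k1}}{\partial w_n}$ for each individual weight.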



Calculating the partial derivative of the total net input to k1 with respect to $w_n$.
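The forward and backward passes described in this section can be checked numerically. The sketch below builds a small network with two sigmoid hidden layers and two output neurons, computes the gradients by the chain rule, and provides a finite-difference estimate for comparison. The layer sizes, the random initialization, and all function names are illustrative assumptions; this is not the authors' implementation or their 13-input architecture.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """Sigmoid outputs of one dense layer; weights[m][i] links input i to neuron m."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(x, p):
    out_j = layer(x, p["Wj"], p["bj"])      # first hidden layer j
    out_k = layer(out_j, p["Wk"], p["bk"])  # second hidden layer k
    out_y = layer(out_k, p["Wy"], p["by"])  # output layer y
    return out_j, out_k, out_y


def total_error(out_y, target):
    """Squared-error loss summed over the output neurons."""
    return sum(0.5 * (t - o) ** 2 for t, o in zip(target, out_y))


def gradients(x, target, p):
    """Gradients of the total error with respect to every weight (chain rule)."""
    out_j, out_k, out_y = forward(x, p)
    # dE/dnet at the output layer: -(target - out) * out * (1 - out)
    d_y = [-(t - o) * o * (1 - o) for t, o in zip(target, out_y)]
    # Each hidden neuron collects the error from every neuron it feeds.
    d_k = [sum(d_y[q] * p["Wy"][q][n] for q in range(len(d_y)))
           * out_k[n] * (1 - out_k[n]) for n in range(len(out_k))]
    d_j = [sum(d_k[n] * p["Wk"][n][m] for n in range(len(d_k)))
           * out_j[m] * (1 - out_j[m]) for m in range(len(out_j))]
    return {"Wy": [[d * o for o in out_k] for d in d_y],
            "Wk": [[d * o for o in out_j] for d in d_k],
            "Wj": [[d * xi for xi in x] for d in d_j]}


def init_params(n_in, n_j, n_k, n_out, seed=0):
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"Wj": mat(n_j, n_in), "bj": [0.0] * n_j,
            "Wk": mat(n_k, n_j), "bk": [0.0] * n_k,
            "Wy": mat(n_out, n_k), "by": [0.0] * n_out}


def numeric_grad(x, target, p, name, r, c, eps=1e-6):
    """Central finite-difference estimate of dE/dw for one weight."""
    original = p[name][r][c]
    p[name][r][c] = original + eps
    e_plus = total_error(forward(x, p)[2], target)
    p[name][r][c] = original - eps
    e_minus = total_error(forward(x, p)[2], target)
    p[name][r][c] = original
    return (e_plus - e_minus) / (2 * eps)
```

Comparing `gradients(...)` with `numeric_grad(...)` for each weight is a simple way to confirm that the hidden-layer terms account for the effect on both output neurons.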




3.4.5 Model evaluation and result interpretation
The simulation was run with Anaconda Python, a data science platform. The experiment is carried out with the backpropagation algorithm of an ANN, and the performance of the ANN is assessed by examining some well-known performance measures in detail. These measures give a useful sense of the effectiveness of the employed algorithm [30]. When a student is correctly classified, he or she falls into either the true positive (TP) or the true negative (TN) category: a student who is classified as passing and actually passes is a TP, and a student who is classified as failing and actually fails is a TN. When a student is classified as passing but actually fails, this is referred to as a false positive (FP), and when a student is classified as failing but actually passes, this is referred to as a false negative (FN). The algorithm’s performance is evaluated using the following measures.

Accuracy: Calculated by dividing the number of correctly classified instances by the total number of instances in the student dataset. The accuracy is 90% when just one out of every 10 labels is incorrect.


Precision: The proportion of students predicted as pass by the ANN who actually pass. A precision of 90% means that, on average, of every 10 students labelled as pass by the model, 9 actually pass and 1 actually fails.


Recall: The number of correct positive identifications in relation to the total number of actual positives in the dataset (also known as sensitivity or TP rate). A recall of 90% means that, of every 10 students who pass in reality, 9 are labelled as pass by the model and 1 is missed.


F1-score: Describes the trade-off between precision and recall for the positive class by taking their harmonic mean.
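The four measures can be computed directly from the confusion-matrix counts using their standard definitions. The sketch below assumes that "pass" (1) is the positive class; the function names and the sample labels are illustrative only.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN with 1 (pass) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical actual and predicted labels for ten students.
y_true = [1, 1, 1, 0, 0, 1, 0, 1, 1, 1]
y_pred = [1, 1, 0, 0, 1, 1, 0, 1, 1, 1]
metrics = classification_metrics(*confusion_counts(y_true, y_pred))
```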


Fig. 7. ANN training and testing prediction progress.

Most of the time, the desired results are not achieved in a few training iterations; the model needs to be trained over several iterations to improve its prediction accuracy. The proposed ANN uses backpropagation as the learning algorithm to compute the gradient of the loss with respect to its weights for gradient descent when predicting students’ academic performance [31]. The learning rate and momentum coefficient were 0.25 and 0.5, respectively. The simulation was observed from 0 to 50 epochs with two hidden layers of 22 and 12 neurons, respectively. The ANN was trained in two sessions, one using default settings and the second incorporating the regularization approach. Fig. 7 depicts the overfitting problem in the first training session, which results from model complexity: the highest training accuracy is around 89%, while the testing accuracy reaches only 80.37%. The loss is shown in Fig. 8, where the training process exhibits a loss of about 0.30 that gradually decreases towards the 50th epoch. By contrast, the testing loss falls to about 0.40 but gradually increases afterward.
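To illustrate how a learning rate of 0.25 and a momentum coefficient of 0.5 enter a weight update, the sketch below uses the classical momentum rule on a one-dimensional quadratic. The exact update form used by the authors is not given, so the rule, the toy objective, and the function names are assumptions.

```python
def momentum_step(weight, velocity, gradient, learning_rate=0.25, momentum=0.5):
    """One gradient-descent update with classical momentum."""
    velocity = momentum * velocity - learning_rate * gradient
    return weight + velocity, velocity


def minimize(gradient_fn, start, epochs=50):
    """Run the momentum update for a fixed number of epochs."""
    weight, velocity = start, 0.0
    for _ in range(epochs):
        weight, velocity = momentum_step(weight, velocity, gradient_fn(weight))
    return weight


# Toy objective f(w) = (w - 3)^2 with gradient 2 * (w - 3); minimum at w = 3.
best = minimize(lambda w: 2 * (w - 3), start=0.0)
```

The velocity term carries part of the previous update into the next one, which smooths the path taken by the weights across epochs.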

Fig. 8. ANN training and testing loss progress.

Regularization is an important technique in machine learning that is used to reduce model complexity, improve accuracy, and make predictions more reliable. Normally, the cost function is modified by adding a penalty (complexity) term on the parameters of the machine learning model, i.e., on the magnitude of its coefficients. In the proposed model, it is used to prevent overfitting and to maintain and improve the accuracy of the model as a whole. The basic purpose of regularization is to shrink the coefficients of the features of the student dataset; in other words, the number of features stays the same while the magnitude of their coefficients is reduced. The type of regularization used in the model is L2 regularization, which works by adding a penalty to the cost function of the proposed model. In the equation below, y is the dependent variable (label), i.e., the value to be predicted; $x_1, x_2, \ldots, x_n$ are the independent variables (features); $w_1, w_2, \ldots, w_n$ are the weights or magnitudes associated with the features; and b represents the bias.


The parameters w and b must be optimized to minimize the cost function. The equation for the cost function would be:


To obtain a model that predicts the value of y, the loss function, the so-called residual sum of squares, is used and the parameters are optimized. The L2 penalty is added to it; the penalty is calculated by multiplying lambda “λ” by the squared weights “W” of the individual features:


If the value of “λ” approaches zero, the equation reduces to the unregularized cost function of the model. In Equation (43), the penalty term regularizes the coefficients of the model: the regression shrinks the magnitudes of the coefficients and, as a result, the complexity of the model is decreased.
A simple model in this scenario is one whose parameter values have relatively low entropy, or one with fewer parameters. It is therefore recommended to limit the complexity of the network by permitting its weights to take only small values, which makes the distribution of the weight values more regular; this is known as weight regularization.
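The shrinking effect of the L2 penalty can be seen on a small linear model. The sketch below minimizes the residual sum of squares plus λ times the sum of squared weights by plain gradient descent, once with λ = 0 and once with λ = 10. The toy data, the learning rate, the choice to leave the bias unpenalized, and the function names are assumptions made for illustration; the paper applies the penalty to an ANN rather than to this linear model.

```python
def predict(weights, bias, x):
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def l2_cost(weights, bias, rows, targets, lam):
    """Residual sum of squares plus the L2 penalty lam * sum(w^2)."""
    rss = sum((predict(weights, bias, x) - t) ** 2 for x, t in zip(rows, targets))
    return rss + lam * sum(w * w for w in weights)


def fit(rows, targets, lam, learning_rate=0.01, epochs=2000):
    """Minimize l2_cost by batch gradient descent (bias is not penalized)."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w = [2 * lam * w for w in weights]  # gradient of the penalty
        grad_b = 0.0
        for x, t in zip(rows, targets):
            err = predict(weights, bias, x) - t
            for i, xi in enumerate(x):
                grad_w[i] += 2 * err * xi        # gradient of the RSS term
            grad_b += 2 * err
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


# Toy data generated from y = 2x + 1.
rows = [[0.0], [1.0], [2.0], [3.0]]
targets = [1.0, 3.0, 5.0, 7.0]
plain_w, plain_b = fit(rows, targets, lam=0.0)
shrunk_w, shrunk_b = fit(rows, targets, lam=10.0)
```

With λ = 0 the fit recovers the unregularized solution, while the larger λ pulls the weight towards zero, which is the reduction in coefficient magnitude described above.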
With the model optimized and the network trained on the 1,044 instances for 50 epochs, this simulation demonstrates the potential benefit of neural network algorithms. To demonstrate the significance of the neural network, a model was presented that is based on selected variables from the given dataset and predicts the expected outcome of students at an educational institution. As mentioned above, the ANN works by computing the gradient of the loss function with respect to each individual weight using the chain rule, iterating backward from the last layer, and it achieves an 80.37% success ratio on the 1,044 instances in the first session. In the second session, a marked improvement can be observed in Fig. 9, where, by incorporating the regularization technique, the model reaches approximately 89.72% training accuracy and 87% testing accuracy near the 40th epoch. Table 3 compares both sessions (default and regularization). Comparing the loss of both sessions, the loss decreases as training progresses, and a loss of less than 0.4 can now be observed in Fig. 10.

Fig. 9. ANN training and testing prediction progress after regularization.

Fig. 10. ANN training and testing loss progress after regularization.

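Table 3. Evaluation of simulation sessions (unit: %)

| Epoch | Training (default) | Training (regularization) | Testing (default) | Testing (regularization) |
|---|---|---|---|---|
| 10 | 85 | 83 | 78 | 79 |
| 20 | 83 | 85 | 78 | 81 |
| 30 | 84 | 86 | 78 | 83 |
| 40 | 87 | 88 | 81 | 87 |
| 50 | 87 | 89 | 82 | 87 |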

Results have been evaluated by comparing them with the work of other researchers on the same dataset using well-known algorithms. The comparison of model accuracy is shown in Fig. 11, where the ANN demonstrates better accuracy than the other algorithms (numbered 1 to 9: Logistic Regression, ZeroR, KNN, MLP, Naïve Bayes, J4.8, JRip, Random Forest, and ANN). It is worth mentioning the role of regularization in enhancing the overall accuracy by reducing the model complexity in this setup.

Fig. 11. Model accuracy comparison chart.

The three above-mentioned evaluation measures, precision, recall, and F1-score, are calculated using the TP, TN, FP, and FN values of the confusion matrix. These measures are plotted in Fig. 12, which indicates the effectiveness of the ANN.

Fig. 12. Performance measures using artificial neural network.

Binary classifiers can also make use of another common and important tool: the receiver operating characteristic (ROC) curve. The ROC curve plots the true positive rate (TPR) against the false positive rate (FPR) of the above-mentioned prediction model and thereby shows how well the binary classification system performs. In the plot, the dotted line denotes the ROC curve of a purely random classifier. As the label variable takes one of two values (pass or fail), the ROC curve can be used to assess the usefulness and competence of the binary classification system. Fig. 13 depicts the performance analysis of the trained ANN using the ROC curve, through which the enhanced performance of the ANN can be analyzed.
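How the points of an ROC curve arise can be sketched by sweeping a decision threshold over the predicted scores and recording the FPR and TPR at each step; the area under the curve then follows from the trapezoidal rule. The scores, labels, and function names below are hypothetical and are not taken from the paper's results.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the threshold step by step.

    Assumes y_true contains at least one positive (1) and one negative (0).
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


# Hypothetical labels (1 = pass) and predicted pass probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, scores)
auc = area_under_curve(curve)
```

A purely random classifier would trace the diagonal from (0, 0) to (1, 1), giving an area of 0.5, so an area closer to 1 indicates better separation of the two classes.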

Fig. 13. ROC curve plot.

Conclusion and Future Work

As a result of COVID-19, most academic activities have been switched from traditional systems to online e-learning, also known as distance learning systems, and it is difficult for an institute to continue its academic operations. It is important that management identify pupils who are struggling in their studies before the final examination of the academic session. High-level knowledge extraction is possible with ANN and data mining approaches. Several research studies have been conducted in the education sector to improve educational quality. In this study, a large dataset from two schools in Portugal's Alentejo region was used to address one of the most frequent difficulties, the prediction of students' performance. Prior research has employed a variety of methods to forecast the overall result, including Random Forest, Rules Decision Table, and regression analysis, among others. The inputs for a group of students are carefully considered, and the network is trained accordingly to deliver the expected results. A brief analysis of the ANN and its comparison with well-known algorithms has been carried out to evaluate the result. The result indicates that the variables picked as input are extremely important in predicting students' performance. Furthermore, model complexity is an important factor that must be reduced in order to improve accuracy, improve operational functionality, and reduce computational overhead and time. In this article, regularization is used to improve the outcome, resulting in a significant change in the result as accuracy is increased from 80.37% to 89.72%.
Future research will extend the experiment to other educational institutions in order to anticipate student performance and design a plan to address students' shortcomings and so enhance the outcome. A mechanism is also needed to select the parameters for the experiment in an automated manner. More investigation is required to understand why some parameters are not needed in the experiment and to identify other variables that affect or influence students' performance.

Author’s Contributions

Conceptualization, MMB, SSU, SH. Funding acquisition, MU, MA. Investigation and methodology, MMB, SSU, SH, RA. Project administration, SSU, SH. Resources, MU, MA, RA. Supervision, SSU, SH. Writing of the original draft, MMB, SSU, SH, MU. Writing of the review and editing, MMB, SH, MA, RA. Software, MMB, SSU, SH. Validation, MMB, SH, MU. Formal analysis, MMB, SSU, MU, MA. Data curation, MU, SH, RA. Visualization, SSU, SH.


This research was funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project Number (PNURSP2022R97), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.



Competing Interests

The authors declare that they have no competing interests.

Author Biography

Name : Muhammad Mazhar Bukhari
Affiliation : Department of Computer Science, National College of Business Administration & Economics, Lahore, Pakistan.
Biography : Muhammad Mazhar Bukhari received his M.Phil. from The Institute of Management Sciences, Lahore, Pakistan (2016). He is currently pursuing a Ph.D. degree at the National College of Business Administration and Economics, Lahore, Pakistan.

Name : Syed Sajid Ullah
Affiliation : Department of Electrical and Computer Engineering, Villanova University, PA, USA.
Biography : Syed Sajid Ullah received his master’s in computer science degree (MS) from Hazara University Mansehra, Pakistan in 2020. He is currently pursuing his Ph.D. degree from Department of Electrical and Computer Engineering, Villanova University, PA, USA. His major research domains are cryptography, network security, information centric networking (ICN), named data networking (NDN), and IoT.

Name : Mueen Uddin
Affiliation : School of Digital Science, Universiti Brunei Darussalam, Jln Tungku Link, Gadong, BE1410, Brunei Darussalam.
Biography : Dr. Mueen Uddin is currently working as Assistant Professor of Cybersecurity and Blockchain at Universiti Brunei Darussalam. He completed his PhD at Universiti Teknologi Malaysia (UTM) in 2013 and his B.S. and M.S. in Computer Science at Isra University Hyderabad, Pakistan. Dr. Mueen has authored more than 100 international research articles published in highly indexed and reputed journals. His research interests include blockchain, cybersecurity, cloud computing, and virtualization.

Name : Saddam Hussain
Affiliation : School of Digital Science, Universiti Brunei Darussalam, Jln Tungku Link, Gadong, BE1410, Brunei Darussalam.
Biography : Saddam Hussain received his Bachelor’s and Master’s degrees from Islamia College, Peshawar, and Hazara University, Mansehra, Pakistan, in 2017 and 2021, respectively. He is currently pursuing a Ph.D. at the School of Digital Science, Universiti Brunei Darussalam, Brunei Darussalam. He has published several papers in well-reputed journals including IEEE, JISA Elsevier, Cluster Computing, Computer Communication, IoTJ, Hindawi, CMC, and Electronics. He serves as a reviewer for reputed journals including IEEE Access, International Journal of Wireless Information Networks, Scientific Journal of Electrical Computer and Informatics Engineering, and CMC. His research interests include Cryptography, Network Security, Wireless Sensor Networking (WSN), Information-Centric Networking (ICN), Named Data Networking (NDN), smart grid, Internet of Things (IoT), IIoT, Quantum Computing, Cloud Computing, and Edge Computing.

Name : Maha Abdelhaq
Affiliation : Department of Information Technology, College of Computing and Informatics, Saudi Electronic University, 93499, Riyadh, Saudi Arabia.
Biography : MAHA ABDELHAQ received the B.Sc. degree in computer science and the M.Sc. degree in securing wireless communications from the University of Jordan, Jordan, in 2006 and 2009, respectively, and the Ph.D. degree from the Faculty of Information Science and Technology, National University of Malaysia, Malaysia, in 2014. Her research interests include vehicular networks, MANET routing protocols, artificial immune systems, and fuzzy logic theory. She is a member of ACM and the International Association of Engineers.

Name : Raed Alsaqour
Affiliation : Department of Information Technology, College of Computing and Informatics, Saudi Electronic University, 93499, Riyadh, Saudi Arabia.
Biography : RAED ALSAQOUR received the B.Sc. degree in computer science from Mu’tah University, Jordan, in 1997, the M.Sc. degree in distributed systems from the University Putra Malaysia, Malaysia, in 2003, and the Ph.D. degree in wireless communication systems from the National University of Malaysia, Malaysia, in 2008. He is currently an Associate Professor with the College of Computation and Informatics, Saudi Electronic University, Jeddah, Saudi Arabia. His research interests include wireless networks, ad hoc networks, vehicular networks, routing protocols, simulation, and network performance evaluation. He also has a keen interest in computational intelligence algorithms (fuzzy logic and genetic) applications and security issues (intrusion detection and prevention) over networks.


[1] R. Sharma, S. K. Maurya, and K. Kishor, “Student performance prediction using technology of machine learning,” in Proceedings of the International Conference on Innovative Computing & Communication (ICICC), New Delhi, India, 2021.
[2] F. Marbouti, H. Diefes-Dux, and K. Madhavan, “Models for early prediction of at-risk students in a course using standards-based grading,” Computers & Education, vol. 103, pp. 1-15, 2016.
[3] D. Ngabo, W. Dong, E. Ibeke, C. Iwendi, and E. Masabo, “Tackling pandemics in smart cities using machine learning architecture,” Mathematical Biosciences and Engineering, vol. 18, no. 6, pp. 8444-8461, 2021.
[4] M. A. Kamarposhti, I. Colak, C. Iwendi, S. S. Band, and E. Ibeke, “Optimal coordination of PSS and SSSC controllers in power system using ant colony optimization algorithm,” Journal of Circuits, Systems and Computers, vol. 31, no. 4, article no. 2250060, 2022. https://doi.org/10.1142/S0218126622500608
[5] M. Imran, S. Latif, D. Mehmood, and M. S. Shah, “Student academic performance prediction using supervised learning techniques,” International Journal of Emerging Technologies in Learning, vol. 14, no. 14, pp. 92-104, 2019.
[6] B. H. Kim, E. Vizitei, and V. Ganapathi, “GritNet: student performance prediction with deep learning,” 2018 [Online]. Available: https://arxiv.org/abs/1804.07405.
[7] C. Iwendi, A. K. Bashir, A. Peshkar, R. Sujatha, J. M. Chatterjee, S. Pasupuleti, R. Mishra, S. Pillai, and O. Jo, “COVID-19 patient health prediction using boosted random forest algorithm,” Frontiers in Public Health, vol. 8, article no. 357, 2020. https://doi.org/10.3389/fpubh.2020.00357
[8] E. S. Bhutto, I. F. Siddiqui, Q. A. Arain, and M. Anwar, “Predicting students’ academic performance through supervised machine learning,” in Proceedings of 2020 International Conference on Information Science and Communication Technology (ICISCT), Karachi, Pakistan, 2020, pp. 1-6.
[9] M. L. Dahhan and Y. Almoussa, “Reducing the complexity of the multilayer perceptron network using the loading matrix,” International Journal of Computer Applications, vol. 175, no. 10, pp. 40-48, 2020.
[10] M. Kaur, H. Mehta, S. Randhawa, P. K. Sharma, and J. H. Park, “Ensemble learning-based prediction of contentment score using social multimedia in education,” Multimedia Tools and Applications, vol. 80, no. 26, pp. 34423-34440, 2021.
[11] S. Badugu and B. Rachakatla, “Students’ performance prediction using machine learning approach,” in Data Engineering and Communication Technology. Singapore: Springer, 2020, pp. 333-340.
[12] H. Altabrawee, O. A. J. Ali, and S. Q. Ajmi, “Predicting students’ performance using machine learning techniques,” Journal of University of Babylon for Pure and Applied Sciences, vol. 27, no. 1, pp. 194-205, 2019.
[13] M. M. Bukhari, B. F. Alkhamees, S. Hussain, A. Gumaei, A. Assiri, and S. S. Ullah, “An improved artificial neural network model for effective diabetes prediction,” Complexity, vol. 2021, article no. 5525271, 2021. https://doi.org/10.1155/2021/5525271
[14] B. M. Rao and B. V. R. Murthy, “Prediction of student’s educational performance using machine learning techniques,” in Data Engineering and Communication Technology. Singapore: Springer, 2020, pp. 429-440.
[15] C. Verma, Z. Illes, and V. Stoffova, “Age group predictive models for the real time prediction of the university students using machine learning: preliminary results,” in Proceedings of 2019 IEEE International Conference on Electrical, Computer and Communication Technologies (ICECCT), Coimbatore, India, 2019, pp. 1-7.
[16] Z. Iqbal, J. Qadir, A. N. Mian, and F. Kamiran, “Machine learning based student grade prediction: a case study,” 2017 [Online]. Available: https://arxiv.org/abs/1708.08744.
[17] P. Cortez and A. M. G. Silva, “Using data mining to predict secondary school student performance,” in Proceedings of 5th Annual Future Business Technology Conference, Porto, Portugal, 2008, pp. 5-12.
[18] H. Al-Shehri, A. Al-Qarni, L. Al-Saati, A. Batoaq, H. Badukhen, S. Alrashed, J. Alhiyafi, and S. O. Olatunji, “Student performance prediction using support vector machine and k-nearest neighbor,” in Proceedings of 2017 IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), Windsor, Canada, 2017, pp. 1-4.
[19] B. Rachakatla, B. Srinivasu, C. P. Laxmi, and S. Thasleem, “Students’ performance evaluation and analysis,” i-Manager’s Journal on Software Engineering, vol. 13, no. 2, pp. 29-36, 2018.
[20] O. Iatrellis, I. K. Savvas, P. Fitsilis, and V. C. Gerogiannis, “A two-phase machine learning approach for predicting student outcomes,” Education and Information Technologies, vol. 26, no. 1, pp. 69-88, 2021.
[21] V. U. Kumar, A. Krishna, P. Neelakanteswara, and C. Z. Basha, “Advanced prediction of performance of a student in an University using machine learning techniques,” in Proceedings of 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2020, pp. 121-126.
[22] S. Rathore, J. H. Park, and H. Chang, “Deep learning and blockchain-empowered security framework for intelligent 5G-enabled IoT,” IEEE Access, vol. 9, pp. 90075-90083, 2021.
[23] J. Jeon, J. H. Park, and Y. S. Jeong, “Dynamic analysis for IoT malware detection with convolution neural network model,” IEEE Access, vol. 8, pp. 96899-96911, 2020.
[24] “Index of /ml/machine-learning-databases/00320” [Online]. Available: https://archive.ics.uci.edu/ml/machine-learning-databases/00320/.
[25] S. Aslam and I. Ashraf, “Data mining algorithms and their applications in education data mining,” International Journal, vol. 2, no. 7, 2014.
[26] V. Ramesh, P. Parkavi, and K. Ramar, “Predicting student performance: a statistical and data mining approach,” International Journal of Computer Applications, vol. 63, no. 8, 2013.
[27] S. Abu Naser, I. Zaqout, M. Abu Ghosh, R. Atallah, and E. Alajrami, “Predicting student performance using artificial neural network: in the faculty of engineering and information technology,” International Journal of Hybrid Information Technology, vol. 8, no. 2, pp. 221-228, 2015.
[28] I. Khan, A. Al Sadiri, A. R. Ahmad, and N. Jabeur, “Tracking student performance in introductory programming by means of machine learning,” in Proceedings of 2019 4th MEC International Conference on Big Data and Smart City (ICBDSC), Muscat, Oman, 2019, pp. 1-6.
[29] H. Zeineddine, U. Braendle, and A. Farah, “Enhancing prediction of student success: automated machine learning approach,” Computers & Electrical Engineering, vol. 89, article no. 106903, 2021.
[30] S. Agarwal, Y. Upmon, R. Pahuja, G. Bhandarkar, and S. C. Satapathy, “Student performance prediction using classification models,” in Smart Intelligent Computing and Applications, Volume 1. Singapore: Springer, 2022, pp. 187-196.
[31] X. Xu, J. Wang, H. Peng, and R. Wu, “Prediction of academic performance associated with internet usage behaviors using machine learning algorithms,” Computers in Human Behavior, vol. 98, pp. 166-173, 2019.

About this article
Cite this article

Muhammad Mazhar Bukhari, Syed Sajid Ullah, Mueen Uddin, Saddam Hussain, Maha Abdelhaq, and Raed Alsaqour, “An Intelligent Model for Predicting the Students' Performance with Backpropagation Neural Network Algorithm Using Regularization Approach,” Human-centric Computing and Information Sciences, vol. 12, article no. 44, 2022.

  • Received: 3 October 2021
  • Accepted: 26 January 2022
  • Published: 30 September 2022