Detection and Mathematical Modeling of Anxiety Disorder Based on Socioeconomic Factors Using Machine Learning Techniques
  • Razan Ibrahim Alsuwailem and Surbhi Bhatia*

Human-centric Computing and Information Sciences volume 12, Article number: 52 (2022)


Mental health risks pose a serious threat to individuals, and a greater one to overseas residents, including expatriates, than to the general Arab population. Because Arab countries are known for their multicultural environment, with international students and faculty making up about half of that population, this paper presents a comprehensive analysis of mental health problems such as depression, stress, anxiety, and isolation. The dataset was developed from a web-based survey. A detailed exploratory data analysis is conducted on the data collected from Arab countries to study individuals’ mental health and indicative help-seeking pointers, based on their responses to specific pre-defined questions in a multicultural society. The proposed model validates the claims mathematically and uses different machine learning classifiers to identify individuals who are currently or were previously diagnosed with depression, or who demonstrate unintentional “save our souls” (SOS) behaviors, so that early prediction can help prevent future risk to life. Accuracy is measured by comparing the classifiers using several visualization tools. The analysis provides claims and authentic sources for further research in the multicultural public medical sector and for government decision-making.


Artificial Intelligence, Cloud, Machine Learning, Blockchain, Mental Illness, Neural Network Models, Data Mining


Psychological disorders such as depression, anxiety, addiction, suicide, self-isolation, and other mood disorders are prevalent in the Arab population, as in any other demographic in the world [1]. The Arab Youth Survey 2020 revealed that mental illness is a significant concern among young people in the Middle East and North Africa (MENA) and should not be left untreated [2]. Mental health campaigners in the Middle East are raising awareness among individuals in their hour of need and encouraging teachers, parents, and policymakers to speak freely about their opinions and feelings regarding these disorders. In today’s fast-changing world, artificial intelligence techniques can help overcome unprecedented pressures, provided that proper knowledge and early prediction allow the disease to be treated in time. The threat became apparent when separate surveys highlighted the scale of suffering from depression, anxiety, and addiction among the region’s 200 million Arab youth. With 50% of the general population being international [3], the multicultural environment is one of the prime motivations for this research. The risk is greater for the overseas population than for nationals because of acculturation: they have less access to mental health support and are isolated from their relatives, family, and friends [4–7]. Other factors, such as proficiency in the Arabic language, resources, and fear, make them more susceptible to mental health problems than the native demographic and nationals [8–10].
Previous research has examined the factors prevailing in health disorders among individuals [11, 12], but comparative studies between the overseas population living in the Arab world and nationals are lacking. Moreover, apart from several studies that have already been done [13–15], the number of articles on mental health problems in Saudi Arabia, the UAE, and other Arab countries is relatively limited. This research therefore contributes a new dataset and resources that bridge the following gaps in previous research:

Few studies address mental illness and help-seeking behavior in Arab countries;

Lack of early diagnosis of mental problems using computational techniques;

Lack of a mathematically validated predictive model in the literature.

Data collection will follow a survey-based questionnaire design. The dataset will not include personal information such as name, ethnicity, or location; instead, it will record information such as age, number of years of formal education, and relationship status. The dataset on which the analysis is done will be shared as a supplement to the paper. The survey will collect information on the following pointers: age, years of formal education, frequency of self-disappointment, frequency of disappointment with the world, frequency of suicidal thoughts, frequency of thoughts of harming others, etc. These data points will then be encoded for further analysis. Survey-based research has proved helpful in medical research related to depressive, anxiety, and stress symptoms [16]. The method is also cost-effective, and because the dataset is heterogeneous, the data does not need much augmentation. The data should therefore have fewer missing values, leading to better accuracy than publicly available datasets and other manual surveys [17]. Google Forms will be used to create the questions, which will be circulated to individuals in Arab countries such as Saudi Arabia, the UAE, Qatar, and Kuwait. Security and privacy concerns will be handled in the form to ensure the confidentiality of each individual. Saudi Arabia, Kuwait, Oman, and the UAE were chosen as study subjects because of their multicultural demographics: their many expatriates create hybrid populations in which nationals and internationals work together in roughly equal numbers.
This paper aims at the early prediction of depression using computational techniques. The results will be validated using a mathematical model, and any claims will be justified by measuring the accuracy of the results using machine learning classifiers. The dataset will consist of records from both international and local individuals, including expatriates in a multicultural environment, and will be used to examine their mental health conditions. A vanilla Android app, designed using the MIT App Inventor, collects the data with Google Cloud data aggregation methods as its backend. The data is aggregated so that each participant logs data into their own ledger (a data record-keeping mechanism), and each ledger has a unique transaction ID. Each participant is also given a unique private key, which is used after the survey is complete to share the information with us. When a participant wants to share the information, they send a combination of their transaction ID and private key in the “TTTTT-KKKK” format. Our public key is appended to it to generate a “key vector” such as TTTTT-KKKK-PPPP, which then automatically fetches the information from the cloud to our local database for further analysis. This consent-driven blockchain algorithm gives survey participants complete freedom over whether to share the data after the survey is over, as blockchains [18, 19] are used for storing the information. Because the data contains sensitive information on the subjects, any use of the data for analysis requires the subject’s consent. Consent is managed through a token-based system in which the first character of the blockchain map indicates whether we may use the data.
This design reflects the nature of the data. A participant might feel at the beginning that they can share the information without hesitation. However, depending on the inputs they feed into the app, they might later feel uncomfortable sharing it, as the data reveals much about their dynamics with family, friends, colleagues, and society in general.
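The key-vector and consent-token steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `build_key_vector` and `consent_granted`, the placeholder public key `"PPPP"`, and the use of `"Y"` as the consent flag are hypothetical and are not taken from the app itself.

```python
from dataclasses import dataclass

# Placeholder for the researchers' public key segment (hypothetical value).
RESEARCHER_PUBLIC_KEY = "PPPP"


@dataclass(frozen=True)
class KeyVector:
    transaction_id: str
    private_key: str
    public_key: str

    def __str__(self) -> str:
        # Renders the "TTTTT-KKKK-PPPP" form described in the text.
        return f"{self.transaction_id}-{self.private_key}-{self.public_key}"


def build_key_vector(shared_token: str,
                     public_key: str = RESEARCHER_PUBLIC_KEY) -> KeyVector:
    """Append the public key to a participant's 'TTTTT-KKKK' token."""
    parts = shared_token.split("-")
    if len(parts) != 2 or not all(parts):
        raise ValueError("expected a token of the form 'TTTTT-KKKK'")
    transaction_id, private_key = parts
    return KeyVector(transaction_id, private_key, public_key)


def consent_granted(ledger_map: str, allow_flag: str = "Y") -> bool:
    """Read consent from the first character of the ledger map.

    The flag value "Y" is an assumption made for this sketch.
    """
    return bool(ledger_map) and ledger_map[0] == allow_flag
```

A participant token such as `"AB123-9F2C"` would yield the key vector `AB123-9F2C-PPPP`, and the data would only be fetched when `consent_granted` returns `True` for that participant's ledger map.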

Motivation and Contribution
The spread of diseases leading to a pandemic has burdened society globally, and mental illnesses such as depression and anxiety have become prevalent in individuals overall. This has made the prevention and treatment of mental disorders a public health priority, and artificial intelligence techniques can help diagnose them in advance. This research focuses on a comprehensive analysis of mental health conditions, and unintentional “save our souls” (SOS) behaviors have been studied for different individuals in Arab countries. The dataset generated from the survey is based on multiple-choice questions with discrete and continuous variables, which facilitate the early prediction of a disease or mental problem. The objectives of this research study are as follows:

To study the individual’s mental health and indicative help-seeking pointers based on their responses to specific pre-defined questions in a multicultural society by employing a web-based survey methodology;

The available data points taken in this research are hierarchical to predict the extracted 27 features.

To conduct exploratory data analysis (EDA) on the survey dataset to represent feature importance;

To propose a framework model with a data discovery algorithm that helps identify individuals suffering from depression, and to validate the model mathematically;

To apply computational techniques using several supervised, unsupervised, and semi-supervised learning classifiers to predict the classification and generate the mathematical model to justify the claims;

To compare the model’s performance with previous state-of-the-art classifiers and evaluate its efficiency in terms of various information retrieval metrics.

Validation will be conducted on the performed experiments, and the findings will be validated mathematically by comparing the two inferences obtained from the two models. The machine learning classifiers will be evaluated using information retrieval parameters, and predictions will be made for the advance diagnosis of depressed individuals so that necessary actions can be taken.
The rest of the paper is organized as follows: Section 2 covers related work, Section 3 the proposed methodology, Section 4 the experiments and EDA results, and Section 5 the conclusions and future directions.

Related Work

Artificial intelligence plays a vital role in mental health through the design of systems that apply machine learning and deep learning to social intelligence and human-computer interaction (HCI). This research draws its claims and justifications from deeper insights into the current landscape of machine learning applications for mental health in the two domains of HCI and computing science. Several machine learning and computational pipelines have been developed to standardize the advance prediction of mental state and cognitive behavior [20, 21]. These pipelines simulate behavioral data (i.e., reaction time, decisions, mood swings, etc.) and empirical results, and offer solid behavioral predictions. Furthermore, computational models can be trained on the data to extract latent features and used to predict different behavioral actions in the environment. These computational models are therefore gaining traction in mental state prediction. A wide range of literature is available on the correlation between cognition and emotion [22], the basic principles of psychopathology and emotions [23], and the neural enactment of mental processes [24]. In other domains, computational models are frequently used in biology and health care [25, 26] to identify reasonable outcomes of processes, and are increasingly applied to social and cultural psychology to develop long-term demographic-level computational models. These models are agent-based; they account for complex social influences and their implications for societal and cultural evolution [27, 28], and they generate testable hypotheses that are difficult to reach through simple logical analysis.
Computational modelling and natural language processing have been used to predict and monitor mental disorders such as depression and anxiety. Research using sensor, audio, video, structured, and multimodal systems has explored mental health conditions including depression, suicide, stress, mood, bipolar disorder, post-traumatic stress disorder (PTSD), anxiety, substance abuse, and schizophrenia [29]. Association rule mining has been explored [30] to identify depression and anxiety in patients using an Internet-based cognitive behavioral therapy (iCBT) program; linguistic features are considered for finding better correlations and detecting outcomes in the specific context of inpatient mental health. Another paper describes the different classes of artificially intelligent systems, examines their fairness for research on people with disabilities, and lays out a complete roadmap of opportunities for researchers to create systems that exploit the power of artificial intelligence [31]. Nemesure et al. [32] developed a prediction model to diagnose anxiety in psychiatric assessment using an electronic health record (EHR) dataset of undergraduate students; the pipeline combines hybrid machine learning and deep learning models and uses biometric and demographic data to predict psychiatric illness from various non-psychiatric input features. The model was tested and the results were validated on different information retrieval metrics to determine which features are essential for prediction. Sharma and Verbeke [33] investigated the variable importance hierarchy of biomarkers for anxiety disorders.
Various univariate and multivariate models were tested to find correlations among four anxiety disorders, namely generalized anxiety disorder (GAD), agoraphobia (AP), social anxiety disorder (SAD), and panic disorder (PD), in a dataset of Dutch citizens. Byeon [34] worked on a dataset collected from elderly people in South Korea who completed the Zung Self-rating Anxiety Scale (SAS). Several machine learning algorithms were combined in a metamodel, and the ensemble model was reported to perform better than state-of-the-art machine learning models. Mutalib [35] studied the different factors related to mental health problems using a dataset collected from a higher education institute in Kuala Terengganu. Alghamdi et al. [36] investigated computational models and natural language processing for Arabic text to predict the onset of depression, and provided empirical results to evaluate and compare performance. They created an ArabDep lexicon for a lexicon-based approach to analyzing Arabic text collected from online forums, and then predicted depression symptoms by applying a rule-based algorithm to the lexicon output. In parallel, they annotated the data with the help of a psychologist, extracted features, and predicted depression symptoms using machine learning algorithms. Alabdulkreem [1] studied the prediction of depression in Arab women using their individual tweets during a specific period of the COVID-19 pandemic; a recurrent neural network was applied to 10,000 tweets extracted from 200 users, achieving approximately 71% accuracy and a 0.7 F1-score. Shah et al. [10] conducted a cross-sectional study on the behavior of 600 adolescents (12–18 years old) in the UAE.
They identified the incidence of depression and its correlation with parental, family, individual, and self-esteem factors. They used the Beck scale to determine the prevalence of depression symptoms and the Rosenberg Self-esteem Scale, with high levels of reliability, together with univariate statistical analysis in a multiple regression model. They found depressive symptoms among 17.2% of adolescents in the UAE. In another study, Dardas et al. [37] used a qualitative and explanatory design to collect data from 92 participants (14 to 17 years old) through 12 focus groups in Jordan. They designed two themes for their study, which consider the Arab adolescents’ beliefs related to their family and social context, using the Beck Depression Inventory (BDI-II) instrument and a semi-structured interview format. They concluded with the perceived contributing factors, the nature of depression, and attitudes toward depression interventions. The recent related literature is compared in Table 1 on the basis of the critical features employed in each work [36–37].

Table 1. Study of recent works

| Study | Technique/Methods | Datasets | Key factors | Future works |
|---|---|---|---|---|
| Alghamdi et al. [36] | Lexicon-based approach, machine learning-based approach | ArabDep corpus created by authors | Data collection & annotation, rule-based & machine learning-based prediction of depression in Arab text with 80% accuracy. | Deep learning approach by executing language model |
| Alabdulkreem [1] | Machine learning approach with RNN architecture | Self-created corpus of 10,000 tweets from online forums | The proposed tool analyzes the tweets of Arab women and provides an early age diagnosis of depression with a 70% classification accuracy. | Considering risk factors with psychiatric disorders |
| Shah et al. [10] | Multiple linear regression | Data collected via a cross-sectional study on the behavior of 600 adolescents (12–18 years old) | Administrated the Beck Depression Inventory Scale & Rosenberg Self-esteem Scale, extracted positive predictors of depression, and predicted 17.2% of youth (95% CI 14.2–20.7) with depression symptoms. | To work on other datasets from social media platforms |
| Dardas et al. [37] | EDA | Collected data from 92 participants (14–17 years old) through 12 focus groups from Jordan | An exploratory qualitative study including statistical & thematic analyses to obtain perceptions about depression from Jordanian adolescents. | - |
| Priya et al. [38] | Several machine learning classifiers were applied for the classification | Collected using a questionnaire (DASS-21) | The application of machine learning classifiers resulted in different classification levels using IR metrics and an imbalanced confusion matrix. | - |
| Leightley et al. [39] | PTSD using supervised machine learning classifiers | Study on ex-serviceman in the UK | Predictive modelling with parameters used in their study as alcohol misuse, gender, & deployment status was considered satisfactory sensitivity. | - |
| Richter et al. [40] | Novel diagnostic methodology to identify differences in cognitive biases using ANOVA & random clustering. | Different Hebrew speakers | The distinction between symptomatic participants consisting of maximal symptoms of depression, anxiety, or hybrid has been presented compared to the non-symptomatic, reporting a 71.44% prediction accuracy. | Aspect on validation in clinical samples with a focus on cognitive tasks |
| Van Eeden et al. [41] | Predictive performances of multinomial logistic regression, a naïve Bayes classifier, & auto-sklearn were measured. | Netherlands-based study of depression and anxiety | The predictions were made on DSM-IV-TR psychiatric diagnoses at a 2-, 4-, 6-, and 9-year follow-up with different sets of predictors using three methods, providing a 79% accuracy (95% CI 75%–81%). | - |

Proposed Methodology

In this research, a comprehensive analysis of mental health conditions and unintentional SOS behavior will be conducted for different individuals in Arab countries. The dataset will be generated from a survey based on multiple-choice questions with discrete and continuous variables. Text-based inputs from survey respondents will be encoded using the label encoder or one-hot encoder provided by the scikit-learn package for Python. An EDA will also be conducted on the dataset to identify the characteristics that help in early prediction. The analysis will produce four data points and four inferences, grouped into two classes: one from the machine learning models and one from data analysis using a mathematical model. The first inference will comprise the machine learning results from three kinds of algorithms: supervised, unsupervised, and semi-supervised. The reduced support vector machine (RSVM) and logistic regression will be used because they give good results for text-based data. Hierarchical clustering will be used as the unsupervised method because of the high abrasiveness of the dataset. The K-nearest neighbor (KNN) algorithm will also be used, as it deals with a classification problem that can be modelled as a clustering problem. Capsule neural networks will be used for supervised learning, and different integral, customized, and activation functions will be used to optimize accuracy. The results obtained from the different state-of-the-art machine learning classifiers will be analyzed, and the best results will be taken as the inference. The second inference will be generated from the mathematical model by studying the second-order differential equation, or the two linear equations, derived from the data points, with the features taken from the dataset. These mathematical results will be used to develop the prediction model and fit it to the machine learning model.
Several information retrieval metrics, including precision, recall, F1-score, accuracy, and R2 error [42], will be used to evaluate how well subjects are identified as diseased or not. The models described above will be validated to check whether the two inferences agree. The process is shown in the flowchart in Fig. 1.

Fig. 1. Process.
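The evaluation metrics named in the methodology can be computed with scikit-learn as in the following minimal sketch. The label vectors are invented for illustration (1 = depressed, 0 = not depressed), and the helper name `evaluate` is hypothetical.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    r2_score,
    recall_score,
)

# Invented ground-truth and predicted labels, for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]


def evaluate(y_true, y_pred):
    """Return the metrics listed in the text for one set of predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
    }


scores = evaluate(y_true, y_pred)
```

Reporting all of these together is useful because accuracy alone can look good on an imbalanced sample while recall for the depressed class stays low.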

Data is collected through a survey in a way that avoids inherent bias in the dataset. The collected data is then cleaned of any blank or NA values. Exploratory data analysis is done to understand the dataset, and the relationships between various features are studied in a reverse “ablation” study using boxplots. Various outputs are mentioned in file EDA, and the baseline inference drawn from this step later serves as our validator. The validator is stored as pred_eda[].
Three different models are used, categorized as supervised, unsupervised, and semi-supervised. For supervised learning, a KNN model is built and used for prediction; the model attains an accuracy of 88%. The output of pred(X_test, y_test) is stored in an array as arr_supervised[]. Training and testing use an 80:20 split of the available dataset. For analysis purposes, the data is also split into k=3 random groups, so that three-fold cross-validation is performed: the model is trained and tested three times on three different subsets of the primary data.
The accuracy is determined using cross_val_score, a function in sklearn.model_selection.
In the semi-supervised model, we use the estimators in the sklearn.semi_supervised API. The model attains a 93% accuracy, and the output is stored in an array as arr_semisup[]. Clusters in the semi-supervised model were labelled using the random differential clustering method, with the labelling based on the most dominant cluster (the one with an accuracy closest to cross_val_score). For the unsupervised model, since the data is hierarchical, we use capsule networks. The model attains a 97% accuracy, and the output is stored in an array as arr_unsup[]. Each array is of the form arr[age, depressed], with range(age) = {18–54} and range(depressed) = {“Y”, “N”}. The analysis results are then averaged: concat_pred = avg[sum(arr_supervised[], arr_semisup[], arr_unsup[])]. Finally, concat_pred is compared with pred_eda, and the result from the analysis matches 99% of the dataset. The model is explained in Fig. 2.
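The cleaning and encoding steps can be illustrated as follows. This is a sketch on a tiny invented table, not the survey data; the column names are hypothetical, and which columns receive a label encoder versus a one-hot encoder is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Tiny invented sample standing in for the survey responses.
raw = pd.DataFrame({
    "age": [24, 31, np.nan, 45],
    "academic_status": ["undergraduate", "graduate", "graduate", "undergraduate"],
    "residential_status": ["national", "expatriate", "expatriate", None],
    "depressed": ["N", "Y", "N", "Y"],
})

# Drop every row that contains a blank or NA value.
clean = raw.dropna().reset_index(drop=True)

# Label-encode the binary target ("N" -> 0, "Y" -> 1).
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(clean["depressed"])

# One-hot encode the categorical (object-typed) feature columns.
onehot = OneHotEncoder(handle_unknown="ignore")
X_cat = onehot.fit_transform(
    clean[["academic_status", "residential_status"]]
).toarray()

# Combine the numeric and encoded categorical features into one matrix.
X = np.hstack([clean[["age"]].to_numpy(), X_cat])
```

One-hot encoding avoids imposing an artificial order on unordered categories such as residential status, while a label encoder is sufficient for a two-class target.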
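The 80:20 split and three-fold cross-validation for the KNN model can be sketched as below. Synthetic data from `make_classification` stands in for the survey dataset, and `n_neighbors=5` is an assumed setting, so the scores it produces are not the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in with the same shape as the survey data (1,000 x 27).
X, y = make_classification(
    n_samples=1000, n_features=27, n_informative=8, random_state=0
)

# 80:20 train/test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

knn = KNeighborsClassifier(n_neighbors=5)  # k is an assumed value
knn.fit(X_train, y_train)

arr_supervised = knn.predict(X_test)          # predictions on the held-out 20%
holdout_accuracy = knn.score(X_test, y_test)  # accuracy on the held-out 20%

# Three-fold cross-validation over the whole dataset.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=3)
mean_cv_accuracy = cv_scores.mean()
```

Comparing `holdout_accuracy` with `mean_cv_accuracy` gives a quick check that a single split is not flattering the model.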
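As an illustration of the `sklearn.semi_supervised` API, the sketch below uses `LabelSpreading` on synthetic data with most labels hidden. The choice of `LabelSpreading`, the 70% unlabeled fraction, and the kernel settings are assumptions; this is not the labelling method described above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.semi_supervised import LabelSpreading

# Synthetic stand-in data; the survey data is not reproduced here.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)

# Hide about 70% of the labels; scikit-learn marks unlabeled samples with -1.
rng = np.random.default_rng(0)
unlabeled = rng.random(len(y)) < 0.7
y_partial = y.copy()
y_partial[unlabeled] = -1

# Spread the known labels through a k-nearest-neighbour graph.
model = LabelSpreading(kernel="knn", n_neighbors=7)
model.fit(X, y_partial)

arr_semisup = model.transduction_  # inferred label for every sample
accuracy_on_unlabeled = float((arr_semisup[unlabeled] == y[unlabeled]).mean())
```

Because the true labels of the hidden samples are known in this synthetic setting, `accuracy_on_unlabeled` measures how well the propagated labels recover them.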

Fig. 2. Framework model.
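The averaging of the three prediction arrays and the comparison with the EDA baseline can be sketched with NumPy. The arrays below are invented, labels are coded as 1 = "Y" and 0 = "N", and the 0.5 threshold for turning the average back into a label is an assumption.

```python
import numpy as np

# Invented per-participant predictions (1 = "Y", 0 = "N").
arr_supervised = np.array([1, 0, 1, 0, 1, 0])
arr_semisup = np.array([1, 0, 0, 0, 1, 1])
arr_unsup = np.array([1, 1, 1, 0, 1, 0])

# Baseline inference from the EDA step (also invented).
pred_eda = np.array([1, 0, 1, 0, 1, 0])

# Average the three model outputs element-wise.
avg_vote = np.mean([arr_supervised, arr_semisup, arr_unsup], axis=0)

# Assumed rule: label "Y" when at least half of the models agree.
concat_pred = (avg_vote >= 0.5).astype(int)

# Fraction of participants on which the ensemble matches the baseline.
agreement = float(np.mean(concat_pred == pred_eda))
```

With three models, the 0.5 threshold amounts to a simple majority vote.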

Data Preparation
The data was prepared by conducting a survey of the local and international demographics in Arab countries (https://forms.gle/MZsX7M2LfbTAwK6r6). The dataset covers 1,000 different people and 27 different features. Once the data is collected, fundamental exploratory data analysis (EDA) is performed. Different tools and techniques, from R packages such as ggplot and ggvis to Python packages such as matplotlib and seaborn, are used to structure the data. The analysis is conducted to find the relevant patterns in the dataset using several supervised, semi-supervised, and unsupervised algorithms. Based on the analysis, a mathematical model is also drawn, which is then re-run to indicate individuals suffering from depression according to the analysis done by the proposed model.

Mathematical Model
Reaching an accuracy of about 99% by collating and assembling all the models for validation on the survey dataset makes the system computationally heavy and much more complex. The mathematical model is formulated to overcome the observed gaps by making the system more straightforward, computationally effective, and more accurate. It is faster because it uses just three features, chosen on the basis of the ablation study performed on the dataset in the next section. The internal dynamics of the features have been studied, taking into account the person’s economic status, the frequency of suicidal thoughts or self-harm, time-dependent parameters, and the corresponding gradient of the time dependency. The mathematical model identifies three main factors to which the dependence of anxiety and depression is attributed.
Suppose a physical study is done by taking two or three points from the dataset. In that case, our proposed model will give a 100% accuracy because we introduce several different factors, such as a gradient that handles the frequency of suicidal thoughts. This gradient handles the dynamical changes in a person’s economic and social conditions, and around two constants give weightage to the factors contributing to depression. These factors make the system very robust, so the claims can be justified by treating it as a mechanism or machine into which specific values are fed. The result is nearly 100% accuracy without the extra computational expense that the average person is unable to handle. We are making the mechanism more user-friendly and accessible because we want essential health to be a part of a person’s day-to-day health. This can also be related to one of the sustainable goals of “Health for all.” These pointers also justify the need for a mathematical model. The mathematical model is explained as follows:
The main feature of the mathematical model is the feedback loop between depression and the features that contribute the most to it according to EDA that is done on the survey data as denoted in the previous section.
We denote the occurrence of depression as “D” and the set of features that contribute most to it as “F,” so that a feedback loop is formed between them.

Our data study indicates that the significant components of F are age (Ag), education (Edu), financial status (Fin), and frequency of self-harm (S.H.). These components are directly related to D: depression gives rise to the features in F and, vice versa, the features in F give rise to depression. Since F is related to D, all the features inside F are also related to D. Therefore,

Now, we try to map the relationships into a functional form. Before that, it is important to note that S.H. is defined as a rate of occurrence, and all the other factors need to be expressed as numeric values if they are not already. Therefore

and relates to D.
Here we see that only the frequency of self-harm is a function of time. Therefore, we define frequency as a rate (so, we see S.H. as d(S.H.)/dt). Therefore, all functions are related to depression (D) as,

Since the function contains a differential parameter, we must handle the functional offset it introduces. We therefore introduce a slope parameter that converts the basic equation model, after which the function becomes,

where α is a constant, V is the slope function, and

It should be noted that the larger VD becomes, the greater the chance of an individual slipping into depression. Again, V is directly calculated from the algebraic gradient of the self-harm frequency S.H. Another important parameter that comes with the “financial status” attribute is the uncertainty of the status. This attribute is subject to frequent or periodic change attributed to the economy, environment, or individual. To factor in this uncertainty, we add another parameter, as denoted below:

where t is time. So,

where β is an exponential slope and γ is a constant.
Here, k is defined as above and denotes the seriousness of the parameter; α is a numeric constant (between 0 and 1); V is the slope of the function (using θ = 45°); and k is the primary function related to D. Upon analysis, we see that financial status can also be a function of time, as it may change with the passage of time. We therefore add another slope parameter, this time an exponential slope function, to emphasize how rapidly a person’s life can change. The new function is denoted as follows:


Again, k is the primary function, and β and γ are numeric constants greater than 0 and less than or equal to 1.
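The idea of treating self-harm frequency as a rate over time can be illustrated numerically. The sketch below only shows a finite-difference estimate of that rate with `numpy.gradient`; the sampling times and frequency values are invented, and it makes no assumption about the exact form of the model's equations.

```python
import numpy as np

# Hypothetical observation times (e.g. months) and self-harm frequencies.
t = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
self_harm_freq = np.array([0.0, 1.0, 1.0, 3.0, 6.0])

# Finite-difference estimate of d(S.H.)/dt at each observation time:
# central differences in the interior, one-sided differences at the ends.
rate = np.gradient(self_harm_freq, t)

# A rising rate flags an accelerating pattern rather than a stable one.
is_accelerating = bool(rate[-1] > rate[0])
```

A rate of this kind changes sign and magnitude as a person's circumstances change, which is the property the slope term in the model is meant to capture.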

Experiment Analysis

The analysis of early prediction is done on the survey dataset. Data is collected from 1,000 individuals using a survey format targeting those in their 20s to 40s. The features and characteristic of data used for the study include residential status (object), gender (object), age (numeric), academic status (object), proficiency of mother tongue (numeric), proficiency in Arabic (numeric), relationship status (binary), religious belief (binary), frequency-little interest or pleasure in doing things (object), frequency-feeling down, depressed, or hopeless (object), frequency-trouble falling or staying asleep, or oversleeping (object), frequency-feeling tired or having little energy (object), frequency-poor appetite or overeating (object), frequency-feeling bad about yourself or that you are a failure or have let yourself or your family down (object), frequency-trouble concentrating on things, such as reading the newspaper or watching television (object), frequency-suicidal thoughts (object), disassociation from world (object), loneliness (object), lack of friends (object), social discrimination (object), opportunistic discrimination (object), racial discrimination (object), financial status (binary), frequency-self-harm (numeric), fear-failure (object), fear-family disappointment (object), frequency-harming others (numeric).
The analysis is performed by training the model using machine learning classifiers.

Exploratory Data Analysis of the Dataset
Four basic Python libraries are used for conducting EDA [43]: NumPy (for running different kinds of matrix algorithms), pandas (for data import, export, and processing), seaborn (for advanced data visualization), and matplotlib (for basic data visualization). After loading the inputs, the shape of the data is checked: the data contains 1,000 rows and 27 columns. The data has been augmented, then trained and tested on different models. The data has been read and the data types checked; most of the data types observed in the survey data are of the object type. The values have been counted, and the null values are discarded. The relationships between the various factors in the dataset are identified for analysis. Box plots have been used to check for correlation between various features of the dataset. The first box plot in Fig. 3 reveals that most of the people who participated in the survey belong to the 20–35 years old demographic.
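The checks described above (shape, data types, null handling) can be sketched with pandas roughly as follows. This is a minimal sketch, not the authors' code: the column names, the values, and the helper `basic_eda` are hypothetical stand-ins for the survey data, which is not reproduced here.

```python
import pandas as pd

# Synthetic stand-in for the survey data (hypothetical columns and values).
survey = pd.DataFrame({
    "age": [23, 31, 44, 27, 38],
    "gender": ["F", "M", "F", None, "M"],
    "proficiency_arabic": [3.0, 5.0, None, 2.0, 4.0],
    "financial_status": [1, 0, 1, 1, 0],
})


def basic_eda(df):
    """Return the shape, per-column dtypes, missing counts, and a null-free copy."""
    return {
        "shape": df.shape,                           # (rows, columns)
        "dtypes": df.dtypes.astype(str).to_dict(),   # data type of each column
        "missing": df.isna().sum().to_dict(),        # null count per column
        "cleaned": df.dropna(),                      # rows with any null discarded
    }


report = basic_eda(survey)
print(report["shape"])
print(report["dtypes"])
print(report["missing"])
print(len(report["cleaned"]), "rows remain after discarding nulls")
```

On the real survey file, the same helper would be expected to report 1,000 rows and 27 columns before any rows are discarded.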

Fig.3. Visual representation of responses.

Fig.4. Correlation between academics and age to questions related to overall experience.

Fig.5. Correlation between frequency of suicidal thoughts and age.

The relationship between age, academic position, and the frequency of suicidal thoughts has also been identified: people with the most frequent suicidal thoughts are in the 40–45 years old age group, along with graduate studies. The inference drawn from Fig. 4 is that people in middle management of the corporate sector with only bachelor’s degrees are highly depressed. A disturbing fact that emerges from the plot is that people who have an undergraduate degree and are relatively young tend to be suicidal. Another troubling finding is that people who have not yet graduated also have thoughts of suicide and can be considered depressed. The tiny red dot above “20” and below “undergraduate” can be seen in Fig. 5. The figure shows that most people with suicidal thoughts come from below “above average” financial status.
Suicidal thoughts have also been mapped against financial status, revealing that this financial status group is more often associated with suicidal tendencies. The box plot in Fig. 6 shows no correlation between social discrimination and thoughts of self-harm.
Fig. 7 indicates that respondents with thoughts of self-harm did not report wanting to harm others.

Fig.6. Box plot showing no correlation between social discrimination and thoughts of self-harm.

Fig.7. Frequency of harming others versus age.

Fig. 8 maps the level of fear of disappointing one’s family against the number of people, showing the highest, average, and lowest numbers.
A correlation matrix is constructed for a selection of features: age, proficiency in mother tongue, proficiency in Arabic, frequency-self-harm, and frequency-harm to others. The correlation between frequency-harming others and proficiency in Arabic is 0.004, which indicates a very weak correlation between the two features, so this pair can play only a minor role in identifying a relationship. The diagonal values, equal to 1, represent the correlation of each feature with itself. Other features such as age, education, financial status, and frequency of self-harm give high values in the correlation matrix, representing a strong relationship. Financial status has a direct correlation with anxiety. The assumption is that a person’s financial status can either improve or worsen with time (it is a function of time). Therefore, the anxiety induced in a person by financial stress can either reduce or worsen with time. This shows that these are essential components and depend on the facts revealed. The positive correlation given in Fig. 8 shows dependence, which will be helpful in the ablation study of the dataset. In Fig. 9, for the longitudinal data, a correlation coefficient of 0 between features means that none of the items in the sample dataset showed any kind of linear relationship.
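A correlation matrix of this kind can be computed with NumPy as in the sketch below. The feature names and the numbers are synthetic and chosen only for illustration; they are not taken from the survey. The sketch shows why the diagonal is always 1 (each feature against itself) and how a near-zero off-diagonal entry would be read.

```python
import numpy as np

# Hypothetical numeric features; rows are respondents, columns are features.
features = ["age", "freq_self_harm", "proficiency_arabic"]
data = np.array([
    [20, 1, 3],
    [25, 2, 1],
    [30, 3, 4],
    [35, 4, 1],
    [40, 5, 5],
], dtype=float)

# Pearson correlation between columns (rowvar=False treats columns as variables).
corr = np.corrcoef(data, rowvar=False)

for name, row in zip(features, corr):
    print(f"{name:>20}", np.round(row, 3))
```

In this toy data, `age` and `freq_self_harm` are constructed to be perfectly linear, so their coefficient is 1. A value close to 0, such as the 0.004 reported above, would indicate almost no linear relationship.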

Fig.8. Level of fear mapped with number of people.
Fig.9. Correlation matrix.

Experimental Setting
The experiments are conducted in two different ways as follows. The EDA is conducted on the taxonomic hierarchical survey data. Different Python packages [44] have been used for implementation. The matrix analysis is done using NumPy, the plots and complex graphs are made using matplotlib and seaborn, the data is ingested using pandas, and the libraries sklearn and TensorFlow are used for machine learning tasks. The hardware used for experimentation was an Intel i5 8th Gen processor with 16 GB RAM, running the CentOS operating system. The model was developed on the longitudinal data, and it was trained to learn a compact representation that encodes the dynamics observed in the longitudinal measures for each subject. This work uses hyper-parametric classifiers [45], including KNN, semi-supervised algorithms, and capsule networks [46]. Fig. 10 presents the accuracy at each iteration when considering the 27 features in the dataset.
Fig.10. Accuracy at each iteration.
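The paper does not list the classifier configurations. As a rough illustration of the KNN family mentioned above, the following sketch classifies a point by majority vote among its k nearest neighbours. The function `knn_predict`, the value k = 3, and the toy two-feature data are all assumptions made for this example, not the authors' setup.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k closest points and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two numeric features, binary label.
train_points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0)]
train_labels = [0, 0, 1, 1]

low = knn_predict(train_points, train_labels, (0.2, 0.1))
high = knn_predict(train_points, train_labels, (5.5, 5.0))
print(low, high)
```

In practice, a library implementation such as the one in sklearn would normally be used, and k would be tuned as a hyperparameter.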

Ablation Study
The ablation study is performed on different versions of the dataset obtained by excluding some features. The sets for this study are taken as P, Q, R, and S. A total of 26 features are considered and marked in set P for the experiments. Likewise, set Q consists of the listed features, excluding “education”. Set R excludes the self-harm feature, while set S excludes three features, namely age, education, and self-harm. Excluding the “age” feature gave an accuracy of 81%. After that, to highlight the dominant feature, the experiments were repeated with the feature “education” removed, and the accuracy was 83%. Then, the self-harm feature was dropped and the accuracy decreased to 79%. Removing all three features gives a further reduction, to 71% accuracy. From these experiments, the most dominant and highlighted feature is identified as “education.” The average accuracy on the base data and the accuracy on the modified data are shown in Table 2.

Table 2. Evaluation: accuracy as per the ablation study

| Ablation study | Average accuracy with base data (%) | Accuracy with modified data (%) |
|---|---|---|
| Features (sans) age | 92.67 | 81 |
| Features (sans) education | 92.67 | 83 |
| Features (sans) self-harm | 92.67 | 79 |
| Features (sans) age + education + self-harm | 92.67 | 71 |
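One simple way to read Table 2 is as the drop in accuracy relative to the 92.67% baseline. The short sketch below only re-derives that arithmetic from the table values. The variable names are illustrative, and no new experiment is implied.

```python
# Accuracy values copied from Table 2 (percent).
BASE_ACCURACY = 92.67
modified_accuracy = {
    "age": 81.0,
    "education": 83.0,
    "self-harm": 79.0,
    "age + education + self-harm": 71.0,
}

# Drop in percentage points when each feature set is removed.
drops = {
    name: round(BASE_ACCURACY - accuracy, 2)
    for name, accuracy in modified_accuracy.items()
}

# Print from the largest drop to the smallest.
for name, drop in sorted(drops.items(), key=lambda item: item[1], reverse=True):
    print(f"sans {name}: -{drop:.2f} points")
```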

Evaluation Metrics
The proposed approach is validated using various evaluation metrics such as precision, recall, F-measure, and accuracy. The proposed model is trained and tested using a ratio of 80:20. Table 3 demonstrates the results of the proposed work. The experiments are conducted on the survey dataset, achieving above 80% accuracy in all three models. This accuracy is adequate because, to the best of our knowledge and after an extensive survey of related works, the accuracy reported for users’ responses in existing research is below 75%. Precision is the proportion of predicted positive labels that belong to the actual positive class.


Recall is the ratio of correctly predicted positive labels to all truly positive examples.


The F-measure is the harmonic mean of precision and recall. It is used to balance the values of precision and recall.


Accuracy is the ratio of the number of correct predictions to the total number of predictions [41].
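The four metrics can be written in terms of the confusion-matrix counts (true positives TP, false positives FP, false negatives FN, true negatives TN). The sketch below uses the standard textbook definitions, which are assumed to match the equations intended here. The counts passed in are hypothetical and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F-measure, and accuracy from confusion counts."""
    # Precision = TP / (TP + FP): share of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall = TP / (TP + FN): share of actual positives that are found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F-measure = harmonic mean of precision and recall.
    denominator = precision + recall
    f_measure = 2 * precision * recall / denominator if denominator else 0.0
    # Accuracy = (TP + TN) / all predictions.
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Hypothetical counts for a 100-sample test split.
metrics = classification_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```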


Table 3. Evaluation metrics of proposed work

| Classifier | Accuracy (%) | Precision (%) | Recall (%) | F-score (%) |
|---|---|---|---|---|
| Un-supervised | 88 | 82 | 84 | 82 |
| Semi-supervised | 93 | 96 | 95 | 95 |
| Supervised | 97 | 99 | 98 | 98 |
A receiver operating characteristic (ROC) curve is also calculated with the different classifiers used in the experimentation, providing the best results with supervised learning. The same is shown in Fig. 11.
Fig.11. ROC curve on different classifiers.
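An ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered, and the area under it (AUC) summarizes the curve in one number. The following is a minimal pure-Python sketch of that construction. The labels and scores are made up for illustration, and the helpers `roc_points` and `auc` are hypothetical names, not part of the authors' pipeline.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0).

    Assumes binary labels (0/1) with at least one example of each class.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    # Lower the threshold step by step; a sample is predicted positive
    # when its score is at or above the current threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical labels and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

curve = roc_points(y_true, scores)
print(curve)
print(auc(curve))
```

A curve that hugs the top-left corner, with an AUC close to 1, corresponds to a classifier that separates the two classes well.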

Conclusion and Future Direction

In this study, mental illness such as depression has been studied experimentally using machine learning and deep learning classifiers. The machine learning models have been implemented using packages such as TensorFlow and Keras. In addition, medical instruments such as the PHQ-9 and the Social Connectedness Scale were used to help find correlations between the variables identified in the dataset. The significant outcomes are based on standard data collection protocols, with survey questions covering measurement scales, medical history, etc. The validation is done using the mathematical model, and the inferences are aligned with it because the underlying mathematical philosophy of our software platform coincides with the mathematical model. Also, a clinical setup can be used for training machine learning algorithms, which can help health professionals predict the correct response to the data in identifying anxiety disorders. Furthermore, data mining can be done more efficiently with machine learning algorithms, and new decision rules may also be discovered. Health professionals can interact with the clinical decision support system. The research conducted in this study helps in the early prediction of depression using computational techniques. The results are validated using a mathematical model, and the claims are supported by measuring the accuracy of the results using machine learning classifiers. The dataset consists of records from both international and local individuals, including expatriates in a multicultural environment, and is used to examine their mental health conditions. The paper establishes the correlation between various essential features through analysis and an ablation study on the dataset. Graphs and figures are used as visualization tools to demonstrate the effectiveness of the proposed model.
In the future, the study can be extended by applying a GAN model to other hospital datasets taken from Arab countries. Also, a cloud-based data parser and an accurate time assessment of a person’s mental health can be introduced to provide early medical assistance.

Author’s Contributions

Conceptualization: RIA, SB. Funding acquisition: RIA. Investigation & methodology: RIA, SB. Project administration: RIA. Resources: RIA. Writing of the original draft: SB. Writing of the review & editing: RIA, SB. Validation: SB. Data curation: RIA. Visualization: SB.


This work was supported by the Deanship of Scientific Research, King Faisal University, Saudi Arabia (Grant No. RA00018).

Competing Interests

The authors declare that they have no competing interests.

Author Biography

Name : Dr. Razan Alsuwailem
Affiliation : Assistant Professor, College of Computer Science and Information Technology, King Faisal University, Saudi Arabia
Biography : Dr. Razan Alsuwailem is currently an Assistant Professor in the Department of Information Systems, College of Computer Science and Information Technology, King Faisal University, Saudi Arabia. She received her Bachelor’s in Computer and Information Systems in 2008 from King Faisal University, Saudi Arabia, and her MA in Information Systems in 2014 from Lawrence Technological University, United States. She obtained her PhD in Information Assurance in 2018 from Eastern Michigan University, United States. In addition to her academic career, Dr. Alsuwailem has held several managerial positions in the University.

Name : Dr. Surbhi Bhatia
Affiliation : Assistant Professor, College of Computer Science and Information Technology, King Faisal University, Saudi Arabia
Biography : Surbhi Bhatia received her doctorate in Computer Science and Engineering from Banasthali Vidyapith, India, in 2018, her Master’s in Technology from Amity University in 2012, and her Bachelor’s in Information Technology in 2010. She is also PMP certified by PMI, USA. She is currently an Assistant Professor in the Department of Information Systems, College of Computer Sciences and Information Technology, King Faisal University, Saudi Arabia. She is associated with reputed journals and has published many research papers in highly indexed databases. She has been granted patents in the USA, Australia, and India. Her research interests are Machine Learning, Sentiment Analysis, and Information Retrieval.


[1] E. Alabdulkreem, “Prediction of depressed Arab women using their tweets,” Journal of Decision Systems, vol. 30, no. 2-3, pp. 102-117, 2021.
[2] J. Bell, “The hidden face of mental illness in the Middle East,” 2019 [Online]. Available: https://www.arabnews.com/node/1496661/middle-east.
[3] M. Wei, P. P. Heppner, M. J. Mallen, T. Y. Ku, K. Y. H. Liao, and T. F. Wu, “Acculturative stress, perfectionism, years in the United States, and depression among Chinese international students,” Journal of Counseling Psychology, vol. 54, no. 4, pp. 385-394, 2007.
[4] M. G. Constantine, S. Okazaki, and S. O. Utsey, “Self‐concealment, social self‐efficacy, acculturative stress, and depression in African, Asian, and Latin American international college students,” American Journal of Orthopsychiatry, vol. 74, no. 3, pp. 230-241, 2004.
[5] Q. H. Vuong and N. K. Napier, “Acculturation and global mindsponge: an emerging market perspective,” International Journal of Intercultural Relations, vol. 49, pp. 354-367, 2015.
[6] J. Hyun, B. Quinn, T. Madon, and S. Lustig, “Mental health need, awareness, and use of counseling services among international graduate students,” Journal of American College Health, vol. 56, no. 2, pp. 109-118, 2007.
[7] Q. H. Vuong, K. C. P. Nghiem, V. P. La, T. T. Vuong, H. K. T. Nguyen, M. T. Ho, K. Tran, T. H. Khuat, and M. T. Ho, “Sex differences and psychological factors associated with general health examinations participation: results from a Vietnamese cross-section dataset,” Sustainability, vol. 11, no. 2, article no. 514, 2019. https://doi.org/10.3390/su11020514
[8] H. Xiong, C. Jin, M. Alazab, K. H. Yeh, H. Wang, T. R. Gadekallu, W. Wang, and C. Su, “On the design of blockchain-based ECDSA with fault-tolerant batch verification protocol for blockchain-enabled IoMT,” IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 5, pp. 1977-1986, 2021.
[9] R. Beiter, R. Nash, M. McCrady, D. Rhoades, M. Linscomb, M. Clarahan, and S. Sammut, “The prevalence and correlates of depression, anxiety, and stress in a sample of college students,” Journal of Affective Disorders, vol. 173, pp. 90-96, 2015.
[10] S. M. Shah, F. Al Dhaheri, A. Albanna, N. Al Jaberi, S. Al Eissaee, N. A. Alshehhi, et al., “Self-esteem and other risk factors for depressive symptoms among adolescents in United Arab Emirates,” PloS One, vol. 15, no. 1, article no. e0227483, 2020. https://doi.org/10.1371/journal.pone.0227483
[11] S. Slewa-Younan, M. McKenzie, R. Thomson, M. Smith, Y. Mohammad, and J. Mond, “Improving the mental wellbeing of Arabic speaking refugees: an evaluation of a mental health promotion program,” BMC Psychiatry, vol. 20, article no. 314, 2020. https://doi.org/10.1186/s12888-020-02732-8
[12] L. A. Dardas, S. Silva, D. Noonan, and L. A. Simmons, “Studying depression among Arab adolescents: methodological considerations, challenges, and lessons learned from Jordan,” Stigma and Health, vol. 3, no. 4, pp. 296-304, 2018.
[13] L. A. Dardas, D. E. Bailey, and L. A. Simmons, “Adolescent depression in the Arab region: a systematic literature review,” Issues in Mental Health Nursing, vol. 37, no. 8, pp. 569-585, 2016.
[14] J. G. Wong, E. P. Cheung, K. K. Chan, K. K. Ma, and S. Wa Tang, “Web-based survey of depression, anxiety and stress in first-year tertiary education students in Hong Kong,” Australian & New Zealand Journal of Psychiatry, vol. 40, no. 9, pp. 777-782, 2006.
[15] K. H. Jones, P. A. Jones, R. M. Middleton, D. V. Ford, K. Tuite-Dalton, H. Lockhart-Jones, et al., “Physical disability, anxiety and depression in people with MS: an internet-based survey via the UK MS Register,” PloS One, vol. 9, no. 8, article no. e104604, 2014. https://doi.org/10.1371/journal.pone.0104604
[16] J. F. Ebert, L. Huibers, B. Christensen, and M. B. Christensen, “Paper- or web-based questionnaire invitations as a method for data collection: cross-sectional comparative study of differences in response rate, completeness of data, and financial cost,” Journal of Medical Internet Research, vol. 20, no. 1, article no. e8353, 2018. https://doi.org/10.2196/jmir.8353
[17] R. U. Rayhan, Y. Zheng, E. Uddin, C. Timbol, O. Adewuyi, and J. N. Baraniuk, “Administer and collect medical questionnaires with Google documents: a simple, safe, and free system,” Applied Medical Informatics, vol. 33, no. 3, pp. 12-21, 2013.
[18] W. Wang, H. Xu, M. Alazab, T. R. Gadekallu, Z. Han, and C. Su, “Blockchain-based reliable and efficient certificateless signature for IIoT devices,” IEEE Transactions on Industrial Informatics, vol. 18, no. 10, pp. 7059-7067, 2022.
[19] S. K. Singh, A. E. Azzaoui, T. W. Kim, Y. Pan, and J. H. Park, “DeepBlockScheme: a deep learning-based blockchain driven scheme for secure smart city,” Human-centric Computing and Information Sciences, vol. 11, article no. 12, 2021. https://doi.org/10.22967/HCIS.2021.11.012
[20] M. D. Lee, A. H. Criss, B. Devezer, C. Donkin, A. Etz, F. P. Leite, et al., “Robust modeling in cognitive science,” Computational Brain & Behavior, vol. 2, no. 3, pp. 141-153, 2019.
[21] R. G. Nadakinamani, A. Reyana, S. Kautish, A. S. Vibith, Y. Gupta, S. F. Abdelwahab, and A. W. Mohamed, “Clinical data analysis for prediction of cardiovascular disease using machine learning techniques,” Computational Intelligence and Neuroscience, vol. 2022, article no. 2973324, 2022. https://doi.org/10.1155/2022/2973324
[22] E. Eldar, R. B. Rutledge, R. J. Dolan, and Y. Niv, “Mood as representation of momentum,” Trends in Cognitive Sciences, vol. 20, no. 1, pp. 15-24, 2016.
[23] I. Grahek, S. Musslick, and A. Shenhav, “A computational perspective on the roles of affect in cognitive control,” International Journal of Psychophysiology, vol. 151, pp. 25-34, 2020.
[24] B. U. Forstmann and E. J. Wagenmakers, An Introduction to Model-Based Cognitive Neuroscience. New York, NY: Springer, 2015.
[25] J. A. Fletcher and M. Doebeli, “A simple and general explanation for the evolution of altruism,” Proceedings of the Royal Society B: Biological Sciences, vol. 276, no. 1654, pp. 13-19, 2009.
[26] A. Balakrishnan, R. Kadiyala, G. Dhiman, G. Ashok, S. Kautish, K. Yadav, and J. Maruthi Nagendra Prasad, “A personalized eccentric cyber-physical system architecture for smart healthcare,” Security and Communication Networks, vol. 2021, article no. 1747077, 2021. https://doi.org/10.1155/2021/1747077
[27] M. Pavel, H. B. Jimison, H. D. Wactlar, T. L. Hayes, W. Barkis, J. Skapik, and J. Kaye, “The role of technology and engineering models in transforming healthcare,” IEEE Reviews in Biomedical Engineering, vol. 6, pp. 156-177, 2013.
[28] M. Muthukrishna and M. Schaller, “Are collectivistic cultures more prone to rapid transformation? Computational models of cross-cultural differences, social network structure, dynamic social influence, and cultural change,” Personality and Social Psychology Review, vol. 24, no. 2, pp. 103-120, 2020.
[29] A. Thieme, D. Belgrave, and G. Doherty, “Machine learning in mental health: a systematic review of the HCI literature to support the development of effective and implementable ML systems,” ACM Transactions on Computer-Human Interaction, vol. 27, no. 5, article no. 34, 2020. https://doi.org/10.1145/3398069
[30] P. Chikersal, D. Belgrave, G. Doherty, A. Enrique, J. E. Palacios, D. Richards, and A. Thieme, “Understanding client support strategies to improve clinical outcomes in an online mental health intervention,” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, 2020, pp. 1-16.
[31] A. Guo, E. Kamar, J. W. Vaughan, H. Wallach, and M. R. Morris, “Toward fairness in AI for people with disabilities: a research roadmap,” ACM SIGACCESS Accessibility and Computing, vol. 2020, no. 125, article no. 2, 2020. https://doi.org/10.1145/3386296.3386298
[32] M. D. Nemesure, M. V. Heinz, R. Huang, and N. C. Jacobson, “Predictive modeling of depression and anxiety using electronic health records and a novel machine learning approach with artificial intelligence,” Scientific Reports, vol. 11, article no. 1980, 2021. https://doi.org/10.1038/s41598-021-81368-4
[33] A. Sharma and W. J. Verbeke, “Understanding importance of clinical biomarkers for diagnosis of anxiety disorders using machine learning models,” PloS One, vol. 16, no. 5, article no. e0251365, 2021. https://doi.org/10.1371/journal.pone.0251365
[34] H. Byeon, “Exploring factors for predicting anxiety disorders of the elderly living alone in South Korea using interpretable machine learning: a population-based study,” International Journal of Environmental Research and Public Health, vol. 18, no. 14, article no. 7625, 2021. https://doi.org/10.3390/ijerph18147625
[35] S. Mutalib, “Mental health prediction models using machine learning in higher education institution,” Turkish Journal of Computer and Mathematics Education, vol. 12, no. 5, pp. 1782-1792, 2021.
[36] N. S. Alghamdi, H. A. H. Mahmoud, A. Abraham, S. A. Alanazi, and L. Garcia-Hernandez, “Predicting depression symptoms in an Arabic psychological forum,” IEEE Access, vol. 8, pp. 57317-57334, 2020.
[37] L. A. Dardas, N. Shoqirat, H. Abu-Hassan, B. F. Shanti, A. Al-Khayat, D. H. Allen, and L. A. Simmons, “Depression in Arab adolescents: a qualitative study,” Journal of Psychosocial Nursing and Mental Health Services, vol. 57, no. 10, pp. 34-43, 2019.
[38] A. Priya, S. Garg, and N. P. Tigga, “Predicting anxiety, depression and stress in modern life using machine learning algorithms,” Procedia Computer Science, vol. 167, pp. 1258-1267, 2020.
[39] D. Leightley, V. Williamson, J. Darby, and N. T. Fear, “Identifying probable post-traumatic stress disorder: applying supervised machine learning to data from a UK military cohort,” Journal of Mental Health, vol. 28, no. 1, pp. 34-41, 2019.
[40] T. Richter, B. Fishbain, A. Markus, G. Richter-Levin, and H. Okon-Singer, “Using machine learning-based analysis for behavioral differentiation between anxiety and depression,” Scientific Reports, vol. 10, article no. 16381, 2020. https://doi.org/10.1038/s41598-020-72289-9
[41] W. A. van Eeden, C. Luo, A. M. van Hemert, I. V. Carlier, B. W. Penninx, K. J. Wardenaar, H. Hoos, and E. J. Giltay, “Predicting the 9-year course of mood and anxiety disorders with automated machine learning: a comparison between auto-sklearn, naïve Bayes classifier, and traditional logistic regression,” Psychiatry Research, vol. 299, article no. 113823, 2021. https://doi.org/10.1016/j.psychres.2021.113823
[42] S. Basheer, S. Bhatia, and S. B. Sakri, “Computational modeling of dementia prediction using deep neural network: analysis on OASIS dataset,” IEEE Access, vol. 9, pp. 42449-42462, 2021.
[43] M. Alojail and S. Bhatia, “A novel technique for behavioral analytics using ensemble learning algorithms in E-commerce,” IEEE Access, vol. 8, pp. 150072-150080, 2020.
[44] G. Van Rossum and F. L. Drake, Python Tutorial. Amsterdam, The Netherlands: Centrum voor Wiskunde en Informatica, 1995.
[45] G. Yenduri and T. R. Gadekallu, “Firefly-based maintainability prediction for enhancing quality of software,” International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 29, no. Suppl 2, pp. 211-235, 2021.
[46] S. Sharma, S. Gupta, D. Gupta, S. Juneja, P. Gupta, G. Dhiman, and S. Kautish, “Deep learning model for the automatic classification of white blood cells,” Computational Intelligence and Neuroscience, vol. 2022, article no. 7384131, 2022. https://doi.org/10.1155/2022/7384131

About this article
  • Received: 31 December 2021
  • Accepted: 3 May 2022
  • Published: 15 November 2022