Quantum Machine Learning Applied to Electronic Healthcare Records for Ischemic Heart Disease Classification
• Danyal Maheshwari1,*, Ubaid Ullah1,*, Pablo A. Osorio Marulanda1, Alain García-Olea Jurado2, Ignacio Diez Gonzalez2, Jose M. Ormaetxe Merodio2, and Begonya Garcia-Zapirain1

Human-centric Computing and Information Sciences volume 13, Article number: 06 (2023)
https://doi.org/10.22967/HCIS.2023.13.006

Abstract

Cardiovascular diseases are diseases that affect the heart and blood vessels. Most strategies developed to predict ischemic heart disease (IHD) focus on pain characteristics, age, and sex, but many other variables have been described as determinant risk factors for developing IHD. Machine learning algorithms are therefore essential for making efficient decisions when predicting cardiac disease in the healthcare industry from large amounts of medical data. Recent research has focused on extending these approaches to quantum machine learning (QML) algorithms. This research proposes two computationally efficient QML algorithms, an optimized quantum support vector machine (OQSVM) and a hybrid quantum multi-layer perceptron (HQMLP), for the classification of cardiovascular disease. Efficient pre-processing and robust feature selection techniques, i.e., the Wrapper and Filter methods, improve the prediction rate and ensure the robustness of the proposed models. All the models are evaluated on the real-time cardiovascular dataset, and their performance is recorded in terms of accuracy. The performance metrics of the proposed models are compared to those of recently published models with more complicated architectures. The highest accuracies of the proposed OQSVM and HQMLP models, considering 10 features of the cardiovascular dataset, are 94% and 93%, respectively. Furthermore, the proposed models are computationally efficient and can be preferred for real-time healthcare applications.

Keywords

Cardiovascular Diseases, Quantum Machine Learning, Hybrid QMLP, OQSVM

Introduction

Cardiovascular diseases (CVD) are among the world's deadliest diseases, according to the World Health Organization [1]. They cost the lives of around 17.9 million individuals in 2019, representing 32% of all deaths worldwide, and heart attacks and strokes accounted for 85% of these deaths. It is projected that by 2030, 32% of world mortality will be caused by heart attacks [11]. One CVD of particular interest is ischemic heart disease (IHD). IHD, also called coronary heart disease, occurs when the coronary arteries that supply oxygen to the heart muscle are partially or completely blocked, either gradually by cholesterol deposits or abruptly by a blood clot. Problems related to coronary artery disease (CAD) show an upward trend in most European countries. CAD was the leading cause of mortality in Spain after coronavirus disease 2019 (COVID-19) [4]. Classified by sex, most males were affected by retinopathy and CAD, whereas the majority of females were affected by cerebrovascular diseases. Additionally, CAD becomes a public health problem when the number of people affected is considered and because large amounts of money must be spent when treatment comes too late.
CAD, including chronic coronary syndrome and myocardial infarction (commonly known as a heart attack), is the leading cause of CVD. Other cardiac diseases include hypertensive heart disease, stroke, heart failure, cardiomyopathy, valvular heart disease, abnormal heart rhythm, thromboembolic disease, and carditis [3]. CVD has increased dramatically in advanced and developing countries over the last several decades for various reasons, and environmental risk factors are considered responsible for most CVD deaths. Several risk factors have been linked to CVD, including physical inactivity, family history, smoking, age, obesity, and lifestyle [5, 6]. The rapid rise in heart failure statistics is highly alarming. The prevalence of heart disease is steadily increasing, impacting people of all ages. This high and increasing rate of cardiac disease is primarily due to a lack of exercise, poor diet, and being overweight. An estimated 2% of adults in heavily affected countries suffer from heart failure.
Moreover, 6%–10% of residents over the age of 65 suffer from heart failure [7], which is one of the main consequences of CAD. In this context, a simulated trial using the Archimedes model demonstrated that diet-driven reductions in nutrition-related CVD risk factors such as obesity, diabetes, and hypertension lowered the overall risk of myocardial infarctions and attack by 46% within real-world settings [8]. Numerous observational studies and meta-analyses of randomized clinical trials have highlighted the Mediterranean diet (MedDiet) [9]. Techniques for transcatheter replacement and repair of aortic and mitral valves have improved patient care and significantly reduced the mortality rate [10]. Given the importance of CAD within the CVD spectrum, an accurate prediction of CAD development in patients presenting with chest pain seems necessary. Machine learning (ML) techniques can handle the many variables, accessible from electronic health records, that are related to CAD development in patients with chest pain. Such techniques could lead to the homogenization of diagnostic and therapeutic approaches between professionals in this clinical setting.
In recent years, quantum computing (QC) has advanced rapidly in both theory and practice, raising hopes for its potential impact on real-world applications. Because QC builds on fundamental concepts of nature, such as quantum mechanics, it is being re-evaluated. Over the last century, advances in physics, such as high-purity materials and new observation methods, have made some quantum phenomena more detectable [11]. QML straddles two current research fields, QC and ML, and analyzes the correlation between them to see how well the approaches and outcomes of one field can be used to solve problems in the other. With the rapid rise in data, quantum processing power can provide an edge in quantum machine learning (QML) tasks, while classical ML is ultimately restricted by the classical computational model [12]. The impact of quantum computers on ML has become an innovative field of research. Recent research advances have contributed significantly to the enhancement of ML algorithms in order to exploit these advantages.
Many researchers have recently implemented different quantum algorithms (QAs) [13, 10] using real healthcare datasets [15, 16]. Because QML is one of the most promising research fields, various research groups and individual researchers are actively exploring it [14]. In particular, work on new ML approaches that exploit the advantages of QC is especially important in healthcare [18]. In this scenario, supervised learning is a new QML task that has attracted much attention from academia and the healthcare industry [19]. For binary classification problems, the rapidly growing body of experimental work includes several contributions, i.e., the quantum support vector machine (QSVM) [20–22], the variational quantum classifier (VQC) [15, 16], quantum deep learning models [23, 24], error-minimizing algorithms [25], and pre-processing techniques [15, 16, 29]. The design and implementation of QANN [2724] provide a gateway for more algorithms to be applied in quantum states.
Moreover, several ways of encoding classical data into quantum states benefit from an exploratory cost reduction in terms of resources and from the inclusion of nonlinearity in the data [30]. Kernel-based methodologies help make the data linearly separable for linear classifiers [31]. Researchers are now focusing on establishing a comprehensive QA capable of solving classification problems in the healthcare industry. Most QML algorithms that apply QC techniques to ML applications [14, 20] use datasets such as the University of California, Irvine (UCI) Machine Learning Repository, Iris, Cancer, and MNIST. In short, QML and classical ML algorithms are widely used to diagnose healthcare problems and aid patients in various ways.
This work focuses on a comprehensive and experimental investigation of QML and its applications in the healthcare domain. The implemented quantum classifiers are compared with classical algorithms on the cardiovascular dataset to classify CVD. The QML algorithms have accomplished a significant improvement in terms of overall accuracy. The paper's main objective is to demonstrate the significance of quantum-enhanced ML in the medical and healthcare sectors. The main contributions of the paper are as follows.

For pre-processing, we present the Wrapper and Filter techniques to extract the important features of the cardiovascular dataset. The Wrapper evaluates the utility of a subset of features by using it to train a model, while the Filter assesses the significance of features based on their relationship to the dependent variable.

We enhance the Qiskit-implemented QSVM model into OQSVM by optimizing its hyperparameters and by using different combinations of quantum gates with different entanglements to obtain the circuit's depth.

We designed and implemented the hybrid quantum multi-layer perceptron (HQMLP) algorithm. Different gate combinations are used to encode the data in the quantum system, and the ansatz qubits additionally help to rotate the data to enhance the model.

The rest of the paper is divided into three sections. Section 2 presents the Material and Methods, which contains the dataset description, the pre-processing, and the proposed quantum models, OQSVM and HQMLP; Section 3 elaborates on the Results and Discussion; and Section 4 presents the conclusion and future work.

Material and Methods

This section describes the materials used in this experiment and the pre-processing methodology. The dataset description subsection gives a brief overview of the gathered cardiovascular dataset and the importance of its key parameters, while the methods subsections cover the feature selection and data pre-processing techniques and the implemented models, as described below.

Dataset Description
In this research, we used a compilation of electronic health records (EHRs). The EHR database contains patients' clinical treatment data, compiled by clinical information systems into research-oriented datasets. Garcia Olea et al. [4] investigated the relationship between EHR features and the development of CAD in patients who used the public clinical care system for chest pain at the Basurto Hospital in Bilbao between 2016 and 2018.
Initially, EHRs were collected for around 2,199,711 patients, of whom 43,835 required care for chest pain during the study period. Among these, 10,463 patients who had no further diagnostic tests were withdrawn.
The resulting dataset contains 33,372 records, of which 5,379 (16.1%) developed CAD in the period considered, as shown in Table 1. Initially, the dataset had 82 features, which were categorical or continuous variables, plus the output variable (CAD diagnosis). The data cleaning comprises four stages: eliminating variables with null fields, eliminating repetitious variables, eliminating inoperable variables, and eliminating variables with a high correlation rate.

Table 1. Whole database traits without removing features

| Number of registers | Number of attributes | Type | Output variable (1) |
|---|---|---|---|
| 33,372 | 82 | Continuous and categorical | IHD |

Twenty variables were eliminated due to null fields, meaning that they did not have complete information for all the records.
Subsequently, in the elimination stage for repetitious variables, the 14 variables containing only values of 0 were removed because they made no significant contribution to the prediction models.
Likewise, information that was not valuable for this study, such as patient ID, admission dates, discharge dates, and other data, was excluded. Finally, variables with a high Pearson correlation coefficient were removed to avoid model redundancy. A threshold was chosen, and one variable was selected from each pair exceeding it. The selected ones (between pairs) are bold and shown in Table 2. Variables that were eliminated in one pair but not selected for removal in another pair were kept in the database.

Table 2. Selected variables according to the correlation coefficient.
| Var 1 | Var 2 | Pearson coefficient |
|---|---|---|
| Type 2 diabetes | Antidiabetics | 0.7651 |
| Age | Pensionista | 0.7039 |
| Antithrombotic agents | Beta-blockers | 0.5239 |
| Antithrombotic agents | Lipid-lowering | 0.5215 |
| Block RAAS | Lipid-lowering | 0.4959 |
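The automatic cleaning stages can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' code: the function name `clean_features`, the threshold value 0.49, and the rule of always dropping the second variable of a correlated pair are assumptions (the paper selects one variable per pair by hand), and the removal of inoperable variables such as patient IDs is a manual step that is not modeled here.

```python
import numpy as np


def clean_features(X, names, corr_threshold=0.49):
    """Drop incomplete, all-zero, and highly correlated columns (illustrative)."""
    # Null-field and all-zero stages: keep columns with no missing values
    # and at least one nonzero entry.
    keep = [j for j in range(X.shape[1])
            if not np.isnan(X[:, j]).any() and np.any(X[:, j] != 0)]
    X, names = X[:, keep], [names[j] for j in keep]

    # Correlation stage: for each pair above the (assumed) threshold,
    # drop the second variable of the pair.
    corr = np.corrcoef(X, rowvar=False)
    dropped = set()
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if i not in dropped and j not in dropped and abs(corr[i, j]) > corr_threshold:
                dropped.add(j)
    final = [k for k in range(len(names)) if k not in dropped]
    return X[:, final], [names[k] for k in final]


# Synthetic toy data, not the hospital dataset.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
toy = np.column_stack([
    a,                                            # kept
    a + 0.1 * rng.normal(size=200),               # near-duplicate of "a": dropped
    rng.normal(size=200),                         # independent: kept
    np.zeros(200),                                # only zeros: dropped
    np.where(np.arange(200) == 0, np.nan, 1.0),   # has a null field: dropped
])
X_clean, kept = clean_features(toy, ["a", "a_copy", "c", "zeros", "with_null"])
print(kept)
```
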

In the end, 42 input variables were obtained with the same output variable (i.e., CAD). The cleaned dataset includes demographic variables such as gender and age, as well as variables that indicate the presence of mental illnesses, such as anxiety or depression. In addition, it includes variables for different types of addiction, such as alcoholism; certain types of medications or drugs, such as proton pump inhibitors, antithrombotic agents, and anabolic steroids; and certain medical conditions, such as osteoporosis, cancer, and STIs. Finally, information on patients who were exposed to different treatments, such as cardiac therapy or catheterization, was included. The variables of the final dataset are shown in Table 3.

Feature Selection Method
The database contains many features, so a dimensionality reduction stage is performed with two purposes: to reduce the number of features and better adapt the dataset to the model used, and to achieve a better, more efficient analysis and prediction. Dimensionality reduction is carried out with feature selection methods of two types: Filter and Wrapper techniques. Filter methods choose features based on a performance metric, regardless of the data modeling approach used; the modeling algorithm is employed only after the best features have been found. Wrappers, on the other hand, evaluate feature subsets by the quality of the performance they yield for the modeling technique, which is used as a black-box evaluator. For example, for a clustering task, a Wrapper will evaluate the subsets based on the performance of the clustering algorithm. As with Filters, the subset construction is based on the search strategy, and the evaluation is repeated for each subset. Because Wrappers depend on the resource demands of the modeling process, they are substantially slower than Filters at obtaining sufficiently good subsets [32].

Table 3. Variables name and type for clean dataset
| Var name | Data type and value |
|---|---|
| Cateterismo | Categorical, -1/1 |
| Ergometría | Categorical, -1/1 |
| ECO_Estrés | Categorical, -1/1 |
| Ecocardioagrama | Categorical, -1/1 |
| ECG | Categorical, -1/1 |
| Depresión | Categorical, -1/1 |
| Alcohol | Categorical, -1/1 |
| Drogodependencia | Categorical, -1/1 |
| Ansiedad | Categorical, -1/1 |
| Demencia | Categorical, -1/1 |
| Insuficiencia_Renal | Categorical, -1/1 |
| Osteoporosis | Categorical, -1/1 |
| Diabetes_Tipo_1 | Categorical, -1/1 |
| Diabetes_Tipo_2 | Categorical, -1/1 |
| Dislipidemia | Categorical, -1/1 |
| Hipercolesterolemia | Categorical, -1/1 |
| Fibrilación_palpitación | Categorical, -1/1 |
| Flutter | Categorical, -1/1 |
| Insuficiencia_Cardiaca | Categorical, -1/1 |
| Cáncer | Categorical, -1/1 |
| Edad | Numerical, age in years |
| Residencia | Categorical, -1/1 |
| Agentes_Antitrombóticos | Categorical, -1/1 |
| Ácido_Acetilsalicílico | Categorical, -1/1 |
| Inhibidores_De_La_Bomba_De_Protones | Categorical, -1/1 |
| Diuréticos | Categorical, -1/1 |
| Anticonceptivos_Hormonales | Categorical, -1/1 |
| Esteroides_Anabólicos | Categorical, -1/1 |
| Preparados Antigotosos | Categorical, -1/1 |
| Inmunomoduladores | Categorical, -1/1 |
| Antidepresivos | Categorical, -1/1 |
| Antipsicóticos | Categorical, -1/1 |
| Benzodiacepinas | Categorical, -1/1 |
| TerapiaCardiaca | Categorical, -1/1 |
| Antihipertensivos | Categorical, -1/1 |
| Vasodilatadores | Categorical, -1/1 |
| Betabloqueantes | Categorical, -1/1 |
| Antagonistas_Del_Ca | Categorical, -1/1 |
| Bloqueo_SRAA | Categorical, -1/1 |
| Antilipemiantes | Categorical, -1/1 |
| Its | Categorical, -1/1 |
| Hipertensión | Categorical, -1/1 |

In this research, the Wrapper methods used are recursive feature elimination (RFE) with logistic regression (LR) and random forest (RF), and the Filter method is maximum relevance minimum redundancy (mRMR). The RFE approach works by recursively deleting attributes and building a model on the remaining attributes, sorting the features by relevance. The limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) solver is used in the logistic regression estimator for RFE, and the numbers of steps and volume are tuned. The RF technique uses 100 trees in the forest, bootstrap samples to build the trees, and the Gini index as the criterion to determine the quality of a split. The mRMR algorithm, on the other hand, quantifies redundancy and relevance using mutual information: the pairwise mutual information between variables, and the mutual information between a feature and the response variable. The three selectors, RF, RFE with LR, and mRMR, were combined with the intersection technique, which retains the features common to the individual selectors. In this way, the 10 essential features were extracted from our database. A flow diagram of the steps for the feature selection is shown in Fig. 1.
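The Filter step can be illustrated with a small greedy mRMR sketch for discrete features such as the -1/1 variables of Table 3. It is a simplified, assumed variant (relevance minus mean redundancy, both measured as mutual information in nats) and not the authors' implementation; the helper names and the synthetic data are hypothetical, and the two Wrapper selectors are not reproduced.

```python
import numpy as np


def mutual_information(a, b):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(joint, (a_idx, b_idx), 1.0)        # contingency table
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)
    pb = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])))


def mrmr(X, y, k):
    """Greedily pick k features maximizing relevance minus mean redundancy."""
    n_features = X.shape[1]
    relevance = np.array([mutual_information(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([mutual_information(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected


# Synthetic example: feature 1 duplicates feature 0, feature 3 is pure noise.
rng = np.random.default_rng(0)
n = 2000
f0 = rng.choice([-1, 1], size=n)
f2 = rng.choice([-1, 1], size=n)
noise = rng.choice([-1, 1], size=n)
y = np.where(rng.random(n) < 0.6, f0, f2)        # label follows f0 or f2
X = np.column_stack([f0, f0.copy(), f2, noise])
selected = mrmr(X, y, k=2)
print(selected)
```

The duplicate column is as relevant as the original but fully redundant with it, so the greedy step prefers the less relevant, non-redundant feature. The final intersection across selectors is then a plain set intersection of the selected feature names.
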

Fig. 1. Feature selection flow diagram.

Proposed Quantum Algorithms
QML is a technique for enhancing ML algorithms by running them on quantum systems. It is considered a subfield of quantum information processing, which has data learning capabilities for developing efficient algorithms. QC uses quantum theory to process the information of the QAs that run on these quantum systems. This section describes the implementation of the QML algorithms, as indicated in Fig. 2.

Optimized quantum support vector machine
Support vector machine (SVM) is a popular ML model that selects the best separating hyperplane for classification problems. For a linearly separable dataset we have [33],

(1)

where M indicates the training dataset size with dimension N. To classify the data throughout the N-dimensional hyperspace effectively, it is possible to construct an (N−1)-dimensional hyperplane with the largest margin. Here we consider binary classification, denoting the labels $y_i = \pm 1$ for simplicity.

(2)

Suppose a hyperplane in the N-dimensional space is denoted by a set of parameters $(\vec{w}, b)$. Many solutions exist for $\vec{w}$ and $b$ that separate every training data point correctly.

Fig. 2. Block diagram of quantum proposed models.

The main task of the SVM is to define the optimal separation between the two classes (1, –1). The SVM finds a maximum-margin hyperplane with normal vector $\vec{w}$ that splits the two classes. There are no data points inside the margin, which is defined by two parallel hyperplanes separated by the largest feasible distance $2/\|\vec{w}\|$. Formally, the hyperplanes are constructed so that $\vec{w}\cdot\vec{x}_i + b \ge 1$ for $\vec{x}_i$ in the class (+1) and $\vec{w}\cdot\vec{x}_i + b \le -1$ for $\vec{x}_i$ in the class (–1), and $b/\|\vec{w}\|$ indicates the offset of the hyperplane, as shown in Fig. 3.

Fig. 3. Support vector machine. Adapted from [16].

The kernel method [33, 34] involves applying a feature map to create a separable hyperplane by mapping the data from two-dimensional space into a higher-dimensional space, as shown in Fig. 4. The optimization problem takes the form of a linearly constrained convex quadratic program, $\min_{\vec{w},b} \frac{1}{2}\|\vec{w}\|^2$ such that $y_i(\vec{w}\cdot\vec{x}_i + b) \ge 1$, $i = 1, \dots, M$. Solving this minimization problem is equivalent to solving its dual problem. To solve the dual formulation, the following constrained function of the Karush-Kuhn-Tucker multipliers $\vec{\alpha} = (\alpha_1, \dots, \alpha_M)^T$ has been used [29].

(3)

Fig. 4. (a) Converting nonlinear data into 2D space. (b) Separable hyperplane using the kernel method.

The hyperplane parameters are obtained through $\vec{w} = \sum_{i=1}^{M} \alpha_i \vec{x}_i$ and $b = y_i - \vec{w}\cdot\vec{x}_i$. In this scenario, only a few nonzero $\alpha_i$ exist, which correspond to the $\vec{x}_i$ that lie on the two hyperplanes. When the data points are not linearly separable, the kernel approach is used to separate the data. The kernel scheme uses the kernel matrix, a key component in most ML tasks [35, 36], which is obtained by evaluating the kernel function $k(\vec{x}, \vec{x}')$. To solve the dual form, the $M(M-1)/2$ dot products $\vec{x}_i \cdot \vec{x}_j$ in the kernel matrix are evaluated first. Convex quadratic programming is then used to find the best $\alpha_i$ values, which takes $O(M^3)$ in the non-sparse case. Each dot product evaluation takes $O(N)$ time, so the whole classical SVM algorithm takes time $O(\log(1/\epsilon)\, M^2 (N + M))$ for accuracy $\epsilon$. For new data $\vec{x}$, the classifier is as follows [33].

(4)
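For reference, the dual problem and the resulting classifier can be sketched in the form commonly used in the quantum SVM literature. The notation is an assumption (the kernel matrix $K_{ij}$ and a sign convention in which $\alpha_i$ absorbs the label $y_i$), not a transcription of the numbered equations:

```latex
\begin{aligned}
\max_{\vec{\alpha}}\; L(\vec{\alpha}) &= \sum_{i=1}^{M} y_i \alpha_i
  - \frac{1}{2} \sum_{i,j=1}^{M} \alpha_i K_{ij} \alpha_j,
  \qquad K_{ij} = k(\vec{x}_i, \vec{x}_j), \\
\text{subject to}\quad & \sum_{i=1}^{M} \alpha_i = 0, \qquad y_i \alpha_i \ge 0, \\
y(\vec{x}) &= \operatorname{sign}\!\left( \sum_{i=1}^{M} \alpha_i \, k(\vec{x}_i, \vec{x}) + b \right).
\end{aligned}
```

With the linear kernel $k(\vec{x}_i, \vec{x}_j) = \vec{x}_i \cdot \vec{x}_j$ this recovers the maximum-margin hyperplane; a nonlinear kernel replaces the dot product without changing the structure of the optimization.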

QSVM is a supervised QML technique for data classification and regression. The QSVM algorithm works in the same manner as classical SVM does, except some additional implementations take place, which are executed on quantum processors. In QSVM, the classical data points are converted into quantum state variables by using feature mapping (PauliFeatureMap) [16].

(5)

An arbitrary classical function is applied to the classical data points x. The circuit for OQSVM is shown in Fig. 5, where the primary function of the unitary gate is to rotate the qubits to the desired value. For each classical input, the classified classical data (–1, 1) are obtained through a measurement operation, which further depends on the quantum circuit W(θ). Consequently, these test data are linked to the desired labels.
Fig. 5. OQSVM circuit.
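The idea of a quantum kernel can be sketched with a small NumPy statevector simulation. The feature map below is written in the spirit of a Pauli Z/ZZ feature map (Hadamards followed by data-dependent diagonal phases, repeated `reps` times); its exact phase conventions are an assumption and are not claimed to match Qiskit's `PauliFeatureMap` or the optimized circuit of Fig. 5. The kernel entry is the state overlap $|\langle\phi(x)|\phi(x')\rangle|^2$.

```python
import itertools

import numpy as np


def feature_state(x, reps=2):
    """Statevector produced by an assumed Z/ZZ-style feature map for input x."""
    n = len(x)
    # Z eigenvalue (+1/-1) of every qubit for each computational basis state.
    z = np.array(list(itertools.product([1, -1], repeat=n)))
    phase = z @ np.asarray(x, dtype=float)                       # single-qubit Z terms
    for j, k in itertools.combinations(range(n), 2):             # pairwise ZZ terms
        phase = phase + (np.pi - x[j]) * (np.pi - x[k]) * z[:, j] * z[:, k]
    diagonal = np.exp(1j * phase)

    hadamard = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    h_all = np.array([[1.0]])
    for _ in range(n):
        h_all = np.kron(h_all, hadamard)                         # H on every qubit

    state = np.zeros(2 ** n, dtype=complex)
    state[0] = 1.0                                               # start in |0...0>
    for _ in range(reps):
        state = diagonal * (h_all @ state)
    return state


def quantum_kernel(XA, XB):
    """Gram matrix K[i, j] = |<phi(a_i)|phi(b_j)>|^2."""
    SA = np.array([feature_state(x) for x in XA])
    SB = np.array([feature_state(x) for x in XB])
    return np.abs(SA.conj() @ SB.T) ** 2


rng = np.random.default_rng(1)
X_demo = rng.uniform(0, np.pi, size=(5, 2))                      # synthetic inputs
K = quantum_kernel(X_demo, X_demo)
print(np.round(K, 3))
```

The resulting Gram matrix is symmetric with a unit diagonal and can be handed to any classical SVM solver that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.
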

Hybrid quantum multi-layer perceptron
Neural networks (NNs) have received a lot of attention since they have proven extremely useful in pattern recognition and optimization. An artificial neural network (ANN) learns, or trains, by adjusting its weight values to produce a desired output for the corresponding input. Training can be viewed as reducing an error function determined by the discrepancy between the NN's output and the expected outcomes of the training set [37]. A NN, often known as a multi-layer network, is a type of ANN.
The multi-layer perceptron (MLP) is a widely used NN architecture trained with back-propagation. The most crucial component is the definition of the MLP model: too many connections can cause overfitting on the training dataset, while too few connections can leave the model with inadequate adjustable parameters to capture the problem [37]. Therefore, a fundamental research objective in MLP design is to optimize the number of nodes and hidden layers. The MLP comprises an input layer, an output layer, and several hidden layers [38].
The number of neurons in the input layer equals the number of measured features of the pattern problem, and the number of nodes in the output layer equals the number of predicted classes. In other words, there are two neurons in the output layer for binary categorization. The model design problem is to choose the number of layers, the neurons in each layer, and the connections, and our foremost goal is to optimize these for a network with adequate parameters and adaptability for classification. The feed-forward QML model consists of one input layer with 10 features, three hidden layers with 10, 32, and 10 neurons, respectively, and the output layer. Let $K = (k_0, k_1, \dots, k_n)$ denote the number of neurons $k_i$ in each layer, with the ReLU activation function. The HQMLP model has an input layer and an output layer; in between, it has hidden layers and quantum circuits, as indicated in Fig. 6. The back-propagation approach can be used to train the HQMLP classification model, analogous to the quantum perceptron. A succession of unitary operators can also be used to build the learning method. Suppose there are d samples in total where [39],

(6)

(7)

Fig. 6. Hybrid quantum multi-layer perceptron.

(8)

The primary concept behind this method is to use an objective function as a guide to improving the parameters [16]. By considering the weights in the hidden layers as a matrix, we introduce a unitary operator; we can then define the weights of the hidden layer of HQMLP in a superposition by combining parameterized $R_y$ and $R_z$ gates and entangling the qubits with CNOT gates in a parameterized ansatz circuit. The quantum phase includes the encoding of the input x, the ansatz circuit, which depends on the parameters θ, and the measurement. The classical phase includes the circuit output, the objective function, and the learning procedure. The HQMLP is optimized using approaches such as ADAM. Ansatz circuits are also used to address complex optimization problems [38, 39].

(9)

To measure a quantum circuit, the outcome probabilities are estimated by carrying out the final measurement repeatedly. This is the same as taking a large number of samples from the set of possible computational basis states and finding the average value.
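A two-qubit NumPy sketch makes the roles of the encoding, the $R_y$/$R_z$/CNOT ansatz, and the measurement concrete. The wiring below (angle encoding with $R_y$, one rotation layer, one CNOT, readout of Z on the first qubit) is an assumed minimal layout for illustration, not the circuit of Fig. 6; all function names are hypothetical.

```python
import numpy as np

# CNOT with the first (most significant) qubit as control.
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=complex)


def ry(t):
    c, s = np.cos(t / 2), np.sin(t / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)


def rz(t):
    return np.array([[np.exp(-1j * t / 2), 0], [0, np.exp(1j * t / 2)]])


def circuit_state(x, theta):
    """Encode two features as RY angles, then apply an RY/RZ layer and a CNOT."""
    state = np.zeros(4, dtype=complex)
    state[0] = 1.0                                        # |00>
    encode = np.kron(ry(x[0]), ry(x[1]))
    rotate = np.kron(rz(theta[0, 1]) @ ry(theta[0, 0]),
                     rz(theta[1, 1]) @ ry(theta[1, 0]))
    return CNOT @ rotate @ encode @ state


def exact_z0(state):
    """Exact expectation value of Z on the first qubit."""
    probs = np.abs(state) ** 2
    return float(probs @ np.array([1, 1, -1, -1]))


def sampled_z0(state, shots, rng):
    """Estimate <Z> by sampling basis states and averaging the +1/-1 outcomes."""
    probs = np.abs(state) ** 2
    outcomes = rng.choice(4, size=shots, p=probs)
    return float(np.mean(np.where(outcomes < 2, 1.0, -1.0)))


rng = np.random.default_rng(0)
theta = np.array([[0.4, 1.1], [0.9, -0.3]])               # arbitrary ansatz angles
psi = circuit_state(np.array([0.7, 2.0]), theta)
exact = exact_z0(psi)
estimate = sampled_z0(psi, shots=20000, rng=rng)
print(exact, estimate)
```

The sampled average approaches the exact expectation value as the number of shots grows, which is the sense in which measuring amounts to averaging over many samples of computational basis states.
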

The training aims to find the model parameters that minimize a certain loss function [16]. A quantum model can be optimized in the same way as classical NNs. In both cases, a forward pass is evaluated to compute the loss function, and, because the gradient of a quantum circuit can be determined, gradient-based optimization techniques can use the loss to update the trainable parameters during training. With this strategy, we compute the difference between our estimates and the actual labels, which is expressed as a loss function. As Algorithm 1 shows, an optimization technique is used to update the quantum circuit's parameters after the measurements are completed. The classical loop trains the parameters until the cost function stops decreasing significantly.
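The training loop can be sketched on the smallest possible case, a single qubit whose output is $\langle Z\rangle$ after $R_y(\theta)$. Two substitutions are made for illustration and are assumptions rather than the paper's procedure: the circuit gradient is obtained with the parameter-shift rule, and plain gradient descent stands in for ADAM.

```python
import numpy as np


def expectation_z(theta):
    """<Z> after RY(theta) acting on |0>, simulated exactly (equals cos(theta))."""
    state = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return float(state[0] ** 2 - state[1] ** 2)


def parameter_shift_gradient(f, theta):
    """Gradient of a rotation-gate expectation value from two shifted evaluations."""
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))


def train(target, theta=0.1, learning_rate=0.2, steps=200):
    """Minimize the squared error between the circuit output and the target."""
    for _ in range(steps):
        prediction = expectation_z(theta)                 # forward pass
        # Chain rule for the loss (prediction - target)^2.
        grad = 2 * (prediction - target) * parameter_shift_gradient(expectation_z, theta)
        theta -= learning_rate * grad                     # classical update
    return theta


theta_star = train(target=-0.5)
print(theta_star, expectation_z(theta_star))
```

The same pattern extends to many parameters: each forward pass evaluates the circuit, the loss compares the output with the label, and the classical optimizer updates the angles until the loss stops decreasing.
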

Results and Discussion
The experiments are carried out on a classical device that simulates a physical quantum device. The results for the proposed algorithms are obtained on the cardiovascular dataset, which contains 42 attributes related to CVD. The proposed models have been tested on the 10 important features extracted from the 42 features. For this critical feature selection process, the Wrapper and Filter methods (RF, LR, and mRMR) were combined with the intersection technique, which retains the features common to all three. The models have been implemented in Python 3 with ML, Qiskit, and PyTorch libraries. The proposed algorithms run on the IBM Statevector simulator through an application programming interface (API). Moreover, the data is normalized using feature transformation and scaling approaches, allowing classical and quantum-enhanced ML models to handle the data equally.
The min-max and standard scalers are used; the min-max scaler scales the data to a defined range (0, 1). The min-max scaling is obtained by using [16].

(10)

(11)

where x is the original data value, and $\hat{n}$ is the normalized data value. The balanced and normalized dataset is then split into training and testing datasets with an 80%/20% ratio. Each classifier is built using the training data, and the test data is used to compare the classifier's predicted labels to the known test labels. The task of the proposed work is to classify samples into two possible classes, class 1 and class 0. As shown in Table 4, the accuracy metric is the mean of the cases in which the predicted labels equal the actual labels. The accuracies of all the classical and quantum models are shown in Fig. 7.
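The scaling and splitting steps can be sketched as follows, using the textbook definitions of min-max scaling, $(x - x_{min})/(x_{max} - x_{min})$, and standardization, $(x - \mu)/\sigma$. These forms are assumed to correspond to the scalers named above; the function names and the toy data are hypothetical.

```python
import numpy as np


def min_max_scale(X):
    """Rescale each column to the range [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid division by zero
    return (X - lo) / span


def standardize(X):
    """Shift each column to zero mean and scale it to unit variance."""
    mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / np.where(sigma > 0, sigma, 1.0)


def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_test = int(round(test_fraction * len(X)))
    test, train = order[:n_test], order[n_test:]
    return X[train], X[test], y[train], y[test]


# Toy data: an age-like column and a -1/1 categorical column.
X_toy = np.column_stack([np.linspace(20, 90, 10), np.tile([-1.0, 1.0], 5)])
y_toy = np.tile([0, 1], 5)
X_scaled = min_max_scale(X_toy)
X_std = standardize(X_toy)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y_toy)
print(X_train.shape, X_test.shape)
```

This follows the order described in the text, normalizing first and splitting afterwards; fitting the scaling constants on the training split only is a common alternative that keeps test statistics out of training.
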

Table 4. Performance metrics of the proposed models on the CVD dataset

| Classifier | Actual class | Predicted 0 | Predicted 1 | Precision | Recall | F1-score | Test size | Average accuracy (%) | Average F1-score (%) |
|---|---|---|---|---|---|---|---|---|---|
| SVM | 0 | 304 | 6 | 0.95 | 0.98 | 0.96 | 310 | 96 | 96 |
| SVM | 1 | 17 | 295 | 0.98 | 0.95 | 0.96 | 312 | | |
| OQSVM | 0 | 282 | 26 | 0.96 | 0.91 | 0.93 | 310 | 94 | 94 |
| OQSVM | 1 | 19 | 293 | 0.91 | 0.96 | 0.94 | 312 | | |
| MLP | 0 | 301 | 11 | 0.9 | 0.96 | 0.93 | 312 | 96 | 93 |
| MLP | 1 | 32 | 278 | 0.96 | 0.9 | 0.93 | 310 | | |
| HQMLP | 0 | 299 | 13 | 0.93 | 0.96 | 0.94 | 312 | 93 | 92 |
| HQMLP | 1 | 33 | 277 | 0.96 | 0.89 | 0.92 | 310 | | |

Fig. 7. Performance analysis of all proposed models.
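The per-class metrics follow directly from a confusion matrix. The sketch below, with a hypothetical helper name, uses the standard definitions of precision, recall, F1-score, and accuracy, and takes the SVM rows of Table 4 as a worked example (rows are actual classes, columns are predicted classes).

```python
import numpy as np


def per_class_metrics(confusion):
    """Precision, recall, and F1 per class, plus overall accuracy."""
    cm = np.asarray(confusion, dtype=float)
    true_positive = np.diag(cm)
    precision = true_positive / cm.sum(axis=0)   # column sums: predicted counts
    recall = true_positive / cm.sum(axis=1)      # row sums: actual counts
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = true_positive.sum() / cm.sum()
    return precision, recall, f1, accuracy


svm_confusion = [[304, 6],     # actual class 0
                 [17, 295]]    # actual class 1
precision, recall, f1, accuracy = per_class_metrics(svm_confusion)
print(np.round(precision, 2), np.round(recall, 2), np.round(f1, 2), round(accuracy, 2))
```
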

Evaluation of Optimized Quantum Support Vector Machine
This set of experiments proves that QAs can help a model execute more quickly than classical algorithms. The performance metrics of the proposed OQSVM and the classical SVM on the cardiovascular dataset are shown in Table 4. For the OQSVM algorithm, the precision scores for class 1 and class 0 are 92% and 96%, respectively. The recall values for classes 1 and 0 are 96% and 91%, and the F1-scores are 94% and 94%, respectively. In short, our proposed OQSVM and the SVM algorithm achieved average accuracies of 93% and 96%, respectively. The competency of the proposed algorithms is compared with the results of published papers under the same criteria, as shown in Table 5. For instance, Bai et al. [19] present a quadratic kernel-free least squares support vector machine (QLSSVM) for binary classification on different datasets. The decision variables of QLSSVM are separated into local and global variables using the consensus technique. The model was first tested on artificial data to confirm its performance, and then on a heart disease classification dataset containing 13 features and 270 samples. Through numerical testing on these two training datasets, it achieved a highest accuracy of 91.2% in heart disease classification, which is 2.8% less than our proposed OQSVM algorithm. The use of QML algorithms helps process a massive amount of data quickly. In this context, Moradi et al. [20] consider three different clinical datasets. To classify the disease, two QML algorithms, the quantum distance classifier (qDS) and the simplified quantum kernel SVM (sqKSVM), have been investigated. Using a linear-time quantum encoding methodology on the 15-qubit IBMQ Melbourne quantum computer, the authors use different ways to embed classical data into quantum states and estimate the inner product.
The models are trained on multiple datasets with different encoding techniques and achieved a highest accuracy of 92% on the UCI Breast Cancer dataset. On the heart failure dataset, the two models obtained highest accuracies of 62% and 60%, respectively, for the same data encoding techniques.

Table 5. Comparison study of OQSVM model
| Study | Algorithm | Used methodology | # of total samples | Accuracy (%) |
|---|---|---|---|---|
| Bai et al. [19] | QLSSVM | Used consensus technique | 270 | 91.2 |
| Moradi et al. [20] | sqKSVM | Used quantum distance classification | 600 | 92 |
| Houssein et al. [21] | QKSVM–BHHO | Feature map optimization, PCA | 139 | 88 |
| Proposed | OQSVM | Feature selection, Optimization | 3,312 | 93 |

Optimization is a fundamental concept in classification algorithms. In most cases, a hyperplane cannot separate data in its original space. Applying a non-linear transformation function to the data is known as a feature map. Houssein et al. [21] introduced a hybrid quantum kernel SVM (QKSVM) with the BHHO optimization algorithm for cancer classification. For pre-processing, the essential features are selected using the Filter and Wrapper techniques. The optimization of different feature map techniques, along with principal component analysis (PCA), is considered to classify breast cancer. The model was trained on the colon dataset with 62 samples and the breast cancer dataset with 139 samples, and achieved 95% on both datasets with different feature mapping techniques. In short, our proposed OQSVM model outperforms the published models discussed above in terms of cardiovascular disease classification.

Evaluation of Hybrid Quantum Multi-Layer Perceptron
We have also evaluated our third proposed model, HQMLP, and the classical MLP on the cardiovascular dataset, as shown in Table 4. With the HQMLP model, the precision, recall, and F1-score for both classes are 95.5%, 89.3%, and 92.3%, respectively. In comparison, our improved classical MLP model achieved a precision of 96.1%, a recall of 89.6%, and an F1-score of 92.8% for both class 1 and class 0. Finally, the average accuracies of the HQMLP and MLP models are 93% and 96%, respectively.

Table 6 compares the capability of our proposed models with previously published models. For instance, Patel and Tiwari [26] presented an article on binary classification using various clinical data, including a heart disease dataset. They designed a quantum NN consisting of input, hidden, and output layers. The results were obtained by optimizing the hidden-layer neurons and recording the computational time and accuracy. The model was tested on small samples of the UCI breast cancer, diabetes, and heart disease datasets. On the heart disease dataset, the model achieved a highest accuracy of 88.6%, about 4% lower than our proposed HQMLP model.

One of the primary advantages of QML over classical ML is the existence of adjustable hidden variables, which provide increased data density, as presented by Aishwarya et al. [27]. That work evaluates the performance of variational quantum classifier (VQC) hybrid NNs and quantum annealing classifiers on the NeuroMarketing dataset and demonstrates that hybrid quantum-classical techniques can represent cognitive states of the human mind. The VQC and quantum annealing classifiers were executed using PennyLane and DW2000Q-5QPU quantum processors. The VQC achieved a highest accuracy of 53%–55%, while quantum annealing obtained 60% accuracy, both of which are much lower than our proposed model.
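The HQMLP scores quoted above can be checked for internal consistency, since the F1-score is the harmonic mean of precision and recall. The short sketch below assumes that the 95.5% figure reported for HQMLP is the precision; the function name is introduced here for illustration only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions."""
    return 2 * precision * recall / (precision + recall)

# HQMLP values as reported in the text; reading 95.5% as the precision
# is an assumption made for this check.
hqmlp_f1 = f1_score(0.955, 0.893)
print(f"HQMLP F1 = {hqmlp_f1:.1%}")
```

Under that assumption the computed value agrees with the reported F1-score of 92.3%.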

Table 6. Comparative study for HQMLP model
| Study | Algorithm | Used methodology | # of total samples | Accuracy (%) |
|---|---|---|---|---|
| Patel and Tiwari [26] | Q-BNN | Quantum NN with 3 layers | 270 | 88.6 |
| Aishwarya et al. [27] | VQC | Hybrid neural network | 132 | 53-55 |
| Tang and Shu [28] | RS-QNN | Feature extraction, WT, RS | - | 91.7 |
| Proposed | HQMLP | Feature selection, Optimization | 3,312 | 93 |

Conclusion and Future Direction

The wide range of applications of quantum-enhanced ML in healthcare has been investigated empirically. The main objective of this research is to compare and contrast standard and quantum-enhanced ML methods for predicting CAD. The proposed QAs reduce the computation time while maintaining accuracy on a task previously carried out with more complicated architectures. On the binary classification problem, all the proposed quantum algorithms perform well in terms of accuracy on the real-world cardiovascular dataset. In the OQSVM model, we used the Qiskit library and added some additional hyperparameters for optimization. The effective pre-processing technique applied to this model is an additional benefit for evaluating it accurately. The PauliFeatureMap technique, with two repetitions, is used to convert classical data into quantum states for the OQSVM model. Optimizing the rotations of the X, Y, and Z gates in PauliFeatureMap ensures the depth of data. The model was executed on the IBM statevector simulator for the cardiovascular dataset and recorded a rise in accuracy compared with other models.
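To make the idea of a quantum feature map and kernel more concrete, the following sketch simulates a deliberately simplified encoding with the Python standard library. It is an assumption-laden stand-in, not the PauliFeatureMap configuration used for OQSVM: each feature is encoded on its own qubit with a Hadamard gate followed by a phase rotation of angle 2x, with a single repetition and no entangling terms. The function names and sample values are hypothetical.

```python
import cmath
import math

def encode_qubit(x):
    """Amplitudes of one qubit after H and a phase gate P(2x) act on |0>."""
    amp = 1 / math.sqrt(2)
    return (amp, amp * cmath.exp(2j * x))

def encode(features):
    """Product state over all qubits as a flat list of 2**n amplitudes."""
    state = [1 + 0j]
    for x in features:
        a0, a1 = encode_qubit(x)
        state = [s * a for s in state for a in (a0, a1)]
    return state

def kernel(x, y):
    """Fidelity |<phi(x)|phi(y)>|^2 between two encoded samples."""
    overlap = sum(a.conjugate() * b for a, b in zip(encode(x), encode(y)))
    return abs(overlap) ** 2

# Hypothetical two-feature samples, already scaled to small angles.
samples = [(0.1, 0.7), (0.2, 0.6), (1.5, -0.4)]
gram = [[kernel(a, b) for b in samples] for a in samples]
for row in gram:
    print([round(v, 3) for v in row])
```

For this product-state encoding the kernel reduces to the product of cos²(x_i − y_i) over the features, so it could be evaluated classically; entangling Pauli terms and repeated layers are what make richer feature maps harder to simulate. The resulting Gram matrix is the kind of object a kernel SVM consumes in place of a classical kernel.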
The second implemented model, HQMLP, contains three major blocks: feed-forward propagation, data encoding, and back-propagation. The essential features were extracted using machine learning models. The feed-forward propagation consists of a linear network layer, max-pooling layer, fully connected layer, ReLU activation function, and dropout layer. An ansatz circuit and a Pauli feature map are used for data encoding, while the Adam optimizer and MSE loss are used for back-propagation. The model was also run on the IBM simulator with the same split of the cardiovascular dataset into testing (20%) and training (80%) sets. The model's performance is assessed by comparing its accuracy with that of other published models. In short, the proposed models and the robust pre-processing technique demonstrate competitive and better results than the published state-of-the-art methodologies. The major limitation of this study is the noisy intermediate-scale quantum (NISQ) device, which has a limited number of qubits and constrains the size of the dataset and the computational time. When more qubits and larger datasets are used, the QC device is limited by logic-gate rotation, circuit length, and noise error. These challenges may affect the state of the qubits, where a wrong rotation might lead to an error in the outcome.
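The pairing of the Adam optimizer with an MSE loss mentioned above can be sketched on a toy problem. The example below is only an illustration under stated assumptions: it fits a one-parameter model y = w·x rather than the HQMLP network, and the data, learning rate, and step count are hypothetical choices. The update rule itself is the standard Adam rule with bias-corrected moment estimates.

```python
import math

def mse_and_grad(w, xs, ys):
    """MSE of the model y = w * x and its derivative with respect to w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
    return loss, grad

def adam_fit(xs, ys, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize the MSE over w with the Adam update rule."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        _, g = mse_and_grad(w, xs, ys)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # toy targets generated by y = 2x
w = adam_fit(xs, ys)
print(round(w, 2), round(mse_and_grad(w, xs, ys)[0], 6))
```

In a hybrid model the same loop would apply to every trainable parameter, classical weights and circuit rotation angles alike, with the gradients supplied by back-propagation through the network.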
Some aspects of this research study can be improved in future work. For example, the proposed OQSVM and HQMLP are designed to tackle two-class classification problems, so extending this research to multi-class classification problems would be interesting. Furthermore, the efficiency of the proposed models has only been explored using numeric data (the cardiovascular dataset). Therefore, it would be essential to analyze their behavior using the various types of data accessible in the medical area. We can also increase the number of features and the dataset size and extend this work by applying quantum deep neural networks to binary classification and regression problems.

Authors' Contributions

Conceptualization, DM, UU. Investigation and methodology, PM. Supervision, BZ, AJ. Data curation, IG, JM. Writing of the original draft, DM, UU. Writing of the review and editing, DM, UU, BZ.

Funding

This project has received funding from the European Union's Horizon 2020 Research and Innovation Program under the Marie Skłodowska-Curie grant agreement (Grant No. 847624); in addition, a number of institutions back and co-finance this project. Furthermore, the Basque Government provided support to the eVIDA Research Group of the University of Deusto (Grant No. IT-1536-22), and the database was contributed in coordination with Basurto Hospital in Bilbao. Finally, the Clinical Research Ethics Committee of Euskadi (No. PI202031) in Spain validated the research protocols.

Competing Interests

The authors declare that they have no competing interests.

Author Biography

Name: Danyal Maheshwari
Affiliation: eVIDA Research Group, University of Deusto
Biography: Danyal Maheshwari was born in 1993 in Hyderabad, Pakistan. He graduated from the Mehran University of Engineering Technology in Jamshoro, Pakistan, with a B.E. and an M.E. in Biomedical Engineering. He was an Erasmus Scholar at the University of Limerick in Ireland for his bachelor's and master's degrees. He is now pursuing a PhD in engineering at the University of Deusto. He is also working with the eVIDA research team. His research is focused on Quantum Machine Learning for biomedical and medical data.

Name: Ubaid Ullah
Affiliation: eVIDA Research Group, University of Deusto
Biography: Ubaid Ullah received his B.Sc. and M.Sc. degrees in electronics from the Department of Electronics, University of Peshawar, Pakistan, in 2013 and 2016, respectively. He also earned his M.Phil. degree in electronics from the Department of Electronics, Quaid-i-Azam University, Islamabad, Pakistan, in 2018. He has published many research articles in various journals, including IEEE Access, MDPI Energies, and MDPI Symmetry. He has also served as a reviewer for various journals. He is currently working as a Research Assistant under the COFUND Marie Skłodowska-Curie Fellowship at the Faculty of Engineering, University of Deusto, Bilbao, Spain. His current research interests include Quantum Machine Learning in the healthcare domain.

Name: Pablo A. Osorio Marulanda
Affiliation: eVIDA Research Group, University of Deusto
Biography: Pablo A. Osorio Marulanda was born in 2000 in Medellín, Colombia. He is a Mathematical Engineering student at EAFIT University. He is particularly interested in the application of mathematics to biological and medical domains, as well as the use of AI and Machine Learning in these fields. He has worked on a variety of machine learning topics, including image processing, farm intelligence, and disease identification.

Name: Alain García-Olea Jurado
Affiliation: Cardiology Service, Basurto University Hospital, Bilbao, Bizkaia, Spain
Biography: Alain García Olea is a Cardiology resident at Basurto University Hospital and belongs to the Biocruces Clinical Cardiology Research Unit. He has also carried out review tasks for Elsevier, being among the reviewers of the 1st Edition of the book entitled "Netter's Introduction to Clinical Procedures". In the field of Artificial Intelligence research, he has presented papers at national and European congresses on prediction of ischemic heart disease and estimation of atrial fibrillation recurrence from the electronic medical record of patients, work that he has been promoting since 2020.

Name: Ignacio Diez Gonzalez
Affiliation: Cardiology Service, Basurto University Hospital, Bilbao, Bizkaia, Spain
Biography: Dr. Ignacio Díez González is a Cardiologist and head of the chest pain unit of the Cardiology Service of the Basurto University Hospital (Osakidetza-Basque Health Service) since December 2013 with specialization accreditation in heart disease diagnostic techniques. He is the head of the Biocruces Clinical Cardiology Research Department. In relation to the subject of the study, he has participated as principal investigator in the European Commission project: Diagnostic Imaging Strategies for Patients With Stable Chest Pain and Intermediate Risk of Coronary Artery Disease (DISCHARGE), recently published in the New England Journal of Medicine.

Name: Jose M. Ormaetxe Merodio
Affiliation: Cardiology Service, Basurto University Hospital, Bilbao, Bizkaia, Spain
Biography: Dr. José Ormaetxe Merodio is the head of the Cardiology service in the Basurto University Hospital (Osakidetza-Basque Health Service). He is an electrophysiologist, and he has published several international works in this field in Q1 journals. He has recently been part of the European Committee for the development of supraventricular tachycardia guidelines.

Name: Begonya Garcia-Zapirain
Affiliation: eVIDA Research Group, University of Deusto
Biography: Begonya Garcia-Zapirain was born in San Sebastián, Spain, in 1970. She graduated in telecommunication engineering from the University of Basque Country, Spain, in 1994. She received a Ph.D. degree in computer science and artificial intelligence from the University of Deusto, Spain, in 2004. From 2002 to 2008, she served as the Director of the Telecommunication Department, University of Deusto, Spain, where she is working as a Full Professor. In 2001, she created the eVida research group, which is recognized by the Government of the Basque Country, Spain, and the European Network of Living Labs (ENoLL).

References

[1] World Health Organization, “Cardiovascular diseases (CVDs),” 2023 [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/cardiovascular-diseases-(CVDs).
[2] O. Lohaj, Z. Pella, and J. Paralic, “Data analytics methods for analyzing the impact of factors on early detection of cardiovascular risk,” in Proceedings of 2022 IEEE 20th Jubilee World Symposium on Applied Machine Intelligence and Informatics (SAMI), 2022, pp. 249-254. https://doi.org/10.1109/SAMI54271.2022.9780806
[3] H. Wang, M. Naghavi, C. Allen, R. M. Barber, Z. A. Bhutta, A. Carter, et al., “Global, regional, and national life expectancy, all-cause mortality, and cause-specific mortality for 249 causes of death, 1980–2015: a systematic analysis for the Global Burden of Disease Study 2015,” Lancet, vol. 388, no. 10053, pp. 1459-1544, 2016.
[4] A. Garcia Olea, M. Jojoa Acosta, I. Diez Gonzalez, M. B. Garcia Zapirain, I. Fernandez De La Prieta, M. Maeztu Rada, et al., “Electronic health records features associated to coronary artery disease in patients with chest pain,” European Heart Journal, vol. 42, no. Supplement_1, article no. ehab724-1151, 2021. https://doi.org/10.1093/eurheartj/ehab724.1151
[5] D. J. Monlezun, L. Dart, A. Vanbeber, P. Smith-Barbaro, V. Costilla, C. Samuel, et al., “Machine learning-augmented propensity score-adjusted multilevel mixed effects panel analysis of hands-on cooking and nutrition education versus traditional curriculum for medical students as preventive cardiology: multisite cohort study of 3,248 trainees over 5 years,” BioMed Research International, vol. 2018, article no. 5051289, 2018. https://doi.org/10.1155/2018/5051289
[6] Institute of Medicine, Promoting Cardiovascular Health in the Developing World: A Critical Challenge to Achieve Global Health. Washington, DC: National Academies Press, 2010.
[7] L. Bettari, M. Fiuzat, G. M. Felker, and C. M. O’Connor, “Significance of hyponatremia in heart failure,” Heart Failure Reviews, vol. 17, pp. 17-26, 2012.
[8] R. Kahn, R. M. Robertson, R. Smith, and D. Eddy, “The impact of prevention on reducing the burden of cardiovascular disease,” Circulation, vol. 118, no. 5, pp. 576-585, 2008.
[9] T. B. Huedo-Medina, M. Garcia, J. D. Bihuniak, A. Kenny, and J. Kerstetter, “Methodologic quality of meta-analyses and systematic reviews on the Mediterranean diet and cardiovascular disease outcomes: a review,” The American Journal of Clinical Nutrition, vol. 103, no. 3, pp. 841-850, 2016.
[10] A. J. Russak, F. Chaudhry, J. K. De Freitas, G. Baron, F. F. Chaudhry, S. Bienstock, et al., “Machine learning in cardiology: ensuring clinical impact lives up to the hype,” Journal of Cardiovascular Pharmacology and Therapeutics, vol. 25, no. 5, pp. 379-390, 2020.
[11] A. Abbas, M. Schuld, and F. Petruccione, “On quantum ensembles of quantum classifiers,” Quantum Machine Intelligence, vol. 2, article no. 6, 2020. https://doi.org/10.1007/s42484-020-00018-6
[12] D. Sierra-Sosa, M. Telahun, and A. Elmaghraby, “TensorFlow quantum: Impacts of quantum state preparation on quantum machine learning performance,” IEEE Access, vol. 8, pp. 215246-215255, 2020.
[13] D. Maheshwari, B. Garcia-Zapirain, and D. Sierra-Sosa, “Machine learning applied to diabetes dataset using quantum versus classical computation,” in Proceedings of 2020 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Louisville, KY, 2020, pp. 1-6.
[14] T. Bromley, “Ensemble classification with Rigetti and Qiskit devices,” 2020 [Online]. Available: https://pennylane.ai/qml/demos/ensemble_multi_qpu.html.
[15] D. Sierra-Sosa, J. D. Arcila-Moreno, B. Garcia-Zapirain, and A. Elmaghraby, “Diabetes type 2: Poincaré data preprocessing for quantum machine learning,” Computers, Materials & Continua, vol. 67, no. 2, pp. 1849-1861, 2021.
[16] D. Maheshwari, D. Sierra-Sosa, and B. Garcia-Zapirain, “Variational quantum classifier for binary classification: real vs synthetic dataset,” IEEE Access, vol. 10, pp. 3705-3715, 2022.
[17] M. Benedetti, E. Lloyd, S. Sack, and M. Fiorentini, “Parameterized quantum circuits as machine learning models,” Quantum Science and Technology, vol. 4, no. 4, article no. 043001, 2019. https://doi.org/10.1088/2058-9565/ab4eb5
[18] A. Perdomo-Ortiz, M. Benedetti, J. Realpe-Gomez, and R. Biswas, “Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers,” Quantum Science and Technology, vol. 3, no. 3, article no. 030502, 2018. https://doi.org/10.1088/2058-9565/aab859
[19] Y. Bai, X. Han, T. Chen, and H. Yu, “Quadratic kernel-free least squares support vector machine for target diseases classification,” Journal of Combinatorial Optimization, vol. 30, pp. 850-870, 2015.
[20] S. Moradi, C. Brandner, C. Spielvogel, D. Krajnc, S. Hillmich, R. Wille, and L. Papp, “Clinical data classification with noisy intermediate scale quantum computers,” Scientific Reports, vol. 12, no. 1, article no. 1851, 2022. https://doi.org/10.1038/s41598-022-05971-9
[21] E. H. Houssein, Z. Abohashima, M. Elhoseny, and W. M. Mohamed, “An efficient binary Harris hawks optimization based on quantum SVM for cancer classification tasks,” in Proceedings of the 2nd International Conference on Distributed Sensing and Intelligent Systems (ICDSIS), Virtual Event, 2021, pp. 247-258.
[22] Y. Levine, D. Yakira, N. Cohen, and A. Shashua, “Deep learning and quantum entanglement: Fundamental connections with implications to network design,” in Proceedings of the 6th International Conference on Learning Representations (ICLR poster), Vancouver, Canada, 2018.
[23] I. Cong, S. Choi, and M. D. Lukin, “Quantum convolutional neural networks,” Nature Physics, vol. 15, no. 12, pp. 1273-1278, 2019.
[24] M. Niemiec, “Error correction in quantum cryptography based on artificial neural networks,” Quantum Information Processing, vol. 18, no. 6, article no. 174, 2019. https://doi.org/10.1007/s11128-019-2296-4
[25] S. Lloyd, M. Mohseni, and P. Rebentrost, “Quantum principal component analysis,” Nature Physics, vol. 10, no. 9, pp. 631-633, 2014.
[26] O. P. Patel and A. Tiwari, “Quantum inspired binary neural network algorithm,” in Proceedings of 2014 International Conference on Information Technology, Bhubaneswar, India, 2014, pp. 270-274.
[27] S. Aishwarya, V. Abeer, B. B. Sathish, and K. N. Subramanya, “Quantum computational techniques for prediction of cognitive state of human mind from EEG signals,” Journal of Quantum Computing, vol. 2, no. 4, pp. 157-170, 2020.
[28] X. Tang and L. Shu, “Classification of electrocardiogram signals with RS and quantum neural networks,” International Journal of Multimedia and Ubiquitous Engineering, vol. 9, no. 2, pp. 363-372, 2014.
[29] T. Hofmann, B. Scholkopf, and A. J. Smola, “Kernel methods in machine learning,” Annals of Statistics, vol. 36, no. 3, pp. 1171-1220, 2008.
[30] K. R. Muller, S. Mika, G. Ratsch, K. Tsuda, and B. Scholkopf, “An introduction to kernel-based learning algorithms,” IEEE Transactions on Neural Networks, vol. 12, no. 2, pp. 181-201, 2001.
[31] A. Jovic, K. Brkic, and N. Bogunovic, “A review of feature selection methods with applications,” in Proceedings of 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 2015, pp. 1200-1205.
[32] P. Rebentrost, M. Mohseni, and S. Lloyd, “Quantum support vector machine for big data classification,” Physical Review Letters, vol. 113, no. 13, article no. 130503, 2014. https://doi.org/10.1103/PhysRevLett.113.130503
[33] Y. Ruan, X. Xue, H. Liu, J. Tan, and X. Li, “Quantum algorithm for k-nearest neighbors classification based on the metric of hamming distance,” International Journal of Theoretical Physics, vol. 56, pp. 3496-3507, 2017.
[34] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge University Press, 2004.
[35] D. Coppersmith and S. Winograd, “Matrix multiplication via arithmetic progressions,” Journal of Symbolic Computation, vol. 9, no. 3, pp. 251-280, 1990.
[36] M. Schuld, “Supervised quantum machine learning models are kernel methods,” 2021 [Online]. Available: https://arxiv.org/abs/2101.11020.
[37] H. Ramchoun, Y. Ghanou, M. Ettaouil, and M. A. Janati Idrissi, “Multilayer perceptron: architecture optimization and training,” International Journal of Interactive Multimedia and Artificial Intelligence, vol. 4, no. 1, pp. 26-30, 2016.
[38] C. Shao, “A quantum model for multilayer perceptron,” 2018 [Online]. Available: https://arxiv.org/abs/1808.10561.
[39] C. Shao, “A quantum model of feed-forward neural networks with unitary learning algorithms,” Quantum Information Processing, vol. 19, no. 3, article no. 102, 2020. https://doi.org/10.1007/s11128-020-2592-z
