Cuckoo inspired algorithms for feature selection in heart disease prediction



Introduction
Heart disease is the leading cause of death in many countries, including the UK, USA, Canada, and Wales. Nearly 370,000 deaths are recorded in the US each year due to heart disease, while in India nearly two million people suffer heart failure every year, most of them young. In addition, research showed that in India there is on average one death every 33 seconds, while in the US someone suffers a heart attack every 42 seconds [1]. It is projected that nearly 31% of global deaths are due to heart disease, and the figure is predicted to rise to over 130 million by the year 2035 [1]. Doctors must consider many features when diagnosing heart disease, which makes it difficult to recognize them and reach a diagnosis quickly and accurately. This is why soft computing approaches are being used to assist doctors and improve the prevailing situation.
Presently, there are some heart disease prediction systems (HDPS) based on soft computing paradigms. Most of these HDPS models comprise two parts: feature selection (FS) and classification. In FS, the most relevant features of the heart disease are selected; the selected feature subset is then used as input to the classification part [2], [3]. Heart disease datasets contain both irrelevant and redundant features that contribute nothing and instead add noise to the description of the target class [4]. Removing these redundant and irrelevant features is therefore imperative if classification accuracy is to be improved [5]. Doing so decreases the risk of overfitting, affords better prediction, and reduces computation time because fewer features are used [4], [6].
Several researchers have proposed the use of FS in heart disease prediction. Jabbar et al. [7] used Correlation-based Feature Selection (CFS) and a Random Forest Classifier (RFC) for heart disease diagnosis and achieved better results than those previously reported in the literature. Verma et al. [2] proposed a non-invasive HDPS using Cleveland data from the UCI machine learning repository and Indira Gandhi college data. The system comprises two stages. In the first stage, the Particle Swarm Optimization (PSO) algorithm was applied as a search method with CFS plus K-Means Clustering (KMC) for feature selection and extraction. The results of the first stage were used as input to the second stage, in which four classifiers, namely Multi-layer Perceptron (MLP), C4.5, Multinomial Logistic Regression (MLR), and the Fuzzy Unordered Rule Induction Algorithm (FURIA), were used to train the model. Experimental results showed that MLR had the highest prediction accuracy, 88.4%. Recently, Shah et al. [8] used Probabilistic Principal Component Analysis (PPCA) to handle missing values and to extract features with the help of parallel analysis. The reduced-dimension feature vectors were submitted to a Radial Basis Function (RBF) based Support Vector Machine (SVM) for classification. Accuracies of 82.18%, 85.82%, and 91.30% were achieved on the Cleveland, Hungarian, and Switzerland data, respectively. Similarly, an accuracy of 83% was obtained on arrhythmia data from the UCI machine learning repository in the work of Vivekanandan and Iyengar [3], where modified differential evolution was used for FS and a fuzzy feed-forward neural network for prediction. Finally, Jabbar [9] applied PSO for FS on a heart disease dataset and obtained reliable accuracy with K-nearest neighbor (KNN) as the classifier.
Cuckoo inspired metaheuristic algorithms are of two types: the cuckoo search algorithm (CSA) based on Lévy flight, developed by Yang and Deb [10] in 2009, and the cuckoo optimization algorithm (COA), established by Rajabioun [11] in 2011. Although they are not the same, they share some common characteristics. In this research, the two algorithms are used as filter-based FS in heart disease prediction. Gadekallu and Khare [12] combined CSA with Rough Set Theory (RST) for FS on some heart disease datasets, but so far no study has used COA for FS. In this research, both CSA and COA are used and compared for feature subset selection.
The key goal of this study is to offer an effective and efficient HDPS that predicts heart disease with fewer features and improved accuracy. In the proposed model, both CSA and COA are implemented for FS, and the two cuckoo inspired algorithms are compared. Finally, the reduced feature sets are used to train four well-known classification algorithms, namely naive Bayes (NB), RFC, MLP, and SVM.

Feature Selection
Feature selection (FS) refers to the process of selecting a subset of the original set of features or attributes of a given dataset while ignoring the redundant or irrelevant features [3], [13]. The best feature subset (called the optimal subset) is determined by an evaluation criterion. However, finding the optimal subset is generally intractable, because the number of candidate subsets grows exponentially with the number of features [3], [5], [13]. Numerous problems related to FS have been proved to be NP-hard.
FS can follow a filter or a wrapper model. Filter feature selection uses statistical characteristics of the data to rank features. The highest-ranked features are considered for inclusion, while the lower-ranked features are ignored [14]. Filter methods scale to large amounts of data, are computationally fast, and do not depend on any mining algorithm [5]. The wrapper model, on the other hand, uses a mining algorithm to determine the goodness of the selected features, and the subset that provides the highest performance is selected [14]. The major drawbacks of the wrapper model are that it depends on the classifier, is computationally expensive, and is not suitable for large datasets [3], [13].
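A short worked count may make the intractability concrete. The value n = 20 below is chosen purely for illustration and is not taken from any of the datasets in this study; a set of n features has 2^n - 1 non-empty subsets.

```latex
\[
  \#\text{subsets}(n) = 2^{n} - 1, \qquad
  \#\text{subsets}(20) = 2^{20} - 1 = 1{,}048{,}575 .
\]
```

Each additional feature roughly doubles this count, which is why exhaustive evaluation of every subset quickly becomes impractical.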
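As an illustration of the filter idea only, the following minimal sketch ranks features by the absolute Pearson correlation between each feature column and the class label and keeps the top-ranked ones. The scoring rule, the function names `pearson` and `rank_features`, and the toy data are assumptions made for this example; they are not the evaluation measure used in this study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no ranking information
    return cov / math.sqrt(var_x * var_y)

def rank_features(rows, labels, top_k):
    """Return the indices of the top_k features by |correlation| with the label."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    # Highest score first; ties are broken by the lower feature index.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores[:top_k]]

# Toy data (made up): feature 0 tracks the label, feature 1 is constant,
# feature 2 is only partly related to the label.
rows = [[1, 5, 2], [2, 5, 1], [3, 5, 2], [4, 5, 3]]
labels = [0, 0, 1, 1]
selected = rank_features(rows, labels, top_k=2)
```

No classifier is trained at any point, which is the property that distinguishes a filter from a wrapper.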
In this study, filter-based FS is employed to choose the most relevant attributes or features from the heart disease datasets.The general filter algorithm by Liu and Yu [13] depicted in Fig. 1 is adapted and enhanced with the cuckoo inspired algorithms.

Cuckoo Inspired Algorithms
The two cuckoo inspired algorithms used in this study are briefly described in the following subsections.

Cuckoo Search Algorithm
The cuckoo search algorithm (CSA) is a swarm-based metaheuristic optimization algorithm developed by Yang and Deb [10]. The algorithm was inspired by the life cycle of some birds, chiefly cuckoos. To simplify the description of the CSA, Yang and Deb [10] listed the following idealized rules: 1) Each cuckoo lays one egg at a time and dumps its egg in a randomly chosen nest; 2) The best nests with high-quality eggs carry over to the next generation; 3) The number of available host nests is fixed, and the egg laid by a cuckoo is discovered by the host bird with a probability $p_a \in [0,1]$. In that case, the host bird either throws the cuckoo's egg out or abandons the nest and builds an entirely new one.
The last rule is approximated by abandoning a fraction $p_a$ of the $n$ nests and replacing them with new nests in each generation. Fundamentally, these three rules provide a selection mechanism for the optimization algorithm, guaranteeing that the best eggs survive from generation to generation. In a maximization problem, the quality or fitness of a solution is proportional to the objective function. The CSA is summarized in Fig. 2.
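The abandonment rule can be sketched as follows for a maximization problem. This is a minimal illustration, not the implementation used in this study: the helper names `abandon_worst` and `sphere_fitness`, the uniform re-initialization of abandoned nests, and the toy bounds are all assumptions.

```python
import random

def abandon_worst(nests, fitness, pa, lower, upper, rng):
    """Replace the worst fraction pa of the nests with fresh random nests."""
    ranked = sorted(nests, key=fitness, reverse=True)  # best nest first
    n_drop = int(pa * len(ranked))                     # number of nests to abandon
    keep = ranked[:len(ranked) - n_drop]               # the best nests survive
    dim = len(nests[0])
    fresh = [[rng.uniform(lower, upper) for _ in range(dim)]
             for _ in range(n_drop)]
    return keep + fresh

def sphere_fitness(x):
    """Toy fitness (assumed): larger is better, maximum at the origin."""
    return -sum(v * v for v in x)

rng = random.Random(0)
nests = [[rng.uniform(-5, 5) for _ in range(3)] for _ in range(8)]
new_nests = abandon_worst(nests, sphere_fitness, pa=0.25,
                          lower=-5, upper=5, rng=rng)
```

With 8 nests and a fraction of 0.25, two nests are rebuilt and the six best are carried over unchanged, so the population size stays fixed as rule 3 requires.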
When generating a new solution $x_i^{(t+1)}$ for cuckoo $i$, a Lévy flight is performed as shown in (1).
$$x_i^{(t+1)} = x_i^{(t)} + \alpha \oplus \text{Lévy}(\lambda) \tag{1}$$
where $\alpha > 0$ is the step size, which should be related to the scale of the problem of interest; $\alpha = O(1)$ is commonly used. The product symbol $\oplus$ denotes entry-wise multiplication.
A Lévy flight is essentially a random walk whose random steps are drawn from a Lévy distribution with large steps, as shown in (2).
$$\text{Lévy} \sim u = t^{-\lambda}, \quad 1 < \lambda \le 3 \tag{2}$$
This distribution has both an infinite variance and an infinite mean. The consecutive jumps or steps of a cuckoo thus form a random walk process that follows a power-law step-length distribution with a heavy tail. In summary, the original CSA uses three parameters: 1) the population size $n$; 2) the parameter $\alpha$, the step size scaling factor; and 3) the switching parameter $p_a$, which is the fraction of the eggs discarded. The larger the value of $p_a$, the more weight is given to exploration and the lower the chance of getting trapped in a local optimum, and vice versa. Marichelvam et al. [15] consider parameters such as $p_a$ and $\alpha$ to be critical for locating the best solution. However, Yang and Deb [10] reported that the convergence of the algorithm to the best solution is largely independent of the value of $p_a$, although $p_a = 0.25$ returned the best results.
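Since Mantegna's algorithm is widely used to obtain random numbers in Lévy flights [10], the study applied it to compute the step length $s$ as indicated by (3).
$$s = \frac{u}{|v|^{1/\beta}} \tag{3}$$
where $\beta$ is a parameter in the interval [1, 2]. Similarly, $u$ and $v$ are defined in terms of the normal distribution in (4) through (5) as shown:
$$u \sim N(0, \sigma_u^2), \quad v \sim N(0, \sigma_v^2) \tag{4}$$
$$\sigma_u = \left\{ \frac{\Gamma(1+\beta)\,\sin(\pi\beta/2)}{\Gamma\left[(1+\beta)/2\right]\,\beta\,2^{(\beta-1)/2}} \right\}^{1/\beta}, \quad \sigma_v = 1 \tag{5}$$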
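A minimal sketch of a Lévy step drawn with Mantegna's algorithm is given below, assuming the commonly stated form s = u / |v|^(1/beta) with u normal with standard deviation sigma_u and v standard normal. The function names, the value beta = 1.5, and the step size alpha = 0.01 are assumptions for illustration, and the entry-wise product is simplified to one independent step per coordinate.

```python
import math
import random

def mantegna_sigma_u(beta):
    """Standard deviation of u in Mantegna's algorithm (sigma_v is 1)."""
    num = math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
    den = math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2)
    return (num / den) ** (1 / beta)

def levy_step(beta, rng):
    """One heavy-tailed step s = u / |v|**(1/beta)."""
    u = rng.gauss(0.0, mantegna_sigma_u(beta))
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against a (practically impossible) zero draw
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)

def levy_move(x, alpha, beta, rng):
    """New candidate: each coordinate moves by alpha times its own Levy step."""
    return [xi + alpha * levy_step(beta, rng) for xi in x]

rng = random.Random(42)
current = [0.5, 0.5, 0.5]
candidate = levy_move(current, alpha=0.01, beta=1.5, rng=rng)
```

Most steps produced this way are small, but occasional very large ones occur, which is the heavy-tail behaviour that lets the search escape local optima.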

Cuckoo Optimization Algorithm
Rajabioun [11] presented another evolutionary optimization algorithm, named the cuckoo optimization algorithm (COA). The foundation of this optimization algorithm is the way cuckoos lay their eggs and raise their young.
According to the author, cuckoos exist in two forms: mature cuckoos and eggs. Mature cuckoos lay their eggs in other birds' nests. If the laid eggs are not recognized and are not killed by the host bird, they grow up, turn into mature cuckoos, and start forming societies. The idea behind this algorithm is that every society has its own habitat region in which to live. Mature cuckoos therefore look for a better habitat with a higher chance of egg survival as their destination for laying eggs. Each cuckoo then lays its eggs randomly inside the egg laying radius (ELR) of its habitat. The number of eggs that survive in a place is the quantity that COA optimizes. The algorithm terminates when all surviving mature cuckoos converge on the best habitat for breeding and reproduction. This best location is the global maximum of the objective function. A typical cuckoo optimization algorithm is shown in Fig. 3.

Fig. 2. The cuckoo search algorithm
Objective function f(x), x = (x_1, x_2, …, x_d)^T;
Generate an initial population of n host nests x_i (i = 1, 2, …, n);
While (t < MaxGeneration) or (stop criterion)
    Get a cuckoo randomly by Lévy flight;
    Evaluate its quality/fitness F_i;
    Choose a nest among n (say, j) randomly;
    If (F_i > F_j)
        Replace j with the new solution;
    End If;
    Abandon a fraction (p_a) of the worse nests and build new ones;
    Keep the best solutions (the nests with quality solutions);
    Rank the solutions and find the current best;
End While;
Post-process results and visualization;

Fig. 3. The cuckoo optimization algorithm
2: Initialize cuckoo habitats with some random points on the objective function;
3: Dedicate some eggs to each cuckoo;
4: Define the ELR for each cuckoo;
5: Let the cuckoos lay their eggs inside their corresponding ELR;
6: Destroy the cuckoo eggs that are recognized by the host birds;
7: Let the eggs hatch and the chicks grow;
8: Evaluate the habitat of each newly grown cuckoo;
9: Limit the maximum number of cuckoos in the environment and destroy those that live in the worst habitats;
10: Cluster the cuckoos, find the best cluster, and select the goal habitat;
11: Let the new cuckoo population migrate toward the goal habitat;
12: If the stop condition is satisfied, stop; otherwise go to 3;
13: End.

In addition, its basic rules are as follows. The variables of the optimization problem are arranged in an array called a "habitat". In an $N_{var}$-dimensional optimization problem, a habitat is an array of size $1 \times N_{var}$ that represents the current living position of the cuckoo, as defined in (6).
$$\text{habitat} = [x_1, x_2, \ldots, x_{N_{var}}] \tag{6}$$
The profit of a habitat is obtained by evaluating the profit function $f_p$ at the habitat $(x_1, x_2, \ldots, x_{N_{var}})$, as in (7):
$$\text{profit} = f_p(\text{habitat}) = f_p(x_1, x_2, \ldots, x_{N_{var}}) \tag{7}$$
To apply COA to minimization problems, simply maximize the negative of the cost function $f_c$, as shown in (8).
$$\text{profit} = -\text{cost}(\text{habitat}) = -f_c(x_1, x_2, \ldots, x_{N_{var}}) \tag{8}$$
To begin the optimization, a habitat matrix of size $N_{pop} \times N_{var}$ is generated. In nature, each cuckoo lays five to twenty eggs, so these values are used as the lower and upper limits on the number of eggs assigned to each cuckoo in each iteration. The eggs are laid within a maximum distance from the cuckoo's habitat, the ELR. The ELR depends on three quantities: the variable limits, the number of the current cuckoo's eggs, and the total number of eggs, as given in (9).
$$ELR = \alpha \times \frac{\text{number of current cuckoo's eggs}}{\text{total number of eggs}} \times (var_{hi} - var_{low}) \tag{9}$$
In (9), $\alpha$ is an integer that controls the maximum value of the ELR, and $var_{hi}$ and $var_{low}$ are the upper and lower limits of the variables, respectively. After the cuckoos have laid their eggs, $p\%$ of all eggs (usually 10%), those with the lowest profit (highest cost), are eliminated. The cuckoo societies are grouped using the KMC technique; a value of k = 3 to 5 proved sufficient in the simulations.
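The egg laying radius can be computed directly from its three ingredients. The sketch below assumes the usual form ELR = alpha * (eggs of this cuckoo / total eggs) * (var_hi - var_low); the function names, the per-coordinate uniform placement of eggs, and the clipping to the variable limits are simplifying assumptions for illustration.

```python
import random

def egg_laying_radius(alpha, eggs_of_cuckoo, total_eggs, var_hi, var_low):
    """ELR grows with the cuckoo's share of all eggs and with the variable range."""
    return alpha * (eggs_of_cuckoo / total_eggs) * (var_hi - var_low)

def lay_eggs(habitat, n_eggs, elr, var_low, var_hi, rng):
    """Place n_eggs around the habitat, at most elr away in each coordinate."""
    eggs = []
    for _ in range(n_eggs):
        egg = [min(var_hi, max(var_low, x + rng.uniform(-elr, elr)))
               for x in habitat]
        eggs.append(egg)
    return eggs

# A cuckoo holding 5 of 50 eggs in a search space bounded by [0, 10]:
# ELR = 1 * (5 / 50) * (10 - 0) = 1.0
elr = egg_laying_radius(alpha=1, eggs_of_cuckoo=5, total_eggs=50,
                        var_hi=10.0, var_low=0.0)
habitat = [9.5, 0.2]
eggs = lay_eggs(habitat, n_eggs=5, elr=elr, var_low=0.0, var_hi=10.0,
                rng=random.Random(1))
```

A cuckoo that holds a larger share of the eggs searches a wider neighbourhood, while the clipping keeps every egg inside the variable limits.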
Each cuckoo flies only $\lambda\%$ of the way toward the goal habitat, with a deviation of $\varphi$ radians. These two parameters help the cuckoos search more positions in the environment. They are defined as follows:
$$\lambda \sim U(0, 1), \quad \varphi \sim U(-\omega, \omega) \tag{10}$$
where $\lambda \sim U(0, 1)$ means that $\lambda$ is a uniformly distributed random number between zero and one, and $\omega$ is the parameter that constrains the deviation from the goal habitat. An $\omega$ of $\pi/6$ rad appears to be sufficient for good convergence of the cuckoo population to the global maximum profit.
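The migration step can be illustrated in two dimensions, where the deviation is a plain rotation of the direction toward the goal. This is a sketch under stated assumptions: the function name `migrate_2d` and the restriction to 2-D are choices made for the example, and the flight fraction and deviation angle are drawn uniformly from [0, 1] and [-omega, omega] respectively.

```python
import math
import random

def migrate_2d(position, goal, omega, rng):
    """Fly a random fraction of the way toward goal, deviated by a random angle."""
    lam = rng.uniform(0.0, 1.0)      # fraction of the distance actually flown
    phi = rng.uniform(-omega, omega)  # deviation angle in radians
    dx = goal[0] - position[0]
    dy = goal[1] - position[1]
    # Rotate the direction vector toward the goal by phi.
    rx = dx * math.cos(phi) - dy * math.sin(phi)
    ry = dx * math.sin(phi) + dy * math.cos(phi)
    return (position[0] + lam * rx, position[1] + lam * ry)

rng = random.Random(2)
straight = migrate_2d((0.0, 0.0), (10.0, 0.0), 0.0, rng)          # no deviation
deviated = migrate_2d((0.0, 0.0), (10.0, 0.0), math.pi / 6, rng)  # up to 30 degrees
```

With a zero deviation limit the cuckoo lands on the straight segment to the goal; with a limit of pi/6 it lands somewhere inside a 60 degree wedge around that segment, never farther away than the goal itself.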

The Proposed Method

CSA Filter-Based Feature Selection
The proposed filter-based FS embeds both the general filter-based FS (Algorithm 1, Fig. 1) and the cuckoo search algorithm (Algorithm 2, Fig. 2). The details of the pseudocode are shown in Fig. 4.

COA Filter-Based Feature Selection
The proposed filter-based FS embeds both the cuckoo optimization algorithm (Algorithm 3, Fig. 3) and the general filter-based FS (Algorithm 1, Fig. 1). The details of the pseudocode are shown in Fig. 5.
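Both cuckoo inspired algorithms search over continuous positions, whereas a feature subset is a discrete choice. One common way to bridge the two, shown here purely as an assumption since the encoding is not stated in this section, is to treat each coordinate of a nest or habitat as a score for one feature and to keep the features whose score exceeds a threshold. The function name `position_to_subset` and the threshold 0.5 are hypothetical.

```python
def position_to_subset(position, threshold=0.5):
    """Map a continuous position in [0, 1]^d to a list of selected feature indices."""
    subset = [j for j, value in enumerate(position) if value > threshold]
    if not subset:
        # Never return an empty subset: fall back to the highest-scoring feature.
        subset = [max(range(len(position)), key=lambda j: position[j])]
    return subset

# A 3-dimensional position selects features 0 and 2.
chosen = position_to_subset([0.9, 0.1, 0.6])
# If no coordinate passes the threshold, the single best coordinate is kept.
fallback = position_to_subset([0.1, 0.3, 0.2])
```

Under this encoding, the filter's evaluation criterion scores the decoded subset, and that score serves as the fitness of the nest (CSA) or the profit of the habitat (COA).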

Dataset Description
Five heart disease datasets are used as input in this study. Details of the datasets, such as the number of features and instances, are shown in Table 1. The data were obtained from the University of California Irvine (UCI) machine learning repository at https://archive.ics.uci.edu/ml, except for the Eric dataset, which can be obtained at http://eric.univ-lyon2.fr/~ricco/tanagra/fichiers/heart_disease_male.xls. The datasets are cleaned: incomplete instances are deleted, and the few missing values that do not contribute significantly are imputed using both backward and forward fill in Python (Anaconda Navigator).
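Forward and backward fill can be sketched in plain Python on a single column, with `None` marking a missing value. The order used here (forward fill first, then backward fill for any leading gaps) and the function names are assumptions; in pandas the comparable operations are the `ffill` and `bfill` methods.

```python
def forward_fill(values):
    """Replace each missing value with the last seen non-missing value."""
    filled, last = [], None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def backward_fill(values):
    """Replace each missing value with the next non-missing value."""
    return forward_fill(values[::-1])[::-1]

def fill_both(values):
    """Forward fill, then backward fill whatever is still missing at the start."""
    return backward_fill(forward_fill(values))

column = [None, 1, None, 3, None]
imputed = fill_both(column)
```

Forward fill alone would leave the first entry missing because nothing precedes it, which is why the two directions are combined.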

Performance Measure
The performance of the proposed filter-based algorithms is measured by the number of selected features and the classification accuracy. The details are explained in the following sections:

1) Number of features
The essence of the two proposed filter-based algorithms is to produce fewer features that contribute significantly to heart disease prediction while improving prediction accuracy. Each algorithm selects the most informative features from each dataset. In addition, the selected features are compared with those reported in the literature.

3) Hungarian
Fig. 8 shows clearly that SVM outperformed the other classifiers in accuracy both before and after FS for both proposed algorithms. Therefore, SVM is a good choice for the Hungarian data. Similarly, CSA performed better than COA both before and after FS. The results obtained in this study are also compared with the work of Gadekallu and Khare [12], where CSA was combined with RST for FS. The comparison showed that the proposed CSA-based FS surpasses that approach both in the number of selected features and in prediction accuracy, as depicted in Table 3.

4) Stat log
As with the previous datasets, the Stat log data has higher accuracy after FS than before. In addition, CSA proved to be more effective, with higher accuracy values than COA. Similarly, SVM recorded a high level of accuracy in all settings for both COA and CSA. The accuracy of each classifier is shown in detail in Fig. 9. Moreover, the results obtained are also compared with Liu et al. [16], where ReliefF and rough set (RFRS) are used for FS. Similarly, a Bounded Sum of Weighted Fuzzy Membership functions (BSWFM) together with Euclidean distance (ED) was used for FS on the Stat log dataset in the work of Lee [17]. Tomar and Agarwal [18] proposed least square twin SVM (LSTSVM) based FS on the same dataset. Buscema et al. [19] used the training with input selection and testing (TWIST) algorithm for FS. Finally, Subbulakshmi et al. [20] used an extreme learning machine (ELM) to select the most relevant features from the Stat log dataset. However, the choice of classifier affects the prediction accuracy. The comparison shows that the proposed CSA-based FS performed better in terms of both the number of selected features and classification accuracy. Table 4 summarizes the detailed comparison.

5) Z-Alizadeh Sani
Finally, this dataset also showed that SVM outperforms the other classifiers both before and after FS with the proposed algorithms. Fig. 10 summarizes the accuracy attained by each classifier for each of the proposed algorithms.
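For reference, classification accuracy as used in these comparisons is the fraction of instances whose predicted class equals the true class. The sketch below is a minimal illustration; the labels and predictions are made up and are not results from this study.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Made-up labels and predictions for illustration only.
y_true = [1, 0, 1, 1]
pred_before_fs = [1, 1, 0, 1]  # hypothetical classifier output on all features
pred_after_fs = [1, 0, 0, 1]   # hypothetical output on the selected features

acc_before = accuracy(y_true, pred_before_fs)
acc_after = accuracy(y_true, pred_after_fs)
```

Comparing the two values for the same classifier is the kind of before and after FS comparison reported in the figures.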

Fig. 7. Accuracy of Echocardiogram dataset using different classifiers

Fig. 9. Accuracy of Stat log dataset using different classifiers

Fig. 10. Accuracy of Z-Alizadeh Sani dataset using different classifiers

Usman et al. (Cuckoo inspired algorithms for feature selection in heart disease prediction)

Table 1. Heart disease datasets description

Table 3. Comparison analysis of Hungarian dataset

Table 4. Comparison analysis of Stat log dataset