Feature selection using regression mutual information deep convolution neuron networks for COVID-19 X-ray image classification

Chest radiography (CXR) is usually required for assessing lung severity. However, interpreting chest X-rays for COVID-19 requires the knowledge of expert radiologists. This study aims to improve COVID-19 X-ray image classification through feature selection with regression mutual information deep convolution neuron networks (RMI Deep-CNNs). The dataset consists of 219 COVID-19, 500 viral pneumonia, and 500 normal chest X-ray images. Pre-trained deep CNNs were used to extract a very large set of image features from the CXR images; feature selection was then applied to reduce model complexity and overfitting. The critical features were selected using regression mutual information and passed to a fully connected layer with softmax for classification. Two networks, ResNet152V2 and InceptionV3, were compared. Their classification performance (accuracy, sensitivity, specificity) was 92.21%, 100%, 90% and 91.39%, 100%, 82.50%, respectively. In addition, RMI Deep-CNNs not only improve accuracy but also reduce the number of trainable features by over 80%. This approach tends to improve both computation time and model accuracy for COVID-19 classification.

an accuracy of 83.5%. Ayrton [13] reported a validation accuracy of 96.2% using a small dataset of 339 images with ResNet50. Transfer learning has worked well in earlier studies, although it still involves many convolution and max-pooling operations. To avoid the limitations of complex pre-trained models, this study proposes regression mutual information (RMI), which measures the relationship between features and the target class. General mutual information (MI) is an information-theoretic metric that can describe relationships between variables even when those relationships are highly non-linear and hidden in high-dimensional data. It is independent of any classifier. A feature with higher MI with the target class is more suitable for classification tasks. Studies applying MI to enhance DL networks are currently expanding [14]-[18].
When PCR tests suffer from limitations [19], [20], CXR and CT are necessary and are readily available even in rather remote areas. A few studies have reported promising results for diagnosis based on CXR imaging [21], [22]. Convolutional neural network (CNN) architectures for the diagnosis of COVID-19 were proposed by Narin et al. [12], who demonstrated that a pre-trained ResNet50 model achieved an accuracy of 98%. Because it was challenging to discriminate between typical pneumonia and COVID-19, Wang et al. [23] created COVID-Net to identify CXR images of COVID-19 patients among patients with viral infections, bacterial infections, and healthy individuals. Although the sample size was very small and no information about the method's reliability was provided, COVID-Net attained a PPV of 88.9% and a sensitivity of 80%. Biraja G., et al. [21] applied a Bayesian technique to CXR-based COVID-19 diagnosis in order to estimate uncertainty, with interesting findings. Nevertheless, the samples are insufficient to capture statistical variability. Our method adds extra COVID-19 samples to existing datasets, discriminates among normal, viral pneumonia, and COVID-19 cases, and finally applies feature selection using regression mutual information. This approach is intended to address the drawbacks of state-of-the-art methodologies.
Due to the complexity of general MI, the search technique for adding or removing features based on high or low scores is often complicated. Mutual information measures the decrease in entropy when the target value is known. Mutual information estimators rely on smoothing parameters, the greedy feature selection approach lacks a theoretically supported stopping condition, and the estimation itself is hampered by the extremely high dimensionality of the data. To address these problems, regression mutual information (RMI) is proposed. The contributions of this study are summarized as follows. First, the experiments show that transfer learning from ImageNet can be applied to other domains through fine-tuning. Fine-tuning is a common transfer learning technique that lets a pre-trained model classify images from classes it was never trained on. Second, the proposed method remains an effective model that maintains high performance when using the regression mutual information scheme.

Datasets
The experimental dataset consists of 219 COVID-19 chest X-ray images downloaded from Dr. Joseph Cohen's open-source GitHub repository [23]. Additionally, 500 images of viral pneumonia and 500 images of normal chest X-rays were chosen from the Kaggle repository "Chest X-Ray Images (Pneumonia)" [24]. To match the input sizes of the pre-trained models, all images in this dataset were resized to 224×224 and 299×299 pixels. Representative chest X-ray images of healthy people, people with viral pneumonia, and people with COVID-19 are shown in Fig. 1.

Experimental setup
The Inception architecture was first introduced as GoogLeNet in 201. The Inception model is made up of various Inception modules. The Inception V3 model, which was introduced in 2015, has 42 layers overall and a lower error rate than its predecessors. The final Inception V3 model is shown as

Performance metrics
To evaluate the performance of the different pre-trained models, K-fold cross-validation was used to validate the trained models. The effectiveness of the networks was compared using three performance measures: accuracy, sensitivity, and specificity. The predictive formulas were defined as: The experimental dataset poses a 3-class classification problem. Unlike binary classification, the performance was measured for each individual class. For example, the formulas for class 1 were defined as:
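The per-class measures can be illustrated with a short sketch. It assumes the usual one-vs-rest definitions built from true/false positives and negatives, which may differ in detail from the formulas used in this study. The function name `per_class_metrics` and the confusion-matrix counts are hypothetical and are not results from this study.

```python
import numpy as np

def per_class_metrics(conf_mat, cls):
    """One-vs-rest accuracy, sensitivity, and specificity for class index `cls`.

    conf_mat[i, j] counts samples whose true class is i and predicted class is j.
    """
    conf_mat = np.asarray(conf_mat, dtype=float)
    total = conf_mat.sum()
    tp = conf_mat[cls, cls]                 # class `cls` predicted as `cls`
    fn = conf_mat[cls, :].sum() - tp        # class `cls` predicted as another class
    fp = conf_mat[:, cls].sum() - tp        # another class predicted as `cls`
    tn = total - tp - fn - fp               # everything else
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
# These counts are made up for illustration only.
example = [[40, 2, 1],
           [3, 90, 7],
           [0, 5, 95]]
metrics_class0 = per_class_metrics(example, 0)
```

Calling `per_class_metrics(example, k)` for k = 0, 1, 2 gives one set of measures per class, matching the per-class reporting described above.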

Feature extraction using pre-trained models
Deep learning models have been used effectively to classify and segment many medical datasets and to detect lesions. In this study, ResNet152V2 and InceptionV3 were used to extract the image features. Fig. 5 shows several versions of an X-ray image with different highlighted features. However, some images contained weak information (row 2, column 3). In feature extraction, the main objective of deep learning is to discover useful representations [25]. Mutual information was proposed to address this problem: useful representations are learned by maximizing the mutual information between the complete input and the encoder output.

Mutual Information Evaluation
Information theory can be used to quantify how much information is shared between two related variables. When one variable is known, the uncertainty in the other is reduced: when the state of Y is known, the uncertainty in the state of X decreases and the amount of pertinent information increases. Classic mutual information is typically calculated from conditional entropy and probability distributions. The pointwise mutual information H(X; Y) estimates the posterior knowledge gained for each dependent pair. The mutual information criterion is a search technique for choosing candidate feature sets X [26]. In this search, the complexity typically dictates how each feature is added or removed based on high or low scores. Therefore, a regression between the features and the target class was established. An image is encoded by a convolutional neural network into a feature map of M x M feature vectors corresponding to N input patches. These vectors were flattened into a single feature vector, x. In this study, the regression mutual information (RMI) was computed as:

$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

The RMI score measures the correlation between the observed outcomes and the observed predictor values and is expressed as R-squared ($R^2$). $R^2$, which typically ranges from 0 to 1, is the square of the coefficient of multiple correlation when additional regressors are included. Then, the new feature set $\hat{x}$ was selected from the top-5 RMI scores. Fig. 6 shows the regression mutual information evaluation process. For example, the features of image 1 (see Fig. 3) were extracted and flattened into 32 patches of 56×56 pixels. The RMI scores were computed, and the top-5 were selected. Suppose the feature vector does not support a useful representation.
To select the most informative features, RMI was computed over the whole input, so that the feature set retained only useful inputs. The RMI scores were then computed, and the feature maps with the top-5 RMI scores were selected.
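The selection step can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes that each feature map is scored by the R-squared of an ordinary least-squares regression of the class label on that map's flattened values, and that labels are encoded numerically as 0, 1, 2. The function names, array shapes, and synthetic data are all hypothetical.

```python
import numpy as np

def rmi_score(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(X)), X])        # prepend the intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)      # least-squares coefficients
    residual = y - A @ beta
    ss_res = float(residual @ residual)               # residual sum of squares
    ss_tot = float(((y - y.mean()) ** 2).sum())       # total sum of squares
    return 1.0 - ss_res / ss_tot

def select_top_k_maps(feature_maps, y, k=5):
    """feature_maps has shape (n_samples, n_maps, d); returns top-k map indices and all scores."""
    scores = np.array([rmi_score(feature_maps[:, m, :], y)
                       for m in range(feature_maps.shape[1])])
    top = np.argsort(scores)[::-1][:k]                # indices sorted by descending score
    return top, scores

# Synthetic stand-in for CNN features: 200 samples, 32 maps, 4 values per map.
rng = np.random.default_rng(0)
n_samples, n_maps, d = 200, 32, 4
feature_maps = rng.normal(size=(n_samples, n_maps, d))
y = rng.integers(0, 3, size=n_samples).astype(float)  # assumed numeric labels 0, 1, 2

# Make map 7 informative on purpose so the selection has something to find.
feature_maps[:, 7, 0] = y + 0.1 * rng.normal(size=n_samples)

top_maps, scores = select_top_k_maps(feature_maps, y, k=5)
selected = feature_maps[:, top_maps, :].reshape(n_samples, -1)  # reduced feature set
```

In this sketch each map has far fewer values than there are samples. If a flattened map had as many values as samples, the regression would fit perfectly and every score would approach 1, so real feature maps would need pooling or another reduction first.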

Fine-tune and Classification Layer
Training a CNN on a small dataset, such as a medical image dataset, often impairs the CNN's ability to generalize. Therefore, a transfer learning network was used to learn features. The final layer (the softmax layer) is typically truncated and replaced with a new softmax layer. For instance, a network pre-trained on ImageNet has a 1000-category softmax layer. Our experiment involves three categories of chest X-ray images, so the new softmax layer has only 3 categories instead of 1000. Cross-validation was used while fine-tuning the network by backpropagation, starting from the pre-trained weights. Then, the new features $\hat{x}$ were passed through this network to the fully connected layer for the classification task.
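The replacement of the 1000-category layer by a 3-category layer can be pictured with a small NumPy stand-in. This is only a sketch of the shape of the new classification head, not the deep learning framework code used in the study. The class name `SoftmaxHead`, the weight initialization, and the random input batch are assumptions, and no training is shown. The input width of 245 is the RMI ResNet152V2 feature count reported in Table 2.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

class SoftmaxHead:
    """Fully connected layer followed by softmax (hypothetical stand-in for the new head)."""

    def __init__(self, n_features, n_classes=3, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(scale=0.01, size=(n_features, n_classes))  # small random weights
        self.b = np.zeros(n_classes)

    def predict_proba(self, x):
        return softmax(x @ self.W + self.b)

# Three output classes: normal, viral pneumonia, COVID-19.
head = SoftmaxHead(n_features=245, n_classes=3)
batch = np.random.default_rng(1).normal(size=(4, 245))  # placeholder for selected features
probs = head.predict_proba(batch)                        # shape (4, 3); each row sums to 1
```

Only the head changes size here; the pre-trained convolutional layers that produce the features keep their ImageNet weights as the starting point for fine-tuning.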

Results and Discussion
The overall objective of this research is to demonstrate the utility of our novel RMI approach for COVID-19 diagnosis. Therefore, two sets of experiments were conducted. First, the original pre-trained models were trained to classify CXR images. Second, the feasibility of applying RMI to enrich the traditional models and improve diagnostic accuracy was evaluated.

Experiment 1: original pre-trained architecture
The input images are fed into the trained ResNet152V2 and InceptionV3 to extract image features. Overall results are summarized in Table 1. Using all the CXR features generated by ResNet152V2, we obtain an accuracy of 92.21% (sensitivity = 100%, specificity = 90%). Using the features generated by InceptionV3, we obtain an accuracy of 91.39% (sensitivity = 100%, specificity = 82.50%). As seen, ResNet152V2 performed best on the CXR dataset, while InceptionV3 performed worst. Compared directly with InceptionV3, ResNet152V2 produces superior diagnostic outcomes. This outcome may be explained by the fact that classifiers need convolutional layers to fit the data more precisely. ResNet152V2 with RMI obtains an accuracy of 98.77% (sensitivity = 100%, specificity = 98.02%), while InceptionV3 with RMI obtains an accuracy of 93.44% (sensitivity = 92.86%, specificity = 93.44%).
Vol. 8, No. 2, July 2022, pp. 199-209. Yampaka et al. (Feature selection using regression mutual information deep convolution neuron networks for…)

Experiment 2: apply RMI to enrich the traditional models for improved diagnosis
To study the features that contribute to the target class, the regression mutual information of each feature set was calculated, and the source image for each feature was selected. The feature sets with the top-5 RMI scores formed the final features. Table 2 summarizes the number of trainable features for the different models (original vs. RMI). Table 2 shows that the two original models each used 2,048 features, while RMI ResNet152V2 used 245 and RMI InceptionV3 used 320. This corresponds to a reduction of 88% for RMI ResNet152V2 and 84.37% for RMI InceptionV3. In addition, RMI not only reduces the number of trainable features but also improves the accuracy of COVID-19 diagnosis.
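The reduction percentages follow directly from the feature counts in Table 2, as the short check below shows. The function name `reduction_percent` is introduced here for illustration only.

```python
def reduction_percent(original, reduced):
    """Percentage of features removed when going from `original` to `reduced`."""
    return 100.0 * (original - reduced) / original

# Feature counts taken from Table 2.
resnet_reduction = reduction_percent(2048, 245)      # (2048 - 245) / 2048, about 88.04
inception_reduction = reduction_percent(2048, 320)   # (2048 - 320) / 2048 = 84.375
```

Both values exceed the 80% reduction stated in the abstract.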

COVID-19 predictions and explanations
Interpreting features is important not only for explanation but also for confirming the diagnosis. Highlighting the important areas helps physicians use their interpretive abilities to diagnose patients more quickly and accurately [27]. The significant features can be located from the regions where the activation maps overlay the original image.
As seen in Fig. 7, the feature maps with RMI generated by ResNet152V2 are more accurate than those generated by InceptionV3. The rationale is that ResNet152V2 with RMI emphasizes joined features more than specific components. Because it highlights regions much more precisely, it provides more human-interpretable explanations. The confusion matrix of the best-performing model is shown in Fig. 8. Table 3 demonstrates that the majority of samples are correctly identified by the original ResNet152V2, with 0.93, 0.92, and 0.92, respectively. The RMI ResNet152V2 scores are slightly higher, at 0.94, 0.93, and 0.92, respectively. The positive predictive value (PPV) was computed from these findings to assess whether infected individuals would be diagnosed as positive. Only five of the 224 COVID-19 patient samples in our test set were incorrectly identified as pneumonia, yielding a PPV for COVID-19 cases of 97.76%, surpassing comparable techniques [27], [28]. To provide a clearer picture of both the original and RMI scenarios, we also report the class-specific measurements in Table 4.

Conclusion
This study proposed RMI-DeepCNN to predict COVID-19 from CXR images. Two pre-trained models, ResNet152V2 and InceptionV3, were used to classify CXR images as normal, viral pneumonia, or COVID-19. The best model is RMI-ResNet152V2, which achieves an accuracy of 98.77% (sensitivity: 100%; specificity: 98.02%). According to the evaluation results, our method outperforms a recent method, with a PPV of 97.76% and a recall of 81%. Our experiments and findings support the following conclusions. First, extending a model with the feature selection method can still perform better than using only the original features, even when a general strategy does not. Second, since precise diagnosis is crucial, models with many trainable parameters and deeper layers can produce correct predictions at inference time. Selecting a subset of all features using RMI may be one particular strategy. This study has some limitations. First, the CXR images of COVID-19 cases are insufficient to prevent our models from overfitting. Second, the diagnoses and localization were not compared with those of radiologists in terms of accuracy. In the future, we intend to overcome these limitations.

Declarations
Author contribution. All authors contributed equally as main contributors to this paper. All authors read and approved the final paper.
Funding statement. None of the authors have received any funding or grants from any institution or funding body for the research.