Covid-19 detection from chest X-ray images: comparison of well-established convolutional neural network models

Coronavirus disease 2019 (Covid-19) is a pandemic disease that has already killed hundreds of thousands of people and infected millions more. At its most severe, the disease leads to pneumonia and, in extreme cases, death. Covid-19 produces radiological cues that can be easily detected on chest X-rays and that distinguish it from other types of pneumonia. Several recent studies using CNN models have focused only on developing binary classifiers that distinguish between Covid-19 and normal chest X-rays. However, no previous study has compared the performance of established pre-trained CNN models on a multi-class problem that includes Covid-19, pneumonia and normal chest X-rays. This study therefore formulates an automated system to detect Covid-19 from chest X-ray images using four established and powerful CNN models, AlexNet, GoogleNet, ResNet-18 and SqueezeNet, and compares the performance of each model. A total of 21,252 chest X-ray images from various sources were pre-processed and used for the transfer learning-based classification task, which included Covid-19, bacterial pneumonia, viral pneumonia, and normal chest X-ray images. All models classified Covid-19 and other pneumonia with an accuracy of at least 78.5%, and the test results showed that GoogleNet outperformed the other models, achieving an accuracy of 91.0%, precision of 85.6%, sensitivity of 85.3%, and F1 score of 85.4%.


Introduction
Covid-19 is the latest pandemic, caused by a novel coronavirus infection that affects the whole world. The disease originated from the Huanan seafood market in Wuhan, China. To date (June 25th, 2021), a total of 180,746,187 Covid-19 cases have been reported worldwide, with 716,847 Covid-19 cases, including 4,721 deaths, reported by Malaysia's Ministry of Health (MOH) [1]. According to Zu et al. [2], Covid-19-infected patients exhibit a variety of symptoms, including fever, coughing, and shortness of breath. The virus is commonly transmitted from human to human via droplets expelled during coughing, talking or sneezing. Covid-19 can be diagnosed through RT-PCR and RT-LAMP, antigen detection testing, serology testing, CT scan, MRI, lung ultrasound and chest X-ray. The elderly, young children, pregnant women, and people with chronic diseases are the most vulnerable to Covid-19 [3].
As Covid-19 spreads, the medical community will increasingly rely on portable chest X-ray because of its widespread availability and reduced infection control issues [4]. This method involves less radiation exposure and has been used for decades to help radiologists view vital organs. However, despite its established role in the diagnosis of diseases, chest X-ray interpretation is highly subjective, requires considerable manual labour and is time-consuming. An automated image classification system is therefore needed to help radiologists identify key findings in the chest X-rays of patients with the specific characteristics of Covid-19.
Machine learning has attracted tremendous interest for decades. Machine learning systems are classified into four major categories: supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. They comprise algorithms that enable a system to analyze and predict by learning from sample data. However, conventional machine learning methods have drawbacks, as they require extensive training on data before they begin to give useful results. The larger the architecture, the more data is needed to produce viable results. The emergence of deep learning, a subset of machine learning, has solved some of the problems in this field, as seen in many applications such as image classification [5]- [7], speech recognition [8], object detection [9] and segmentation [10]. The most established algorithm among the various deep learning models is the convolutional neural network (CNN). It is a powerful deep learning tool specialized for image processing and can be used to detect Covid-19 from chest X-ray images.
Deep learning is a sub-branch of machine learning that often refers to deep convolutional neural networks, in which the convolution process is used for automatic large-scale feature extraction [11]. Image classification techniques have been implemented in several applications such as transportation [12], remote sensing [13] and agriculture [14]. Fundamental examples of deep learning models are the Deep Autoencoder, Recurrent Neural Network, Deep Belief Network and Convolutional Neural Network [15]. CNNs have proven to be an effective model class for understanding image content, providing state-of-the-art results in image classification, segmentation and pattern recognition [16] compared with other networks. CNNs have been applied to several medical applications, including diabetic retinopathy detection [17], early detection of Alzheimer's disease from magnetic resonance images (MRI) [16], and brain tumor detection [18], [19].
Several recent studies [20]- [25] using CNN models have focused only on developing binary classifiers that distinguish between Covid-19 and normal chest X-rays. In these studies, two types of method were used: employing established CNN models, and constructing a CNN-based model from scratch. However, no previous study has compared the performance of established pre-trained CNN models on a multi-class problem that includes Covid-19, pneumonia and normal chest X-rays. In addition, most previous work trained the CNN model on individual well-known chest X-ray image datasets. This study therefore formulates an automated system to detect Covid-19 from chest X-ray images using four established and powerful CNN models, AlexNet, GoogleNet, ResNet-18 and SqueezeNet, and compares the performance of each model. This study also combines several well-known chest X-ray image datasets to produce a generalized CNN model for classifying Covid-19 and other types of pneumonia. Fig. 1 shows the flow of the experimental setup for this project. The experiment starts with data selection of chest X-ray images, followed by image format conversion from grayscale to RGB, CNN model development and performance evaluation of each proposed model.

Data Selection
Due to the recent emergence of Covid-19, none of the large repositories contain any Covid-19 labelled data, necessitating the use of different sources of chest X-ray images of normal, bacterial pneumonia, viral pneumonia, and Covid-19 cases in this study. One of the datasets used is the recently released Covid Chestxray Dataset on GitHub [26], a collection of images from publications on Covid-19 topics compiled by Joseph Paul Cohen. As of June 24, 2021, this dataset contains 950 images, a mix of chest X-ray and CT images, labelled for the presence of 5 sub-categories (no finding, general pneumonia, Covid-19 pneumonia, tuberculosis, and bacterial pneumonia). It also includes metadata about each patient, such as gender and age. Another source of pneumonia chest X-ray images is Kaggle [27], which has a total of 5,863 X-ray images in Joint Photographic Experts Group (JPEG) file format in three categories: normal, bacterial pneumonia, and viral pneumonia. The images were obtained from the Guangzhou Women and Children's Medical Center in Guangzhou, and all chest X-ray imaging was done as part of the patients' routine clinical care.
Since the number of Covid-19 images in dataset [26] was very small, additional images were taken from the Covid19 Radiography Dataset [28], a public dataset for chest radiograph interpretation consisting of 21,173 chest radiographs labelled for the presence of four sub-categories (Covid-19, Normal, Viral Pneumonia, and Lung Opacity). The images are all in Portable Network Graphics (PNG) format. The data was gathered from a variety of publicly available datasets, online sources, and published papers, including those from the Radiological Society of North America [29] and the Valencian Region Medical Image Bank (BIMCV) [30]. Table 1 summarizes the data selection process, as some images from these three sources were excluded from the analysis. From dataset [31], only Covid-19 and bacterial pneumonia images in the posterior-anterior (PA) view were chosen, while a few images from dataset [32] were dropped because of other conditions such as low quality, unreadable scans, or failure to meet the criteria for this study, for instance images from the computed tomography modality. Table 2 shows the number of images for each class after the data cleaning process used in this study. Fig. 2 illustrates nine random input image samples that were used by the CNN algorithms for training and testing.

Data Pre-processing
One of the major steps in data pre-processing was resizing the X-ray images, as the required image input size differs between CNN models. The chest X-ray images were resized to 227 by 227 pixels for AlexNet and SqueezeNet, and to 224 by 224 pixels for GoogleNet and ResNet-18. All the chest X-ray images were standardized to the pre-trained model requirements and to the same file format, Joint Photographic Experts Group (JPEG).
Data augmentation has been shown to increase the classification accuracy of deep learning algorithms [34]. The performance of deep learning models can be improved by augmenting the existing data. In this study, three augmentation procedures were applied to the dataset before the training phase, and the settings used for image augmentation are shown in Table 3. This process helps to address overfitting and enhances the model's generalization ability during training [35].
The chest X-ray images obtained from the various sources were in grayscale format. However, the CNN models accept only RGB images as input. As a result, conversion from the original one-channel grayscale image to a three-channel RGB image is required. This was accomplished by replicating the single-channel matrix of the grayscale image to produce three matrices, then concatenating them to obtain a three-channel RGB matrix.
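The channel replication described above can be sketched in a few lines. This is only an illustration in plain Python, not the MATLAB code used in the study; the function name `gray_to_rgb` and the small sample patch are hypothetical.

```python
def gray_to_rgb(gray):
    """Replicate a one-channel image into three identical channels.

    `gray` is a list of rows, each row a list of pixel intensities.
    Returns a height x width x 3 nested list whose R, G and B values
    are all equal to the original grayscale intensity.
    """
    return [[[pixel, pixel, pixel] for pixel in row] for row in gray]


# Hypothetical 2 x 3 grayscale patch (values are illustrative only).
patch = [[0, 128, 255],
         [64, 32, 16]]
rgb = gray_to_rgb(patch)

# Height, width and channel count of the converted patch.
print(len(rgb), len(rgb[0]), len(rgb[0][0]))
```

Because the three channels are identical, the conversion adds no new information; it only reshapes the input so that it matches the three-channel input layer of the pre-trained networks.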

Development of CNN Models' Architecture
In this study, rather than proposing our own architecture, a pool of existing convolutional neural network topologies that have shown good results across a wide range of classification tasks was used (55). Four different pre-trained CNN models were trained, validated, and tested: AlexNet, SqueezeNet, ResNet-18, and GoogleNet. The experimental evaluations were carried out using MATLAB R2020a on a computer equipped with an Intel(R) Core(TM) i3-6100U CPU @ 2.30GHz processor and 8GB RAM, as well as an 8-GB NVIDIA GeForce GTX 1080 graphics processing unit (GPU) card, running the 64-bit Windows 10 operating system. Fig. 3 depicts the CNN model's overall architecture, which comprises two major components: the convolutional layers and the fully connected layers. Each convolutional layer takes the output of the preceding layer as input and passes its output on to the succeeding layers. The convolutional part is made up of a combination of convolution and pooling layers, where feature extraction takes place in conjunction with a rectified linear unit (ReLU) activation function, and its output is fed to the fully connected layers. A CNN thus consists of three main layer types: the convolution layer, the pooling layer and the fully connected layer. The convolutional and pooling layers provide the learning of the model, while the fully connected layer provides the classification. The depth, number of layers, size, and image input size differ between the CNN models and influenced the testing accuracy and training duration of each model. Table 4 summarizes the properties of each model.
AlexNet is still the center of attention in many studies as the first well-known deep learning network. This model was created by Alex Krizhevsky, Geoffrey Hinton, and Ilya Sutskever [36].
The network is made up of five convolutional layers (CLs) with three pooling layers, two fully connected layers (FCLs), and a Softmax layer [34]. The input image for AlexNet must have a dimension of 227 x 227 x 3 pixels, and the first CL convolves the input image with 96 kernels of size 11 x 11 x 3 pixels with a stride of four pixels; its output is the input to the second layer, and the remaining details are summarized in Fig. 4.
Fig. 4. AlexNet Structure [34]
Another CNN model used in this study is SqueezeNet, shown in Fig. 5, which employs three architectural design strategies. Its structure is built from fire modules, each consisting of a squeeze layer and an expand layer, which form the network's foundation. Only 1 x 1 filters are used in the squeeze layer, which feeds into an expand layer with a mix of 1 x 1 and 3 x 3 convolution filters. According to Shoeibi et al., SqueezeNet's accuracy is comparable to AlexNet's, but SqueezeNet has 50 times fewer parameters and a model size of less than 0.5MB [37].
ResNet, an abbreviation of Residual Network (see Fig. 6), was the winner of the ImageNet Large Scale Visual Recognition Challenge 2015 (ILSVRC-2015) [38]. The architecture introduced the central concept of shortcut or skip connections, which are added as a bypass to the convolutional layers of a regular feed-forward network and form the residual block. Residual learning enables network depth to reach thousands of layers. Skip connections facilitate gradient flow and help solve the vanishing gradient problem.
GoogleNet is a 22-layer model introduced by the Google team in 2015 that won the ILSVRC-2014 competition. This model differs from traditional CNN architectures in that it stacks inception layers sequentially to enhance recognition accuracy, as shown in Fig. 7. It balances the computational budget against processing depth.
However, while increasing depth may produce better results, it also increases the number of learnable parameters, which increases the risk of overfitting when labelled data is scarce [39]. Overall, the model has 12 times fewer parameters than AlexNet and offers more control over the number of learnable parameters [40].
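As a worked illustration of the AlexNet figures quoted earlier in this section (a 227 x 227 x 3 input, 96 kernels of size 11 x 11 x 3, stride of four), the spatial size of the first convolutional layer's output can be computed with the usual convolution size formula. The sketch below assumes zero padding, which the text does not state; the helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(input_size, kernel_size, stride, padding=0):
    """Output length of a convolution along one spatial dimension.

    Uses the standard formula floor((n + 2p - k) / s) + 1.
    """
    return (input_size + 2 * padding - kernel_size) // stride + 1


# First AlexNet convolutional layer as described in the text:
# 227-pixel input, 11-pixel kernel, stride 4, padding assumed to be 0.
side = conv_output_size(227, 11, 4)
num_kernels = 96  # one output channel per kernel

print(side, side, num_kernels)  # 55 55 96
```

Under these assumptions each of the 96 kernels yields a 55 x 55 feature map, so the second layer receives a 55 x 55 x 96 volume before any pooling is applied.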

CNN Model Training
The pre-trained network and input data are loaded into MATLAB to begin the model development process. The chest X-ray dataset of 21,252 images was divided randomly into 90% for training and 10% for testing. The training images were then split again into 70% and 30% to separate the training images from the validation images. Table 5 shows the distribution of images across the training, validation and testing datasets. During training, the input images for each model must be resized to meet the model's requirements. The input size was 227x227x3 for AlexNet and SqueezeNet, and 224x224x3 for both GoogleNet and ResNet-18. Following the transfer learning concept, the final layers were replaced with four fully connected layers and a new classification layer to classify four types of chest X-ray images: Covid-19, Bacterial Pneumonia, Viral Pneumonia, and Normal. These new layers are responsible for learning the distinct features of the chest X-ray dataset. A few hyperparameters, such as the mini-batch size and the number of epochs, must be specified for network training. The ADAM optimizer was used to optimize the loss function, with a learning rate of 0.0003. The batch size, also known as the mini-batch size, was set to 10; it is defined as the number of samples processed before the model is updated, that is, the number of samples sent to the network at once. The larger the batch size, the faster the model completes an epoch during training.
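The two-stage random split described above (90%/10%, then 70%/30% of the training portion) can be sketched as follows. This is an illustration in plain Python rather than the MATLAB procedure used in the study; the function name, the fixed seed and the truncation of fractional counts are assumptions, so the training and validation counts may differ slightly from those in Table 5.

```python
import random


def split_indices(n_images, test_frac=0.10, val_frac=0.30, seed=0):
    """Randomly split image indices into training, validation and test sets.

    First holds out `test_frac` of all images for testing, then holds out
    `val_frac` of the remaining images for validation.
    """
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)

    n_test = int(n_images * test_frac)   # truncation is an assumption
    test, rest = indices[:n_test], indices[n_test:]

    n_val = int(len(rest) * val_frac)    # truncation is an assumption
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test


train, val, test = split_indices(21252)
print(len(train), len(val), len(test))
```

With 21,252 images the held-out test set contains 2125 images, which agrees with the test set size reported in the evaluation section.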

Evaluation of Model's Performance
Following the completion of network training, testing accuracy was calculated using the test images, which make up 10% of the overall dataset. As listed in Table 5, the testing dataset contains 2125 chest X-ray images. The confusion matrix is one of the measurements that give more insight into the achieved testing accuracy. A confusion matrix describes the performance of a classifier on a set of test data for which the true values are known. It allows the model's performance to be visualized for each class, giving a more detailed picture of how well each model recognizes pneumonia from chest X-ray images.
To estimate the performance of the models, additional performance metrics are explored in this study. The most widespread performance measures in the field of deep learning are accuracy, precision, sensitivity (recall) and F1 score [41]. They are presented from Equation (1) onward, where TP is the count of true positive samples, TN is the count of true negative samples, FP is the count of false positive samples, and FN is the count of false negative samples from a confusion matrix.
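The equations themselves are not reproduced in this text. The standard definitions of the four measures, written with the TP, TN, FP and FN counts defined above, are assumed to be the intended ones; the numbering (1) to (4) is likewise an assumption.

```latex
\begin{align}
\text{Accuracy}    &= \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \\
\text{Precision}   &= \frac{TP}{TP + FP} \tag{2} \\
\text{Sensitivity} &= \frac{TP}{TP + FN} \tag{3} \\
\text{F1 score}    &= \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \tag{4}
\end{align}
```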
In general, precision describes how reliable the model's result is when it assigns chest X-ray images to a class, whereas sensitivity describes how effectively the model detects the class. Lastly, the F1 score is the harmonic mean of precision and sensitivity, which is useful when a method has lower sensitivity but higher precision, or higher sensitivity but lower precision.
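For a multi-class problem, these measures are usually read off the confusion matrix one class at a time. The sketch below shows one common way to do this; it is not the study's MATLAB code. It assumes that rows index the true class and columns the predicted class, and that overall accuracy is the share of images on the diagonal. The function name and the small two-class matrix are hypothetical.

```python
def per_class_metrics(cm):
    """Compute overall accuracy and per-class precision, sensitivity, F1.

    cm[i][j] is the number of images of true class i predicted as class j
    (row = true class, column = predicted class; an assumed convention).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k images missed
        fp = sum(cm[i][k] for i in range(n)) - tp   # others predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + sensitivity
        f1 = 2 * precision * sensitivity / denom if denom else 0.0
        results.append((precision, sensitivity, f1))
    accuracy = sum(cm[k][k] for k in range(n)) / total
    return accuracy, results


# Hypothetical two-class confusion matrix, for illustration only.
toy_cm = [[8, 2],
          [1, 9]]
accuracy, metrics = per_class_metrics(toy_cm)
print(accuracy)
for precision, sensitivity, f1 in metrics:
    print(round(precision, 3), round(sensitivity, 3), round(f1, 3))
```

Averaging the per-class values gives a single precision, sensitivity and F1 figure per model; whether the study averaged them in this way is not stated in the text.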

Results and Discussion
The aim of this study is to develop CNN models capable of performing image classification with high accuracy, classifying chest X-ray images into their exact categories: Covid-19, Bacterial Pneumonia, Viral Pneumonia or Normal. This will help to improve the medical diagnosis of patients with Covid-19. The trained models' performance was assessed after the testing phase by plotting the confusion matrix and measuring the testing set accuracy as well as other parameters such as precision, sensitivity, and F1 score. The confusion matrix was plotted for each trained model, and the results were analyzed based on accuracy, precision, sensitivity, and F1 score. Table 6 summarizes the performance of each trained network in classifying different types of pneumonia, including that due to Covid-19. Overall, GoogleNet outperforms the rest with a testing accuracy of 91.0%, while AlexNet, ResNet-18 and SqueezeNet obtain only 78.5%, 88.4% and 89.0%, respectively. The performance of these well-known CNNs in classifying chest X-ray images into Covid-19, Bacterial Pneumonia, Viral Pneumonia and Normal is in good agreement with the existing works in [42], [43], as AlexNet is the worst of the models at classifying images while GoogleNet and ResNet-18 are close to each other. Surprisingly, in this study SqueezeNet manages to outperform AlexNet, even though [42] shows that its performance is on a par with AlexNet. This indicates that, although SqueezeNet originally performs similarly to AlexNet, the domain adaptation achieved by SqueezeNet is better than that of AlexNet, owing to SqueezeNet's smaller number of parameters. In addition, the precision, sensitivity and F1 score of GoogleNet are also the highest among the models, at 85.6%, 85.3% and 85.4%, respectively.
Fig. 8 presents the confusion matrix of GoogleNet, whose testing accuracy of 91.0%, shown in the grey box, surpasses that of the other networks, while the white boxes at the bottom show the true positive rate of the model for each class. The model detects bacterial pneumonia at 74.1%, viral pneumonia at 75.8%, normal lungs at 98.0% and Covid-19 pneumonia at 93.1%. The diagonal cells show the number of chest X-ray images correctly classified and the corresponding percentage. Of the 2125 images used in the testing phase, 1934 were correctly classified into their respective classes by GoogleNet, while the remaining 191 images were misclassified as false negatives and false positives, as shown in the pink box in Fig. 8. The classification error of GoogleNet is therefore only 9%. In addition, only two classes achieve precision and sensitivity above 90%: Covid-19 and Normal. According to the confusion matrix in Fig. 8, 374 Covid-19 images were classified correctly, while the FP of the class came from six Covid-19 images that were not assigned to their corresponding class and were instead wrongly predicted as Normal. As a result, the sensitivity and precision of the class are 93.1% and 98.4%, respectively.
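The headline figures for GoogleNet can be checked directly from the counts quoted above. The first two lines use only the 2125 test images, 1934 correct and 191 misclassified; the third assumes, as the text states, that the six misclassified images are the only false positives counted for the Covid-19 class.

```latex
\begin{align*}
\text{Accuracy} &= \frac{1934}{2125} \approx 0.910 = 91.0\% \\
\text{Classification error} &= \frac{191}{2125} \approx 0.090 = 9.0\% \\
\text{Precision}_{\text{Covid-19}} &= \frac{374}{374 + 6} = \frac{374}{380} \approx 0.984 = 98.4\%
\end{align*}
```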

Conclusion
The purpose of this study is to develop an automated system that can detect Covid-19 pneumonia from chest X-ray images and to compare the well-established CNNs AlexNet, GoogleNet, ResNet-18 and SqueezeNet in detecting Covid-19 and other pneumonia from chest X-ray images. The experimental procedure begins with the collection of image datasets, followed by the conversion of the image format from grayscale to RGB, the development of CNN models, and the evaluation of model performance. The CNN deep learning approach was chosen for model development since it is well known for image classification. Based on previous studies, this approach appears to give better results than handcrafted features. In summary, all of the proposed models were able to classify Covid-19 and other pneumonia from chest X-ray images, with accuracy ranging from 78.5% to 91.0%. GoogleNet exceeded the other models with an accuracy of 91.0% and showed the best precision, sensitivity and F1 score. Thus, the proposed approach can be deployed as a useful technique in the medical field for classifying chest X-ray images to identify patients who have pneumonia caused by Covid-19 or another disease. This finding is promising and should be explored with other well-known CNNs as well, to find the best CNN architecture designed mainly for classifying types of pneumonia, including Covid-19, in chest X-ray images.