Cluster analysis and ensemble transfer learning for COVID-19 classification from computed tomography scans

(1) * Lyubomir Gotsev (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
(2) Ivan Mitkov (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
(3) Eugenia Kovatcheva (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
(4) Boyan Jekov (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
(5) Roumen Nikolov (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
(6) Elena Shoikova (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
(7) Milena Petkova (State University of Library Studies and Information Technologies, Sofia, Bulgaria)
*corresponding author

Abstract


The paper presents a brief analysis of publications that use the public SARS-CoV-2 CT-scan dataset, which consists of patients' computed tomography scans collected from hospitals in Brazil, together with an experimental setup that addresses the data challenges found. The analysis shows that all protocols but one suffer from data leakage, because the dataset does not group images by patient. Each patient is represented by several scans, so images of the same individual may appear in both the training and the test sets, which can produce misleading results. Furthermore, only one paper proposed ensemble learning, using VGG-16, ResNet50, and Xception as base models. We therefore proposed and evaluated the following strategy to mitigate the identified risks of bias: data standardization and normalization to achieve proper contrast and resolution; k-means clustering and group shuffle split to avoid data leakage; and augmentation and ensemble transfer learning to cope with the limited sample size and over-fitting. Compared with the earlier ensemble approach, ours stacks VGG-16, DenseNet-201, and Inception v3 and achieves higher accuracy (99.3%), the second highest in the related work; more importantly, it applies augmentation and cluster analysis to avoid overestimating performance. In addition, the paper reports metrics that are critical in the medical domain: negative predictive value (99.55%), false positive rate (0.89%), false negative rate (0.42%), and false discovery rate (0.83%). The strategy has two main advantages: it reduces data pitfalls and decreases generalization error. It can serve as a baseline for improving performance quality and mitigating the risk of bias in the field.
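The abstract pairs k-means with group shuffle split so that scans of one patient do not land in both the training and the test set. The sketch below shows one plausible reading, assuming patient identifiers are not available, so k-means cluster labels serve as pseudo-patient groups. It uses scikit-learn's KMeans and GroupShuffleSplit; the helper name grouped_split, the synthetic feature vectors, and the cluster count are illustrative assumptions, not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import GroupShuffleSplit


def grouped_split(features, labels, n_clusters, test_size=0.2, seed=0):
    """Hypothetical helper: cluster images into pseudo-patient groups with
    k-means, then split so that no group has images in both train and test."""
    groups = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(features)
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    # split() yields (train_indices, test_indices); we only need the first split.
    train_idx, test_idx = next(splitter.split(features, labels, groups=groups))
    return train_idx, test_idx, groups


# Synthetic stand-in for flattened, downsampled CT slices (assumed data):
# 10 "patients", 5 near-duplicate scans each, scattered around a patient centre.
rng = np.random.default_rng(0)
centres = rng.normal(scale=10.0, size=(10, 64))
features = np.repeat(centres, 5, axis=0) + rng.normal(scale=0.1, size=(50, 64))
labels = np.repeat(rng.integers(0, 2, size=10), 5)

train_idx, test_idx, groups = grouped_split(features, labels, n_clusters=10)
print(len(train_idx), len(test_idx))
```

Because the split is made over cluster labels rather than individual images, near-duplicate slices cannot be divided between training and test data. How well this prevents leakage depends on whether the clusters actually track patients.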
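The medical-domain metrics quoted in the abstract all follow from the binary confusion matrix. The following Python sketch computes them, assuming COVID-19 is the positive class; the counts in the usage lines are hypothetical and are not the paper's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    tp: int  # COVID-19 scans predicted as COVID-19
    fp: int  # non-COVID scans predicted as COVID-19
    tn: int  # non-COVID scans predicted as non-COVID
    fn: int  # COVID-19 scans predicted as non-COVID


def rate(num, den):
    # Guard against empty denominators instead of raising ZeroDivisionError.
    return num / den if den else float("nan")


def clinical_metrics(c):
    return {
        "accuracy": rate(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "npv": rate(c.tn, c.tn + c.fn),  # negative predictive value
        "fpr": rate(c.fp, c.fp + c.tn),  # false positive rate
        "fnr": rate(c.fn, c.fn + c.tp),  # false negative rate
        "fdr": rate(c.fp, c.fp + c.tp),  # false discovery rate
    }


# Hypothetical counts, for illustration only.
m = clinical_metrics(Confusion(tp=90, fp=5, tn=100, fn=10))
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```

NPV and FNR matter most when a negative prediction may be used to rule out infection, which is why they are reported alongside accuracy.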
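Stacking trains a meta-learner on the predictions of several base models. A minimal sketch, assuming each base CNN (VGG-16, DenseNet-201, and Inception v3 in the paper) has already produced positive-class probabilities on data it was not trained on, and using a scikit-learn logistic regression as the meta-learner. The choice of meta-learner, the helper name fit_stacker, and the simulated probabilities are assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression


def fit_stacker(base_probs, labels):
    """Hypothetical helper. base_probs: (n_samples, n_models) positive-class
    probabilities from the base CNNs on a split they were NOT trained on."""
    return LogisticRegression().fit(base_probs, labels)


rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)
# Simulated outputs of three base models with increasing noise levels
# (stand-ins for the real CNN predictions, not actual model outputs).
noise = rng.normal(scale=[0.15, 0.25, 0.35], size=(200, 3))
base_probs = np.clip(y[:, None] + noise, 0.0, 1.0)

# Fit the meta-learner on one held-out portion, evaluate on another.
stacker = fit_stacker(base_probs[:150], y[:150])
ensemble_pred = stacker.predict(base_probs[150:])
print("held-out accuracy:", (ensemble_pred == y[150:]).mean())
```

Fitting the meta-learner on held-out rather than training-set predictions matters: base models that have memorised their training images would otherwise appear more reliable than they are.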

Keywords


COVID-19; Computed Tomography; Clustering; Transfer Learning; Ensemble Learning

   

DOI

https://doi.org/10.26555/ijain.v8i2.817
      




Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
