Cluster analysis and ensemble transfer learning for COVID-19 classification from computed tomography scans

Lyubomir Gotsev; Ivan Mitkov; Eugenia Kovatcheva; Boyan Jekov; Roumen Nikolov; Elena Shoikova; Milena Petkova

doi:10.26555/ijain.v8i2.817



⁽¹⁾* Lyubomir Gotsev

(State University of Library Studies and Information Technologies, Sofia, Bulgaria)
⁽²⁾ Ivan Mitkov

(State University of Library Studies and Information Technologies, Sofia, Bulgaria)
⁽³⁾ Eugenia Kovatcheva

(State University of Library Studies and Information Technologies, Sofia, Bulgaria)
⁽⁴⁾ Boyan Jekov

(State University of Library Studies and Information Technologies, Sofia, Bulgaria)
⁽⁵⁾ Roumen Nikolov

(State University of Library Studies and Information Technologies, Sofia, Bulgaria)
⁽⁶⁾ Elena Shoikova

(State University of Library Studies and Information Technologies, Sofia, Bulgaria)
⁽⁷⁾ Milena Petkova

(State University of Library Studies and Information Technologies, Sofia, Bulgaria)
* Corresponding author

Abstract

The paper presents a brief analysis of publications that use the public SARS-CoV-2 dataset, which consists of patients' computed tomography scans collected from hospitals in Brazil, and an experimental setup that addresses the data challenges identified. The analysis shows that all protocols, with one exception, suffer from data leakage arising from a data organization in which patients and their images are not grouped. Because each patient is represented by several scans, data from the same individual may occur in both the training and test sets, which can produce misleading results. Furthermore, only one paper proposed ensemble learning, using VGG-16, ResNet50, and Xception as base models. We therefore proposed and tested the following strategy to mitigate the identified risks of bias: data standardization and normalization to achieve proper contrast and resolution; k-means and group shuffle split to avoid data leakage; and augmentation and ensemble transfer learning to deal with limited sample size and overfitting. Compared with the earlier ensemble approach, the current one stacks VGG-16, DenseNet-201, and Inception v3 and achieves higher accuracy (99.3%), the second highest in the related work; most significantly, it applies augmentation and cluster analysis to avoid overestimation. In addition, the paper reports metrics that are critical in the medical domain: negative predictive value (99.55%), false positive rate (0.89%), false negative rate (0.42%), and false discovery rate (0.83%). The strategy has two main advantages: reducing data pitfalls and decreasing generalization error. It can serve as a baseline to increase performance quality and mitigate the risk of bias in the field.
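The leakage problem described above comes from splitting at the scan level rather than at the group level. The following is a minimal sketch of a group-aware shuffle split, written in plain Python for illustration; it is not the authors' implementation. The function name `group_shuffle_split`, its parameters, and the toy labels are hypothetical, and the group labels are assumed to be either patient identifiers or cluster identifiers (for example, from k-means) assigned to each scan.

```python
import random
from collections import defaultdict


def group_shuffle_split(groups, test_size=0.2, seed=0):
    """Split sample indices so that no group lands in both train and test.

    `groups` holds one label per scan (assumed here: a patient id or a
    cluster id). `test_size` is the fraction of *groups*, not scans,
    that is sent to the test set.
    """
    # Collect the scan indices that belong to each group.
    by_group = defaultdict(list)
    for index, label in enumerate(groups):
        by_group[label].append(index)

    # Shuffle whole groups with a local RNG so the global state is untouched.
    labels = sorted(by_group)
    random.Random(seed).shuffle(labels)

    # At least one group always goes to the test set.
    n_test = max(1, round(test_size * len(labels)))

    test = sorted(i for g in labels[:n_test] for i in by_group[g])
    train = sorted(i for g in labels[n_test:] for i in by_group[g])
    return train, test


# Toy example: ten scans that belong to five hypothetical groups.
groups = [0, 0, 0, 1, 1, 2, 2, 2, 3, 4]
train_idx, test_idx = group_shuffle_split(groups, test_size=0.4, seed=42)

train_groups = {groups[i] for i in train_idx}
test_groups = {groups[i] for i in test_idx}
print("groups shared by train and test:", train_groups & test_groups)
```

Because whole groups are assigned to one side of the split, scans of the same individual (or cluster) cannot appear in both sets. In practice, scikit-learn's `GroupShuffleSplit` provides this behavior for array inputs.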

Keywords

COVID-19; Computed Tomography; Clustering; Transfer Learning; Ensemble Learning

DOI

https://doi.org/10.26555/ijain.v8i2.817



This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
