Leveraging social media data using latent dirichlet allocation and naïve bayes for mental health sentiment analytics on Covid-19 pandemic

(1) Nurzulaikha Khalid (College of Computing Informatics and Mathematics, Universiti Teknologi MARA, Selangor, Malaysia)
(2) * Shuzlina Abdul-Rahman (Research Initiative Group of Intelligent Systems, Universiti Teknologi MARA, Selangor, Malaysia)
(3) Wahyu Wibowo (Institut Teknologi Sepuluh Nopember (ITS), Surabaya, Indonesia)
(4) Nur Atiqah Sia Abdullah (College of Computing Informatics and Mathematics, Universiti Teknologi MARA, Selangor, Malaysia)
(5) Sofianita Mutalib (Research Initiative Group of Intelligent Systems, Universiti Teknologi MARA, Selangor, Malaysia)
*corresponding author


In Malaysia, the negative impact of the COVID-19 pandemic on mental health became noticeable during its early stages. The public's psychological and behavioral responses intensified as the outbreak progressed, and a high perception of severity, vulnerability, impact, and fear contributed to higher anxiety. Social media data can be used to track Malaysian sentiments in the COVID-19 era. However, such data usually appears online as unlabeled text, and decoding it manually is complicated. Furthermore, traditional data-gathering approaches, such as survey forms, may not fully capture these sentiments. This study applies a text mining technique, Latent Dirichlet Allocation (LDA), to social media data to discover mental health topics during the COVID-19 pandemic. A sentiment model is then developed using a hybrid approach that combines a lexicon-based technique with a Naïve Bayes classifier. Sentiment classification is evaluated using accuracy, precision, recall, and F-measure. The results show that VADER is the better lexicon-based technique, with 72% accuracy compared with 70% for TextBlob. These sentiment results allow for a better understanding and handling of the pandemic. The top three topics are identified and further classified into positive and negative comments. In conclusion, the developed model can assist healthcare workers and policymakers in making informed decisions in future pandemic outbreaks.
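The abstract gives no implementation details, so the following is only a minimal sketch of the hybrid idea: a lexicon assigns labels to unlabeled posts, a Naïve Bayes classifier is trained on those labels, and predictions are scored with accuracy, precision, recall, and F-measure. Everything specific here is an assumption made for illustration: the tiny hand-made `LEXICON` stands in for VADER or TextBlob, the example posts are invented and not taken from the study's data, and the function names are hypothetical. The LDA topic discovery step is not shown.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lexicon; a stand-in for VADER/TextBlob scores.
LEXICON = {"hope": 1, "grateful": 1, "safe": 1, "calm": 1,
           "anxious": -1, "afraid": -1, "lonely": -1, "stress": -1}


def lexicon_label(text):
    """Label a text 'pos' or 'neg' from the sum of its word scores."""
    score = sum(LEXICON.get(w, 0) for w in text.lower().split())
    return "pos" if score >= 0 else "neg"


def train_nb(texts, labels):
    """Collect the counts needed for multinomial Naive Bayes."""
    word_counts = defaultdict(Counter)
    class_counts = Counter(labels)
    for text, label in zip(texts, labels):
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the class with the highest log posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label in sorted(class_counts):
        logp = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                logp += math.log((word_counts[label][w] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def evaluate(y_true, y_pred, positive="pos"):
    """Compute accuracy, precision, recall, and F1 for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Invented example posts (not from the study).
posts = [
    "feeling anxious and lonely during lockdown",
    "so much stress working from home",
    "grateful my family is safe",
    "staying calm and full of hope",
]
labels = [lexicon_label(p) for p in posts]   # step 1: lexicon labeling
model = train_nb(posts, labels)              # step 2: train Naive Bayes
print(predict_nb(model, "lockdown makes me anxious"))
```

In this sketch the lexicon only bootstraps the training labels; the classifier can then score posts that contain no lexicon words at all, provided they share vocabulary with the training posts. A real pipeline would also need tokenization, stop-word handling, and a held-out, manually labeled test set for the evaluation step.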


Social Media; COVID-19; Latent Dirichlet Allocation (LDA); Lexicon-Based; Mental Health; Naïve Bayes








Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
