Comparative study of predictive models for hoax and disinformation detection in Indonesian news

Nadia Paramita Retno Adiati; Dimas Febriyan Priambodo; Girinoto Girinoto; Santi Indarjani; Akhmad Rizal; Arga Prayoga; Yehezikha Beatrix

doi:10.26555/ijain.v10i3.878



⁽¹⁾ Nadia Paramita Retno Adiati

(Politeknik Siber dan Sandi Negara, Indonesia)
⁽²⁾* Dimas Febriyan Priambodo

(Politeknik Siber dan Sandi Negara, Indonesia)
⁽³⁾ Girinoto Girinoto

(Politeknik Siber dan Sandi Negara, Indonesia)
⁽⁴⁾ Santi Indarjani

(Politeknik Siber dan Sandi Negara, Indonesia)
⁽⁵⁾ Akhmad Rizal

(Politeknik Siber dan Sandi Negara, Indonesia)
⁽⁶⁾ Arga Prayoga

(Politeknik Siber dan Sandi Negara, Indonesia)
⁽⁷⁾ Yehezikha Beatrix

(Politeknik Siber dan Sandi Negara, Indonesia)
* corresponding author

Abstract

False information spreads easily in the digital era, including in Indonesia. According to Press Release No. 485/HM/KOMINFO/12/2021, the Ministry of Communication and Information has cut off access to 565,449 items of negative content and published 1,773 clarifications on hoax and disinformation content. Prior research has addressed fake news detection, but fake news still needs to be classified further into disinformation and hoaxes. This study compares our proposed model, an ensemble of shallow learning predictive models (Random Forest, Passive Aggressive Classifier, and Cosine Similarity), with a deep learning model that uses BERT-Indo for classification. Both models are trained on equivalent datasets containing 8,757 news items: 3,000 valid, 3,000 hoax, and 2,757 disinformation. The news items were obtained from websites such as CNN, Kompas, Detik, Kominfo, Temanggung Mediacenter, Hoaxdb Aceh, Turnback Hoax, and Antara, and were then cleaned of unnecessary elements such as punctuation marks, numbers, Unicode characters, stopwords, and suffixes using the Sastrawi library. At the benchmarking stage, the shallow learning models are combined through ensemble learning with hard voting to increase accuracy. The ensemble achieves an accuracy of 98.125%, precision of 98.2%, F1 score of 98.1%, and recall of 98.1%, compared with the BERT-Indo model, which achieved 96.918% accuracy, 96.069% precision, 96.937% F1 score, and 96.882% recall. Based on accuracy, the shallow learning model outperforms the deep learning model. This machine learning model is expected to help combat the spread of hoaxes and disinformation in Indonesian news. In addition, this research allows false news to be classified in more detail, as either hoaxes or disinformation.
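
The hard-voting step mentioned in the abstract can be illustrated with a minimal Python sketch. It combines the label predictions of three classifiers by majority vote. The function name `hard_vote`, the toy predictions, and the tie-breaking rule (fall back to the first-listed model) are assumptions made for illustration only; the abstract does not specify how ties are resolved or how the individual models are configured.

```python
from collections import Counter


def hard_vote(predictions):
    """Majority vote over per-model label lists.

    predictions: list of lists, one inner list of labels per model,
    all of the same length (one label per news item).
    Ties are resolved in favour of the earliest-listed model whose
    label reaches the top count (an assumption, not from the paper).
    """
    voted = []
    for sample_preds in zip(*predictions):
        counts = Counter(sample_preds)
        top = max(counts.values())
        # Pick the first label (in model order) that has the top count.
        voted.append(next(lbl for lbl in sample_preds if counts[lbl] == top))
    return voted


# Hypothetical per-model outputs for four news items.
rf_preds = ["valid", "hoax", "disinformation", "hoax"]            # Random Forest
pa_preds = ["valid", "hoax", "hoax", "valid"]                     # Passive Aggressive
cs_preds = ["hoax", "hoax", "disinformation", "disinformation"]   # Cosine Similarity

ensemble = hard_vote([rf_preds, pa_preds, cs_preds])
print(ensemble)  # ['valid', 'hoax', 'disinformation', 'hoax']
```

In the last item all three models disagree, so the sketch returns the first model's label; a real pipeline might instead weight the models or use predicted probabilities.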

Keywords

BERT-Indo; Deep learning; Shallow learning; Disinformation; Hoax




This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
