
(2) * Dimas Febriyan Priambodo

(3) Girinoto Girinoto

(4) Santi Indarjani

(5) Akhmad Rizal

(6) Arga Prayoga

(7) Yehezikha Beatrix

*corresponding author
Abstract
False information spreads easily, including in Indonesia. According to Press Release No. 485/HM/KOMINFO/12/2021, the Ministry of Communication and Information has cut off access to 565,449 items of negative content and published 1,773 clarifications of hoax and disinformation content. Earlier research has addressed this problem, but fake news still needs to be classified further into disinformation and hoaxes. This study compares our proposed model, an ensemble of shallow learning predictive models (Random Forest, Passive Aggressive Classifier, and Cosine Similarity), with a deep learning model that uses BERT-Indo for classification. Both models are trained on equivalent datasets containing 8,757 news items: 3,000 valid, 3,000 hoax, and 2,757 disinformation. The news items were obtained from websites such as CNN, Kompas, Detik, Kominfo, Temanggung Mediacenter, Hoaxdb Aceh, Turnback Hoax, and Antara, and were then cleaned of unnecessary elements such as punctuation marks, numbers, Unicode characters, stopwords, and suffixes using the Sastrawi library. At the benchmarking stage, the shallow learning models are combined through ensemble learning with hard voting to increase accuracy. The ensemble achieves an accuracy of 98.125%, precision of 98.2%, F-1 score of 98.1%, and recall of 98.1%, compared with the BERT-Indo model, which achieved 96.918% accuracy, 96.069% precision, 96.937% F-1 score, and 96.882% recall. Based on accuracy, the shallow learning model outperforms the deep learning model. This machine learning model is expected to help combat the spread of hoaxes and disinformation in Indonesian news. In addition, it allows false news to be classified in more detail, as either hoax or disinformation.
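The hard-voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `hard_vote` and `accuracy`, the toy predictions, and the tie-breaking rule (the earliest-listed model wins) are all assumptions made for illustration, and the three prediction lists merely stand in for the outputs of the Random Forest, Passive Aggressive, and Cosine Similarity models.

```python
from collections import Counter
from typing import List, Sequence


def hard_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label predictions by majority (hard) voting.

    predictions[i][j] is the label that model i assigns to sample j.
    Ties go to the earliest-listed model's label; this rule is an
    assumption, since the abstract does not state how ties are resolved.
    """
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    voted = []
    for sample_votes in zip(*predictions):
        counts = Counter(sample_votes)
        top = max(counts.values())
        # Pick the first label, in model order, that reaches the top count.
        voted.append(next(lbl for lbl in sample_votes if counts[lbl] == top))
    return voted


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy outputs of the three shallow models on four news items.
rf_pred = ["valid", "hoax", "disinformation", "hoax"]
pa_pred = ["valid", "hoax", "hoax", "valid"]
cs_pred = ["hoax", "hoax", "disinformation", "disinformation"]

ensemble_pred = hard_vote([rf_pred, pa_pred, cs_pred])

# Hypothetical ground truth, used only to show how accuracy is computed.
y_true = ["valid", "hoax", "disinformation", "valid"]
ensemble_acc = accuracy(y_true, ensemble_pred)
```

In this toy case the fourth item receives three different votes, so the result depends entirely on the assumed tie-breaking rule. With three classes and three voters such ties are possible, which is why the rule is spelled out in the sketch.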
Keywords: BERT-Indo; Deep learning; Shallow learning; Disinformation; Hoax
DOI: https://doi.org/10.26555/ijain.v10i3.878

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)