Semi-supervised learning for sentiment classification with ensemble multi-classifier approach

Agus Sasmito Aribowo; Halizah Basiron; Noor Fazilla Abd Yusof

doi:10.26555/ijain.v8i3.929


(1)* Agus Sasmito Aribowo

(Universitas Pembangunan Nasional "Veteran" Yogyakarta Indonesia & FTMK UTeM Melaka Malaysia, Indonesia)
(2) Halizah Basiron

(Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Malaysia)
(3) Noor Fazilla Abd Yusof

(Fakulti Teknologi Maklumat dan Komunikasi, Universiti Teknikal Malaysia Melaka, Malaysia)
* corresponding author

Abstract

Supervised sentiment analysis ideally uses a fully labeled data set for modeling. However, labeling an entire data set is laborious. Semi-supervised learning (SSL) has emerged as a promising way to avoid time-consuming and expensive data labeling without reducing model performance. Research on SSL is still limited, however, and its performance needs to be improved. This study therefore introduces a new SSL model for sentiment classification, the Ensemble Classifier SSL model. The pipeline comprises pre-processing, vectorization, and feature extraction using TF-IDF and n-grams. Support Vector Machine (SVM) or Random Forest (RF) classifiers were trained on separate unigram, bigram, and trigram features, and the outputs of these models were then combined using a stacking ensemble approach. Accuracy and F1-score were used for evaluation. The IMDB and US Airlines datasets were used to test the new SSL models. The conclusion is that sentiment annotation accuracy depends strongly on how well the machine learning algorithm suits the dataset. On the IMDB dataset, which has two classes, SVM is the better choice. On the US Airlines dataset, which has three classes, SVM is better at improving model performance over the baseline, whereas RF is better at reaching the baseline performance but fails to maintain it.
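The feature-extraction step mentioned in the abstract (TF-IDF over separate unigram, bigram, and trigram features) can be illustrated with a minimal sketch. It assumes whitespace tokenization and the plain tf × log(N/df) weighting; the paper's exact pre-processing and TF-IDF variant are not given here, and the helper names `word_ngrams` and `tfidf` as well as the sample sentences are hypothetical. The classifiers and the stacking step are not shown.

```python
import math
from collections import Counter


def word_ngrams(text, n):
    """Return the word n-grams (as tuples) of a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n):
    """Return one {n-gram: weight} dict per document for n-grams of order n.

    Assumed weighting: tf = count / n-grams in the document, idf = log(N / df).
    """
    grams_per_doc = [word_ngrams(doc, n) for doc in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for grams in grams_per_doc for g in set(grams))
    n_docs = len(docs)
    weights = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams)
        weights.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return weights


# Hypothetical sample texts; one feature set per n-gram order.
docs = ["the flight was late", "the flight was great", "great movie"]
features = {n: tfidf(docs, n) for n in (1, 2, 3)}
```

Keeping one feature set per n-gram order mirrors the idea of training a separate model on each order before combining their outputs; a document shorter than n words simply yields an empty feature dict for that order.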

Keywords

Ensemble Multi-classifier; Semi-supervised; Sentiment Analysis; SVM; Random Forest

DOI

https://doi.org/10.26555/ijain.v8i3.929

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
