Sentiment classification from reviews for tourism analytics

(1) Nur Aliah Khairina Mohd Haris (School of Computing Sciences, College of Computing, Informatics and Media, Universiti Teknologi MARA, Malaysia)
(2) * Sofianita Mutalib (School of Computing Sciences, College of Computing, Informatics and Media, Universiti Teknologi MARA, Malaysia)
(3) Ariff Md Ab Malik (Faculty of Business and Management, Universiti Teknologi MARA, Malaysia)
(4) Shuzlina Abdul-Rahman (Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Malaysia)
(5) Siti Nur Kamaliah Kamarudin (Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Malaysia)
*corresponding author

Abstract


User-generated content is critical for tourism destination management because it helps organizations identify their customers' opinions and devise ways to improve their tourism organizations. Social media holds a large volume of reviews, and analysing them manually is difficult for these organizations. Sentiment classification assigns reviews to a small set of classes and thereby eases decision-making. The reviews contain noisy content, such as typos and emoticons, which could affect the accuracy of the classifiers. This study evaluates the reviews using Support Vector Machine, Naive Bayes, and Random Forest models to identify a suitable classifier. The main phases of the study are data collection, data preparation, data labelling, and modelling. The data are collected from TripAdvisor and Google reviews using web scraping tools. The reviews are labelled with three sentiments: positive, neutral, and negative. Pre-processing consists of removing missing values, tokenization, case folding, stop-word removal, stemming, and applying n-grams. The models are evaluated on accuracy, and the model with the highest accuracy is chosen as the solution. The findings show that the Support Vector Machine model with 5-fold cross-validation is the most suitable classifier, with an accuracy of 67.97%, compared with 61.33% for Naive Bayes and 63.55% for Random Forest. In conclusion, the results provide useful information for tourism and indicate a suitable algorithm for sentiment analysis in the tourism domain.
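The pre-processing steps and the 5-fold split named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stop-word list, the `naive_stem` suffix stripper (a crude stand-in for a real stemmer such as Porter's), the choice of bigrams, and the contiguous fold layout are all assumptions made for illustration, and the function names are hypothetical.

```python
import re

# Tiny illustrative stop-word list (assumption; the study's list is not given).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "to", "of", "it", "in"}


def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer, not the study's."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review, n=2):
    """Return stemmed unigrams plus n-grams for one review ([] if missing)."""
    if review is None or not review.strip():
        return []  # missing value: nothing to extract
    # Case folding and tokenization; punctuation and emoticons are dropped.
    tokens = re.findall(r"[a-z]+", review.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]  # stop-word removal
    tokens = [naive_stem(t) for t in tokens]             # stemming
    ngrams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return tokens + ngrams


def k_fold_indices(n_samples, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size


if __name__ == "__main__":
    print(preprocess("The park was AMAZING and the guides helped us!"))
    for train, test in k_fold_indices(12, k=5):
        print(len(train), len(test))
```

In a setup like this, each fold's test indices would be held out once while a classifier is trained on the remaining reviews, and the per-fold accuracies would be averaged to give a single score per model.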

   

DOI

https://doi.org/10.26555/ijain.v9i1.1077
      



Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
