A hybrid model for aspect-based sentiment analysis on customer feedback: research on the mobile commerce sector in Vietnam

Feedback and comments on mobile commerce applications are valuable information sources that reflect the quality of products and services. Classifying them as positive or negative helps businesses monitor brand and product sentiment and understand customers' needs. However, the growing number of comments makes it increasingly difficult to understand customers using manual methods. To solve this problem, this study builds a hybrid research model based on aspect mining and comment classification for aspect-based sentiment analysis (ABSA) to better understand customers and their experiences. Based on previous classification results, we first construct a dictionary of positive and negative words in the e-commerce field. Then, the POS tagging technique is applied for word classification in Vietnamese to extract aspects of mobile commerce related to positive or negative words. The model is implemented with machine and deep learning methods on a corpus comprising more than 1,000,000 customer opinions collected from Vietnam's four largest mobile commerce applications. Experimental results show that the Bi-LSTM method has the highest accuracy, at 92.01%; it is selected for the proposed model to analyze the sentiment of words on real data. The findings indicate that the proposed hybrid model can be applied to monitor online customer experience in real time, enable administrators to make timely and accurate decisions, and improve the quality of products and services to gain a competitive advantage.


Introduction
With the explosion of the Internet and e-commerce platforms, online shopping has become easier and more convenient than ever, and mobile commerce applications have developed rapidly. In Vietnam, four mobile commerce apps (i.e., Shopee, Lazada, Sendo, and Tiki) have the most visits on Google Play Store, with total monthly traffic of 143 million in Q4 2020 [1]. In addition, users generate a large amount of data every day in the form of text reviews on mobile commerce applications. These reviews are a valuable resource for businesses to understand users' experiences and opinions about products and services, which is helpful for both users and manufacturers [2]. However, as the number of comments grows every day, it is becoming more difficult to identify the main patterns. Therefore, an automated approach to extracting and summarizing the main patterns of online commentary is essential, and opinion mining through sentiment analysis (SA), specifically aspect-based sentiment analysis (ABSA), poses a significant challenge. In particular, Vietnamese is a complex language with a 29-character Latin-based alphabet and additional tone marks: acute (´), grave (`), hook above (ˀ), tilde (~), and dot below (.). Additionally, many borrowed words are derived from Chinese, French, and English [3].
Various studies applying machine learning approaches to sentiment analysis have been published. Yanuar Nurdiansyah et al. [4] suggested a system that uses the Naïve Bayes classifier to classify sentiment in Bahasa Indonesia movie reviews into two categories (positive and negative), with an average classification accuracy of 88.37%. Al Amrani et al. [5] proposed a hybrid approach to classify product reviews from Amazon using Random Forest (RF) and Support Vector Machine (SVM). Bolbol & Maghari [6] experimented with various machine learning classifiers and found that Logistic Regression (LR) achieved the highest accuracy, 93%, on the Arabic Tweets dataset. Other comparative studies used machine learning techniques (Naïve Bayes, SVM, Decision Tree, K-Nearest Neighbor, and Artificial Neural Network) to classify customer opinions [7]-[11]. Recently, deep learning approaches have produced better performance than traditional machine learning. Peng et al. [12] employed Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM) [13], [14], and Convolutional Neural Network (CNN) models; CNN reached an accuracy of 88.22%, while RNN and LSTM reached 68.64% and 85.32%, respectively. Li et al. [15] combined BERT, BiLSTM, and CNN to classify sentiment on the Weibo text dataset and achieved an accuracy of 92.4%.
For Vietnamese, Nguyen et al. [16] used the PhoBERT model to classify the sentiment of stock article titles, with an accuracy of 93%. In other research, Nguyen et al. [17] proposed a fine-tuned BERT method for sentiment analysis of Vietnamese reviews; the results show that the BERT-RCNN model achieves the best result, with an F1-score of 91.15%. In addition, Truong et al. [18] used the pre-trained PhoBERT model with other fine-tuning techniques and achieved an accuracy of 94.28% on the UIT-VSFC dataset.
ABSA is an interesting research topic in the opinion mining field [19]. People tend to talk about many aspects in their comments about a product or service and use many words with positive or negative connotations. Specifically, each comment falls into one of the following four categories: (1) one aspect described by one word with a positive or negative meaning, (2) one aspect described by many words with a positive or negative meaning, (3) many aspects described by one word with positive or negative connotations, and (4) many aspects described by many words with positive or negative connotations [20]. For example, consider "Lazada has a lot of discount codes, but the app still has many bugs to fix". Here, the user gave a positive review of the "discount" aspect but a negative review of the "app" aspect. Dealing with comments that cover many aspects in one sentence is an extremely challenging task.
One of the early studies in the field of user opinion aspect mining was introduced by Hu & Liu [21]; it exploits the frequency of occurrence of nouns and noun phrases to find aspects. Subsequently, various approaches to the aspect mining problem have been studied, the most widely used being the rule-based approach [22], which extracts aspects based on the grammatical relationships of words in a sentence. An aspect extraction system for products [23] considered nouns to be aspect terms and extracted them based on POS tagging and term frequency-inverse document frequency (TF-IDF) methods. Classification algorithms such as Naïve Bayes and SVMs have also been applied to aspect extraction. Mai & Le [24] proposed a sequence-labeling approach that combines BiRNN and Conditional Random Fields (CRF) to concurrently extract opinion targets and detect their associated sentiments in smartphone-related datasets. Luc Phan et al. [25] proposed a method for the Vietnamese aspect-based sentiment task based on the Bi-LSTM architecture with fastText word embeddings; their experiments demonstrate that this approach achieves the highest F1-scores, 84.48% for the aspect task and 63.06% for the sentiment task, on the Vietnamese Smartphone Feedback Dataset (UIT-ViSFD).
In this study, a hybrid research model that combines aspect mining and comment classification is developed. We propose an experimental method for the model that consists of five main tasks: first, we provide a dataset of Vietnamese customer reviews in the e-commerce industry; second, we offer dictionaries and techniques to address issues in pre-processing Vietnamese text; third, we propose the Bi-LSTM model for text sentiment classification; fourth, we identify the aspects of products and services on which users comment positively or negatively, using POS tagging on the output of the proposed model; finally, we build a dashboard to help managers gain a deeper understanding of customers' needs and preferences, thereby improving customer satisfaction with products and services. Fig. 1 provides an overview of the hybrid research model.
The remainder of this paper is structured as follows: Section 2 presents the methodology used in this study. Section 3 details our experiments and results, together with the associated discussion. Finally, in Section 4, we summarize our work and discuss future directions for research.

Method
To build the hybrid model and research method, we surveyed secondary data from previous studies related to the research objectives of this article, covering aspect extraction and sentiment analysis in mobile commerce and other fields. Specifically, we surveyed the theoretical basis, analyzed business models, and proposed an appropriate research model. An experimental research method is then applied to implement and evaluate the proposed model through the following stages: identifying business problems and collecting online comments from customers; cleaning and integrating data and applying statistical methods; and employing machine and deep learning techniques for natural language processing to uncover hidden knowledge and insights. The implicit knowledge in the data is represented by the nuances in customer comments, specific aspects, and issues related to the nuances of words. Based on the results, recommendations related to business and management are proposed. The proposed hybrid model for aspect-based sentiment analysis in Fig. 1 includes five stages.

Data Collection
Python with the google_play_scraper library is used to collect data from Google Play Store, comprising over 1.2 million customer reviews of four mobile commerce apps (Shopee, Lazada, Sendo, and Tiki) from 2013 to 2022. The dataset is named VN_E-commerce_Review and is published at https://www.kaggle.com/datasets/hienbm/vietnamese-ecommerce-review.

Data Pre-processing
Inspection of the original dataset revealed a large amount of noisy data, such as characters outside the Vietnamese alphabet, emojis, emoticons, teen code, missing accents, and abbreviations. Therefore, the study focuses heavily on pre-processing, which improved the accuracy of the proposed model. Pre-processing consists of the following steps: (1) converting all words to lowercase; (2) removing hashtags (#), HTML tags (< >), URLs (http://), and mentions (@); (3) removing digits, punctuation, and other characters that are not in the Vietnamese alphabet; (4) removing elongated characters and consecutive duplicate words or phrases (e.g., "ok ok okkkkkk" -> "ok"); (5) removing duplicate emojis; (6) adding Vietnamese accents through dictionary mapping, using Vietnamese dictionary words with "key" as the original word and "value" as the word without accents; (7) mapping teen code and abbreviations with a dictionary (e.g., "thjk" -> "thích"); (8) removing extra white space; (9) applying the word tokenizer of the pyvi library [26], published by Viet-Trung Tran (e.g., "tẩy chay" -> "tẩy_chay", meaning "boycott"). Table 1 shows examples from the dataset before and after pre-processing.
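Several of the cleaning steps above can be sketched with Python's standard library alone. The block below is an illustrative approximation, not the authors' pipeline: the `TEEN_CODE` dictionary is a hypothetical two-entry stand-in for the study's dictionaries, only ASCII punctuation and digits are stripped, accent restoration (step 6) and pyvi tokenization (step 9) are omitted, and collapsing every repeated character is a simplification that would also shorten legitimately doubled letters.

```python
import re
import string
import unicodedata

# Hypothetical, tiny teen-code/abbreviation dictionary (illustration only).
TEEN_CODE = {"thjk": "thích", "ko": "không"}

_PUNCT_OR_DIGIT = re.compile(r"[0-9]|[%s]" % re.escape(string.punctuation))


def preprocess(text: str, teen_code: dict = TEEN_CODE) -> str:
    """Apply a simplified subset of the cleaning steps to one review."""
    # Use composed characters so accented Vietnamese letters stay intact.
    text = unicodedata.normalize("NFC", text).lower()       # step (1)
    text = re.sub(r"<[^>]+>", " ", text)                    # step (2): HTML tags
    text = re.sub(r"https?://\S+|[#@]\w+", " ", text)       # step (2): URLs, hashtags, mentions
    text = _PUNCT_OR_DIGIT.sub(" ", text)                   # step (3): digits, ASCII punctuation
    text = re.sub(r"(\S)\1+", r"\1", text)                  # steps (4)/(5): repeated chars and emojis
    text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text)        # step (4): consecutive duplicate words
    words = (teen_code.get(w, w) for w in text.split())     # step (7): teen code mapping
    return " ".join(words)                                  # step (8): single spaces


print(preprocess("Ok ok okkkkkk!!! Shop thjk lắm 😍😍 http://x.vn #sale @shop 123"))
# prints: ok shop thích lắm 😍
```

The order matters in this sketch: URLs, hashtags, and mentions are removed before punctuation, because stripping punctuation first would break the patterns that identify them.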

Modeling
This study intends not only to build a novel technique to improve sentiment classification of comments using the Bi-LSTM model, but also to construct a sentiment dictionary and extract aspects based on sentiment. The structure of the suggested model is explored in detail in this section. Fig. 2 depicts the general structure of the Bi-LSTM model. After pre-processing, the dataset is randomly divided into three sets as input data: a training set with 1,039,240 reviews (~80% of the data), a validation set with 129,905 reviews (~10% of the data), and a test set with 129,905 reviews (the remaining ~10%). The labels are rating scores from 1 to 5, which we divide into two groups: scores 1-3 as Negative and scores 4-5 as Positive [26]. Fig. 3 shows the number of text reviews for each class in the training, validation, and test sets.
The embedding layer takes the integer-encoded vocabulary and looks up the embedding vector for each word index. The embedding layer is trained on samples from the VN_E-commerce_Review dataset rather than taken from a pre-trained word embedding model such as word2vec [27] or GloVe [28]. In this study, we used the embedding layer provided by Keras with a vocabulary of 32,526 unique words, each embedded in a 30-dimensional vector space.
The Bi-LSTM layer is an extension of the LSTM model that combines Bi-RNN models and LSTM units to capture context information. In the first round (i.e., the forward layer), an LSTM is applied to the input sequence. In the second round (i.e., the backward layer), the reversed input sequence is fed into the LSTM model [29]. Applying the LSTM twice improves the learning of long-term dependencies, which allows the model to learn the context more efficiently. The architecture of Bi-LSTM is illustrated in Fig. 4.
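The label grouping and the 80/10/10 split can be illustrated with a small standard-library sketch. The function names and the fixed random seed are assumptions made for illustration; the source does not state how the random split was implemented.

```python
import random


def rating_to_label(score: int) -> int:
    """Map a 1-5 star rating to 0 (Negative: 1-3) or 1 (Positive: 4-5)."""
    if not 1 <= score <= 5:
        raise ValueError(f"rating out of range: {score}")
    return 1 if score >= 4 else 0


def split_sizes(n: int) -> tuple:
    """Return (train, validation, test) sizes for an 80/10/10 split."""
    n_train = n * 8 // 10            # integer arithmetic avoids float rounding
    n_val = (n - n_train) // 2
    return n_train, n_val, n - n_train - n_val


def split_dataset(items, seed: int = 42):
    """Shuffle a copy of the items and cut it into train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seed is an arbitrary illustrative choice
    n_train, n_val, _ = split_sizes(len(items))
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


print(split_sizes(1_299_050))  # (1039240, 129905, 129905)
```

The sizes returned for 1,299,050 reviews match the three set sizes reported above.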

Fig. 4. The architecture of Bi-LSTM [30]
The sigmoid function is equivalent to a 2-element softmax in which the second element is assumed to be zero. The output of the sigmoid function always lies between 0 and 1:

$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
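The stated equivalence between the sigmoid and a 2-element softmax can be checked numerically. The sketch below uses only the standard `math` module; the function names are chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid for moderate inputs (no overflow guard for very negative x)."""
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# The first softmax component of [x, 0] coincides with sigmoid(x).
for x in (-3.0, 0.0, 2.5):
    assert math.isclose(sigmoid(x), softmax([x, 0.0])[0])
```

This follows from dividing the numerator and denominator of e^x / (e^x + e^0) by e^x, which is why a single sigmoid output unit suffices for two-class classification.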

Hyper-parameter setting
Hyperparameter tuning is a crucial step in improving model performance. In this study, the Bayesian optimization function from Keras Tuner [30] is used for hyperparameter tuning. This method models the relationship between the hyperparameters and the model's performance using a probabilistic model. The probabilistic model is iteratively updated based on the performance of the evaluated hyperparameters, resulting in a more efficient search of the hyperparameter space. Tuning can prevent overfitting or underfitting and selects the best model as the final model. We tune four different hyperparameters and optimize the accuracy. One of them, the learning rate of the neural network optimizer, is searched over values between 1e-5 and 1e-1 (inclusive) using logarithmic sampling. The "objective" parameter is set to "val_accuracy" and the "max_trials" parameter is set to 50, which means that the optimization function will attempt to optimize the validation accuracy by trying a maximum of 50 different combinations of hyperparameters. Table 2 describes the best hyperparameter values in the proposed model.

Metrics
A confusion matrix is applied to evaluate the model. It is a matrix containing the number of True Positives (TP), True Negatives (TN), False Negatives (FN), and False Positives (FP). The metrics that are most widely used for evaluation are described as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$\text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
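The four confusion-matrix metrics reported in this study (accuracy, precision, recall, and F1-score) can be computed from the TP, TN, FP, and FN counts using their standard definitions, as in the following sketch. The counts in the usage line are made-up illustrative numbers, not results from the paper, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Illustrative counts only (not taken from the study).
metrics = classification_metrics(tp=90, tn=80, fp=20, fn=10)
print({name: round(value, 4) for name, value in metrics.items()})
# {'accuracy': 0.85, 'precision': 0.8182, 'recall': 0.9, 'f1': 0.8571}
```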

Building sentiment dictionary and extracting aspects by using POS tagging method
POS tagging stands for "Part-of-Speech tagging". It is a process in natural language processing that assigns a part of speech (such as noun, verb, or adjective) to each word in a sentence by analyzing the grammatical structure of the sentence and identifying the role of each word in it. In this study, to apply the POS tagging technique, we used the pyvi library [31], which is considered one of the best-performing libraries for natural language processing in Vietnamese, with an F1-score of 0.925.

Building sentiment dictionary
Using the POS tagging method and the results of the classification model, we built a dictionary to classify positive and negative nuances. The dictionary was built by filtering the adjectives in the dataset and determining their frequency of occurrence with respect to the rating score in order to assign them to either the positive or the negative group. If a word occurs more frequently in one group than in the other, it is assigned to that group. Table 3 shows some examples of words classified as positive or negative.
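The frequency rule described above can be sketched as follows. The input is assumed to be already POS-tagged (for instance by pyvi), the adjective tag "A" is an assumption about the tag set, and leaving tied words unassigned is a choice made here because the source does not say how ties are handled.

```python
from collections import Counter


def build_sentiment_dictionary(tagged_reviews, adjective_tag: str = "A") -> dict:
    """Assign each adjective to the group in which it occurs more often.

    tagged_reviews: iterable of (tokens, label), where tokens is a list of
    (word, pos_tag) pairs and label is "positive" or "negative".
    """
    counts = {"positive": Counter(), "negative": Counter()}
    for tokens, label in tagged_reviews:
        for word, tag in tokens:
            if tag == adjective_tag:
                counts[label][word] += 1

    dictionary = {}
    for word in counts["positive"].keys() | counts["negative"].keys():
        pos, neg = counts["positive"][word], counts["negative"][word]
        if pos > neg:
            dictionary[word] = "positive"
        elif neg > pos:
            dictionary[word] = "negative"
        # Ties are left out of the dictionary (assumption, see lead-in).
    return dictionary


# Toy data: "tốt" = good, "chậm" = slow, "app" is a noun and is ignored.
reviews = [
    ([("app", "N"), ("tốt", "A")], "positive"),
    ([("giao_hàng", "N"), ("chậm", "A")], "negative"),
    ([("tốt", "A")], "positive"),
    ([("tốt", "A"), ("chậm", "A")], "negative"),
]
print(build_sentiment_dictionary(reviews) == {"tốt": "positive", "chậm": "negative"})  # True
```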

Aspects Extraction
The aspects were extracted by filtering the most frequently occurring nouns in the dataset that are related to segments such as goods, price, customer care, and customer experience (UX/UI). The aspect-based sentiment extraction results are presented in Table 4.
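One possible way to relate frequent nouns to sentiment words is sketched below. It counts nouns that appear in the same review as a word from the sentiment dictionary; this review-level co-occurrence rule, the noun tag "N", and the function name are assumptions for illustration, since the source does not specify how nouns and sentiment words are linked.

```python
from collections import Counter


def extract_aspects(tagged_reviews, sentiment_dict: dict, top_k: int = 10,
                    noun_tag: str = "N") -> dict:
    """Return the most frequent nouns co-occurring with each polarity.

    tagged_reviews: iterable of token lists, each token a (word, pos_tag) pair.
    sentiment_dict: maps a sentiment word to "positive" or "negative".
    """
    counts = {"positive": Counter(), "negative": Counter()}
    for tokens in tagged_reviews:
        # Polarities expressed anywhere in this review.
        polarities = {sentiment_dict[w] for w, _ in tokens if w in sentiment_dict}
        nouns = [w for w, tag in tokens if tag == noun_tag]
        for polarity in polarities:
            counts[polarity].update(nouns)
    return {polarity: c.most_common(top_k) for polarity, c in counts.items()}


# Toy data: "chậm" = slow, "tệ" = bad, "rẻ" = cheap, "lỗi" = error, "giá" = price.
sentiment = {"chậm": "negative", "tệ": "negative", "rẻ": "positive"}
tagged = [
    [("app", "N"), ("chậm", "A")],
    [("app", "N"), ("lỗi", "N"), ("tệ", "A")],
    [("giá", "N"), ("rẻ", "A")],
]
print(extract_aspects(tagged, sentiment))
# {'positive': [('giá', 1)], 'negative': [('app', 2), ('lỗi', 1)]}
```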

Experimental setup
Colaboratory Pro, released by Google, is used for all data pre-processing, training, and testing, with 24 GB of Random Access Memory (RAM), an Nvidia Tesla P100-PCIe-16GB GPU, and an Intel(R) Xeon(R) CPU @ 2.30GHz. To train the model, binary cross-entropy [32] is applied as the loss function:

$$L = -\sum_{c=1}^{M} y_{o,c} \log(p_{o,c})$$

In this equation, $M$ represents the total number of classes, $\log$ refers to the natural logarithm, $y_{o,c}$ is a binary indicator taking values of either 0 or 1, representing whether class label $c$ is the correct classification for observation $o$, and $p_{o,c}$ is the predicted probability that observation $o$ belongs to class $c$. For the optimizer, the Adam method [33] with a learning rate of 0.002 is applied. The number of epochs is set to 100, with EarlyStopping monitoring the val_accuracy metric and patience set to 15. This means that training stops when the monitored metric has not improved for 15 consecutive epochs. Fig. 5 displays the accuracy and loss training histories.
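For two classes, the cross-entropy loss reduces to -(y log p + (1 - y) log(1 - p)) per observation. The sketch below computes its mean over a batch and also mimics the patience rule of early stopping in plain Python. Both functions are simplified illustrations rather than the Keras implementations, and the clipping constant `eps` is an assumption added to avoid log(0).

```python
import math


def binary_cross_entropy(y_true, y_pred, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over paired labels (0/1) and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep the logarithm finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


def early_stopping_epoch(val_accuracies, patience: int = 15):
    """Return the 1-based epoch at which training would stop, or None.

    Training stops once the monitored value has failed to improve on its
    best value for `patience` consecutive epochs.
    """
    best = float("-inf")
    wait = 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 4))           # 0.1054
print(early_stopping_epoch([0.5, 0.6, 0.6, 0.6, 0.6], patience=3))  # 5
```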

Experimental results
After training, the test set is used to assess the performance of the proposed model. Fig. 6 shows the results as a confusion matrix. Table 5 reports the accuracy, precision, recall, and F1-score values of the proposed technique in comparison with conventional machine learning methods, both before and after pre-processing. The results show that the Bi-LSTM method is the most suitable for the dataset used in this study, with an accuracy of 92.01% and an F1-score of 94.61% after the pre-processing step, which are higher than those of the remaining methods and than the results before pre-processing.

Business application
After applying the POS tagging method, the results are presented in Fig. 7. In more detail, Fig. 8 presents the top 10 words with positive nuances in the period 01/2018 - 08/2022. "Good" was the word most used by users on all four applications, with Lazada having the most comments from 2018 to 2021. It was followed by words such as "multiple", "excellent", "fast", "cheap", "okay", "great", "true", "better", and "happy". Fig. 9 shows the top 10 negative words in the period 01/2018 - 08/2022. In 2022, for the Shopee app, the negative words requiring attention were "bad", "long", "different", and "slow", each with more than 1,000 occurrences, more than the rest of the top 10 words. For Lazada, the word with more than 1,000 occurrences was "bad". The negative words that Tiki needs to pay attention to include "bad" and "long". In terms of aspect words related to nuance, Fig. 10 shows the top 10 words in the period 01/2018 - 08/2022 for the four applications: "app", "order", "cost", "customer", "error", "voucher", "advertising", "phone", "time", and "pr*ck". Based on the 2022 data, the aspect that all four applications need to pay attention to, for both positive and negative nuances, is "app", which covers feedback regarding the use of the application, ease of use, access speed, and responsiveness. This aspect group had the highest occurrence frequency among the top 10 aspect-related words.

Discussion
The experimental results have clarified the research model in Fig. 1 for analyzing the positive and negative aspects and nuances expressed by customers toward the mobile applications of commercial shops through feedback, comments, and reviews. In particular, for aspects related to negative nuances, the results show that the majority of opinions focus on six main issues related to the shops' products and services: (1) product quality is not as advertised; (2) the product return process is difficult and time-consuming, and customers do not know when they will be refunded; (3) the mobile application is difficult to use; (4) customers who call the switchboard with feedback wait for a long time; (5) staff show service attitude problems, such as making promises that are not fulfilled; (6) goods are damaged during transportation and waiting times are long.
Based on customer feedback, the study also identified six solutions that can be recommended for shops to improve product and service quality, drawing on the positive aspects and nuances deduced from customers' opinions: (1) reminding about and taking proactive measures against stalls in stores; (2) updating customers regularly about the product return process, including what stage the return is in, when the refund will be made, and through which channels; (3) providing specific instructions for customers via video, images, or directly on the mobile application when a customer makes a return or complaint; (4) preparing staffing plans for peak hours and promotional campaign days so that there are always enough customer support staff, and, in the future, developing call center systems that automatically receive and answer calls or building AI chatbots to support customers in a timely manner; (5) continuously reviewing service quality by listening to call recordings in full and checking replies to emails and messages, as well as identifying common mistakes and providing training sessions so that employees follow the correct process, learn how to control emotions, and better understand customers' wishes; (6) continuously checking the shipping process, stowage, and shippers' performance through customer reviews, and providing detailed packing instructions so that stores selling on mobile applications can pack products properly.

Conclusion
Based on the research and analysis of our dataset of customer opinions about e-commerce products and services from the four largest mobile commerce applications in Vietnam (i.e., Lazada, Shopee, Sendo, and Tiki) on the Google Play Store from 01/2013 to 08/2022, the experimental results demonstrate that the proposed Bi-LSTM model achieved the best performance among the compared models, with a high F1-score of 94.61%. This result can be attributed to the model's ability to learn long-term dependencies and understand the context of the text, which is crucial in identifying sentiment. Additionally, the study obtained results of scientific and practical significance, with the following contributions: (1) providing a dataset of customer reviews specific to the Vietnamese e-commerce industry; (2) developing dictionaries and techniques to address pre-processing issues with Vietnamese text; (3) proposing a hybrid model for classifying sentiment and extracting aspects of products and services from reviews. These results have significant implications for both researchers and businesses. For instance, the model can be employed to build dashboards that monitor online customer experiences in real time, assist administrators in making timely decisions, enhance the quality of products and services, and improve customer satisfaction.
The study has some limitations that could be addressed in future research. The model considers only positive and negative nuances and does not analyze neutral ones. To expand the model's capabilities, we plan to: (1) incorporate neutral nuances in the Vietnamese language; (2) improve the feature extraction for the network, where an integrated word embedding approach may produce better results; (3) use pre-trained models and apply them to large datasets.
Declarations
Author contribution. All authors contributed equally as main contributors to this paper. All authors read and approved the final paper.
Funding statement. None of the authors have received any funding or grants from any institution or funding body for the research.
Conflict of interest. The authors declare no conflict of interest.
Additional information. No additional information is available for this paper.