Temperament detection based on Twitter data: classical machine learning versus deep learning

(1) Annisa Ulizulfa (Department of Informatics, Universitas Diponegoro, Indonesia)
(2) * Retno Kusumaningrum (Department of Informatics, Universitas Diponegoro, Indonesia)
(3) Khadijah Khadijah (Department of Informatics, Universitas Diponegoro, Indonesia)
(4) Rismiyati Rismiyati (Department of Informatics, Universitas Diponegoro, Indonesia)
*corresponding author

Abstract


Deep learning has shown promising results in various text-based classification tasks. However, its performance depends on the amount of training data: deep learning algorithms tend to perform poorly when data are scarce and well when data are plentiful. Classical machine learning algorithms commonly work well on small datasets, but their performance plateaus and does not improve further as the sample size grows. Therefore, this study compared the performance of classical machine learning and deep learning methods for detecting temperament from Indonesian Twitter data. The proposed Indonesian Linguistic Inquiry and Word Count (LIWC) was employed to analyze the context of the tweets. The classical machine learning methods implemented were support vector machine (SVM) and K-nearest neighbor (KNN), whereas the deep learning method employed was a convolutional neural network (CNN) with three different architectures. Both families of methods were implemented using direct multiclass classification and one-versus-all (OVA) multiclass classification. The highest average F-measure, 58.73%, was obtained by the CNN with OVA classification using a pool size of 3, a dropout value of 0.7, and a learning rate of 0.0007.
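To make the one-versus-all (OVA) scheme and the average F-measure concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The toy two-dimensional features, the class labels "A", "B", and "C", the use of a K-nearest-neighbor vote fraction as the per-class score, and the reading of "average F-measure" as an unweighted (macro) mean of per-class F-measures are all assumptions made for illustration. The CNN and the LIWC feature extraction are not reproduced.

```python
import math


def knn_score(train_X, train_y_binary, x, k=3):
    """Fraction of the k nearest training points (Euclidean) labelled 1."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return sum(train_y_binary[i] for i in order[:k]) / k


def ova_predict(train_X, train_y, x, k=3):
    """One-versus-all: solve one binary problem per class, keep the best score."""
    classes = sorted(set(train_y))
    scores = {}
    for c in classes:
        # Relabel the data as "class c" (1) versus "all other classes" (0).
        binary = [1 if y == c else 0 for y in train_y]
        scores[c] = knn_score(train_X, binary, x, k)
    # Ties go to the first class in sorted order.
    return max(classes, key=lambda c: scores[c])


def average_f_measure(y_true, y_pred):
    """Unweighted mean of the per-class F-measure (assumed macro average)."""
    classes = sorted(set(y_true) | set(y_pred))
    f_scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f_scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(f_scores) / len(f_scores)


# Hypothetical toy data: three well-separated clusters, one per class.
train_X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
           (1.0, 1.0), (1.1, 0.9), (0.9, 1.1),
           (0.0, 1.0), (0.1, 1.1), (0.2, 0.9)]
train_y = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]

test_X = [(0.05, 0.1), (1.0, 0.95), (0.1, 1.0)]
test_y = ["A", "B", "C"]

predictions = [ova_predict(train_X, train_y, x) for x in test_X]
print(predictions, average_f_measure(test_y, predictions))
```

In direct multiclass classification a single model chooses among all classes at once, whereas the OVA variant above trains one binary decision per class and picks the class whose "this class versus the rest" score is highest.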

Keywords


Temperament detection; Twitter user; Support vector machine; K-nearest neighbor; Convolutional neural network

   

DOI

https://doi.org/10.26555/ijain.v8i1.692
      






Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
