Automatic Text Summarization Using Latent Dirichlet Allocation (LDA) for Document Clustering

(1) * Erwin Yudi Hidayat (Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
(2) Fahri Firdausillah (Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
(3) Khafiizh Hastuti (Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
(4) Ika Novita Dewi (Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
(5) Azhari Azhari (Computer Science and Electronics Department, Universitas Gajah Mada, Indonesia)
*corresponding author

Abstract


In this paper, we apply Latent Dirichlet Allocation (LDA) to automatic text summarization in order to improve the accuracy of document clustering. The experiments use a data set of 398 public blog articles collected with a Python Scrapy crawler and scraper. The clustering pipeline consists of preprocessing, automatic document compression using a feature method, automatic document compression using LDA, word weighting, and a clustering algorithm. The results show that automatic document summarization with LDA reaches 72% accuracy in the LDA 40% configuration, compared with the traditional k-means method, which reaches only 66%.
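
The last two stages of the pipeline, word weighting and clustering, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes TF-IDF as the weighting scheme and cosine-similarity k-means as the clustering algorithm, it omits the preprocessing and summarization stages entirely, and the function names (tfidf, cosine, kmeans) and the four toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one L2-normalized sparse TF-IDF vector (term -> weight) per tokenized document."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors that are already L2-normalized."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, k, iters=20):
    """Cluster sparse vectors by cosine similarity; returns one label per vector."""
    # Assumption for reproducibility: seed the centroids with the first k documents.
    centroids = [dict(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each document goes to its most similar centroid.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Update step: each centroid becomes the normalized sum of its members.
        for j in range(k):
            total = Counter()
            for v, label in zip(vectors, labels):
                if label == j:
                    for t, w in v.items():
                        total[t] += w
            if total:
                norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
                centroids[j] = {t: w / norm for t, w in total.items()}
    return labels


# Hypothetical toy corpus: two documents about pets, two about finance.
docs = [
    "cat dog pet animal".split(),
    "stock market trade price".split(),
    "dog pet vet animal".split(),
    "market price stock bank".split(),
]
labels = kmeans(tfidf(docs), k=2)
print(labels)
```

In a full pipeline, the documents passed to the weighting step would be the compressed (summarized) texts rather than the raw articles, which is where the feature method or LDA would enter.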

Keywords


LDA, text summarization, clustering, k-means

   

DOI

https://doi.org/10.26555/ijain.v1i3.43
      






Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
   andri.pranolo.id@ieee.org (publication issues)
