Automatic Text Summarization Using Latent Dirichlet Allocation (LDA) for Document Clustering

Erwin Yudi Hidayat; Fahri Firdausillah; Khafiizh Hastuti; Ika Novita Dewi; Azhari Azhari

doi:10.26555/ijain.v1i3.43


⁽¹⁾* Erwin Yudi Hidayat

(Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
⁽²⁾ Fahri Firdausillah

(Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
⁽³⁾ Khafiizh Hastuti

(Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
⁽⁴⁾ Ika Novita Dewi

(Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
⁽⁵⁾ Azhari Azhari

(Computer Science and Electronics Department, Universitas Gadjah Mada, Indonesia)
* corresponding author

Abstract

In this paper, we apply Latent Dirichlet Allocation (LDA) to automatic text summarization in order to improve the accuracy of document clustering. The experiments use a data set of 398 public blog articles collected with a Python Scrapy crawler and scraper. The clustering pipeline consists of preprocessing, automatic document compression using a feature method, automatic document compression using LDA, word weighting, and a clustering algorithm. The results show that automatic document summarization with LDA reaches 72% accuracy at the LDA 40% setting, compared with only 66% for the traditional k-means method.
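The LDA compression step can be illustrated with a minimal sketch. The abstract does not state how sentences are scored or how LDA is fitted, so everything below is an assumption: the helper names (`tokenize`, `fit_lda`, `summarize`, `STOPWORDS`) are hypothetical, inference uses a simple collapsed Gibbs sampler, each sentence is scored by the topic-weighted average probability of its words, and "LDA 40%" is read as keeping 40% of a document's sentences.

```python
import math
import random
import re

# Toy stopword list (assumption; the paper's preprocessing is not specified here).
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "are", "for", "on", "at", "when"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def fit_lda(docs, num_topics=2, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on tokenized docs.

    Returns (phi, theta): per-topic word probabilities and
    per-document topic probabilities.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []
    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)
    # Resample each token's topic from its conditional distribution.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    phi = [
        {w: (topic_word[t][word_id[w]] + beta) / (topic_total[t] + vocab_size * beta)
         for w in vocab}
        for t in range(num_topics)
    ]
    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return phi, theta

def summarize(text, phi, theta_d, ratio=0.4):
    """Keep the top `ratio` share of sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    keep = max(1, math.ceil(ratio * len(sentences)))

    def score(sentence):
        # Assumed scoring rule: topic-weighted mean word probability.
        words = tokenize(sentence)
        if not words:
            return 0.0
        total = sum(
            theta_d[t] * sum(phi[t].get(w, 0.0) for w in words)
            for t in range(len(phi))
        )
        return total / len(words)

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:keep]))
```

In this sketch the model is fitted once on the tokenized corpus, and each document is then compressed with `summarize(text, phi, theta[d])` before word weighting and clustering. Averaging the score over the sentence length keeps long sentences from winning on length alone; other scoring rules are equally plausible.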

Keywords

LDA, text summarization, clustering, k-means

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
