Automatic Text Summarization Using Latent Dirichlet Allocation (LDA) for Document Clustering

(1) * Erwin Yudi Hidayat (Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
(2) Fahri Firdausillah (Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
(3) Khafiizh Hastuti (Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
(4) Ika Novita Dewi (Faculty of Computer Science, Universitas Dian Nuswantoro, Indonesia)
(5) Azhari Azhari (Computer Science and Electronics Department, Universitas Gajah Mada, Indonesia)
*corresponding author


In this paper, we apply Latent Dirichlet Allocation (LDA) to automatic text summarization in order to improve accuracy in document clustering. The experiments use a data set of 398 public blog articles collected with a Python Scrapy crawler and scraper. The clustering process in this research consists of several steps: preprocessing, automatic document compression using the feature method, automatic document compression using LDA, word weighting, and the clustering algorithm. The results show that automatic document summarization with LDA reaches 72% accuracy at LDA 40%, compared to the traditional k-means method, which reaches only 66%.


LDA, text summarization, clustering, k-means



