The performance of text similarity algorithms

(1) Didik Dwi Prasetya (Universitas Negeri Malang, Indonesia)
(2) * Aji Prasetya Wibawa (Universitas Negeri Malang, Indonesia)
(3) Tsukasa Hirashima (Graduate School of Engineering, Hiroshima University, Japan)
*corresponding author


Text similarity measurement compares a text with available references to determine the degree of similarity between them. Many studies of text similarity have been conducted, producing a variety of approaches and algorithms. This paper investigates four major categories of text similarity measures: String-based, Corpus-based, Knowledge-based, and Hybrid similarity. The results of the investigation showed that the semantic similarity approach is more reasonable for identifying substantive relationships between texts.


Similarity measure; String-based; Corpus-based; Knowledge-based; Text Mining
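To make the String-based category concrete, the Python below is a minimal sketch. It is not the authors' implementation. It computes two common string-based measures: a normalized Levenshtein (edit-distance) similarity over characters, and a Jaccard similarity over lowercase whitespace-separated tokens. The function names `levenshtein`, `edit_similarity`, and `jaccard_similarity` are hypothetical, and the sentence pair in the usage example is illustrative only. The example suggests why semantic approaches can be preferable. Two sentences with the same meaning but different words ("car"/"automobile", "fast"/"quick") receive a low lexical score.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def jaccard_similarity(a: str, b: str) -> float:
    """|A & B| / |A | B| over sets of lowercase whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    s1, s2 = "the car is fast", "the automobile is quick"
    # Only "the" and "is" are shared out of 6 distinct tokens.
    print(round(jaccard_similarity(s1, s2), 3))  # 0.333
    print(round(edit_similarity(s1, s2), 3))
```

Both measures see only surface form. Corpus-based and Knowledge-based measures, as named in the abstract, aim to capture meaning beyond shared characters or tokens.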



