A survey on text similarity measure

(1) Didik Dwi Prasetya (Universitas Negeri Malang, Indonesia)
(2) * Aji Prasetya Wibawa (Universitas Negeri Malang, Indonesia)
(3) Tsukasa Hirashima (Graduate School of Engineering, Hiroshima University, Japan)
*corresponding author


Measuring text similarity is an important task for determining the degree of similarity between objects. Finding similarities between words, sentences, and documents is at the core of text similarity. Words can be similar in two ways: lexically and semantically. Many studies of text similarity have produced a variety of approaches and algorithms. This paper summarizes text similarity measures, categorized into four major groups: string-based, corpus-based, knowledge-based, and hybrid similarities. To complement this survey, we also conducted a small investigation that evaluates text similarity using common algorithms representing each category.
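As an illustration of the lexical (string-based) end of this spectrum, the following minimal sketch computes two widely used measures: a normalized Levenshtein (edit-distance) similarity on characters and a Jaccard similarity on word tokens. The abstract does not name the specific algorithms used in the investigation, so the choice of these two measures, the function names, and the whitespace tokenization are assumptions made only for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Scale edit distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity on lowercase whitespace-separated tokens
    (an assumed, deliberately simple tokenization)."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


if __name__ == "__main__":
    print(levenshtein_similarity("kitten", "sitting"))
    print(jaccard_similarity("the cat sat", "the cat ran"))
    print(levenshtein_similarity("car", "automobile"))
```

In this sketch, "car" and "automobile" receive a low lexical score even though they are synonyms, which is the kind of gap that corpus-based and knowledge-based measures aim to close.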




