(1) Meilany Nonsi Tentua (Universitas PGRI Yogyakarta, Indonesia)
(2) * Suprapto Suprapto (Universitas Gadjah Mada, Indonesia)
(3) Afiahayati Afiahayati (Universitas Gadjah Mada, Indonesia)
*Corresponding author

Abstract


Pre-trained language models for Indonesian are already available for natural language processing (NLP) tasks. However, these models were trained on general Indonesian text, whose structure differs from that of job descriptions, and this limits their effectiveness for skill recognition. IndoBERTSkill is a novel domain-specific pre-trained language model for recognizing skills in Indonesian text. It is built on the Bidirectional Encoder Representations from Transformers (BERT) architecture. IndoBERTSkill was trained on an extensive text collection drawn from Indonesian Wikipedia, English Wikipedia, and Indonesian job descriptions collected from a job portal. Its performance was evaluated through two main approaches: (1) language modeling, via masked language model (MLM) prediction, and (2) fine-tuning on a custom annotated dataset (NERSkill) for the named entity recognition (NER) task. Fine-tuning involved training a classification layer on top of IndoBERTSkill using BIO tagging to identify hard-skill, soft-skill, and technology entities. The skill recognition model derived from IndoBERTSkill achieved the highest F1-score among the pre-trained language models compared, at 87%, demonstrating robustness and strong generalizability for skill entity recognition in Indonesian job descriptions. IndoBERTSkill thus provides a valuable resource for developing Indonesian NLP applications that require skill recognition, and it could improve the accuracy and efficiency of skill recognition in domains such as job matching, education, and training.
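The abstract describes NER fine-tuning with BIO tags for hard-skill, soft-skill, and technology entities, evaluated by F1-score. As a rough illustration only, the Python sketch below shows how a BIO tag sequence can be converted into entity spans and how an exact-match, entity-level micro F1 is commonly computed. The tag names `HSKILL`, `SSKILL`, and `TECH` are hypothetical placeholders, not the labels used in NERSkill, and the strict span-matching rule is an assumption; the authors' actual evaluation protocol may differ.

```python
from typing import List, Set, Tuple

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert a BIO tag sequence into (type, start, end) entity spans.

    An I- tag that does not continue an entity of the same type is treated
    as the start of a new entity (a common lenient convention, assumed here).
    """
    spans: List[Span] = []
    cur_type, cur_start = None, None
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any entity: close the open entity, if there is one.
            if cur_type is not None:
                spans.append((cur_type, cur_start, i))
            cur_type, cur_start = None, None
            continue
        prefix, ent_type = tag.split("-", 1)
        if prefix == "B" or ent_type != cur_type:
            # A new entity begins: close the previous one first.
            if cur_type is not None:
                spans.append((cur_type, cur_start, i))
            cur_type, cur_start = ent_type, i
        # Otherwise an I- tag of the same type simply extends the entity.
    if cur_type is not None:
        spans.append((cur_type, cur_start, len(tags)))
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro F1 over exact (sentence, type, start, end) span matches."""
    gold_spans: Set[Tuple[int, str, int, int]] = set()
    pred_spans: Set[Tuple[int, str, int, int]] = set()
    for sent_id, (g, p) in enumerate(zip(gold, pred)):
        gold_spans.update((sent_id, *s) for s in bio_to_spans(g))
        pred_spans.update((sent_id, *s) for s in bio_to_spans(p))
    true_pos = len(gold_spans & pred_spans)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_spans)
    recall = true_pos / len(gold_spans)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical tokens: "Menguasai Python dan komunikasi"
    gold = [["O", "B-TECH", "O", "B-SSKILL"]]
    pred = [["O", "B-TECH", "O", "O"]]  # the soft skill is missed
    print(bio_to_spans(gold[0]))  # [('TECH', 1, 2), ('SSKILL', 3, 4)]
    print(round(entity_f1(gold, pred), 3))  # 0.667
```

In this toy case the prediction finds one of two gold entities with no false positives, so precision is 1.0, recall is 0.5, and F1 is about 0.667. Exact-span matching is the stricter of the usual choices; partial-overlap scoring would report higher numbers.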

Keywords


BERT, Skill Recognition, Named Entity Recognition, Domain-Specific, Pre-trained Language Model

          

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
 andri.pranolo.id@ieee.org (publication issues)
