Scientific reference style using rule-based machine learning

(1) * Afrida Helen (Computer Science Department, Padjadjaran University, Indonesia)
(2) Aditya Pradana (Computer Science Department, Padjadjaran University, Indonesia)
(3) Muhammad Afif (Computer Science Department, Padjadjaran University, Indonesia)
*corresponding author


Regular expressions (RegEx) can be employed as a supervised learning technique to define and search for specific patterns in text. This work devises a method that uses regular expressions to convert the reference style of academic papers into other styles, depending on the requirements of the target publication or conference. Our research aimed to detect the distinctive patterns of reference styles using RegEx and compare them against a dataset of various reference styles. We gathered references in seven distinct format classes from sources such as academic papers, journals, conference proceedings, and books. Our approach uses RegEx to convert one referencing format to another according to the user's preference. The proposed model achieved an accuracy of 57.26% for book references and 57.56% for journal references. We used the similarity ratio and Levenshtein distance to evaluate the model's performance on the dataset. The model achieved a 97.8% similarity ratio with a Levenshtein distance of 2. Notably, the APA style for journal references yielded the best results, although the effectiveness of the extraction function varies with the reference style. For APA style, the model achieved a 99.97% similarity ratio with a Levenshtein distance of 1. Overall, our proposed model outperforms baseline machine learning models on this task. This study introduces an automated program that uses regular expressions to convert academic reference formats, which can enhance the efficiency, precision, and adaptability of academic publishing.
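
To make the approach described above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single hand-written pattern for an IEEE-style journal reference, rewrites the captured fields into an APA-like string, and scores the result with Levenshtein distance and a similarity ratio. The pattern, the function names (`ieee_to_apa`, `levenshtein`, `similarity_ratio`), and the use of `difflib.SequenceMatcher` for the ratio are illustrative assumptions; the paper's own rules and ratio definition may differ.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rule for an IEEE-style journal reference (an assumption,
# not the pattern used in the paper). Expects straight quotes and a hyphen
# in the page range.
IEEE_JOURNAL = re.compile(
    r'^(?P<authors>.+?), "(?P<title>.+?)," '
    r'(?P<journal>.+?), vol\. (?P<volume>\d+), no\. (?P<issue>\d+), '
    r'pp\. (?P<pages>\d+-\d+), (?:[A-Z][a-z]+\.? )?(?P<year>\d{4})'
)


def _apa_authors(ieee_authors: str) -> str:
    """Turn 'S. Arts, B. Cassiman, and J. C. Gomez' into APA author order."""
    names = re.split(r',\s*(?:and\s+)?|\s+and\s+', ieee_authors)
    flipped = []
    for name in names:
        initials, surname = name.rsplit(" ", 1)  # 'J. C.', 'Gomez'
        flipped.append(f"{surname}, {initials}")
    if len(flipped) == 1:
        return flipped[0]
    return ", ".join(flipped[:-1]) + ", & " + flipped[-1]


def ieee_to_apa(reference: str) -> str:
    """Convert one IEEE-style journal reference into an APA-like string."""
    match = IEEE_JOURNAL.match(reference)
    if match is None:
        raise ValueError("reference does not match the IEEE journal pattern")
    f = match.groupdict()
    return (
        f"{_apa_authors(f['authors'])} ({f['year']}). {f['title']}. "
        f"{f['journal']}, {f['volume']}({f['issue']}), {f['pages']}."
    )


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity_ratio(a: str, b: str) -> float:
    """Similarity in [0, 1]; difflib's ratio is used here as an assumption."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    ieee = ('S. Arts, B. Cassiman, and J. C. Gomez, "Text matching to measure '
            'patent similarity," Strateg. Manag. J., vol. 39, no. 1, '
            'pp. 62-84, Jan. 2018, doi: 10.1002/smj.2699.')
    # Hand-written target string standing in for a gold-standard APA entry.
    gold = ("Arts, S., Cassiman, B., & Gomez, J. C. (2018). Text matching to "
            "measure patent similarity. Strateg. Manag. J., 39(1), 62-84.")
    converted = ieee_to_apa(ieee)
    print(converted)
    print(levenshtein(converted, gold), similarity_ratio(converted, gold))
```

In this sketch a converted reference is compared with a hand-written target string: a Levenshtein distance of 0 and a ratio of 1.0 indicate an exact match, and small distances indicate near misses such as a misplaced comma. A fuller system would presumably need one extraction pattern per source style and reference type.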


Regular expression; Reference writing style; Scientific paper; Levenshtein distance; Similarity ratio







Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

International Journal of Advances in Intelligent Informatics
ISSN 2442-6571  (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
