Fine-tuned hyperparameter optimization for phishing website detection: insights into efficiency and performance

Rizki Wahyudi; Azhari Shouni Barkah; Siti Rahayu Selamat; Pungkas Subarkah

doi:10.26555/ijain.v12i1.1920



⁽¹⁾ Rizki Wahyudi (Universitas Amikom Purwokerto, Indonesia)
⁽²⁾ Azhari Shouni Barkah (Universitas Amikom Purwokerto, Indonesia)
⁽³⁾* Siti Rahayu Selamat (Universiti Teknikal Malaysia Melaka, Malaysia)
⁽⁴⁾ Pungkas Subarkah (Universitas Amikom Purwokerto, Indonesia)
*Corresponding author

Abstract

As digital threats escalate, phishing-site identification has become a critical aspect of online protection. Although machine learning (ML) approaches have demonstrated strong predictive capability for this task, their reliability depends largely on careful hyperparameter calibration. This study investigates how systematic hyperparameter tuning through grid search influences both predictive accuracy and computational efficiency in phishing detection. Nine supervised classifiers from different algorithmic families were analyzed: tree-based models (decision tree, DT; random forest, RF; gradient boosting, GB; and XGBoost), margin- and distance-based learners (support vector machine, SVM; k-nearest neighbors, k-NN), probabilistic and neural approaches (naive Bayes, NB; multilayer perceptron, MLP), and logistic regression (LR) as a linear baseline. Grid search exhaustively evaluates every combination in a predefined parameter grid and selects the best-performing setting for each model. On the Kaggle phishing-URL dataset, the tuned models achieved noticeable accuracy gains. DT, RF, and k-NN reached 99.1% accuracy with training times of 0.10 s, 1.55 s, and 0.01 s, respectively. MLP reached 99.0% accuracy but required 2758 s, while SVM and LR achieved 97.8% and 92.9%, respectively. NB performed worst, at 62.7%. The results indicate that careful hyperparameter optimization improves predictive ability, whereas model complexity strongly affects runtime. The novelty of this study lies in its joint assessment of the trade-off between accuracy and efficiency, which yields guidelines for selecting computationally efficient algorithms for practical phishing-detection systems.
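The abstract describes grid search only at a high level. As a minimal sketch, assuming a toy two-feature dataset and a k-NN classifier, the pure-Python code below shows how grid search scores every hyperparameter combination with k-fold cross-validation and how the search runtime can be timed. The feature values, the deliberately mislabeled point, the three-fold scheme, and the grid (`k`, `metric`) are all hypothetical illustrations, not the authors' Kaggle data or search space, and the source does not state which library the authors used; in practice a library routine such as scikit-learn's `GridSearchCV` would normally replace the hand-written loop.

```python
import itertools
import time
from collections import Counter


def distance(a, b, metric):
    """Distance between two feature vectors under the chosen metric."""
    if metric == "euclidean":
        # Squared Euclidean: skipping the square root keeps the neighbor ranking unchanged.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    if metric == "manhattan":
        return sum(abs(ai - bi) for ai, bi in zip(a, b))
    raise ValueError(f"unknown metric: {metric}")


def knn_predict(train_X, train_y, x, k, metric):
    """Predict the label of x by majority vote among its k nearest training points."""
    order = sorted(range(len(train_X)), key=lambda i: distance(train_X[i], x, metric))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, params, n_folds=3):
    """Mean accuracy over contiguous folds (trailing points beyond n_folds * fold_size are unused)."""
    fold_size = len(X) // n_folds
    scores = []
    for f in range(n_folds):
        lo, hi = f * fold_size, (f + 1) * fold_size
        train_X, train_y = X[:lo] + X[hi:], y[:lo] + y[hi:]
        correct = sum(knn_predict(train_X, train_y, X[i], **params) == y[i] for i in range(lo, hi))
        scores.append(correct / (hi - lo))
    return sum(scores) / n_folds


def grid_search(X, y, grid):
    """Exhaustively score every combination in `grid`; return the best params and CV score."""
    names = sorted(grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = cv_accuracy(X, y, params)
        if score > best_score:  # strict '>' keeps the first combination on ties
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical toy data: two well-separated clusters, labels interleaved so every fold holds both classes.
X = [[0.10, 0.20], [0.90, 0.80], [0.20, 0.10], [0.80, 0.90],
     [0.15, 0.25], [0.85, 0.75], [0.30, 0.20], [0.70, 0.80],
     [0.20, 0.30], [0.90, 0.90], [0.10, 0.10], [0.80, 0.70]]
y = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1]  # index 10 is a deliberately mislabeled (noisy) point
grid = {"k": [1, 3, 5], "metric": ["euclidean", "manhattan"]}

start = time.perf_counter()
best_params, best_score = grid_search(X, y, grid)
elapsed = time.perf_counter() - start
print(best_params, round(best_score, 4), f"{elapsed:.4f} s")
```

On this toy data the noisy point makes `k = 1` overfit, so the search settles on `{'k': 3, 'metric': 'euclidean'}` with a cross-validated accuracy of about 0.9167. The cost of the search grows with the product of the grid sizes times the number of folds, which is one reason slow-to-train models such as an MLP become expensive to tune exhaustively.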

Keywords

Phishing detection; Hyperparameter optimization; Machine learning; Grid search; Computational efficiency


This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

___________________________________________________________
International Journal of Advances in Intelligent Informatics
ISSN 2442-6571 (print) | 2548-3161 (online)
Organized by UAD and ASCEE Computer Society
Published by Universitas Ahmad Dahlan
W: http://ijain.org
E: info@ijain.org (paper handling issues)
andri.pranolo.id@ieee.org (publication issues)
