Random Forest Classifier Approach for Accurate Malicious URL Identification
DOI:
https://doi.org/10.37253/telcomatics.v10i2.11173
Keywords:
Machine Learning, Random Forest, Security
Abstract
Internet users face significant risks from malicious URLs that facilitate phishing attacks, malware distribution, and data theft. Traditional blacklisting methods have become ineffective against evolving cyberattack techniques. This study proposes a Random Forest classification approach for more accurate malicious URL detection, focusing on critical URL features including URL length, presence of special keywords, subdomain structure, and special character usage. These features are used to train the Random Forest model to distinguish between safe and malicious URLs. We evaluate model effectiveness using accuracy, precision, and recall metrics. This research aims to develop a Random Forest-based malicious URL detection system that is more accurate and adaptive than conventional methods. The study examines both the advantages and limitations of this approach, along with its potential as a reliable detection solution for dynamic digital environments. Evaluation results demonstrate an overall accuracy of 94%, a weighted average F1-score of 0.94, and a macro average F1-score of 0.94.
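The four feature families named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the keyword list, the set of special characters, and the function name `extract_features` are assumptions made for illustration, since the abstract does not enumerate them. The subdomain count is also a simplification that treats the last two host labels as the registered domain, which is wrong for suffixes such as `co.uk` and for IP-address hosts.

```python
from urllib.parse import urlsplit

# Hypothetical keyword list; the abstract does not say which keywords are used.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")

# Hypothetical set of special characters to count.
SPECIAL_CHARS = "@-_?=&%"


def extract_features(url: str) -> dict:
    """Map a URL string to a small dictionary of lexical features."""
    # urlsplit only recognises the host when a scheme or "//" is present,
    # so prepend "//" to bare inputs such as "example.com/path".
    parts = urlsplit(url if "//" in url else "//" + url)
    host = parts.hostname or ""
    labels = [label for label in host.split(".") if label]
    lowered = url.lower()
    return {
        # Total number of characters in the original URL string.
        "url_length": len(url),
        # How many of the assumed keywords occur anywhere in the URL.
        "keyword_count": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
        # Naive estimate: every host label beyond the last two is a subdomain.
        "subdomain_count": max(len(labels) - 2, 0),
        # Total occurrences of the assumed special characters.
        "special_char_count": sum(url.count(c) for c in SPECIAL_CHARS),
    }


features = extract_features("http://secure-login.example.com/verify?id=1")
print(features)
```

Feature dictionaries of this kind can be converted to numeric rows and passed to any Random Forest implementation, for example scikit-learn's `RandomForestClassifier`. Whether the study used that library is not stated in the abstract.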
License
Copyright (c) 2025 Telcomatics

This work is licensed under a Creative Commons Attribution 4.0 International License.





