A Random Forest Classification Approach for Accurate Identification of Malicious URLs

Haeruddin Haeruddin; Elvert; Andik Yulianto; Sabariman Sabariman

doi:10.37253/telcomatics.v10i2.11173

Authors

Haeruddin Haeruddin, Universitas Internasional Batam
Elvert, Information Technology Study Program, Faculty of Computer Science, Universitas Internasional Batam
Andik Yulianto, Information Systems Study Program, Faculty of Computer Science, Universitas Internasional Batam
Sabariman Sabariman, Information Systems Study Program, Faculty of Computer Science, Universitas Internasional Batam

DOI:

https://doi.org/10.37253/telcomatics.v10i2.11173

Keywords:

Machine Learning, Random Forest, Security

Abstract

Internet users currently face significant risks from malicious URLs that facilitate phishing attacks, malware distribution, and data theft. Traditional blacklisting methods have become ineffective against evolving cyberattack techniques. This study proposes a Random Forest classification approach for more accurate malicious URL detection, focusing on critical URL features including URL length, presence of special keywords, subdomain structure, and special character usage. These features are used to train the Random Forest model to distinguish between safe and malicious URLs. We evaluate model effectiveness using accuracy, precision, and recall metrics. This research aims to develop a Random Forest-based malicious URL detection system that is more accurate and adaptive than conventional methods. The study examines both the advantages and limitations of this approach, along with its potential as a reliable detection solution for dynamic digital environments. Evaluation results demonstrate an overall accuracy of 94%, a weighted average F1-score of 0.94, and a macro average F1-score of 0.94.
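The lexical features named in the abstract (URL length, special keywords, subdomain structure, special characters) can be illustrated with a minimal Python sketch. The keyword list, the character set, the feature names, and the function `extract_features` are all assumptions made for illustration; the abstract does not specify how the authors computed these features.

```python
from urllib.parse import urlsplit

# Hypothetical keyword list and character set, chosen only for illustration;
# the abstract does not list the ones used in the study.
SUSPICIOUS_KEYWORDS = ("login", "verify", "update", "secure", "account")
SPECIAL_CHARS = "@-_?=&%"


def extract_features(url: str) -> dict:
    """Map a URL string to a few simple lexical features."""
    # hostname is None when the URL has no scheme/netloc part.
    host = urlsplit(url).hostname or ""
    labels = [part for part in host.split(".") if part]
    lowered = url.lower()
    return {
        "url_length": len(url),
        # Number of distinct suspicious keywords that occur anywhere in the URL.
        "keyword_count": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
        # Naive subdomain count: host labels beyond the last two
        # (e.g. "example.com"); multi-part suffixes such as "co.id"
        # are not handled.
        "subdomain_count": max(len(labels) - 2, 0),
        # Total occurrences of the listed special characters.
        "special_char_count": sum(url.count(ch) for ch in SPECIAL_CHARS),
    }


if __name__ == "__main__":
    print(extract_features("http://secure.login.example.com/verify?id=1&x=2"))
```

Feature dictionaries of this kind could then be converted to numeric vectors and passed to a Random Forest implementation such as scikit-learn's `RandomForestClassifier`; whether the study used that library is not stated in the abstract.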


