Sobiad Citation Index

Article Detail

Data correlation matrix-based spam URL detection using machine learning algorithms

2024

Journal:

Journal of Scientific Reports-A

Abstract:

In recent years, widespread internet access has brought both advantages and disadvantages. Users enjoy numerous benefits, including access to vast amounts of information and seamless communication with others. However, this accessibility also exposes them to threats such as malicious software and deceptive practices, and many individuals fall victim to them. Common problems include spam emails, fake websites, and phishing attempts. Because internet use is essential in contemporary society, systems that protect users from such malicious activity have become imperative. Accordingly, this study applied eight prominent machine learning algorithms to identify spam URLs in a large dataset. Since the dataset contained only the URL and its spam classification, additional features such as URL length and the number of digits had to be extracted. Including such features improves the decisions made by the machine learning models and results in more efficient detection. Because the quality of feature extraction significantly affects the results of the methods, the study first extracted features and then trained the models according to the weight of each feature. This paper proposes a data correlation matrix-based approach for spam URL detection using machine learning algorithms. The distinctive aspect of the study is the feature extraction process applied to the dataset, which aims to identify the most impactful features, followed by model training that takes the weighting of these features into account. The entire dataset was used without any reduction. Experimental findings indicate that tree-based machine learning algorithms yield superior results. Among all applied methods, Random Forest achieved the highest success rate, with a detection rate of 96.33% for the non-spam class. Additionally, a combined and weighted calculation method yielded an accuracy of 94.16% for both spam and non-spam data.
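The feature extraction and correlation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the two extra features (`dot_count`, `special_count`), and the use of the Pearson coefficient are assumptions made here for illustration, since the abstract names only URL length and the number of digits and does not state which correlation measure was used.

```python
import math


def extract_features(url: str) -> dict[str, int]:
    """Derive simple lexical features from a raw URL string."""
    return {
        "url_length": len(url),                          # named in the abstract
        "digit_count": sum(ch.isdigit() for ch in url),  # named in the abstract
        "dot_count": url.count("."),                     # hypothetical extra feature
        "special_count": sum(ch in "-_?=&%@" for ch in url),  # hypothetical extra feature
    }


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_features(urls: list[str], labels: list[int]) -> list[tuple[str, float]]:
    """Rank features by absolute correlation with the spam label (1 = spam).

    Assumes a non-empty list of URLs with one label per URL.
    """
    rows = [extract_features(u) for u in urls]
    scores = {
        name: pearson([row[name] for row in rows], labels)
        for name in rows[0]
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


if __name__ == "__main__":
    # Toy data, invented purely to exercise the functions.
    sample_urls = [
        "http://example.com/home",
        "http://example.org/about",
        "http://free-prize-9981.example.net/win?id=42&ref=7",
        "http://login-verify-2024.example.biz/a?u=1&p=2",
    ]
    sample_labels = [0, 0, 1, 1]
    for name, score in rank_features(sample_urls, sample_labels):
        print(f"{name}: {score:+.3f}")
```

In a setup like this, the ranked correlations could serve as the feature weights that the abstract mentions, before a classifier such as Random Forest is trained on the extracted columns; how the study actually derived and applied its weights is not specified in the abstract.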

DOI:

10.59313/jsr-a.1422913

Citation Owners

Information: There are no citations to this publication.

Journal of Scientific Reports-A

Field : Science and Mathematics; Engineering

Journal Type : National

Metrics

Article : 764

Cite : 1.283

2023 Impact : 0.117
