Preferred Language
Articles
/
ijs-1888
The Evaluation of Accuracy Performance in an Enhanced Embedded Feature Selection for Unstructured Text Classification
...Show More Authors

Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant feature from the sparse feature space. Thus, this paper proposed an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high dimensional text classificationhis technique has the ability to measure the feature’s importance in a high-dimensional text document. In addition, it aims to increase the efficiency of the feature selection. Hence, obtaining a promising text classification accuracy. TF-IDF act as a filter approach which measures features importance of the text documents at the first stage. SVM-RFE utilized a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research executes sets of experiments using a text document retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing processes are applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is implemented on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is applied for feature ranking as the next feature selection step. Only top-rank features will be selected for text classification using the SVM classifier. Based on the experiments, it shows that the proposed technique able to achieve 98% accuracy that outperformed other existing techniques. In conclusion, the proposed technique able to select the significant features in the unstructured and high dimensional text document.

Scopus Crossref
View Publication Preview PDF
Quick Preview PDF
Publication Date
Sat Jul 06 2024
Journal Name
Multimedia Tools And Applications
Text classification based on optimization feature selection methods: a review and future directions
...Show More Authors

A substantial portion of today’s multimedia data exists in the form of unstructured text. However, the unstructured nature of text poses a significant task in meeting users’ information requirements. Text classification (TC) has been extensively employed in text mining to facilitate multimedia data processing. However, accurately categorizing texts becomes challenging due to the increasing presence of non-informative features within the corpus. Several reviews on TC, encompassing various feature selection (FS) approaches to eliminate non-informative features, have been previously published. However, these reviews do not adequately cover the recently explored approaches to TC problem-solving utilizing FS, such as optimization techniques.

... Show More
View Publication Preview PDF
Scopus Crossref
Publication Date
Sat Jan 01 2022
Journal Name
Ieee Access
Wrapper and Hybrid Feature Selection Methods Using Metaheuristic Algorithms for English Text Classification: A Systematic Review
...Show More Authors

Feature selection (FS) constitutes a series of processes used to decide which relevant features/attributes to include and which irrelevant features to exclude for predictive modeling. It is a crucial task that aids machine learning classifiers in reducing error rates, computation time, overfitting, and improving classification accuracy. It has demonstrated its efficacy in myriads of domains, ranging from its use for text classification (TC), text mining, and image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, there are few literature reviews concerning TC. Therefore, a comprehensive overview was systematicall

... Show More
View Publication Preview PDF
Scopus (27)
Crossref (21)
Scopus Clarivate Crossref
Publication Date
Thu Nov 30 2023
Journal Name
Iraqi Journal Of Science
Effect of Genetic Algorithm as a Feature Selection for Image Classification
...Show More Authors

     Analysis of image content is important in the classification of images, identification, retrieval, and recognition processes. The medical image datasets for content-based medical image retrieval (  are large datasets that are limited by high computational costs and poor performance. The aim of the proposed method is to enhance this image retrieval and classification by using a genetic algorithm (GA) to choose the reduced features and dimensionality. This process was created in three stages. In the first stage, two algorithms are applied to extract the important features; the first algorithm is the Contrast Enhancement method and the second is a Discrete Cosine Transform algorithm. In the next stage, we used datasets of the medi

... Show More
View Publication Preview PDF
Scopus (3)
Crossref (1)
Scopus Crossref
Publication Date
Wed Sep 23 2020
Journal Name
Artificial Intelligence Research
Hybrid approaches to feature subset selection for data classification in high-dimensional feature space
...Show More Authors

This paper proposes two hybrid feature subset selection approaches based on the combination (union or intersection) of both supervised and unsupervised filter approaches before using a wrapper, aiming to obtain low-dimensional features with high accuracy and interpretability and low time consumption. Experiments with the proposed hybrid approaches have been conducted on seven high-dimensional feature datasets. The classifiers adopted are support vector machine (SVM), linear discriminant analysis (LDA), and K-nearest neighbour (KNN). Experimental results have demonstrated the advantages and usefulness of the proposed methods in feature subset selection in high-dimensional space in terms of the number of selected features and time spe

... Show More
View Publication
Crossref
Publication Date
Tue Dec 24 2024
Journal Name
International Journal Of Data And Network Science
Multi-objective of wind-driven optimization as feature selection and clustering to enhance text clustering
...Show More Authors

Text Clustering consists of grouping objects of similar categories. The initial centroids influence operation of the system with the potential to become trapped in local optima. The second issue pertains to the impact of a huge number of features on the determination of optimal initial centroids. The problem of dimensionality may be reduced by feature selection. Therefore, Wind Driven Optimization (WDO) was employed as Feature Selection to reduce the unimportant words from the text. In addition, the current study has integrated a novel clustering optimization technique called the WDO (Wasp Swarm Optimization) to effectively determine the most suitable initial centroids. The result showed the new meta-heuristic which is WDO was employed as t

... Show More
View Publication Preview PDF
Crossref (1)
Scopus Crossref
Publication Date
Mon Oct 30 2023
Journal Name
Iraqi Journal Of Science
Review on Hybrid Swarm Algorithms for Feature Selection
...Show More Authors

    Feature selection represents one of the critical processes in machine learning (ML). The fundamental aim of the problem of feature selection is to maintain performance accuracy while reducing the dimension of feature selection. Different approaches were created for classifying the datasets. In a range of optimization problems, swarming techniques produced better outcomes. At the same time, hybrid algorithms have gotten a lot of attention recently when it comes to solving optimization problems. As a result, this study provides a thorough assessment of the literature on feature selection problems using hybrid swarm algorithms that have been developed over time (2018-2021). Lastly, when compared with current feature selection procedu

... Show More
View Publication Preview PDF
Scopus Crossref
Publication Date
Thu Aug 30 2018
Journal Name
Iraqi Journal Of Science
Image Feature Extraction and Selection
...Show More Authors

Features are the description of the image contents which could be corner, blob or edge. Scale-Invariant Feature Transform (SIFT) extraction and description patent algorithm used widely in computer vision, it is fragmented to four main stages. This paper introduces image feature extraction using SIFT and chooses the most descriptive features among them by blurring image using Gaussian function and implementing Otsu segmentation algorithm on image, then applying Scale-Invariant Feature Transform feature extraction algorithm on segmented portions. On the other hand the SIFT feature extraction algorithm preceded by gray image normalization and binary thresholding as another preprocessing step. SIFT is a strong algorithm and gives more accura

... Show More
View Publication Preview PDF
Publication Date
Sun Apr 29 2018
Journal Name
Iraqi Journal Of Science
Modified Artificial immune system as Feature Selection
...Show More Authors

Feature selection algorithms play a big role in machine learning applications. There are several feature selection strategies based on metaheuristic algorithms. In this paper a feature selection strategy based on Modified Artificial Immune System (MAIS) has been proposed. The proposed algorithm exploits the advantages of Artificial Immune System AIS to increase the performance and randomization of features. The experimental results based on NSL-KDD dataset, have showed increasing in performance of accuracy compared with other feature selection algorithms (best first search, correlation and information gain).

View Publication Preview PDF
Publication Date
Fri Apr 30 2021
Journal Name
Iraqi Journal Of Science
Enhanced Supervised Principal Component Analysis for Cancer Classification
...Show More Authors

In this paper, a new hybridization of supervised principal component analysis (SPCA) and stochastic gradient descent techniques is proposed, and called as SGD-SPCA, for real large datasets that have a small number of samples in high dimensional space. SGD-SPCA is proposed to become an important tool that can be used to diagnose and treat cancer accurately. When we have large datasets that require many parameters, SGD-SPCA is an excellent method, and it can easily update the parameters when a new observation shows up. Two cancer datasets are used, the first is for Leukemia and the second is for small round blue cell tumors. Also, simulation datasets are used to compare principal component analysis (PCA), SPCA, and SGD-SPCA. The results sh

... Show More
View Publication Preview PDF
Scopus (6)
Crossref (5)
Scopus Crossref
Publication Date
Wed Jan 01 2020
Journal Name
Communications In Computer And Information Science
Performance Evaluation for Four Supervised Classifiers in Internet Traffic Classification
...Show More Authors

View Publication
Scopus (2)
Scopus Clarivate Crossref