Text classification based on optimization feature selection methods: a review and future directions

Osamah Mohammed Alyasiri; Yu-N Cheah; Hao Zhang; Omar Mustafa Al-Janabi; Ammar Kamal Abasi

doi:10.1007/s11042-024-19769-6

Details

Publication Date

Sat Jul 06 2024

Journal Name

Multimedia Tools And Applications

Text mining

Text classification

Text categorization

Feature selection

Optimization algorithms

Machine learning classifiers

A substantial portion of today’s multimedia data exists in the form of unstructured text. However, the unstructured nature of text makes it challenging to meet users’ information requirements. Text classification (TC) has been extensively employed in text mining to facilitate multimedia data processing. However, accurately categorizing texts becomes difficult as the number of non-informative features within a corpus grows. Several reviews on TC, encompassing various feature selection (FS) approaches to eliminate non-informative features, have previously been published. However, these reviews do not adequately cover recently explored approaches to solving TC problems with FS, such as optimization techniques. This study comprehensively analyzes different FS approaches based on optimization algorithms for TC. We begin by introducing the primary phases involved in implementing TC. Subsequently, we explore a wide range of FS approaches for categorizing text documents and attempt to organize the existing works into four fundamental approaches: filter, wrapper, hybrid, and embedded. Furthermore, we review four categories of optimization algorithms used to solve text FS problems: swarm intelligence-based, evolutionary-based, physics-based, and human behavior-related algorithms. We discuss the advantages and disadvantages of state-of-the-art studies that employ optimization algorithms for text FS. Additionally, we consider several aspects of each proposed method and thoroughly discuss the challenges associated with datasets, FS approaches, optimization algorithms, machine learning classifiers, and the evaluation criteria employed to assess new and existing techniques. Finally, by identifying research gaps and proposing future directions, our review provides valuable guidance to researchers in developing and situating further studies within the current body of literature.
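The filter approach named in the abstract ranks features independently of any classifier. As a minimal illustration, assuming binary (presence/absence) term features, the sketch below scores each term with the textbook chi-square statistic and keeps the top k terms. It is a generic example, not a method taken from the review, and the function names `chi_square_scores` and `select_top_k` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set


def chi_square_scores(docs: Sequence[Set[str]], labels: Sequence[str]) -> Dict[str, float]:
    """Score every term by its maximum chi-square statistic over all classes.

    Each document is a set of terms (presence/absence only, no counts).
    """
    n = len(docs)
    vocab = set().union(*docs)
    class_sizes = Counter(labels)
    scores: Dict[str, float] = {}
    for term in vocab:
        best = 0.0
        for cls, n_cls in class_sizes.items():
            # 2x2 contingency table for (term present?, document in cls?)
            a = sum(1 for doc, y in zip(docs, labels) if term in doc and y == cls)
            b = sum(1 for doc, y in zip(docs, labels) if term in doc and y != cls)
            c = n_cls - a            # class documents without the term
            d = (n - n_cls) - b      # other documents without the term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:  # a term present in every (or no) document scores 0
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(docs: Sequence[Set[str]], labels: Sequence[str], k: int) -> List[str]:
    """Filter step: keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


docs = [{"goal", "match", "team"}, {"team", "win", "match"},
        {"stock", "market", "price"}, {"price", "market", "trade"}]
labels = ["sport", "sport", "finance", "finance"]
print(select_top_k(docs, labels, 4))  # ['market', 'match', 'price', 'team']
```

Terms that occur in every document of exactly one class receive the highest score, while terms seen in only one document score lower; the retained vocabulary would then be passed to whatever classifier follows.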

Publication Date

Sat Jan 01 2022

Journal Name

IEEE Access

Wrapper and Hybrid Feature Selection Methods Using Metaheuristic Algorithms for English Text Classification: A Systematic Review

Metaheuristics

Feature extraction

Text categorization

Classification algorithms

Systematics

Search problems

Business

Osamah Mohammed

Yu-N

Ammar Kamal

Omar Mustafa

Feature selection (FS) constitutes a series of processes used to decide which relevant features/attributes to include and which irrelevant features to exclude for predictive modeling. It is a crucial task that helps machine learning classifiers reduce error rates, computation time, and overfitting while improving classification accuracy. It has demonstrated its efficacy in myriad domains, including text classification (TC), text mining, and image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, few literature reviews address this topic for TC. Therefore, a comprehensive overview was systematicall

Publication Date

Tue Jul 28 2026

Journal Name

International Journal Of Data And Network Science

Multi-objective of wind-driven optimization as feature selection and clustering to enhance text clustering

Text Clustering

Multi-Objectives

Wind Driven Optimization

K-Means

Unsupervised Feature Selection

Meta-heuristics optimization

MEHDI G. DUAIMI

Bsoul, Q.

AL-Gburi, A.

Text clustering consists of grouping objects of similar categories. The first issue is that the initial centroids influence the operation of the system, which can become trapped in local optima. The second issue pertains to the impact of a huge number of features on the determination of optimal initial centroids. This dimensionality problem may be reduced by feature selection. Therefore, Wind Driven Optimization (WDO) was employed as feature selection to remove unimportant words from the text. In addition, the current study integrated WDO as a novel clustering optimization technique to effectively determine the most suitable initial centroids. The result showed the new meta-heuristic which is WDO was employed as t

Publication Date

Wed Aug 20 2025

Journal Name

Artificial Intelligence Review

A comprehensive review on key technologies toward smart healthcare systems based IoT: technical aspects, challenges and future directions

Muntadher

Marwah Abdulrazzaq

A. S.

O. S.

A. H.

Sadiq H.

Laith

The unexpected death of humans due to a lack of medical care is a serious problem. Additionally, the number of elderly people requiring continuous care is increasing. A global aging population poses a challenge to the sustainability of conventional healthcare systems for the future. Simultaneously, recent years have seen remarkable progress in the Internet of Things (IoT) and communication technologies, alongside the growing importance of artificial intelligence (AI) explainability and information fusion. Therefore, developing smart healthcare systems based on IoT and advanced technologies is crucial. This would open up new possibilities for efficient and intelligent medical system

Publication Date

Thu Aug 21 2025

Journal Name

Computers

A Comprehensive Review of Sensor Technologies in IoT: Technical Aspects, Challenges, and Future Directions

Sadiq H.

Basheera M.

Almuntadher

Dina

Zainab I.

Hala J.

Tuqa H.

Muntadher

Maryam H.

Susan K.

Ghadeer H.

Zianab A.

Abir

The rapid advancements in wireless technology and digital electronics have led to the widespread adoption of compact, intelligent devices in various aspects of daily life. These advanced systems can sense environmental changes, process data, and communicate seamlessly within interconnected networks. Typically, such devices integrate low-power radio transmitters and multiple smart sensors, thereby enabling efficient operation across a wide range of applications. Alongside these technological developments, the concept of the IoT has emerged as a transformative paradigm, facilitating the interconnection of uniquely identifiable devices through internet-based networks. This paper aims to provide a comprehensive ex

Publication Date

Wed Sep 23 2020

Journal Name

Artificial Intelligence Research

Hybrid approaches to feature subset selection for data classification in high-dimensional feature space

Maysa

John Q

This paper proposes two hybrid feature subset selection approaches that combine (by union or intersection) supervised and unsupervised filter approaches before applying a wrapper, aiming to obtain low-dimensional feature subsets with high accuracy, high interpretability, and low time consumption. Experiments with the proposed hybrid approaches were conducted on seven high-dimensional datasets. The classifiers adopted are support vector machine (SVM), linear discriminant analysis (LDA), and K-nearest neighbour (KNN). Experimental results demonstrated the advantages and usefulness of the proposed methods for feature subset selection in high-dimensional space in terms of the number of selected features and time spe
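The abstract does not name the specific filters or wrapper, so the following is only a minimal sketch of the general union/intersection scheme under stated assumptions: absolute Pearson correlation stands in for the supervised filter, variance for the unsupervised filter, and a greedy forward search scored by leave-one-out 1-nearest-neighbour accuracy for the wrapper (the paper itself reports SVM, LDA, and KNN classifiers). All function names and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple

Matrix = List[List[float]]


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((z - mb) ** 2 for z in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0  # constant column -> 0


def top_k(scores: Sequence[float], k: int) -> Set[int]:
    order = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    return set(order[:k])


def correlation_filter(X: Matrix, y: Sequence[float], k: int) -> Set[int]:
    """Assumed supervised filter: the k features with the largest |r| against y."""
    return top_k([abs(pearson([row[j] for row in X], y)) for j in range(len(X[0]))], k)


def variance_filter(X: Matrix, k: int) -> Set[int]:
    """Assumed unsupervised filter: the k highest-variance features (labels unused)."""
    def var(col: List[float]) -> float:
        m = sum(col) / len(col)
        return sum((v - m) ** 2 for v in col) / len(col)
    return top_k([var([row[j] for row in X]) for j in range(len(X[0]))], k)


def loo_1nn_accuracy(X: Matrix, y: Sequence[int], feats: List[int]) -> float:
    """Assumed wrapper criterion: leave-one-out 1-NN accuracy on the chosen features."""
    if not feats:
        return 0.0
    correct = 0
    for i in range(len(X)):
        nearest, best_d = -1, math.inf
        for j in range(len(X)):
            if j != i:
                d = sum((X[i][f] - X[j][f]) ** 2 for f in feats)
                if d < best_d:
                    nearest, best_d = j, d
        correct += y[nearest] == y[i]
    return correct / len(X)


def hybrid_select(X: Matrix, y: Sequence[int], k: int,
                  mode: str = "union") -> Tuple[List[int], float]:
    sup, unsup = correlation_filter(X, y, k), variance_filter(X, k)
    if mode == "union":
        candidates = sup | unsup
    elif mode == "intersection":
        candidates = sup & unsup
    else:
        raise ValueError("mode must be 'union' or 'intersection'")
    # Greedy forward wrapper restricted to the filtered candidate features.
    selected: List[int] = []
    best = 0.0
    remaining = sorted(candidates)
    while remaining:
        acc, f = max((loo_1nn_accuracy(X, y, selected + [f]), f) for f in remaining)
        if acc <= best:  # stop when no candidate improves the wrapper score
            break
        selected.append(f)
        remaining.remove(f)
        best = acc
    return selected, best


X = [[0.0, 5.0, 0.5], [0.1, -5.0, 0.5], [0.2, 3.0, 0.5],
     [1.0, -3.0, 0.5], [1.1, 4.0, 0.5], [1.2, -4.0, 0.5]]
y = [0, 0, 0, 1, 1, 1]
print(hybrid_select(X, y, k=1, mode="union"))         # ([0], 1.0)
print(hybrid_select(X, y, k=1, mode="intersection"))  # ([], 0.0)
```

On this toy data the union keeps both the informative feature 0 and the noisy high-variance feature 1, and the wrapper then retains only feature 0; the intersection is empty, which shows how much the choice of combination operator matters when the two filters disagree.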

Publication Date

Tue Jan 31 2023

Journal Name

International Journal Of Nonlinear Analysis And Applications

Survey on intrusion detection system based on analysis concept drift: Status and future directions

Detection system

Concept drift

Intrusion detection

Network security

Amer

Nora

Nowadays, internet security is a critical concern, and one of the most difficult research problems in network security is intrusion detection, which defends against external threats. Intrusion detection is a novel method of securing computers and data networks that are already in use. Machine learning and deep learning are widely deployed to boost the efficacy of intrusion detection systems. Although existing intrusion detection systems based on data mining and machine learning are effective, they detect intrusions by training static batch classifiers without considering the time-varying features of a regular data stream. Real-world problems, on the other hand, rarely fit into models that have such constraints. Furthermor

Publication Date

Mon Jul 01 2024

Journal Name

Journal Of Engineering

Efficient Intrusion Detection Through the Fusion of AI Algorithms and Feature Selection Methods

Intrusion Detection System (IDS)

Machine learning

Naïve bayes

K-Nearest Neighbor (KNN)

Decision tree

Feature selection

Muna Hadi

With the proliferation of both Internet access and data traffic, recent breaches have brought into sharp focus the need for Network Intrusion Detection Systems (NIDS) to protect networks from increasingly complex cyberattacks. To differentiate between normal network processes and possible attacks, Intrusion Detection Systems (IDS) often employ pattern recognition and data mining techniques. An IDS can automatically detect and classify network and host system intrusions, assaults, and policy violations. Using Python's scikit-learn, the results of this study show that Machine Learning (ML) techniques like Decision Tree (DT), Naïve Bayes (NB), and K-Nearest Neighbor (KNN) can enhance the effectiveness of an Intrusi

Publication Date

Sun Feb 25 2024

Journal Name

Baghdad Science Journal

Exploring Important Factors in Predicting Heart Disease Based on Ensemble- Extra Feature Selection Approach

Extra Tree

Feature selection

Feature subsets

Heart Disease Dataset

Machine learning

Howida

Farkhana

Alif Ridzuan

Ahmad Najmi Amerhaider

Zuriahati Mohd

Carolyn

Heart disease is a significant and impactful health condition that ranks as the leading cause of death in many countries. To aid physicians in diagnosing cardiovascular diseases, clinical datasets are available for reference. However, with the rise of big data and medical datasets, it has become increasingly challenging for medical practitioners to accurately predict heart disease due to the abundance of unrelated and redundant features, which increase computational complexity and reduce accuracy. As such, this study aims to identify the most discriminative features within high-dimensional datasets while minimizing complexity and improving accuracy through an Extra Tree-based feature selection technique. The study assesses the efficac

Publication Date

Wed Apr 20 2022

Journal Name

Periodicals Of Engineering And Natural Sciences (PEN)

Text image secret sharing with hiding based on color feature

Nuha

Yossra

Alyaa

Tarik Ahmed

Publication Date

Tue Dec 05 2023

Journal Name

Baghdad Science Journal

AlexNet-Based Feature Extraction for Cassava Classification: A Machine Learning Approach

Color

Feature extraction

KNN

Naïve Bayes

Shape

SVM

Texture

Miftahus

Mohd Farhan Md

Mohd Norasri

Cassava, a significant crop in Africa, Asia, and South America, is a staple food for millions. However, classifying cassava species using conventional color, texture, and shape features is inefficient, as cassava leaves exhibit similarities across different types, including toxic and non-toxic varieties. This research aims to overcome the limitations of traditional classification methods by employing deep learning techniques with a pre-trained AlexNet as the feature extractor to accurately classify four types of cassava: Gajah, Manggu, Kapok, and Beracun. The dataset was collected from local farms in Lamongan, Indonesia, with the assistance of agricultural research experts, and consists of 1,400 images; each type of cassava has
