Text classification based on optimization feature selection methods: a review and future directions

Osamah Mohammed Alyasiri; Yu-N Cheah; Hao Zhang; Omar Mustafa Al-Janabi; Ammar Kamal Abasi

doi:10.1007/s11042-024-19769-6

Publication Date

Sat Jul 06 2024

Journal Name

Multimedia Tools And Applications

Keywords: Text mining; Text classification; Text categorization; Feature selection; Optimization algorithms; Machine learning classifiers

A substantial portion of today's multimedia data exists in the form of unstructured text, and this lack of structure makes it significantly harder to meet users' information requirements. Text classification (TC) has been extensively employed in text mining to facilitate multimedia data processing. However, accurately categorizing texts is becoming more challenging because corpora contain a growing number of non-informative features. Several previous reviews of TC have covered various feature selection (FS) approaches for eliminating non-informative features, but they do not adequately cover recently explored approaches that solve TC problems with FS, such as optimization techniques. This study comprehensively analyzes FS approaches based on optimization algorithms for TC. We begin by introducing the primary phases involved in implementing TC. We then explore a wide range of FS approaches for categorizing text documents and organize the existing works into four fundamental approaches: filter, wrapper, hybrid, and embedded. Furthermore, we review four categories of optimization algorithms used to solve text FS problems: swarm intelligence-based, evolutionary-based, physics-based, and human behavior-related algorithms. We discuss the advantages and disadvantages of state-of-the-art studies that employ optimization algorithms for text FS. Additionally, we consider several aspects of each proposed method and thoroughly discuss the challenges associated with datasets, FS approaches, optimization algorithms, machine learning classifiers, and the evaluation criteria used to assess new and existing techniques. Finally, by identifying research gaps and proposing future directions, our review provides guidance to researchers in developing and situating further studies within the current body of literature.
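A wrapper approach driven by an evolutionary optimizer is one of the combinations the review surveys, and a small sketch makes it concrete. This is a minimal illustration under stated assumptions, not the authors' method. It uses a hypothetical eight-document toy corpus with raw term counts as features. The wrapped learner is a leave-one-out nearest-centroid classifier, and the search strategy is a binary genetic algorithm with tournament selection, uniform crossover, bit-flip mutation, and elitism. The fitness `0.99 * accuracy + 0.01 * (1 - selected/total)` is an assumed weighting that rewards accuracy first and smaller feature subsets second.

```python
import random
from collections import Counter

# Hypothetical toy corpus of (document, label) pairs, for illustration only.
DOCS = [
    ("the striker scored a late goal in the match", "sport"),
    ("the team won the league after a tense match", "sport"),
    ("fans cheered as the goal decided the final", "sport"),
    ("the coach praised the team defence", "sport"),
    ("the central bank raised interest rates again", "finance"),
    ("stock prices fell as rates climbed", "finance"),
    ("investors sold shares after the bank report", "finance"),
    ("the market rallied on strong earnings", "finance"),
]


def tokenize(text):
    """Lower-case whitespace tokenizer (a stand-in for real preprocessing)."""
    return text.lower().split()


VOCAB = sorted({w for doc, _ in DOCS for w in tokenize(doc)})


def bag_of_words(text):
    counts = Counter(tokenize(text))
    return [counts[w] for w in VOCAB]


X = [bag_of_words(doc) for doc, _ in DOCS]
Y = [label for _, label in DOCS]
LABELS = sorted(set(Y))  # fixed order keeps tie-breaking deterministic


def loo_accuracy(mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the selected terms."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    correct = 0
    for held_out in range(len(X)):
        centroids = {}
        for label in LABELS:
            rows = [X[i] for i in range(len(X)) if i != held_out and Y[i] == label]
            centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
        x = [X[held_out][j] for j in cols]
        pred = min(LABELS, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])))
        correct += pred == Y[held_out]
    return correct / len(X)


def fitness(mask, alpha=0.99):
    """Assumed weighted fitness: accuracy first, then a small bonus for fewer terms."""
    return alpha * loo_accuracy(mask) + (1 - alpha) * (1 - sum(mask) / len(mask))


def tournament(pop, scores, rng, k=3):
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=scores.__getitem__)]


def binary_ga(n_features, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    p_mut = 1.0 / n_features
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size - 1)]
    pop.append([1] * n_features)  # include the full feature set as a baseline
    scores = [fitness(m) for m in pop]
    for _ in range(generations):
        elite = pop[max(range(pop_size), key=scores.__getitem__)]
        next_pop = [elite[:]]  # elitism: the best subset always survives
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop, scores, rng), tournament(pop, scores, rng)
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]  # uniform crossover
            child = [1 - bit if rng.random() < p_mut else bit for bit in child]  # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
        scores = [fitness(m) for m in pop]
    best = max(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


best_mask, best_score = binary_ga(len(VOCAB))
selected_terms = [w for w, bit in zip(VOCAB, best_mask) if bit]
print(f"{len(selected_terms)} of {len(VOCAB)} terms kept, fitness = {best_score:.3f}")
```

You can replace `binary_ga` with a binary particle swarm or another metaheuristic by changing only the search loop. The classifier-in-the-loop fitness function is what makes this a wrapper rather than a filter method, and it stays the same. Because the initial population contains the full feature set and elitism preserves the best subset, the returned subset always scores at least as well as using every term.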

Publication Date

Tue Sep 01 2020

Journal Name

Al-khwarizmi Engineering Journal

Two-Stage Classification of Breast Tumor Biomarkers for Iraqi Women

Iyden Kamil

Ali Hussein

Javier

Objective: Breast cancer is a deadly disease in women and causes many deaths. Early diagnosis of breast cancer with appropriate tumor biomarkers may facilitate early treatment of the disease, thus reducing the mortality rate. The purpose of the current study is to improve the early diagnosis of breast cancer by proposing a two-stage classification of breast tumor biomarkers for a sample of Iraqi women.

Methods: In this study, a two-stage classification system is proposed and tested with four machine learning classifiers. In the first stage, breast features (demographic, blood and salivary-based attributes) are classified into normal or abnormal cases, while in the second stage the abnormal breast cases are
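The two-stage cascade described in the Methods can be outlined as follows. This is a minimal sketch, not the study's implementation. A nearest-centroid classifier stands in for the four classifiers the study tested, and the two numeric features are hypothetical. The second-stage labels `class_A` and `class_B` are placeholders, because the abstract is truncated before naming the second-stage categories. The key design point is that stage 2 is trained on, and applied to, only the cases that stage 1 labels abnormal.

```python
class NearestCentroid:
    """Minimal nearest-centroid classifier (a stand-in for any stage classifier)."""

    def fit(self, X, y):
        self.centroids = {}
        for label in sorted(set(y)):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict_one(self, x):
        return min(self.centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(x, self.centroids[c])))


class TwoStageClassifier:
    """Stage 1: normal vs. abnormal. Stage 2: refine only the abnormal cases."""

    def __init__(self, stage1, stage2):
        self.stage1, self.stage2 = stage1, stage2

    def fit(self, X, y_stage1, y_stage2):
        self.stage1.fit(X, y_stage1)
        # Stage 2 sees only the abnormal training cases.
        abnormal = [(x, t) for x, s, t in zip(X, y_stage1, y_stage2) if s == "abnormal"]
        self.stage2.fit([x for x, _ in abnormal], [t for _, t in abnormal])
        return self

    def predict_one(self, x):
        first = self.stage1.predict_one(x)
        return first if first == "normal" else self.stage2.predict_one(x)


# Hypothetical two-feature data; the stage-2 labels are placeholders.
X = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [5.0, 1.0], [5.2, 0.9], [4.9, 5.0], [5.1, 5.2]]
y1 = ["normal"] * 3 + ["abnormal"] * 4
y2 = [None] * 3 + ["class_A", "class_A", "class_B", "class_B"]

model = TwoStageClassifier(NearestCentroid(), NearestCentroid()).fit(X, y1, y2)
predictions = [model.predict_one(x) for x in ([1.1, 1.0], [5.0, 1.2], [5.0, 5.0])]
print(predictions)  # ['normal', 'class_A', 'class_B']
```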

Publication Date

Thu Sep 15 2022

Journal Name

Knowledge And Information Systems

Multiresolution hierarchical support vector machine for classification of large datasets

Safaa

Support vector machine (SVM) is a popular supervised learning algorithm based on margin maximization. It has a high training cost and does not scale well to a large number of data points. We propose a multiresolution algorithm MRH-SVM that trains SVM on a hierarchical data aggregation structure, which also serves as a common data input to other learning algorithms. The proposed algorithm learns SVM models using high-level data aggregates and only visits data aggregates at more detailed levels where support vectors reside. In addition to performance improvements, the algorithm has advantages such as the ability to handle data streams and datasets with imbalanced classes. Experimental results show significant performance improvements in compa
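A two-level toy version can illustrate the coarse-to-fine idea behind MRH-SVM. This is a simplified sketch under our own assumptions, not the published algorithm:

- The hierarchy is a single grid of 0.5-wide cells, with one weighted centroid per cell and class.
- The SVM is linear and trained with weighted Pegasos-style subgradient steps.
- An aggregate is refined to its raw points only when the bound |f(x) - f(c)| <= ||w|| * r (centroid c, radius r) allows one of its members to fall inside the level-1 margin.

The data are two synthetic Gaussian blobs.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def decision(w, p):
    """Linear decision value f(p) = w . p + b, with the bias stored last in w."""
    return dot(w[:-1], p) + w[-1]


def train_linear_svm(points, labels, weights, lam=0.01, iters=500):
    """Weighted Pegasos-style subgradient training (bias via a constant feature)."""
    data = [(list(p) + [1.0], y, c) for p, y, c in zip(points, labels, weights)]
    total = float(sum(weights))
    w = [0.0] * (len(points[0]) + 1)
    max_norm = 1.0 / math.sqrt(lam)
    for t in range(1, iters + 1):
        eta = 1.0 / (lam * t)
        grad = [0.0] * len(w)
        for x, y, c in data:
            if y * dot(w, x) < 1.0:  # hinge loss is active for this item
                for j, xj in enumerate(x):
                    grad[j] += c * y * xj / total
        w = [(1.0 - eta * lam) * wj + eta * gj for wj, gj in zip(w, grad)]
        norm = math.sqrt(dot(w, w))
        if norm > max_norm:  # Pegasos projection step
            w = [wj * max_norm / norm for wj in w]
    return w


def aggregate(points, labels, cell):
    """Coarse level: one weighted centroid (with radius) per (grid cell, class)."""
    groups = {}
    for p, y in zip(points, labels):
        key = (math.floor(p[0] / cell), math.floor(p[1] / cell), y)
        groups.setdefault(key, []).append(p)
    aggs = []
    for (_, _, y), members in groups.items():
        n = len(members)
        centroid = [sum(m[j] for m in members) / n for j in range(2)]
        radius = max(math.dist(m, centroid) for m in members)
        aggs.append((centroid, y, n, radius, members))
    return aggs


def two_level_svm(points, labels, cell=0.5, lam=0.01):
    aggs = aggregate(points, labels, cell)
    # Level 1: train only on aggregates, each weighted by its member count.
    w = train_linear_svm([a[0] for a in aggs], [a[1] for a in aggs], [a[2] for a in aggs], lam)
    w_norm = math.sqrt(dot(w[:-1], w[:-1]))
    xs, ys, cs = [], [], []
    for centroid, y, n, radius, members in aggs:
        # Lower bound on y * f(x) for every member x under the level-1 model.
        if y * decision(w, centroid) - w_norm * radius >= 1.0:
            kept = [(centroid, n)]  # safely outside the margin: stay coarse
        else:
            kept = [(m, 1) for m in members]  # possible support vectors: go to the fine level
        for x, c in kept:
            xs.append(x)
            ys.append(y)
            cs.append(c)
    # Level 2: retrain on the mixed coarse/fine set.
    return train_linear_svm(xs, ys, cs, lam), len(xs)


rng = random.Random(0)
points, labels = [], []
for label, centre in ((1, 3.0), (-1, -3.0)):
    for _ in range(200):
        points.append([rng.gauss(centre, 0.5), rng.gauss(centre, 0.5)])
        labels.append(label)

w, n_used = two_level_svm(points, labels)
accuracy = sum((decision(w, p) >= 0) == (y == 1) for p, y in zip(points, labels)) / len(points)
print(f"trained on {n_used} of {len(points)} items; training accuracy {accuracy:.3f}")
```

On this separable toy data, the expanded aggregates are mainly those near the decision boundary, so the second training pass uses fewer items than the raw data set. A deeper hierarchy would repeat the same test level by level.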

Publication Date

Tue Jan 01 2019

Journal Name

Chemical Industry And Chemical Engineering Quarterly

Optimization of dye adsorption process for Albizia lebbeck pods as a biomass using central composite rotatable design model

Sabah

Mahmood

Mohammed

Albizia lebbeck biomass was used as an adsorbent material in the present study to remove methyl red dye from an aqueous solution. A central composite rotatable design model was used to predict the dye removal efficiency. The optimization was carried out under a temperature- and mixing-controlled system (37 °C) with particle sizes of 300 and 600 µm. The highest adsorption efficiencies were obtained at lower dye concentrations and lower adsorbent weights. Adsorption times longer than 48 h were found to have a negative effect on the removal efficiency due to secondary metabolite compounds. However, the adsorption time was found to have a positive effect at high dye concentrations and high adsorbent weight. The colour removal effi
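Readers unfamiliar with the design can generate the coded run table of a central composite rotatable design directly. This sketch assumes a full 2^k factorial core, for which rotatability requires the axial distance alpha = (2^k)^(1/4). The choice of k = 3 factors and of six centre runs is illustrative, not taken from the study. The three factors could be, for example, dye concentration, adsorbent weight, and adsorption time, which the abstract discusses.

```python
from itertools import product


def ccrd(k, n_center=6):
    """Coded runs of a central composite rotatable design with k factors.

    Returns (runs, alpha): 2**k factorial runs at +/-1, 2k axial runs at
    +/-alpha on each axis, and n_center centre runs at the origin.
    """
    alpha = (2 ** k) ** 0.25  # rotatability condition for a full 2**k factorial core
    factorial = [list(run) for run in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for sign in (-1.0, 1.0):
            run = [0.0] * k
            run[i] = sign * alpha
            axial.append(run)
    center = [[0.0] * k for _ in range(n_center)]
    return factorial + axial + center, alpha


def to_natural(coded, low, high):
    """Map a coded level to natural units, with -1/+1 at the factorial low/high levels."""
    return (low + high) / 2 + coded * (high - low) / 2


runs, alpha = ccrd(3)
print(len(runs), round(alpha, 4))  # 20 1.6818
```

The axial runs fall outside the factorial low and high levels, so the natural ranges must be chosen with that in mind. A second-order polynomial is then typically fitted to the responses measured at these runs.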

Publication Date

Fri Jan 31 2025

Journal Name

Aip Conference Proceedings

Classification of oral cavity cancer using linear discriminant analysis (LDA) and principal component analysis (PCA)

Mohammed Fouad

Ahmed F.

Yasser Y.

Publication Date

Sat Jul 01 2017

Journal Name

Journal Of Construction Engineering And Management

Identification, Quantification, and Classification of Potential Safety Risk for Sustainable Construction in the United States

Ali

Publication Date

Sun Apr 04 2010

Journal Name

Journal Of Educational And Psychological Researches

Translation & Adaptation of (Patterns) & (Assembly) Scales of The Flanagan Aptitude Classification Tests (FACT)

Translation & Adaptation

The Flanagan Aptitude Classification Tests (FACT)

Adil A. S. Al-Salihy

Huda Jameel Abdul-Ghani

The Flanagan Aptitude Classification Tests (FACT) assesses aptitudes that are important for successful performance of particular job-related tasks. An individual's aptitude can then be matched to the job tasks. The FACT helps to determine the tasks in which a person has proficiency. Each test measures a specific skill that is important for particular occupations. The FACT battery is designed to provide measures of an individual's aptitude for each of 16 job elements.

The FACT consists of 16 tests used to measure aptitudes that are important for the successful performance of many occupational tasks. The tests provide a broad basis for predicting success in various occupational fields. All are paper and pen

Publication Date

Wed Jul 01 2015

Journal Name

Arabian Journal Of Geosciences

Mishrif carbonates facies and diagenesis glossary, South Iraq microfacies investigation technique: types, classification, and related diagenetic impacts

Afrah H.

Govand H.

Publication Date

Sat Oct 28 2023

Journal Name

Baghdad Science Journal

A Comparative Study on Association Rule Mining Algorithms on the Hospital Infection Control Dataset

Machine learning

Apriori Algorithm

apriori mlxtend

Hospital Readmission

Association Rule Mining

Performance of Algorithms.

Yahya Asmar

Seyed Bagher

Laith Rezouki

Administrative procedures in various organizations produce numerous crucial records and data. These records and data are also used in other processes such as customer relationship management and accounting operations. It is incredibly challenging to use these data and records, or to extract valuable and meaningful information from them, because they are frequently enormous and continuously growing in size and complexity. Data mining is the act of sorting through large data sets to find patterns and relationships that can help resolve business problems through data analysis. Using data mining techniques, enterprises can forecast future trends and make better business decisions. The Apriori algorithm has bee
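The level-wise search that gives the Apriori algorithm its name can be sketched in plain Python. This is a minimal illustration, not the implementation compared in the study: the keywords mention the `mlxtend` library, which is not used here. The five transactions and the item names A, B, C, and D are hypothetical, and the thresholds are chosen only for the example. Candidates of size k + 1 are built from frequent k-itemsets and kept only if every k-subset is frequent, which is the Apriori (downward-closure) property.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset with support >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(itemset <= b for b in baskets) / n

    current = [frozenset([i]) for i in sorted({i for b in baskets for i in b})]
    frequent, k = {}, 1
    while current:
        level = {c: support(c) for c in current}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets; prune candidates that have an infrequent k-subset.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k + 1 and all(frozenset(s) in level for s in combinations(union, k)):
                candidates.add(union)
        current, k = list(candidates), k + 1
    return frequent


def association_rules(frequent, min_confidence):
    """Rules lhs -> rhs with confidence = support(lhs | rhs) / support(lhs)."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), r)):
                conf = sup / frequent[lhs]  # lhs is frequent by downward closure
                if conf >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, conf))
    return rules


transactions = [["A", "B", "C"], ["A", "B"], ["A", "C"], ["B", "C"], ["A", "B", "C", "D"]]
frequent = apriori(transactions, min_support=0.4)
found_rules = association_rules(frequent, min_confidence=0.7)
for lhs, rhs, sup, conf in found_rules:
    print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```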

Publication Date

Sat Dec 30 2023

Journal Name

Journal Of Economics And Administrative Sciences

Classification of Iraqi Children According to Their Nutritional Status Using Fuzzy Logic

Fuzzy Logic

Fuzzy Classification

Nutritional Status

Mamdani Method

Defuzzification

Hussein

Mohammad

In this paper, we build a fuzzy classification system that uses the Mamdani method to classify the nutritional status of children under 5 years old in Iraq, based on input variables such as weight and height. Classifying nutritional status is a difficult challenge in the medical field because of the uncertainty and ambiguity in the variables and attributes that define the nutritional-status categories for children. Medical diagnosis relies on these categories to determine the types of malnutrition problems and to identify the groups of children suffering from malnutrition, so that the risks faced by each group can be assessed. Malnutrition in children is one of the most
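The Mamdani inference steps can be sketched as below: fuzzification, rule evaluation with min, clipping and aggregation with max, and centroid defuzzification. Every numeric choice here is hypothetical and serves only to make the mechanics visible. The weight and height breakpoints, the four rules, and the 0 to 10 output scale are not the paper's membership functions or rule base and carry no clinical meaning.

```python
def ramp_down(x, lo, hi):
    """Membership that is 1 at or below lo, 0 at or above hi, linear in between."""
    if x <= lo:
        return 1.0
    if x >= hi:
        return 0.0
    return (hi - x) / (hi - lo)


def tri(x, a, b, c):
    """Triangular membership with feet a, c and peak b (a == b or b == c gives a shoulder)."""
    if x < a or x > c:
        return 0.0
    if x == b:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical output sets on a 0..10 "status score" universe.
OUTPUT_SETS = {
    "under": (0.0, 0.0, 5.0),
    "normal": (2.5, 5.0, 7.5),
    "over": (5.0, 10.0, 10.0),
}


def mamdani_status(weight_kg, height_cm, step=0.01):
    # 1) Fuzzify the inputs (hypothetical breakpoints).
    light = ramp_down(weight_kg, 8.0, 16.0)
    heavy = 1.0 - light
    short = ramp_down(height_cm, 70.0, 110.0)
    tall = 1.0 - short
    # 2) Rule strengths: AND = min; rules sharing a consequent are combined with max.
    strength = {
        "under": min(light, tall),
        "normal": max(min(light, short), min(heavy, tall)),
        "over": min(heavy, short),
    }
    # 3) Clip each output set at its rule strength and aggregate with max.
    n = round(10.0 / step)
    xs = [i * step for i in range(n + 1)]
    agg = [max(min(strength[name], tri(x, *abc)) for name, abc in OUTPUT_SETS.items())
           for x in xs]
    # 4) Centroid defuzzification over the discretised universe.
    total = sum(agg)
    return sum(x * m for x, m in zip(xs, agg)) / total if total else None


print(round(mamdani_status(12.0, 90.0), 3))  # 5.0
```

With every rule firing at 0.5, the aggregated output set is symmetric about the middle of the scale, so the centroid lands at 5. A light, tall input (8 kg, 110 cm) fires only the "under" rule, and its centroid falls near 5/3.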
