A substantial portion of today’s multimedia data exists in the form of unstructured text. However, the unstructured nature of text poses a significant task in meeting users’ information requirements. Text classification (TC) has been extensively employed in text mining to facilitate multimedia data processing. However, accurately categorizing texts becomes challenging due to the increasing presence of non-informative features within the corpus. Several reviews on TC, encompassing various feature selection (FS) approaches to eliminate non-informative features, have been previously published. However, these reviews do not adequately cover the recently explored approaches to TC problem-solving utilizing FS, such as optimization techniques. This study comprehensively analyzes different FS approaches based on optimization algorithms for TC. We begin by introducing the primary phases involved in implementing TC. Subsequently, we explore a wide range of FS approaches for categorizing text documents and attempt to organize the existing works into four fundamental approaches: filter, wrapper, hybrid, and embedded. Furthermore, we review four optimization algorithms utilized in solving text FS problems: swarm intelligence-based, evolutionary-based, physics-based, and human behavior-related algorithms. We discuss the advantages and disadvantages of state-of-the-art studies that employ optimization algorithms for text FS methods. Additionally, we consider several aspects of each proposed method and thoroughly discuss the challenges associated with datasets, FS approaches, optimization algorithms, machine learning classifiers, and evaluation criteria employed to assess new and existing techniques. Finally, by identifying research gaps and proposing future directions, our review provides valuable guidance to researchers in developing and situating further studies within the current body of literature.
Sensibly highlighting the hidden structures of many real-world networks has attracted growing interest and triggered a vast array of techniques on what is called nowadays community detection (CD) problem. Non-deterministic metaheuristics are proved to competitively transcending the limits of the counterpart deterministic heuristics in solving community detection problem. Despite the increasing interest, most of the existing metaheuristic based community detection (MCD) algorithms reflect one traditional language. Generally, they tend to explicitly project some features of real communities into different definitions of single or multi-objective optimization functions. The design of other operators, however, remains canonical lacking any inte
... Show MoreGovernmental establishments are maintaining historical data for job applicants for future analysis of predication, improvement of benefits, profits, and development of organizations and institutions. In e-government, a decision can be made about job seekers after mining in their information that will lead to a beneficial insight. This paper proposes the development and implementation of an applicant's appropriate job prediction system to suit his or her skills using web content classification algorithms (Logit Boost, j48, PART, Hoeffding Tree, Naive Bayes). Furthermore, the results of the classification algorithms are compared based on data sets called "job classification data" sets. Experimental results indicate
... Show MoreRegarding the security of computer systems, the intrusion detection systems (IDSs) are essential components for the detection of attacks at the early stage. They monitor and analyze network traffics, looking for abnormal behaviors or attack signatures to detect intrusions in real time. A major drawback of the IDS is their inability to provide adequate sensitivity and accuracy, coupled with their failure in processing enormous data. The issue of classification time is greatly reduced with the IDS through feature selection. In this paper, a new feature selection algorithm based on Firefly Algorithm (FA) is proposed. In addition, the naïve bayesian classifier is used to discriminate attack behaviour from normal behaviour in the network tra
... Show MoreDisease diagnosis with computer-aided methods has been extensively studied and applied in diagnosing and monitoring of several chronic diseases. Early detection and risk assessment of breast diseases based on clinical data is helpful for doctors to make early diagnosis and monitor the disease progression. The purpose of this study is to exploit the Convolutional Neural Network (CNN) in discriminating breast MRI scans into pathological and healthy. In this study, a fully automated and efficient deep features extraction algorithm that exploits the spatial information obtained from both T2W-TSE and STIR MRI sequences to discriminate between pathological and healthy breast MRI scans. The breast MRI scans are preprocessed prior to the feature
... Show MoreThe Internet of Things (IoT) is a network of devices used for interconnection and data transfer. There is a dramatic increase in IoT attacks due to the lack of security mechanisms. The security mechanisms can be enhanced through the analysis and classification of these attacks. The multi-class classification of IoT botnet attacks (IBA) applied here uses a high-dimensional data set. The high-dimensional data set is a challenge in the classification process due to the requirements of a high number of computational resources. Dimensionality reduction (DR) discards irrelevant information while retaining the imperative bits from this high-dimensional data set. The DR technique proposed here is a classifier-based fe
... Show More