Feature selection (FS) constitutes a series of processes used to decide which relevant features/attributes to include and which irrelevant features to exclude for predictive modeling. It is a crucial task that aids machine learning classifiers in reducing error rates, computation time, overfitting, and improving classification accuracy. It has demonstrated its efficacy in myriads of domains, ranging from its use for text classification (TC), text mining, and image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, there are few literature reviews concerning TC. Therefore, a comprehensive overview was systematically studied by exploring available studies of different metaheuristic algorithms used for FS to improve TC. This paper will contribute to the body of existing knowledge by answering four research questions (RQs): 1) What are the different approaches of FS that apply metaheuristic algorithms to improve TC? 2) Does applying metaheuristic algorithms for TC lead to better accuracy than the typical FS methods? 3) How effective are the modified, hybridized metaheuristic algorithms for text FS problems?, and 4) What are the gaps in the current studies and their future directions? These RQs led to a study of recent works on metaheuristic-based FS methods, their contributions, and limitations. Hence, a final list of thirty-seven (37) related articles was extracted and investigated to align with our RQs to generate new knowledge in the domain of study. Most of the conducted papers focused on addressing the TC in tandem with metaheuristic algorithms based on the wrapper and hybrid FS approaches. Future research should focus on using a hybrid-based FS approach as it intuitively handles complex optimization problems and potentiality provide new research opportunities in this rapidly developing field.
Text categorization refers to the process of grouping text or documents into classes or categories according to their content. Text categorization process consists of three phases which are: preprocessing, feature extraction and classification. In comparison to the English language, just few studies have been done to categorize and classify the Arabic language. For a variety of applications, such as text classification and clustering, Arabic text representation is a difficult task because Arabic language is noted for its richness, diversity, and complicated morphology. This paper presents a comprehensive analysis and a comparison for researchers in the last five years based on the dataset, year, algorithms and the accuracy th
... Show MoreRegarding the security of computer systems, the intrusion detection systems (IDSs) are essential components for the detection of attacks at the early stage. They monitor and analyze network traffics, looking for abnormal behaviors or attack signatures to detect intrusions in real time. A major drawback of the IDS is their inability to provide adequate sensitivity and accuracy, coupled with their failure in processing enormous data. The issue of classification time is greatly reduced with the IDS through feature selection. In this paper, a new feature selection algorithm based on Firefly Algorithm (FA) is proposed. In addition, the naïve bayesian classifier is used to discriminate attack behaviour from normal behaviour in the network tra
... Show MoreText Clustering consists of grouping objects of similar categories. The initial centroids influence operation of the system with the potential to become trapped in local optima. The second issue pertains to the impact of a huge number of features on the determination of optimal initial centroids. The problem of dimensionality may be reduced by feature selection. Therefore, Wind Driven Optimization (WDO) was employed as Feature Selection to reduce the unimportant words from the text. In addition, the current study has integrated a novel clustering optimization technique called the WDO (Wasp Swarm Optimization) to effectively determine the most suitable initial centroids. The result showed the new meta-heuristic which is WDO was employed as t
... Show MoreTexture synthesis using genetic algorithms is one way; proposed in the previous research, to synthesis texture in a fast and easy way. In genetic texture synthesis algorithms ,the chromosome consist of random blocks selected manually by the user .However ,this method of selection is highly dependent on the experience of user .Hence, wrong selection of blocks will greatly affect the synthesized texture result. In this paper a new method is suggested for selecting the blocks automatically without the participation of user .The results show that this method of selection eliminates some blending caused from the previous manual method of selection.
Fruits sorting, recognizing, and classifying are essential post-harvest operations, as they contribute to the quality of food industry, thereby increasing the exported quantity of food. Today, an automated system for fruit classification and recognition is very important, especially when exporting to markets where quality of fruit must be high. In this study, the advantages and disadvantages of the various shape-based feature extraction algorithms and technologies that are used in sorting, classifying, and grading of fruits, as well as fruits quality estimation, are discussed in order to provide a good understanding of the use of shape-based feature extraction techniques.
Heart disease identification is one of the most challenging task that requires highly experienced cardiologists. However, in developing nations such as Ethiopia, there are a few cardiologists and heart disease detection is more challenging. As an alternative solution to cardiologist, this study proposed a more effective model for heart disease detection by employing random forest and sequential feature selection (SFS). SFS is an effective approach to improve the performance of random forest model on heart disease detection. SFS removes unrelated features in heart disease dataset that tends to mislead random forest model on heart disease detection. Thus, removing inappropriate and duplicate features from the training set with sequential f
... Show MoreFeatures are the description of the image contents which could be corner, blob or edge. Scale-Invariant Feature Transform (SIFT) extraction and description patent algorithm used widely in computer vision, it is fragmented to four main stages. This paper introduces image feature extraction using SIFT and chooses the most descriptive features among them by blurring image using Gaussian function and implementing Otsu segmentation algorithm on image, then applying Scale-Invariant Feature Transform feature extraction algorithm on segmented portions. On the other hand the SIFT feature extraction algorithm preceded by gray image normalization and binary thresholding as another preprocessing step. SIFT is a strong algorithm and gives more accura
... Show MoreThe Internet of Things (IoT) is a network of devices used for interconnection and data transfer. There is a dramatic increase in IoT attacks due to the lack of security mechanisms. The security mechanisms can be enhanced through the analysis and classification of these attacks. The multi-class classification of IoT botnet attacks (IBA) applied here uses a high-dimensional data set. The high-dimensional data set is a challenge in the classification process due to the requirements of a high number of computational resources. Dimensionality reduction (DR) discards irrelevant information while retaining the imperative bits from this high-dimensional data set. The DR technique proposed here is a classifier-based fe
... Show More