Feature selection (FS) is a set of processes for deciding which relevant features/attributes to include in predictive modeling and which irrelevant ones to exclude. It is a crucial task that helps machine learning classifiers reduce error rates, computation time, and overfitting while improving classification accuracy. It has demonstrated its efficacy in many domains, including text classification (TC), text mining, and image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, there are few literature reviews concerning TC. Therefore, this paper presents a systematic, comprehensive overview of the available studies on metaheuristic algorithms used for FS to improve TC. It contributes to the body of existing knowledge by answering four research questions (RQs): 1) What are the different approaches of FS that apply metaheuristic algorithms to improve TC? 2) Does applying metaheuristic algorithms for TC lead to better accuracy than the typical FS methods? 3) How effective are the modified, hybridized metaheuristic algorithms for text FS problems? and 4) What are the gaps in the current studies and their future directions? These RQs led to a study of recent works on metaheuristic-based FS methods, their contributions, and their limitations. A final list of thirty-seven (37) related articles was extracted and investigated in line with the RQs to generate new knowledge in the domain of study. Most of the reviewed papers addressed TC in tandem with metaheuristic algorithms based on the wrapper and hybrid FS approaches. Future research should focus on using a hybrid-based FS approach, as it intuitively handles complex optimization problems and potentially provides new research opportunities in this rapidly developing field.
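To make the wrapper approach mentioned above concrete, the following is a minimal sketch, not a method taken from any of the reviewed studies. It assumes a binary feature mask as the candidate solution, a simple hill-climbing local search standing in for a metaheuristic, and a leave-one-out nearest-centroid classifier as the wrapped model. The function names, the penalty value, and the toy data are all hypothetical.

```python
import random


def nearest_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # no features selected: treat as the worst case
    correct = 0
    for i in range(len(X)):
        # Build class centroids (as running sums) from every sample except i.
        sums, counts = {}, {}
        for k in range(len(X)):
            if k == i:
                continue
            s = sums.setdefault(y[k], [0.0] * len(cols))
            for c, j in enumerate(cols):
                s[c] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1
        # Predict the class whose centroid is closest to sample i.
        best_label, best_dist = None, float("inf")
        for label, s in sums.items():
            dist = sum((X[i][j] - s[c] / counts[label]) ** 2
                       for c, j in enumerate(cols))
            if dist < best_dist:
                best_label, best_dist = label, dist
        correct += best_label == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Wrapper fitness: classifier accuracy minus a small cost per selected feature."""
    return nearest_centroid_accuracy(X, y, mask) - penalty * sum(mask)


def hill_climb_select(X, y, iterations=200, seed=0):
    """Flip one random feature at a time and keep the flip only if fitness improves."""
    rng = random.Random(seed)
    n = len(X[0])
    mask = [rng.random() < 0.5 for _ in range(n)]
    best = fitness(X, y, mask)
    for _ in range(iterations):
        j = rng.randrange(n)
        mask[j] = not mask[j]
        score = fitness(X, y, mask)
        if score > best:
            best = score
        else:
            mask[j] = not mask[j]  # revert a non-improving flip
    return mask, best


# Toy data (hypothetical): feature 0 separates the classes, features 1 and 2 are noise.
X = [[0.0, 5, 1], [0.1, 1, 7], [0.2, 9, 3], [1.0, 2, 8], [1.1, 8, 2], [0.9, 4, 6]]
y = [0, 0, 0, 1, 1, 1]
selected_mask, selected_score = hill_climb_select(X, y)
```

In a text classification setting, the columns would typically be term weights and the wrapped classifier would be whatever model the study evaluates; population-based metaheuristics replace the single-flip search with their own update rules while keeping the same fitness idea.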
Currently, one of the topical areas of application of machine learning methods is the prediction of material characteristics. The aim of this work is to develop machine learning models for determining the rheological properties of polymers from experimental stress relaxation curves. The paper presents an overview of the main directions of metaheuristic approaches (local search, evolutionary algorithms) to solving combinatorial optimization problems. Metaheuristic algorithms for solving some important combinatorial optimization problems are described, with special emphasis on the construction of decision trees. A comparative analysis of algorithms for solving the regression problem in CatBoost Regressor has been carried out. The object of ...
Information from 54 Magnetic Resonance Imaging (MRI) brain tumor images (27 benign and 27 malignant) was collected and subjected to a multilayer perceptron artificial neural network available in the well-known software IBM SPSS 17 (Statistical Package for the Social Sciences). After many attempts, the automatic architecture was adopted in this research work. Thirteen shape and statistical characteristics of the images were considered. The neural network achieved 89.1% correct classification for the training sample and 100% correct classification for the test sample. The normalized importance of the considered characteristics showed that kurtosis accounted for 100%, which means that this variable has a substantial effect ...
The general health of palm trees, encompassing the roots, stems, and leaves, significantly impacts palm oil production; therefore, meticulous attention is needed to achieve optimal yield. One of the challenges encountered in sustaining productive crops is the prevalence of pests and diseases afflicting oil palm plants. These diseases can detrimentally influence growth and development, leading to decreased productivity. Oil palm productivity is closely related to the condition of its leaves, which play a vital role in photosynthesis. This research employed a comprehensive dataset of 1,230 images, consisting of 410 showing leaves, another 410 depicting bagworm infestations, and an additional 410 displaying caterpillar infestations. Furthe...
The selection process aims to bring the chosen players to high levels more effectively than other methods. The research problem arose from the following questions: Which limiting variables must be considered in the first preliminary selections for mini-basketball? Which test suits this category? Are there any standard references that can be relied upon? The research aims to identify the limiting variables for mini-basketball and their tests as indicators for preliminary selection in the mini-basketball category for ages 9-12 years, and to specify standards (modified standard scores, by the following method) for the test results of some limiting variables for the research sample. The researchers also depended on (16) ...
The influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences. This condition places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide data with unknown groupings into clusters. The k-means and fuzzy c-means (FCM) algorithms are examples of algorithms that can be used for clustering. Clustering is thus a common approach that divides an input space into several homogeneous zones, and it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, whic...
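As an illustration of the fuzzy c-means (FCM) algorithm named above, here is a minimal sketch using the standard FCM update rules with fuzzifier m. It is not the model from the study: the function name, the explicit initial centers, the fixed iteration count, and the toy 2-D points are assumptions made for the example.

```python
def fuzzy_c_means(points, centers, m=2.0, iterations=50):
    """Alternate the standard FCM membership and center updates.

    points  : list of equal-length numeric lists
    centers : initial cluster centers (assumed to be supplied by the caller)
    m       : fuzzifier, must be greater than 1
    """
    centers = [list(c) for c in centers]
    dim = len(points[0])
    memberships = []
    for _ in range(iterations):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2 / (m - 1)).
        memberships = []
        for p in points:
            dists = [
                max(sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, 1e-12)
                for c in centers
            ]  # the small floor avoids division by zero when a point sits on a center
            row = [
                1.0 / sum((di / dj) ** (2.0 / (m - 1.0)) for dj in dists)
                for di in dists
            ]
            memberships.append(row)
        # Center update: mean of the points weighted by u_ik^m.
        for i in range(len(centers)):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            centers[i] = [
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ]
    return centers, memberships


# Toy data (hypothetical): two well-separated groups of 2-D points.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
centers, memberships = fuzzy_c_means(points, centers=[[0, 0], [10, 10]])
```

Unlike k-means, each point receives a degree of membership in every cluster, and each row of memberships sums to 1. A hard assignment can be recovered by taking the cluster with the largest membership in each row.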
Many water supplies are now contaminated by anthropogenic sources such as domestic and agricultural waste, as well as manufacturing activities, and the public's concern about the environmental effects of wastewater contamination has grown. Several traditional wastewater treatment methods, such as chemical coagulation, adsorption, and activated sludge, have been used to eliminate pollution; however, they have several drawbacks, most notably high operating costs. Because of its low operating and repair costs, the use of aerobic wastewater treatment as a reductive medium is gaining popularity. Furthermore, it is simple to produce and has a high efficacy and potential to degrade pollu...
Nowadays, cloud computing has attracted the attention of large companies due to its high potential, flexibility, and profitability in providing multiple sources of hardware and software to serve connected users. Given the scale of modern data centers and the dynamic nature of their resource provisioning, effective scheduling techniques are needed to manage these resources while satisfying the goals of both cloud providers and cloud users. Task scheduling in cloud computing is considered an NP-hard problem that cannot be easily solved by classical optimization methods. Thus, both heuristic and meta-heuristic techniques have been utilized to provide optimal or near-optimal solutions within an acceptable time frame for such problems. In th...