Feature selection (FS) is a series of processes used to decide which relevant features/attributes to include and which irrelevant features to exclude for predictive modeling. It is a crucial task that helps machine learning classifiers reduce error rates, computation time, and overfitting while improving classification accuracy. It has demonstrated its efficacy in many domains, including text classification (TC), text mining, and image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, there are few literature reviews concerning TC. Therefore, this paper presents a systematic overview of available studies on the different metaheuristic algorithms used for FS to improve TC. It contributes to the body of existing knowledge by answering four research questions (RQs): 1) What are the different approaches of FS that apply metaheuristic algorithms to improve TC? 2) Does applying metaheuristic algorithms for TC lead to better accuracy than the typical FS methods? 3) How effective are the modified, hybridized metaheuristic algorithms for text FS problems? and 4) What are the gaps in the current studies and their future directions? These RQs led to a study of recent works on metaheuristic-based FS methods, their contributions, and limitations. A final list of thirty-seven (37) related articles was extracted and investigated against our RQs to generate new knowledge in the domain of study. Most of the reviewed papers addressed TC with metaheuristic algorithms based on the wrapper and hybrid FS approaches. Future research should focus on using a hybrid-based FS approach, as it intuitively handles complex optimization problems and potentially provides new research opportunities in this rapidly developing field.
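As a rough illustration of the wrapper approach mentioned above, the sketch below uses a small genetic algorithm (one common metaheuristic) to search over feature masks, scoring each mask by the leave-one-out accuracy of a nearest-centroid classifier. The toy term-count data, the classifier, the size penalty, and every function name here are assumptions chosen for illustration; none of them is taken from the reviewed studies.

```python
import random

def accuracy(mask, X, y):
    """Leave-one-out accuracy of a nearest-centroid classifier on the selected features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        # Build per-class centroids from every sample except sample i.
        sums, counts = {}, {}
        for k in range(len(X)):
            if k == i:
                continue
            s = sums.setdefault(y[k], [0.0] * len(cols))
            for c, j in enumerate(cols):
                s[c] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1
        best_label, best_dist = None, float("inf")
        for label, s in sums.items():
            d = sum((X[i][j] - s[c] / counts[label]) ** 2 for c, j in enumerate(cols))
            if d < best_dist:
                best_label, best_dist = label, d
        correct += best_label == y[i]
    return correct / len(X)

def fitness(mask, X, y, penalty=0.01):
    # Wrapper objective: classifier accuracy minus a small cost per selected feature.
    return accuracy(mask, X, y) - penalty * sum(mask)

def genetic_feature_selection(X, y, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    n = len(X[0])  # assumes at least two features
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=lambda m: fitness(m, X, y))

    def tournament():
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(population, 2)
        return a if fitness(a, X, y) >= fitness(b, X, y) else b

    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
        best = max(population, key=lambda m: fitness(m, X, y))
    return best

# Hypothetical term counts: feature 0 separates the classes, the rest are noise.
X = [[5, 1, 0, 2], [6, 0, 1, 2], [5, 1, 1, 2], [6, 0, 0, 2],
     [0, 1, 0, 2], [1, 0, 1, 2], [0, 1, 1, 2], [1, 0, 0, 2]]
y = ["sport"] * 4 + ["tech"] * 4

best = genetic_feature_selection(X, y)
print(best, accuracy(best, X, y))
```

A hybrid approach would typically prefilter the features with a cheap filter score before running a search of this kind; the sketch covers only the wrapper stage.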
Classification of imbalanced data is an important issue. Many algorithms have been developed for classification, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, and have been used repeatedly in many fields. These algorithms suffer from the problem of imbalanced data, where some classes contain many more instances than others. Imbalanced data result in poor performance and bias toward one class at the expense of the others. In this paper, we propose three techniques based on the Over-Sampling (O.S.) technique for processing an imbalanced dataset, redistributing it, and converting it into a balanced dataset. These techniques are (Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Border
As a result of global development and openness, companies can now provide their services beyond their original geographic boundaries, and advances in the means of communication have turned the world into one large global market that accommodates products of the same type and field of production from different regions. This has produced competition between companies and a race to obtain the largest market share, which secures the largest profits. It is therefore natural for companies' advertising to shift from promoting a single product to competitive advertising that calls on the recipient to leave the competing product and switch to it
In this paper, estimation of the reliability of a multi-component system in the stress-strength model R(s,k) is considered, when the stress and strength are independent random variables that follow the Exponentiated Weibull Distribution (EWD) with known first shape parameter θ and unknown second shape parameter α, using different estimation methods. The proposed estimators were compared through Monte Carlo simulation based on the mean squared error (MSE) criterion.
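The simulation setup described in this abstract can be sketched roughly as follows. The sketch assumes the EWD is parameterized with unit scale as F(x) = (1 - exp(-x^θ))^α, takes R(s,k) to be the probability that at least s of k component strengths exceed a common stress, and uses the maximum likelihood estimator of α as one example estimator. The abstract does not say which parameterization or which estimators the authors used, so these choices and all function names are assumptions.

```python
import math
import random

def sample_ewd(alpha, theta, rng):
    """Inverse-CDF draw from the assumed CDF F(x) = (1 - exp(-x**theta))**alpha."""
    while True:
        v = rng.random() ** (1.0 / alpha)
        if 0.0 < v < 1.0:  # redraw on the (extremely rare) endpoint values
            return (-math.log1p(-v)) ** (1.0 / theta)

def reliability_mc(s, k, alpha_strength, alpha_stress, theta, trials=20000, seed=0):
    """Monte Carlo estimate of P(at least s of k strengths exceed the common stress)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        stress = sample_ewd(alpha_stress, theta, rng)
        survivors = sum(sample_ewd(alpha_strength, theta, rng) > stress for _ in range(k))
        hits += survivors >= s
    return hits / trials

def mle_alpha(xs, theta):
    """MLE of alpha for known theta: n / sum(-log(1 - exp(-x**theta)))."""
    return len(xs) / sum(-math.log(-math.expm1(-(x ** theta))) for x in xs)

def mse_of_mle(alpha, theta, n, replications=2000, seed=1):
    """Monte Carlo mean squared error of the MLE of alpha at sample size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(replications):
        xs = [sample_ewd(alpha, theta, rng) for _ in range(n)]
        total += (mle_alpha(xs, theta) - alpha) ** 2
    return total / replications

print(reliability_mc(2, 4, alpha_strength=3.0, alpha_stress=1.0, theta=2.0))
print(mse_of_mle(2.0, 2.0, n=10), mse_of_mle(2.0, 2.0, n=50))
```

Under the assumed parameterization, the case s = k = 1 has the closed form α_strength / (α_strength + α_stress), which gives a quick sanity check on the simulator. Comparing several estimators would amount to swapping `mle_alpha` for each candidate and comparing the resulting MSE values.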
EDIRKTO, an implicit-type Runge-Kutta method with diagonally embedded pairs, is a novel approach presented in this paper that may be used to solve 4th-order ordinary differential equations of the form . There are two pairs of EDIRKTO, with three stages each: EDIRKTO4(3) and EDIRKTO5(4). The derivation techniques of the method indicate that the higher-order pair is more accurate, while the lower-order pair provides superior error estimates. Next, using these pairs as a basis, we developed variable step codes and applied them to a series of fourth-order ODE problems. The numerical outcomes demonstrated how much more effective this approach is in reducing the number of function evaluations needed to solve fourth-order ODE problems.
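The variable-step idea behind an embedded pair can be sketched in general terms: two solutions of different order are computed from shared stages, their difference serves as the local error estimate, and the step size is adjusted to keep that estimate under a tolerance. The sketch below is not EDIRKTO. It substitutes the explicit Heun-Euler 2(1) pair, rewrites an assumed test problem y'''' = y as a first-order system, and uses a standard step-size controller; all of these are illustrative assumptions.

```python
import math

def f(x, u):
    # u = [y, y', y'', y''']; the test problem y'''' = y is an assumption.
    return [u[1], u[2], u[3], u[0]]

def embedded_step(x, u, h):
    """One Heun-Euler 2(1) step: returns the order-2 solution and an error estimate."""
    k1 = f(x, u)
    k2 = f(x + h, [ui + h * ki for ui, ki in zip(u, k1)])
    low = [ui + h * a for ui, a in zip(u, k1)]                      # Euler, order 1
    high = [ui + 0.5 * h * (a + b) for ui, a, b in zip(u, k1, k2)]  # Heun, order 2
    err = max(abs(p - q) for p, q in zip(high, low))
    return high, err

def integrate(u0, x0, x_end, tol=1e-6, h=0.1):
    x, u, evals = x0, list(u0), 0
    while x < x_end:
        last = h >= x_end - x
        if last:
            h = x_end - x  # clip the final step so it lands exactly on x_end
        new_u, err = embedded_step(x, u, h)
        evals += 2  # two function evaluations per attempted step
        if err <= tol:
            # Accept the step and advance with the higher-order solution.
            x = x_end if last else x + h
            u = new_u
        # Controller for a pair whose lower order is 1: exponent 1 / (1 + 1).
        factor = 0.9 * math.sqrt(tol / err) if err > 0 else 2.0
        h *= min(2.0, max(0.2, factor))
    return u, evals

# With y(0) = y'(0) = y''(0) = y'''(0) = 1 the exact solution is y = exp(x).
u, evals = integrate([1.0, 1.0, 1.0, 1.0], 0.0, 1.0)
print(u[0], math.e, evals)
```

Counting `evals` mirrors the abstract's cost measure, the number of function evaluations. A method that treats the fourth-order equation directly avoids the reduction to a first-order system used here.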
The purpose of this study is to measure the levels of quality control for some crude oil products in Iraqi refineries, and how close they are to international standards, by applying statistical quality control methods to oil products in Iraqi refineries. The study sample was a group of employees from Iraqi refineries (Al-Dora refinery, Al-Nasiriyah refinery, and Al-Basra refinery), who were surveyed on the principles of quality management control according to their personal characteristics (gender, age, academic qualification, number of years of experience, job level). In order to achieve the objectives of the study, a questionnaire that included (12) items, in order to collect preliminary inform
Mixture problems suffer from high correlation and linear multicollinearity between the explanatory variables, because of the unit-sum constraint and the interactions between them in the model, which strengthen the links between the explanatory variables; this is illustrated by the variance inflation factor (VIF). L-pseudo components were used to reduce the correlation between the components of the mixture.
To estimate the parameters of the mixture model, we used methods that introduce bias in order to reduce variance, such as the Ridge Regression Method and the Least Absolute Shrinkage and Selection Operator (LASSO) method a
This study was carried out for the direct detection of Salmonella typhi and some of its multidrug resistance genes (tem, capt, gyrA, and sul2), which encode resistance to ampicillin, chloramphenicol, ciprofloxacin, and co-trimoxazole, using the Polymerase Chain Reaction technique. Blood samples were collected from (71) people suffering from typhoid fever symptoms according to clinical examination, along with (25) control samples. Investigation of the flic gene, which encodes the flagellin protein, indicated that only (19) samples (26.76%) gave a positive result, while all controls were negative. Investigation of antibiotic resistance in the samples that were positive for the flic gene showed multidrug resistance to all antibiotics with (94.7
Arabic text categorization for pattern recognition is challenging. We propose, for the first time, a novel holistic method based on clustering for classifying Arabic writers. The categorization is accomplished stage-wise. First, the document images are segmented into lines, words, and characters. Second, structural and statistical features are extracted from the segmented portions. Third, the F-measure is used to evaluate the performance of the extracted features and their combinations under different linkage methods, for each distance measure and for different numbers of groups. Finally, experiments are conducted on the standard KHATT dataset of Arabic handwritten text, which comprises varying samples from 1000 writers. The results in the generatio