Feature selection (FS) is a series of processes used to decide which relevant features/attributes to include and which irrelevant features to exclude for predictive modeling. It is a crucial task that helps machine learning classifiers reduce error rates, computation time, and overfitting, and improve classification accuracy. It has demonstrated its efficacy in a myriad of domains, including text classification (TC), text mining, and image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, there are few literature reviews concerning TC. Therefore, this paper presents a systematic overview of the available studies on metaheuristic algorithms used for FS to improve TC. It contributes to the body of existing knowledge by answering four research questions (RQs): 1) What are the different approaches of FS that apply metaheuristic algorithms to improve TC? 2) Does applying metaheuristic algorithms for TC lead to better accuracy than the typical FS methods? 3) How effective are the modified, hybridized metaheuristic algorithms for text FS problems? and 4) What are the gaps in the current studies and their future directions? These RQs led to a study of recent works on metaheuristic-based FS methods, their contributions, and limitations. A final list of thirty-seven (37) related articles was extracted and investigated against the RQs to generate new knowledge in the domain of study. Most of the reviewed papers addressed TC with metaheuristic algorithms based on the wrapper and hybrid FS approaches. Future research should focus on the hybrid-based FS approach, as it intuitively handles complex optimization problems and potentially provides new research opportunities in this rapidly developing field.
Twitter data analysis is an emerging field of research that uses data collected from Twitter to address many issues, such as disaster response, sentiment analysis, and demographic studies. The success of data analysis relies on collecting accurate data that is representative of the studied group or phenomenon. Various Twitter analysis applications rely on collecting the locations of the users sending the tweets, but this information is not always available. There have been several attempts at estimating location-based aspects of a tweet. However, few attempts have investigated data collection methods that are focused on location. In this paper, we investigate the two methods for obtaining location-based dat
This study aimed to investigate the usability of Recycled Concrete Aggregate (RCA) in warm mix asphalt (WMA) as an implementation of sustainable construction technology. Five replacement rates (0%, 25%, 50%, 75%, and 100%) were tested for the coarse fraction of virgin aggregate (VA) with three types of RCA: untreated RCA, HL-treated RCA, and HCL-treated RCA. Scanning electron microscopy (SEM) analyses were performed to investigate the surface morphology of both treated and untreated RCA. The optimum asphalt cement content for every substitution rate was determined using the Marshall mix design method. Thereafter, asphalt concrete specimens were prepared using the optimum asphalt cement content, followed by the evaluation of their performance prope
To obtain good estimates with more accurate results, an appropriate estimation method must be chosen. Most of the equations in classical methods are linear equations and finding analytical solutions to such equations is very difficult. Some estimators are inefficient because of problems in solving these equations. In this paper, we estimate the survival function of censored data by using one of the most important artificial intelligence algorithms, the genetic algorithm, to obtain optimal estimates of the parameters of the two-parameter Weibull distribution. This leads to optimal estimates of the survival function. The genetic algorithm is employed in the method of moments, the least squares method and the weighted
The research aims to determine the impact beyond the defined in the collection. The research community is the second school students at Baghdad University, and the research sample is 63 students: an experimental group of 27 students and a control group of 30 students. The researcher matched the groups on the variables of student age, educational attainment, the educational level of the parents, and the educational level of mothers. The researcher developed a test consisting of 20 items. The validity of the test was established after it was submitted to a group of arbitrators. The test was consistent under the test method used, with a reliability coefficient of 0.88. The statistical methods used by the researcher are: Pearson correla
The current research deals with short-term forecasting of the demand for blood material. The problem is the increase in forecast errors at the National Center for Blood Transfusion, which results from the center's management using an inappropriate forecasting method, the naive model. The importance of the research lies in the great effect of forecast accuracy on the operational performance of health care organizations, and in the necessity of providing blood material in the desired quantity and at a suitable time. The literature deals with short-term demand forecasting using time series models in order to obtain accurate results, because these models depend on data of past demand, that is being sta
Dimension reduction and variable selection are very important topics in multivariate statistical analysis. When two or more of the predictor variables are linked by complete or incomplete regression relationships, the problem of multicollinearity occurs, which violates one of the basic assumptions of the ordinary least squares method and leads to incorrect estimates.
Several methods have been proposed to address this problem, including partial least squares (PLS), which is used to reduce the dimension of the regression analysis. By using linear transformations that convert a set of highly correlated variables to a set of new independent variables and unr
It is often necessary to have circuits that can display the decimal representation of a binary number, and this paper specifically considers display on a 7-segment display. A circuit that can display the decimal equivalent of an n-bit binary number is designed, and its behavior is described using Verilog Hardware Description Language (HDL). This HDL program is then used to configure an FPGA to implement the designed circuit.
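The abstract does not say how the circuit converts binary to decimal. As a rough software model only, the sketch below assumes the common shift-and-add-3 ("double dabble") scheme and an active-high segment encoding with bit order g f e d c b a. The names `SEGMENTS`, `binary_to_bcd`, and `to_segments` are hypothetical and are not taken from the paper's Verilog design.

```python
# Hypothetical sketch: the algorithm choice, names, and segment order are
# assumptions for illustration, not the paper's actual Verilog design.

# Active-high patterns for digits 0-9, bit order g f e d c b a (a = bit 0).
SEGMENTS = [0x3F, 0x06, 0x5B, 0x4F, 0x66, 0x6D, 0x7D, 0x07, 0x7F, 0x6F]


def binary_to_bcd(value: int, n_bits: int) -> list[int]:
    """Return the decimal digits of an n-bit unsigned value (most significant first)."""
    if not 0 <= value < (1 << n_bits):
        raise ValueError("value does not fit in n_bits")
    # Number of decimal digits needed for the largest n-bit value.
    n_digits = len(str((1 << n_bits) - 1))
    bcd = 0
    # Process the input bits from most significant to least significant.
    for i in range(n_bits - 1, -1, -1):
        # Before each shift, add 3 to every BCD digit that is 5 or more,
        # so that doubling carries correctly into the next decimal digit.
        for d in range(n_digits):
            if ((bcd >> (4 * d)) & 0xF) >= 5:
                bcd += 3 << (4 * d)
        # Shift left by one and bring in the next input bit.
        bcd = (bcd << 1) | ((value >> i) & 1)
    return [(bcd >> (4 * d)) & 0xF for d in range(n_digits - 1, -1, -1)]


def to_segments(value: int, n_bits: int) -> list[int]:
    """Map each decimal digit to its 7-segment pattern."""
    return [SEGMENTS[digit] for digit in binary_to_bcd(value, n_bits)]
```

For example, `to_segments(10, 4)` yields the patterns for the digits 1 and 0. In hardware, the two loops would typically be unrolled into combinational logic or run over several clock cycles.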
One field of study whose importance has grown significantly in recent years is lip reading, particularly with the widespread use of deep learning techniques. Lip reading is essential for speech recognition in noisy environments or for those with hearing impairments. It refers to recognizing spoken sentences using visual information acquired from lip movements. The lip area, especially for males, also poses several problems, such as a mustache and beard that may cover it. This paper proposes an automatic lip-reading system to recognize and classify short English sentences spoken by speakers using deep learning networks. Frames are extracted from the input video, and each frame is passed to the Viola-Jone
Co-composting can be achieved by combining the organic fraction of municipal solid waste (OFMSW) with sewage sludge (SS) and mature compost (MC) as an enhancement and bulking agent, to overcome the problems of municipal solid waste and wastewater treatment plants and to produce fertilizer for use in agriculture and horticulture. The effects of different mixture ratios of OFMSW, SS, and MC on the performance of the composting process were investigated in this study. Piles of about 10 kg were prepared by mixing OFMSW, SS, and MC in three different ratios (w/w) [OFMSW:SS:MC = 3:1:1, 3:2:1, and 3:3:1]. Results showed that the pile [3:1:1] was most beneficial to composting. The final compost products contained a
The effect of initial pressure on the laminar flame speed of methane-air mixtures has been determined experimentally for a wide range of equivalence ratios. In this work, a measurement system is designed to measure the laminar flame speed using a constant volume method with a thermocouple technique. The laminar burning velocity is measured using the density ratio method. The comparison of the present results with previous ones shows good agreement. This indicates that the measurements and calculations employed in the present work are successful and precise.
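The abstract does not state the formula behind the density ratio method. In its usual form, assuming an outwardly propagating flame with the burned gas at rest, the relation can be written as below. The symbols are assumptions introduced here: $S_u$ is the laminar burning velocity, $S_f$ the measured flame speed, $r_f$ the flame radius, and $\rho_u$, $\rho_b$ the unburned and burned gas densities.

```latex
S_u = \frac{\rho_b}{\rho_u}\, S_f, \qquad S_f = \frac{\mathrm{d} r_f}{\mathrm{d} t}
```

Because the burned gas is much less dense than the unburned mixture, the burning velocity obtained this way is several times smaller than the observed flame speed.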