Feature selection (FS) comprises the processes used to decide which relevant features (attributes) to include in a predictive model and which irrelevant ones to exclude. It is a crucial task that helps machine learning classifiers reduce error rates, computation time, and overfitting while improving classification accuracy. It has demonstrated its efficacy in a myriad of domains, including text classification (TC), text mining, and image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, few literature reviews address this topic for TC. Therefore, this paper presents a systematic overview of the available studies on metaheuristic algorithms used for FS to improve TC. It contributes to the body of existing knowledge by answering four research questions (RQs): 1) What are the different FS approaches that apply metaheuristic algorithms to improve TC? 2) Does applying metaheuristic algorithms for TC lead to better accuracy than the typical FS methods? 3) How effective are modified and hybridized metaheuristic algorithms for text FS problems? 4) What are the gaps in the current studies and their future directions? These RQs guided a study of recent works on metaheuristic-based FS methods, their contributions, and their limitations. A final list of thirty-seven (37) related articles was extracted and examined against our RQs to generate new knowledge in the domain of study. Most of the reviewed papers addressed TC with metaheuristic algorithms based on the wrapper and hybrid FS approaches. Future research should focus on hybrid-based FS approaches, as they intuitively handle complex optimization problems and potentially provide new research opportunities in this rapidly developing field.
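To make the wrapper approach mentioned above concrete, the following is a minimal sketch, not a method taken from any of the reviewed studies. It assumes a small genetic algorithm as the metaheuristic and a leave-one-out 1-nearest-neighbour classifier as the wrapped evaluator. The function names, the size penalty, and all parameter values are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence

Mask = List[int]  # one bit per feature: 1 = keep, 0 = drop


def loo_1nn_accuracy(X: Sequence[Sequence[float]], y: Sequence[int], mask: Mask) -> float:
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on the kept features."""
    kept = [j for j, bit in enumerate(mask) if bit]
    if not kept:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i, xi in enumerate(X):
        best_dist, best_label = None, None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query sample out
            dist = sum((xi[j] - xk[j]) ** 2 for j in kept)
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, y[k]
        correct += int(best_label == y[i])
    return correct / len(X)


def ga_feature_selection(X, y, pop_size=20, generations=30,
                         p_mut=0.1, penalty=0.01, seed=0) -> Mask:
    """Search feature masks with a tiny genetic algorithm (illustrative assumption)."""
    rng = random.Random(seed)
    n_features = len(X[0])

    def fitness(mask: Mask) -> float:
        # Wrapper criterion: classifier accuracy minus a small cost per kept feature.
        return loo_1nn_accuracy(X, y, mask) - penalty * sum(mask)

    def tournament(pop: List[Mask]) -> Mask:
        # Binary tournament: the fitter of two random individuals becomes a parent.
        return max(rng.sample(pop, 2), key=fitness)

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        next_pop = [max(pop, key=fitness)[:]]  # elitism: carry the best mask over
        while len(next_pop) < pop_size:
            a, b = tournament(pop), tournament(pop)
            child = [ga if rng.random() < 0.5 else gb for ga, gb in zip(a, b)]  # uniform crossover
            child = [1 - g if rng.random() < p_mut else g for g in child]       # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)


# Toy data (hypothetical): only feature 0 separates the two classes.
X_toy = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0],
         [1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]
y_toy = [0, 0, 0, 1, 1, 1]
best_mask = ga_feature_selection(X_toy, y_toy)
```

In a TC setting the columns of `X` would typically be term weights and the evaluator would be whichever classifier a given study wraps. The search loop stays the same, which is why the wrapper approach combines so readily with different metaheuristics.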
Red, green, and blue (RGB) codes extracted from a lab-fabricated colorimeter device were used to build a classifier that assigns the colors of objects to defined categories of fundamental colors. Eleven primary, secondary, and tertiary colors, namely red, green, orange, yellow, pink, purple, blue, brown, grey, white, and black, were used in machine learning (ML) by applying an artificial neural network (ANN) algorithm in Python. The ANN-based classifier required each of these eleven colors to be defined as RGB codes in order to acquire the capability of classification. The software's capacity to forecast the color of the code that belongs to an ob...
Test case prioritization is a key part of system testing, intended to identify and resolve issues early in the development stage. Traditional prioritization techniques frequently fail to account for the complexities of large-scale test suites, growing systems, and time constraints, and therefore cannot fully solve this problem. The proposed study presents a hybrid metaheuristic method that addresses these modern challenges. The strategy combines a genetic algorithm with a black hole algorithm to create a smooth trade-off between exploring numerous possibilities and exploiting the best one. The proposed hybrid genetic black hole algorithm (HGBH) uses the...
In this paper, an integrated quantum neural network (QNN), which is a class of feedforward neural networks (FFNNs), is implemented by combining emerging quantum computing (QC) with an artificial neural network (ANN) classifier. It is used for data classification, with the iris flower data set serving as the classification signals. For this purpose, independent component analysis (ICA) is used as a feature extraction technique after normalization of these signals. The architecture of QNNs has inherently built-in fuzziness: the hidden units of these networks develop quantized representations of the sample information provided by the training data set in various graded levels of certainty. Experimental results presented here show that...
One of the main difficulties facing archiving systems for certified documents is checking the stamps, because stamps may contain a complex background and be surrounded by unwanted data. Therefore, the main objective of this paper is to isolate the background and remove the noise that may surround a stamp. Our proposed method comprises four phases. First, we apply the k-means algorithm to cluster the stamp image into a number of clusters and merge them using the ISODATA algorithm. Second, we compute the mean and standard deviation of each remaining cluster to separate the background cluster from the stamp cluster. Third, a region growing algorithm is applied to segment the image and then choosing the connected regi...
Steganography hides information within other information, such as a file, message, picture, or video. Cryptography is the science of converting information from a readable form into a form that is unreadable to unauthorized persons. The main problem in a steganographic system is embedding data in the cover data without providing information that would facilitate its removal. In this research, a method for embedding data into images is suggested that employs least significant bit (LSB) steganography and ciphering (the RSA algorithm) to protect the data. This combination of steganography and cryptography enhances system security.
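The encrypt-then-embed idea described above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a textbook toy RSA key (far too small to be secure), byte-wise encryption, a cover image represented as a flat `bytearray` of pixel values, and a receiver who already knows the message length. All names and parameters are hypothetical.

```python
from typing import List

# Toy RSA key (assumption: textbook-sized primes, for illustration only, NOT secure).
P, Q = 61, 53
N = P * Q              # modulus 3233
E = 17                 # public exponent
D = 2753               # private exponent: 17 * 2753 = 46801 = 15 * 3120 + 1
BITS_PER_BLOCK = 12    # 3233 < 2**12, so every cipher block fits in 12 bits


def rsa_encrypt(message: bytes) -> List[int]:
    """Encrypt each byte separately with the toy public key (E, N)."""
    return [pow(b, E, N) for b in message]


def rsa_decrypt(blocks: List[int]) -> bytes:
    """Recover the original bytes with the toy private key (D, N)."""
    return bytes(pow(c, D, N) for c in blocks)


def embed_lsb(pixels: bytearray, blocks: List[int]) -> bytearray:
    """Write the cipher bits, most significant first, into the pixel LSBs."""
    bits = [(c >> i) & 1 for c in blocks for i in range(BITS_PER_BLOCK - 1, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("cover image too small for payload")
    stego = bytearray(pixels)  # copy, so the cover image stays untouched
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0xFE) | bit  # clear the LSB, then set it to the payload bit
    return stego


def extract_lsb(stego: bytearray, n_blocks: int) -> List[int]:
    """Read n_blocks cipher blocks back out of the pixel LSBs."""
    blocks = []
    for k in range(n_blocks):
        value = 0
        for byte in stego[k * BITS_PER_BLOCK:(k + 1) * BITS_PER_BLOCK]:
            value = (value << 1) | (byte & 1)
        blocks.append(value)
    return blocks


# Round trip on a synthetic "image" of 512 pixel values (hypothetical data).
cover = bytearray(range(256)) * 2
secret = b"hi"
stego = embed_lsb(cover, rsa_encrypt(secret))
recovered = rsa_decrypt(extract_lsb(stego, len(secret)))
```

Because only the lowest bit of each pixel value changes, every pixel differs from the cover by at most one intensity level. Encrypting first means that an attacker who extracts the embedded bits still obtains only ciphertext.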
Digital images in uncompressed form need a very large amount of storage capacity and, as a consequence, require large communication bandwidth for transmission over a network. Image compression techniques not only minimize image storage space but also preserve image quality. This paper presents an image compression technique that uses a distinct image coding scheme based on the wavelet transform, combined with effective compression algorithms for further compression. EZW and SPIHT are significant algorithms available for lossy image compression. EZW coding is a worthwhile, simple, and efficient algorithm. SPIHT is a most powerful technique that is utilized for image...
In high-dimensional semiparametric regression, balancing accuracy and interpretability often requires combining dimension reduction with variable selection. This study introduces two novel methods for dimension reduction in additive partial linear models: (i) minimum average variance estimation (MAVE) combined with the adaptive least absolute shrinkage and selection operator (MAVE-ALASSO) and (ii) MAVE with smoothly clipped absolute deviation (MAVE-SCAD). These methods leverage the flexibility of MAVE for sufficient dimension reduction while incorporating adaptive penalties to ensure sparse and interpretable models. The performance of both methods is evaluated through simulations using the mean squared error and variable selection cri...
Anomaly detection is still a difficult task. To address this problem, we propose to strengthen the DBSCAN algorithm by converting all data to the graph concept frame (CFG). As is well known, DBSCAN groups data points that belong to the same class into a cluster, while points that fall outside the clusters are treated as noise or anomalies. DBSCAN can thus detect abnormal points that lie beyond a set threshold distance (extreme values). However, not all anomalies are of this kind, that is, unusual or far from a specific group. There is also a type of data that does not occur repeatedly but is still considered abnormal with respect to the known group. The analysis showed DBSCAN using the...