Feature selection algorithms play an important role in machine learning applications. Several feature selection strategies are based on metaheuristic algorithms. This paper proposes a feature selection strategy based on a Modified Artificial Immune System (MAIS). The proposed algorithm exploits the advantages of the Artificial Immune System (AIS) to increase the performance and randomization of features. Experimental results on the NSL-KDD dataset show improved accuracy compared with other feature selection algorithms (best-first search, correlation, and information gain).
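Information gain, one of the baseline selectors named above, can be illustrated with a minimal sketch. It is not the MAIS method. The function names and the four toy records (a protocol column and a flag column) are hypothetical and are not taken from NSL-KDD.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Label entropy minus the entropy left after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (feature indices sorted by decreasing gain, list of gains)."""
    n_features = len(rows[0])
    gains = [information_gain([r[j] for r in rows], labels) for j in range(n_features)]
    order = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return order, gains


# Hypothetical toy records: (protocol, flag). Only the flag separates the classes.
rows = [("tcp", "S0"), ("tcp", "SF"), ("udp", "SF"), ("udp", "S0")]
labels = ["attack", "normal", "normal", "attack"]
order, gains = rank_features(rows, labels)
```

In this toy data the flag column determines the label exactly, so it is ranked first, while the protocol column carries no information about the label.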
Data mining and machine learning methods traditionally assume that training data, test data, and the data to be processed in the future have the same feature space distribution. This condition is not met in the real world. Domain adaptation-based methods are used to overcome this challenge. One of the existing challenges in these methods is selecting the most effective features so that they also perform well on the destination database. This paper proposes a new feature selection method based on deep reinforcement learning. In the proposed method, in order to select the best and most appropriate features, the essential policies ...
Heart disease identification is one of the most challenging tasks and requires highly experienced cardiologists. However, in developing nations such as Ethiopia, there are few cardiologists, and heart disease detection is more challenging. As an alternative to relying on cardiologists, this study proposes a more effective model for heart disease detection that employs random forest and sequential feature selection (SFS). SFS is an effective approach to improving the performance of a random forest model on heart disease detection. SFS removes unrelated features in the heart disease dataset that tend to mislead the random forest model. Thus, removing inappropriate and duplicate features from the training set with sequential f...
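The greedy idea behind sequential (forward) feature selection can be sketched as follows. This is an illustration under stated assumptions, not the study's implementation: a nearest-centroid classifier stands in for the random forest to keep the example dependency-free, scoring uses leave-one-out accuracy, and the function names and six toy samples are hypothetical.

```python
def nearest_centroid_predict(train_X, train_y, x, features):
    """Predict the label of x from class centroids computed on the chosen features."""
    sums, counts = {}, {}
    for row, label in zip(train_X, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for k, j in enumerate(features):
            acc[k] += row[j]
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label in sorted(sums):
        dist = sum((x[j] - sums[label][k] / counts[label]) ** 2
                   for k, j in enumerate(features))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy using only the given feature indices."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]:
            correct += 1
    return correct / len(X)


def sequential_forward_selection(X, y, max_features):
    """Greedily add the feature that most improves accuracy; stop when nothing helps."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        scored = [(loo_accuracy(X, y, selected + [j]), j) for j in remaining]
        score, j = max(scored, key=lambda t: (t[0], -t[1]))
        if score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score


# Hypothetical toy data: feature 0 separates the classes, feature 1 is noise.
X = [[1.0, 5.0], [1.2, 1.0], [0.9, 3.0], [3.0, 4.0], [3.2, 0.5], [2.9, 2.5]]
y = [0, 0, 0, 1, 1, 1]
selected, best_score = sequential_forward_selection(X, y, max_features=2)
```

On this toy data the search keeps only feature 0, because adding the noisy feature does not raise the leave-one-out accuracy.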
Heart disease is a significant health condition that ranks as the leading cause of death in many countries. Clinical datasets are available to aid physicians in diagnosing cardiovascular diseases. However, with the rise of big data and medical datasets, it has become increasingly challenging for medical practitioners to accurately predict heart disease because of the abundance of unrelated and redundant features, which increase computational complexity and reduce accuracy. This study therefore aims to identify the most discriminative features within high-dimensional datasets while minimizing complexity and improving accuracy through an Extra Tree-based feature selection technique. The study assesses the efficac...
With the proliferation of both Internet access and data traffic, recent breaches have brought into sharp focus the need for Network Intrusion Detection Systems (NIDS) to protect networks from more complex cyberattacks. To differentiate between normal network processes and possible attacks, Intrusion Detection Systems (IDS) often employ pattern recognition and data mining techniques. An IDS can automatically detect and classify network and host system intrusions, assaults, and policy violations. Using Python Scikit-Learn, this study shows that Machine Learning (ML) techniques such as Decision Tree (DT), Naïve Bayes (NB), and K-Nearest Neighbor (KNN) can enhance the effectiveness of an Intrusi...
The Internet of Things (IoT) is a network of devices used for interconnection and data transfer. IoT attacks have increased dramatically because of the lack of security mechanisms. These mechanisms can be enhanced through the analysis and classification of the attacks. The multi-class classification of IoT botnet attacks (IBA) applied here uses a high-dimensional data set, which is a challenge for the classification process because it requires substantial computational resources. Dimensionality reduction (DR) discards irrelevant information while retaining the essential parts of this high-dimensional data set. The DR technique proposed here is a classifier-based fe...
Intrusion detection systems detect attacks inside computers and networks, where detection must be fast and achieve a high detection rate. Various proposed methods have achieved a high detection rate, either by improving the algorithm or by hybridizing it with another algorithm. However, they suffer from long processing times, especially after the algorithm is improved and when dealing with large traffic data. On the other hand, past research has successfully applied DNA sequence detection approaches to intrusion detection systems; the achieved detection rates were very low, but the processing time was fast. Also, feature selection, used to reduce computation and complexity, leads to speeding up the system ...
Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique can measure a feature's importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and hence obtain promising text classification accuracy. TF-IDF acts as a filter approach that measures the feature importance of the te...
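A minimal sketch of the TF-IDF filter stage follows; it does not cover the SVM-RFE stage. It assumes one common weighting variant, term frequency normalized by document length times log(N / df), which may differ from the paper's formula. The function names and the three toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(documents, k):
    """Keep the k terms with the highest maximum TF-IDF weight across documents."""
    best = {}
    for doc_weights in tf_idf(documents):
        for term, w in doc_weights.items():
            best[term] = max(best.get(term, 0.0), w)
    # Sort by descending weight, breaking ties alphabetically.
    return sorted(best, key=lambda t: (-best[t], t))[:k]


# Hypothetical toy corpus.
docs = ["the network attack", "the heart disease", "the network traffic"]
weights = tf_idf(docs)
kept = top_terms(docs, 4)
```

A term that appears in every document, such as "the" here, receives a weight of zero and is filtered out first, which is the behaviour a filter stage relies on before a wrapper method such as RFE refines the subset.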
This paper presents a hybrid approach, combining a Modified Full Bayesian Classifier (M-FBC) and Artificial Bee Colony (MFBC-ABC), for use in a medical diagnosis support system. The datasets, taken from Iraqi hospitals, cover heart diseases and nervous system diseases. The M-FBC is based on a common structure known as naïve Bayes. The network structure is represented by D-separation among the structure's variables. Each variable has Conditional Probability Tables (CPTs), and each table for a disease has a probability. The ABC is easy to implement, has fewer control parameters, and can be easier than other swarm optimization algorithms, so it is hybridized with other algorithms to reach the optimal structure. In the ...