Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant feature from the sparse feature space. Thus, this paper proposed an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high dimensional text classificationhis technique has the ability to measure the feature’s importance in a high-dimensional text document. In addition, it aims to increase the efficiency of the feature selection. Hence, obtaining a promising text classification accuracy. TF-IDF act as a filter approach which measures features importance of the text documents at the first stage. SVM-RFE utilized a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research executes sets of experiments using a text document retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing processes are applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is implemented on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is applied for feature ranking as the next feature selection step. Only top-rank features will be selected for text classification using the SVM classifier. Based on the experiments, it shows that the proposed technique able to achieve 98% accuracy that outperformed other existing techniques. In conclusion, the proposed technique able to select the significant features in the unstructured and high dimensional text document.
Nowad ays, with the development of internet communication that provides many facilities to the user leads in turn to growing unauthorized access. As a result, intrusion detection system (IDS) becomes necessary to provide a high level of security for huge amount of information transferred in the network to protect them from threats. One of the main challenges for IDS is the high dimensionality of the feature space and how the relevant features to distinguish the normal network traffic from attack network are selected. In this paper, multi-objective evolutionary algorithm with decomposition (MOEA/D) and MOEA/D with the injection of a proposed local search operator are adopted to solve the Multi-objective optimization (MOO) followed by Naï
... Show MoreCOVID 19 has spread rapidly around the world due to the lack of a suitable vaccine; therefore the early prediction of those infected with this virus is extremely important attempting to control it by quarantining the infected people and giving them possible medical attention to limit its spread. This work suggests a model for predicting the COVID 19 virus using feature selection techniques. The proposed model consists of three stages which include the preprocessing stage, the features selection stage, and the classification stage. This work uses a data set consists of 8571 records, with forty features for patients from different countries. Two feature selection techniques are used in
The current issues in spam email detection systems are directly related to spam email classification's low accuracy and feature selection's high dimensionality. However, in machine learning (ML), feature selection (FS) as a global optimization strategy reduces data redundancy and produces a collection of precise and acceptable outcomes. A black hole algorithm-based FS algorithm is suggested in this paper for reducing the dimensionality of features and improving the accuracy of spam email classification. Each star's features are represented in binary form, with the features being transformed to binary using a sigmoid function. The proposed Binary Black Hole Algorithm (BBH) searches the feature space for the best feature subsets,
... Show MoreHeart disease identification is one of the most challenging task that requires highly experienced cardiologists. However, in developing nations such as Ethiopia, there are a few cardiologists and heart disease detection is more challenging. As an alternative solution to cardiologist, this study proposed a more effective model for heart disease detection by employing random forest and sequential feature selection (SFS). SFS is an effective approach to improve the performance of random forest model on heart disease detection. SFS removes unrelated features in heart disease dataset that tends to mislead random forest model on heart disease detection. Thus, removing inappropriate and duplicate features from the training set with sequential f
... Show MoreThe Internet of Things (IoT) is a network of devices used for interconnection and data transfer. There is a dramatic increase in IoT attacks due to the lack of security mechanisms. The security mechanisms can be enhanced through the analysis and classification of these attacks. The multi-class classification of IoT botnet attacks (IBA) applied here uses a high-dimensional data set. The high-dimensional data set is a challenge in the classification process due to the requirements of a high number of computational resources. Dimensionality reduction (DR) discards irrelevant information while retaining the imperative bits from this high-dimensional data set. The DR technique proposed here is a classifier-based fe
... Show MoreFeature selection, a method of dimensionality reduction, is nothing but collecting a range of appropriate feature subsets from the total number of features. In this paper, a point by point explanation review about the feature selection in this segment preferred affairs and its appraisal techniques are discussed. I will initiate my conversation with a straightforward approach so that we consider taking care of features and preferred issues depending upon meta-heuristic strategy. These techniques help in obtaining the best highlight subsets. Thereafter, this paper discusses some system models that drive naturally from the environment are discussed and calculations are performed so that we can take care of the prefe
... Show MoreCloud Computing is a mass platform to serve high volume data from multi-devices and numerous technologies. Cloud tenants have a high demand to access their data faster without any disruptions. Therefore, cloud providers are struggling to ensure every individual data is secured and always accessible. Hence, an appropriate replication strategy capable of selecting essential data is required in cloud replication environments as the solution. This paper proposed a Crucial File Selection Strategy (CFSS) to address poor response time in a cloud replication environment. A cloud simulator called CloudSim is used to conduct the necessary experiments, and results are presented to evidence the enhancement on replication performance. The obtained an
... Show MoreIn data mining and machine learning methods, it is traditionally assumed that training data, test data, and the data that will be processed in the future, should have the same feature space distribution. This is a condition that will not happen in the real world. In order to overcome this challenge, domain adaptation-based methods are used. One of the existing challenges in domain adaptation-based methods is to select the most efficient features so that they can also show the most efficiency in the destination database. In this paper, a new feature selection method based on deep reinforcement learning is proposed. In the proposed method, in order to select the best and most appropriate features, the essential policies
... Show MoreThe study was conducted at the fields of the Department of Horticulture and Landscape Gardening,College of Agriculture, University of Baghdad during the growing seasons of 2013- 2014 .forPerformance of Evaluation Vegetative growth and yield traits and estimate some important geneticparameter on seven selected breed of tomato which (S1-S7 ) Pure line. the results found significantdifferences between breeds in all study trails except clusters flowering number .S1 significantly plantlength which reached 227.3 .Also S1,S2 and S4 were significantly increased the number fruit for plant,Fruit weight Increased in S3 ,S6 and plant yield. Increased in S1, S4 ,S5. Genetic variation valueswere low in Floral clusters , TSS and fruit firmest and medium i
... Show More