Text clustering consists of grouping objects into similar categories. The initial centroids influence the behavior of the clustering system and can trap it in local optima. A second issue is the impact of a very large number of features on the determination of optimal initial centroids; this dimensionality problem can be reduced by feature selection. Therefore, Wind Driven Optimization (WDO) was employed for feature selection to remove unimportant words from the text. In addition, this study integrated WDO as a novel clustering optimization technique to effectively determine the most suitable initial centroids. The results show that this meta-heuristic was employed in a multi-objective setting for the first time, once as an unsupervised feature selection method (WDOFS) and once as a clustering algorithm (WDOC). For example, WDOC outperformed Harmony Search and Particle Swarm Optimization in terms of F-measure, reaching 93.3%; moreover, text clustering performance improved by a further 0.9% when the proposed clustering was run on the proposed feature selection. With WDOFS, more than 50 percent of the features were removed compared with the other feature examinations. The best result was obtained by the combined multi-objective approach, with an F-measure of 98.3%.
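The abstract gives no implementation details, but the core idea (a population-based meta-heuristic searching for good initial centroids before k-means refinement) can be sketched. The WDO pressure/velocity update equations are not stated here, so a generic population search stands in for them; all function names and parameters below are illustrative:

```python
# Hedged sketch: meta-heuristic seeding of k-means initial centroids. A
# generic population search stands in for the (unstated) WDO update rules.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def sse(X, centroids):
    """Within-cluster sum of squared errors for a candidate centroid set."""
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return np.sum(np.min(d, axis=1) ** 2)

def metaheuristic_init(X, k, pop_size=20, iters=50, step=0.1, seed=0):
    rng = np.random.default_rng(seed)
    # Population of candidate centroid sets, sampled from the data points.
    pop = [X[rng.choice(len(X), k, replace=False)] for _ in range(pop_size)]
    best = min(pop, key=lambda c: sse(X, c))
    for _ in range(iters):
        # Drift each candidate toward the current best (stand-in for the
        # wind-velocity update), plus a small random "gust".
        pop = [c + step * (best - c) + step * rng.normal(size=c.shape)
               for c in pop]
        cand = min(pop, key=lambda c: sse(X, c))
        if sse(X, cand) < sse(X, best):
            best = cand
    return best

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)
init = metaheuristic_init(X, k=4)
km = KMeans(n_clusters=4, init=init, n_init=1, random_state=0).fit(X)
print("final SSE:", km.inertia_)
```

Passing the meta-heuristic's best candidate as `init` with `n_init=1` is what lets the search, rather than random restarts, determine the starting centroids.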
The influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences. This places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide unlabeled data into clusters; the k-means and fuzzy c-means (FCM) algorithms are two examples of algorithms that can be used for it. Clustering is thus a common approach that divides an input space into several homogeneous zones and can be achieved with a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, which …
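As background for the FCM model mentioned above, here is a minimal sketch of the standard fuzzy c-means iteration (alternating membership and center updates with fuzzifier m); the study's brain-tumor preprocessing and its other two models are not reproduced:

```python
# Minimal standard fuzzy c-means (FCM); the data here is a synthetic stand-in.
import numpy as np

def fcm(X, c, m=2.0, iters=100, tol=1e-5, seed=0):
    rng = np.random.default_rng(seed)
    # Random initial membership matrix U (n_samples x c), rows summing to 1.
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]   # weighted means
        d = np.linalg.norm(X[:, None, :] - centers[None], axis=2) + 1e-10
        U_new = 1.0 / (d ** (2 / (m - 1)))               # standard update
        U_new /= U_new.sum(axis=1, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 4])
centers, U = fcm(X, c=2)
print(centers, U.argmax(axis=1)[:5])   # hard labels via max membership
```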
A substantial concern in exchanging confidential messages over the internet is transmitting information safely. For example, consumers and producers of digital products are keen to know that those products are genuine and distinguishable from worthless imitations. Information hiding can be defined as the technique of embedding data in an image, audio, or video file in a way that meets the security requirements. Steganography is a branch of data-concealment science that aims to reach a desired level of security in the exchange of private, covert commercial and military data. This research offers a novel steganography technique based on hiding data inside the clusters that result from fuzzy clustering. …
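The abstract does not specify how data is placed inside the fuzzy clusters, so the sketch below only illustrates the general shape of cluster-guided hiding: k-means on pixel intensities stands in for fuzzy clustering, and message bits are written to the least-significant bits of pixels in one chosen cluster. All names and parameters are hypothetical:

```python
# Hypothetical sketch of cluster-guided LSB embedding; not the paper's scheme.
import numpy as np
from sklearn.cluster import KMeans

def embed(img, message_bits, n_clusters=3, target=0, seed=0):
    flat = img.reshape(-1, 1).astype(float)
    labels = KMeans(n_clusters, n_init=10, random_state=seed).fit_predict(flat)
    idx = np.flatnonzero(labels == target)       # pixels in the target cluster
    assert len(idx) >= len(message_bits), "cover image too small"
    stego = img.reshape(-1).copy()
    for bit, i in zip(message_bits, idx):
        stego[i] = (stego[i] & 0xFE) | bit       # overwrite least-sig. bit
    # Positions are returned since embedding can shift cluster assignments.
    return stego.reshape(img.shape), idx[:len(message_bits)]

def extract(stego, positions):
    return [int(p) & 1 for p in stego.reshape(-1)[positions]]

img = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
bits = [1, 0, 1, 1, 0, 0, 1, 0]
stego, pos = embed(img, bits)
assert extract(stego, pos) == bits
```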
In data mining and machine learning, it is traditionally assumed that the training data, the test data, and any data to be processed in the future share the same feature-space distribution, a condition that rarely holds in the real world. Domain adaptation methods are used to overcome this challenge. One open problem in domain adaptation is selecting the most efficient features, so that they remain effective on the target dataset as well. This paper proposes a new feature selection method based on deep reinforcement learning. In the proposed method, in order to select the best and most appropriate features, the essential policies …
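The abstract leaves the deep-RL policy unspecified, so the following sketch conveys only the framing of feature selection as sequential decisions rewarded by model accuracy; a simple greedy hill-climb over feature subsets stands in for the learned policy, and the dataset is a generic stand-in:

```python
# Illustrative stand-in only: greedy search over feature subsets with
# cross-validated accuracy as the reward signal; not the paper's deep-RL agent.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)
mask = rng.random(X.shape[1]) < 0.5                  # initial random subset

def reward(mask):
    if not mask.any():
        return 0.0
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

best, best_r = mask.copy(), reward(mask)
for step in range(30):
    cand = best.copy()
    cand[rng.integers(X.shape[1])] ^= True           # action: flip one feature
    r = reward(cand)
    if r >= best_r:                                  # keep improving subsets
        best, best_r = cand, r
print(f"selected {best.sum()} features, CV accuracy {best_r:.3f}")
```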
Heart disease identification is one of the most challenging tasks, requiring highly experienced cardiologists. In developing nations such as Ethiopia there are few cardiologists, which makes heart disease detection even more challenging. As an alternative to a cardiologist, this study proposed a more effective model for heart disease detection that employs a random forest with sequential feature selection (SFS). SFS is an effective approach for improving the performance of the random forest model: it removes unrelated features in the heart disease dataset that tend to mislead the model. Thus, removing inappropriate and duplicate features from the training set with sequential feature selection …
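A minimal sketch of the described pipeline, using scikit-learn's SequentialFeatureSelector with a random forest; the dataset, split, and hyperparameters are illustrative stand-ins for the study's heart-disease data:

```python
# Random forest + forward sequential feature selection (SFS) sketch.
from sklearn.datasets import load_breast_cancer   # stand-in for heart data
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0)
# Forward SFS: greedily add the feature that most improves the CV score.
sfs = SequentialFeatureSelector(rf, n_features_to_select=10,
                                direction="forward")
sfs.fit(X_tr, y_tr)

rf.fit(sfs.transform(X_tr), y_tr)
pred = rf.predict(sfs.transform(X_te))
print("accuracy on selected features:", accuracy_score(y_te, pred))
```

Setting `direction="backward"` instead would start from all features and greedily drop the least useful, which more literally matches the "removing unrelated features" framing above.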
Establishing complete and reliable coverage over a long time span is a crucial issue in densely deployed surveillance wireless sensor networks (WSNs). Many scheduling algorithms model the problem as a maximum disjoint set covers (DSC) problem. The goal of DSC-based algorithms is to schedule sensors into several disjoint subsets; one subset is assigned to be active while all remaining subsets sleep. An extension of the maximum disjoint set covers problem has also been addressed in the literature that allows more advanced sensors to adjust their sensing range; the problem is then extended to finding the maximum number of overlapped set covers. Unlike related works, which are concerned with the disc sensing model, the cont…
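As an illustration of the disjoint-set-covers idea, the greedy sketch below builds subsets of sensors such that each subset alone covers every target and each sensor is used in at most one subset; it is a textbook-style heuristic, not the paper's algorithm, and the adjustable-range extension the abstract mentions is not modeled:

```python
# Greedy construction of disjoint set covers for a WSN coverage instance.
def disjoint_set_covers(coverage):
    """coverage: dict mapping sensor id -> set of target ids it covers."""
    targets = set().union(*coverage.values())
    remaining = dict(coverage)
    covers = []
    while True:
        uncovered, chosen = set(targets), []
        pool = dict(remaining)
        while uncovered:
            # Pick the sensor covering the most still-uncovered targets.
            best = max(pool, key=lambda s: len(pool[s] & uncovered),
                       default=None)
            if best is None or not (pool[best] & uncovered):
                return covers            # no further complete disjoint cover
            chosen.append(best)
            uncovered -= pool.pop(best)
        covers.append(chosen)
        for s in chosen:                 # disjointness: each sensor used once
            remaining.pop(s)

cov = {1: {"a", "b"}, 2: {"c"}, 3: {"a", "c"}, 4: {"b"}}
print(disjoint_set_covers(cov))          # e.g. [[1, 2], [3, 4]]
```

Each returned subset can then be activated in rotation, so network lifetime grows with the number of covers found.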
In drilling processes, the rheological properties indicate the nature of the flow and the composition of the drilling mud. Drilling mud performance can be assessed when solving problems of hole cleaning, fluid management, and hydraulics control. The rheology is typically described by the following parameters: yield point (YP) and plastic viscosity (μp). The YP/μp ratio is used as a measure of the flattening of the flow profile: high YP/μp ratios favor transport of well cuttings through laminar flow. Adequate values of YP/μp lie between 0 and 1 for the rheological models used in drilling, which is what appeared in most of the models used in this study. The pressure loss …
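For context, in the Bingham plastic model commonly used in drilling hydraulics, μp and YP are obtained from Fann viscometer dial readings at 600 and 300 rpm; the readings below are made-up illustrations, not values from the study:

```python
# Standard Bingham-plastic field formulas (API units); readings are made up.
theta600 = 46.0   # dial reading at 600 rpm
theta300 = 30.0   # dial reading at 300 rpm

pv = theta600 - theta300   # plastic viscosity (μp), cP
yp = theta300 - pv         # yield point, lb/100 ft^2
ratio = yp / pv            # dimensionless YP/μp ratio

print(f"PV = {pv} cP, YP = {yp} lb/100ft^2, YP/PV = {ratio:.2f}")
# A YP/μp ratio near 1 (here 0.88) suggests a flatter laminar velocity
# profile, which aids cuttings transport, per the 0-1 range cited above.
```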
Heart disease is a significant and impactful health condition that ranks as the leading cause of death in many countries. Clinical datasets are available to aid physicians in diagnosing cardiovascular disease. However, with the rise of big data and large medical datasets, it has become increasingly challenging for medical practitioners to predict heart disease accurately, owing to the abundance of unrelated and redundant features that raise computational complexity and hurt accuracy. This study therefore aims to identify the most discriminative features within high-dimensional datasets, minimizing complexity and improving accuracy through an Extra-Tree-based feature selection technique. The study assesses the efficacy …
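A brief sketch of the common Extra-Trees feature-selection pattern the abstract describes (ranking features by impurity importance and keeping the strongest); the study's clinical dataset is replaced here by a built-in stand-in:

```python
# Extra-Trees importance ranking + thresholded selection sketch.
from sklearn.datasets import load_breast_cancer   # stand-in clinical data
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_breast_cancer(return_X_y=True)
et = ExtraTreesClassifier(n_estimators=300, random_state=0).fit(X, y)

# Keep features whose impurity importance exceeds the mean importance.
selector = SelectFromModel(et, prefit=True, threshold="mean")
X_reduced = selector.transform(X)
print(f"kept {X_reduced.shape[1]} of {X.shape[1]} features")
```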