The influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences. This condition places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide unknown cluster data into clusters. The k-means and fuzzy c-means (FCM) algorithms are examples of algorithms that can be used for clustering. Consequently, clustering is a common approach that divides an input space into several homogeneous zones; it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, which is used to cluster genes. FCM allows an object to belong to two or more clusters with a membership grade between zero and one and the sum of belonging to all clusters of each gene is equal to one. This paradigm is useful when dealing with microarray data. The total time required to implement the first model is 22.2589 s. The second model combines FCM and particle swarm optimization (PSO) to obtain better results. The hybrid algorithm, i.e., FCM–PSO, uses the DB index as objective function. The experimental results show that the proposed hybrid FCM–PSO method is effective. The total time of implementation of this model is 89.6087 s. The third model combines FCM with a genetic algorithm (GA) to obtain better results. This hybrid algorithm also uses the DB index as objective function. The experimental results show that the proposed hybrid FCM–GA method is effective. Its total time of implementation is 50.8021 s. In addition, this study uses cluster validity indexes to determine the best partitioning for the underlying data. Internal validity indexes include the Jaccard, Davies Bouldin, Dunn, Xie–Beni, and silhouette. Meanwhile, external validity indexes include Minkowski, adjusted Rand, and percentage of correctly categorized pairings. Experiments conducted on brain tumor gene expression data demonstrate that the techniques used in this study outperform traditional models in terms of stability and biological significance.
conventional FCM algorithm does not fully utilize the spatial information in the image. In this research, we use a FCM algorithm that incorporates spatial information into the membership function for clustering. The spatial function is the summation of the membership functions in the neighborhood of each pixel under consideration. The advantages of the method are that it is less
sensitive to noise than other techniques, and it yields regions more homogeneous than those of other methods. This technique is a powerful method for noisy image segmentation.
The density-based spatial clustering for applications with noise (DBSCAN) is one of the most popular applications of clustering in data mining, and it is used to identify useful patterns and interesting distributions in the underlying data. Aggregation methods for classifying nonlinear aggregated data. In particular, DNA methylations, gene expression. That show the differentially skewed by distance sites and grouped nonlinearly by cancer daisies and the change Situations for gene excretion on it. Under these conditions, DBSCAN is expected to have a desirable clustering feature i that can be used to show the results of the changes. This research reviews the DBSCAN and compares its performance with other algorithms, such as the tradit
... Show More
The great scientific progress has led to widespread Information as information accumulates in large databases is important in trying to revise and compile this vast amount of data and, where its purpose to extract hidden information or classified data under their relations with each other in order to take advantage of them for technical purposes.
And work with data mining (DM) is appropriate in this area because of the importance of research in the (K-Means) algorithm for clustering data in fact applied with effect can be observed in variables by changing the sample size (n) and the number of clusters (K)
... Show MoreThis research aims to solve the problem of selection using clustering algorithm, in this research optimal portfolio is formation using the single index model, and the real data are consisting from the stocks Iraqi Stock Exchange in the period 1/1/2007 to 31/12/2019. because the data series have missing values ,we used the two-stage missing value compensation method, the knowledge gap was inability the portfolio models to reduce The estimation error , inaccuracy of the cut-off rate and the Treynor ratio combine stocks into the portfolio that caused to decline in their performance, all these problems required employing clustering technic to data mining and regrouping it within clusters with similar characteristics to outperform the portfolio
... Show MoreFuzzy Based Clustering for Grayscale Image Steganalysis
Image segmentation is a basic image processing technique that is primarily used for finding segments that form the entire image. These segments can be then utilized in discriminative feature extraction, image retrieval, and pattern recognition. Clustering and region growing techniques are the commonly used image segmentation methods. K-Means is a heavily used clustering technique due to its simplicity and low computational cost. However, K-Means results depend on the initial centres’ values which are selected randomly, which leads to inconsistency in the image segmentation results. In addition, the quality of the isolated regions depends on the homogeneity of the resulted segments. In this paper, an improved K-Means
... Show MoreIn this research two algorithms are applied, the first is Fuzzy C Means (FCM) algorithm and the second is hard K means (HKM) algorithm to know which of them is better than the others these two algorithms are applied on a set of data collected from the Ministry of Planning on the water turbidity of five areas in Baghdad to know which of these areas are less turbid in clear water to see which months during the year are less turbid in clear water in the specified area.
In this research two algorithms are applied, the first is Fuzzy C Means (FCM) algorithm and the second is hard K means (HKM) algorithm to know which of them is better than the others these two algorithms are applied on a set of data collected from the Ministry of Planning on the water turbidity of five areas in Baghdad to know which of these areas are less turbid in clear water to see which months during the year are less turbid in clear water in the specified area.
Document clustering is the process of organizing a particular electronic corpus of documents into subgroups of similar text features. Formerly, a number of conventional algorithms had been applied to perform document clustering. There are current endeavors to enhance clustering performance by employing evolutionary algorithms. Thus, such endeavors became an emerging topic gaining more attention in recent years. The aim of this paper is to present an up-to-date and self-contained review fully devoted to document clustering via evolutionary algorithms. It firstly provides a comprehensive inspection to the document clustering model revealing its various components with its related concepts. Then it shows and analyzes the principle research wor
... Show More