Document clustering is the process of organizing an electronic corpus of documents into subgroups with similar text features. Conventional algorithms have long been applied to document clustering, and more recent efforts aim to improve clustering performance by employing evolutionary algorithms, a topic that has gained increasing attention in recent years. The aim of this paper is to present an up-to-date, self-contained review devoted entirely to document clustering via evolutionary algorithms. It first provides a comprehensive examination of the document clustering model, revealing its components and related concepts. It then presents and analyzes the principal research work on this topic. Finally, it compiles and classifies the various objective functions, the core of evolutionary algorithms, found in the related research papers. The paper concludes by addressing some important issues and challenges that can be the subject of future work.
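As a rough illustration of what an objective function for document clustering can look like, the sketch below scores a candidate cluster assignment by the average cosine similarity between each document and the centroid of its cluster. It is not taken from the reviewed papers: the term-frequency representation, the function names (`tf_vector`, `cosine`, `cohesion_fitness`), and the sample documents are all assumptions made for illustration. An evolutionary algorithm would search for the assignment that maximizes such a score.

```python
import math
from collections import Counter


def tf_vector(text):
    """Bag-of-words term-frequency vector (whitespace tokenization, an assumption)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cohesion_fitness(docs, assignment):
    """Average similarity of each document to its cluster centroid (higher is better)."""
    vectors = [tf_vector(d) for d in docs]
    clusters = {}
    for vec, label in zip(vectors, assignment):
        clusters.setdefault(label, []).append(vec)
    total = 0.0
    for members in clusters.values():
        # The summed vector points in the same direction as the mean,
        # and cosine similarity ignores vector length.
        centroid = Counter()
        for vec in members:
            centroid.update(vec)
        total += sum(cosine(vec, centroid) for vec in members)
    return total / len(docs)


# Hypothetical toy corpus: two documents about fruit, two about cars.
docs = [
    "apple banana fruit",
    "banana fruit salad",
    "engine car wheel",
    "car wheel road",
]
topical = cohesion_fitness(docs, [0, 0, 1, 1])  # groups by topic
mixed = cohesion_fitness(docs, [0, 1, 0, 1])    # mixes the topics
```

In this toy case the topical assignment scores higher than the mixed one, which is the property a search procedure would exploit.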
In this paper, we used four classification methods to classify objects and compared them: K-Nearest Neighbors (KNN), Stochastic Gradient Descent learning (SGD), Logistic Regression (LR), and Multi-Layer Perceptron (MLP). We used the MCOCO dataset for object classification and detection; the dataset images were randomly divided into training and testing sets at a ratio of 7:3, respectively. The randomly selected training and testing images were converted from color to gray level, enhanced using the histogram equalization method, and resized to 20 x 20. Principal component analysis (PCA) was used for feature extraction, and finally apply four classification metho
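Two of the preprocessing steps mentioned above, the random 7:3 split and histogram equalization of a gray-level image, can be sketched in plain Python as follows. This is only an illustrative sketch, not the authors' code: the function names, the list-of-lists image representation, the fixed seed, and the use of the standard CDF-based equalization formula are assumptions.

```python
import random


def split_7_3(items, seed=0):
    """Randomly split items into training and testing lists at a 7:3 ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def equalize_histogram(image, levels=256):
    """Histogram-equalize a gray image given as a list of rows of ints in [0, levels)."""
    pixels = [p for row in image for p in row]
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    total = len(pixels)
    if total == cdf_min:
        # Constant image: nothing to stretch.
        return [row[:] for row in image]
    # Map each gray level so that the output CDF is approximately linear.
    lut = [
        max(0, round((c - cdf_min) / (total - cdf_min) * (levels - 1)))
        for c in cdf
    ]
    return [[lut[p] for p in row] for row in image]


train, test = split_7_3(range(10))
equalized = equalize_histogram([[50, 50], [100, 150]])
```

After equalization the darkest input level maps to 0 and the brightest to 255, so a low-contrast image is stretched over the full gray range.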
Text categorization refers to the process of grouping text or documents into classes or categories according to their content. The text categorization process consists of three phases: preprocessing, feature extraction, and classification. Compared with English, only a few studies have addressed the categorization and classification of Arabic text. For a variety of applications, such as text classification and clustering, Arabic text representation is a difficult task because the Arabic language is noted for its richness, diversity, and complicated morphology. This paper presents a comprehensive analysis and comparison of research from the last five years based on the dataset, year, algorithms and the accu
Clustering is an unsupervised learning method that classifies data according to similarity probabilities. DBScan, a high-quality algorithm, was introduced for clustering spatial data owing to its ability to remove noise (outliers) and construct arbitrarily shaped clusters. However, determining a suitable value for its Eps parameter is a problem. This paper proposes a new clustering method, termed DBScanBAT, that optimizes the DBScan algorithm using the BAT algorithm. The proposed method automatically sets the DBScan parameter (Eps) and finds its optimal value. The proposed DBScanBAT automatically generates a number of clusters closer to the original number than DBScanPSO and the original DBScan. Furthermore, the proposed method
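To make the sensitivity to Eps concrete, here is a minimal sketch of plain DBSCAN in pure Python (it is not DBScanBAT, and no BAT or PSO optimization is included). The function names, the sample points, and the `min_pts` value are assumptions chosen for illustration; the neighbor search is brute force, so this is only suitable for tiny inputs.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE (-1) for outliers."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Brute-force Eps-neighborhood (includes the point itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point: keep expanding
                queue.extend(reachable)
    return labels


def count_clusters(labels):
    """Number of clusters found, ignoring noise."""
    return len(set(labels) - {NOISE})


# Hypothetical data: two tight groups and one isolated outlier.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
too_small = count_clusters(dbscan(points, eps=0.5, min_pts=3))    # 0 clusters
reasonable = count_clusters(dbscan(points, eps=1.5, min_pts=3))   # 2 clusters
too_large = count_clusters(dbscan(points, eps=100.0, min_pts=3))  # 1 cluster
```

The same data yields zero, two, or one cluster depending only on Eps, which is why an automatic search over this parameter is attractive.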
The detection of diseases affecting plants is very important because it bears on food security, an issue that poses a very serious threat to human life. A disease diagnosis system involves a series of steps: image acquisition, pre-processing, segmentation, feature extraction (the subject of this work), and finally classification. Feature extraction is a very important process in any diagnostic system; it can be compared to the spine of such a system. This stage is so important because feature extraction greatly affects the operation and accuracy of classification. Proper selection of
Establishing complete and reliable coverage over a long time span is a crucial issue in dense surveillance wireless sensor networks (WSNs). Many scheduling algorithms have been proposed that model the problem as a maximum disjoint set covers (DSC) problem. The goal of DSC-based algorithms is to schedule sensors into several disjoint subsets. One subset is assigned to be active, whereas all remaining subsets are set to sleep. An extension of the maximum disjoint set covers problem has also been addressed in the literature to allow more advanced sensors to adjust their sensing range. The problem is then extended to finding the maximum number of overlapped set covers. Unlike all related works, which are concerned with the disc sensing model, the cont
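The idea of scheduling sensors into disjoint subsets, each of which covers every target on its own, can be sketched with a simple greedy heuristic. This is an assumption-laden illustration, not the method of the paper above: the function name, the sensor identifiers, and the coverage map are hypothetical, and a greedy pass is not guaranteed to find the maximum number of covers.

```python
def greedy_disjoint_covers(coverage, targets):
    """Greedily partition sensors into disjoint subsets that each cover all targets.

    coverage maps a sensor id to the set of targets it can sense.
    Returns a list of covers; each cover is a list of sensor ids.
    """
    targets = set(targets)
    if not targets:
        return []
    unused = dict(coverage)
    covers = []
    while True:
        uncovered = set(targets)
        pool = dict(unused)
        chosen = []
        while uncovered:
            # Pick the unused sensor that senses the most still-uncovered targets.
            best = max(pool, key=lambda s: len(pool[s] & uncovered), default=None)
            if best is None or not (pool[best] & uncovered):
                break  # no remaining sensor helps
            chosen.append(best)
            uncovered -= pool.pop(best)
        if uncovered:
            return covers  # remaining sensors cannot form another full cover
        for sensor in chosen:
            del unused[sensor]
        covers.append(chosen)


# Hypothetical network: five sensors monitoring three targets.
coverage = {
    "s1": {1, 2},
    "s2": {3},
    "s3": {1, 2, 3},
    "s4": {1},
    "s5": {2, 3},
}
covers = greedy_disjoint_covers(coverage, {1, 2, 3})
```

Each returned cover can be activated in turn while the sensors in the other covers sleep, so the number of covers found bounds how many activation rounds the schedule supports.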