Document clustering is the process of organizing a particular electronic corpus of documents into subgroups of similar text features. Formerly, a number of conventional algorithms had been applied to perform document clustering. There are current endeavors to enhance clustering performance by employing evolutionary algorithms. Thus, such endeavors became an emerging topic gaining more attention in recent years. The aim of this paper is to present an up-to-date and self-contained review fully devoted to document clustering via evolutionary algorithms. It firstly provides a comprehensive inspection to the document clustering model revealing its various components with its related concepts. Then it shows and analyzes the principle research work in this topic. Finally, it compiles and classifies various objective functions, the core of the evolutionary algorithms, from the related collection of research papers. The paper ends up by addressing some important issues and challenges that can be subject of future work.
Advances in digital technology and the World Wide Web has led to the increase of digital documents that are used for various purposes such as publishing and digital library. This phenomenon raises awareness for the requirement of effective techniques that can help during the search and retrieval of text. One of the most needed tasks is clustering, which categorizes documents automatically into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends tightly on the selection of the text representation method. Traditional methods of text representation model documents as bags of words using term-frequency index document frequency (TFIDF). This method ignores the relationship an
... Show MoreSoil compaction is one of the most harmful elements affecting soil structure, limiting plant growth and agricultural productivity. It is crucial to assess the degree of soil penetration resistance to discover solutions to the harmful consequences of compaction. In order to obtain the appropriate value, using soil cone penetration requires time and labor-intensive measurements. Currently, satellite technologies, electronic measurement control systems, and computer software help to measure soil penetration resistance quickly and easily within the precision agriculture applications approach. The quantitative relationships between soil properties and the factors affecting their diversity contribute to digital soil mapping. Digital soil maps use
... Show MoreThe influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences. This condition places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide unknown cluster data into clusters. The k-means and fuzzy c-means (FCM) algorithms are examples of algorithms that can be used for clustering. Consequently, clustering is a common approach that divides an input space into several homogeneous zones; it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, whic
... Show MoreBy definition, the detection of protein complexes that form protein-protein interaction networks (PPINs) is an NP-hard problem. Evolutionary algorithms (EAs), as global search methods, are proven in the literature to be more successful than greedy methods in detecting protein complexes. However, the design of most of these EA-based approaches relies on the topological information of the proteins in the PPIN. Biological information, as a key resource for molecular profiles, on the other hand, acquired a little interest in the design of the components in these EA-based methods. The main aim of this paper is to redesign two operators in the EA based on the functional domain rather than the graph topological domain. The perturb
... Show MoreIn recent years, Wireless Sensor Networks (WSNs) are attracting more attention in many fields as they are extensively used in a wide range of applications, such as environment monitoring, the Internet of Things, industrial operation control, electric distribution, and the oil industry. One of the major concerns in these networks is the limited energy sources. Clustering and routing algorithms represent one of the critical issues that directly contribute to power consumption in WSNs. Therefore, optimization techniques and routing protocols for such networks have to be studied and developed. This paper focuses on the most recent studies and algorithms that handle energy-efficiency clustering and routing in WSNs. In addition, the prime
... Show More