Document clustering is the process of organizing a particular electronic corpus of documents into subgroups of similar text features. Formerly, a number of conventional algorithms had been applied to perform document clustering. There are current endeavors to enhance clustering performance by employing evolutionary algorithms. Thus, such endeavors became an emerging topic gaining more attention in recent years. The aim of this paper is to present an up-to-date and self-contained review fully devoted to document clustering via evolutionary algorithms. It firstly provides a comprehensive inspection to the document clustering model revealing its various components with its related concepts. Then it shows and analyzes the principle research work in this topic. Finally, it compiles and classifies various objective functions, the core of the evolutionary algorithms, from the related collection of research papers. The paper ends up by addressing some important issues and challenges that can be subject of future work.
Text Clustering consists of grouping objects of similar categories. The initial centroids influence operation of the system with the potential to become trapped in local optima. The second issue pertains to the impact of a huge number of features on the determination of optimal initial centroids. The problem of dimensionality may be reduced by feature selection. Therefore, Wind Driven Optimization (WDO) was employed as Feature Selection to reduce the unimportant words from the text. In addition, the current study has integrated a novel clustering optimization technique called the WDO (Wasp Swarm Optimization) to effectively determine the most suitable initial centroids. The result showed the new meta-heuristic which is WDO was employed as t
... Show MoreThe conventional procedures of clustering algorithms are incapable of overcoming the difficulty of managing and analyzing the rapid growth of generated data from different sources. Using the concept of parallel clustering is one of the robust solutions to this problem. Apache Hadoop architecture is one of the assortment ecosystems that provide the capability to store and process the data in a distributed and parallel fashion. In this paper, a parallel model is designed to process the k-means clustering algorithm in the Apache Hadoop ecosystem by connecting three nodes, one is for server (name) nodes and the other two are for clients (data) nodes. The aim is to speed up the time of managing the massive sc
... Show MorePattern matching algorithms are usually used as detecting process in intrusion detection system. The efficiency of these algorithms is affected by the performance of the intrusion detection system which reflects the requirement of a new investigation in this field. Four matching algorithms and a combined of two algorithms, for intrusion detection system based on new DNA encoding, are applied for evaluation of their achievements. These algorithms are Brute-force algorithm, Boyer-Moore algorithm, Horspool algorithm, Knuth-Morris-Pratt algorithm, and the combined of Boyer-Moore algorithm and Knuth–Morris– Pratt algorithm. The performance of the proposed approach is calculated based on the executed time, where these algorithms are applied o
... Show MoreScience, technology and many other fields are use clustering algorithm widely for many applications, this paper presents a new hybrid algorithm called KDBSCAN that work on improving k-mean algorithm and solve two of its
problems, the first problem is number of cluster, when it`s must be entered by user, this problem solved by using DBSCAN algorithm for estimating number of cluster, and the second problem is randomly initial centroid problem that has been dealt with by choosing the centroid in steady method and removing randomly choosing for a better results, this work used DUC 2002 dataset to obtain the results of KDBSCAN algorithm, it`s work in many application fields such as electronics libraries,
This study focused on spectral clustering (SC) and three-constraint affinity matrix spectral clustering (3CAM-SC) to determine the number of clusters and the membership of the clusters of the COST 2100 channel model (C2CM) multipath dataset simultaneously. Various multipath clustering approaches solve only the number of clusters without taking into consideration the membership of clusters. The problem of giving only the number of clusters is that there is no assurance that the membership of the multipath clusters is accurate even though the number of clusters is correct. SC and 3CAM-SC aimed to solve this problem by determining the membership of the clusters. The cluster and the cluster count were then computed through the cluster-wise J
... Show MoreIn real world, almost all networks evolve over time. For example, in networks of friendships and acquaintances, people continually create and delete friendship relationship connections over time, thereby add and draw friends, and some people become part of new social networks or leave their networks, changing the nodes in the network. Recently, tracking communities encountering topological shifting drawn significant attentions and many successive algorithms have been proposed to model the problem. In general, evolutionary clustering can be defined as clustering data over time wherein two concepts: snapshot quality and temporal smoothness should be considered. Snapshot quality means that the clusters should be as precise as possible durin
... Show MoreIntrusion detection system is an imperative role in increasing security and decreasing the harm of the computer security system and information system when using of network. It observes different events in a network or system to decide occurring an intrusion or not and it is used to make strategic decision, security purposes and analyzing directions. This paper describes host based intrusion detection system architecture for DDoS attack, which intelligently detects the intrusion periodically and dynamically by evaluating the intruder group respective to the present node with its neighbors. We analyze a dependable dataset named CICIDS 2017 that contains benign and DDoS attack network flows, which meets certifiable criteria and is ope
... Show MoreFinding communities of connected individuals in complex networks is challenging, yet crucial for understanding different real-world societies and their interactions. Recently attention has turned to discover the dynamics of such communities. However, detecting accurate community structures that evolve over time adds additional challenges. Almost all the state-of-the-art algorithms are designed based on seemingly the same principle while treating the problem as a coupled optimization model to simultaneously identify community structures and their evolution over time. Unlike all these studies, the current work aims to individually consider this three measures, i.e. intra-community score, inter-community score, and evolution of community over
... Show More