Text Clustering consists of grouping objects of similar categories. The initial centroids influence operation of the system with the potential to become trapped in local optima. The second issue pertains to the impact of a huge number of features on the determination of optimal initial centroids. The problem of dimensionality may be reduced by feature selection. Therefore, Wind Driven Optimization (WDO) was employed as Feature Selection to reduce the unimportant words from the text. In addition, the current study has integrated a novel clustering optimization technique called the WDO (Wasp Swarm Optimization) to effectively determine the most suitable initial centroids. The result showed the new meta-heuristic which is WDO was employed as the multi-objective first time as unsupervised Feature Selection (WDOFS) and the second time as a Clustering algorithm (WDOC). For example, the WDOC outperformed Harmony Search and Particle Swarm in terms of F-measurement by 93.3%; in contrast, text clustering's performance improves 0.9% because of using suggested clustering on the proposed feature selection. With WDOFS more than 50 percent of features have been removed from the other examination of features. The best result got the multi-objectives with F-measurement 98.3%.
Multi-document summarization is an optimization problem demanding optimization of more than one objective function simultaneously. The proposed work regards balancing of the two significant objectives: content coverage and diversity when generating summaries from a collection of text documents.
Any automatic text summarization system has the challenge of producing high quality summary. Despite the existing efforts on designing and evaluating the performance of many text summarization techniques, their formulations lack the introduction of any model that can give an explicit representation of – coverage and diversity – the two contradictory semantics of any summary. In this work, the design of
... Show MoreNowad ays, with the development of internet communication that provides many facilities to the user leads in turn to growing unauthorized access. As a result, intrusion detection system (IDS) becomes necessary to provide a high level of security for huge amount of information transferred in the network to protect them from threats. One of the main challenges for IDS is the high dimensionality of the feature space and how the relevant features to distinguish the normal network traffic from attack network are selected. In this paper, multi-objective evolutionary algorithm with decomposition (MOEA/D) and MOEA/D with the injection of a proposed local search operator are adopted to solve the Multi-objective optimization (MOO) followed by Naï
... Show MoreA substantial portion of today’s multimedia data exists in the form of unstructured text. However, the unstructured nature of text poses a significant task in meeting users’ information requirements. Text classification (TC) has been extensively employed in text mining to facilitate multimedia data processing. However, accurately categorizing texts becomes challenging due to the increasing presence of non-informative features within the corpus. Several reviews on TC, encompassing various feature selection (FS) approaches to eliminate non-informative features, have been previously published. However, these reviews do not adequately cover the recently explored approaches to TC problem-solving utilizing FS, such as optimization techniques.
... Show MoreAdvances in digital technology and the World Wide Web has led to the increase of digital documents that are used for various purposes such as publishing and digital library. This phenomenon raises awareness for the requirement of effective techniques that can help during the search and retrieval of text. One of the most needed tasks is clustering, which categorizes documents automatically into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends tightly on the selection of the text representation method. Traditional methods of text representation model documents as bags of words using term-frequency index document frequency (TFIDF). This method ignores the relationship an
... Show MoreAutomatic document summarization technology is evolving and may offer a solution to the problem of information overload. Multi-document summarization is an optimization problem demanding optimizing more than one objective function concurrently. The proposed work considers a balance of two significant objectives: content coverage and diversity while generating a summary from a collection of text documents. Despite the large efforts introduced from several researchers for designing and evaluating performance of many text summarization techniques, their formulations lack the introduction of any model that can give an explicit representation of – coverage and diversity – the two contradictory semantics of any summary. The design of gener
... Show MoreThe conventional procedures of clustering algorithms are incapable of overcoming the difficulty of managing and analyzing the rapid growth of generated data from different sources. Using the concept of parallel clustering is one of the robust solutions to this problem. Apache Hadoop architecture is one of the assortment ecosystems that provide the capability to store and process the data in a distributed and parallel fashion. In this paper, a parallel model is designed to process the k-means clustering algorithm in the Apache Hadoop ecosystem by connecting three nodes, one is for server (name) nodes and the other two are for clients (data) nodes. The aim is to speed up the time of managing the massive sc
... Show MoreFeature selection algorithms play a big role in machine learning applications. There are several feature selection strategies based on metaheuristic algorithms. In this paper a feature selection strategy based on Modified Artificial Immune System (MAIS) has been proposed. The proposed algorithm exploits the advantages of Artificial Immune System AIS to increase the performance and randomization of features. The experimental results based on NSL-KDD dataset, have showed increasing in performance of accuracy compared with other feature selection algorithms (best first search, correlation and information gain).
Financial fraud remains an ever-increasing problem in the financial industry with numerous consequences. The detection of fraudulent online transactions via credit cards has always been done using data mining (DM) techniques. However, fraud detection on credit card transactions (CCTs), which on its own, is a DM problem, has become a serious challenge because of two major reasons, (i) the frequent changes in the pattern of normal and fraudulent online activities, and (ii) the skewed nature of credit card fraud datasets. The detection of fraudulent CCTs mainly depends on the data sampling approach. This paper proposes a combined SVM- MPSO-MMPSO technique for credit card fraud detection. The dataset of CCTs which co
... Show MoreFuzzy C-means (FCM) is a clustering method used for collecting similar data elements within the group according to specific measurements. Tabu is a heuristic algorithm. In this paper, Probabilistic Tabu Search for FCM implemented to find a global clustering based on the minimum value of the Fuzzy objective function. The experiments designed for different networks, and cluster’s number the results show the best performance based on the comparison that is done between the values of the objective function in the case of using standard FCM and Tabu-FCM, for the average of ten runs.