The conventional procedures of clustering algorithms are incapable of overcoming the difficulty of managing and analyzing the rapid growth of generated data from different sources. Using the concept of parallel clustering is one of the robust solutions to this problem. Apache Hadoop architecture is one of the assortment ecosystems that provide the capability to store and process the data in a distributed and parallel fashion. In this paper, a parallel model is designed to process the k-means clustering algorithm in the Apache Hadoop ecosystem by connecting three nodes, one is for server (name) nodes and the other two are for clients (data) nodes. The aim is to speed up the time of managing the massive scale of healthcare insurance dataset with the size of 11 GB and also using machine learning algorithms, which are provided by the Mahout Framework. The experimental results depict that the proposed model can efficiently process large datasets. The parallel k-means algorithm outperforms the sequential k-means algorithm based on the execution time of the algorithm, where the required time to execute a data size of 11 GB is around 1.847 hours using the parallel k-means algorithm, while it equals 68.567 hours using the sequential k-means algorithm. As a result, we deduce that when the nodes number in the parallel system increases, the computation time of the proposed algorithm decreases.
Several methods have been developed for routing problem in MANETs wireless network, because it considered very important problem in this network ,we suggested proposed method based on modified radial basis function networks RBFN and Kmean++ algorithm. The modification in RBFN for routing operation in order to find the optimal path between source and destination in MANETs clusters. Modified Radial Based Neural Network is very simple, adaptable and efficient method to increase the life time of nodes, packet delivery ratio and the throughput of the network will increase and connection become more useful because the optimal path has the best parameters from other paths including the best bitrate and best life link with minimum delays. The re
... Show MoreThis paper proposed a new method for network self-fault management (NSFM) based on two technologies: intelligent agent to automate fault management tasks, and Windows Management Instrumentations (WMI) to identify the fault faster when resources are independent (different type of devices). The proposed network self-fault management reduced the load of network traffic by reducing the request and response between the server and client, which achieves less downtime for each node in state of fault occurring in the client. The performance of the proposed system is measured by three measures: efficiency, availability, and reliability. A high efficiency average is obtained depending on the faults occurred in the system which reaches to
... Show MoreIn this paper, wireless network is planned; the network is predicated on the IEEE 802.16e standardization by WIMAX. The targets of this paper are coverage maximizing, service and low operational fees. WIMAX is planning through three approaches. In approach one; the WIMAX network coverage is major for extension of cell coverage, the best sites (with Band Width (BW) of 5MHz, 20MHZ per sector and four sectors per each cell). In approach two, Interference analysis in CNIR mode. In approach three of the planning, Quality of Services (QoS) is tested and evaluated. ATDI ICS software (Interference Cancellation System) using to perform styling. it shows results in planning area covered 90.49% of the Baghdad City and used 1000 mob
... Show MoreAutomatic document summarization technology is evolving and may offer a solution to the problem of information overload. Multi-document summarization is an optimization problem demanding optimizing more than one objective function concurrently. The proposed work considers a balance of two significant objectives: content coverage and diversity while generating a summary from a collection of text documents. Despite the large efforts introduced from several researchers for designing and evaluating performance of many text summarization techniques, their formulations lack the introduction of any model that can give an explicit representation of – coverage and diversity – the two contradictory semantics of any summary. The design of gener
... Show MoreThe influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences. This condition places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide unknown cluster data into clusters. The k-means and fuzzy c-means (FCM) algorithms are examples of algorithms that can be used for clustering. Consequently, clustering is a common approach that divides an input space into several homogeneous zones; it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, whic
... Show MoreA new modified differential evolution algorithm DE-BEA, is proposed to improve the reliability of the standard DE/current-to-rand/1/bin by implementing a new mutation scheme inspired by the bacterial evolutionary algorithm (BEA). The crossover and the selection schemes of the DE method are also modified to fit the new DE-BEA mechanism. The new scheme diversifies the population by applying to all the individuals a segment based scheme that generates multiple copies (clones) from each individual one-by-one and applies the BEA segment-wise mechanism. These new steps are embedded in the DE/current-to-rand/bin scheme. The performance of the new algorithm has been compared with several DE variants over eighteen benchmark functions including sever
... Show MoreSmishing is the delivery of phishing content to mobile users via a short message service (SMS). SMS allows cybercriminals to reach out to mobile end users in a new way, attempting to deliver phishing messages, mobile malware, and online scams that appear to be from a trusted brand. This paper proposes a new method for detecting smishing by combining two detection methods. The first method is uniform resource locators (URL) analysis, which employs a novel combination of the Google engine and VirusTotal. The second method involves examining SMS content to extract efficient features and classify messages as ham or smishing based on keywords contained within them using four well-known classifiers: support vector machine (SVM), random
... Show MoreThis study focused on spectral clustering (SC) and three-constraint affinity matrix spectral clustering (3CAM-SC) to determine the number of clusters and the membership of the clusters of the COST 2100 channel model (C2CM) multipath dataset simultaneously. Various multipath clustering approaches solve only the number of clusters without taking into consideration the membership of clusters. The problem of giving only the number of clusters is that there is no assurance that the membership of the multipath clusters is accurate even though the number of clusters is correct. SC and 3CAM-SC aimed to solve this problem by determining the membership of the clusters. The cluster and the cluster count were then computed through the cluster-wise J
... Show More