The conventional procedures of clustering algorithms are incapable of overcoming the difficulty of managing and analyzing the rapid growth of generated data from different sources. Using the concept of parallel clustering is one of the robust solutions to this problem. Apache Hadoop architecture is one of the assortment ecosystems that provide the capability to store and process the data in a distributed and parallel fashion. In this paper, a parallel model is designed to process the k-means clustering algorithm in the Apache Hadoop ecosystem by connecting three nodes, one is for server (name) nodes and the other two are for clients (data) nodes. The aim is to speed up the time of managing the massive scale of healthcare insurance dataset with the size of 11 GB and also using machine learning algorithms, which are provided by the Mahout Framework. The experimental results depict that the proposed model can efficiently process large datasets. The parallel k-means algorithm outperforms the sequential k-means algorithm based on the execution time of the algorithm, where the required time to execute a data size of 11 GB is around 1.847 hours using the parallel k-means algorithm, while it equals 68.567 hours using the sequential k-means algorithm. As a result, we deduce that when the nodes number in the parallel system increases, the computation time of the proposed algorithm decreases.
Wireless sensor applications are susceptible to energy constraints. Most of the energy is consumed in communication between wireless nodes. Clustering and data aggregation are the two widely used strategies for reducing energy usage and increasing the lifetime of wireless sensor networks. In target tracking applications, large amount of redundant data is produced regularly. Hence, deployment of effective data aggregation schemes is vital to eliminate data redundancy. This work aims to conduct a comparative study of various research approaches that employ clustering techniques for efficiently aggregating data in target tracking applications as selection of an appropriate clustering algorithm may reflect positive results in the data aggregati
... Show MoreEverybody is connected with social media like (Facebook, Twitter, LinkedIn, Instagram…etc.) that generate a large quantity of data and which traditional applications are inadequate to process. Social media are regarded as an important platform for sharing information, opinion, and knowledge of many subscribers. These basic media attribute Big data also to many issues, such as data collection, storage, moving, updating, reviewing, posting, scanning, visualization, Data protection, etc. To deal with all these problems, this is a need for an adequate system that not just prepares the details, but also provides meaningful analysis to take advantage of the difficult situations, relevant to business, proper decision, Health, social media, sc
... Show MoreText Clustering consists of grouping objects of similar categories. The initial centroids influence operation of the system with the potential to become trapped in local optima. The second issue pertains to the impact of a huge number of features on the determination of optimal initial centroids. The problem of dimensionality may be reduced by feature selection. Therefore, Wind Driven Optimization (WDO) was employed as Feature Selection to reduce the unimportant words from the text. In addition, the current study has integrated a novel clustering optimization technique called the WDO (Wasp Swarm Optimization) to effectively determine the most suitable initial centroids. The result showed the new meta-heuristic which is WDO was employed as t
... Show MoreBackground: image processing of medical images is major method to increase reliability of cancer diagnosis.
Methods: The proposed system proceeded into two stages: First, enhancement stage which was performed using of median filter to reduce the noise and artifacts that present in a CT image of a human lung with a cancer, Second: implementation of k-means clustering algorithm.
Results: the result image of k-means algorithm compared with the image resulted from implementation of fuzzy c-means (FCM) algorithm.
Conclusion: We found that the time required for k-means algorithm implementation is less than that of FCM algorithm.MATLAB package (version 7.3) was used in writing the programming code of our w
major goal of the next-generation wireless communication systems is the development of a reliable high-speed wireless communication system that supports high user mobility. They must focus on increasing the link throughput and the network capacity. In this paper a novel, spectral efficient system is proposed for generating and transmitting twodimensional (2-D) orthogonal frequency division multiplexing (OFDM) symbols through 2- D inter-symbol interference (ISI) channel. Instead of conventional data mapping techniques, discrete finite Radon transform (FRAT) is used as a data mapping technique due to the increased orthogonality offered. As a result, the proposed structure gives a significant improvement in bit error rate (BER) performance. Th
... Show More
The reliability of the stress-strength model attracted many statisticians for several years owing to its applicability in different and diverse parts such as engineering, quality control, and economics. In this paper, the system reliability estimation in the stress-strength model containing Kth parallel components will be offered by four types of shrinkage methods: constant Shrinkage Estimation Method, Shrinkage Function Estimator, Modified Thompson Type Shrinkage Estimator, Squared Shrinkage Estimator. The Monte Carlo simulation study is compared among proposed estimators using the mean squared error. The result analyses of the shrinkage estimation methods showed that the shrinkage functions estimator was the best since
... Show MoreArabic text categorization for pattern recognitions is challenging. We propose for the first time a novel holistic method based on clustering for classifying Arabic writer. The categorization is accomplished stage-wise. Firstly, these document images are sectioned into lines, words, and characters. Secondly, their structural and statistical features are obtained from sectioned portions. Thirdly, F-Measure is used to evaluate the performance of the extracted features and their combination in different linkage methods for each distance measures and different numbers of groups. Finally, experiments are conducted on the standard KHATT dataset of Arabic handwritten text comprised of varying samples from 1000 writers. The results in the generatio
... Show MoreFuture wireless systems aim to provide higher transmission data rates, improved spectral efficiency and greater capacity. In this paper a spectral efficient two dimensional (2-D) parallel code division multiple access (CDMA) system is proposed for generating and transmitting (2-D CDMA) symbols through 2-D Inter-Symbol Interference (ISI) channel to increase the transmission speed. The 3D-Hadamard matrix is used to generate the 2-D spreading codes required to spread the two-dimensional data for each user row wise and column wise. The quadrature amplitude modulation (QAM) is used as a data mapping technique due to the increased spectral efficiency offered. The new structure simulated using MATLAB and a comparison of performance for ser
... Show MoreClustering algorithms have recently gained attention in the related literature since
they can help current intrusion detection systems in several aspects. This paper
proposes genetic algorithm (GA) based clustering, serving to distinguish patterns
incoming from network traffic packets into normal and attack. Two GA based
clustering models for solving intrusion detection problem are introduced. The first
model coined as handles numeric features of the network packet, whereas
the second one coined as concerns all features of the network packet.
Moreover, a new mutation operator directed for binary and symbolic features is
proposed. The basic concept of proposed mutation operator depends on the most
frequent value