Article - ijs-2747 - Digital Repository

Details

Publication Date

Sat Jul 31 2021

Journal Name

Iraqi Journal Of Science

DOI

10.24996/ijs.2021.62.7.32

Keywords

Big Data

Hadoop

Mahout

Predictive Analytics

Parallel K-means

Authors (2)

Noor S.

Suhad A.

A Parallel Clustering Analysis Based on Hadoop Multi-Node and Apache Mahout

Conventional clustering algorithms cannot cope with managing and analyzing the rapidly growing volume of data generated from different sources. Parallel clustering is one robust solution to this problem. Apache Hadoop is one of several ecosystems that can store and process data in a distributed, parallel fashion. In this paper, a parallel model is designed to run the k-means clustering algorithm in the Apache Hadoop ecosystem on three connected nodes: one server (name) node and two client (data) nodes. The aim is to reduce the time needed to manage a massive 11 GB healthcare insurance dataset, using machine learning algorithms provided by the Mahout framework. The experimental results show that the proposed model processes large datasets efficiently. The parallel k-means algorithm outperforms the sequential k-means algorithm in execution time: processing 11 GB of data takes about 1.847 hours with the parallel algorithm, compared with 68.567 hours with the sequential one. These results indicate that the computation time of the proposed algorithm decreases as the number of nodes in the parallel system increases.
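As a rough illustration of why k-means parallelizes well on Hadoop, the sketch below expresses one k-means iteration as a map step and a reduce step. In the map step, each data node assigns its own split of the points to the nearest centroid and emits per-cluster partial sums and counts. In the reduce step, those partial sums are combined into new centroids. It is plain Python with NumPy, simulates the nodes with a list of array splits, and does not use Mahout's API; the function names are hypothetical. For scale, the reported times correspond to a speedup of roughly 68.567 / 1.847 ≈ 37.1 over the sequential run.

```python
import numpy as np


def map_phase(split, centroids):
    """Mapper (hypothetical): emit {cluster_id: (partial_sum, count)} for one split."""
    # Squared Euclidean distance from every point in the split to every centroid.
    d = ((split[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    out = {}
    for k in np.unique(labels):
        members = split[labels == k]
        out[int(k)] = (members.sum(axis=0), len(members))
    return out


def reduce_phase(mapper_outputs, centroids):
    """Reducer (hypothetical): combine partial sums per cluster into new centroids."""
    new = centroids.copy()  # clusters that received no points keep their old centroid
    totals = {}
    for out in mapper_outputs:
        for k, (s, n) in out.items():
            acc_s, acc_n = totals.get(k, (0.0, 0))
            totals[k] = (acc_s + s, acc_n + n)
    for k, (s, n) in totals.items():
        new[k] = s / n
    return new


def parallel_kmeans(splits, centroids, max_iter=20, tol=1e-6):
    """Iterate map/reduce until the centroids stop moving (or max_iter is reached)."""
    centroids = np.asarray(centroids, dtype=float)
    for _ in range(max_iter):
        # On a cluster, each map_phase call would run on a different data node.
        outputs = [map_phase(s, centroids) for s in splits]
        new = reduce_phase(outputs, centroids)
        if np.allclose(new, centroids, atol=tol):
            return new
        centroids = new
    return centroids
```

Because each mapper emits only one (sum, count) pair per cluster, the data sent to the reducer stays small regardless of split size, which is what allows additional nodes to cut the wall-clock time.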

Publication Date

Fri Jan 01 2021

Journal Name

International Journal Agricultural And Statistical Sciences

A COMPARISON BETWEEN SOME HIERARCHICAL CLUSTERING TECHNIQUES

In this paper, some commonly used hierarchical clustering techniques are compared. The agglomerative hierarchical clustering technique is compared with k-means-based techniques: standard k-means, a variant of k-means, and bisecting k-means. Although hierarchical clustering is considered one of the best clustering methods, its use is limited by its time complexity. The results, based on an analysis of the characteristics of the clustering algorithms and the nature of the data, show that bisecting k-means performs best among the methods compared.
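Bisecting k-means, which the study found to perform best, builds clusters top-down: it repeatedly picks one cluster and splits it in two with ordinary 2-means, keeping the best of several trial splits. The sketch below is one common formulation, assuming the cluster with the largest sum of squared errors (SSE) is split next; the paper's exact selection rule is not stated here, and the function names are hypothetical.

```python
import numpy as np


def two_means(points, rng, n_iter=20):
    """Plain 2-means (Lloyd's algorithm) used for each bisection step."""
    idx = rng.choice(len(points), size=2, replace=False)
    centers = points[idx].astype(float)
    for _ in range(n_iter):
        d = ((points[:, None, :] - centers[None]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(2):
            if np.any(labels == k):
                centers[k] = points[labels == k].mean(axis=0)
    return labels


def sse(points):
    """Sum of squared distances of the points to their own mean."""
    return float(((points - points.mean(axis=0)) ** 2).sum())


def bisecting_kmeans(points, n_clusters, n_trials=5, seed=0):
    rng = np.random.default_rng(seed)
    clusters = [np.arange(len(points))]  # each cluster is stored as row indices
    while len(clusters) < n_clusters:
        # Assumption: split the cluster with the largest SSE next.
        i = max(range(len(clusters)), key=lambda j: sse(points[clusters[j]]))
        target = clusters.pop(i)
        best = None
        for _ in range(n_trials):
            labels = two_means(points[target], rng)
            parts = [target[labels == 0], target[labels == 1]]
            if min(len(p) for p in parts) == 0:
                continue  # degenerate split, try again
            total = sum(sse(points[p]) for p in parts)
            if best is None or total < best[0]:
                best = (total, parts)
        if best is None:
            raise ValueError("could not split cluster")
        clusters.extend(best[1])
    labels = np.empty(len(points), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels
```

Each bisection only runs 2-means on one cluster's points, which is part of why this approach avoids the time complexity that limits agglomerative hierarchical clustering.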

Authors (2)

Asmaa Najm

Suhad Ahmed

Publication Date

Sun Feb 25 2024

Journal Name

Baghdad Science Journal

Research on Emotion Classification Based on Multi-modal Fusion

Nowadays, people's expression on the Internet is no longer limited to text. Especially with the rise of the short-video boom, large amounts of data in different modalities, such as text, pictures, audio, and video, have emerged. Compared with single-modal data, multi-modal data always contain far more information. Mining multi-modal information can help computers better understand human emotional characteristics. However, because multi-modal data show obvious dynamic time-series features, the fusion process must solve the dynamic correlation problem both within a single modality and between different modalities in the same application scene. To solve this problem, in this paper, a feature extraction framework of

Authors (3)

zhihua

Nor Haizan Mohamed

Haslina

Publication Date

Fri Aug 28 2020

Journal Name

Iraqi Journal Of Science

De-Noising of Corrupted Fluoroscopy Images Based on a New Multi-Line Algorithm

Fluoroscopic images are a class of medical images whose usefulness for correct diagnosis depends on image quality. The main difficulty is de-noising: keeping the balance between removing the degradation of a noisy image, on one side, and preserving edges and fine details, on the other, especially when fluoroscopic images contain high-density black-and-white type noise. Previous filters can usually handle low and medium densities of black-and-white type noise, but at the expense of edge and fine-detail preservation, and they fail when a high density of noise corrupts the images. Therefore, this paper proposes a new Multi-Line algorithm that deals with images heavily corrupted by a high density of black-and-white type noise. The experiments achieved i
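The Multi-Line algorithm itself is not described in this abstract, so the sketch below shows only a common baseline approach for black-and-white (impulse) noise, not the paper's method. Pixels at the extreme values 0 and 255 are treated as noise, and each one is replaced by the median of the noise-free pixels in a window that grows until at least one clean neighbour is found. It assumes 8-bit greyscale images; the function name and the window limit are illustrative.

```python
import numpy as np


def denoise_salt_pepper(img, max_window=7):
    """Baseline sketch: replace 0/255 pixels with the median of clean neighbours."""
    img = np.asarray(img)
    out = img.astype(float).copy()
    noisy = (img == 0) | (img == 255)  # assumption: only extreme values are noise
    h, w = img.shape
    for r, c in zip(*np.nonzero(noisy)):
        # Grow the window (3x3, 5x5, ...) until a noise-free neighbour appears.
        for half in range(1, max_window // 2 + 1):
            r0, r1 = max(r - half, 0), min(r + half + 1, h)
            c0, c1 = max(c - half, 0), min(c + half + 1, w)
            window = img[r0:r1, c0:c1]
            clean = window[~noisy[r0:r1, c0:c1]]
            if clean.size:
                out[r, c] = np.median(clean)
                break
        # If no clean neighbour exists even at max_window, the pixel is left unchanged.
    return out.astype(img.dtype)
```

Using only noise-free neighbours keeps corrupted values out of the median, which is what lets filters of this kind keep edges at higher noise densities than a plain median filter.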

Authors (2)

Maytham. A.

Rohaida

Publication Date

Thu Nov 30 2023

Journal Name

Iraqi Journal Of Science

A Key Based Hybrid Approach for Privacy and Integrity in Multi-Cloud

Before users store data in the cloud, many security issues must be addressed, because they have no direct control over data outsourced to the cloud, particularly personal and sensitive data (health, finance, military, etc.). This article proposes a system that uses chaotic maps for private key generation, hybrid encryption for fast and secure cryptography, multi-cloud storage with pseudonymized file names to preserve user data privacy on the cloud while minimizing data loss, and a hash approach to check data integrity. AES combined with RSA, together with file fragmentation, is used for encryption, and integrity is checked using SHA-3. The experiments demonstrated that the key generation stra
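The key-generation and integrity parts of this scheme can be sketched with the Python standard library alone. The example below assumes a logistic map as the chaotic map (the abstract does not say which map is used), derives key bytes from its orbit, splits a file into fragments, and records a SHA3-256 digest per fragment so that tampering can be detected later. The AES/RSA hybrid encryption step is omitted because it needs a third-party library, the `pseudonym` helper is hypothetical, and a logistic-map generator like this is for illustration only, not a vetted cryptographic key generator.

```python
import hashlib


def logistic_key(x0=0.61, r=3.99, n_bytes=32, burn_in=100):
    """Illustrative key bytes from the logistic map x <- r*x*(1-x) (assumed map)."""
    x = x0
    for _ in range(burn_in):  # discard the transient part of the orbit
        x = r * x * (1 - x)
    key = bytearray()
    for _ in range(n_bytes):
        x = r * x * (1 - x)
        key.append(int(x * 256) % 256)  # x lies in (0, 1), so this is a byte
    return bytes(key)


def fragment(data, n_parts):
    """Split data into at most n_parts contiguous fragments."""
    size = max(1, -(-len(data) // n_parts))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def digest(part):
    """SHA3-256 digest stored alongside each fragment for integrity checks."""
    return hashlib.sha3_256(part).hexdigest()


def pseudonym(part, key):
    """Hypothetical pseudonymized file name derived from the key and fragment."""
    return hashlib.sha3_256(key + part).hexdigest()[:16]
```

To verify a fragment downloaded from one of the clouds, recompute `digest(part)` and compare it with the stored value; any modified byte produces a different digest.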

Authors (2)

Mariam Duraid

Yousra abdul alsahib

Publication Date

Fri Oct 02 2015

Journal Name

American Journal Of Applied Sciences

Advances in Document Clustering with Evolutionary-Based Algorithms

Document clustering is the process of organizing a particular electronic corpus of documents into subgroups with similar text features. Formerly, a number of conventional algorithms were applied to perform document clustering. Current efforts aim to enhance clustering performance by employing evolutionary algorithms, and such efforts have become an emerging topic that has gained more attention in recent years. The aim of this paper is to present an up-to-date, self-contained review fully devoted to document clustering via evolutionary algorithms. It first provides a comprehensive inspection of the document clustering model, revealing its various components and related concepts. Then it shows and analyzes the principal research wor

Authors (1)

Sarmad

Publication Date

Tue Feb 13 2024

Journal Name

Iraqi Journal Of Science

Fuzzy Linear Discriminant Analysis Clustering With Its Application

Many fuzzy clustering methods are based on within-cluster scatter with a compactness measure. This paper explains a new fuzzy clustering method, called fuzzy compactness and separation (FCS), that depends both on within-cluster scatter with a compactness measure and on between-cluster scatter with a separation measure. Fuzzy linear discriminant analysis (FLDA) is based on the within-cluster scatter matrix and the between-cluster scatter matrix. The two fuzzy scatter matrices in the objective function ensure compactness between data elements and cluster centers. A clustering validation method for testing the optimal number of clusters is discussed, and an illustrative example is then given.
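The two fuzzy scatter matrices can be computed directly from a membership matrix. The sketch below assumes the usual fuzzy weighting, with memberships raised to a fuzzifier m: S_w measures compactness of the data around each cluster center, and S_b measures separation of the centers from the grand mean. The exact weighting and any balancing parameters used in the paper's FCS objective may differ, and the function name is hypothetical.

```python
import numpy as np


def fuzzy_scatter(X, centers, U, m=2.0):
    """Fuzzy within-cluster (S_w) and between-cluster (S_b) scatter matrices.

    X: (n, d) data, centers: (c, d) cluster centers,
    U: (c, n) membership degrees, m: fuzzifier (assumed weighting U**m).
    """
    W = U ** m                 # fuzzy weights mu_ij^m
    xbar = X.mean(axis=0)      # grand mean of the data
    d = X.shape[1]
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for i, a in enumerate(centers):
        diff = X - a                           # (n, d) offsets from center i
        S_w += (W[i][:, None] * diff).T @ diff  # sum_j w_ij (x_j - a_i)(x_j - a_i)^T
        db = (a - xbar)[:, None]               # (d, 1) offset of center from mean
        S_b += W[i].sum() * (db @ db.T)        # sum_j w_ij (a_i - xbar)(a_i - xbar)^T
    return S_w, S_b
```

An FCS-style objective then trades off a small trace(S_w) (compact clusters) against a large trace(S_b) (well-separated centers).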

Authors (2)

Iden

Rand

Publication Date

Fri Dec 01 2023

Journal Name

Applied Energy

Deep clustering of Lagrangian trajectory for multi-task learning to energy saving in intelligent buildings using cooperative multi-agent

Intelligent buildings have provided various incentives, yet they can suffer from highly inefficient energy saving caused by non-stationary building environments. In the presence of such dynamic excitation, with higher levels of nonlinearity and a coupling effect between temperature and humidity, the HVAC system transitions from underdamped to overdamped indoor conditions. This promotes highly inefficient energy use and fluctuating indoor thermal comfort. To address these concerns, this study develops a novel framework based on deep clustering of Lagrangian trajectories for multi-task learning (DCLTML) and adds a pre-cooling coil in the air handling unit (AHU) to alleviate the coupling issue. The proposed DCLTML exhibits great overall control and is

Authors (1)

Jasim

Publication Date

Tue Sep 08 2020

Journal Name

Baghdad Science Journal

Hiding the Type of Skin Texture in Mice based on Fuzzy Clustering Technique

Transmitting information safely is a substantial concern in the exchange of confidential messages over the Internet. For example, consumers and producers of digital products are keen to know that those products are genuine and can be distinguished from worthless ones. The science of encryption can be defined as the technique of embedding data in an image, audio, or video file in a way that meets safety requirements. Steganography is a branch of data-concealment science that aims to reach a desired level of security in the exchange of private, confidential commercial and military data. This research offers a novel steganography technique based on hiding data inside the clusters produced by fuzzy clustering. T

Authors (2)

Alaa Noori

اخلاص

Publication Date

Mon Dec 14 2020

Journal Name

2020 13th International Conference On Developments In Esystems Engineering (dese)

Anomaly Based Intrusion Detection System Using Hierarchical Classification and Clustering Techniques

With the rapid development of computer and network technologies, the security of information on the Internet has become compromised, and many threats may affect the integrity of such information. Much research has focused on providing solutions to these threats. Machine learning and data mining are widely used in anomaly-detection schemes to decide whether malicious activity is taking place on a network. In this paper, a hierarchical classification for an anomaly-based intrusion detection system is proposed, using two levels of feature selection and classification. In the first level, the global feature vector for detecting the basic attacks (DoS, U2R, R2L, and Probe) is selected. In the second level, four local feature vect
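The two-level structure can be sketched independently of the particular classifier. In the example below, a minimal nearest-centroid classifier (a stand-in, since the abstract does not name the classifiers used) first assigns a category from the global feature columns, and a per-category second-level model then refines the label using that category's local feature columns. The class names, column indices, and labels are hypothetical.

```python
import numpy as np


class NearestCentroid:
    """Minimal stand-in classifier: predicts the label of the closest class mean."""

    def fit(self, X, y):
        self.labels_ = np.unique(y)
        self.means_ = np.array([X[y == lab].mean(axis=0) for lab in self.labels_])
        return self

    def predict(self, X):
        d = ((X[:, None, :] - self.means_[None]) ** 2).sum(axis=2)
        return self.labels_[d.argmin(axis=1)]


class HierarchicalIDS:
    """Level 1: global features -> category; level 2: local features -> label."""

    def __init__(self, global_cols, local_cols):
        self.global_cols = global_cols  # feature indices for level 1 (assumed)
        self.local_cols = local_cols    # dict: category -> feature indices (assumed)

    def fit(self, X, category, label):
        self.level1 = NearestCentroid().fit(X[:, self.global_cols], category)
        self.level2 = {}
        for cat, cols in self.local_cols.items():
            rows = category == cat
            self.level2[cat] = NearestCentroid().fit(X[rows][:, cols], label[rows])
        return self

    def predict(self, X):
        cats = self.level1.predict(X[:, self.global_cols])
        # Object dtype so longer level-2 labels are not truncated to the category width.
        out = cats.astype(object)
        for cat, model in self.level2.items():
            rows = cats == cat
            if rows.any():
                out[rows] = model.predict(X[rows][:, self.local_cols[cat]])
        return out
```

Categories without a second-level model (for example, normal traffic) simply keep their level-1 prediction.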

Authors (2)

Suhaila N.

Publication Date

Wed Apr 10 2019

Journal Name

Engineering, Technology & Applied Science Research

Content Based Image Clustering Technique Using Statistical Features and Genetic Algorithm

Text-based image clustering (TBIC) is an insufficient approach for clustering related web images, because it is challenging to abstract the visual features of images with the support of textual information in a database. In content-based image clustering (CBIC), image data are clustered on the basis of specific features such as texture, color, boundaries, and shape. In this paper, an effective CBIC technique is presented that uses texture and statistical features of the images. The statistical features, or color moments (mean, skewness, standard deviation, kurtosis, and variance), are extracted from the images. These features are collected in a one-dimensional array, and a genetic algorithm (GA) is then applied for image clustering.
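The color-moment feature vector can be computed per channel with NumPy. The sketch below follows the abstract's list (mean, skewness, standard deviation, kurtosis, variance) and concatenates the values into a single one-dimensional array per image. It uses population statistics and non-excess kurtosis, which are assumptions, and it omits the texture features and the genetic-algorithm clustering stage; the function name is hypothetical.

```python
import numpy as np


def color_moments(img):
    """Per-channel [mean, skewness, std, kurtosis, variance], concatenated."""
    img = np.asarray(img, dtype=float)
    if img.ndim == 2:
        img = img[:, :, None]  # treat a greyscale image as one channel
    feats = []
    for ch in range(img.shape[2]):
        x = img[:, :, ch].ravel()
        mean = x.mean()
        var = x.var()          # population variance (assumption)
        std = np.sqrt(var)
        if std > 0:
            z = (x - mean) / std
            skew = (z ** 3).mean()
            kurt = (z ** 4).mean()  # non-excess kurtosis: a normal distribution gives 3
        else:
            skew, kurt = 0.0, 0.0   # constant channel: higher moments set to 0
        feats.extend([mean, skew, std, kurt, var])
    return np.array(feats)
```

For an RGB image this yields a 15-element vector (5 moments for each of 3 channels), which is the kind of fixed-length chromosome input a GA-based clusterer can work on.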

Authors (1)

Alsaidi B.K.
