The conventional procedures of clustering algorithms are incapable of overcoming the difficulty of managing and analyzing the rapid growth of generated data from different sources. Using the concept of parallel clustering is one of the robust solutions to this problem. Apache Hadoop architecture is one of the assortment ecosystems that provide the capability to store and process the data in a distributed and parallel fashion. In this paper, a parallel model is designed to process the k-means clustering algorithm in the Apache Hadoop ecosystem by connecting three nodes, one is for server (name) nodes and the other two are for clients (data) nodes. The aim is to speed up the time of managing the massive scale of healthcare insurance dataset with the size of 11 GB and also using machine learning algorithms, which are provided by the Mahout Framework. The experimental results depict that the proposed model can efficiently process large datasets. The parallel k-means algorithm outperforms the sequential k-means algorithm based on the execution time of the algorithm, where the required time to execute a data size of 11 GB is around 1.847 hours using the parallel k-means algorithm, while it equals 68.567 hours using the sequential k-means algorithm. As a result, we deduce that when the nodes number in the parallel system increases, the computation time of the proposed algorithm decreases.
Cloud Computing is a mass platform to serve high volume data from multi-devices and numerous technologies. Cloud tenants have a high demand to access their data faster without any disruptions. Therefore, cloud providers are struggling to ensure every individual data is secured and always accessible. Hence, an appropriate replication strategy capable of selecting essential data is required in cloud replication environments as the solution. This paper proposed a Crucial File Selection Strategy (CFSS) to address poor response time in a cloud replication environment. A cloud simulator called CloudSim is used to conduct the necessary experiments, and results are presented to evidence the enhancement on replication performance. The obtained an
... Show MoreGeneral medical fields and computer science usually conjugate together to produce impressive results in both fields using applications, programs and algorithms provided by Data mining field. The present research's title contains the term hygiene which may be described as the principle of maintaining cleanliness of the external body. Whilst the environmental hygienic hazards can present themselves in various media shapes e.g. air, water, soil…etc. The influence they can exert on our health is very complex and may be modulated by our genetic makeup, psychological factors and by our perceptions of the risks that they present. Our main concern in this research is not to improve general health, rather than to propose a data mining approach
... Show MoreThis research is focused on an interpretive of 2D seismic data to study is reinterpreting seismic data by applying sufficient software (Petrel 2017) of the area between Al-Razzazah Lake and the Euphrates river belonging to Karbala'a and Al-Anbar Governorates, central Iraq. The delineation of the sub-surface structural features and evaluation of the structure of Najmah and Zubair Formations was done. The structure interpretation showed that the studied area was affected by normal fault bearing (NW-SE) direction with a small displacement. In contrast, time and depth maps showed monocline structures (nose structures) located in the western part of the studied area.
In this study, a chaotic method is proposed that generates S-boxes similar to AES S-boxes with the help of a private key belonging to
In this study, dynamic encryption techniques are explored as an image cipher method to generate S-boxes similar to AES S-boxes with the help of a private key belonging to the user and enable images to be encrypted or decrypted using S-boxes. This study consists of two stages: the dynamic generation of the S-box method and the encryption-decryption method. S-boxes should have a non-linear structure, and for this reason, K/DSA (Knutt Durstenfeld Shuffle Algorithm), which is one of the pseudo-random techniques, is used to generate S-boxes dynamically. The biggest advantage of this approach is the produ
... Show MoreIn this paper, we investigate the automatic recognition of emotion in text. We perform experiments with a new method of classification based on the PPM character-based text compression scheme. These experiments involve both coarse-grained classification (whether a text is emotional or not) and also fine-grained classification such as recognising Ekman’s six basic emotions (Anger, Disgust, Fear, Happiness, Sadness, Surprise). Experimental results with three datasets show that the new method significantly outperforms the traditional word-based text classification methods. The results show that the PPM compression based classification method is able to distinguish between emotional and nonemotional text with high accuracy, between texts invo
... Show MoreIn this research we solved numerically Boltzmann transport equation in order to calculate the transport parameters, such as, drift velocity, W, D/? (ratio of diffusion coefficient to the mobility) and momentum transfer collision frequency ?m, for purpose of determination of magnetic drift velocity WM and magnetic deflection coefficient ? for low energy electrons, that moves in the electric field E, crossed with magnetic field B, i.e; E×B, in the nitrogen, Argon, Helium and it's gases mixtures as a function of: E/N (ratio of electric field strength to the number density of gas), E/P300 (ratio of electric field strength to the gas pressure) and D/? which covered a different ranges for E/P300 at temperatures 300°k (Kelvin). The results show
... Show MoreThe drought is a globally phenomenon, its influence will convert large parts of Middle East and North Africa (MENA) region into hot dry deserts under the expectations of the climate change scenarios. Climate limitations, soil erosion affected by weather properties such as unequally and limited rainfall; temperature changing and wind, unsuitable irrigation techniques, excessive grazing, agricultural expansion against to the natural habitats, extensively clearance of natural vegetation, and soil salinity had all contributed to land degradation, reduced water supplies, and limited agricultural production in Iraq. It is estimated that nearly 54.3 % of Iraq's area is threatened by desertification problems.
In this research, for Iraq the Cl
It has increasingly been recognised that the future developments in geospatial data handling will centre on geospatial data on the web: Volunteered Geographic Information (VGI). The evaluation of VGI data quality, including positional and shape similarity, has become a recurrent subject in the scientific literature in the last ten years. The OpenStreetMap (OSM) project is the most popular one of the leading platforms of VGI datasets. It is an online geospatial database to produce and supply free editable geospatial datasets for a worldwide. The goal of this paper is to present a comprehensive overview of the quality assurance of OSM data. In addition, the credibility of open source geospatial data is discussed, highlight
... Show MoreIn the last decade, the web has rapidly become an attractive platform, and an indispensable part of our lives. Unfortunately, as our dependency on the web increases so programmers focus more on functionality and appearance than security, has resulted in the interest of attackers in exploiting serious security problems that target web applications and web-based information systems e.g. through an SQL injection attack. SQL injection in simple terms, is the process of passing SQL code into interactive web applications that employ database services such applications accept user input such as form and then include this input in database requests, typically SQL statements in a way that was not intende
... Show MoreLoanwords are the words transferred from one language to another, which become essential part of the borrowing language. The loanwords have come from the source language to the recipient language because of many reasons. Detecting these loanwords is complicated task due to that there are no standard specifications for transferring words between languages and hence low accuracy. This work tries to enhance this accuracy of detecting loanwords between Turkish and Arabic language as a case study. In this paper, the proposed system contributes to find all possible loanwords using any set of characters either alphabetically or randomly arranged. Then, it processes the distortion in the pronunciation, and solves the problem of the missing lette
... Show More