The conventional procedures of clustering algorithms are incapable of overcoming the difficulty of managing and analyzing the rapid growth of generated data from different sources. Using the concept of parallel clustering is one of the robust solutions to this problem. Apache Hadoop architecture is one of the assortment ecosystems that provide the capability to store and process the data in a distributed and parallel fashion. In this paper, a parallel model is designed to process the k-means clustering algorithm in the Apache Hadoop ecosystem by connecting three nodes, one is for server (name) nodes and the other two are for clients (data) nodes. The aim is to speed up the time of managing the massive scale of healthcare insurance dataset with the size of 11 GB and also using machine learning algorithms, which are provided by the Mahout Framework. The experimental results depict that the proposed model can efficiently process large datasets. The parallel k-means algorithm outperforms the sequential k-means algorithm based on the execution time of the algorithm, where the required time to execute a data size of 11 GB is around 1.847 hours using the parallel k-means algorithm, while it equals 68.567 hours using the sequential k-means algorithm. As a result, we deduce that when the nodes number in the parallel system increases, the computation time of the proposed algorithm decreases.
In this paper, time spent and the repetition of using the Social Network Sites (SNS) in Android applications are investigated. In this approach, we seek to raise the awareness and limit, but not eliminate the repeated uses of SNS, by introducing AndroidTrack. This AndroidTrack is an android application that was designed to monitor and apply valid experimental studies in order to improve the impacts of social media on Iraqi users. Data generated from the app were aggregated and updated periodically at Google Firebase Real-time Database. The statistical factor analysis (FA) was presented as a result of the user’s interactions.
The reaction of LAs-Cl8 : [ (2,2- (1-(3,4-bis(carboxylicdichloromethoxy)-5-oxo-2,5- dihydrofuran-2-yl)ethane – 1,2-diyl)bis(2,2-dichloroacetic acid)]with sodium azide in ethanol with drops of distilled water has been investigated . The new product L-AZ :(3Z ,5Z,8Z)-2- azido-8-[azido(3Z,5Z)-2-azido-2,6-bis(azidocarbonyl)-8,9-dihydro-2H-1,7-dioxa-3,4,5- triazonine-9-yl]methyl]-9-[(1-azido-1-hydroxy)methyl]-2H-1,7-dioxa-3,4,5-triazonine – 2,6 – dicarbonylazide was isolated and characterized by elemental analysis (C.H.N) , 1H-NMR , Mass spectrum and Fourier transform infrared spectrophotometer (FT-IR) . The reaction of the L-AZ withM+n: [ ( VO(II) , Cr(III) ,Mn(II) , Co(II) , Ni(II) , Cu(II) , Zn(II) , Cd(II) and Hg(II)] has been i
... Show MoreDue to the increased of information existing on the World Wide Web (WWW), the subject of how to extract new and useful knowledge from the log file has gained big interest among researchers in data mining and knowledge discovery topics.
Web miming, which is a subset of data mining divided into three particular ways, web content mining, web structure mining, web usage mining. This paper is interested in server log file, which is belonging to the third category (web usage mining). This file will be analyzed according to the suggested algorithm to extract the behavior of the user. Knowing the behavior is coming from knowing the complete path which is taken from the specific user.
Extracting these types of knowledge required many of KDD
The estimation of the regular regression model requires several assumptions to be satisfied such as "linearity". One problem occurs by partitioning the regression curve into two (or more) parts and then joining them by threshold point(s). This situation is regarded as a linearity violation of regression. Therefore, the multiphase regression model is received increasing attention as an alternative approach which describes the changing of the behavior of the phenomenon through threshold point estimation. Maximum likelihood estimator "MLE" has been used in both model and threshold point estimations. However, MLE is not resistant against violations such as outliers' existence or in case of the heavy-tailed error distribution. The main goal of t
... Show MoreOne of the costliest problems facing the production of hydrocarbons in unconsolidated sandstone reservoirs is the production of sand once hydrocarbon production starts. The sanding start prediction model is very important to decide on sand control in the future, including whether or when sand control should be used. This research developed an easy-to-use Computer program to determine the beginning of sanding sites in the driven area. The model is based on estimating the critical pressure drop that occurs when sand is onset to produced. The outcomes have been drawn as a function of the free sand production with the critical flow rates for reservoir pressure decline. The results show that the pressure drawdown required to
... Show MoreIn this research, several estimators concerning the estimation are introduced. These estimators are closely related to the hazard function by using one of the nonparametric methods namely the kernel function for censored data type with varying bandwidth and kernel boundary. Two types of bandwidth are used: local bandwidth and global bandwidth. Moreover, four types of boundary kernel are used namely: Rectangle, Epanechnikov, Biquadratic and Triquadratic and the proposed function was employed with all kernel functions. Two different simulation techniques are also used for two experiments to compare these estimators. In most of the cases, the results have proved that the local bandwidth is the best for all the
... Show MoreData hiding is the process of encoding extra information in an image by making small modification to its pixels. To be practical, the hidden data must be perceptually invisible yet robust to common signal processing operations. This paper introduces a scheme for hiding a signature image that could be as much as 25% of the host image data and hence could be used both in digital watermarking as well as image/data hiding. The proposed algorithm uses orthogonal discrete wavelet transforms with two zero moments and with improved time localization called discrete slantlet transform for both host and signature image. A scaling factor ? in frequency domain control the quality of the watermarked images. Experimental results of signature image
... Show MoreHemorrhagic insult is a major source of morbidity and mortality in both adults and newborn babies in the developed countries. The mechanisms underlying the non-traumatic rupture of cerebral vessels are not fully clear, but there is strong evidence that stress, which is associated with an increase in arterial blood pressure, plays a crucial role in the development of acute intracranial hemorrhage (ICH), and alterations in cerebral blood flow (CBF) may contribute to the pathogenesis of ICH. The problem is that there are no effective diagnostic methods that allow for a prognosis of risk to be made for the development of ICH. Therefore, quantitative assessment of CBF may significantly advance the underst
Derivative spectrophotometry is one of the analytical chemistry techniques used
in the analysis and determination of chemicals and pharmaceuticals. This method is
characterized by simplicity, sensitivity and speed. Derivative of Spectra conducted
in several ways, including optical, electronic and mathematical. This operation
usually be done within spectrophotometer. The paper is based on form of a new
program. The program construction is written in Visual Basic language within
Microsoft Excel. The program is able to transform the first, second, third and fourth
derivatives of data and the return of these derivatives to zero order (normal plot).
The program was applied on experimental (trial) and reals values of su