This research aims to analyze and simulate biochemical real test data for uncovering the relationships among the tests, and how each of them impacts others. The data were acquired from Iraqi private biochemical laboratory. However, these data have many dimensions with a high rate of null values, and big patient numbers. Then, several experiments have been applied on these data beginning with unsupervised techniques such as hierarchical clustering, and k-means, but the results were not clear. Then the preprocessing step performed, to make the dataset analyzable by supervised techniques such as Linear Discriminant Analysis (LDA), Classification And Regression Tree (CART), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Naïve Bays (NB), and Support Vector Machine (SVM) techniques. CART gives clear results with high accuracy between the six supervised algorithms. It is worth noting that the preprocessing steps take remarkable efforts to handle this type of data, since its pure data set has so many null values of a ratio 94.8%, then it becomes 0% after achieving the preprocessing steps. Then, in order to apply CART algorithm, several determined tests were assumed as classes. The decision to select the tests which had been assumed as classes were depending on their acquired accuracy. Consequently, enabling the physicians to trace and connect the tests result with each other, which extends its impact on patients’ health.
Big data analysis is essential for modern applications in areas such as healthcare, assistive technology, intelligent transportation, environment and climate monitoring. Traditional algorithms in data mining and machine learning do not scale well with data size. Mining and learning from big data need time and memory efficient techniques, albeit the cost of possible loss in accuracy. We have developed a data aggregation structure to summarize data with large number of instances and data generated from multiple data sources. Data are aggregated at multiple resolutions and resolution provides a trade-off between efficiency and accuracy. The structure is built once, updated incrementally, and serves as a common data input for multiple mining an
... Show MoreThe Electrocardiogram records the heart's electrical signals. It is a practice; a painless diagnostic procedure used to rapidly diagnose and monitor heart problems. The ECG is an easy, noninvasive method for diagnosing various common heart conditions. Due to its unique advantages that other humans do not share, in addition to the fact that the heart's electrical activity may be easily detected from the body's surface, security is another area of concern. On this basis, it has become apparent that there are essential steps of pre-processing to deal with data of an electrical nature, signals, and prepare them for use in Biometric systems. Since it depends on the structure and function of the heart, it can be utilized as a biometric attribute
... Show MoreIn data mining, classification is a form of data analysis that can be used to extract models describing important data classes. Two of the well known algorithms used in data mining classification are Backpropagation Neural Network (BNN) and Naïve Bayesian (NB). This paper investigates the performance of these two classification methods using the Car Evaluation dataset. Two models were built for both algorithms and the results were compared. Our experimental results indicated that the BNN classifier yield higher accuracy as compared to the NB classifier but it is less efficient because it is time-consuming and difficult to analyze due to its black-box implementation.
The non static chain is always the problem of static analysis so that explained some of theoretical work, the properties of statistical regression analysis to lose when using strings in statistic and gives the slope of an imaginary relation under consideration. chain is not static can become static by adding variable time to the multivariate analysis the factors to remove the general trend as well as variable placebo seasons to remove the effect of seasonal .convert the data to form exponential or logarithmic , in addition to using the difference repeated d is said in this case it integrated class d. Where the research contained in the theoretical side in parts in the first part the research methodology ha
... Show MoreGenerally, statistical methods are used in various fields of science, especially in the research field, in which Statistical analysis is carried out by adopting several techniques, according to the nature of the study and its objectives. One of these techniques is building statistical models, which is done through regression models. This technique is considered one of the most important statistical methods for studying the relationship between a dependent variable, also called (the response variable) and the other variables, called covariate variables. This research describes the estimation of the partial linear regression model, as well as the estimation of the “missing at random” values (MAR). Regarding the
... Show MoreThere is an assumption implicit but fundamental theory behind the decline by the time series used in the estimate, namely that the time series has a sleep feature Stationary or the language of Engle Gernger chains are integrated level zero, which indicated by I (0). It is well known, for example, tables of t-statistic is designed primarily to deal with the results of the regression that uses static strings. This assumption has been previously treated as an axiom the mid-seventies, where researchers are conducting studies of applied without taking into account the properties of time series used prior to the assessment, was to accept the results of these tests Bmanueh and delivery capabilities based on the applicability of the theo
... Show MoreThe research aimed at measuring the compatibility of Big date with the organizational Ambidexterity dimensions of the Asia cell Mobile telecommunications company in Iraq in order to determine the possibility of adoption of Big data Triple as a approach to achieve organizational Ambidexterity.
The study adopted the descriptive analytical approach to collect and analyze the data collected by the questionnaire tool developed on the Likert scale After a comprehensive review of the literature related to the two basic study dimensions, the data has been subjected to many statistical treatments in accordance with res
... Show MoreImplementation of TSFS (Transposition, Substitution, Folding, and Shifting) algorithm as an encryption algorithm in database security had limitations in character set and the number of keys used. The proposed cryptosystem is based on making some enhancements on the phases of TSFS encryption algorithm by computing the determinant of the keys matrices which affects the implementation of the algorithm phases. These changes showed high security to the database against different types of security attacks by achieving both goals of confusion and diffusion.
The monthly time series of the Total Suspended Solids (TSS) concentrations in Euphrates River at Nasria was analyzed as a time series. The data used for the analysis was the monthly series during (1977-2000).
The series was tested for nonhomogenity and found to be nonhomogeneous. A significant positive jump was observed after 1988. This nonhomogenity was removed using a method suggested by Yevichevich (7). The homogeneous series was then normalized using Box and Cox (2) transformation. The periodic component of the series was fitted using harmonic analyses, and removed from the series to obtain the dependent stochastic component. This component was then modeled using first order autoregressive model (Markovian chain). The above a
... Show MoreThis research aims to study the methods of reduction of dimensions that overcome the problem curse of dimensionality when traditional methods fail to provide a good estimation of the parameters So this problem must be dealt with directly . Two methods were used to solve the problem of high dimensional data, The first method is the non-classical method Slice inverse regression ( SIR ) method and the proposed weight standard Sir (WSIR) method and principal components (PCA) which is the general method used in reducing dimensions, (SIR ) and (PCA) is based on the work of linear combinations of a subset of the original explanatory variables, which may suffer from the problem of heterogeneity and the problem of linear
... Show More