Data Mining Techniques for Iraqi Biochemical Dataset Analysis

Sarah Sameer; Suhad Faisal Behadili

doi:10.21123/bsj.2022.19.2.0385

Details

Publication Date

Fri Apr 01 2022

Journal Name

Baghdad Science Journal

Volume

19

Issue Number

2

Biomedical

Classification And Regression Tree (CART)

Data mining

Hierarchical clustering

K-means.

This research analyzes and simulates real biochemical test data to uncover the relationships among the tests and how each test affects the others. The data were acquired from a private Iraqi biochemical laboratory. They have many dimensions, a high rate of null values, and a large number of patients. Several experiments were applied to the data, beginning with unsupervised techniques such as hierarchical clustering and k-means, but the results were not clear. A preprocessing step was then performed to make the dataset analyzable by supervised techniques: Linear Discriminant Analysis (LDA), Classification And Regression Tree (CART), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Naïve Bayes (NB), and Support Vector Machine (SVM). Of the six supervised algorithms, CART gave clear results with high accuracy. The preprocessing took considerable effort for this type of data: the raw dataset had a null-value ratio of 94.8%, which dropped to 0% after preprocessing. To apply the CART algorithm, several selected tests were treated as classes, and these tests were chosen according to the accuracy they achieved. The results enable physicians to trace and connect test results with each other, which extends the impact on patients' health.
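
The abstract reports a null-value ratio and the use of CART but gives no implementation details. The following is only a minimal sketch of those two ideas in plain Python, not the authors' pipeline: `null_ratio`, `gini`, and `best_split` are hypothetical helper names, missing cells are assumed to be stored as `None`, and the Gini impurity is assumed as the split criterion (the usual choice for CART classification trees).

```python
from collections import Counter


def null_ratio(rows):
    """Fraction of cells equal to None across all rows (assumed missing marker)."""
    total = sum(len(r) for r in rows)
    missing = sum(1 for r in rows for v in r if v is None)
    return missing / total if total else 0.0


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted_gini) for the best binary split on one feature.

    Candidate thresholds are midpoints between consecutive distinct values.
    If no split improves on the unsplit impurity, the threshold is None.
    """
    pairs = sorted(zip(values, labels))
    best = (None, gini(labels))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best[1]:
            best = (threshold, score)
    return best
```

A full CART tree would apply `best_split` recursively over every feature; a test chosen as the class, as the abstract describes, would supply the `labels` argument.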

Publication Date

Mon Apr 01 2019

Journal Name

2019 International Conference On Automation, Computational And Technology Management (icactm)

Multi-Resolution Hierarchical Structure for Efficient Data Aggregation and Mining of Big Data

Safaa

Big data analysis is essential for modern applications in areas such as healthcare, assistive technology, intelligent transportation, and environment and climate monitoring. Traditional algorithms in data mining and machine learning do not scale well with data size. Mining and learning from big data need time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. We have developed a data aggregation structure to summarize data with a large number of instances and data generated from multiple data sources. Data are aggregated at multiple resolutions, and the resolution provides a trade-off between efficiency and accuracy. The structure is built once, updated incrementally, and serves as a common data input for multiple mining an

Publication Date

Mon Apr 11 2011

Journal Name

Icgst

Employing Neural Network and Naive Bayesian Classifier in Mining Data for Car Evaluation

Data mining

Backpropagation Neural Network

Naïve Bayesian Classifier

Classification

Sarmad

Aida

Junaidah

Ealaf

Mohammed

In data mining, classification is a form of data analysis that can be used to extract models describing important data classes. Two of the well-known algorithms used in data mining classification are the Backpropagation Neural Network (BNN) and Naïve Bayesian (NB) classifiers. This paper investigates the performance of these two classification methods using the Car Evaluation dataset. Two models were built for both algorithms and the results were compared. Our experimental results indicated that the BNN classifier yields higher accuracy than the NB classifier, but it is less efficient because it is time-consuming and difficult to analyze due to its black-box implementation.

Publication Date

Sat Dec 01 2012

Journal Name

Journal Of Economics And Administrative Sciences

Using panel data in structural equations with application

البيانات المقطعية

منظومة المعادلات الانية

التشخيص

السلاسل الزمنية

المربعات الصغرى ذات المرحلتين المدمجة

اختبار فيلبس-بيرون

Panel data

Simultaneous equations

Balanced panel data

Pooled two stage least square

One way fixed time effect

Two way fixed time group effect

Identification

Phillips-Perron

Redundant fixed effect

Akaike information criterion

Schwarz criterion

دجلة ابراهيم

Non-stationary time series are a persistent problem in statistical analysis. As some theoretical work has explained, regression analysis loses its statistical properties when non-stationary series are used, and it yields a spurious slope for the relationship under consideration. A non-stationary series can be made stationary by adding a time variable to the multivariate analysis to remove the general trend, by adding seasonal dummy variables to remove seasonal effects, by converting the data to exponential or logarithmic form, or by differencing the series d times, in which case the series is said to be integrated of order d. The theoretical side of the research is divided into parts; in the first part, the research methodology ha
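
The repeated differencing mentioned in the abstract above can be illustrated with a short sketch. This is an illustration only, not the paper's procedure: `difference` is a hypothetical helper name, and the example series is made up.

```python
def difference(series, d=1):
    """Apply first differencing d times: y[t] - y[t-1] at each pass."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# A made-up series with a quadratic trend: 1, 4, 9, 16, 25.
squares = [1, 4, 9, 16, 25]
once = difference(squares, d=1)   # the linear trend is still visible
twice = difference(squares, d=2)  # constant, so the trend has been removed
```

Each pass shortens the series by one observation. In this toy case two passes are needed before the values stop trending, which is what a series integrated of order 2 looks like.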

Publication Date

Fri Nov 25 2022

Journal Name

Tem Journal

Preparing of ECG Dataset for Biometric ID Identification with Creative Techniques

Mohammed

The electrocardiogram (ECG) records the heart's electrical signals. It is a routine, painless diagnostic procedure used to rapidly diagnose and monitor heart problems, and an easy, noninvasive method for diagnosing various common heart conditions. Because each person's ECG has unique characteristics that other people do not share, and because the heart's electrical activity can be easily detected from the body's surface, the ECG is also of interest for security. On this basis, it has become apparent that essential pre-processing steps are needed to handle electrical signal data and prepare it for use in biometric systems. Since it depends on the structure and function of the heart, it can be utilized as a biometric attribute

Publication Date

Sat May 31 2025

Journal Name

Iraqi Journal For Computers And Informatics

Discussion on techniques of data cleaning, user identification, and session identification phases of web usage mining from 2000 to 2022

Web Usage Mining

Data Pre-processing Step

Access Log File

Mohammed

Hala

Ahmed

The data preprocessing step is an important step in web usage mining because of the nature of log data, which are heterogeneous, unstructured, and noisy. Given the scalability and efficiency of algorithms in pattern discovery, a preprocessing step must be applied. In this study, the sequential methodologies utilized in the preprocessing of data from web server logs, with an emphasis on sub-phases, such as session identification, user identification, and data cleansing, are comprehensively evaluated and meticulously examined.

Publication Date

Tue Jun 30 2020

Journal Name

Journal Of Economics And Administrative Sciences

Comparison of weighted estimated method and proposed method (BEMW) for estimation of semi-parametric model under incomplete data

partial linear regression

Nadaraya-Watson

Weighted estimators

suggest method (Expectation-Maximization with Bootstrapping Weighted) (EMBW)

الانحدار الخطي الجزئي

نداريا واتسون

المقدرات الموزونة

الطريقة المقترحة

سعد كاظم

رند هيثم

Statistical methods are used in various fields of science, especially in research, where statistical analysis is carried out with several techniques chosen according to the nature of the study and its objectives. One of these techniques is building statistical models, which is done through regression models. This technique is considered one of the most important statistical methods for studying the relationship between a dependent variable (also called the response variable) and the other variables, called covariates. This research describes the estimation of the partial linear regression model, as well as the estimation of values that are "missing at random" (MAR). Regarding the

Publication Date

Fri Jun 01 2012

Journal Name

Journal Of Economics And Administrative Sciences

The Effect of the Stability of Some Commodity Activities in Iraq on the Estimation of the Statistical Data Models for the Period (1988-2000)

الانشطة السلعية

Commodity Activities

احمد سلطان

هيثم يعقوب

There is an implicit but fundamental assumption behind regression theory as applied to time series: that the series used in estimation are stationary or, in the language of Engle and Granger, integrated of order zero, denoted I(0). It is well known, for example, that tables of the t-statistic are designed primarily for regression results that use stationary series. Until the mid-seventies this assumption was treated as an axiom: researchers conducted applied studies without taking into account the properties of the time series used prior to estimation, and accepted the results of these tests Bmanueh and delivery capabilities based on the applicability of the theo

Publication Date

Fri Oct 19 2018

Journal Name

Journal Of Economics And Administrative Sciences

Big Data Approch to Enhance Organizational Ambidexterity An Exploratory Study of a Sample of Managers at ASIA Cell For Mobile Telecommunication Company in Iraq

البراعة التنظيمية

البيانات الكبيرة.

organizational Ambidexterity

Big data.

هدى عبد الرحيم

الاء عبد الموجود

The research aimed to measure the compatibility of Big data with the organizational Ambidexterity dimensions of the Asia cell Mobile telecommunications company in Iraq, in order to determine the possibility of adopting Big data Triple as an approach to achieve organizational Ambidexterity.

The study adopted the descriptive analytical approach to collect and analyze the data, which were gathered with a questionnaire developed on the Likert scale. After a comprehensive review of the literature related to the two basic study dimensions, the data has been subjected to many statistical treatments in accordance with res

Publication Date

Mon May 11 2020

Journal Name

Baghdad Science Journal

A Cryptosystem for Database Security Based on TSFS Algorithm

Cryptosystem

Database

Security

TSFS.

Saad Abdulkareem

Ali Habeeb

Ammar Ibraheem

The implementation of the TSFS (Transposition, Substitution, Folding, and Shifting) algorithm as an encryption algorithm for database security had limitations in the character set and in the number of keys used. The proposed cryptosystem enhances the phases of the TSFS encryption algorithm by computing the determinant of the key matrices, which affects the implementation of the algorithm's phases. These changes provided high security for the database against different types of security attacks by achieving both goals of confusion and diffusion.

Publication Date

Thu Feb 01 2018

Journal Name

Journal Of Economics And Administrative Sciences

Comparison of Slice inverse regression with the principal components in reducing high-dimensions data by using simulation

اختزال الابعاد

الانحدار الشرائحي المعكوس

المركبات الرئيسية.

dimensions reduction

Slice inverse regression

principal components.

عمر عبد المحسن

زينة ابراهيم

This research studies dimension-reduction methods that overcome the curse of dimensionality, which arises when traditional methods fail to provide a good estimation of the parameters, so this problem must be dealt with directly. Two methods were used to handle high-dimensional data. The first is the non-classical Slice inverse regression (SIR) method together with the proposed weight standard SIR (WSIR) method; the second is principal components (PCA), which is the general method used in reducing dimensions. SIR and PCA are based on linear combinations of a subset of the original explanatory variables, which may suffer from the problem of heterogeneity and the problem of linear
