doi:10.24996/ijs.2023.64.5.32

Details

Publication Date

Tue May 30 2023

Journal Name

Iraqi Journal Of Science

Volume

64

Issue Number

5

DOI

10.24996/ijs.2023.64.5.32

Application of Data Mining and Imputation Algorithms for Missing Value Handling: A Study Case Car Evaluation Dataset

C5.0

k-NNI

Data Mining

Missing Value Handling

R Studio

Wahyu

Muhammad Fauzan Edy

Muhammad

Panca

Sholeh Hadi

Data mining is the process of analyzing large amounts of data with software to find patterns or rules that are expected to provide knowledge for decision support. However, missing values in data mining often lead to a loss of information. The purpose of this study is to improve the accuracy of classifying data with missing values. The tests use the Car Evaluation dataset from the UCI Machine Learning Repository, with RStudio and RapidMiner used to run the algorithms. The study analyzes the tested parameters to measure algorithm performance under four test variations: C5.0, C4.5, and k-NN at a 0% missing rate; C5.0, C4.5, and k-NN at 5–50% missing rates; C5.0 + k-NNI, C4.5 + k-NNI, and k-NN + k-NNI at 5–50% missing rates; and C5.0 + CMI, C4.5 + CMI, and k-NN + CMI at 5–50% missing rates. The results show that C5.0 with k-NNI produces better classification accuracy than the other tested combinations of imputation and classification algorithms. For example, with 35% of the dataset missing, this method obtains 93.40% validation accuracy and 92% test accuracy. C5.0 with k-NNI also offers fast processing times compared with the other methods.
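The abstract does not spell out how k-NNI fills a gap, so the following is only a minimal sketch of one common variant for categorical data: each missing cell takes the most frequent value among the k nearest rows that have that cell observed, with distance measured as the number of mismatching attributes. The `MISSING` marker, the function names, the choice of k, and the toy rows (which borrow attribute values in the style of the Car Evaluation dataset) are all assumptions for illustration, not the authors' implementation or data.

```python
from collections import Counter

MISSING = None  # hypothetical marker for a missing categorical value


def hamming(a, b):
    """Count mismatches over the attributes observed in both rows."""
    return sum(
        1
        for x, y in zip(a, b)
        if x is not MISSING and y is not MISSING and x != y
    )


def knn_impute(rows, k=3):
    """Fill each missing cell with the most common value among the k
    nearest rows that have that cell observed (assumed k-NNI variant)."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            # Candidate donors: every other row with column j observed.
            donors = [r for n, r in enumerate(rows) if n != i and r[j] is not MISSING]
            if not donors:
                continue  # nothing to borrow from; leave the gap
            donors.sort(key=lambda r: hamming(row, r))
            votes = Counter(r[j] for r in donors[:k])
            imputed[i][j] = votes.most_common(1)[0][0]
    return imputed


# Toy rows: (buying, persons, lug_boot, class); the last row has a gap.
cars = [
    ("high", "2", "small", "unacc"),
    ("high", "2", "small", "unacc"),
    ("low", "4", "big", "acc"),
    ("low", "4", "big", "acc"),
    ("low", "4", MISSING, "acc"),
]
filled = knn_impute(cars, k=2)
```

The imputed table can then be passed to any classifier; the study pairs the imputation step with C5.0, C4.5, and k-NN.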

Publication Date

Fri Sep 30 2022

Journal Name

Journal Of Economics And Administrative Sciences

Semi parametric Estimators for Quantile Model via LASSO and SCAD with Missing Data

Quantile regression

partial linear model

LASSO

SCAD

missing data

nearest neighbor

Aws Adnan

Qutaiba N. Nayef

In this study, we compared the LASSO and SCAD methods, two penalty methods for partially linear quantile regression models. The Nadaraya-Watson kernel was used to estimate the non-parametric part; in addition, the rule-of-thumb method was used to estimate the smoothing bandwidth (h). The penalty methods proved efficient in estimating the regression coefficients, but SCAD was the best according to the mean squared error (MSE) criterion after the missing data were estimated using mean imputation.
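To make the non-parametric step concrete, here is a minimal sketch of a Nadaraya-Watson estimate with a rule-of-thumb bandwidth. The abstract names neither the kernel nor the exact rule, so the Gaussian kernel and the normal-reference rule h = 1.06 · sd(x) · n^(-1/5) are assumptions, as are the function names and the synthetic data.

```python
import math
import statistics


def rule_of_thumb_bandwidth(x):
    """Normal-reference rule h = 1.06 * sd(x) * n^(-1/5) (assumed variant)."""
    return 1.06 * statistics.stdev(x) * len(x) ** (-1 / 5)


def nadaraya_watson(x0, x, y, h):
    """Kernel-weighted average of y at x0, using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Synthetic illustration only: a smooth curve sampled on a regular grid.
x = [i / 10 for i in range(51)]
y = [math.sin(xi) for xi in x]
h = rule_of_thumb_bandwidth(x)
estimate = nadaraya_watson(2.5, x, y, h)
```

Because the estimate is a weighted average of the observed responses, it always lies between the smallest and largest value of `y`; a larger `h` smooths more heavily.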

Publication Date

Sun Jan 01 2023

Journal Name

Journal Of Intelligent Systems

A study on predicting crime rates through machine learning and data mining using text

Ruaa

Husam

Crime is a threat to any nation’s security administration and jurisdiction. Therefore, crime analysis becomes increasingly important because it assigns the time and place based on the collected spatial and temporal data. However, older techniques, such as paperwork, investigative judges, and statistical analysis, are not efficient enough to accurately predict the time and location at which a crime takes place. When machine learning and data mining methods were deployed in crime analysis, analysis and prediction accuracy increased dramatically. In this study, various types of criminal analysis and prediction using several machine learning and data mining techniques, based o ...

Publication Date

Mon Feb 14 2022

Journal Name

Journal Of Educational And Psychological Researches

Comparison between Rush Model Parameters to Completed and Lost Data by Different Methods of Processing Missing Data

missing value

imputation missing value

Rasch model

Nawal Jabbar Saleh

The current study aims to compare estimates of the Rasch model’s parameters for complete and missing data under different methods of processing the missing data. To achieve this aim, the researcher took the following steps: administering the Philip Carter test of spatial capacity, which consists of 20 items, to a group of 250 sixth scientific stage students in the directorates of Baghdad Education at Al-Rusafa (1st, 2nd and 3rd) for the academic year 2018-2019. The researcher then relied on a single-parameter model to analyze the data and used Bilog-mg3 to check the hypotheses and the data and to match them with the model. In addition

Publication Date

Sat Jul 01 2017

Journal Name

2017 Computing Conference

Protecting a sensitive dataset using a time based password in big data

Omar Z.

G. J.

H. S.

Publication Date

Wed Dec 30 2020

Journal Name

Iraqi Journal Of Science

A Comparison of Different Estimation Methods to Handle Missing Data in Explanatory Variables

Missing data

Simulation

Recurrent Neural Networks

Expectation-Maximization

Multicycle-Expectation-Conditional-Maximization

Expectation-Conditional-Maximization-Either

Manal Jabbar

Missing data is one of the problems that may occur in regression models. It is usually handled by the deletion mechanisms available in statistical software, which weakens statistical inference because deletion reduces the sample size. In this paper, the Expectation-Maximization (EM) algorithm, the Multicycle Expectation-Conditional Maximization (MC-ECM) algorithm, Expectation-Conditional Maximization Either (ECME), and Recurrent Neural Networks (RNN) are used to estimate multiple regression models when explanatory variables have some missing values. Experimental datasets were generated using the Visual Basic programming language, with missing values of explanatory variables according to a missing mechanism at random general pattern and s

Publication Date

Tue Aug 31 2021

Journal Name

Iraqi Journal Of Science

Application of Neural Network Analysis for Seismic Data to Differentiate Reservoir Units of Yamama Formation in Nasiriya Oilfield A Case Study in Southern Iraq

Neural Network Analysis

Log and Seismic data relationship

Yamama Formation

Nasiriya oilfield

Salman Z.

Maha F.

Ammar A. J.

The EMERGE application from the Hampson-Russell software suite was used in the present study. It uses seismic attributes, individually or in combination, to predict three-dimensional or two-dimensional reservoir properties. The objective of this study is to differentiate reservoir from non-reservoir units in the Yamama Formation by using seismic tools together with well data. A P-impedance volume (density × P-wave velocity) was used to build a three-dimensional seismic model of the Nasiriya oilfield from the post-stack data of 5 wells. The data (training and application) were utilized in the EMERGE analysis for estimating the reservoir properties of P-wave ve

Publication Date

Tue Dec 20 2022

Journal Name

2022 International Conference On Computer And Applications (icca)

Improve Data Mining Techniques with a High-Performance Cluster

Fadhil H.M.

Publication Date

Fri Sep 30 2022

Journal Name

Iraqi Journal Of Science

Educational Data Mining For Predicting Academic Student Performance Using Active Classification

Educational Data Mining

Active classification

Students’ Prediction

Feature Importance

Random Forest

Multilayer Perceptron

Rasha H.

The amount of educational data has increased rapidly in the last few years. Educational Data Mining (EDM) techniques are used to detect valuable patterns in order to improve the educational process and obtain high performance from all educational elements. The proposed work contains three stages: preprocessing, feature selection, and active classification. The dataset, collected using EDM, lacked label data; it contained 2050 records collected through questionnaires and students’ academic records. It has twenty-five features combined from five factors: curriculum, teacher, student, the environment of education, and the family. Active learning ha

Publication Date

Wed Nov 01 2017

Journal Name

Journal Of Economics And Administrative Sciences

Estimate missing value by use analyses of covariance method for split block-design

analysis covariance - Split-block design.

كمال علوان

سيماء فراس

The research aims to estimate missing values of the response (dependent) variable, which represents the main characteristic studied, using the covariance analysis method (Coons's method) in a type of multi-factor experimental design called the split-block design (SBED), so as to increase the accuracy of the analysis results and of the statistical tests based on this type of design. The theoretical part covers the split-block design and its statistical analysis, including the analysis of variance for the SBED experiment and the use of Coons's covariance analysis according to two methods to estimate the missing value. In the practical part, a field experiment on a wheat crop was implemented in

Publication Date

Fri Apr 26 2019

Journal Name

Journal Of Contemporary Medical Sciences

Breast Cancer Decisive Parameters for Iraqi Women via Data Mining Techniques

CA 15-3

CEA

Breast Cancer

Saliva

MLP

SLR

J48

data mining

OneR

Iraq

Suhad Faisal

Mustafa S.

Iyden Kamil

Maha Mohammed

Objective: This research investigates real breast cancer data for Iraqi women, acquired manually from several Iraqi hospitals for the early detection of breast cancer. Data mining techniques are used to discover hidden knowledge, unexpected patterns, and new rules in the dataset, which has a large number of attributes. Methods: Data mining techniques handle redundant or simply irrelevant attributes to discover interesting patterns. The dataset is processed with the Weka (Waikato Environment for Knowledge Analysis) platform. The OneR technique is used as a machine learning classifier to evaluate the worth of each attribute according to the class value. Results: The evaluation is performed using
