Application of Data Mining and Imputation Algorithms for Missing Value Handling: A Study Case Car Evaluation Dataset

Data mining is the process of analyzing large amounts of data with software to find patterns or rules that provide knowledge to support decisions. However, missing values in data mining often lead to a loss of information. The purpose of this study is to improve the accuracy of classification on data with missing values. Testing was carried out on the Car Evaluation dataset from the UCI Machine Learning Repository, using RStudio and RapidMiner to run the algorithms. The study analyzes the tested parameters to measure algorithm performance under four test variations: (1) C5.0, C4.5, and k-NN at a 0% missing rate; (2) C5.0, C4.5, and k-NN at 5–50% missing rates; (3) C5.0 + k-NNI, C4.5 + k-NNI, and k-NN + k-NNI at 5–50% missing rates; and (4) C5.0 + CMI, C4.5 + CMI, and k-NN + CMI at 5–50% missing rates. The results show that C5.0 with k-NNI produces better classification accuracy than the other tested combinations of imputation and classification algorithms. For example, with 35% of the dataset missing, this method obtains 93.40% validation accuracy and 92% test accuracy. C5.0 with k-NNI also offers fast processing times compared with the other methods.
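
As a rough illustration of the k-NNI idea named in this abstract, the Python sketch below fills each missing categorical value by majority vote among the k most similar records. It is not the study's RStudio or RapidMiner workflow: the function names `hamming` and `knn_impute`, the mismatch-based distance, the value of k, and the toy records (loosely modeled on Car Evaluation attributes such as buying price, maintenance cost, and safety) are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Fraction of mismatches over attributes observed in both records."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 1.0  # nothing to compare: treat as maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def knn_impute(rows, k=3):
    """Return a copy of rows with each None replaced by the most common
    value among the k nearest records that have that attribute observed."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: every other record with attribute j present.
            donors = [r for idx, r in enumerate(rows)
                      if idx != i and r[j] is not None]
            if not donors:
                continue  # no donor available, leave the gap in place
            donors.sort(key=lambda r: hamming(row, r))
            votes = Counter(r[j] for r in donors[:k])
            imputed[i][j] = votes.most_common(1)[0][0]
    return imputed


# Hypothetical records: (buying, maint, safety); None marks a missing value.
rows = [
    ("low", "low", "high"),
    ("low", "low", "high"),
    ("low", "med", None),
    ("vhigh", "vhigh", "low"),
    ("vhigh", "high", "low"),
]
completed = knn_impute(rows, k=2)
print(completed[2])
```

In a pipeline like the one the abstract describes, the completed records would then be passed to a classifier such as C5.0, C4.5, or k-NN, and accuracy would be compared across missing rates.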

Publication Date
Fri Sep 30 2022
Journal Name
Journal Of Economics And Administrative Sciences
Semi parametric Estimators for Quantile Model via LASSO and SCAD with Missing Data

This study compares the LASSO and SCAD methods, two penalty methods for estimating partial quantile regression models. The Nadaraya-Watson kernel was used to estimate the nonparametric part, and the rule-of-thumb method was used to select the smoothing bandwidth (h). Both penalty methods proved efficient in estimating the regression coefficients, but SCAD performed best according to the mean squared error (MSE) criterion after the missing data were estimated using mean imputation.
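
The nonparametric ingredients mentioned in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's procedure: the abstract does not say which rule of thumb or which kernel was used, so the sketch assumes the normal-reference rule h = 1.06 · s · n^(-1/5) and a Gaussian kernel, and it assumes the missing values are in the response. The function names are hypothetical, and the LASSO and SCAD penalty steps are not shown.

```python
import math
import statistics


def mean_impute(values):
    """Replace each None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = statistics.fmean(observed)
    return [mean if v is None else v for v in values]


def rule_of_thumb_bandwidth(x):
    """Normal-reference bandwidth (assumed rule): 1.06 * s * n^(-1/5)."""
    return 1.06 * statistics.stdev(x) * len(x) ** (-1 / 5)


def nadaraya_watson(x0, x, y, h):
    """Kernel-weighted average of y at x0 using Gaussian weights."""
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in x]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)


# Hypothetical data: one response value is missing.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = mean_impute([0.1, 0.9, None, 3.2, 3.8])
h = rule_of_thumb_bandwidth(x)
print(nadaraya_watson(2.0, x, y, h))
```

Mean imputation is the simplest choice and shrinks the variance of the imputed variable, which is one reason to compare the downstream estimators by MSE as the abstract does.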

Publication Date
Sun Jan 01 2023
Journal Name
Journal Of Intelligent Systems
A study on predicting crime rates through machine learning and data mining using text
Crime is a threat to any nation's security administration and jurisdiction. Crime analysis is therefore increasingly important because it identifies the time and place of crimes from collected spatial and temporal data. However, older techniques, such as paperwork, investigative judges, and statistical analysis, are not efficient enough to predict accurately the time and location of a crime. When machine learning and data mining methods were deployed in crime analysis, analysis and prediction accuracy increased dramatically. In this study, various types of criminal analysis and prediction using several machine learning and data mining techniques, based o ...
Publication Date
Mon Feb 14 2022
Journal Name
Journal Of Educational And Psychological Researches
Comparison between Rush Model Parameters to Completed and Lost Data by Different Methods of Processing Missing Data

The current study compares estimates of the Rush model's parameters for missing and complete data under different methods of handling the missing data. To this end, the researcher administered the Philip Carter test of spatial ability, which consists of 20 items, to a group of 250 sixth scientific stage students in the Baghdad Education directorates of Al-Rusafa (1st, 2nd, and 3rd) for the 2018-2019 academic year. The researcher then relied on a single-parameter model to analyze the data and used Bilog-mg3 to check the assumptions and the fit of the data to the model. In addition ...
Publication Date
Sat Jul 01 2017
Journal Name
2017 Computing Conference
Protecting a sensitive dataset using a time based password in big data
Publication Date
Wed Dec 30 2020
Journal Name
Iraqi Journal Of Science
A Comparison of Different Estimation Methods to Handle Missing Data in Explanatory Variables

Missing data is one of the problems that may occur in regression models. It is usually handled by the deletion mechanisms available in statistical software, which weakens statistical inference because deletion reduces the sample size. In this paper, the Expectation Maximization (EM) algorithm, the Multicycle Expectation-Conditional Maximization (MC-ECM) algorithm, Expectation-Conditional Maximization Either (ECME), and Recurrent Neural Networks (RNN) are used to estimate multiple regression models when the explanatory variables have some missing values. Experimental datasets were generated using the Visual Basic programming language, with missing values in the explanatory variables following a missing-at-random mechanism with a general pattern and s ...
Publication Date
Tue Aug 31 2021
Journal Name
Iraqi Journal Of Science
Application of Neural Network Analysis for Seismic Data to Differentiate Reservoir Units of Yamama Formation in Nasiriya Oilfield A Case Study in Southern Iraq

The EMERGE application from the Hampson-Russell software suite was used in the present study. It is a useful tool for applying seismic attributes to predict some three-dimensional or two-dimensional reservoir properties, as well as their combination. The objective of this study is to differentiate reservoir from non-reservoir units in the Yamama Formation using seismic tools together with well data. A P-impedance volume (density × P-wave velocity) was used to build a three-dimensional seismic model of the Nasiriya oilfield using post-stack data of 5 wells. The data (training and application) were used in the EMERGE analysis to estimate the reservoir properties of P-wave ve ...
Publication Date
Tue Dec 20 2022
Journal Name
2022 International Conference On Computer And Applications (icca)
Improve Data Mining Techniques with a High-Performance Cluster
Publication Date
Fri Sep 30 2022
Journal Name
Iraqi Journal Of Science
Educational Data Mining For Predicting Academic Student Performance Using Active Classification

The amount of educational data has increased rapidly in the last few years. Educational Data Mining (EDM) techniques are used to detect valuable patterns in order to improve the educational process and obtain high performance from all educational elements. The proposed work contains three stages: preprocessing, feature selection, and active classification. The dataset, collected using EDM, lacked labeled data; it contained 2050 records gathered through questionnaires and from students' academic records. It has twenty-five features combined from five factors: curriculum, teacher, student, educational environment, and family. Active learning ha ...
Publication Date
Wed Nov 01 2017
Journal Name
Journal Of Economics And Administrative Sciences
Estimate missing value by use analyses of covariance method for split block-design

The research aims to estimate missing values of the response (dependent) variable, which represents the main characteristic studied, using the Coons analysis of covariance method in a type of multi-factor experimental design called the split block design (SBED), in order to increase the accuracy of the analysis results and of the statistical tests based on this type of design. The theoretical part covers the split block design, its statistical analysis, the analysis of variance for an SBED experiment, and the use of the Coons covariance analysis with two methods for estimating the missing value. In the practical part, a field experiment on a wheat crop was carried out in ...
Publication Date
Fri Apr 26 2019
Journal Name
Journal Of Contemporary Medical Sciences
Breast Cancer Decisive Parameters for Iraqi Women via Data Mining Techniques

Objective: This research investigates real breast cancer data for Iraqi women, acquired manually from several Iraqi hospitals for the early detection of breast cancer. Data mining techniques are used to discover hidden knowledge, unexpected patterns, and new rules in the dataset, which involves a large number of attributes. Methods: Data mining techniques handle redundant or simply irrelevant attributes in order to discover interesting patterns. The dataset is processed with the Weka (Waikato Environment for Knowledge Analysis) platform. The OneR technique is used as a machine learning classifier to evaluate the worth of each attribute with respect to the class value. Results: The evaluation is performed using ...