Data mining is a data analysis process using software to find certain patterns or rules in a large amount of data, which is expected to provide knowledge to support decisions. However, missing value in data mining often leads to a loss of information. The purpose of this study is to improve the performance of data classification with missing values, precisely and accurately. The test method is carried out using the Car Evaluation dataset from the UCI Machine Learning Repository. RStudio and RapidMiner tools were used for testing the algorithm. This study will result in a data analysis of the tested parameters to measure the performance of the algorithm. Using test variations: performance at C5.0, C4.5, and k-NN at 0% missing rate, performance at C5.0, C4.5, and k-NN at 5–50% missing rate, performance at C5.0 + k-NNI, C4.5 + k-NNI, and k-NN + k-NNI classifier at 5–50% missing rate, and performance at C5.0 + CMI, C4.5 + CMI, and k-NN + CMI classifier at 5–50% missing rate, The results show that C5.0 with k-NNI produces better classification accuracy than other tested imputation and classification algorithms. For example, with 35% of the dataset missing, this method obtains 93.40% validation accuracy and 92% test accuracy. C5.0 with k-NNI also offers fast processing times compared with other methods.
In this study, we made a comparison between LASSO & SCAD methods, which are two special methods for dealing with models in partial quantile regression. (Nadaraya & Watson Kernel) was used to estimate the non-parametric part ;in addition, the rule of thumb method was used to estimate the smoothing bandwidth (h). Penalty methods proved to be efficient in estimating the regression coefficients, but the SCAD method according to the mean squared error criterion (MSE) was the best after estimating the missing data using the mean imputation method
Crime is a threat to any nation’s security administration and jurisdiction. Therefore, crime analysis becomes increasingly important because it assigns the time and place based on the collected spatial and temporal data. However, old techniques, such as paperwork, investigative judges, and statistical analysis, are not efficient enough to predict the accurate time and location where the crime had taken place. But when machine learning and data mining methods were deployed in crime analysis, crime analysis and predication accuracy increased dramatically. In this study, various types of criminal analysis and prediction using several machine learning and data mining techniques, based o
The current study aims to compare between the assessments of the Rush model’s parameters to the missing and completed data in various ways of processing the missing data. To achieve the aim of the present study, the researcher followed the following steps: preparing Philip Carter test for the spatial capacity which consists of (20) items on a group of (250) sixth scientific stage students in the directorates of Baghdad Education at Al–Rusafa (1st, 2nd and 3rd) for the academic year (2018-2019). Then, the researcher relied on a single-parameter model to analyze the data. The researcher used Bilog-mg3 model to check the hypotheses, data and match them with the model. In addition
... Show MoreMissing data is one of the problems that may occur in regression models. This problem is usually handled by deletion mechanism available in statistical software. This method reduces statistical inference values because deletion affects sample size. In this paper, Expectation Maximization algorithm (EM), Multicycle-Expectation-Conditional Maximization algorithm (MC-ECM), Expectation-Conditional Maximization Either (ECME), and Recurrent Neural Networks (RNN) are used to estimate multiple regression models when explanatory variables have some missing values. Experimental dataset were generated using Visual Basic programming language with missing values of explanatory variables according to a missing mechanism at random general pattern and s
... Show MoreThe EMERGE application from Hampsson-Russell suite programs was used in the present study. It is an interesting domain for seismic attributes that predict some of reservoir three dimensional or two dimensional properties, as well as their combination. The objective of this study is to differentiate reservoir/non reservoir units with well data in the Yamama Formation by using seismic tools. P-impedance volume (density x velocity of P-wave) was used in this research to perform a three dimensional seismic model on the oilfield of Nasiriya by using post-stack data of 5 wells. The data (training and application) were utilized in the EMERGE analysis for estimating the reservoir properties of P-wave ve
... Show MoreThe increasing amount of educational data has rapidly in the latest few years. The Educational Data Mining (EDM) techniques are utilized to detect the valuable pattern so that improves the educational process and to obtain high performance of all educational elements. The proposed work contains three stages: preprocessing, features selection, and an active classification stage. The dataset was collected using EDM that had a lack in the label data, it contained 2050 records collected by using questionnaires and by using the students’ academic records. There are twenty-five features that were combined from the following five factors: (curriculum, teacher, student, the environment of education, and the family). Active learning ha
... Show MoreThe research aims to estimate missing values using covariance analysis method Coons way to the variable response or dependent variable that represents the main character studied in a type of multi-factor designs experiments called split block-design (SBED) so as to increase the accuracy of the analysis results and the accuracy of statistical tests based on this type of designs. as it was noted in the theoretical aspect to the design of dissident sectors and statistical analysis have to analyze the variation in the experience of experiment )SBED) and the use of covariance way coons analysis according to two methods to estimate the missing value, either in the practical side of it has been implemented field experiment wheat crop in
... Show MoreObjective This research investigates Breast Cancer real data for Iraqi women, these data are acquired manually from several Iraqi Hospitals of early detection for Breast Cancer. Data mining techniques are used to discover the hidden knowledge, unexpected patterns, and new rules from the dataset, which implies a large number of attributes. Methods Data mining techniques manipulate the redundant or simply irrelevant attributes to discover interesting patterns. However, the dataset is processed via Weka (The Waikato Environment for Knowledge Analysis) platform. The OneR technique is used as a machine learning classifier to evaluate the attribute worthy according to the class value. Results The evaluation is performed using
... Show More