ijs-5948
Application of Data Mining and Imputation Algorithms for Missing Value Handling: A Study Case Car Evaluation Dataset

Data mining is the process of using software to analyze data and find patterns or rules in large amounts of data, with the aim of providing knowledge that supports decision-making. However, missing values often cause a loss of information in data mining. The purpose of this study is to improve the precision and accuracy of classifying data that contain missing values. The tests use the Car Evaluation dataset from the UCI Machine Learning Repository, and the algorithms were tested with the RStudio and RapidMiner tools. The study analyzes the tested parameters to measure algorithm performance under four test variations: C5.0, C4.5, and k-NN at a 0% missing rate; C5.0, C4.5, and k-NN at missing rates of 5–50%; the C5.0 + k-NNI, C4.5 + k-NNI, and k-NN + k-NNI classifiers at missing rates of 5–50%; and the C5.0 + CMI, C4.5 + CMI, and k-NN + CMI classifiers at missing rates of 5–50%. The results show that C5.0 with k-NNI produces better classification accuracy than the other imputation and classification algorithms tested. For example, with 35% of the dataset missing, this method obtains 93.40% validation accuracy and 92% test accuracy. C5.0 with k-NNI also offers faster processing times than the other methods.
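
The k-NNI step can be illustrated with a minimal Python sketch. It assumes that all attributes are categorical (as in Car Evaluation), that the distance between rows is the Hamming count of mismatches over attributes observed in both rows, and that a missing cell takes the majority value among its k nearest donor rows. The function names, k = 3, and the toy rows are illustrative assumptions, not the study's RStudio or RapidMiner configuration, and the C5.0 classifier that would consume the imputed rows is not shown.

```python
from collections import Counter
from typing import List, Optional

# A row of categorical attributes; None marks a missing value.
Row = List[Optional[str]]


def hamming(a: Row, b: Row) -> int:
    """Count mismatching attributes, ignoring any attribute missing in either row."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def knn_impute(rows: List[Row], k: int = 3) -> List[Row]:
    """Return a copy of rows in which each missing cell is filled with the
    majority value among the k nearest donor rows that observe that attribute."""
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors: other original rows in which attribute j is present.
            donors = [r for idx, r in enumerate(rows) if idx != i and r[j] is not None]
            if not donors:
                continue  # nothing to vote with; leave the cell missing
            donors.sort(key=lambda r: hamming(row, r))
            votes = Counter(r[j] for r in donors[:k])
            filled[i][j] = votes.most_common(1)[0][0]
    return filled


if __name__ == "__main__":
    # Hypothetical toy rows in Car Evaluation attribute order:
    # buying, maint, doors, persons, lug_boot, safety.
    rows = [
        ["vhigh", "vhigh", "2", "2", "small", "low"],
        ["vhigh", "vhigh", "2", "2", "small", "med"],
        ["low", "low", "4", "4", "big", "high"],
        ["low", "low", "4", "4", "big", "high"],
        ["low", "low", "4", "more", "big", None],
    ]
    print(knn_impute(rows, k=3)[4])
```

Imputing against the original rows only (never against already-imputed cells) keeps the result independent of the order in which missing cells are visited.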

Scopus Crossref
Publication Date
Mon Aug 01 2016
Journal Name
Journal Of Economics And Administrative Sciences
User (K-Means) for clustering in Data Mining with application

Great scientific progress has led to a proliferation of information that accumulates in large databases. It is therefore important to review and organize this vast amount of data, with the aim of extracting hidden information or classifying data according to their relationships with one another so that they can be exploited for technical purposes.

Data mining (DM) is well suited to this area, which motivates research on the K-Means algorithm for clustering data. When the algorithm is applied, its effect on the variables can be observed by changing the sample size (n) and the number of clusters (K)...
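
The effect of the number of clusters K can be explored with a minimal sketch of Lloyd's K-Means iteration on one-dimensional data. The deterministic initialization (evenly spaced points of the sorted sample), the helper names `kmeans_1d` and `sse`, and the toy values are assumptions for illustration, not the study's setup. For this toy sample, the within-cluster sum of squared errors drops from about 96.18 at K = 1 to about 0.18 at K = 3.

```python
from typing import List, Tuple


def kmeans_1d(xs: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Lloyd's K-Means on 1-D data. Centroids start at k evenly spaced
    points of the sorted sample (a deterministic choice for illustration)."""
    s = sorted(xs)
    n = len(s)
    centroids = [s[(i * (n - 1)) // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(xs)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: (x - centroids[c]) ** 2) for x in xs]
        # Update step: move each centroid to the mean of its members.
        new = []
        for c in range(k):
            members = [x for x, lab in zip(xs, labels) if lab == c]
            new.append(sum(members) / len(members) if members else centroids[c])
        if new == centroids:  # converged: assignments can no longer change
            break
        centroids = new
    return centroids, labels


def sse(xs: List[float], centroids: List[float], labels: List[int]) -> float:
    """Within-cluster sum of squared errors for a given assignment."""
    return sum((x - centroids[lab]) ** 2 for x, lab in zip(xs, labels))


if __name__ == "__main__":
    xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.1, 8.9]  # hypothetical sample
    for k in (1, 2, 3):
        cents, labs = kmeans_1d(xs, k)
        print(k, [round(c, 3) for c in cents], round(sse(xs, cents, labs), 3))
```

Varying the sample passed in plays the role of changing n, and the loop over k plays the role of changing K.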
Crossref
Publication Date
Sun Jan 01 2023
Journal Name
Aip Conference Proceedings
Mining categorical Covid-19 data using chi-square and logistic regression algorithms
Scopus Crossref
Publication Date
Fri Feb 01 2019
Journal Name
Iraqi Journal Of Information & Communications Technology
Evaluation of DDoS attacks Detection in a New Intrusion Dataset Based on Classification Algorithms

An intrusion detection system plays an important role in increasing security and reducing harm to computer and information systems when a network is in use. It observes events in a network or system to decide whether an intrusion is occurring, and it is used for strategic decision-making, security purposes, and analysis. This paper describes a host-based intrusion detection system architecture for DDoS attacks, which intelligently detects intrusions periodically and dynamically by evaluating the intruder group with respect to the present node and its neighbors. We analyze a dependable dataset named CICIDS 2017 that contains benign and DDoS attack network flows, which meets real-world criteria and is ope...
Crossref (14)
Crossref
Publication Date
Tue Mar 30 2021
Journal Name
Baghdad Science Journal
Application of Data Mining Techniques on Tourist Expenses in Malaysia

Tourism plays an important role in Malaysia's economic development, as it can boost business opportunities in the surrounding economy. Applying data mining to tourism data to predict areas of business opportunity is therefore a good choice. Data mining is a process that takes data as input and produces knowledge as output. Because travel in Asian countries has grown in popularity in recent years, many entrepreneurs have started their own businesses, but they face problems such as investing in the wrong business fields and poor service quality, which affect their business income. The objective of this paper is to use data mining technology to meet the business and customer needs of tourism enterprises and find the most effective...
Scopus (3)
Crossref (1)
Scopus Clarivate Crossref
Publication Date
Sun Jul 31 2022
Journal Name
Iraqi Journal Of Science
A Review of Data Mining and Knowledge Discovery Approaches for Bioinformatics

This review explores the Knowledge Discovery in Databases (KDD) approach, which helps the bioinformatics domain progress efficiently, and illustrates its relationship with data mining. It is therefore important to highlight the advantages of Data Mining (DM) strategy management, such as its role in cost control, which is a principle of competitive intelligence, and its role in information management, as well as its ability to discover hidden knowledge. However, there are many challenges, such as inaccurate and hand-written data and the need to analyze large amounts of varied information to extract useful knowledge using DM strategies. These strategies are successfully applied in several applications as data wa...
Scopus (1)
Crossref (2)
Scopus Crossref
Publication Date
Wed Aug 01 2018
Journal Name
Journal Of Economics And Administrative Sciences
Comparison Some Estimation Methods Of GM(1,1) Model With Missing Data and Practical Application

This paper presents the first-order, single-variable grey model GM(1,1), which is the basis of grey system theory. The research covers the properties of the grey model and a set of methods for estimating its parameters: the least squares method (LS), the weighted least squares method (WLS), the total least squares method (TLS), and the gradient descent method (DS). These methods were compared using two criteria: the mean square error (MSE) and the mean absolute percentage error (MAPE). After the comparison by simulation, the best method was applied to real data representing the consumption rate of two types of oil, heavy fuel oil (HFO) and diesel oil (D.O), and several tests were applied to...
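
The LS estimator for GM(1,1) can be sketched using the standard formulation: accumulate the series (AGO), form background values z1(k) = 0.5 (x1(k) + x1(k-1)), regress x0(k) on -z1(k) and a constant to obtain the development coefficient a and the grey input b, then restore predictions with the inverse AGO. Only the LS method is shown (not WLS, TLS, or gradient descent), the helper names are hypothetical, and the geometric test series is synthetic, not the HFO/D.O consumption data.

```python
import math
from typing import List, Tuple


def gm11_fit(x0: List[float]) -> Tuple[float, float]:
    """Estimate GM(1,1) parameters (a, b) by ordinary least squares."""
    n = len(x0)
    x1, total = [], 0.0
    for v in x0:  # accumulated generating operation (AGO)
        total += v
        x1.append(total)
    # Background values z1(k) = 0.5 * (x1(k) + x1(k-1)) for k = 2..n.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, n)]
    y = x0[1:]
    # Grey equation x0(k) + a*z1(k) = b, i.e. y = a*u + b with u = -z1.
    u = [-zk for zk in z]
    m = len(u)
    s_u, s_y = sum(u), sum(y)
    s_uu = sum(ui * ui for ui in u)
    s_uy = sum(ui * yi for ui, yi in zip(u, y))
    det = s_uu * m - s_u ** 2  # solve the 2x2 normal equations
    a = (m * s_uy - s_u * s_y) / det
    b = (s_uu * s_y - s_u * s_uy) / det
    return a, b


def gm11_predict(x0: List[float], a: float, b: float, horizon: int = 0) -> List[float]:
    """Fitted values (plus `horizon` forecasts) restored by inverse AGO."""
    n = len(x0) + horizon
    # Time response: x1_hat(k+1) = (x0(1) - b/a) * exp(-a*k) + b/a.
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(n)]
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n)]


def mse(actual: List[float], predicted: List[float]) -> float:
    return sum((p - q) ** 2 for p, q in zip(actual, predicted)) / len(actual)


def mape(actual: List[float], predicted: List[float]) -> float:
    return 100.0 * sum(abs((p - q) / p) for p, q in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    series = [10 * 1.1 ** i for i in range(6)]  # synthetic series
    a, b = gm11_fit(series)
    fitted = gm11_predict(series, a, b)
    print(a, b, mse(series, fitted), mape(series, fitted))
```

For a purely geometric series the grey equation fits the background values exactly, so LS recovers a = -2(r-1)/(r+1) for growth ratio r; the small remaining error comes from the exponential time response.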
Crossref
Publication Date
Sat Jan 01 2022
Journal Name
Turkish Journal Of Physiotherapy And Rehabilitation
classification coco dataset using machine learning algorithms

In this paper, we used four classification methods to classify objects and compared them: K-Nearest Neighbors (KNN), Stochastic Gradient Descent learning (SGD), the Logistic Regression algorithm (LR), and the Multi-Layer Perceptron (MLP). We used the MS COCO dataset to classify and detect objects; the dataset images were randomly divided into training and testing sets at a ratio of 7:3. The randomly selected training and testing images were converted from color to gray level, the gray images were enhanced using histogram equalization, and the images were resized to 20 × 20. Principal component analysis (PCA) was used for feature extraction, and finally the four classification metho...
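
The histogram-equalization step can be sketched with the common global formula that maps each gray level v to round((cdf(v) - cdf_min) / (N - cdf_min) × (L - 1)), where N is the pixel count and L the number of gray levels. This is an assumption about the variant used: the paper does not say which library performed the enhancement, the function name `equalize` is hypothetical, and the resizing, PCA, and classifier stages are omitted.

```python
from typing import List


def equalize(img: List[List[int]], levels: int = 256) -> List[List[int]]:
    """Global histogram equalization of a gray image given as rows of ints in [0, levels)."""
    pixels = [p for row in img for p in row]
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution of gray levels.
    cdf, running = [0] * levels, 0
    for v in range(levels):
        running += hist[v]
        cdf[v] = running
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: there is no range to stretch
        return [row[:] for row in img]
    # Lookup table spreading the occupied levels over the full range.
    lut = [round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1)) for v in range(levels)]
    return [[lut[p] for p in row] for row in img]


if __name__ == "__main__":
    tiny = [[50, 50], [100, 150]]  # hypothetical 2 x 2 gray image
    print(equalize(tiny))
```

The darkest occupied level always maps to 0 and the brightest to L - 1, which is what stretches a low-contrast image before feature extraction.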
Publication Date
Sat Jul 31 2021
Journal Name
Iraqi Journal Of Science
A review of Medical Diagnostics Via Data Mining Techniques

Data mining is one of the most popular analysis methods in medical research. It involves finding previously unknown patterns and correlations in datasets. Data mining encompasses various areas of biomedical research, including data collection, clinical decision support, illness or safety monitoring, public health, and inquiry research. Health analytics frequently uses computational data mining methods such as clustering, classification, and regression. Studies of large numbers of diverse, heterogeneous documents, including biological and electronic information, have provided extensive material for medical and health studies.

Scopus (1)
Scopus Crossref
Publication Date
Mon Feb 21 2022
Journal Name
Iraqi Journal For Computer Science And Mathematics
Fuzzy C means Based Evaluation Algorithms For Cancer Gene Expression Data Clustering

The influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences, which places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide data whose cluster structure is unknown into clusters. The k-means and fuzzy c-means (FCM) algorithms are examples of algorithms that can be used for clustering. In general, clustering is a common approach that divides an input space into several homogeneous zones, and it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, whic...
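
The fuzzy c-means updates can be sketched on one-dimensional data using the standard alternating scheme: each membership is u = 1 / Σ_k (d_i / d_k)^(2/(m-1)), and each center is the mean of the data weighted by u^m. The fuzzifier m = 2, the initialization, the tolerance, the name `fcm_1d`, and the toy values are assumptions; the sketch does not reproduce the study's three models or its brain tumor data.

```python
from typing import List, Tuple


def fcm_1d(xs: List[float], c: int, m: float = 2.0, iters: int = 100,
           tol: float = 1e-9) -> Tuple[List[float], List[List[float]]]:
    """Fuzzy c-means on 1-D data; returns centers and the membership matrix u[j][i]."""
    s = sorted(xs)
    # Start from c evenly spaced points of the sorted sample.
    centers = [s[(i * (len(s) - 1)) // max(c - 1, 1)] for i in range(c)]
    u: List[List[float]] = []
    p = 2.0 / (m - 1.0)
    for _ in range(iters):
        # Membership update for every point.
        u = []
        for x in xs:
            d = [abs(x - v) for v in centers]
            if 0.0 in d:
                # Point coincides with a center: give it crisp membership there.
                row = [0.0] * c
                row[d.index(0.0)] = 1.0
            else:
                row = [1.0 / sum((di / dk) ** p for dk in d) for di in d]
            u.append(row)
        # Center update: weighted mean with weights u^m.
        new = []
        for i in range(c):
            w = [u[j][i] ** m for j in range(len(xs))]
            new.append(sum(wj * x for wj, x in zip(w, xs)) / sum(w))
        shift = max(abs(a - b) for a, b in zip(new, centers))
        centers = new
        if shift < tol:
            break
    return centers, u


if __name__ == "__main__":
    data = [1.0, 1.1, 0.9, 8.0, 8.1, 7.9]  # hypothetical expression values
    cents, memb = fcm_1d(data, 2)
    print([round(v, 3) for v in cents])
    print([[round(v, 3) for v in row] for row in memb])
```

Unlike k-means, every point keeps a graded membership in each cluster, and each membership row sums to 1.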
Crossref (1)
Crossref
Publication Date
Fri Sep 30 2022
Journal Name
Journal Of Economics And Administrative Sciences
Semi parametric Estimators for Quantile Model via LASSO and SCAD with Missing Data

In this study, we compared the LASSO and SCAD methods, two penalized methods for handling partial (semiparametric) quantile regression models. The Nadaraya–Watson kernel estimator was used to estimate the nonparametric part, and the rule-of-thumb method was used to estimate the smoothing bandwidth (h). The penalty methods proved efficient in estimating the regression coefficients, but after the missing data were estimated with the mean imputation method, SCAD was the best method according to the mean squared error (MSE) criterion.
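
The nonparametric ingredients named in the abstract can be sketched under stated assumptions: missing covariate values are replaced by the observed mean, the bandwidth uses Silverman's rule of thumb h = 1.06 σ̂ n^(-1/5) (one common rule-of-thumb variant; the paper's exact choice is not given here), and a Gaussian-kernel Nadaraya–Watson estimator forms a locally weighted average. This sketch estimates a conditional mean, not a conditional quantile, the function names are hypothetical, and the LASSO/SCAD penalized fit of the parametric part is not shown.

```python
import math
import statistics
from typing import List, Optional


def mean_impute(values: List[Optional[float]]) -> List[float]:
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mu = statistics.fmean(observed)
    return [mu if v is None else v for v in values]


def rule_of_thumb_bandwidth(xs: List[float]) -> float:
    """Silverman-style rule of thumb for a Gaussian kernel: 1.06 * sd * n^(-1/5)."""
    return 1.06 * statistics.stdev(xs) * len(xs) ** (-1 / 5)


def nadaraya_watson(x0: float, xs: List[float], ys: List[float], h: float) -> float:
    """Kernel-weighted average of ys around x0 with a Gaussian kernel of bandwidth h."""
    w = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


if __name__ == "__main__":
    x = mean_impute([0.0, 1.0, None, 3.0, 4.0])  # hypothetical covariate with a gap
    y = [2 * xi for xi in x]
    h = rule_of_thumb_bandwidth(x)
    print(x, round(h, 4), nadaraya_watson(2.0, x, y, h))
```

Mean imputation keeps the sample size fixed but shrinks the covariate's spread, which in turn narrows the rule-of-thumb bandwidth; that interaction is one reason the imputation step matters for the comparison.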

Crossref