Classification of imbalanced data is an important issue. Many algorithms have been developed for classification, such as Back Propagation (BP) neural networks, decision trees, and Bayesian networks, and they have been applied in many fields. These algorithms, however, suffer from the problem of imbalanced data, where some classes contain far more samples than others. Imbalanced data lead to poor performance and a bias toward the majority class at the expense of the minority classes. In this paper, we propose three techniques based on Over-Sampling (O.S.) for processing an imbalanced dataset, redistributing it, and converting it into a balanced dataset. These techniques are the Improved Synthetic Minority Over-Sampling Technique (Improved SMOTE), Borderline-SMOTE + Imbalance Ratio (IR), and Adaptive Synthetic Sampling (ADASYN) + IR algorithms. Each technique generates synthetic samples for the minority class to achieve balance between the minority and majority classes and then calculates the IR between them. Experimental results show that the Improved SMOTE algorithm outperforms the Borderline-SMOTE + IR and ADASYN + IR algorithms because it achieves a higher balance between the minority and majority classes.
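The abstract does not give the details of the three techniques; as a minimal sketch, the core SMOTE interpolation step and the imbalance-ratio computation can look as follows (function names and the `k`/`rng` parameters are illustrative, not from the paper):

```python
import numpy as np

def imbalance_ratio(y):
    """IR = majority-class count / minority-class count (IR = 1 means balanced)."""
    counts = np.bincount(y)
    return counts.max() / counts.min()

def smote(X_min, n_synthetic, k=5, rng=None):
    """Generate synthetic minority samples by interpolating each chosen sample
    toward one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))
        # squared distances from sample i to all minority samples
        d = ((X_min - X_min[i]) ** 2).sum(axis=1)
        # k nearest neighbours, excluding the sample itself
        nn = np.argsort(d)[1:k + 1]
        j = rng.choice(nn)
        gap = rng.random()  # interpolation coefficient in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)
```

Borderline-SMOTE restricts the interpolation to minority samples near the class boundary, and ADASYN weights the number of synthetic samples per point by the local density of majority neighbours; both reduce to this same interpolation step at their core.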
Abstract
The problem of missing data represents a major obstacle for researchers in the process of data analysis, since it recurs in all fields of study, including social, medical, astronomical, and clinical experiments. The presence of such a problem in the data under study may affect the analysis negatively and lead to misleading conclusions, because the results can carry a large bias caused by the missingness. Despite the efficiency of wavelet methods, they too are affected by missing data, in addition to the resulting loss of estimation accuracy.
The question of estimation has attracted great interest in engineering and statistical applications and in various applied and human sciences; the methods it provides help to identify many random processes accurately.
In this paper, methods were used to estimate the reliability function, the hazard (risk) function, and the distribution parameters, namely the Moment Method and the Maximum Likelihood Method. An experimental study was conducted using simulation in order to compare the methods and show which of them is preferable in practical application, based on observations generated from the Rayleigh logarithmic (RL) distribution with several sample sizes.
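The abstract does not reproduce the RL density, so the estimators cannot be written out for that distribution here. As an illustration of the same two methods applied to the plain Rayleigh distribution (an assumption, simpler than the paper's RL model), the closed-form estimators and the reliability and hazard functions are:

```python
import math

def rayleigh_mle(x):
    """Maximum-likelihood estimate of the Rayleigh scale: sigma^2 = sum(x^2) / (2n)."""
    return math.sqrt(sum(v * v for v in x) / (2 * len(x)))

def rayleigh_moment(x):
    """Moment estimate: E[X] = sigma * sqrt(pi/2)  =>  sigma = mean(x) * sqrt(2/pi)."""
    return (sum(x) / len(x)) * math.sqrt(2 / math.pi)

def reliability(t, sigma):
    """R(t) = exp(-t^2 / (2 sigma^2))"""
    return math.exp(-t * t / (2 * sigma * sigma))

def hazard(t, sigma):
    """h(t) = f(t) / R(t) = t / sigma^2 (increasing in t)."""
    return t / (sigma * sigma)
```

A simulation comparison like the one described would generate samples of several sizes, apply both estimators, and compare them by mean squared error of the estimated reliability function.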
Many dynamic processes in different sciences are described by models of differential equations. These models explain the change in the behavior of the studied process over time by linking that behavior to its derivatives. They often contain constant and time-varying parameters that vary according to the nature of the process under study. In this work we estimate the constant and time-varying parameters sequentially, in several stages. In the first stage, the state variables and their derivatives are estimated by the method of penalized splines (p-splines). In the second stage, pseudo least squares is used to estimate the constant parameters. For the third stage, the rem
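The abstract only names the first-stage smoother. As a hedged sketch of that stage, a penalized regression spline can be fitted by ridge-penalizing the knot coefficients of a truncated-cubic basis (one common p-spline formulation; the basis choice, knot placement, and `lam` are assumptions, not the paper's specification):

```python
import numpy as np

def penalized_spline_fit(t, y, knots, lam=1.0):
    """Penalised regression spline: truncated-cubic basis with a ridge penalty
    on the knot coefficients, which shrinks roughness of the fitted curve."""
    # basis columns: 1, t, t^2, t^3, then (t - k)_+^3 for each knot k
    B = np.column_stack([t ** p for p in range(4)] +
                        [np.clip(t - k, 0, None) ** 3 for k in knots])
    P = np.zeros((B.shape[1], B.shape[1]))
    P[4:, 4:] = np.eye(len(knots))  # penalise only the knot terms
    a = np.linalg.solve(B.T @ B + lam * P, B.T @ y)
    return B @ a
```

The fitted coefficients also give the derivative estimates needed in the later stages, by differentiating the basis functions analytically.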
This research reviews the least absolute deviations method, based on linear programming, for estimating the parameters of a simple linear regression model, and gives an overview of that model. We model the absolute deviations using a proposed measure of dispersion and build a simple linear regression model based on that measure. The aim of the work is to obtain estimates that are not affected by abnormal (outlying) values, using a numerical method with the lowest possible number of iterations.
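The standard linear-programming formulation of least absolute deviations splits each residual into nonnegative parts u and v and minimizes their sum; a minimal sketch (using `scipy.optimize.linprog` as the solver, which is an implementation choice, not the paper's):

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(x, y):
    """Least absolute deviations fit of y = b0 + b1*x as a linear program:
    minimise sum(u_i + v_i) subject to b0 + b1*x_i + u_i - v_i = y_i, u, v >= 0."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])           # design matrix [1, x]
    # decision vector: [b0, b1, u_1..u_n, v_1..v_n]
    c = np.concatenate([np.zeros(2), np.ones(2 * n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * 2 + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[0], res.x[1]                      # intercept, slope
```

Unlike least squares, this fit is unmoved by a single gross outlier, which is the robustness property the abstract refers to.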
This research studies paired (panel) data models with mixed random parameters, which contain two types of parameters: one random and the other fixed. The random parameter arises from differences in the marginal slopes of the cross sections, while the fixed parameter arises from differences in the fixed terms (intercepts) and the random errors of each section. The errors carry the property of heteroscedasticity in addition to first-order serial correlation. The main objective of this research is to use efficient methods suited to paired data in the case of small samples; to achieve this goal, the feasible generalized least squares (FGLS)
In general, the importance of cluster analysis is that one can evaluate elements by grouping homogeneous data; the main objective of this analysis is to collect the elements of a single homogeneous group into divisions separate from the others, depending on many variables. This method of analysis is used to reduce data, to generate and test hypotheses, and to predict and match models. The research aims to evaluate fuzzy cluster analysis, which is a special case of cluster analysis, and to compare the two methods: classical and fuzzy cluster analysis. The research topic is government and private hospitals; the sample comprised 288 patients being treated in 10 hospitals.
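The key difference between the two methods compared above is that fuzzy clustering assigns each element a degree of membership in every cluster rather than a single hard label. A minimal fuzzy c-means sketch (the fuzzifier `m`, iteration count, and centre initialization are conventional defaults, not the paper's settings):

```python
import numpy as np

def fuzzy_c_means(X, c=2, m=2.0, n_iter=100):
    """Fuzzy c-means: every point gets a degree of membership in each cluster
    (rows of U sum to 1), unlike the hard assignment of classical k-means.
    Centres are initialised to the first c data points (enough for a sketch)."""
    centers = X[:c].astype(float)
    U = None
    for _ in range(n_iter):
        # distance from every point to every centre
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-10)                      # guard against zero distance
        inv = d ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)   # membership matrix (n, c)
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return centers, U
```

In the hospital-patient setting described, the membership matrix U would express how strongly each patient belongs to each group instead of forcing an all-or-nothing assignment.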
The author's keenness to publish this book stemmed from a firm conviction that the field of evaluation and measurement needs a modern scientific book that presents the tools of testing and measurement, represented by validity and reliability, and that is clear in expressing the concepts, the terminology, and the types of each, so that it becomes a simplified resource in the hands of professors, researchers, and graduate (master's and doctoral) students for extracting the validity and reliability of tests and scales by advanced statistical methods through the use of the program
Biomass is a popular renewable carbon source because it has great potential as a substitute for scarce fossil fuels and has been used to make essential compounds such as 5-hydroxymethylfurfural (HMF). Glucose, one of the main components of biomass, has been extensively studied as a precursor for the production of HMF. Several efforts have been made to find efficient and reproducible procedures for the synthesis of HMF, a platform chemical used in the manufacture of fuels and other high-value compounds. Sulfonated graphite (SG) was produced from spent dry batteries and used as a catalyst to convert glucose to HMF. Temperature, reaction time, and catalyst loading were the variables studied. When dimethyl sulfo
In this study, a three-dimensional model was created to simulate groundwater in the Al-Haydariyah area of the Al-Najaf governorate. A solid model was built from the cross sections of 25 boreholes in the research region; it consists of two layers, sand and clay. Steady-state calibration against six observation wells was used to calibrate the model and establish the hydraulic conductivity, which was 17.49 m/d for sand and 1.042 m/d for clay, with a recharge rate of 0.00007 m/day. The wells in the research region were reallocated with a distance of 1500 m between wells, resulting in 140 wells evenly distributed throughout the study area, each with a discharge of 5 l/s, and the scenarios were run for 1000