Big data analysis has important applications in many areas such as sensor networks and connected healthcare. High volume and velocity of big data bring many challenges to data analysis. One possible solution is to summarize the data and provides a manageable data structure to hold a scalable summarization of data for efficient and effective analysis. This research extends our previous work on developing an effective technique to create, organize, access, and maintain summarization of big data and develops algorithms for Bayes classification and entropy discretization of large data sets using the multi-resolution data summarization structure. Bayes classification and data discretization play essential roles in many learning algorithms such as decision tree and nearest neighbor search. The proposed method can handle streaming data efficiently and, for entropy discretization, provide su the optimal split value.
The issue of penalized regression model has received considerable critical attention to variable selection. It plays an essential role in dealing with high dimensional data. Arctangent denoted by the Atan penalty has been used in both estimation and variable selection as an efficient method recently. However, the Atan penalty is very sensitive to outliers in response to variables or heavy-tailed error distribution. While the least absolute deviation is a good method to get robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator from combining these two ideas at once. Simulation experiments and real data applications show that the proposed LAD-Atan estimator
... Show MoreThe Machine learning methods, which are one of the most important branches of promising artificial intelligence, have great importance in all sciences such as engineering, medical, and also recently involved widely in statistical sciences and its various branches, including analysis of survival, as it can be considered a new branch used to estimate the survival and was parallel with parametric, nonparametric and semi-parametric methods that are widely used to estimate survival in statistical research. In this paper, the estimate of survival based on medical images of patients with breast cancer who receive their treatment in Iraqi hospitals was discussed. Three algorithms for feature extraction were explained: The first principal compone
... Show MoreGround-based active optical sensors (GBAOS) have been successfully used in agriculture to predict crop yield potential (YP) early in the season and to improvise N rates for optimal crop yield. However, the models were found weak or inconsistent due to environmental variation especially rainfall. The objectives of the study were to evaluate if GBAOS could predict YP across multiple locations, soil types, cultivation systems, and rainfall differences. This study was carried from 2011 to 2013 on corn (Zea mays L.) in North Dakota, and in 2017 in potatoes in Maine. Six N rates were used on 50 sites in North Dakota and 12 N rates on two sites, one dryland and one irrigated, in Maine. Two active GBAOS used for this study were GreenSeeker and Holl
... Show MoreThe financial markets are one of the sectors whose data is characterized by continuous movement in most of the times and it is constantly changing, so it is difficult to predict its trends , and this leads to the need of methods , means and techniques for making decisions, and that pushes investors and analysts in the financial markets to use various and different methods in order to reach at predicting the movement of the direction of the financial markets. In order to reach the goal of making decisions in different investments, where the algorithm of the support vector machine and the CART regression tree algorithm are used to classify the stock data in order to determine
... Show MoreThis study aims to estimate the accuracy of digital elevation models (DEM) which are created with exploitation of open source Google Earth data and comparing with the widely available DEM datasets, Shuttle Radar Topography Mission (SRTM), version 3, and Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM), version 2. The GPS technique is used in this study to produce digital elevation raster with a high level of accuracy, as reference raster, compared to the DEM datasets. Baghdad University, Al Jadriya campus, is selected as a study area. Besides, 151 reference points were created within the study area to evaluate the results based on the values of RMS.Furthermore, th
... Show MoreMersing is one of the places that have the potential for wind power development in Malaysia. Researchers often suggest it as an ideal place for generating electricity from wind power. However, before a location is chosen, several factors need to be considered. By analyzing the location ahead of time, resource waste can be avoided and maximum profitability to various parties can be realized. For this study, the focus is to identify the distribution of the wind speed of Mersing and to determine the optimal average of wind speed. This study is critical because the wind speed data for any region has its distribution. It changes daily and by season. Moreover, no determination has been made regarding selecting the average wind speed used for w
... Show MoreThe data preprocessing step is an important step in web usage mining because of the nature of log data, which are heterogeneous, unstructured, and noisy. Given the scalability and efficiency of algorithms in pattern discovery, a preprocessing step must be applied. In this study, the sequential methodologies utilized in the preprocessing of data from web server logs, with an emphasis on sub-phases, such as session identification, user identification, and data cleansing, are comprehensively evaluated and meticulously examined.
In this research want to make analysis for some indicators and it's classifications that related with the teaching process and the scientific level for graduate studies in the university by using analysis of variance for ranked data for repeated measurements instead of the ordinary analysis of variance . We reach many conclusions for the
important classifications for each indicator that has affected on the teaching process. &nb
... Show MoreIntroduction: Nitrofurantoin (NFT) is abroad spectrum bactericidal antibiotic. The bioavailability of NFT is affected by many factors. Samafurantin® tablets containing 50 mg NFT were manufactured by Samarra drug industry. Urinary excretion studies were employed since; the urinary tract is the main site of NFT action and excretion. Objective: The objective of the study was to investigate the effect of Uricol® and food on secondary pharmacokinetic parameters of Samafurantin® tablets with different doses by applying urinary data. Methods: Twelve healthy male volunteers participated in this study. Urine samples were collected from each volunteer after overnight fasting at a specified time intervals which considered as a blank sample for meas
... Show MoreIn this research, a factorial experiment (4*4) was studied, applied in a completely random block design, with a size of observations, where the design of experiments is used to study the effect of transactions on experimental units and thus obtain data representing experiment observations that The difference in the application of these transactions under different environmental and experimental conditions It causes noise that affects the observation value and thus an increase in the mean square error of the experiment, and to reduce this noise, multiple wavelet reduction was used as a filter for the observations by suggesting an improved threshold that takes into account the different transformation levels based on the logarithm of the b
... Show More