Big data analysis has important applications in many areas such as sensor networks and connected healthcare. High volume and velocity of big data bring many challenges to data analysis. One possible solution is to summarize the data and provides a manageable data structure to hold a scalable summarization of data for efficient and effective analysis. This research extends our previous work on developing an effective technique to create, organize, access, and maintain summarization of big data and develops algorithms for Bayes classification and entropy discretization of large data sets using the multi-resolution data summarization structure. Bayes classification and data discretization play essential roles in many learning algorithms such as decision tree and nearest neighbor search. The proposed method can handle streaming data efficiently and, for entropy discretization, provide su the optimal split value.
The issue of penalized regression model has received considerable critical attention to variable selection. It plays an essential role in dealing with high dimensional data. Arctangent denoted by the Atan penalty has been used in both estimation and variable selection as an efficient method recently. However, the Atan penalty is very sensitive to outliers in response to variables or heavy-tailed error distribution. While the least absolute deviation is a good method to get robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator from combining these two ideas at once. Simulation experiments and real data applications show that the proposed LAD-Atan estimator
... Show MoreIt has increasingly been recognised that the future developments in geospatial data handling will centre on geospatial data on the web: Volunteered Geographic Information (VGI). The evaluation of VGI data quality, including positional and shape similarity, has become a recurrent subject in the scientific literature in the last ten years. The OpenStreetMap (OSM) project is the most popular one of the leading platforms of VGI datasets. It is an online geospatial database to produce and supply free editable geospatial datasets for a worldwide. The goal of this paper is to present a comprehensive overview of the quality assurance of OSM data. In addition, the credibility of open source geospatial data is discussed, highlight
... Show MoreThe Machine learning methods, which are one of the most important branches of promising artificial intelligence, have great importance in all sciences such as engineering, medical, and also recently involved widely in statistical sciences and its various branches, including analysis of survival, as it can be considered a new branch used to estimate the survival and was parallel with parametric, nonparametric and semi-parametric methods that are widely used to estimate survival in statistical research. In this paper, the estimate of survival based on medical images of patients with breast cancer who receive their treatment in Iraqi hospitals was discussed. Three algorithms for feature extraction were explained: The first principal compone
... Show MoreGround-based active optical sensors (GBAOS) have been successfully used in agriculture to predict crop yield potential (YP) early in the season and to improvise N rates for optimal crop yield. However, the models were found weak or inconsistent due to environmental variation especially rainfall. The objectives of the study were to evaluate if GBAOS could predict YP across multiple locations, soil types, cultivation systems, and rainfall differences. This study was carried from 2011 to 2013 on corn (Zea mays L.) in North Dakota, and in 2017 in potatoes in Maine. Six N rates were used on 50 sites in North Dakota and 12 N rates on two sites, one dryland and one irrigated, in Maine. Two active GBAOS used for this study were GreenSeeker and Holl
... Show MoreIt has increasingly been recognised that the future developments in geospatial data handling will centre on geospatial data on the web: Volunteered Geographic Information (VGI). The evaluation of VGI data quality, including positional and shape similarity, has become a recurrent subject in the scientific literature in the last ten years. The OpenStreetMap (OSM) project is the most popular one of the leading platforms of VGI datasets. It is an online geospatial database to produce and supply free editable geospatial datasets for a worldwide. The goal of this paper is to present a comprehensive overview of the quality assurance of OSM data. In addition, the credibility of open source geospatial data is discussed, highlighting the diff
... Show MoreThe issue of penalized regression model has received considerable critical attention to variable selection. It plays an essential role in dealing with high dimensional data. Arctangent denoted by the Atan penalty has been used in both estimation and variable selection as an efficient method recently. However, the Atan penalty is very sensitive to outliers in response to variables or heavy-tailed error distribution. While the least absolute deviation is a good method to get robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator from combining these two ideas at once. Simulation experiments and real data applications show that the p
... Show MoreThis study aims to estimate the accuracy of digital elevation models (DEM) which are created with exploitation of open source Google Earth data and comparing with the widely available DEM datasets, Shuttle Radar Topography Mission (SRTM), version 3, and Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM), version 2. The GPS technique is used in this study to produce digital elevation raster with a high level of accuracy, as reference raster, compared to the DEM datasets. Baghdad University, Al Jadriya campus, is selected as a study area. Besides, 151 reference points were created within the study area to evaluate the results based on the values of RMS.Furthermore, th
... Show MoreMersing is one of the places that have the potential for wind power development in Malaysia. Researchers often suggest it as an ideal place for generating electricity from wind power. However, before a location is chosen, several factors need to be considered. By analyzing the location ahead of time, resource waste can be avoided and maximum profitability to various parties can be realized. For this study, the focus is to identify the distribution of the wind speed of Mersing and to determine the optimal average of wind speed. This study is critical because the wind speed data for any region has its distribution. It changes daily and by season. Moreover, no determination has been made regarding selecting the average wind speed used for w
... Show MoreIntroduction: Nitrofurantoin (NFT) is abroad spectrum bactericidal antibiotic. The bioavailability of NFT is affected by many factors. Samafurantin® tablets containing 50 mg NFT were manufactured by Samarra drug industry. Urinary excretion studies were employed since; the urinary tract is the main site of NFT action and excretion. Objective: The objective of the study was to investigate the effect of Uricol® and food on secondary pharmacokinetic parameters of Samafurantin® tablets with different doses by applying urinary data. Methods: Twelve healthy male volunteers participated in this study. Urine samples were collected from each volunteer after overnight fasting at a specified time intervals which considered as a blank sample for meas
... Show More