Big data analysis has important applications in many areas such as sensor networks and connected healthcare. High volume and velocity of big data bring many challenges to data analysis. One possible solution is to summarize the data and provides a manageable data structure to hold a scalable summarization of data for efficient and effective analysis. This research extends our previous work on developing an effective technique to create, organize, access, and maintain summarization of big data and develops algorithms for Bayes classification and entropy discretization of large data sets using the multi-resolution data summarization structure. Bayes classification and data discretization play essential roles in many learning algorithms such as decision tree and nearest neighbor search. The proposed method can handle streaming data efficiently and, for entropy discretization, provide su the optimal split value.
It has increasingly been recognised that the future developments in geospatial data handling will centre on geospatial data on the web: Volunteered Geographic Information (VGI). The evaluation of VGI data quality, including positional and shape similarity, has become a recurrent subject in the scientific literature in the last ten years. The OpenStreetMap (OSM) project is the most popular one of the leading platforms of VGI datasets. It is an online geospatial database to produce and supply free editable geospatial datasets for a worldwide. The goal of this paper is to present a comprehensive overview of the quality assurance of OSM data. In addition, the credibility of open source geospatial data is discussed, highlighting the diff
... Show MoreThe issue of penalized regression model has received considerable critical attention to variable selection. It plays an essential role in dealing with high dimensional data. Arctangent denoted by the Atan penalty has been used in both estimation and variable selection as an efficient method recently. However, the Atan penalty is very sensitive to outliers in response to variables or heavy-tailed error distribution. While the least absolute deviation is a good method to get robustness in regression estimation. The specific objective of this research is to propose a robust Atan estimator from combining these two ideas at once. Simulation experiments and real data applications show that the p
... Show MoreIt has increasingly been recognised that the future developments in geospatial data handling will centre on geospatial data on the web: Volunteered Geographic Information (VGI). The evaluation of VGI data quality, including positional and shape similarity, has become a recurrent subject in the scientific literature in the last ten years. The OpenStreetMap (OSM) project is the most popular one of the leading platforms of VGI datasets. It is an online geospatial database to produce and supply free editable geospatial datasets for a worldwide. The goal of this paper is to present a comprehensive overview of the quality assurance of OSM data. In addition, the credibility of open source geospatial data is discussed, highlight
... Show MoreMissing data is one of the problems that may occur in regression models. This problem is usually handled by deletion mechanism available in statistical software. This method reduces statistical inference values because deletion affects sample size. In this paper, Expectation Maximization algorithm (EM), Multicycle-Expectation-Conditional Maximization algorithm (MC-ECM), Expectation-Conditional Maximization Either (ECME), and Recurrent Neural Networks (RNN) are used to estimate multiple regression models when explanatory variables have some missing values. Experimental dataset were generated using Visual Basic programming language with missing values of explanatory variables according to a missing mechanism at random general pattern and s
... Show MoreThe Machine learning methods, which are one of the most important branches of promising artificial intelligence, have great importance in all sciences such as engineering, medical, and also recently involved widely in statistical sciences and its various branches, including analysis of survival, as it can be considered a new branch used to estimate the survival and was parallel with parametric, nonparametric and semi-parametric methods that are widely used to estimate survival in statistical research. In this paper, the estimate of survival based on medical images of patients with breast cancer who receive their treatment in Iraqi hospitals was discussed. Three algorithms for feature extraction were explained: The first principal compone
... Show MoreThe study included the collection of samples of raw cow milk to isolate Leuconostoc bacteria, samples were sub cultured on De-Man Rogosa Sharpe-Vancomycin medium, the pure colonies were selected and subjected to the cultural and microscopically tests, according to that 25 cocci bacterial isolates were obtained, then isolates were subjected to biochemical tests. Result of tests showed that 12 isolates belong to the genus Leuconostoc out of 25 cocci bacterial isolates, Vitek2 system was used as a supplementary step. Results of final identification showed that 3 sub species were obtained included Leuconostoc mesenteroides ssp. cremoris 9 out of 12 isolates, while it was 2 isolates of Leuconostoc mesenteroides ssp. mesenteroides and one isol
... Show MoreEconomic analysis plays a pivotal role in managerial decision-making processes. This analysis is predicated on deeply understanding economic forces and market factors influencing corporate strategies and decisions. This paper delves into the role of economic data analysis in managing small and medium-sized enterprises (SMEs) to make strategic decisions and enhance performance. The study underscores the significance of this approach and its impact on corporate outcomes. The research analyzes annual reports from three companies: Al-Mahfaza for Mobile and Internet Financial Payment and Settlement Services Company Limited, Al-Arab for Electronic Payment Company, and Iraq Electronic Gateway for Financial Services Company. The paper concl
... Show MoreThis study aims to estimate the accuracy of digital elevation models (DEM) which are created with exploitation of open source Google Earth data and comparing with the widely available DEM datasets, Shuttle Radar Topography Mission (SRTM), version 3, and Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM), version 2. The GPS technique is used in this study to produce digital elevation raster with a high level of accuracy, as reference raster, compared to the DEM datasets. Baghdad University, Al Jadriya campus, is selected as a study area. Besides, 151 reference points were created within the study area to evaluate the results based on the values of RMS.Furthermore, th
... Show MoreMersing is one of the places that have the potential for wind power development in Malaysia. Researchers often suggest it as an ideal place for generating electricity from wind power. However, before a location is chosen, several factors need to be considered. By analyzing the location ahead of time, resource waste can be avoided and maximum profitability to various parties can be realized. For this study, the focus is to identify the distribution of the wind speed of Mersing and to determine the optimal average of wind speed. This study is critical because the wind speed data for any region has its distribution. It changes daily and by season. Moreover, no determination has been made regarding selecting the average wind speed used for w
... Show More