Big data analysis is essential for modern applications in areas such as healthcare, assistive technology, intelligent transportation, and environment and climate monitoring. Traditional algorithms in data mining and machine learning do not scale well with data size. Mining and learning from big data need time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. We have developed a data aggregation structure to summarize data with a large number of instances and data generated from multiple sources. Data are aggregated at multiple resolutions, and the resolution provides a trade-off between efficiency and accuracy. The structure is built once, updated incrementally, and serves as a common data input for multiple mining and learning algorithms. Data mining algorithms are modified to accept the aggregated data as input. Hierarchical data aggregation serves as a paradigm under which novel …
This paper compares some commonly used clustering techniques. The agglomerative hierarchical clustering technique is compared with the k-means family, which includes the standard k-means technique, a variant of k-means, and bisecting k-means. Although hierarchical clustering is considered one of the best clustering methods, its use is limited by its time complexity. The results, based on an analysis of the characteristics of the clustering algorithms and the nature of the data, show that bisecting k-means performs best among the methods compared.
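To make the bisecting k-means idea concrete, the following is a minimal sketch, not the implementation evaluated in the paper. It starts from one cluster and repeatedly splits the cluster with the largest sum of squared errors (SSE) using a 2-means step. The function names, the deterministic seeding of the 2-means step, and the "largest SSE" selection rule are assumptions made for illustration.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def sse(points):
    """Sum of squared distances from each point to the cluster centroid."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points)

def two_means(points, iters=20):
    """Split one cluster in two with Lloyd iterations.

    Assumption: seeds are chosen deterministically (the point farthest from
    the centroid, then the point farthest from that seed) instead of randomly.
    """
    center = centroid(points)
    c1 = max(points, key=lambda p: sq_dist(p, center))
    c2 = max(points, key=lambda p: sq_dist(p, c1))
    left, right = list(points), []
    for _ in range(iters):
        new_left = [p for p in points if sq_dist(p, c1) <= sq_dist(p, c2)]
        new_right = [p for p in points if sq_dist(p, c1) > sq_dist(p, c2)]
        if not new_left or not new_right or new_left == left:
            break  # degenerate split or converged
        left, right = new_left, new_right
        c1, c2 = centroid(left), centroid(right)
    return left, right

def bisecting_kmeans(points, k):
    """Repeatedly bisect the cluster with the largest SSE until k clusters exist."""
    clusters = [list(points)]
    while len(clusters) < k:
        splittable = [c for c in clusters if len(c) >= 2]
        if not splittable:
            break
        target = max(splittable, key=sse)
        left, right = two_means(target)
        if not right:
            break  # all points identical; nothing left to split
        clusters.remove(target)
        clusters += [left, right]
    return clusters
```

Each bisection only touches the points of one cluster, which is one reason the approach is cheaper than agglomerative clustering, where all pairwise cluster distances must be considered.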
Predicting formation pore and fracture pressure before designing a well drilling program is crucial, since it helps prevent several drilling problems, including lost circulation, kicks, pipe sticking, and blowouts. IP (Interactive Petrophysics) software is used to calculate pore and fracture pressure. The Eaton, Matthews and Kelly, Modified Eaton, and Barker and Wood equations are used to calculate fracture pressure, whereas only the Eaton method is used to calculate pore pressure. These approaches are based on log data obtained from six wells: three from the north dome (BUCN-52, BUCN-51, BUCN-43) and three from the south dome (BUCS-49, BUCS-48, BUCS-47). Along with the overburden pressur
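As an illustration of the Eaton approach named above, the sketch below uses the commonly cited forms of Eaton's equations: the sonic-log pore pressure relation with exponent 3 and the Poisson's-ratio fracture gradient relation. The abstract does not say which log type or exponent the study used, so the sonic form, the function names, and all input values are assumptions for illustration only.

```python
def eaton_pore_pressure_gradient(overburden, normal, dt_normal, dt_observed,
                                 exponent=3.0):
    """Eaton pore pressure gradient from sonic transit time.

    overburden  : overburden stress gradient (e.g. psi/ft)
    normal      : normal (hydrostatic) pore pressure gradient, same units
    dt_normal   : transit time on the normal compaction trend
    dt_observed : measured transit time at the same depth
    exponent    : Eaton exponent; 3.0 is the usual sonic-log value (assumed)
    """
    ratio = dt_normal / dt_observed
    return overburden - (overburden - normal) * ratio ** exponent

def eaton_fracture_gradient(overburden, pore, poisson_ratio):
    """Eaton fracture gradient from overburden, pore pressure, and Poisson's ratio."""
    stress_ratio = poisson_ratio / (1.0 - poisson_ratio)
    return stress_ratio * (overburden - pore) + pore

# Hypothetical inputs, not values from the study.
pore = eaton_pore_pressure_gradient(1.0, 0.465, dt_normal=100.0, dt_observed=125.0)
frac = eaton_fracture_gradient(1.0, pore, poisson_ratio=0.25)
```

When the observed transit time equals the normal trend value, the pore pressure gradient reduces to the hydrostatic gradient, which is a quick sanity check on the inputs.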
Multilocus haplotype analysis of candidate variants with genome-wide association study (GWAS) data may provide evidence of association with disease, even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet this challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated d
This research studies the linear regression model when the random errors are autocorrelated and normally distributed. Linear regression analysis describes the relationship between variables, and through this relationship the value of one variable can be predicted from the values of the others. Four estimation methods (least squares, the unweighted average method, the Theil method, and the Laplace method) were compared using the mean squared error (MSE) criterion in a simulation study with four sample sizes (15, 30, 60, 100). The results showed that the least-squares method is best. The four methods were then applied to buckwheat production data and the cultivated area of the provinces of Iraq for the years 2010, 2011, 2012,
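The kind of comparison described above can be sketched as a small Monte Carlo experiment. The code below is only an illustration under stated assumptions: it covers two of the four estimators (least squares and the Theil median-of-pairwise-slopes estimator), generates the autocorrelated errors as an AR(1) process with a hypothetical coefficient `rho`, and measures the MSE of the estimated slope. None of these settings is taken from the study.

```python
import random
from statistics import mean, median

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den

def theil_slope(x, y):
    """Theil estimator: median of the slopes over all pairs of points."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(len(x)) for j in range(i + 1, len(x))
              if x[j] != x[i]]
    return median(slopes)

def simulate_mse(estimator, n, true_slope=2.0, rho=0.5, reps=200, seed=0):
    """Monte Carlo MSE of a slope estimator under AR(1) normal errors.

    Assumed model: y_t = 1 + true_slope * x_t + e_t, e_t = rho * e_{t-1} + u_t,
    with u_t standard normal. All parameter values are hypothetical.
    """
    rng = random.Random(seed)
    squared_errors = []
    for _ in range(reps):
        x = [float(t) for t in range(n)]
        e, y = 0.0, []
        for xt in x:
            e = rho * e + rng.gauss(0.0, 1.0)
            y.append(1.0 + true_slope * xt + e)
        squared_errors.append((estimator(x, y) - true_slope) ** 2)
    return mean(squared_errors)

# Compare the two estimators at the smallest sample size from the abstract.
mse_by_method = {name: simulate_mse(fn, 15)
                 for name, fn in (("ols", ols_slope), ("theil", theil_slope))}
```

Repeating the call for each sample size (15, 30, 60, 100) gives a table of MSE values per method, which is the form of comparison the abstract reports.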
Noor oil field is one of the smallest fields in Missan province. Twelve wells penetrate the Mishrif Formation in the Noor field, and eight of them were selected for this study. The Mishrif Formation is one of the most important reservoirs in the Noor field; it consists of one anticlinal dome and is bounded by the Khasib Formation at the top and the Rumaila Formation at the bottom. The reservoir was divided into eight units separated by isolated units, following the subdivision adopted in surrounding fields.
In this paper, frequency-distribution histograms of porosity, permeability, and water saturation were plotted for the MA unit of the Mishrif Formation in the Noor field and then transformed to a normal distribution by applying the Box-Cox transformation alg
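For readers unfamiliar with the Box-Cox transformation, the sketch below shows the standard definition, y = (x^λ − 1)/λ for λ ≠ 0 and y = ln x for λ = 0, together with a simple grid search that picks λ by maximizing the usual profile log-likelihood. The grid range, the function names, and the sample values are assumptions for illustration and are not taken from the study.

```python
import math

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def log_likelihood(values, lam):
    """Profile log-likelihood of lambda under a normal model for the transformed data."""
    y = box_cox(values, lam)
    n = len(y)
    mean_y = sum(y) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    return -0.5 * n * math.log(var_y) + (lam - 1.0) * sum(math.log(v) for v in values)

def best_lambda(values, grid=None):
    """Pick lambda from a grid (assumed range -2 to 2 in steps of 0.1)."""
    if grid is None:
        grid = [i / 10 for i in range(-20, 21)]
    return max(grid, key=lambda lam: log_likelihood(values, lam))

# Hypothetical log-symmetric sample; a lambda near 0 (log transform) is expected.
sample = [math.exp(z) for z in (-2, -1, 0, 1, 2)]
chosen = best_lambda(sample)
transformed = box_cox(sample, chosen)
```

Right-skewed properties such as permeability often end up with a λ close to 0, in which case the transformation is essentially a logarithm.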
Prediction of daily rainfall is important for flood forecasting, reservoir operation, and many other hydrological applications. Artificial intelligence (AI) algorithms are generally used for stochastic rainfall forecasting, but they are not capable of simulating unseen extreme rainfall events, which are becoming common due to climate change. A new model is developed in this study for predicting daily rainfall at different lead times based on sea level pressure (SLP), which is physically related to rainfall on land and is thus able to predict unseen rainfall events. Daily rainfall on the east coast of Peninsular Malaysia (PM) was predicted using SLP data over the climate domain. Five advanced AI algorithms such as extreme learning machine (ELM), Bay