The region-based association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set instead of individual variants with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. To tackle the problem of the sparse distribution, a two-stage approach was proposed in literature: In the first stage, haplotypes are computationally inferred from genotypes, followed by a haplotype coclassification. In the second stage, the association analysis is performed on the inferred haplotype groups. If a haplotype is unevenly distributed between the case and control samples, this haplotype is labeled as a risk haplotype. Unfortunately, the in-silico reconstruction of haplotypes might produce a proportion of false haplotypes which hamper the detection of rare but true haplotypes. Here, to address the issue, we propose an alternative approach: In Stage 1, we cluster genotypes instead of inferred haplotypes and estimate the risk genotypes based on a finite mixture model. In Stage 2, we infer risk haplotypes from risk genotypes inferred from the previous stage. To estimate the finite mixture model, we propose an EM algorithm with a novel data partition-based initialization. The performance of the proposed procedure is assessed by simulation studies and a real data analysis. Compared to the existing multiple Z-test procedure, we find that the power of genome-wide association studies can be increased by using the proposed procedure.
This paper provides an attempt for modeling rate of penetration (ROP) for an Iraqi oil field with aid of mud logging data. Data of Umm Radhuma formation was selected for this modeling. These data include weight on bit, rotary speed, flow rate and mud density. A statistical approach was applied on these data for improving rate of penetration modeling. As result, an empirical linear ROP model has been developed with good fitness when compared with actual data. Also, a nonlinear regression analysis of different forms was attempted, and the results showed that the power model has good predicting capability with respect to other forms.
Linear discriminant analysis and logistic regression are the most widely used in multivariate statistical methods for analysis of data with categorical outcome variables .Both of them are appropriate for the development of linear classification models .linear discriminant analysis has been that the data of explanatory variables must be distributed multivariate normal distribution. While logistic regression no assumptions on the distribution of the explanatory data. Hence ,It is assumed that logistic regression is the more flexible and more robust method in case of violations of these assumptions.
In this paper we have been focus for the comparison between three forms for classification data belongs
... Show MoreToday, problems of spatial data integration have been further complicated by the rapid development in communication technologies and the increasing amount of available data sources on the World Wide Web. Thus, web-based geospatial data sources can be managed by different communities and the data themselves can vary in respect to quality, coverage, and purpose. Integrating such multiple geospatial datasets remains a challenge for geospatial data consumers. This paper concentrates on the integration of geometric and classification schemes for official data, such as Ordnance Survey (OS) national mapping data, with volunteered geographic information (VGI) data, such as the data derived from the OpenStreetMap (OSM) project. Useful descriptions o
... Show MoreThe purpose of this work is to concurrently estimate the UVvisible spectra of binary combinations of piroxicam and mefenamic acid using the chemometric approach. To create the model, spectral data from 73 samples (with wavelengths between 200 and 400 nm) were employed. A two-layer artificial neural network model was created, with two neurons in the output layer and fourteen neurons in the hidden layer. The model was trained to simulate the concentrations and spectra of piroxicam and mefenamic acid. For piroxicam and mefenamic acid, respectively, the Levenberg-Marquardt algorithm with feed-forward back-propagation learning produced root mean square errors of prediction of 0.1679 μg/mL and 0.1154 μg/mL, with coefficients of determination of
... Show MoreBig data analysis is essential for modern applications in areas such as healthcare, assistive technology, intelligent transportation, environment and climate monitoring. Traditional algorithms in data mining and machine learning do not scale well with data size. Mining and learning from big data need time and memory efficient techniques, albeit the cost of possible loss in accuracy. We have developed a data aggregation structure to summarize data with large number of instances and data generated from multiple data sources. Data are aggregated at multiple resolutions and resolution provides a trade-off between efficiency and accuracy. The structure is built once, updated incrementally, and serves as a common data input for multiple mining an
... Show MoreWith the growth of mobile phones, short message service (SMS) became an essential text communication service. However, the low cost and ease use of SMS led to an increase in SMS Spam. In this paper, the characteristics of SMS spam has studied and a set of features has introduced to get rid of SMS spam. In addition, the problem of SMS spam detection was addressed as a clustering analysis that requires a metaheuristic algorithm to find the clustering structures. Three differential evolution variants viz DE/rand/1, jDE/rand/1, jDE/best/1, are adopted for solving the SMS spam problem. Experimental results illustrate that the jDE/best/1 produces best results over other variants in terms of accuracy, false-positive rate and false-negative
... Show MoreThe Light and the Dark is the fourth novel in a series written by Charles Percy Snow where it tackles a phase of gifted scholar and remarkable individual Roy Calvert as he search for a source of power and meaning in life to relieve his inner turmoil. The character Roy Calvert is based on Snow's friend, Charles Allbery who exposes the message the character of Roy intends to convey in a certain phase of his life and the prophecy the novel carries amid catastrophe so widespread in the thirties of the twentieth century
Abstract
Bivariate time series modeling and forecasting have become a promising field of applied studies in recent times. For this purpose, the Linear Autoregressive Moving Average with exogenous variable ARMAX model is the most widely used technique over the past few years in modeling and forecasting this type of data. The most important assumptions of this model are linearity and homogenous for random error variance of the appropriate model. In practice, these two assumptions are often violated, so the Generalized Autoregressive Conditional Heteroscedasticity (ARCH) and (GARCH) with exogenous varia
... Show More