Region-based association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, rather than of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. To tackle the problem of the sparse distribution, a two-stage approach was proposed in the literature. In the first stage, haplotypes are computationally inferred from genotypes, followed by a haplotype co-classification. In the second stage, the association analysis is performed on the inferred haplotype groups. If a haplotype is unevenly distributed between the case and control samples, it is labeled as a risk haplotype. Unfortunately, the in-silico reconstruction of haplotypes might produce a proportion of false haplotypes, which hamper the detection of rare but true haplotypes. To address this issue, we propose an alternative approach. In Stage 1, we cluster genotypes instead of inferred haplotypes and estimate the risk genotypes based on a finite mixture model. In Stage 2, we infer risk haplotypes from the risk genotypes obtained in Stage 1. To estimate the finite mixture model, we propose an EM algorithm with a novel data partition-based initialization. The performance of the proposed procedure is assessed by simulation studies and a real data analysis. Compared to the existing multiple Z-test procedure, we find that the power of genome-wide association studies can be increased by using the proposed procedure.
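The EM estimation of a finite mixture mentioned above can be illustrated with a minimal sketch. It is not the authors' genotype model: it assumes a one-dimensional Gaussian mixture, and the function names (`partition_init`, `em_gaussian_mixture`) are hypothetical. The initialization simply sorts the data and splits it into contiguous chunks, a plain stand-in for the paper's data partition-based initialization, whose details the abstract does not give.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def partition_init(data, k):
    """Assumed stand-in initialization: sort the data, split it into k
    contiguous chunks, and use each chunk's weight, mean and variance."""
    ordered = sorted(data)
    n = len(ordered)
    chunks = [ordered[i * n // k:(i + 1) * n // k] for i in range(k)]
    weights = [len(c) / n for c in chunks]
    means = [sum(c) / len(c) for c in chunks]
    variances = [
        max(sum((x - m) ** 2 for x in c) / len(c), 1e-6)
        for c, m in zip(chunks, means)
    ]
    return weights, means, variances


def em_gaussian_mixture(data, k=2, n_iter=200, tol=1e-8):
    """EM for a k-component 1-D Gaussian mixture (requires len(data) >= k).

    Returns (weights, means, variances, log_likelihood), where the
    log-likelihood is the one evaluated in the last E-step.
    """
    weights, means, variances = partition_init(data, k)
    prev_ll = -math.inf
    ll = prev_ll
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        ll = 0.0
        for x in data:
            joint = [w * normal_pdf(x, m, v)
                     for w, m, v in zip(weights, means, variances)]
            total = max(sum(joint), 1e-300)  # guard against underflow
            ll += math.log(total)
            resp.append([j / total for j in joint])
        # M-step: re-estimate parameters from the weighted data.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                1e-6,
            )
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances, ll
```

Calling `em_gaussian_mixture(values, k=2)` on a list of floats returns the estimated mixing weights, component means and variances. Starting from a partition of the sorted data keeps the initial components apart, which is one reason partition-style initializations are used instead of purely random starts.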
String matching is one of the essential problems in computer science, and a variety of computer applications provide string matching services to their end users. The remarkable growth in the amount of data created and stored by modern computational devices drives researchers to seek ever more powerful methods for coping with this problem. In this research, the Quick Search string matching algorithm is adapted for implementation in a multi-core environment using OpenMP directives, which can be employed to reduce the overall execution time of the program. English text, protein and DNA data types are used to examine the effect of parallelization and implementation of Quick Search string matching algorithm on multi-co
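As a rough illustration of the technique named in this abstract, the sketch below implements the Quick Search (Sunday) bad-character rule in Python and then splits the text into overlapping chunks, which is one common way to divide such a search among workers. The study itself used OpenMP; the Python thread pool here is only an assumed stand-in for showing the decomposition and will not give a real multi-core speedup. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def quick_search(text, pattern):
    """Return all start positions of pattern in text (Quick Search)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Bad-character table keyed on the character just past the window:
    # shift = m - (last index of the character in the pattern).
    shift = {ch: m - i for i, ch in enumerate(pattern)}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        if pos + m >= n:
            break  # no character after the window, nothing left to scan
        # Characters absent from the pattern allow a full jump of m + 1.
        pos += shift.get(text[pos + m], m + 1)
    return matches


def chunked_quick_search(text, pattern, n_chunks=4):
    """Search overlapping chunks independently and merge the results.

    Each chunk is extended by m - 1 characters so that a match starting
    near a chunk boundary is still found, and found exactly once.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    size = max(m, -(-n // n_chunks))  # ceiling division

    def search_chunk(start):
        end = min(n, start + size + m - 1)
        return [start + i for i in quick_search(text[start:end], pattern)]

    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = list(pool.map(search_chunk, range(0, n, size)))
    return sorted(p for part in parts for p in part)
```

The overlap of `m - 1` characters is the key design choice: without it, occurrences straddling two chunks would be missed, and with a larger overlap they would be reported twice.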
In most manufacturing processes, despite statistical control, several process capability indices indicate that the true mean (µc) deviates from the target mean (µT) and that the variation is high. In this paper, data for a blow-molded plastic product, the Zahi Bottle (ZB), have been analyzed and studied. WinQSB software was used to carry out the statistical process control, the process capability analysis, and the calculation of some capability indices. The relationships between the different process capability indices and the true mean of the process, and then the standard deviation (σ), were represented, with the aim of achieving a process capability value that can reduce the standard deviation and improve production out of theoretical con
Concrete structures are exposed to aggressive environmental conditions that lead to corrosion of the embedded reinforcement and pre-stressing steel. Consequently, the safety of concrete structures may be compromised, and significant budgets are required to repair and maintain critical infrastructure. Prediction of structural safety can lead to significant reductions in maintenance costs by maximizing the impact of investments. The aim of this paper is to establish a framework to assess the reliability of existing post-tensioned concrete bridges. A time-dependent reliability analysis of an existing post-tensioned bridge, the Ynys-y-Gwas bridge, is presented in this study. The main cause of failure of this bridge was c
In this study, we compared the LASSO and SCAD methods, two penalty methods for dealing with partial quantile regression models. The Nadaraya-Watson kernel estimator was used to estimate the non-parametric part, and the rule-of-thumb method was used to choose the smoothing bandwidth (h). The penalty methods proved to be efficient in estimating the regression coefficients, but the SCAD method was the best according to the mean squared error (MSE) criterion after the missing data were estimated using the mean imputation method.
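The kernel step described in this abstract can be sketched as follows. This is a minimal illustration, not the study's code: it assumes a Gaussian kernel and takes "rule of thumb" to mean the normal-reference bandwidth h = 1.06 · s · n^(-1/5), which is one common reading; the abstract does not say which rule was used. The function names are hypothetical.

```python
import math
import statistics


def rule_of_thumb_bandwidth(x):
    """Normal-reference bandwidth h = 1.06 * s * n**(-1/5) (assumed rule)."""
    n = len(x)
    return 1.06 * statistics.stdev(x) * n ** (-1 / 5)


def nadaraya_watson(x_train, y_train, x0, h):
    """Nadaraya-Watson estimate of E[y | x = x0] with a Gaussian kernel.

    The estimate is a weighted average of the responses, with weights
    that decay as the training points move away from x0.
    """
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("x0 is too far from the data for this bandwidth")
    return sum(w * yi for w, yi in zip(weights, y_train)) / total
```

For example, with `h = rule_of_thumb_bandwidth(x)`, the call `nadaraya_watson(x, y, 1.0, h)` gives the smoothed response at x = 1.0. A larger h yields a smoother but more biased curve, which is why the bandwidth choice matters for the non-parametric part.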
Shatt Al-Hilla is considered one of the important branches of the Euphrates River, supplying irrigation water to millions of dunams of planted areas. It is important to control the velocity and water level along the river in order to maintain the level required for easily diverting water to the branches located along it. In this research, a numerical model was therefore developed to simulate the gradually varied unsteady flow in Shatt Al-Hilla. The present study aims to solve the continuity and momentum (Saint-Venant) equations numerically, using the Galerkin finite element method, to predict the hydraulic characteristics of the river. A computer program was designed and built using the programming language FORTRAN-77. Fifty kilometers was consid
Poverty is a very substantial topic that determines the future of societies and governments and the way they deal with education, health and the economy. Poverty sometimes takes on multidimensional forms through education and health. This research aims to study multidimensional poverty in Iraq by using penalized regression methods to analyze Big Data sets from demographic surveys collected by the Central Statistical Organization in Iraq. We chose a classical penalized regression method, Ridge Regression, and another penalized method, the Smooth Integration of Counting and Absolute Deviation (SICA), to analyze Big Data sets related to the different forms of poverty in Iraq. Euclidean Distanc
In mixture experiments, the response variable depends on the proportions of the components of the mixture. In our research we compare the Scheffé model with the Kronecker model for mixture experiments, especially when the experimental region is restricted.
Because mixture experiments suffer from high correlation and multicollinearity among the explanatory variables, which affects the calculation of the Fisher information matrix of the regression model, we used the generalized inverse and the stepwise regression procedure to estimate the parameters of the mixture model.
In this research, we studied a kinetic method for estimating the quantity of three kinds of insecticide, and of their mixture, that are used in agriculture. The insecticides extracted from samples of air, soil, and tree leaves polluted with these insecticides were reacted with H2O2 and benzidine. The kinetic study of this reaction was carried out in a basic medium (pH = 8.6) using UV spectra at (λ = 420 nm). The study showed that the reaction is first order, and the reaction rate was used to estimate the concentration of insecticide in solution and in the mixture. The experiments of this study indicated that this method has the speed and efficiency for quantitatively estimating these
The estimation of the reliability function depends on the accuracy of the data used to estimate the parameters of the probability distribution. Because some data are skewed, estimating the parameters and calculating the reliability function in the presence of that skew requires a distribution that is flexible in dealing with such data. This is the case for the data of the Diyala Company for Electrical Industries, where a positive skew was observed in the data collected from the Power and Machinery Department, which required a distribution that can handle such data and a search for methods that accommodate this problem and lead to accurate estimates of the reliability function,