Search for risk haplotype segments with GWAS data by use of finite mixture models

ALI Fadhaa; Jian Zhang

doi:10.4310/SII.2016.v9.n3.a2

Details

Publication Date

Fri Jan 01 2016

Journal Name

Statistics And Its Interface

Volume

9

Region-based association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, instead of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. To tackle the problem of the sparse distribution, a two-stage approach was proposed in the literature. In the first stage, haplotypes are computationally inferred from genotypes, followed by a haplotype co-classification. In the second stage, the association analysis is performed on the inferred haplotype groups. If a haplotype is unevenly distributed between the case and control samples, it is labeled as a risk haplotype. Unfortunately, the in-silico reconstruction of haplotypes might produce a proportion of false haplotypes, which hampers the detection of rare but true haplotypes. Here, to address the issue, we propose an alternative approach. In Stage 1, we cluster genotypes instead of inferred haplotypes and estimate the risk genotypes based on a finite mixture model. In Stage 2, we infer risk haplotypes from the risk genotypes identified in Stage 1. To estimate the finite mixture model, we propose an EM algorithm with a novel data partition-based initialization. The performance of the proposed procedure is assessed by simulation studies and a real data analysis. Compared to the existing multiple Z-test procedure, we find that the power of genome-wide association studies can be increased by using the proposed procedure.
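To make the Stage 1 idea concrete, the following is a minimal sketch of EM for a two-component mixture, not the model from the paper. It assumes, purely for illustration, that each distinct genotype is summarized by a case count and a total count, that the case count is binomial, and that a genotype is either "baseline" or "risk" with a different case probability. The function names (`em_risk_genotypes`, `pooled_rate`) and the initialization rule (splitting genotypes at the pooled case proportion) are hypothetical stand-ins; the paper's own partition-based initialization is not described in the abstract.

```python
import math


def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability of k successes."""
    p = min(max(p, 1e-9), 1.0 - 1e-9)  # keep log() finite
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1.0 - p))


def pooled_rate(pairs, default):
    """Pooled case proportion over (cases, total) pairs, or a default if empty."""
    total = sum(n for _, n in pairs)
    return sum(k for k, _ in pairs) / total if total > 0 else default


def em_risk_genotypes(cases, totals, n_iter=200, tol=1e-8):
    """EM for a two-component binomial mixture (illustrative assumption).

    cases[g] / totals[g]: case count and carrier count of genotype g
    (totals must be positive). Returns (weight of the risk component,
    baseline case probability, risk case probability, responsibilities).
    """
    pairs = list(zip(cases, totals))
    # Hypothetical partition-based start: split at the pooled case proportion.
    overall = pooled_rate(pairs, 0.5)
    high = [(k, n) for k, n in pairs if k / n > overall]
    low = [(k, n) for k, n in pairs if k / n <= overall]
    p_risk = pooled_rate(high, min(overall + 0.1, 0.99))
    p_base = pooled_rate(low, overall)
    w = min(max(len(high) / len(pairs), 0.05), 0.95)

    prev, resp = -math.inf, []
    for _ in range(n_iter):
        # E-step: posterior probability that each genotype is a risk genotype.
        resp, loglik = [], 0.0
        for k, n in pairs:
            a = math.log(w) + binom_logpmf(k, n, p_risk)
            b = math.log(1.0 - w) + binom_logpmf(k, n, p_base)
            m = max(a, b)
            denom = math.exp(a - m) + math.exp(b - m)
            resp.append(math.exp(a - m) / denom)
            loglik += m + math.log(denom)
        # M-step: responsibility-weighted updates of the parameters.
        w = min(max(sum(resp) / len(pairs), 1e-6), 1.0 - 1e-6)
        p_risk = (sum(r * k for r, (k, _) in zip(resp, pairs))
                  / max(sum(r * n for r, (_, n) in zip(resp, pairs)), 1e-12))
        p_base = (sum((1 - r) * k for r, (k, _) in zip(resp, pairs))
                  / max(sum((1 - r) * n for r, (_, n) in zip(resp, pairs)), 1e-12))
        if abs(loglik - prev) < tol:
            break
        prev = loglik
    return w, p_base, p_risk, resp


if __name__ == "__main__":
    # Made-up counts: six genotypes, each carried by 100 subjects.
    w, p_base, p_risk, resp = em_risk_genotypes([50, 48, 52, 90, 88, 49], [100] * 6)
    print(w, p_base, p_risk, [round(r, 3) for r in resp])
```

In this sketch a genotype would be flagged as a candidate risk genotype when its responsibility exceeds some threshold. The threshold, and the binomial form itself, are choices made here for illustration only.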

Publication Date

Tue Aug 12 2025

Journal Name

Journal Of Systems Science And Mathematical Sciences

SCREENING TESTS FOR DISEASE RISK HAPLOTYPE SEGMENTS IN GENOME BY USE OF PERMUTATION

Region-based association analysis

genotype mixture models

odds ratios

genome wide association studies

expectation-maximization algorithm

ALI

Jian

Haplotype association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, instead of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. It starts with inferring haplotypes from genotypes, followed by a haplotype co-classification and marginal screening for disease-associated haplotypes. Unfortunately, phasing uncertainty may have strong effects on the haplotype co-classification and therefore on the accuracy of predicting risk haplotypes. Here, to address the issue, we propose an alternative approach: In Stage 1, we select potential risk genotypes inste

Publication Date

Fri Apr 01 2022

Journal Name

Baghdad Science Journal

Improved Firefly Algorithm with Variable Neighborhood Search for Data Clustering

Data clustering

Data mining

Firefly algorithm

Machine learning

Variable neighborhood search.

Hayder Naser Khraibet

Among the metaheuristic algorithms, population-based algorithms are explorative search algorithms that are superior to local search algorithms in exploring the search space to find globally optimal solutions. However, the primary downside of such algorithms is their low exploitative capability, which prevents the expansion of the search space neighborhood for more optimal solutions. The firefly algorithm (FA) is a population-based algorithm that has been widely used in clustering problems. However, FA suffers from premature convergence when no neighborhood search strategies are employed to improve the quality of clustering solutions in the neighborhood region and to explore the global regions of the search space. On the

Publication Date

Wed Jun 29 2022

Journal Name

Journal Of Al-Rafidain University College For Sciences (Print ISSN: 1681-6870, Online ISSN: 2790-2293)

The Use Of Genetic Algorithm In Estimating The Parameter Of Finite Mixture Of Linear Regression

Mixture of linear regression

Robust bi-square (MixBi)

MM-Estimator

Gaussian Mixture

RobGA

Classification Error (CE)

Urdak

ALI

The parameters of a linear regression model are usually estimated by the least squares method, which rests on several basic assumptions. The accuracy of the parameter estimates therefore depends on the validity of these assumptions. The most successful alternative has been the robust MM-estimator, which has proved efficient for this purpose. However, the model becomes unrealistic when its assumptions fail, among them homogeneity of variance and normally distributed errors. These assumptions may not hold when the problem under study involves complex data arising from more than one model. To

Publication Date

Thu Sep 30 2021

Journal Name

Journal Of Economics And Administrative Sciences

Comparison of Some Methods for Estimating Mixture of Linear Regression Models with Application

Mixture Model

EM algorithm

Linear Regression

Trimmed Maximum Likelihood

Laplace Distribution

Urdak Ibrahim

Fadhaa Mezher

A mixture model is used to model data that come from more than one component. In recent years, it has become an effective tool for drawing inferences about the complex data that we might come across in real life. Moreover, it can serve as a powerful confirmatory tool for classifying observations based on similarities amongst them. In this paper, several mixture regression-based methods were applied under the assumption that the data come from a finite number of components. These methods were compared according to their results in estimating component parameters. Observation membership was also inferred and assessed for these methods. The results showed that the flexible mixture model outperformed the others

Publication Date

Fri Dec 30 2022

Journal Name

Journal Of Mathematics

Estimation of Parameters of Finite Mixture of Rayleigh Distribution by the Expectation-Maximization Algorithm

Noor

Fadhaa

In the lifetime processes of some systems, most data cannot be assumed to come from a single population; they may instead represent several subpopulations. In such a case, a single known distribution cannot be used to model the data. Instead, a mixture of distributions is used to model the data and classify them into several subgroups. A mixture of Rayleigh distributions is well suited to lifetime processes. This paper aims to infer the model parameters by the expectation-maximization (EM) algorithm through the maximum likelihood function. The technique is applied to simulated data under several scenarios. The accuracy of estimation is examined by the average mean square error (AMSE) and the average classification success rate (ACSR). T
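As an illustration of the kind of estimation this abstract describes, here is a minimal sketch of EM for a finite mixture of Rayleigh distributions with density f(x; σ) = (x/σ²) exp(−x²/(2σ²)). It is not taken from the paper. The function names, the initialization (splitting the sorted data into equal-size chunks) and the simulation settings in the demo are assumptions made for this example. The M-step uses the standard weighted maximum likelihood update σ_j² = Σ r_ij x_i² / (2 Σ r_ij).

```python
import math
import random


def rayleigh_pdf(x, sigma2):
    """Rayleigh density with squared scale sigma2, for x > 0."""
    return (x / sigma2) * math.exp(-x * x / (2.0 * sigma2))


def sample_rayleigh(sigma, rng):
    """Inverse-CDF draw from a Rayleigh distribution with scale sigma."""
    return sigma * math.sqrt(-2.0 * math.log(1.0 - rng.random()))


def em_rayleigh_mixture(data, k=2, n_iter=300, tol=1e-8):
    """EM for a k-component Rayleigh mixture; data must be positive.

    Returns (weights, sigmas). The chunk-based start is an assumption
    of this sketch, chosen only to break the symmetry between components.
    """
    xs = sorted(data)
    n = len(xs)
    chunks = [xs[j * n // k:(j + 1) * n // k] for j in range(k)]
    sigma2 = [sum(x * x for x in c) / (2.0 * len(c)) for c in chunks]
    weights = [1.0 / k] * k

    prev = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities r_ij proportional to weight_j * f(x_i; sigma_j).
        resp, loglik = [], 0.0
        for x in xs:
            dens = [w * rayleigh_pdf(x, s2) for w, s2 in zip(weights, sigma2)]
            total = sum(dens)
            loglik += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: weighted maximum likelihood updates.
        for j in range(k):
            rj = sum(r[j] for r in resp)
            weights[j] = rj / n
            sigma2[j] = sum(r[j] * x * x for r, x in zip(resp, xs)) / (2.0 * rj)
        if abs(loglik - prev) < tol:
            break
        prev = loglik
    return weights, [math.sqrt(s2) for s2 in sigma2]


if __name__ == "__main__":
    rng = random.Random(1)
    # Simulated data from two components with scales 1 and 6 (made-up scenario).
    data = ([sample_rayleigh(1.0, rng) for _ in range(500)]
            + [sample_rayleigh(6.0, rng) for _ in range(500)])
    print(em_rayleigh_mixture(data, k=2))
```

Each observation can then be assigned to the component with the largest responsibility, which is one way a classification success rate could be computed on simulated data where the true labels are known.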

Publication Date

Mon Oct 21 2024

Journal Name

Iraqi Statisticians Journal

On Inference of Finite Mixture of Rayleigh Distribution by Gibbs Sampler and Metropolis-Hastings

Mixture of Rayleigh Distribution

Bayesian inference

Gibbs Sampler

Metropolis-Hastings

Bayesian Information Criterion (BIC)

ALI

Inferential methods for statistical distributions have attracted a high level of interest in recent years. However, in real life, data can follow more than one distribution, and mixture models must then be fitted to such data. One such model is the finite mixture of Rayleigh distributions, which is widely used in modelling lifetime data in many fields, such as medicine, agriculture and engineering. In this paper, we propose a new Bayesian framework by assuming conjugate priors for the square of the component parameters. We used this prior distribution in the classical Bayesian, Metropolis-Hastings (MH) and Gibbs sampler methods. The performance of these techniques was assessed using data generated from two and three-component mixt
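The role of a conjugate prior on the squared scale can be sketched with a small Gibbs sampler. This is an illustrative example under stated assumptions, not the framework from the paper: it assumes an inverse-gamma prior IG(a, b) on each σ_j² (which is conjugate for the Rayleigh likelihood, giving the conditional IG(a + n_j, b + Σ x_i²/2) over the observations allocated to component j) and a symmetric Dirichlet prior on the mixing weights. The function name, the hyperparameter defaults and the rank-based starting allocation are all hypothetical.

```python
import math
import random


def gibbs_rayleigh_mixture(data, k=2, n_sweeps=400, burn_in=100,
                           a=2.0, b=1.0, alpha=1.0, seed=0):
    """Gibbs sampler for a k-component Rayleigh mixture (sketch).

    Assumed priors: sigma_j^2 ~ InvGamma(a, b), weights ~ Dirichlet(alpha).
    Returns a list of (weights, sigmas) draws kept after burn-in.
    """
    rng = random.Random(seed)
    n = len(data)
    # Start allocations from the rank of each observation to break symmetry.
    order = sorted(range(n), key=lambda i: data[i])
    z = [0] * n
    for rank, i in enumerate(order):
        z[i] = rank * k // n

    draws = []
    for sweep in range(n_sweeps):
        counts = [0] * k
        half_ss = [0.0] * k
        for x, zi in zip(data, z):
            counts[zi] += 1
            half_ss[zi] += 0.5 * x * x
        # sigma_j^2 | z, x ~ InvGamma(a + n_j, b + sum(x^2)/2):
        # draw a Gamma with that shape and rate, then invert.
        sigma2 = [1.0 / rng.gammavariate(a + counts[j], 1.0 / (b + half_ss[j]))
                  for j in range(k)]
        # weights | z ~ Dirichlet(alpha + counts), via normalized Gamma draws.
        g = [rng.gammavariate(alpha + c, 1.0) for c in counts]
        total = sum(g)
        weights = [v / total for v in g]
        # z_i | weights, sigma2, x_i: categorical with p_j ∝ w_j f(x_i; sigma_j).
        for i, x in enumerate(data):
            probs = [w * (x / s2) * math.exp(-x * x / (2.0 * s2))
                     for w, s2 in zip(weights, sigma2)]
            u = rng.random() * sum(probs)
            acc = 0.0
            for j, p in enumerate(probs):
                acc += p
                if u < acc:
                    z[i] = j
                    break
        if sweep >= burn_in:
            draws.append((weights, [math.sqrt(s2) for s2 in sigma2]))
    return draws


if __name__ == "__main__":
    rng = random.Random(3)
    draw = lambda s: s * math.sqrt(-2.0 * math.log(1.0 - rng.random()))
    data = [draw(1.0) for _ in range(200)] + [draw(6.0) for _ in range(200)]
    kept = gibbs_rayleigh_mixture(data)
    print([sum(d[1][j] for d in kept) / len(kept) for j in range(2)])
```

The sketch ignores label switching, which is tolerable only when the components are well separated. A Metropolis-Hastings variant would replace the exact conditional draw of σ_j² with a proposal and an accept/reject step.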

Publication Date

Wed Jan 11 2023

Journal Name

Mathematical Problems In Engineering

Bayesian Methods for Estimation the Parameters of Finite Mixture of Inverse Rayleigh Distribution

Fadhaa

Methods for estimating statistical distributions have attracted many researchers who need to fit a specific distribution to data. However, when the data come from more than one component, a single standard distribution cannot be fitted to them. To tackle this issue, mixture models are fitted by choosing the correct number of components to represent the data. This situation is common in lifetime processes, which arise in a wide range of engineering applications as well as biological systems. In this paper, we present an application of estimating a finite mixture of inverse Rayleigh distributions within a Bayesian framework, using Markov chain Monte Carlo (MCMC). We employed the Gibbs sampler and

Publication Date

Sat Dec 31 2022

Journal Name

Journal Of Economics And Administrative Sciences

Using Some Estimation Methods for Mixed-Random Panel Data Regression Models with Serially Correlated Errors with Application

FGLS estimation method

mixed-stochastic parameter regression model

first-order serial correlation

(MG) estimation method

Musaab

Mohammed

This research studies panel data models with mixed random parameters, which contain two types of parameters: one random and the other fixed. The random parameter arises from differences in the marginal propensities of the cross sections, and the fixed parameter arises from differences in the fixed intercepts. The random errors of each cross section are heteroscedastic and also exhibit first-order serial correlation. The main objective of this research is to use efficient methods suited to panel data in the case of small samples. To achieve this goal, the feasible general least squa

Publication Date

Wed Aug 01 2018

Journal Name

Journal Of Economics And Administrative Sciences

Compare to the conditional logistic regression models with fixed and mixed effects for longitudinal data

Maximum likelihood method

Conditional logistic regression

Longitudinal data

Mixed-effects models

Quasi-likelihood under the independence model criterion (QIC)

Empirical Akaike information criterion (EAIC)

Environmental pollution

Cluster analysis

انتصار عريبي

يوسف خليل

Mixed-effects conditional logistic regression is evidently more effective in the study of qualitative differences in longitudinal pollution data as well as their implications for heterogeneous subgroups. This study seeks to show that conditional logistic regression is a robust evaluation method for environmental studies, through the analysis of environmental pollution as a function of oil production and environmental factors. Consequently, it has been established theoretically that the primary objective of model selection in this research is to identify the candidate model that is optimal for the conditional design. The candidate model should achieve generalizability, goodness-of-fit and parsimony, and establish equilibrium between bias and variab
