Region-based association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, rather than of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. To tackle the problem of this sparse distribution, a two-stage approach was proposed in the literature: in the first stage, haplotypes are computationally inferred from genotypes, followed by a haplotype co-classification; in the second stage, the association analysis is performed on the inferred haplotype groups. If a haplotype is unevenly distributed between the case and control samples, it is labeled a risk haplotype. Unfortunately, the in silico reconstruction of haplotypes may produce a proportion of false haplotypes that hamper the detection of rare but true haplotypes. Here, to address this issue, we propose an alternative approach: in Stage 1, we cluster genotypes instead of inferred haplotypes and estimate the risk genotypes based on a finite mixture model; in Stage 2, we infer risk haplotypes from the risk genotypes identified in the previous stage. To estimate the finite mixture model, we propose an EM algorithm with a novel data partition-based initialization. The performance of the proposed procedure is assessed by simulation studies and a real data analysis. Compared to the existing multiple Z-test procedure, we find that the power of genome-wide association studies can be increased by using the proposed procedure.
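The abstract does not give implementation details for the EM algorithm or its partition-based initialization. As a minimal sketch of the general idea, a Gaussian mixture stands in for the genotype mixture model, and a preliminary k-means partition of the data supplies the EM starting values; all data and component counts are illustrative placeholders, not the authors' method.

```python
# Minimal sketch: EM for a finite mixture seeded by a data partition.
# A Gaussian mixture and a k-means partition stand in as placeholders
# for the paper's genotype mixture model and initialization scheme.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 2)),   # "non-risk" component
               rng.normal(3, 1, (40, 2))])   # "risk" component

# A preliminary partition of the raw data provides the EM starting means.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
means_init = np.array([X[labels == k].mean(axis=0) for k in range(2)])

gm = GaussianMixture(n_components=2, means_init=means_init).fit(X)
posterior = gm.predict_proba(X)  # membership probabilities per observation
```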
This research presents the concept of panel data models: a crucial two-dimensional data structure that captures the impact of change over time, obtained from repeated observations of the measured phenomenon in different time periods. The panel data models were defined by their different types (fixed, random, and mixed) and compared by studying and analyzing the mathematical relationship between the effect of time and a set of basic variables, which form the main axes of the research. These are represented by the monthly revenue of the working individual and the profits it generates, which constitute the response variable, and its relationship to a set of explanatory variables represented by the ...
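As a minimal sketch of the fixed- versus random-effects distinction the abstract describes, the snippet below fits both on synthetic panel data with statsmodels; the column names (person, period, income, profit) are hypothetical stand-ins for the study's variables.

```python
# Minimal sketch of fixed- vs. random-effects panel models.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "person": np.repeat(np.arange(30), 12),  # 30 individuals
    "period": np.tile(np.arange(12), 30),    # 12 monthly observations each
    "profit": rng.normal(5, 1, 360),
})
df["income"] = 2.0 * df["profit"] + rng.normal(0, 1, 360)

# Fixed effects: entity dummies absorb time-invariant individual effects.
fe = smf.ols("income ~ profit + C(person)", data=df).fit()

# Random (mixed) effects: individual effects as random intercepts.
re = smf.mixedlm("income ~ profit", data=df, groups=df["person"]).fit()
print(fe.params["profit"], re.params["profit"])
```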
Twitter data analysis is an emerging field of research that utilizes data collected from Twitter to address many issues such as disaster response, sentiment analysis, and demographic studies. The success of data analysis relies on collecting accurate data that are representative of the studied group or phenomenon, in order to get the best results. Various Twitter analysis applications rely on the locations of the users sending the tweets, but this information is not always available. There are several attempts at estimating location-based aspects of a tweet; however, there is a lack of attempts to investigate data collection methods that are focused on location. In this paper, we investigate the two methods for obtaining location-based data ...
Longitudinal data are becoming increasingly common, especially in the medical and economic fields, and various methods have been developed to analyze this type of data.
In this research, the focus was on grouping and analyzing these data, as cluster analysis plays an important role in identifying and grouping co-expressed profiles over time and employing them in the nonparametric smoothing cubic B-spline model. This model provides continuous first and second derivatives, resulting in a smoother curve with fewer abrupt changes in slope; it is also more flexible and can capture more complex patterns and fluctuations in the data.
The balanced longitudinal data profiles were compiled into subgroups ...
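As a minimal sketch of the pipeline the abstract outlines, the snippet below clusters longitudinal profiles and then smooths each cluster's mean profile with a cubic smoothing spline (which has continuous first and second derivatives); the data, cluster count, and smoothing factor are illustrative assumptions, not the study's settings.

```python
# Minimal sketch: cluster longitudinal profiles, then smooth each cluster's
# mean profile with a cubic spline.
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
t = np.linspace(0, 1, 20)  # common time grid (balanced data)
profiles = np.array([np.sin(2 * np.pi * t * f) + rng.normal(0, 0.1, t.size)
                     for f in rng.choice([1, 2], size=50)])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(profiles)
for k in range(2):
    mean_profile = profiles[labels == k].mean(axis=0)
    spline = UnivariateSpline(t, mean_profile, k=3, s=0.05)  # cubic, smoothed
    print(k, spline(t[:3]))  # fitted values on the first grid points
```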
Due to the lack of previous statistical studies of the behavior of payments, specifically health insurance payments, which represent the largest proportion of payments in the general insurance companies in Iraq, this study was undertaken and applied to the Iraqi Insurance Company.
In order to find a convenient model representing the health insurance payments, we initially identified two probability models by using the EasyFit software:
first, a single Lognormal distribution for the whole sample, and second, a Compound Weibull distribution for the two sub-samples (small payments and large payments); we focused on the compound ...
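As a minimal sketch of this fitting step, the snippet below fits a single lognormal to all payments and separate Weibull distributions to the small and large sub-samples with scipy; the synthetic data and the quantile threshold splitting small from large payments are hypothetical placeholders (the study used EasyFit for this step).

```python
# Minimal sketch: single lognormal for the whole sample vs. a compound
# Weibull for small/large payment sub-samples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
payments = rng.lognormal(mean=7.0, sigma=1.2, size=1000)  # synthetic claims

# Single lognormal for the whole sample (location fixed at zero).
shape, loc, scale = stats.lognorm.fit(payments, floc=0)

# Compound model: split the sample and fit a Weibull to each sub-sample.
threshold = np.quantile(payments, 0.9)  # hypothetical split point
small = payments[payments <= threshold]
large = payments[payments > threshold]
w_small = stats.weibull_min.fit(small, floc=0)
w_large = stats.weibull_min.fit(large - threshold, floc=0)  # shifted tail
```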
Data-driven models perform poorly on part-of-speech tagging for the square Hmong language, a low-resource corpus. This paper designs a weight evaluation function to reduce the influence of unknown words and proposes an improved harmony search algorithm that utilizes roulette and local evaluation strategies to handle the square Hmong part-of-speech tagging problem. Experiments show that the average accuracy of the proposed model is 6% and 8% higher than that of the HMM and BiLSTM-CRF models, respectively; meanwhile, the average F1 of the proposed model is 6% and 3% higher than that of the HMM and BiLSTM-CRF models, respectively.
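The abstract does not specify how the roulette and local evaluation strategies modify the search, so the sketch below shows only the classic harmony search skeleton on a toy minimization problem; all parameter values are illustrative assumptions.

```python
# Minimal sketch of the classic harmony search loop (not the paper's
# improved variant) on a toy objective.
import numpy as np

rng = np.random.default_rng(4)

def objective(x):  # toy objective: sphere function
    return float(np.sum(x ** 2))

dim, hms, hmcr, par, bw, iters = 5, 10, 0.9, 0.3, 0.05, 2000
memory = rng.uniform(-5, 5, (hms, dim))  # harmony memory
scores = np.array([objective(h) for h in memory])

for _ in range(iters):
    new = np.empty(dim)
    for j in range(dim):
        if rng.random() < hmcr:                 # draw from memory ...
            new[j] = memory[rng.integers(hms), j]
            if rng.random() < par:              # ... with pitch adjustment
                new[j] += rng.uniform(-bw, bw)
        else:                                   # or improvise randomly
            new[j] = rng.uniform(-5, 5)
    score = objective(new)
    worst = int(np.argmax(scores))
    if score < scores[worst]:                   # replace the worst harmony
        memory[worst], scores[worst] = new, score

print(memory[int(np.argmin(scores))], scores.min())
```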
There are many methods for searching a large amount of data to find one particular piece of information, such as finding the name of a person in a mobile phone's records. Certain methods of organizing data make the search process more efficient; the objective of these methods is to find the element with the least cost (least time). The binary search algorithm is faster than sequential search and other commonly used search algorithms. This research develops the binary search algorithm by using a new structure called Triple, in which data are represented as triples, each consisting of three locations (1-Top, 2-Left, and 3-Right). The binary search algorithm divides the search interval in half, and this process bounds the maximum number of comparisons (average-case com ...
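The abstract is truncated before the Triple structure is fully specified, so the sketch below shows only the baseline halving algorithm it builds on: each comparison discards half of the remaining interval, bounding the worst case at O(log n) comparisons.

```python
# Minimal sketch of classic binary search (the baseline the proposed
# Triple structure improves upon; the Triple variant is not shown here).
def binary_search(sorted_items, target):
    lo, hi = 0, len(sorted_items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2           # split the search interval in half
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            lo = mid + 1               # discard the left half
        else:
            hi = mid - 1               # discard the right half
    return -1                          # target not present

names = ["Ali", "Huda", "Omar", "Sara", "Zaid"]
print(binary_search(names, "Omar"))    # -> 2
```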
The complexity and partially defined nature of jet grouting make it hard to predict the performance of grouted piles, so trials of cement injection at a location with soil properties similar to those of the erection site are necessary to assess the performance of the grouted piles. Nevertheless, instead of executing trial injected piles at the pilot site, which wastes money, time, and effort, laboratory cement injection devices are essential alternatives for evaluating soil injectability. This study assesses the performance of a low-pressure laboratory grouting device by improving loose sandy soil injected using binders formed of Silica Fume (SF) as a chemical admixture (10% of the Ordinary Portland Cement (OPC) mass) to di ...
In light of developments in computer science and modern technologies, the impersonation crime rate has increased. Consequently, face recognition technology and biometric systems have been employed for security purposes in a variety of applications, including human-computer interaction, surveillance systems, etc. Building an advanced, sophisticated model to tackle impersonation-related crimes is essential. This study proposes classification Machine Learning (ML) and Deep Learning (DL) models utilizing Viola-Jones, Linear Discriminant Analysis (LDA), Mutual Information (MI), and Analysis of Variance (ANOVA) techniques. The two proposed facial classification systems are J48 with the LDA feature extraction method as input, and a one-dimensional ...
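As a minimal sketch of the pipeline shape (not the study's actual system), the snippet below chains Viola-Jones face detection, LDA feature extraction, and a decision tree; sklearn's DecisionTreeClassifier is a stand-in for J48 (Weka's C4.5), and the image paths and labels are hypothetical placeholders that must point at a real dataset to be meaningful.

```python
# Minimal sketch: Viola-Jones detection -> LDA features -> tree classifier.
import cv2
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def face_vector(path, size=(32, 32)):
    """Detect the first face and return it as a flat grayscale vector."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    x, y, w, h = detector.detectMultiScale(gray, 1.1, 5)[0]
    return cv2.resize(gray[y:y + h, x:x + w], size).ravel()

# X: stacked face vectors, y: identity labels (placeholder paths/labels).
X = np.vstack([face_vector(p) for p in ["person1.jpg", "person2.jpg"]])
y = np.array([0, 1])

features = LinearDiscriminantAnalysis(n_components=1).fit_transform(X, y)
clf = DecisionTreeClassifier().fit(features, y)
```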