Advances in digital technology and the World Wide Web has led to the increase of digital documents that are used for various purposes such as publishing and digital library. This phenomenon raises awareness for the requirement of effective techniques that can help during the search and retrieval of text. One of the most needed tasks is clustering, which categorizes documents automatically into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends tightly on the selection of the text representation method. Traditional methods of text representation model documents as bags of words using term-frequency index document frequency (TFIDF). This method ignores the relationship and meanings of words in the document. As a result the sparsity and semantic problem that is prevalent in textual document are not resolved. In this study, the problem of sparsity and semantic is reduced by proposing a graph based text representation method, namely dependency graph with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created through an accumulation of syntactic and semantic analysis. A sample of 20 news groups, dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. Then the semantic of words are modeled using dependency graph. The produced dependency graph is then used in the process of cluster analysis. K-means clustering technique was used in this study. The dependency graph based clustering result were compared with the popular text representation method, i.e. TFIDF and Ontology based text representation. The result shows that the dependency graph outperforms both TFIDF and Ontology based text representation. The findings proved that the proposed text representation method leads to more accurate document clustering results.
Multilocus haplotype analysis of candidate variants with genome wide association studies (GWAS) data may provide evidence of association with disease, even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet the challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease-penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated d
Two- dimensional numerical simulations are carried out to study the elements of observing a Dirac point source and a Dirac binary system. The essential features of this simulation are demonstrated in terms of the point spread function and the modulation transfer function. Two mathematical equations have been extracted to present, firstly the relationship between the radius of optical telescope and the distance between the central frequency and cut-off frequency of the optical telescope, secondly the relationship between the radius of the optical telescope and the average frequency components of the modulation transfer function.
Intrusion detection system is an imperative role in increasing security and decreasing the harm of the computer security system and information system when using of network. It observes different events in a network or system to decide occurring an intrusion or not and it is used to make strategic decision, security purposes and analyzing directions. This paper describes host based intrusion detection system architecture for DDoS attack, which intelligently detects the intrusion periodically and dynamically by evaluating the intruder group respective to the present node with its neighbors. We analyze a dependable dataset named CICIDS 2017 that contains benign and DDoS attack network flows, which meets certifiable criteria and is ope
... Show MoreThis paper proposes a new approach, of Clustering Ultrasound images using the Hybrid Filter (CUHF) to determine the gender of the fetus in the early stages. The possible advantage of CUHF, a better result can be achieved when fuzzy c-mean FCM returns incorrect clusters. The proposed approach is conducted in two steps. Firstly, a preprocessing step to decrease the noise presented in ultrasound images by applying the filters: Local Binary Pattern (LBP), median, median and discrete wavelet (DWT),(median, DWT & LBP) and (median & Laplacian) ML. Secondly, implementing Fuzzy C-Mean (FCM) for clustering the resulted images from the first step. Amongst those filters, Median & Laplace has recorded a better accuracy. Our experimental evaluation on re
... Show MoreThis paper is focused on orthogonal function approximation technique FAT-based adaptive backstepping control of a geared DC motor coupled with a rotational mechanical component. It is assumed that all parameters of the actuator are unknown including the torque-current constant (i.e., unknown input coefficient) and hence a control system with three motor control modes is proposed: 1) motor torque control mode, 2) motor current control mode, and 3) motor voltage control mode. The proposed control algorithm is a powerful tool to control a dynamic system with an unknown input coefficient. Each uncertain parameter/term is represented by a linear combination of weighting and orthogonal basis function vectors. Chebyshev polynomial is used
... Show MoreIn this paper a modified approach have been used to find the approximate solution of ordinary delay differential equations with constant delay using the collocation method based on Bernstien polynomials.