Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth raises the need for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends closely on the choice of text representation method. Traditional text representation methods model documents as bags of words using term frequency-inverse document frequency (TFIDF). This method ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems that are prevalent in textual documents are not resolved. In this study, the sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created through a combination of syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. Then the semantics of words are modeled using the dependency graph. The resulting dependency graph is then used in cluster analysis. The K-means clustering technique was used in this study. The dependency graph-based clustering results were compared with popular text representation methods, i.e., TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation. The findings indicate that the proposed text representation method leads to more accurate document clustering results.
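The limitation of the bag-of-words TFIDF baseline described above can be illustrated with a small sketch. This is not the study's implementation: the whitespace tokenizer, the smoothed IDF formula, the helper names `tfidf_vectors` and `cosine`, and the three toy sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents.

    Assumption: a smoothed IDF, ln((1 + N) / (1 + df)) + 1, is used here;
    the study does not state which TFIDF variant it applied.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([
            (counts[term] / len(toks))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term in vocab
        ])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: the first two sentences use the same words
# in a different order and therefore mean different things.
docs = [
    "the dog bites the man",
    "the man bites the dog",
    "stocks fell sharply today",
]
vocab, vecs = tfidf_vectors(docs)

# Word order is lost: both sentences map to the same vector.
same_vector = vecs[0] == vecs[1]
# Sparsity: the third document is zero on every term it does not contain.
zero_entries = sum(1 for x in vecs[2] if x == 0.0)
```

Because the first two sentences share identical word counts, a clustering algorithm working on these vectors cannot tell them apart, which is the kind of relational information a dependency graph is meant to retain.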
Twitter data analysis is an emerging field of research that uses data collected from Twitter to address many issues such as disaster response, sentiment analysis, and demographic studies. The success of data analysis relies on collecting accurate and representative data of the studied group or phenomenon to get the best results. Various Twitter analysis applications rely on collecting the locations of the users sending the tweets, but this information is not always available. There have been several attempts at estimating location-based aspects of a tweet. However, few attempts have investigated data collection methods that are focused on location. In this paper, we investigate the two methods for obtaining location-based dat
Peak ground acceleration (PGA) is one of the critical factors that affect the determination of earthquake intensity. PGA is generally used to describe ground motion in a particular zone and can efficiently predict the parameters of site ground motion for the design of engineering structures. Therefore, novel models are developed to forecast PGA for the Iraqi database using the particle swarm optimization (PSO) approach. A data set of 187 historical ground-motion recordings in Iraq’s tectonic regions was used to build the explicit proposed models. The proposed PGA models relate to different seismic parameters, including the magnitude of the earthquake (Mw), average shear-wave velocity (VS30), focal depth (FD
Building a 3D geological model is an essential step in revealing reservoir heterogeneity and the distribution of reservoir properties. In the present study, a three-dimensional geological model for the Mishrif reservoir was built based on data obtained from seven wells and core data. The methodology includes building a 3D grid and populating it with petrophysical properties such as facies, porosity, water saturation, and net-to-gross ratio. The structural model was built based on a base contour map obtained from 2D seismic interpretation along with well tops from seven wells. A simple grid method was used to build the structural framework with 234x278x91 grid cells in the X, Y, and Z directions, respectively, with lengths equal to 150 meters. The to
This paper focuses on orthogonal function approximation technique (FAT)-based adaptive backstepping control of a geared DC motor coupled with a rotational mechanical component. It is assumed that all parameters of the actuator are unknown, including the torque-current constant (i.e., an unknown input coefficient), and hence a control system with three motor control modes is proposed: 1) motor torque control mode, 2) motor current control mode, and 3) motor voltage control mode. The proposed control algorithm is a powerful tool for controlling a dynamic system with an unknown input coefficient. Each uncertain parameter/term is represented by a linear combination of weighting and orthogonal basis function vectors. Chebyshev polynomial is used
In this paper, a modified approach has been used to find the approximate solution of ordinary delay differential equations with constant delay using the collocation method based on Bernstein polynomials.
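For orientation, a generic Bernstein collocation scheme for a first-order delay differential equation with constant delay can be written as follows. This is a standard textbook-style formulation, not the paper's modified approach: the interval $[0,1]$, the right-hand side $f$, the history function $\varphi$, the delay $\tau$, and the collocation points $t_j$ are all assumed notation.

```latex
% Bernstein basis of degree n on [0,1] and the trial solution
B_{i,n}(t) = \binom{n}{i}\, t^{i} (1-t)^{n-i}, \qquad i = 0, \dots, n,
\qquad
y_n(t) = \sum_{i=0}^{n} c_i \, B_{i,n}(t).

% Assumed model problem: constant delay \tau > 0 with history \varphi
y'(t) = f\bigl(t,\, y(t),\, y(t-\tau)\bigr), \quad t \in [0,1],
\qquad
y(t) = \varphi(t), \quad t \in [-\tau, 0].

% Derivative of the basis (with B_{-1,n-1} = B_{n,n-1} = 0)
B'_{i,n}(t) = n \bigl( B_{i-1,n-1}(t) - B_{i,n-1}(t) \bigr).

% Collocation system: n + 1 equations for c_0, \dots, c_n
y_n(0) = c_0 = \varphi(0),
\qquad
y_n'(t_j) = f\bigl(t_j,\, y_n(t_j),\, \hat{y}(t_j - \tau)\bigr), \quad j = 1, \dots, n,

% where the delayed argument uses the history when it falls outside [0,1]
\hat{y}(s) =
\begin{cases}
\varphi(s), & s \le 0, \\
y_n(s), & s > 0.
\end{cases}
```

When $f$ is linear in its second and third arguments, the collocation conditions form a linear algebraic system for the coefficients $c_i$.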
This paper proposes a new approach, Clustering Ultrasound images using the Hybrid Filter (CUHF), to determine the gender of the fetus in the early stages. A possible advantage of CUHF is that a better result can be achieved when fuzzy c-means (FCM) returns incorrect clusters. The proposed approach is conducted in two steps. First, a preprocessing step decreases the noise present in ultrasound images by applying the filters: Local Binary Pattern (LBP), median, median and discrete wavelet (DWT), (median, DWT & LBP) and (median & Laplacian) ML. Second, FCM is applied to cluster the images resulting from the first step. Amongst those filters, Median & Lap