Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth creates a need for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories and is an important task in data mining and machine learning. The accuracy of clustering depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TF-IDF). This approach ignores the relationships between words and their meanings, so the sparsity and semantic problems that are prevalent in textual documents remain unresolved. This study reduces the sparsity and semantic problems by proposing a graph-based text representation method, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of words are then modeled using a dependency graph. The resulting dependency graph is used as input to cluster analysis with the K-means technique. The dependency-graph-based clustering results were compared with those of two popular text representation methods, TF-IDF and ontology-based text representation. The results show that the dependency graph outperforms both, indicating that the proposed text representation method leads to more accurate document clustering.
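The limitation of bag-of-words weighting described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the three toy documents, the helper names (`tfidf_vectors`, `cosine`, `jaccard`), the `tf * log(N / df)` weighting variant, and the hand-written dependency edges are all assumptions made for illustration, and a real pipeline would obtain the edges from a syntactic parser. Under these assumptions, two sentences with opposite meanings receive identical TF-IDF vectors (cosine similarity of about 1), while their dependency edge sets share nothing (Jaccard similarity of 0).

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard overlap between two sets of graph edges."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy corpus (assumed): the first two documents use the same words in a different order.
docs = ["dog bites man", "man bites dog", "stocks fell sharply today"]
vecs = tfidf_vectors(docs)

# Hypothetical hand-written dependency edges (head, relation, dependent);
# a real system would produce these with a dependency parser.
edges = [
    {("bites", "nsubj", "dog"), ("bites", "obj", "man")},
    {("bites", "nsubj", "man"), ("bites", "obj", "dog")},
]

bow_similarity = cosine(vecs[0], vecs[1])        # word order is invisible to TF-IDF
graph_similarity = jaccard(edges[0], edges[1])   # edges encode who did what to whom
unrelated_similarity = cosine(vecs[0], vecs[2])  # no shared terms: sparse vectors do not overlap

print(bow_similarity, graph_similarity, unrelated_similarity)
```

The third comparison hints at the sparsity problem: documents with no terms in common have zero similarity regardless of how related their topics are. How the paper converts dependency graphs into input for K-means is not stated in the abstract, so the edge-set comparison here is only one possible way to use such a graph.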

Publication Date

Wed Jan 01 2020

Journal Name

International Journal Of Computing

Twitter Location-Based Data: Evaluating the Methods of Data Collection Provided by Twitter API

location data

Social media

Twitter

N.A.

Haneen

Twitter data analysis is an emerging field of research that uses data collected from Twitter to address issues such as disaster response, sentiment analysis, and demographic studies. The success of data analysis relies on collecting accurate data that is representative of the studied group or phenomenon. Various Twitter analysis applications rely on collecting the locations of the users sending the tweets, but this information is not always available. There have been several attempts at estimating location-based aspects of a tweet. However, few studies investigate the data collection methods that are focused on location. In this paper, we investigate the two methods for obtaining location-based dat

Publication Date

Wed Nov 01 2023

Journal Name

Journal Of King Saud University - Engineering Sciences

Particle swarm optimization technique-based prediction of peak ground acceleration of Iraq’s tectonic regions

Mahir M.

Ammar N.

Ali A.

Peak ground acceleration (PGA) is one of the critical factors that affect the determination of earthquake intensity. PGA is generally utilized to describe ground motion in a particular zone and is able to efficiently predict the parameters of site ground motion for the design of engineering structures. Therefore, novel models are developed to forecast PGA in the case of the Iraqi database, which utilizes the particle swarm optimization (PSO) approach. A data set of 187 historical ground-motion recordings in Iraq’s tectonic regions was used to build the explicit proposed models. The proposed PGA models relate to different seismic parameters, including the magnitude of the earthquake (Mw), average shear-wave velocity (VS30), focal depth (FD

Publication Date

Sun Sep 11 2022

Journal Name

Journal Of Petroleum Research And Studies

Distribution of Petrophysical Properties Based on Conceptual Facies Model, Mishrif Reservoir/South of Iraq

Sameera

Osamah Shareef

Mohammed K.

A 3D geological model is an essential step to reveal reservoir heterogeneity and reservoir properties distribution. In the present study, a three-dimensional geological model for the Mishrif reservoir was built based on data obtained from seven wells and core data. The methodology includes building a 3D grid and populating it with petrophysical properties such as (facies, porosity, water saturation, and net to gross ratio). The structural model was built based on a base contour map obtained from 2D seismic interpretation along with well tops from seven wells. A simple grid method was used to build the structural framework with 234x278x91 grid cells in the X, Y, and Z directions, respectively, with lengths equal to 150 meters. The to

Publication Date

Fri Jan 01 2021

Journal Name

Fme Transactions

FAT-based adaptive backstepping control of an electromechanical system with an unknown input coefficient

Hayder

This paper is focused on orthogonal function approximation technique FAT-based adaptive backstepping control of a geared DC motor coupled with a rotational mechanical component. It is assumed that all parameters of the actuator are unknown including the torque-current constant (i.e., unknown input coefficient) and hence a control system with three motor control modes is proposed: 1) motor torque control mode, 2) motor current control mode, and 3) motor voltage control mode. The proposed control algorithm is a powerful tool to control a dynamic system with an unknown input coefficient. Each uncertain parameter/term is represented by a linear combination of weighting and orthogonal basis function vectors. Chebyshev polynomial is used

Publication Date

Sun Sep 04 2011

Journal Name

Baghdad Science Journal

Approximate Solution of Delay Differential Equations Using the Collocation Method Based on Bernstien Polynomials

Bernstien polynomial

Delay differential equation

Asmaa A.

In this paper, a modified approach has been used to find the approximate solution of ordinary delay differential equations with constant delay using the collocation method based on Bernstien polynomials.

Publication Date

Sun Jan 01 2023

Journal Name

International Conference Of Computational Methods In Sciences And Engineering Iccmse 2021

Synthesis, description and bacteriological valuation of metal complexes including an amoxicillin−based Schiff base

Zaid Mohammed

Rehab Kadhim Raheem

Ahlaam J.

Publication Date

Sat May 01 2021

Journal Name

Journal Of Physics: Conference Series

The Classification of Fetus Gender Based on Fuzzy C-Mean Using a Hybrid Filter

Ahmed

Firas A.

Duraid Y.

This paper proposes a new approach, Clustering Ultrasound images using the Hybrid Filter (CUHF), to determine the gender of the fetus in the early stages. A possible advantage of CUHF is that a better result can be achieved when Fuzzy C-Mean (FCM) returns incorrect clusters. The proposed approach is conducted in two steps. First, a preprocessing step decreases the noise present in ultrasound images by applying the following filters: Local Binary Pattern (LBP), median, median and discrete wavelet (DWT), (median, DWT & LBP), and (median & Laplacian) ML. Second, FCM is applied to cluster the images resulting from the first step. Amongst those filters, Median & Lap

Publication Date

Tue Aug 01 2023

Journal Name

European Heart Journal

Routine electronic health record-based clinical trials: what should an early-career trialist know?

Zainab Atiyah

Publication Date

Mon Nov 01 2021

Journal Name

2021 International Conference On Intelligent Technology, System And Service For Internet Of Everything (itss-ioe)

Application of MQ-Sensors to Indoor Air Quality Monitoring in Lab based on IoT

Hussein J.

Faik K.

Ziad T.

Publication Date

Mon Apr 10 2023

Journal Name

The European Physical Journal Plus

Improved performance of D149 dye-sensitized ZnO-based solar cell under solvents activation effect

Hadi J.
