Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.

Advances in digital technology and the World Wide Web have led to an increase in digital documents that are used for various purposes such as publishing and digital libraries. This phenomenon raises awareness of the need for effective techniques that can help during the search and retrieval of text. One of the most needed tasks is clustering, which automatically categorizes documents into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends closely on the choice of text representation method. Traditional methods of text representation model documents as bags of words using term frequency-inverse document frequency (TF-IDF). This approach ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems that are prevalent in textual documents are not resolved. In this study, the sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created through a combination of syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. The semantics of words are then modeled using the dependency graph. The resulting dependency graph is then used in cluster analysis with the K-means clustering technique. The dependency graph-based clustering results were compared with popular text representation methods, i.e. TF-IDF and ontology-based text representation. The results show that the dependency graph outperforms both TF-IDF and ontology-based text representation. The findings indicate that the proposed text representation method leads to more accurate document clustering results.
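
The contrast the abstract draws between a bag-of-words TF-IDF representation and a dependency graph can be illustrated with a small sketch. This is not the authors' implementation: the function names `tfidf_vectors` and `jaccard`, the whitespace tokenizer, the toy sentences, and the hand-written `(head, relation, dependent)` triples are all assumptions made for illustration, and a real pipeline would obtain the triples from a syntactic parser.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (bag-of-words TF-IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def jaccard(edges_a, edges_b):
    """Jaccard similarity between two sets of labeled graph edges."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 1.0


docs = ["dog bites man", "man bites dog", "cat chases mouse"]
vectors = tfidf_vectors(docs)

# Hypothetical hand-written parses as (head, relation, dependent) triples;
# these stand in for the output of a real dependency parser.
edges_dog_bites = {("bites", "nsubj", "dog"), ("bites", "obj", "man")}
edges_man_bites = {("bites", "nsubj", "man"), ("bites", "obj", "dog")}

# Word order is invisible to the bag of words, so the two vectors coincide.
same_bag = vectors[0] == vectors[1]
# The labeled edges record who does what to whom, so the graphs differ.
edge_overlap = jaccard(edges_dog_bites, edges_man_bites)
print(same_bag, edge_overlap)
```

In this toy case the two sentences with opposite meanings receive identical TF-IDF vectors, while their dependency edge sets share no edge. Edge-set Jaccard similarity is only one simple way to compare graphs and is chosen here for brevity.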

Publication Date

Fri Sep 15 2017

Journal Name

Research Journal Of Applied Sciences, Engineering And Technology

Graph-Based Text Representation: A Survey of Current Approaches

Geehan Sabah

Asma Khazaal

Siti Sakira

Publication Date

Fri Oct 02 2015

Journal Name

American Journal Of Applied Sciences

Advances in Document Clustering with Evolutionary-Based Algorithms

Text Document Clustering

Hypertext Clustering

Evolutionary Algorithms

Genetic Algorithms

Text Dimensional Reduction

Sarmad

Document clustering is the process of organizing an electronic corpus of documents into subgroups with similar text features. A number of conventional algorithms have previously been applied to document clustering, and there are current endeavors to enhance clustering performance by employing evolutionary algorithms. Such endeavors have become an emerging topic that has gained more attention in recent years. The aim of this paper is to present an up-to-date and self-contained review fully devoted to document clustering via evolutionary algorithms. It first provides a comprehensive examination of the document clustering model, revealing its various components and related concepts. Then it shows and analyzes the principal research wor

Publication Date

Thu Oct 01 2015

Journal Name

Engineering And Technology Journal

Genetic Based Optimization Models for Enhancing Multi- Document Text Summarization

Hilal

Nasreen J.

Publication Date

Sun Jun 01 2008

Journal Name

Baghdad Science Journal

Tamper Detection in Text Document

Ali Kadhim

Although authenticating text document images is difficult because of their binary nature and the clear separation between background and foreground, it is in growing demand for many applications. Most previous research in this field depends on inserting a watermark in the document; the drawback of these techniques is that changing pixel values in a binary document can introduce irregularities that are visually very noticeable. In this paper, a new method is proposed for object-based text document authentication, in which a text document is signed by shifting individual words slightly left or right from their original positions to make the center of gravity for each line fall in with the m

Publication Date

Tue Feb 01 2022

Journal Name

Baghdad Science Journal

Securing Text Messages Using Graph Theory and Steganography

graph theory

data security

text hiding and encryption

Samaher Adnan

Renna D.

Enas Wahab

Data security is an important component of data communication and transmission systems. Its main role is to keep sensitive information safe and intact from the sender to the receiver. The proposed system aims to secure text messages through two security principles: encryption and steganography. The system introduces a novel encryption method based on graph theory properties: it forms a graph from a password to generate an encryption key as the weight matrix of that graph, and uses the Least Significant Bit (LSB) method to hide the encrypted message in the green component of a color image. Practical experiments on perceptibility, capacity, and robustness were evaluated using similarity measures like PSNR, MSE, and

Publication Date

Tue Jun 16 2026

Journal Name

International Journal Of Data And Network Science

Multi-objective of wind-driven optimization as feature selection and clustering to enhance text clustering

Text Clustering

Multi-Objectives

Wind Driven Optimization

K-Means

Unsupervised Feature Selection

Meta-heuristics optimization

MEHDI G. DUAIMI

Bsoul,Q.

AL-Gburi, A.

Text clustering consists of grouping objects of similar categories. The first issue is that the initial centroids influence the operation of the system, which can become trapped in local optima. The second issue pertains to the impact of a huge number of features on the determination of optimal initial centroids. The problem of dimensionality may be reduced by feature selection. Therefore, Wind Driven Optimization (WDO) was employed as feature selection to remove unimportant words from the text. In addition, the current study has integrated a novel clustering optimization technique called the WDO (Wasp Swarm Optimization) to effectively determine the most suitable initial centroids. The result showed the new meta-heuristic which is WDO was employed as t

Publication Date

Mon May 15 2017

Journal Name

Journal Of Theoretical And Applied Information Technology

Anomaly detection in text data that represented as a graph using dbscan algorithm

Anomaly Detection

Enhanced DBSCAN algorithm

Unsupervised anomaly detection and Concept Frame Graph (CFG)

Asma Khazaal Abdulsahib

Anomaly detection is still a difficult task. To address this problem, we propose to strengthen the DBSCAN algorithm by converting all data to a Concept Frame Graph (CFG). It is well known that DBSCAN groups data points that belong to the same class, while points that fall outside the clusters are treated as noise or anomalies. DBSCAN can thus detect abnormal points that lie farther than a set threshold from the clusters (extreme values). However, not all anomalies are of this kind, that is, unusual or far from a specific group. There is also a type of data that does not occur repeatedly but is considered abnormal for the known group. The analysis showed DBSCAN using the

Publication Date

Mon Oct 28 2019

Journal Name

Journal Of Mechanics Of Continua And Mathematical Sciences

Heuristic Initialization And Similarity Integration Based Model for Improving Extractive Multi-Document Summarization

Nasreen

Publication Date

Mon Feb 21 2022

Journal Name

Iraqi Journal For Computer Science And Mathematics

Fuzzy C means Based Evaluation Algorithms For Cancer Gene Expression Data Clustering

Omar

Basad

The influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences. This condition places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide unknown cluster data into clusters. The k-means and fuzzy c-means (FCM) algorithms are examples of algorithms that can be used for clustering. Consequently, clustering is a common approach that divides an input space into several homogeneous zones; it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, whic

Publication Date

Sat Jan 02 2021

Journal Name

Journal Of The College Of Languages (jcl)

A Study of Feminist Stylistic Analysis of Language Issues of Gender Representation in Selected Literary text

feminist stylistics

gender

sexism

transitivity choices

Ali Abdulilah

Stylistics is the analysis of the language of literary texts, integrated within various approaches to create a framework of different devices that describe and distinguish a particular work. Therefore, feminist stylistics, which relies on theories of feminist criticism, tries to present a counter-image of a woman both in language use and in society, to draw attention, raise awareness, and change the ways that gender is represented. Feminist stylistic analysis is concerned not only with describing sexism in a text, but also with analyzing the way that point of view, agency, metaphor, and transitivity choices are unexpectedly and closely connected to issues of gender (Mills, 1995: 1).
