Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for various purposes, such as publishing and digital libraries. This phenomenon raises awareness of the need for effective techniques that can help during the search and retrieval of text. One of the most needed tasks is clustering, which automatically categorizes documents into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends heavily on the choice of text representation method. Traditional methods of text representation model documents as bags of words using term frequency-inverse document frequency (TF-IDF). This method ignores the relationships and meanings of words in the document. As a result, the sparsity and semantic problems that are prevalent in textual documents are not resolved. In this study, the sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created through a combination of syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing to identify the sentence structure. The semantics of words are then modeled using a dependency graph. The resulting dependency graph is then used in cluster analysis with the K-means clustering technique. The dependency graph-based clustering results were compared with popular text representation methods, i.e. TF-IDF and ontology-based text representation. The results show that the dependency graph outperforms both TF-IDF and ontology-based text representation. The findings indicate that the proposed text representation method leads to more accurate document clustering results.
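
The difference between a bag-of-words representation and a dependency-graph representation can be illustrated with a small sketch. This is not the authors' implementation: the dependency triples are hand-written stand-ins for parser output, the relation labels (`nsubj`, `obj`) and the helper names (`tfidf`, `cosine`, `jaccard`) are assumptions made for illustration, and edge-set Jaccard similarity is just one simple way to compare two graphs.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document, using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def jaccard(edges_a, edges_b):
    """Jaccard similarity between two sets of (head, relation, dependent) edges."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


# Bag-of-words view: word order and grammatical roles are discarded.
docs = [
    ["dog", "bites", "man"],
    ["man", "bites", "dog"],
    ["stocks", "fell", "sharply"],
]
vecs = tfidf(docs)

# Dependency-graph view: hypothetical triples standing in for parser output.
g1 = {("bites", "nsubj", "dog"), ("bites", "obj", "man")}
g2 = {("bites", "nsubj", "man"), ("bites", "obj", "dog")}

bow_sim = cosine(vecs[0], vecs[1])  # same words, so the TF-IDF vectors coincide
graph_sim = jaccard(g1, g2)         # no shared edges, so the graphs differ
print(f"TF-IDF cosine: {bow_sim:.3f}, dependency-edge Jaccard: {graph_sim:.3f}")
```

The two sentences contain the same words with opposite roles, so TF-IDF cannot tell them apart while the edge sets can. How the paper converts dependency graphs into input for K-means is not stated in the abstract, so that step is left out of the sketch.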

Publication Date

Wed Oct 26 2022

Journal Name

Iraqi Journal Of Science

Gene Expression Analysis via Spatial Clustering and Evaluation Indexing

Basad

Density-based spatial clustering of applications with noise (DBSCAN) is one of the most popular clustering algorithms in data mining, and it is used to identify useful patterns and interesting distributions in the underlying data. It is an aggregation method suited to classifying nonlinearly aggregated data, in particular DNA methylation and gene expression data, which show sites differentially skewed by distance and grouped nonlinearly by cancer disease, together with the associated changes in gene expression. Under these conditions, DBSCAN is expected to have a desirable clustering feature that can be used to show the results of the changes. This research reviews DBSCAN and compares its performance with other algorithms, such as the tradit

Publication Date

Tue Dec 01 2015

Journal Name

Journal Of Engineering

Data Aggregation in Wireless Sensor Networks Using Modified Voronoi Fuzzy Clustering Algorithm

voronoi fuzzy

fuzzy c-means

clustering algorithm

data aggregation.

Nadia Adnan

Maab Alaa

This paper presents a data-centric technique: data aggregation via a modified algorithm that combines fuzzy clustering with a Voronoi diagram, called the modified Voronoi Fuzzy Clustering Algorithm (VFCA). In the modified algorithm, the sensed area is divided into a number of Voronoi cells by applying a Voronoi diagram, and these cells are clustered by the fuzzy C-means (FCM) method to reduce the transmission distance. An appropriate cluster head (CH) is then elected for each cluster. Three parameters are used in this election: energy, the distance between the CH and its neighboring sensors, and packet loss. Furthermore, data aggregation is employed in each CH to reduce the amount of data transmission which le

Publication Date

Tue Sep 24 2024

Journal Name

Arab World English Journal

A Critical Discourse Analysis of Women’s Representation in Maysaloon Hadi’s Novel The Black Eyes

critical discourse analysis

Fairclough’s model

feminism

Iraqi literature

Maysaloon Hadi

The Black Eyes

women's representation

Noor

This study applies a discourse analysis framework to explore the portrayal of women in Maysaloon Hadi’s novel The Black Eyes (2011), using Critical Discourse Analysis (CDA) and Norman Fairclough’s tri-dimensional model (1989) as the analytical foundation. It investigates the roles and challenges women face in the novel. While there is growing interest in the portrayal of women in literature, Iraqi literature, especially from the perspective of Iraqi women writers, remains underexplored. Hadi’s The Black Eyes provides a unique case to examine this intersection. Despite the novel’s rich narrative, which offers insight into Iraqi women’s lives, there is a lack of comprehensive CDA to understand how its language constructs

Publication Date

Fri Dec 01 2023

Journal Name

Bulletin Of Electrical Engineering And Informatics

A comparative study of Gaussian mixture algorithm and K-means algorithm for efficient energy clustering in MWSN

Iman Ameer

Muna Mohammed

Ali M.

Wireless sensor networks (WSNs) represent one of the key technologies in Internet of Things (IoT) networks. Since WSNs have finite energy sources, there is ongoing research to develop new strategies for minimizing power consumption or enhancing traditional techniques. In this paper, a novel Gaussian mixture model (GMM) algorithm is proposed for energy saving in mobile wireless sensor networks (MWSNs). Performance evaluation of the clustering process with the GMM algorithm shows a remarkable energy saving in the network of up to 92%. In addition, a comparison with another clustering strategy that uses the K-means algorithm has been made, and the developed method has outperformed K-means, saving ener

Publication Date

Mon Feb 01 2016

Journal Name

Swarm And Evolutionary Computation

Improving the performance of evolutionary multi-objective co-clustering models for community detection in complex social networks

Bara'a A.

Wisam A.

Mayyadah F.

Publication Date

Fri Jun 30 2023

Journal Name

Journal Of The College Of Islamic Sciences

The translated text is a second launch of the text (Al-Kharja Al-Muwashah Al-Andalusian as an example)

The text

the translation

the Andalusian kharaj

the muwashshah

the singular

Siham Saib

Literary translation is one of the most difficult types of translation, because it conveys feelings that differ from one person to another. Since the language constitutes an obstacle to understanding the Andalusian excerpts, translators resorted to translating them, and this was a second launch of the text, different from its first launch, which is spoken in the voice of the Al-washah. The muwashshah is a poetic art that appeared in Andalusia after the Arabs entered it, and it is characterized by a special system. It differs from the traditional Arabic poem in that it has a beginning, represented by the opening of the muwashshah, and several equal parts ending with different rhymes.

Publication Date

Thu Oct 31 2024

Journal Name

Intelligent Automation And Soft Computing

Fusion of Type-2 Neutrosophic Similarity Measure in Signatures Verification Systems: A New Forensic Document Analysis Paradigm

Type-2 neutrosophic reasoning

biometric signature verification

forensic document experts’ analysis

Shahlaa

Wisal Hashim

Oday

Saad

Signature verification involves vague situations in which a signature could resemble many reference samples or might differ because of handwriting variances. By representing the features and similarity score of signatures from the matching algorithm as fuzzy sets and capturing the degrees of membership, non-membership, and indeterminacy, a neutrosophic engine can significantly contribute to signature verification by addressing the inherent uncertainties and ambiguities present in signatures. However, type-1 neutrosophic logic gives these membership functions fixed values, which may not adequately capture the various degrees of uncertainty in the characteristics of signatures. Type-1 neutrosophic representation is also unable to adjust to various

Publication Date

Thu Sep 15 2022

Journal Name

Al-Academy

The semiotic approach in analyzing contemporary graphic text

semiotic approach

graphic text

Akram

With the great diversity of contemporary critical approaches and visions, and the development that has reached the graphic design field, it has become imperative for those working in contemporary design to research and investigate in accordance with the intellectual propositions and methods of modern criticism. Design work requires both the designer and the recipient to know the mechanics of typographic text analysis in a world dense with texts and images of varied vocabulary and graphics. The designer, before anyone else, carries out the process of analysis in order to know what is being offered to others in visual messages, which are often directed and intended, and what is meant behind them. In the midst of this world, the semiotic approach directly overlaps with such diverse offer

Publication Date

Sat Jan 01 2022

Journal Name

IEEE Access

Wrapper and Hybrid Feature Selection Methods Using Metaheuristic Algorithms for English Text Classification: A Systematic Review

Metaheuristics

Feature extraction

Text categorization

Classification algorithms

Systematics

Search problems

Business

Osamah Mohammed

Yu-N

Ammar Kamal

Omar Mustafa

Feature selection (FS) constitutes a series of processes used to decide which relevant features/attributes to include and which irrelevant features to exclude for predictive modeling. It is a crucial task that aids machine learning classifiers in reducing error rates, computation time, and overfitting, and in improving classification accuracy. It has demonstrated its efficacy in a myriad of domains, including text classification (TC), text mining, and image recognition. While there are many traditional FS methods, recent research efforts have been devoted to applying metaheuristic algorithms as FS techniques for the TC task. However, there are few literature reviews concerning TC. Therefore, a comprehensive overview was systematicall

Publication Date

Wed Jun 01 2022

Journal Name

International Journal Of Electrical And Computer Engineering (IJECE)

American Standard Code for Information Interchange mapping technique for text hiding in the RGB and gray images

Abdulrudah

Mohsen

Lafta
