Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This approach ignores the relationships between words and their meanings, so the sparsity and semantic problems that are prevalent in textual documents remain unresolved. This study reduces both problems by proposing a graph-based text representation method, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of words are then modeled using the dependency graph. The resulting dependency graph is used for cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with those of two popular text representation methods, TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both, indicating that the proposed text representation method leads to more accurate document clustering.
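The claim that a bag-of-words model ignores relationships between words can be illustrated with a small sketch. It is not the authors' implementation: the toy corpus, the hand-written (head, relation, dependent) edges, the relation labels and the helper names `tfidf` and `cosine` are all assumptions made for illustration, and a real pipeline would obtain the edges from a syntactic parser.

```python
import math
from collections import Counter


def tfidf(docs):
    """Map each document (a list of features) to a {feature: weight} dict."""
    n_docs = len(docs)
    doc_freq = Counter(f for doc in docs for f in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            f: (c / len(doc)) * math.log(n_docs / doc_freq[f])
            for f, c in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {feature: weight} vectors."""
    dot = sum(w * b.get(f, 0.0) for f, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus, invented for this example.
words = [
    ["dog", "bites", "man"],
    ["man", "bites", "dog"],
    ["stocks", "fell", "sharply"],
]

# Hypothetical dependency edges, written by hand instead of parsed.
edges = [
    [("bites", "nsubj", "dog"), ("bites", "obj", "man")],
    [("bites", "nsubj", "man"), ("bites", "obj", "dog")],
    [("fell", "nsubj", "stocks"), ("fell", "advmod", "sharply")],
]

bow = tfidf(words)
graph = tfidf(edges)
bow_similarity = cosine(bow[0], bow[1])        # same words, so the vectors coincide
graph_similarity = cosine(graph[0], graph[1])  # no shared edges, so no overlap
```

Under the word features the first two sentences are indistinguishable, while under the edge features they share nothing, which is the kind of relational information a dependency-based representation can pass on to a clustering step such as K-means.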

Publication Date

Tue Nov 21 2017

Journal Name

Lecture Notes In Computer Science

Emotion Recognition in Text Using PPM

Emotion Recognition

emotion classification

machine learning

PPM

text mining

Amer

William John

In this paper we investigate the automatic recognition of emotion in text. We propose a new method for emotion recognition based on the Prediction by Partial Matching (PPM) character-based text compression scheme in order to recognize Ekman’s six basic emotions (Anger, Disgust, Fear, Happiness, Sadness, Surprise). Experimental results with three datasets show that the new method is very effective when compared with traditional word-based text classification methods. We have also found that our method works best if the sizes of text in all classes used for training are similar, and that performance significantly improves with increased data.
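The general idea of compression-based classification can be sketched as follows: train one character model per class and assign a text to the class whose model encodes it in the fewest bits. The abstract does not state the exact procedure, so this is only an assumed outline. In particular, the sketch does not implement PPM; it substitutes a fixed-order character model with add-one smoothing, and the class `CharModel`, the function `classify`, the alphabet size of 256 and the two training sentences are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class CharModel:
    """Fixed-order character model with add-one smoothing (a stand-in for PPM)."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)

    def _pairs(self, text):
        # Yield (context, next character) pairs, padding the start with spaces.
        padded = " " * self.order + text
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def train(self, text):
        for context, char in self._pairs(text):
            self.counts[context][char] += 1

    def code_length(self, text, alphabet_size=256):
        """Bits an ideal coder would need for `text` under this model."""
        bits = 0.0
        for context, char in self._pairs(text):
            seen = self.counts.get(context, Counter())
            prob = (seen[char] + 1) / (sum(seen.values()) + alphabet_size)
            bits -= math.log2(prob)
        return bits


def classify(text, models):
    """Return the label whose model gives the shortest code length."""
    return min(models, key=lambda label: models[label].code_length(text))


# Tiny invented training texts; a real experiment would use labelled datasets.
training = {
    "anger": "i am furious and angry at this outrage i hate it",
    "happiness": "i am so happy and glad what a joyful wonderful day",
}
models = {}
for label, text in training.items():
    model = CharModel(order=2)
    model.train(text)
    models[label] = model

predicted = classify("happy and glad", models)
```

Because each class gets its own model, classes with much more training text yield better-estimated models, which is consistent with the abstract's remark that similar training sizes across classes work best.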

Publication Date

Mon Feb 01 2016

Journal Name

International Journal Of Applied Mathematics & Statistical Sciences

Topological Structures Using Mixed Degree Systems in Graph Theory

Digraph

In-Degree System

Mixed Degree System

M-Space

Out-Degree System

Y. Y.

SARA SAAD

This paper is concerned with introducing and studying the M-space by using the mixed degree systems which are the core concept in this paper. The necessary and sufficient condition for the equivalence of two reflexive M-spaces is super imposed. In addition, the m-derived graphs, m-open graphs, m-closed graphs, m-interior operators, m-closure operators and M-subspace are introduced. From an M-space, a unique supratopological space is introduced. Furthermore, the m-continuous (m-open and m-closed) functions are defined and the fundamental theorem of the m-continuity is provided. Finally, the m-homeomorphism is defined and some of its properties are investigated.

Publication Date

Fri Apr 01 2016

Journal Name

Bulletin Of Mathematics And Statistics Research

GRAPH OF EQUIVALENCE CLASSES OF A COMMUTATIVE IS-ALGEBRA

SAMY M.

FATEMA F.

Publication Date

Sun Mar 30 2025

Journal Name

Iraqi Journal Of Science

Segmentation of Aerial Images Using Different Clustering Techniques

Maha A.

Firas A.

Tole

The segmentation of aerial images using different clustering techniques offers valuable insights into interpreting and analyzing such images. By partitioning the images into meaningful regions, clustering techniques help identify and differentiate various objects and areas of interest, facilitating various applications, including urban planning, environmental monitoring, and disaster management. This paper aims to segment color aerial images to provide a means of organizing and understanding the visual information contained within the image for various applications and research purposes. It is also important to look into and compare the basic workings of three popular clustering algorithms: K-Medoids, Fuzzy C-Mean (FCM), and Gaussia

Publication Date

Wed Jun 01 2011

Journal Name

Journal Of Al-nahrain University Science

A Lexical and Syntax Checker Tool for the Hyper Text Markup Language

Maysa

Publication Date

Thu Dec 01 2022

Journal Name

Baghdad Science Journal

Using Graph Mining Method in Analyzing Turkish Loanwords Derived from Arabic Language

Arabic language

Data mining

Graph mining

Loanwords

Turkish language

Abbood Kirebut

Muneam Jabbar

Ahmed Hussein

Loanwords are words transferred from one language to another that become an essential part of the borrowing language. They pass from the source language to the recipient language for many reasons. Detecting loanwords is a complicated task because there are no standard specifications for transferring words between languages, which results in low detection accuracy. This work tries to improve the accuracy of detecting loanwords between the Turkish and Arabic languages as a case study. The proposed system finds all possible loanwords using any set of characters, whether alphabetically or randomly arranged. Then, it processes the distortion in the pronunciation, and solves the problem of the missing lette

Publication Date

Mon Jan 01 2018

Journal Name

Communications In Computer And Information Science

Automatically Recognizing Emotions in Text Using Prediction by Partial Matching (PPM) Text Compression Method

Emotion recognition

emotion classification

PPM

machine learning

Data mining

Amer

William John

In this paper, we investigate the automatic recognition of emotion in text. We perform experiments with a new method of classification based on the PPM character-based text compression scheme. These experiments involve both coarse-grained classification (whether a text is emotional or not) and also fine-grained classification such as recognising Ekman’s six basic emotions (Anger, Disgust, Fear, Happiness, Sadness, Surprise). Experimental results with three datasets show that the new method significantly outperforms the traditional word-based text classification methods. The results show that the PPM compression based classification method is able to distinguish between emotional and nonemotional text with high accuracy, between texts invo

Publication Date

Sat Jun 01 2019

Journal Name

2019 International Symposium On Networks, Computers And Communications (ISNCC)

An Interference Mitigation Scheme for Millimetre Wave Heterogeneous Cloud Radio Access Network with Dynamic RRH Clustering

Zainab H.

Firas

H.S.

Publication Date

Sun Jul 09 2023

Journal Name

Journal Of Engineering

MR Brain Image Segmentation Using Spatial Fuzzy C-Means Clustering Algorithm

fuzzy c-means

spatial information

image segmentation

clustering

MRI brain image

Safa Soud

Reem Shakir

The conventional FCM algorithm does not fully utilize the spatial information in the image. In this research, we use an FCM algorithm that incorporates spatial information into the membership function for clustering. The spatial function is the summation of the membership functions in the neighborhood of each pixel under consideration. The advantages of the method are that it is less sensitive to noise than other techniques and that it yields more homogeneous regions than other methods. This technique is a powerful method for noisy image segmentation.
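The spatial step described above can be sketched in a few lines. The abstract only says that the spatial function sums the memberships in each pixel's neighborhood; the 3x3 window, the exponents `p` and `q`, and the way the spatial function is multiplied into the membership and renormalized follow one common spatial FCM formulation and are assumptions here, as is the function name `spatial_membership`.

```python
def spatial_membership(u, p=1, q=1):
    """Reweight fuzzy memberships by neighborhood support.

    u[c][r][col] is the membership of pixel (r, col) in cluster c.
    Returns a new array of the same shape whose values sum to 1 per pixel.
    """
    n_clusters, rows, cols = len(u), len(u[0]), len(u[0][0])

    # Spatial function: sum of cluster-c memberships over the 3x3 window
    # centred on each pixel (the window is clipped at the image border).
    h = [[[sum(u[c][rr][cc]
               for rr in range(max(0, r - 1), min(rows, r + 2))
               for cc in range(max(0, col - 1), min(cols, col + 2)))
           for col in range(cols)]
          for r in range(rows)]
         for c in range(n_clusters)]

    out = [[[0.0] * cols for _ in range(rows)] for _ in range(n_clusters)]
    for r in range(rows):
        for col in range(cols):
            weights = [u[c][r][col] ** p * h[c][r][col] ** q
                       for c in range(n_clusters)]
            total = sum(weights)
            for c in range(n_clusters):
                out[c][r][col] = weights[c] / total
    return out


# Toy 3x3 membership map: every pixel favours cluster 0 except a noisy centre.
cluster0 = [[0.9, 0.9, 0.9],
            [0.9, 0.4, 0.9],
            [0.9, 0.9, 0.9]]
cluster1 = [[1.0 - v for v in row] for row in cluster0]

updated = spatial_membership([cluster0, cluster1])
centre_in_cluster0 = updated[0][1][1]
```

In this toy case the centre pixel, which initially leans toward cluster 1, is pulled back toward cluster 0 by its neighbors, which illustrates why the approach is less sensitive to isolated noisy pixels.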

Publication Date

Wed Feb 01 2017

Journal Name

International Journal Of Science And Research (IJSR)

Supra-Approximation Spaces Using Mixed Degree System in Graph Theory

o-space

i-space

supra-approximation space

near supra-approximation space

m-lower approximation and m-upper approximation

Y. Y.

S. S.

This paper is concerned with introducing and studying the o-space by using the out-degree system (resp. the i-space by using the in-degree system), which are the core concepts of this paper. In addition, the m-lower approximations, the m-upper approximations and o-space and i-space. Furthermore, we introduce near supraopen (near supraclosed) d. g.'s. Finally, the supra-lower approximation, supra-upper approximation and supra-accuracy are defined and some of their properties are investigated.