Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This representation ignores the relationships between words and their meanings, so the sparsity and semantic problems that are prevalent in textual documents remain unresolved. In this study, the sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify the sentence structure, and the semantics of the words are then modeled using a dependency graph. The resulting dependency graph is used in cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with those of popular text representation methods, i.e. TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both, indicating that the proposed text representation method leads to more accurate document clustering.
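
The contrast between a bag-of-words representation and a dependency-based one can be illustrated with a small sketch. This is not the authors' implementation: the dependency triples are written by hand (a real pipeline would obtain them from a syntactic parser), the relation labels `nsubj` and `obj` are assumed for illustration, and the helper names `bag_of_words`, `edge_features` and `cosine` are hypothetical. The sketch only shows why two documents with identical word counts can still differ once relations between words are represented.

```python
import math
from collections import Counter


def bag_of_words(tokens):
    """Count each token; word order and relations are discarded."""
    return Counter(tokens)


def edge_features(triples):
    """Turn (head, relation, dependent) triples into countable edge features."""
    return Counter(f"{head}-{rel}->{dep}" for head, rel, dep in triples)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two toy "documents" with the same words but opposite meanings.
doc_a_tokens = ["dog", "bites", "man"]
doc_b_tokens = ["man", "bites", "dog"]

# Hand-written dependency edges (assumed, not produced by a parser).
doc_a_edges = [("bites", "nsubj", "dog"), ("bites", "obj", "man")]
doc_b_edges = [("bites", "nsubj", "man"), ("bites", "obj", "dog")]

# Identical word counts give a similarity of (approximately) 1.
bow_sim = cosine(bag_of_words(doc_a_tokens), bag_of_words(doc_b_tokens))

# No dependency edge is shared, so the similarity is 0.
graph_sim = cosine(edge_features(doc_a_edges), edge_features(doc_b_edges))

print("bag-of-words similarity:", bow_sim)
print("dependency-edge similarity:", graph_sim)
```

Feature vectors of either kind could then be passed to a clustering algorithm such as K-means; how the dependency graph is actually fed into clustering in the study is not specified in the abstract.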

Publication Date

Sun Mar 26 2023

Journal Name

Wasit Journal Of Pure Sciences

Pure Graph of a Commutative Ring

Graph theory

commutative ring.

Nermen

Nabeel

Tamadher

A new graph associated with a ring R, called the pure graph and denoted Pur(R), is presented. Its vertices are the elements of R, and two vertices x and y are joined by an edge if and only if x = xy or y = yx. In this work we study some new properties of Pur(R); finally, we define the complement of Pur(R) and study some of its properties.

Publication Date

Mon Dec 01 2014

Journal Name

Ain Shams University

New Studies for Topological Generalizations and Uncertainty in Graph Theory

Y. Y.

Topology and its applications attract the interest of many research centers worldwide. Near open sets play a very important role in general topology and are now a research topic for many topologists, yet they have not been studied in fibrewise topology. We therefore use some near open sets as a model to introduce new results and new spaces in fibrewise topological spaces. Closure operators also play a very important role in constructing topological spaces, so we introduce new closure operators on the power set of the vertices of a graph and derive theorems and new spaces from them. Furthermore, we discuss the relationships of connectedness between some ty

Publication Date

Sat Dec 01 2018

Journal Name

Indian Journal Of Ecology

Classification of al-hammar marshes satellite images in Iraq using artificial neural network based on coding representation

Abdulla A.S.

Publication Date

Wed Jan 01 2025

Journal Name

Iv. International Rimar Congress Of Pure, Applied Sciences

A New Intrusion Detection Approach Based on RNA Encoding and K-Means Clustering Algorithm Using KDD-Cup99 Dataset

Intrusion Detection

Hybrid

Misuse

Anomaly

Clustering.

Omar Fitian

Safa Ahmed

عماد

Intrusion detection systems (IDS) are useful tools that help security administrators secure the network and raise alerts for any potentially harmful event. An IDS can be classified as either misuse-based or anomaly-based, depending on its detection methodology. A misuse IDS recognizes known attacks based on their signatures; its main disadvantage is that it cannot detect new attacks. An anomaly IDS, in contrast, relies on a model of normal behaviour; its main advantage is the ability to discover new attacks, while its main drawback is a high false alarm rate. Therefore, a hybrid IDS is a combination of misuse and anomaly and acts as a solution to overcome the dis

Publication Date

Wed Mar 01 2023

Journal Name

Baghdad Science Journal

An Investigation of Corona Domination Number for Some Special Graphs and Jahangir Graph

Corona dominating set

Dominating set

Jahangir graph

Pendant and support vertex

Tadpole graph

L.

G.

C.

In this work, we continue the study of corona domination in graphs, which was initially proposed by G. Mahadevan et al. Let G be a simple graph. A dominating set S of G is said to be a corona-dominating set if every vertex in the subgraph induced by S is either a pendant vertex or a support vertex. The minimum cardinality among all corona-dominating sets is called the corona-domination number. In this work, the exact values of the corona domination number for some specific types of graphs are given. Also, some results on the corona domination number for some classes of graphs are obtained. The method used in this paper is a well-known number theory concept; with some modification, this method can also be applied to obt

Publication Date

Fri Jan 01 2010

Journal Name

Iraqi Journal Of Science

RETRIEVING DOCUMENT WITH COMPACT GENETIC ALGORITHM(CGA)

Sarab

Maysaa

Zainab

Publication Date

Mon Jan 02 2017

Journal Name

Journal Of Educational And Psychological Researches

Test anxiety and cognitive representation among university students

Test anxiety and cognitive representation

ايناس محمد مهدي

This study was conducted to determine the relationship between test anxiety and cognitive representation among university students. To this end, 152 students (male and female) were chosen randomly from scientific and social departments to fill out questionnaires on test anxiety and cognitive representation. The researcher used the independent-samples t-test, the Pearson product-moment correlation coefficient, Cronbach's alpha, and the t-test. The results revealed a weak negative correlation between test anxiety and cognitive representation among university students.

Publication Date

Thu Feb 01 2024

Journal Name

Baghdad Science Journal

A Novel Gravity Optimization Algorithm for Extractive Arabic Text Summarization

Abstractive Summarization

Extractive Summarization

Arabic Text Summarization

Similarity Graph

Gravitational Optimization Algorithm

Mustafa

Ayad R.

Osamah Y.

An automatic text summarization system mimics how humans summarize by picking the most significant sentences in a source text. However, the complexities of the Arabic language make it challenging to obtain information quickly and effectively. The main disadvantage of the traditional approaches is that they are strictly constrained (especially for the Arabic language) by the accuracy of sentence feature functions, weighting schemes, and similarity calculations. On the other hand, the meta-heuristic search approaches have a feature tha

Publication Date

Wed Feb 01 2023

Journal Name

Baghdad Science Journal

Order Sum Graph of a Group

Algebraic graphs

Center

Domination

Graph spectra

Order sum graphs

Javeria

Sudev

The concept of the order sum graph associated with a finite group based on the order of the group and order of group elements is introduced. Some of the properties and characteristics such as size, chromatic number, domination number, diameter, circumference, independence number, clique number, vertex connectivity, spectra, and Laplacian spectra of the order sum graph are determined. Characterizations of the order sum graph to be complete, perfect, etc. are also obtained.

Publication Date

Fri Nov 20 2020

Journal Name

Solid State Technology

Comparative Study for Bi-Clustering Algorithms: Historical and Methodological Notes

Safa S

Hiba S

Saif S
