Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.

Advances in digital technology and the World Wide Web have led to a growing number of digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TF-IDF). This approach ignores the relationships between words and their meanings in the document, so the sparsity and semantic problems that are prevalent in textual documents remain unresolved. This study reduces the sparsity and semantic problems by proposing a graph-based text representation method, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of words are then modeled using a dependency graph. The resulting dependency graph is used for cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with two popular text representation methods, TF-IDF and ontology-based text representation. The results show that the dependency graph outperforms both, indicating that the proposed text representation method leads to more accurate document clustering.
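
The contrast between a bag-of-words representation and a dependency-based one can be illustrated with a minimal sketch. It is not the authors' implementation: the dependency edges are hand-written stand-ins for parser output, the relation labels (`nsubj`, `obj`) and the edge-overlap (Jaccard) similarity are assumptions chosen for illustration, and the clustering step itself is omitted.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-words TF-IDF: one sparse {term: weight} dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {t: (c / len(doc)) * math.log(n / df[t]) for t, c in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def edge_jaccard(g1, g2):
    """Share of dependency edges two graphs have in common (assumed measure)."""
    union = g1 | g2
    return len(g1 & g2) / len(union) if union else 0.0


# Three toy documents, already tokenized.
docs = [
    ["dog", "bites", "man"],
    ["man", "bites", "dog"],
    ["stocks", "fell", "sharply"],
]
vecs = tfidf_vectors(docs)

# Hand-written (head, relation, dependent) edges for the first two documents,
# standing in for the output of a syntactic parser.
g1 = {("bites", "nsubj", "dog"), ("bites", "obj", "man")}
g2 = {("bites", "nsubj", "man"), ("bites", "obj", "dog")}

same_words = cosine(vecs[0], vecs[1])    # identical word counts
no_overlap = cosine(vecs[0], vecs[2])    # no shared terms
graph_sim = edge_jaccard(g1, g2)         # roles are reversed
```

The first two documents contain the same words, so their TF-IDF vectors coincide even though the sentences mean different things, while the dependency edges keep the two apart. The first and third documents share no terms, which is the sparsity effect the abstract refers to.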

Publication Date

Fri Sep 01 2023

Journal Name

Indonesian Journal Of Electrical Engineering And Computer Science

Document retrieval using term term frequency inverse sentence frequency weighting scheme

Document representation

Document retrieval

Similarity measures

Term frequency inverse

sentence frequency

Weighting schemes

Mohannad T.

Omar Fitian

The need for an efficient method to find the most appropriate document for a particular search query has become crucial due to the exponential growth in the number of papers now readily available on the web. The vector space model (VSM), a model widely used in information retrieval, represents words as vectors in space and assigns them weights via a popular weighting method known as term frequency inverse document frequency (TF-IDF). This research proposes retrieving the most relevant document by representing documents and queries as vectors comprising average term term frequency inverse sentence frequency (TF-ISF) weights instead of representing them as v
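
As a rough illustration of sentence-level weighting, the sketch below computes an averaged TF-ISF weight per term for one document. The abstract is truncated and does not give the formula, so everything here is an assumption: ISF is taken as log(N / sf), with N the number of sentences and sf the number of sentences containing the term, and "average" is read as the mean over sentences. The function name `average_tf_isf` is hypothetical.

```python
import math
from collections import Counter


def average_tf_isf(sentences):
    """Return {term: mean TF-ISF weight} for one document.

    `sentences` is a list of non-empty token lists. For each sentence,
    TF is the term count divided by the sentence length, and
    ISF(t) = log(N / sf(t)), where N is the number of sentences and
    sf(t) is the number of sentences that contain t (assumed definition).
    """
    n = len(sentences)
    sf = Counter(t for s in sentences for t in set(s))
    totals = Counter()
    for s in sentences:
        for t, c in Counter(s).items():
            totals[t] += (c / len(s)) * math.log(n / sf[t])
    # Average over all sentences in the document.
    return {t: w / n for t, w in totals.items()}


doc = [
    ["graph", "clustering", "groups", "documents"],
    ["clustering", "needs", "a", "representation"],
]
weights = average_tf_isf(doc)
```

Under this definition a term that occurs in every sentence, such as "clustering" here, receives weight zero, while a term confined to one sentence receives a positive weight.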

Publication Date

Wed Jul 13 2016

Journal Name

International Journal Of Mathematics Trends And Technology

Designed Algorithms for Compute the Tenser Product of Representation for the Special Linear Groups

Designed algorithms

representation for the group

degree of the representation

character of representation

tenser product.

Hani

Niran

Samera Shams

Sukaina

The main objective of this paper is to design algorithms and implement them in a main program that determines the tensor product of representations of the special linear group.

Publication Date

Sun Jan 01 2023

Journal Name

2nd International Conference On Mathematical Techniques And Applications: Icmta2021

Review of clustering for gene expression data

Omar

Basad

Publication Date

Sun Jan 01 2023

Journal Name

Journal Of Discrete Mathematical Sciences And Cryptography

A4-graph for the twisted group 3D4 (3)

Hamid S.M.

Ali Abd

Assume that G is a finite group and $X = t^G$ (the conjugacy class of t in G), where t is a non-identity element with $t^3 = 1$. The simple graph with node set X in which a, b ∈ X are adjacent if $ab^{-1}$ is an involution is called the A4-graph and denoted by A4(G, X). In this article, the construction of A4(G, X) is analyzed where G is the twisted group of Lie type $^3D_4(3)$.
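
The definition can be tried on a small group. The sketch below is a toy example, not the construction from the article: it replaces the twisted group with the alternating group on four points (an assumption made only to keep the example small), takes t to be a 3-cycle, and represents permutations as tuples of images.

```python
from itertools import combinations, permutations


def compose(p, q):
    """Return the permutation i -> p[q[i]]."""
    return tuple(p[i] for i in q)


def inverse(p):
    """Return the inverse of permutation p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def is_even(p):
    """True if p has an even number of inversions."""
    n = len(p)
    return sum(1 for i, j in combinations(range(n), 2) if p[i] > p[j]) % 2 == 0


def is_involution(p):
    """True if p is not the identity but p composed with itself is."""
    identity = tuple(range(len(p)))
    return p != identity and compose(p, p) == identity


def a4_graph(group, t):
    """Return (X, edges): X is the class of t, edges join a, b with a*b^-1 an involution."""
    X = {compose(compose(g, t), inverse(g)) for g in group}
    edges = {
        frozenset((a, b))
        for a, b in combinations(sorted(X), 2)
        if is_involution(compose(a, inverse(b)))
    }
    return X, edges


# Toy stand-in for G: the even permutations of four points.
G = [p for p in permutations(range(4)) if is_even(p)]
t = (1, 2, 0, 3)  # the 3-cycle 0 -> 1 -> 2 -> 0, so t has order 3
X, edges = a4_graph(G, t)
```

In this toy case X has four elements and every pair of them is adjacent, so the graph is the complete graph on four vertices.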

Publication Date

Sun Dec 04 2011

Journal Name

Baghdad Science Journal

Modifying Hebbian Network for Text Cipher

Hebbian Network

Neural Network

Text Security.

Noor Adnan

The objective of this work is to design and implement a cryptography system that enables the sender to send a message through any channel (even an insecure one) and the receiver to decrypt the received message, without allowing any intruder to break the system and extract the secret information. This work modifies the feedforward neural network so that the secret message is encrypted by an unsupervised neural network method to produce cipher text, which can be decrypted using the same network to recover the original text. The security of any cipher system depends on the security of the related keys (used by the encryption and decryption processes) and their corresponding lengths. In this work, the key is the final weights

Publication Date

Thu Sep 01 2016

Journal Name

2016 8th Computer Science And Electronic Engineering (ceec)

Class-specific pre-trained sparse autoencoders for learning effective features for document classification

Maysa

Publication Date

Tue Sep 08 2020

Journal Name

Baghdad Science Journal

Hiding the Type of Skin Texture in Mice based on Fuzzy Clustering Technique

C-Mean

Extracting

LSB

Information hiding

Steganography

Alaa Noori

Ekhlas Falih

Safe transmission of information is a central concern when confidential messages are exchanged over the internet. For example, consumers and producers of digital products want assurance that those products are genuine and can be distinguished from worthless ones. The science of encryption can be defined as the technique of embedding data in an image, audio, or video file in a way that meets safety requirements. Steganography is a branch of data-concealment science that aims to reach a desired level of security in the exchange of private commercial and military data. This research offers a novel steganography technique based on hiding data inside the clusters that result from fuzzy clustering. T

Publication Date

Tue Aug 24 2021

Journal Name

Conference: The 5th International Multi-conference On Artificial Intelligence Technology (mcait 2021).

Text Encryption Based on DNA Cryptography, RNA, and Amino Acid

Cryptography

DNA cryptography

Encryption

Decryption

Security.

Omar Fitian

Cryptography is one way to transfer data securely from sender to receiver. To increase the level of data security, DNA was introduced to cryptography. DNA can readily be used to store and transfer data, and it has become an effective means for such purposes and for implementing computation. A new cryptography system is proposed, consisting of two phases: the encryption phase and the decryption phase. The encryption phase includes six steps, starting by converting the plaintext characters to their equivalent ASCII values and converting those to binary values. After that, the binary values are converted to DNA characters and then converted to their equivalent complementary DN
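
The conversion steps that are visible in the abstract (text to ASCII, ASCII to binary, binary to DNA characters, DNA to its complement) can be sketched as follows. This covers only those steps, not the full six-step scheme, and it offers no security on its own. The two-bit mapping 00→A, 01→C, 10→G, 11→T is an assumption, since the abstract does not state which mapping is used; the complement pairs A with T and C with G.

```python
# Assumed two-bit encoding; the abstract does not specify the mapping.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}

# Complement pairs: A <-> T and C <-> G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def text_to_dna(plaintext):
    """ASCII text -> 8-bit binary string -> DNA characters (2 bits per base)."""
    bits = "".join(format(byte, "08b") for byte in plaintext.encode("ascii"))
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def complement(dna):
    """Replace every base by its complementary base."""
    return dna.translate(COMPLEMENT)


def dna_to_text(dna):
    """Inverse of text_to_dna: DNA characters -> binary string -> ASCII text."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, len(bits), 8))


dna = text_to_dna("Hi")             # "CAGACGGC"
comp = complement(dna)              # "GTCTGCCG"
recovered = dna_to_text(complement(comp))
```

Taking the complement twice returns the original strand, so the receiver can undo these steps in reverse order to recover the text.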

Publication Date

Mon Dec 05 2022

Journal Name

Baghdad Science Journal

Short Text Semantic Similarity Measurement Approach Based on Semantic Network

Naamah Hussein

Adel M.

Ahmed T.

Estimating the semantic similarity between short texts plays an increasingly prominent role in many fields related to text mining and natural language processing applications, especially with the large increase in the volume of textual data that is produced daily. Traditional approaches for calculating the degree of similarity between two texts, based on the words they share, do not perform well with short texts because two similar texts may be written in different terms by employing synonyms. As a result, short texts should be semantically compared. In this paper, a semantic similarity measurement method between texts is presented which combines knowledge-based and corpus-based semantic information to build a semantic network that repre

Publication Date

Wed Apr 20 2022

Journal Name

Periodicals Of Engineering And Natural Sciences (pen)

Text image secret sharing with hiding based on color feature

Nuha

Yossra

Alyaa

Tarik Ahmed
