Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support text search and retrieval. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories and is an important task in data mining and machine learning. Clustering accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TF-IDF). This approach ignores the relationships between words and their meanings, so the sparsity and semantic problems that are prevalent in textual documents remain unresolved. This study reduces the sparsity and semantic problems by proposing a graph-based text representation method, the dependency graph, with the aim of improving document clustering accuracy. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of words are then modeled using a dependency graph. The resulting dependency graph is used for cluster analysis with the K-means clustering technique. The dependency graph-based clustering results were compared with those of two popular text representation methods, TF-IDF and ontology-based text representation. The results show that the dependency graph outperforms both, indicating that the proposed text representation method leads to more accurate document clustering.
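The contrast the abstract draws between bag-of-words TF-IDF and a dependency-based representation can be illustrated with a minimal Python sketch. It is not the authors' implementation: the smoothed IDF formula, the helper names (`tfidf_vectors`, `edge_features`, `cosine`), the toy sentences, and the hand-written dependency triples are all assumptions made for illustration. In practice, a syntactic parser would produce the triples.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Bag-of-words TF-IDF: one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Smoothed IDF variant, chosen for this sketch (an assumption).
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def edge_features(edges):
    """Count (head, relation, dependent) triples as graph-edge features."""
    return dict(Counter(edges))


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = ["dog bites man", "man bites dog", "stocks fell sharply"]

# Hypothetical dependency triples, written by hand for the toy sentences.
graphs = [
    [("bites", "nsubj", "dog"), ("bites", "obj", "man")],
    [("bites", "nsubj", "man"), ("bites", "obj", "dog")],
    [("fell", "nsubj", "stocks"), ("fell", "advmod", "sharply")],
]

tfidf = tfidf_vectors(docs)
bow_sim = cosine(tfidf[0], tfidf[1])
graph_sim = cosine(edge_features(graphs[0]), edge_features(graphs[1]))

print(f"bag-of-words similarity: {bow_sim:.2f}")
print(f"dependency-edge similarity: {graph_sim:.2f}")
```

The first two sentences contain the same words, so their TF-IDF vectors coincide even though the meanings are reversed, while their dependency edges share nothing. Either feature set could then be passed to a clustering routine such as K-means. The sketch only illustrates what each representation can distinguish and says nothing about clustering accuracy.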

Publication Date

Sat Jan 14 2023

Journal Name

Cogent Engineering

C. B interrupt duty reduction based controlling TRV and symmetrical breaking current

Nadheer A.

Yasar N.

Mafaza N.

Publication Date

Mon Jan 02 2023

Journal Name

International Journal Of Nonlinear Analysis And Applications

Diagnostic COVID-19 based on chest imaging of COVID-19: A survey

Saja

Alyaa

Publication Date

Tue Nov 01 2022

Journal Name

Aci Structural Journal

Stress at Ultimate in Internally Unbonded Steel Based on Genetic Expression Programming

Nazar K.

Iqbal

Publication Date

Fri Nov 01 2019

Journal Name

International Journal Of Computer Science And Mobile Computing

Adaptive Color Image Compression of Hybrid Coding and Inter Differentiation Based Techniques

Ghadah

Publication Date

Sun Dec 01 2013

Journal Name

Diyala Journal Of Engineering Sciences

Design and Simulation of parallel CDMA System Based on 3D-Hadamard Transform

DS-CDMA

3-D Hadamard

2-D Spreading Codes

2-D CDMA

2-D ISI Channel

Ali

Future wireless systems aim to provide higher transmission data rates, improved spectral efficiency and greater capacity. In this paper, a spectrally efficient two-dimensional (2-D) parallel code division multiple access (CDMA) system is proposed for generating and transmitting 2-D CDMA symbols through a 2-D Inter-Symbol Interference (ISI) channel to increase the transmission speed. The 3D-Hadamard matrix is used to generate the 2-D spreading codes required to spread the two-dimensional data for each user row-wise and column-wise. Quadrature amplitude modulation (QAM) is used as the data mapping technique because of the increased spectral efficiency it offers. The new structure is simulated using MATLAB and a comparison of performance for ser

Publication Date

Wed Feb 10 2016

Journal Name

Scientific Reports

Experimental demonstration on the deterministic quantum key distribution based on entangled photons

Hua

Zhi-Yuan

Alaa Jabbar Jumaah

Zhen-Qiang

Juan

Yun-Guang

Shuang

Hong-Wei

De-Yong

Shelan Khasro

Bao-Sen

Guang-Can

Wei

Zheng-Fu

As an important resource, the entangled light source has been used in developing quantum information technologies, such as quantum key distribution (QKD). There are few experiments implementing entanglement-based deterministic QKD protocols since the security of existing protocols may be compromised in lossy channels. In this work, we report on a loss-tolerant deterministic QKD experiment which follows a modified “Ping-Pong” (PP) protocol. The experimental results demonstrate for the first time that a secure deterministic QKD session can be fulfilled in a channel with an optical loss of 9 dB, based on a telecom-band entangled photon source. This exhibits a conceivable prospect of utilizing entanglement light source in real-life fiber-based

Publication Date

Fri Mar 31 2023

Journal Name

Wasit Journal Of Computer And Mathematics Science

Security In Wireless Sensor Networks Based On Lightweight Algorithms : An Effective Survey

Data Confidentiality

Lightweight Cryptography

Security in Wireless Networks

Wireless Sensor Networks

Mohammed

Sif

For both individuals and companies, Wireless Sensor Networks (WSNs) have a wide range of applications and uses. Sensors are used in many industries, including agriculture, transportation, and health. Many technologies, such as wireless communication protocols, the Internet of Things, cloud computing, mobile computing, and other emerging technologies, are connected to the usage of sensors. In many circumstances, this connection requires the transmission of crucial data, which must be protected from potential threats. However, as the WSN components often have constrained computation and power capabilities, protecting the communication in WSNs comes at a significant performance pena

Publication Date

Wed Jan 01 2020

Journal Name

International Journal Of Advance Science And Technology

MR Images Classification of Alzheimer's Disease Based on Deep Belief Network Method

Alzheimer’s Disease

Magnetic Resonance Imaging

Deep Belief Network

Gray Level

Co-occurrence Matrix

Mohammed S. H.

Background/Objectives: The purpose of this study was to distinguish Alzheimer’s disease (AD) patients from Normal Control (NC) subjects using Magnetic Resonance Imaging (MRI). Methods/Statistical analysis: The performance evaluation is carried out on 346 MR images from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. A Deep Belief Network (DBN) is used as the classifier. The network is trained using a sample training set, and the weights produced are then used to check the system's recognition capability. Findings: This paper presents a novel automated classification system for AD determination. The suggested method offers good performance of the experiments carried out show that the

Publication Date

Mon Oct 01 2018

Journal Name

Radioelectronics And Communications Systems

Optical CDMA Coded STBC Based on Chaotic Technique in FSO Communication Systems

Lwaa Faisal

Free-Space Optical (FSO) links can provide high-speed communications when the effect of turbulence is not serious. When turbulence is severe, Space-Time Block Coding (STBC) is a good candidate for mitigating its effect. This paper proposes a hybrid of Optical Code Division Multiple Access (OCDMA) and STBC in FSO communication for last-mile solutions, where access to remote areas is complicated. The main weakness affecting an FSO link is atmospheric turbulence, and STBC is employed in OCDMA to mitigate its effects. The current work evaluates the Bit-Error-Rate (BER) performance of OCDMA operating under the scintillation effect, which can be described by the gamma-gamma model. The most obvious finding to emerge from the analysis

Publication Date

Tue Oct 25 2022

Journal Name

Minar Congress 6

HANDWRITTEN DIGITS CLASSIFICATION BASED ON DISCRETE WAVELET TRANSFORM AND SPIKE NEURAL NETWORK

Machine Learning

Artificial Intelligence

Classification

Discrete Wavelet Transform

Spike Neural Network.

Dina

Marwa

In this paper, a handwritten digit classification system is proposed based on the Discrete Wavelet Transform and Spike Neural Network. The system consists of three stages. The first stage preprocesses the data, and the second stage performs feature extraction based on the Discrete Wavelet Transform (DWT). The third stage performs classification and is based on a Spiking Neural Network (SNN). To evaluate the system, two standard databases are used: the MADBase database and the MNIST database. The proposed system achieved a high classification accuracy of 99.1% for the MADBase database and 99.9% for the MNIST database.
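The DWT feature-extraction stage can be sketched in Python as follows. The abstract does not name the wavelet or the decomposition level, so this example assumes a single-level Haar transform in its averaging form (not the orthonormal scaling) and keeps only the low-frequency sub-band. The function names and the 4x4 toy image are hypothetical, and the preprocessing and SNN stages are not shown.

```python
def haar_1d(values):
    """One Haar step: pairwise averages and pairwise half-differences."""
    pairs = list(zip(values[0::2], values[1::2]))
    approx = [(a + b) / 2 for a, b in pairs]
    detail = [(a - b) / 2 for a, b in pairs]
    return approx, detail


def haar_2d_approx(image):
    """Low-frequency (LL) sub-band of a single-level 2-D Haar transform.

    Assumes the image has an even number of rows and columns.
    """
    # Filter along each row, keeping only the averages.
    rows = [haar_1d(row)[0] for row in image]
    # Filter along each column of the row-filtered image.
    cols = [haar_1d(list(col))[0] for col in zip(*rows)]
    # Transpose back so the result is indexed [row][column].
    return [list(r) for r in zip(*cols)]


def dwt_features(image):
    """Flatten the LL sub-band into a feature vector for a classifier."""
    return [value for row in haar_2d_approx(image) for value in row]


# Toy 4x4 "image" (an assumption; real digit images are larger).
image = [
    [0, 0, 8, 8],
    [0, 0, 8, 8],
    [4, 4, 2, 2],
    [4, 4, 2, 2],
]

features = dwt_features(image)
print(features)  # [0.0, 8.0, 4.0, 2.0]
```

Each feature is the mean of one 2x2 block, so the vector has a quarter of the original pixel count while preserving the coarse shape of the digit.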
