Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth creates a need for effective techniques that support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories and is an important task in data mining and machine learning. The accuracy of clustering depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TF-IDF). This representation ignores the relationships between words and their meanings, so the sparsity and semantic problems prevalent in textual documents remain unresolved. This study reduces both problems by proposing a graph-based text representation method, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of the words are then modeled using a dependency graph. The resulting dependency graphs are used as input to cluster analysis with the K-means technique. The dependency-graph-based clustering results were compared with two popular text representation methods, TF-IDF and ontology-based text representation. The results show that the dependency graph outperforms both, indicating that the proposed text representation method leads to more accurate document clustering.
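The contrast the abstract draws between bag-of-words and dependency-based representations can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify how the graphs are built or compared, so the dependency triples below are hand-written rather than produced by a parser, and the Jaccard overlap of edge sets is only an assumed, simple way to compare two graphs. The function names `tfidf_vectors`, `cosine`, and `edge_jaccard` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document (plain TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def edge_jaccard(edges_a, edges_b):
    """Jaccard overlap of two sets of (head, relation, dependent) edges."""
    a, b = set(edges_a), set(edges_b)
    return len(a & b) / len(a | b) if (a | b) else 1.0


# Two sentences with the same words but opposite meanings, plus an unrelated one.
docs = [
    "the dog chased the cat".split(),
    "the cat chased the dog".split(),
    "stocks fell sharply today".split(),
]
vecs = tfidf_vectors(docs)

# Hand-written dependency edges (assumption: a real pipeline would use a parser).
graph_a = {("chased", "nsubj", "dog"), ("chased", "obj", "cat"),
           ("dog", "det", "the"), ("cat", "det", "the")}
graph_b = {("chased", "nsubj", "cat"), ("chased", "obj", "dog"),
           ("cat", "det", "the"), ("dog", "det", "the")}

bow_sim = cosine(vecs[0], vecs[1])           # word order is invisible to TF-IDF
unrelated_sim = cosine(vecs[0], vecs[2])     # no shared terms at all
graph_sim = edge_jaccard(graph_a, graph_b)   # subject/object roles differ

print(f"TF-IDF cosine (reversed roles): {bow_sim:.3f}")
print(f"TF-IDF cosine (unrelated):      {unrelated_sim:.3f}")
print(f"Dependency edge Jaccard:        {graph_sim:.3f}")
```

In this toy case the two role-reversed sentences are indistinguishable under TF-IDF, while their dependency edge sets overlap only partly. The unrelated sentence shares no terms with the first, which is a small-scale picture of the sparsity problem.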

Publication Date

Sat Jul 09 2022

Journal Name

Wireless Communications And Mobile Computing

An Optimized Approach for Industrial IoT Based on Edge Computing

Mohammed

The Internet of Things (IoT) is an information network that connects gadgets and sensors to enable new autonomous tasks. The Industrial Internet of Things (IIoT) refers to the integration of IoT with industrial applications. Some vital infrastructures, such as water delivery networks, use IIoT. With the rapid expansion of the IIoT, its scattered topology and the resource limits of edge computing pose new difficulties for traditional data storage, transport, and security protection. In this paper, a mechanism for recovering from edge network failure is proposed that considers repair cost and computational demands. The NP-hard problem was divided into interdependent major and minor problems that could be solved in polynomial t

Publication Date

Tue Nov 27 2018

Journal Name

The Iraqi Geological Journal

CHRONOSTRATIGRAPHICALLY BASED RESERVOIR MODEL FOR CENOMANIAN CARBONATES, SOUTHEASTERN IRAQ OILFIELDS

Awadees M.R.

Sameer N.A.

Salih

Mjeed M.

The Cenomanian – Turonian sedimentary succession in the south Iraq oil fields, including the Ahmadi, Rumaila, Mishrif and Khasib formations, has undergone high-resolution reservoir-scale genetic sequence stratigraphic analysis. Some oil wells from the Majnoon and West-Qurna oil fields were selected as a representative case for the regional sequence stratigraphic analysis. The south Iraqi Albian – Cenomanian – Turonian succession, a 2nd-order depositional super-sequence, has been analyzed based on the Arabian Plate chronosequence stratigraphic context, properly distinguished by three main chrono-markers (the maximum flooding surface MFS-K100 of the upper shale member of the Nahr Umr Formation, MFS-K140 of the upper Mishrif carbonate

Publication Date

Sat Feb 01 2020

Journal Name

International Journal Of Computer Science And Mobile Computing

Hierarchical Fixed Prediction of Mixed based for Medical Image Compression.

Ghadah

Publication Date

Fri May 17 2019

Journal Name

Lecture Notes In Networks And Systems

Features Selection for Intrusion Detection System Based on DNA Encoding

Intrusion detection system

DNA encoding

Feature selection

KDD Cup 99 dataset

NSL-KDD dataset

Omar Fitian

Zulaiha Ali

Suhaila

Intrusion detection systems detect attacks inside computers and networks, where detection must be fast and achieve a high rate. Various proposed methods have achieved a high detection rate, either by improving an algorithm or by hybridizing it with another algorithm. However, they suffer from long processing times, especially after the algorithm is improved and when dealing with large traffic data. On the other hand, past research has successfully applied DNA sequence detection approaches to intrusion detection systems; the achieved detection rates were very low, but the processing time was fast. Also, feature selection, used to reduce computation and complexity, leads to speeding up the system

Publication Date

Sat Nov 01 2025

Journal Name

Kufa Journal Of Engineering

SIMULATION FOR PERFORMANCE EVALUATION OF SATELLITE-BASED QUANTUM COMMUNICATION SYSTEM

Geometrical loss

Quantum key distribution

Satellite systems

Secure key rate

Single photon detector

Adil Fadhil

Shelan

Axel

The selection and assessment of single-photon detection modules is a crucial problem in satellite-based QKD systems. The system's overall efficiency, secure key rate and quantum bit error rate are all significantly influenced by single-photon detection modules. There is a knowledge gap about the practical performance of commercially available single-photon detectors because existing research frequently relies on theoretical characteristics. This paper introduces a study on the effect of the parameters of three commercial single photon detection modules from ID Quantique company: ID Qube, ID100, and ID281 on certain Bennett-Brassard 1984 protocol parameters such as secure key rate, mean photon number per pulse, quantum bit error rate

Publication Date

Fri Apr 01 2011

Journal Name

Al-mustansiriyah Journal Of Science

A Genetic Algorithm Based Approach For Generating Unit Maintenance Scheduling

Wathiq N.

Publication Date

Thu Dec 01 2022

Journal Name

Neuroscience Informatics

Epileptic EEG activity detection for children using entropy-based biomarkers

Sadeem Nabeel Saleem

Noor Kamal

Sumai Hamad

Mohannad K.

Publication Date

Tue Jan 01 2013

Journal Name

International Journal Of Application Or Innovation In Engineering & Management (ijaiem)

Probabilistic Neural Network for User Authentication Based on Keystroke Dynamics

Mays M. Hoobi

Computer systems and networks are increasingly used for many types of applications; as a result, security threats to computers and networks have also increased significantly. Traditionally, password authentication is widely used to authenticate legitimate users, but this method has many loopholes such as password sharing, brute force attacks, dictionary attacks and more. The aim of this paper is to improve the password authentication method using Probabilistic Neural Networks (PNNs) with three types of distance, namely Euclidean Distance, Manhattan Distance and Euclidean Squared Distance, and four features of keystroke dynamics: Dwell Time (DT), Flight Time (FT), a mixture of DT and FT, and Up-Up Time (UUT). The resul

Publication Date

Wed Oct 01 2008

Journal Name

2008 First International Conference On Distributed Framework And Applications

A strategy for Grid based t-way test data generation

Mohammed I.

Kamal Z.

Nor Ashidi Mat

Publication Date

Tue Sep 01 2020

Journal Name

Al-khwarizmi Engineering Journal

Arduino-Based Controller for Sequence Development of Automated Manufacturing System

Shahad Sarmad

Maher Yahya

Ahmed M.

It has become necessary to move from traditional to automated systems in production processes because of their significant advantages, the most important of which is improved and increased production. However, there is still a need to improve and develop how these systems work.

The objective of this work is to study time reduction by combining multiple sequences of operations into one process. To carry out this work, the pneumatic system is designed to decrease or increase the time of the sequence that performs a pick and place process by optimizing the sequences based on the obstacle dimensions. Three axes are represented using pneumatic cylinders that move according to the sequence used. The system is implemented and con
