Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Publication Date

Thu Jan 01 2015

Journal Name

Journal of Theoretical and Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TF-IDF). This representation ignores the relationships between words and their meanings, so the sparsity and semantic problems prevalent in textual documents remain unresolved. This study reduces both problems by proposing a graph-based text representation method, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure; the semantics of the words are then modeled using a dependency graph, which is used as input to cluster analysis with the K-means technique. The dependency-graph-based clustering results were compared with two popular text representation methods, TF-IDF and ontology-based representation. The results show that the dependency graph outperforms both, indicating that the proposed text representation method leads to more accurate document clustering.
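The limitation of the bag-of-words baseline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tfidf_vectors`, the whitespace tokenizer, the smoothed inverse document frequency formula, and the three toy sentences are all assumptions chosen for illustration. The sketch shows that two sentences with opposite meanings receive identical vectors, and that a short document leaves much of the vocabulary at zero.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents.

    Uses one common smoothed variant (an assumption, not the paper's formula):
    tf = count / doc_length, idf = log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([
            (counts[term] / len(toks))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term in vocab
        ])
    return vocab, vectors


# Hypothetical toy corpus: the first two sentences differ only in word order.
docs = ["dog bites man", "man bites dog", "stocks fall sharply"]
vocab, vecs = tfidf_vectors(docs)

# Word order (who bites whom) is lost: both sentences map to the same vector.
same_vector = vecs[0] == vecs[1]

# Sparsity: the third document has zeros for every term it does not contain.
zero_entries = vecs[2].count(0.0)
print(vocab, same_vector, zero_entries)
```

A dependency graph would keep the subject and object roles that distinguish the first two sentences, which is the kind of relational information the bag-of-words vector discards.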

Publication Date

Wed Sep 20 2017

Journal Name

Ibn Al-Haitham Journal for Pure and Applied Sciences

Modified Radial Based Neural Network for Clustering and Routing Optimal Path in Wireless Network

ad hoc wireless network

MANET

Clustering

routing

wireless network clustering

modified Radial based neural network

k-means

Haider Katdhum

Tuka Kareem

Several methods have been developed for the routing problem in mobile ad hoc (MANET) wireless networks, where routing is considered a very important problem. We propose a method based on a modified radial basis function network (RBFN) and the K-means++ algorithm. The RBFN is modified for routing in order to find the optimal path between source and destination in MANET clusters. The modified radial basis neural network is a simple, adaptable, and efficient method that increases node lifetime, packet delivery ratio, and network throughput; connections become more useful because the optimal path has the best parameters among the candidate paths, including the best bitrate and link lifetime with minimum delay. The re

Publication Date

Mon Feb 21 2022

Journal Name

Iraqi Journal for Computer Science and Mathematics

Fuzzy C means Based Evaluation Algorithms For Cancer Gene Expression Data Clustering

Omar

Basad

The influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences, which places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that divides unlabeled data into clusters; the k-means and fuzzy c-means (FCM) algorithms are two examples. Clustering is thus a common approach that divides an input space into several homogeneous zones, and it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, whic

Publication Date

Thu Oct 01 2020

Journal Name

Defence Technology

A novel facial emotion recognition scheme based on graph mining

Emotion recognition

Facial landmarks

Graph mining

gSpan algorithm

Binary cat swarm optimization (BCSO)

Neural network

Suhaila N.

Recent years have seen an explosion in graph data from a variety of scientific, social, and technological fields. Among these fields, emotion recognition is an interesting research area because it has many real-life applications, such as effective social robotics (to increase the robot's interactivity with humans), driver safety, and pain monitoring during surgery. This paper proposes a novel facial emotion recognition scheme based on graph mining that changes how the face region is represented: the face region is modeled as a graph of nodes and edges, and the gSpan frequent subgraph mining algorithm is used to find the frequent substructures in the graph database of each emotion. T

Publication Date

Tue May 01 2018

Journal Name

Journal of Physics: Conference Series

Hiding Techniques for Dynamic Encryption Text based on Corner Point

steganography

Harris corner point algorithm

dynamic encryption

Dynamic coding table

Firas A.

Alaa A.

Amna

A hiding technique for dynamically encrypted text using an encoding table and a symmetric encryption method (the AES algorithm) is presented in this paper. The encoding table is generated dynamically from the most significant bits (MSB) of the cover image points and is used as the first phase of encryption. The Harris corner point algorithm is applied to the cover image to generate corner points, which are used to derive a dynamic AES key for the second phase of text encryption. For robustness, the embedding is performed in the least significant bits (LSB) of the image pixels, excluding the Harris corner points. Experimental results demonstrate that the proposed scheme achieves good embedding quality, error-free text recovery, and a high PSNR value.
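The embedding step alone can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's scheme: the image is a flat list of 8-bit grayscale values, the `reserved` set is a hypothetical stand-in for the indices of Harris corner points (no corner detection is performed), and the encoding table and AES phases are omitted. The function names `embed_bits` and `extract_bits` are invented for this example.

```python
def embed_bits(pixels, bits, reserved):
    """Write payload bits into pixel LSBs, skipping indices in `reserved`.

    `reserved` is a hypothetical stand-in for corner-point positions,
    which are left untouched so they can be recomputed by the receiver.
    """
    out = list(pixels)
    free = [i for i in range(len(out)) if i not in reserved]
    if len(bits) > len(free):
        raise ValueError("cover too small for payload")
    for i, bit in zip(free, bits):
        out[i] = (out[i] & ~1) | bit  # clear the LSB, then set it to the bit
    return out


def extract_bits(pixels, n_bits, reserved):
    """Read n_bits back from the LSBs of the non-reserved pixels."""
    free = [i for i in range(len(pixels)) if i not in reserved]
    return [pixels[i] & 1 for i in free[:n_bits]]


# Toy cover "image" and payload (illustrative values only).
cover = [120, 33, 200, 77, 14, 255]
reserved = {2}            # pretend pixel 2 is a corner point
payload = [1, 0, 1, 1]

stego = embed_bits(cover, payload, reserved)
recovered = extract_bits(stego, len(payload), reserved)
print(stego, recovered)
```

Because only the lowest bit of each used pixel changes, every pixel value moves by at most 1, which is why LSB embedding tends to keep PSNR high.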

Publication Date

Mon Dec 05 2022

Journal Name

Baghdad Science Journal

Proposed Framework for Official Document Sharing and Verification in E-government Environment Based on Blockchain Technology

Rana F.

Asia Ali Salman

Shakir Mahmood

Progress in computer networks and the emergence of new technologies in this field help produce new protocols and frameworks that provide new network-based services. E-government services, a modernized version of conventional government, are created through the steady evolution of technology and the growing need of societies for numerous services. Government services are closely tied to citizens’ daily lives; it is therefore necessary to move from traditional methods of managing government work to cutting-edge technical approaches that improve the effectiveness of government systems in providing services to citizens. Blockchain technology is amon

Publication Date

Sun Apr 23 2017

Journal Name

International Conference of Reliable Information and Communication Technology

Classification of Arabic Writer Based on Clustering Techniques

Mohammed S. H.

Arabic text categorization for pattern recognition is challenging. We propose, for the first time, a novel holistic clustering-based method for classifying Arabic writers. The categorization proceeds in stages. First, the document images are segmented into lines, words, and characters. Second, structural and statistical features are extracted from the segmented portions. Third, the F-measure is used to evaluate the performance of the extracted features and their combinations under different linkage methods, for each distance measure and for different numbers of groups. Finally, experiments are conducted on the standard KHATT dataset of Arabic handwritten text, which comprises varying samples from 1000 writers. The results in the generatio

Publication Date

Fri Nov 11 2022

Journal Name

Al-Mansour Journal

Text Cryptography Based on Three Different Keys

Text Cryptography

Cryptography

Plaintext

Ciphertext

Omar Fitian

Mohammed Jasim

Mustafa

Secure information transmission over the internet is becoming an important requirement in data communication. Authenticity, secrecy, and confidentiality are now the most important concerns in securing data communication. For that reason, information hiding methods such as cryptography, steganography, and watermarking are used to secure data transmission: cryptography encrypts the information into an unreadable form, steganography conceals the information within images, audio, or video, and watermarking protects information from intruders. This paper proposes a new cryptography method using thre

Publication Date

Fri Jan 01 2021

Journal Name

IEEE Access

Microwave Nondestructive Testing for Defect Detection in Composites Based on K-Means Clustering Algorithm

Nawaf H. M. M.

Ghassan N.

Nor Ashidi Mat

Muhammad Firdaus

Publication Date

Fri Sep 23 2022

Journal Name

Specialusis Ugdymas

Text Cryptography based on Arabic Words Characters Number

Cryptography

Text cryptography

Arabic characters

Encryption

Decryption

Omar Fitian

Mohammed Jasim

Mustafa Sabah

Cryptography masks text using an encryption method so that only an authorized user can decrypt and read the message. An intruder may attack the communication channel in many ways, such as impersonation, repudiation, denial of service, data modification, threats to confidentiality, and disruption of service availability. The high volume of electronic communication between people makes it necessary to ensure that transactions remain confidential, and cryptographic methods give the best solution to this problem. This paper proposes a new cryptography method based on Arabic words. The method has two steps, where the first step is binary encoding generation used t

Publication Date

Sun Jan 20 2019

Journal Name

Ibn Al-Haitham Journal for Pure and Applied Sciences

Text Classification Based on Weighted Extreme Learning Machine

Text Classification

Multiple Linear Regression

Extreme Learning Machine

Hayder Mahmood

The huge number of documents on the internet has led to a pressing need for text classification (TC), which is used to organize these documents. This paper uses a new model based on the Extreme Learning Machine (ELM). The proposed model consists of several phases: preprocessing, feature extraction, Multiple Linear Regression (MLR), and ELM. The basic idea is to calculate feature weights using MLR. These feature weights, together with the extracted features, are fed as input to the ELM, producing a weighted Extreme Learning Machine (WELM). The results show that the proposed WELM performs well compared with the ELM.
