Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1

Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This representation ignores the relationships between words and their meanings, so the sparsity and semantic problems that are prevalent in textual documents remain unresolved. This study reduces the sparsity and semantic problems by proposing a graph-based text representation method, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of the words are then modeled using a dependency graph. The resulting dependency graph is used as input to cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with those of two popular text representation methods, TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both, and the findings indicate that the proposed text representation method leads to more accurate document clustering.
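The contrast the abstract draws between bag-of-words TFIDF and a dependency-based representation can be illustrated with a small Python sketch. This is not the authors' implementation: the helper names (`tfidf_vectors`, `cosine`, `edge_features`) are hypothetical, the TFIDF weighting is one common variant (relative term frequency times log(N/df)), and the dependency triples are written by hand rather than produced by a parser. The sketch only shows that two sentences with the same words in a different order are indistinguishable under bag-of-words TFIDF, while features built from (head, relation, dependent) edges separate them.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenised document.

    Weight = (term count / document length) * log(N / document frequency).
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def edge_features(triples):
    """Count dependency edges given as (head, relation, dependent) triples."""
    return Counter(f"{head}-{rel}->{dep}" for head, rel, dep in triples)


# Bag of words: word order is lost, so the first two documents coincide.
docs = [
    "dog bites man".split(),
    "man bites dog".split(),
    "stocks fell sharply".split(),
]
vectors = tfidf_vectors(docs)
print("TFIDF similarity:", cosine(vectors[0], vectors[1]))

# Hand-written dependency edges (assumed parser output, for illustration only).
graph_1 = [("bites", "nsubj", "dog"), ("bites", "obj", "man")]
graph_2 = [("bites", "nsubj", "man"), ("bites", "obj", "dog")]
print("Edge-feature similarity:",
      cosine(edge_features(graph_1), edge_features(graph_2)))
```

Either kind of feature vector could then be passed to a clustering algorithm such as K-means; the choice of edge-string features here is only one simple way to turn a dependency graph into a vector.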

Publication Date

Tue Feb 01 2022

Journal Name

Int. J. Nonlinear Anal. Appl.

Computer-based plagiarism detection techniques: A comparative study

Plagiarism

Academic

Detection

Dataset

Pan

Mohammed S. H.

Plagiarism is a growing problem in academia. It is made worse by the ease with which a wide range of resources can be found on the internet and then copied and pasted. It is academic theft, since the perpetrator has "taken" and presented the work of others as his or her own. Manual detection of plagiarism is difficult, imprecise, and time-consuming because it is difficult for anyone to compare a given work against existing data. Plagiarism is a major problem in higher education, and it can happen on any topic. Plagiarism detection has been studied in many scientific articles, and methods for recognition have been created utilizing the Plagiarism analysis, Authorship identification, and

Publication Date

Sun Sep 01 2013

Journal Name

International Journal Of Computer Applications

Concise Architecture of a Remote Network based Controller

Acquisition

Monitoring

Scalability

Reusability

Economical

microcontroller.

Hayder

Sadiq H.

Basheera M.

Microcontrollers have recently come into wide use for monitoring and data acquisition. This development has given rise to various architectures for deploying and interfacing microcontrollers in a network environment. Some existing architectures suffer from redundant resources, extra processing, high cost, and delayed response. This paper presents a flexible, concise architecture for building a distributed networked microcontroller system. The system consists of a single server, which works through the internet, and a set of microcontrollers distributed across different sites. Each microcontroller is connected to the internet through Ethernet. In this system, a client request for data from a certain site is accomplished through just one server that is in

Publication Date

Thu Aug 01 2019

Journal Name

2019 2nd International Conference On Engineering Technology And Its Applications (iiceta)

Human Gait Identification System Based on Average Silhouette

Mohanad Hazim Nsaif

Nawaf Hazim

Sinan Sameer Mahmood

Publication Date

Wed Sep 01 2021

Journal Name

Baghdad Science Journal

Optimum Median Filter Based on Crow Optimization Algorithm

Image processing

Impulse noise

Noise removal

Optimum median filter

Crow optimization algorithm.

Basma Jumaa

Ahmed Yousif Falih

Ali Talib Qasim

Lamees abdalhasan

A novel optimum median filter (OMF) based on the crow optimization algorithm is proposed to reduce random salt-and-pepper noise and improve the quality of RGB color and grayscale images. The fundamental idea of the approach is that the crow optimization algorithm first detects noisy pixels and then replaces them with an optimum median value chosen by maximizing a fitness function. Finally, the standard measures of peak signal-to-noise ratio (PSNR), structural similarity, absolute square error, and mean square error were used to test the performance of the filters (the original and the improved median filter) used to remove noise from images. The simulation is carried out in MATLAB R2019b and the resul

Publication Date

Tue Feb 28 2023

Journal Name

International Journal Of Intelligent Engineering And Systems

Design and Implementation of EEG-Based Smart Structure

Oger Zaya

Yarub

Publication Date

Thu Jan 01 2015

Journal Name

Iraqi Journal Of Science

Keystroke Dynamics Authentication based on Naïve Bayes Classifier

Mays M. Hoobi

Authentication is the process of determining whether someone or something is, in fact, who or what it is declared to be. As the dependence upon computers and computer networks grows, the need for user authentication has increased. A user’s claimed identity can be verified by one of several methods. One of the most popular of these methods relies on something the user knows, such as a password or Personal Identification Number (PIN). Biometrics is the science and technology of authentication by identifying a living individual’s physiological or behavioral attributes. Keystroke authentication is a new behavioral access control system that identifies legitimate users via their typing behavior. The objective of this paper is to provide user

Publication Date

Mon Apr 15 2019

Journal Name

Proceedings Of The International Conference On Information And Communication Technology

Hybrid LDPC-STBC communications system based on chaos

Lwaa Faisal

Jokhakar Jignesh

U.

Muralidhar

Publication Date

Sat Oct 01 2022

Journal Name

Therapeutic Delivery

Particles-based Medicated Wound Dressings: A Comprehensive Review

Kawther K

Amaraporn

Publication Date

Tue Jan 01 2013

Journal Name

International Journal Of Computer Applications

Content-based Image Retrieval (CBIR) using Hybrid Technique

CBIR

feature extraction

properties

color histogram

GLCM

hybrid

similarity measure

Zainab

Israa

Nabeel

Image retrieval is used to search for images in an image database. In this paper, content-based image retrieval (CBIR) using four feature extraction techniques has been achieved. The four techniques are the color histogram features technique, the properties features technique, the gray level co-occurrence matrix (GLCM) statistical features technique, and a hybrid technique. The features are extracted from the database images and the query (test) images in order to compute the similarity measure. Similarity-based matching is very important in CBIR, so three types of similarity measure are used: normalized Mahalanobis distance, Euclidean distance, and Manhattan distance. A comparison between them has been implemented. From the results, it is conclud

Publication Date

Sun Sep 24 2023

Journal Name

Journal Of Al-qadisiyah For Computer Science And Mathematics

Iris Data Compression Based on Hexa-Data Coding

Ghadah

Haider Hameed

Mohammed M.

Marcos. A.

Iris research is focused on developing techniques for identifying and locating relevant biometric features, accurate segmentation, and efficient computation, while lending themselves to compression methods. Most iris segmentation methods are based on complex modelling of traits and characteristics, which in turn reduces the effectiveness of the system when used as a real-time system. This paper introduces a novel parameterized technique for iris segmentation. The method is based on a number of steps, starting from converting a grayscale eye image to a bit plane representation and selecting the most significant bit planes, followed by a parameterization of the iris location, resulting in an accurate segmentation of the iris from the origin
