Graph based text representation for document clustering

Asma Khazaal Abdulsahib; Siti Sakira Kamaruddin

Details

Publication Date

Thu Jan 01 2015

Journal Name

Journal Of Theoretical And Applied Information Technology

Volume

76

Issue Number

1


Text Representation Schemes

Dependency Graph

Document Clustering

Sparsity Problem

Semantic Problem.


Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth raises the need for effective techniques to support text search and retrieval. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words using term frequency-inverse document frequency (TF-IDF). This approach ignores the relationships between words and their meanings, so the sparsity and semantic problems prevalent in textual documents remain unresolved. This study reduces the sparsity and semantic problems by proposing a graph-based text representation method, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of words are then modeled using a dependency graph. The resulting dependency graph is used for cluster analysis with the k-means clustering technique. The dependency-graph-based clustering results were compared with two popular text representation methods, TF-IDF and ontology-based text representation. The results show that the dependency graph outperforms both, indicating that the proposed text representation method leads to more accurate document clustering.
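As a rough illustration of why a bag-of-words representation can miss relationships between words, the following minimal Python sketch compares two toy sentences that share the same words but reverse their roles. It is not the authors' implementation: the smoothed IDF formula, the hand-written dependency triples (with hypothetical `nsubj`/`obj`/`advmod` labels standing in for parser output), and the Jaccard edge overlap are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: smoothed IDF, log((1 + n) / (1 + df)) + 1, so that terms
    occurring in every document still get a non-zero weight.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(edges_a, edges_b):
    """Overlap between two sets of (head, relation, dependent) edges."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


# Toy corpus: the first two documents contain the same words in different roles.
docs = [
    ["dog", "bites", "man"],
    ["man", "bites", "dog"],
    ["stocks", "fall", "sharply"],
]
vecs = tfidf_vectors(docs)

# Hypothetical dependency edges, written by hand instead of produced by a parser.
edges_a = {("bites", "nsubj", "dog"), ("bites", "obj", "man")}
edges_b = {("bites", "nsubj", "man"), ("bites", "obj", "dog")}

# Identical bags of words: the TF-IDF cosine is 1 up to rounding.
print("TF-IDF cosine:", cosine(vecs[0], vecs[1]))
# No shared dependency edges: the Jaccard overlap is 0.0.
print("Edge overlap: ", jaccard(edges_a, edges_b))
```

In this sketch the bag-of-words vectors cannot tell the two sentences apart, while the edge sets can. A full pipeline in the spirit of the abstract would still need a real parser and a way to feed graph similarities into k-means, which this example does not attempt.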


Publication Date

Wed May 01 2024

Journal Name

Journal Of Drug Delivery Science And Technology

Antibacterial and wound healing performance of a novel electrospun nanofibers based on polymethyl-methacrylate/gelatin impregnated with different content of propolis

Basma

Elham

Hanan

Mastafa H.

Iman Bahjat Namuq

Soghra

Marjan

Fatemeh


Publication Date

Mon Nov 18 2024

Journal Name

Molecular Crystals And Liquid Crystals

Synthesis and liquid crystal properties of a new class of calamitic mesogens based on twin 1,3,4-thiadiazole derivatives with imine linkage

Ivan Hameed R.

Jumbad H.

Ammar H.


Publication Date

Sat Nov 30 2024

Journal Name

Asia Pacific Journal Of Molecular Biology And Biotechnology

Random amplified polymorphic DNA-based polymerase chain reaction is an effective tool to examine the genotoxic effects of some food colors

Doaa Adil

Inam Jasim


A large number of natural or synthetic dyes have been removed from both national and international lists of permitted food colors because of their mutagenic or carcinogenic activity. Therefore, this study aimed to use the Random Amplified Polymorphic DNA-Based Polymerase Chain Reaction (RAPD-PCR) assay as a feasible method to evaluate the ability of some food colors to induce genotoxic DNA damage and mutations. Lactiplantibacillus plantarum was used as a bioindicator to determine the genotoxic effects by RAPD-PCR using the M13 primer after treatment with some synthetic dyes currently used as food color additives, including Sunset Yellow, Carmoisine, and Tartrazine. Besides qualitative analysis, the bioinformatic GelJ software was used for clus


Publication Date

Sat Feb 01 2025

Journal Name

Pharmaceutical Nanotechnology

Preparation, In-vitro, Ex-vivo, and Pharmacokinetic Study of Lasmiditan as Intranasal Nanoemulsion-based In Situ Gel

Bioavailability study

ex vivo permeation study

C-max

Lasmiditan

nanoemulsion-based in situ gel (NEIG)

T-max

aqueous-LAS suspension.

Jaber S.H

Newal ayash rajab


Background:

Lasmiditan (LAS) is a recently developed antimigraine drug that was approved in October 2019 for the treatment of acute migraines; however, it suffers from low oral bioavailability of around 40%.

Objective:

This study aimed to improve LAS bioavailability by formulating it as a nanoemulsion-based in situ gel (NEIG) given intranasally, and then to compare the traditional aqueous-LAS suspension (AQS) with the two successful intranasal formulations (NEIG 2 and NEIG 5) in rabbits in order to determine the relative bioavailability (F-relative).


Publication Date

Wed Oct 18 2023

Journal Name

Ieee Access

A New Imputation Technique Based a Multi-Spike Neural Network to Handle Missing Data in the Internet of Things Network (IoT)

Nadia Adnan Shiltagh

Ibtesam R.K.

Ahmed R.


Publication Date

Mon Nov 01 2021

Journal Name

Iop Conference Series: Earth And Environmental Science

Treatability influence of municipal sewage effluent on surface water quality assessment based on Nemerow pollution index using an artificial neural network

Municipal Sewage Effluent

Water Quality Assessment

Nemerow Pollution Index

Artificial Neural Network

R

Basim H.


Assessing water quality provides a scientific foundation for the development and management of water resources. The objective of the research is to evaluate the impact of treated effluent from the North Rustumiyia wastewater treatment plant (WWTP) on the quality of the Diyala river. The model uses an artificial neural network (ANN) and factor analysis (FA) based on the Nemerow pollution index (NPI). To define important water quality parameters for North Al-Rustumiyia for the line (F2), the Nemerow Pollution Index was introduced. The most important parameters for assessing the variation in wastewater quality were those used in the model: biochemical oxygen demand (BOD), chemical oxygen dem ...


Publication Date

Thu Apr 03 2025

Journal Name

Engineering, Technology & Applied Science Research

Application of the One-Step Second-Derivative Method for Solving the Transient Distribution in Markov Chain

transient distribution

Chapman-Kolmogorov

differential equation

numerical method

initial value problem

Zeina


Markov chains are an application of stochastic models in operations research, supporting the analysis and optimization of processes with random events and transitions. The method used to obtain the transient solution of a Markov chain problem is an important part of this process. The present paper introduces a novel Ordinary Differential Equation (ODE) approach to solve the Markov chain problem. The probability distribution of a continuous-time Markov chain with an infinitesimal generator at a given time is considered, which is the solution of the Chapman-Kolmogorov differential equation. This study presents a one-step second-derivative method with better accuracy in solving the first-order Initial Value Problem


Publication Date

Thu Jun 01 2023

Journal Name

Journal Of Engineering

Evaluating Roads Network Connectivity for Two Municipalities in Baghdad-Iraq

Topological characteristic

Connectivity

Graph theory

ArcGIS

Hala Jafar

Maythm

Afrah L.


The road network serves as a hub for opportunities in production and consumption, resource extraction, and social cohabitation. In turn, this promotes a higher standard of living and the expansion of cities. This research explores the road network's spatial connectedness and its effects on travel and urban form in the Al-Kadhimiya and Al-Adhamiya municipalities. Satellite images and paper maps have been employed to extract information on the existing road network, including their kinds, conditions, density, and lengths. The spatial structure of the road network was then generated using the ArcGIS software environment. The road pattern connectivity was evaluated using graph theory indices. The study demands the abstractio


Publication Date

Wed Jul 17 2019

Journal Name

Advances In Intelligent Systems And Computing

A New Arabic Dataset for Emotion Recognition

emotions recognition

text categorization

machine learning

PPM

WEKA

Arabic corpus

Amer J.

William J.


In this study, we created a new Arabic dataset annotated according to Ekman’s basic emotions (Anger, Disgust, Fear, Happiness, Sadness and Surprise). The dataset is composed of Facebook posts written in the Iraqi dialect. We evaluated its quality using four external judges, which resulted in an average inter-annotation agreement of 0.751. We then explored six different supervised machine learning methods to test the new dataset. We used the standard Weka classifiers ZeroR, J48, Naïve Bayes, Multinomial Naïve Bayes for Text, and SMO. We also used a compression-based classifier called PPM, which is not included in Weka. Our study reveals that the PPM classifier significantly outperforms other classifiers such as SVM and N


Publication Date

Fri Apr 01 2022

Journal Name

Baghdad Science Journal

Data Mining Techniques for Iraqi Biochemical Dataset Analysis

Biomedical

Classification And Regression Tree (CART)

Data mining

Hierarchical clustering

K-means.

Sarah

Suhad Faisal


This research aims to analyze and simulate real biochemical test data to uncover the relationships among the tests and how each of them affects the others. The data were acquired from a private Iraqi biochemical laboratory. However, these data have many dimensions, a high rate of null values, and a large number of patients. Several experiments were applied to these data, beginning with unsupervised techniques such as hierarchical clustering and k-means, but the results were not clear. A preprocessing step was then performed to make the dataset analyzable by supervised techniques such as Linear Discriminant Analysis (LDA), Classification And Regression Tree (CART), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Naïve Bayes (NB
