Graph based text representation for document clustering

Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This representation ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems that are prevalent in textual documents remain unresolved. In this study, these problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing to identify sentence structure. The semantics of the words are then modeled using a dependency graph, which is used in cluster analysis. The K-means clustering technique was used in this study. The dependency-graph-based clustering results were compared with those of popular text representation methods, namely TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation, indicating that the proposed text representation method leads to more accurate document clustering.
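
The limitation of bag-of-words weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the three toy sentences, the helper names (tfidf_vectors, cosine, jaccard) and the dependency triples are all hypothetical. In particular, the triples are written by hand in a (head, relation, dependent) form; a real system would presumably obtain them from a syntactic parser. The sketch only shows that TFIDF assigns identical vectors to two sentences with opposite meanings, while a set of dependency edges tells them apart.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard similarity between two sets of dependency edges."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy corpus (hypothetical): the first two sentences share every word
# but describe opposite events.
docs = ["dog bites man", "man bites dog", "stocks fell sharply"]
vecs = tfidf_vectors(docs)

# Hand-written dependency edges (assumption: no parser is run here).
edges = [
    {("bites", "nsubj", "dog"), ("bites", "obj", "man")},
    {("bites", "nsubj", "man"), ("bites", "obj", "dog")},
]

print("TFIDF cosine, doc 0 vs doc 1:", cosine(vecs[0], vecs[1]))
print("Dependency-edge Jaccard, doc 0 vs doc 1:", jaccard(edges[0], edges[1]))
```

Either similarity could then feed a clustering step such as K-means; the choice of Jaccard similarity over edge sets is only one simple way to compare graphs and is not taken from the paper.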

Publication Date
Thu Nov 29 2018
Journal Name
Al-khwarizmi Engineering Journal
Surface Roughness Prediction for Steel 304 in EDM Using Response Graph Modeling

Electrical discharge machining (EDM) is a non-traditional metal-removal technique that relies on the fact that negligible tool force is produced during machining. EDM is also used to machine very hard materials that are electrically conductive. In the EDM process, the most significant factor is the surface roughness (Ra). The conventional trial-and-error method is time-consuming and costly. The purpose of the present research is to develop a mathematical model using response graph modeling (RGM). The impact of various parameters (current, pulse-on time and pulse-off time) is studied on

Publication Date
Tue Sep 01 2015
Journal Name
2015 7th Computer Science And Electronic Engineering Conference (CEEC)
An experimental investigation on PCA based on cosine similarity and correlation for text feature dimensionality reduction
Publication Date
Wed May 10 2017
Journal Name
Australian Journal Of Basic And Applied Sciences
Block-based Image Steganography for Text Hiding Using YUV Color Model and Secret Key Cryptography Methods
Publication Date
Mon Apr 17 2023
Journal Name
Wireless Communications And Mobile Computing
A Double Clustering Approach for Color Image Segmentation

Image segmentation is one of the significant stages in computer vision and is fundamental to applications such as robot control, military target recognition, and image analysis in remote sensing. Studies have addressed improving the classification of all types of data, whether text, audio, or images. In one of the latest, researchers built a simple, effective, and high-accuracy model capable of classifying emotions from speech data, while several other studies dealt with improving text clustering. In this study, we seek to improve classification for image segmentation using a novel approach that depends on two methods for segmenting images. The first

Publication Date
Mon Dec 01 2014
Journal Name
Ain Shams University
New Studies for Topological Generalizations and Uncertainty in Graph Theory

Topology and its applications occupy the interest of many research centers in the advanced world. Near open sets play a very important role in general topology and are now a research topic for many topologists worldwide, yet they have not entered fibrewise topology. Therefore, we use some of the near open sets as a model to introduce results and new spaces in fibrewise topological spaces. Closure operators also play a very important role in constructing topological spaces, so we introduce new closure operators on the power set of the vertices of graphs and derive theorems and new spaces from them. Furthermore, we discuss the relationships of connectedness between some ty

Publication Date
Fri Apr 01 2022
Journal Name
Baghdad Science Journal
Improved Firefly Algorithm with Variable Neighborhood Search for Data Clustering

Among the metaheuristic algorithms, population-based algorithms are explorative search algorithms that are superior to local search algorithms in exploring the search space to find globally optimal solutions. However, the primary downside of such algorithms is their low exploitative capability, which prevents the expansion of the search space neighborhood for more optimal solutions. The firefly algorithm (FA) is a population-based algorithm that has been widely used in clustering problems. However, FA is prone to premature convergence when no neighborhood search strategies are employed to improve the quality of clustering solutions in the neighborhood region and to explore the global regions of the search space. On the

Publication Date
Sat Jul 06 2024
Journal Name
Multimedia Tools And Applications
Text classification based on optimization feature selection methods: a review and future directions

A substantial portion of today’s multimedia data exists in the form of unstructured text. However, the unstructured nature of text poses a significant challenge in meeting users’ information requirements. Text classification (TC) has been extensively employed in text mining to facilitate multimedia data processing. However, accurately categorizing texts becomes challenging due to the increasing presence of non-informative features within the corpus. Several reviews on TC, encompassing various feature selection (FS) approaches to eliminate non-informative features, have been previously published. However, these reviews do not adequately cover the recently explored approaches to TC problem-solving utilizing FS, such as optimization techniques.

Publication Date
Mon Apr 15 2024
Journal Name
Journal Of Engineering Science And Technology
Text Steganography Based on Arabic Characters Linguistic Features and Word Shifting Method

In the field of data security, the critical challenge of preserving sensitive information during its transmission through public channels takes centre stage. Steganography, a method employed to conceal data within various carrier objects such as text, can be proposed to address these security challenges. Text, owing to its extensive usage and constrained bandwidth, stands out as an optimal medium for this purpose. Despite the richness of the Arabic language in its linguistic features, only a small number of studies have explored Arabic text steganography. Arabic text, characterized by its distinctive script and linguistic features, has gained notable attention as a promising domain for steganographic ventures. Arabic text steganography harn

Publication Date
Wed Mar 01 2023
Journal Name
Baghdad Science Journal
An Investigation of Corona Domination Number for Some Special Graphs and Jahangir Graph

In this work, the study of corona domination in graphs, which was initially proposed by G. Mahadevan et al., is carried out. Let G be a simple graph. A dominating set S of G is said to be a corona-dominating set if every vertex in the subgraph induced by S is either a pendant vertex or a support vertex. The minimum cardinality among all corona-dominating sets is called the corona-domination number. In this work, the exact values of the corona domination number for some specific types of graphs are given. Also, some results on the corona domination number for some classes of graphs are obtained. The method used in this paper is a well-known number theory concept; with some modification, this method can also be applied to obt

Publication Date
Sun Jun 20 2021
Journal Name
Baghdad Science Journal
Wireless Propagation Multipaths using Spectral Clustering and Three-Constraint Affinity Matrix Spectral Clustering

This study focused on spectral clustering (SC) and three-constraint affinity matrix spectral clustering (3CAM-SC) to determine the number of clusters and the membership of the clusters of the COST 2100 channel model (C2CM) multipath dataset simultaneously. Various multipath clustering approaches solve only the number of clusters without taking into consideration the membership of clusters. The problem of giving only the number of clusters is that there is no assurance that the membership of the multipath clusters is accurate even though the number of clusters is correct. SC and 3CAM-SC aimed to solve this problem by determining the membership of the clusters. The cluster and the cluster count were then computed through the cluster-wise J
