Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support text search and retrieval. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TF-IDF). This representation ignores the relationships between words and their meanings, so the sparsity and semantic problems prevalent in textual documents remain unresolved. This study reduces these problems by proposing a graph-based text representation, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure; the semantics of words are then modeled using a dependency graph, which is used in the cluster analysis. The K-means clustering technique was used. The dependency-graph-based clustering results were compared with those of popular text representation methods, namely TF-IDF and ontology-based text representation. The results show that the dependency graph outperforms both, indicating that the proposed text representation method leads to more accurate document clustering.
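The limitation of the bag-of-words baseline described in this abstract can be illustrated with a minimal sketch. It is not the study's implementation: the whitespace tokenizer, the smoothed IDF variant, the helper names `tfidf_vectors` and `cosine`, and the toy sentences are all assumptions made for illustration. It shows that two sentences with opposite meanings but the same words receive identical vectors.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, TF-IDF vectors) for whitespace-tokenized documents."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = []
        for term in vocab:
            tf = counts[term] / len(toks)
            # Assumed variant: log(N/df) + 1, so terms shared by all docs keep some weight.
            idf = math.log(n_docs / df[term]) + 1.0
            vec.append(tf * idf)
        vectors.append(vec)
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy documents (hypothetical): the first two differ only in who did what to whom.
docs = ["the dog bit the man", "the man bit the dog", "stocks fell sharply today"]
vocab, vecs = tfidf_vectors(docs)

print(vecs[0] == vecs[1])        # word order is lost, so the vectors coincide
print(cosine(vecs[0], vecs[2]))  # no shared terms, so similarity is zero
```

A dependency-based representation would distinguish the first two sentences because the subject and object relations differ, which is the kind of information the abstract says the bag-of-words model discards.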
A large number of natural or synthetic dyes have been removed from both national and international lists of permitted food colors because of their mutagenic or carcinogenic activity. Therefore, this study aimed to use the Random Amplified Polymorphic DNA-based Polymerase Chain Reaction (RAPD-PCR) assay as a feasible method to evaluate the ability of some food colors to act as genotoxins that induce DNA damage and mutations. Lactiplantibacillus plantarum was used as a bioindicator to determine genotoxic effects by RAPD-PCR with the M13 primer after treatment with some synthetic dyes currently used as food color additives, including Sunset Yellow, Carmoisine, and Tartrazine. Besides qualitative analysis, the bioinformatic GelJ software was used for clus
Lasmiditan (LAS) is a recently developed antimigraine drug that was approved in October 2019 for the treatment of acute migraine; however, it suffers from low oral bioavailability of around 40%.
This study aimed to improve LAS bioavailability by formulating it as a nanoemulsion-based in situ gel (NEIG) given intranasally, and then to compare the traditional aqueous LAS suspension (AQS) with the two successful intranasal formulations (NEIG 2 and NEIG 5) in rabbits in order to determine their relative bioavailability (F-relative).
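The abstract does not state how F-relative was computed. A minimal sketch of the conventional calculation, assuming the dose-normalized ratio of areas under the plasma concentration-time curve (AUC) with the linear trapezoidal rule, is given below. The function names and the concentration profiles are hypothetical and are not data from the study.

```python
def auc_trapezoid(times, concs):
    """Area under a concentration-time curve by the linear trapezoidal rule."""
    if len(times) != len(concs) or len(times) < 2:
        raise ValueError("need at least two matched (time, concentration) points")
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (concs[i] + concs[i - 1]) / 2.0
    return area


def relative_bioavailability(auc_test, auc_ref, dose_test=1.0, dose_ref=1.0):
    """Dose-normalized AUC ratio of test versus reference formulation, in percent."""
    return (auc_test / auc_ref) * (dose_ref / dose_test) * 100.0


# Hypothetical profiles (hours, arbitrary concentration units); NOT study data.
times = [0.0, 1.0, 2.0, 4.0]
ref_concs = [0.0, 2.0, 1.0, 0.0]   # stands in for the reference suspension
test_concs = [0.0, 4.0, 2.0, 0.0]  # stands in for an intranasal formulation

f_rel = relative_bioavailability(
    auc_trapezoid(times, test_concs),
    auc_trapezoid(times, ref_concs),
)
print(f"F-relative = {f_rel:.1f}%")
```

With equal doses the dose terms cancel and F-relative reduces to the plain AUC ratio; a value above 100% would mean the test formulation delivers more drug to the circulation than the reference.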
Assessing water quality provides a scientific foundation for the development and management of water resources. The objective of the research is to evaluate the impact of treated effluent from the North Rustumiyia wastewater treatment plant (WWTP) on the quality of the Diyala River. An artificial neural network (ANN) model and factor analysis (FA) based on the Nemerow pollution index (NPI) were used. The Nemerow pollution index was introduced to identify the important water quality parameters for line F2 of North Al-Rustumiyia. The most important parameters for assessing variation in wastewater quality were those used in the model: biochemical oxygen demand (BOD), chemical oxygen dem
Markov chains are an application of stochastic models in operations research that supports the analysis and optimization of processes with random events and transitions. The method used to obtain the transient solution of a Markov chain problem is an important part of this process. The present paper introduces a novel Ordinary Differential Equation (ODE) approach to solving the Markov chain problem. The probability distribution of a continuous-time Markov chain with an infinitesimal generator at a given time is considered; it is the solution of the Chapman-Kolmogorov differential equation. This study presents a one-step second-derivative method with better accuracy in solving the first-order Initial Value Problem
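The transient problem described in this abstract can be written as the initial value problem dp/dt = p(t)Q with p(0) given, where Q is the infinitesimal generator. The sketch below integrates it with the classical fourth-order Runge-Kutta scheme, which is a stand-in baseline and not the one-step second-derivative method the paper proposes. The two-state chain, its rates, and the function names are assumptions chosen because the exact solution is known in closed form.

```python
import math


def vec_mat(p, Q):
    """Row vector times matrix: returns p @ Q as a list."""
    n = len(p)
    return [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]


def transient_rk4(p0, Q, t_end, steps):
    """Integrate dp/dt = p Q from t = 0 to t_end with classical RK4."""
    h = t_end / steps
    p = list(p0)
    for _ in range(steps):
        k1 = vec_mat(p, Q)
        k2 = vec_mat([pi + 0.5 * h * ki for pi, ki in zip(p, k1)], Q)
        k3 = vec_mat([pi + 0.5 * h * ki for pi, ki in zip(p, k2)], Q)
        k4 = vec_mat([pi + h * ki for pi, ki in zip(p, k3)], Q)
        p = [
            pi + (h / 6.0) * (a + 2.0 * b + 2.0 * c + d)
            for pi, a, b, c, d in zip(p, k1, k2, k3, k4)
        ]
    return p


# Hypothetical two-state chain: rate a from state 0 to 1, rate b from 1 to 0.
a, b = 2.0, 1.0
Q = [[-a, a], [b, -b]]  # each row of a generator sums to zero
p0 = [1.0, 0.0]         # start in state 0 with certainty
t_end = 1.0

p_numeric = transient_rk4(p0, Q, t_end, steps=100)

# Closed-form probability of being in state 0 at time t for this chain.
p_exact_0 = b / (a + b) + (a / (a + b)) * math.exp(-(a + b) * t_end)

print(p_numeric, p_exact_0)
```

Comparing against a closed-form case like this is one way to measure the accuracy of any candidate integrator, including higher-order one-step methods of the kind the abstract mentions.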
The road network serves as a hub for opportunities in production and consumption, resource extraction, and social cohabitation. In turn, this promotes a higher standard of living and the expansion of cities. This research explores the road network's spatial connectedness and its effects on travel and urban form in the Al-Kadhimiya and Al-Adhamiya municipalities. Satellite images and paper maps were used to extract information on the existing road network, including road types, conditions, density, and lengths. The spatial structure of the road network was then generated in the ArcGIS software environment. The connectivity of the road pattern was evaluated using graph theory indices. The study demands the abstractio
In this study, we have created a new Arabic dataset annotated according to Ekman’s basic emotions (Anger, Disgust, Fear, Happiness, Sadness and Surprise). This dataset is composed of Facebook posts written in the Iraqi dialect. We evaluated the quality of this dataset using four external judges, which resulted in an average inter-annotator agreement of 0.751. We then explored six different supervised machine learning methods to test the new dataset. We used the Weka standard classifiers ZeroR, J48, Naïve Bayes, Multinomial Naïve Bayes for Text, and SMO. We also used a further compression-based classifier called PPM, which is not included in Weka. Our study reveals that the PPM classifier significantly outperforms other classifiers such as SVM and N
This research aims to analyze and simulate real biochemical test data in order to uncover the relationships among the tests and how each of them affects the others. The data were acquired from a private Iraqi biochemical laboratory. However, these data have many dimensions, a high rate of null values, and a large number of patients. Several experiments were applied to these data, beginning with unsupervised techniques such as hierarchical clustering and k-means, but the results were not clear. A preprocessing step was then performed to make the dataset analyzable by supervised techniques such as Linear Discriminant Analysis (LDA), Classification And Regression Tree (CART), Logistic Regression (LR), K-Nearest Neighbor (K-NN), Naïve Bayes (NB