Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth highlights the need for effective techniques that support the search and retrieval of text. One of the most needed tasks is clustering, which automatically categorizes documents into meaningful groups. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods represent documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This approach ignores the relationships between words and their meanings in the document, so the sparsity and semantic problems that are prevalent in textual documents remain unresolved. In this study, these sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of words are then modeled using the dependency graph. The resulting dependency graph is used for cluster analysis with the K-means clustering technique. The dependency-graph-based clustering results were compared with those of two popular text representation methods, TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both, indicating that the proposed text representation method leads to more accurate document clustering.
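To make the TFIDF baseline concrete, the following minimal Python sketch computes bag-of-words TFIDF weights (term count divided by document length, times the natural log of N/df; other TFIDF variants exist) and the cosine similarity between documents. The toy documents are hypothetical. The sketch illustrates the semantic problem described above: two documents built from synonyms get a similarity of exactly zero because they share no terms. It is not the dependency graph method proposed in the study.

```python
import math
from collections import Counter


def tfidf(docs):
    """Bag-of-words TFIDF: docs is a list of token lists; returns one {term: weight} dict per document."""
    n = len(docs)
    # document frequency: number of documents in which each term appears
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in counts.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [["car", "engine"], ["automobile", "motor"], ["car", "wheel"]]
vecs = tfidf(docs)
print(cosine(vecs[0], vecs[1]))      # 0.0: synonyms share no terms
print(cosine(vecs[0], vecs[2]) > 0)  # True: the literal word "car" is shared
```

A term that appears in every document gets weight log(1) = 0, which is why TFIDF vectors are both sparse and blind to meaning: similarity arises only from exact term overlap.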
Binary pseudonoise (PN) sequences with specific properties, such as a long period, high complexity, good randomness, and minimal cross- and autocorrelation, are essential for some communication systems. In this research, a nonlinear PN generator is introduced. It combines basic components such as linear feedback shift registers (LFSRs) and a ?-element, which is a type of R×R crossbar switch. The period and complexity of the sequences generated by the proposed generator are computed, and their randomness properties are measured using well-known randomness tests.
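The proposed generator is not described in enough detail to reproduce, but its basic building block, the LFSR, and two of the randomness properties mentioned can be sketched. This Python sketch assumes a Fibonacci-style LFSR whose feedback bit is the XOR of the tapped register cells. Taps (0, 1) on a 4-bit register implement the recurrence s(k+4) = s(k) XOR s(k+1), i.e. the primitive polynomial x^4 + x + 1, so every nonzero seed gives a maximal-length sequence of period 2^4 - 1 = 15. The nonlinear combining stage (the crossbar-switch element) is not modeled.

```python
def lfsr_step(reg, taps):
    """One clock of a Fibonacci LFSR: output reg[0], shift left, append XOR of tapped cells."""
    fb = 0
    for t in taps:
        fb ^= reg[t]
    return reg[0], reg[1:] + [fb]


def lfsr_sequence(seed, taps, length):
    reg, out = list(seed), []
    for _ in range(length):
        bit, reg = lfsr_step(reg, taps)
        out.append(bit)
    return out


def lfsr_period(seed, taps):
    """Clocks until the register returns to its seed state (taps must include 0 so the map is invertible)."""
    start, reg, steps = list(seed), list(seed), 0
    while True:
        _, reg = lfsr_step(reg, taps)
        steps += 1
        if reg == start:
            return steps


def periodic_autocorrelation(seq, shift):
    """Agreements minus disagreements between seq and its cyclic shift, over one period."""
    n = len(seq)
    return sum(1 if seq[k] == seq[(k + shift) % n] else -1 for k in range(n))


seed, taps = [1, 0, 0, 0], (0, 1)
seq = lfsr_sequence(seed, taps, 15)
print(lfsr_period(seed, taps))  # 15
print(sum(seq))                 # 8 ones (and 7 zeros) per period
```

A maximal-length sequence has 2^(n-1) ones per period and a two-valued periodic autocorrelation (N at shift 0, -1 at every other shift). A single LFSR of length n has linear complexity at most n, which is one motivation for nonlinear generators such as the one proposed.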
An accurate assessment of pipe conditions is required for effective management of trunk sewers. In this paper, a semi-Markov model was developed and tested using sewer data from the Zublin trunk sewer in Baghdad, Iraq, in order to evaluate the sewer's future performance. To develop the model, the cumulative waiting-time distribution of the sewers in each condition state was derived directly from sewer condition-class and age data. The results, assessed with the chi-square (χ²) test, showed that the semi-Markov model was inconsistent with the data, and that the prediction error is due to a lack of data on sewer waiting times in each condition state, which can be solved by using successive conditi
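The chi-square goodness-of-fit test compares observed counts with model-predicted counts. A minimal Python sketch follows, assuming the counts are numbers of pipes per condition state and that the caller supplies the degrees of freedom (number of states minus one, minus any parameters estimated from the data). The critical values are the standard 5% points of the chi-square distribution; the example counts are hypothetical, not data from the Zublin sewer.

```python
# Standard 5%-level critical values of the chi-square distribution, keyed by degrees of freedom
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_statistic(observed, expected):
    """Pearson statistic: sum over categories of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def model_consistent(observed, expected, dof):
    """True if the model is NOT rejected at the 5% significance level."""
    return chi_square_statistic(observed, expected) <= CHI2_CRIT_05[dof]


# Hypothetical counts of pipes in three condition states: observed vs. model-predicted
observed = [10, 20, 30]
predicted = [20, 20, 20]
print(chi_square_statistic(observed, predicted))         # 10.0
print(model_consistent(observed, predicted, dof=2))      # False: 10.0 > 5.991
```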
Graphite-coated electrodes (GCEs) based on molecularly imprinted polymers were fabricated for the selective potentiometric determination of risperidone (Ris). The molecularly imprinted (MIP) and non-imprinted (NIP) polymers were synthesized by bulk polymerization using Ris as the template, acrylic acid (AA) and acrylamide (AAm) as monomers, ethylene glycol dimethacrylate (EGDMA) as the cross-linker, and benzoyl peroxide (BPO) as the initiator. The imprinted and non-imprinted membranes were prepared in a PVC matrix using dioctyl phthalate (DOP) and dibutyl phthalate (DBP) as plasticizers. The membranes were coated onto graphite electrodes. The MIP electrodes using
Most medical datasets suffer from missing data, owing to the expense of some tests or to human error when recording them. Missing feature values degrade the performance of machine learning models, so dedicated methods for imputing missing data are needed. In this research, the salp swarm algorithm (SSA) is used to generate and impute the missing values in the Pima Indian diabetes disease (PIDD) dataset; the proposed algorithm is called ISSA. The obtained results showed that the classification performance of three different classifiers, namely support vector machine (SVM), K-nearest neighbour (KNN), and Naïve B
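The abstract does not describe how ISSA encodes missing values or which fitness it optimizes, so the sketch below shows only the core of a standard salp swarm algorithm minimizing a toy function. It assumes the usual formulation: the leader salp moves around the best solution found so far (the food source) with a coefficient c1 = 2·exp(-(4l/L)^2) that shrinks over iterations l = 1..L, and each follower moves to the midpoint between itself and the salp ahead of it. Here only the first salp acts as leader and the bounds are scalars, both simplifications. For imputation, the decision vector would hold the missing entries and the fitness would be a problem-specific criterion; that mapping is an assumption, not taken from the paper.

```python
import math
import random


def salp_swarm_minimize(fitness, dim, lb, ub, n_salps=30, max_iter=300, seed=0):
    """Minimal salp swarm algorithm; returns (best position, best fitness)."""
    rng = random.Random(seed)
    salps = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n_salps)]
    food = min(salps, key=fitness)[:]  # best solution found so far
    food_fit = fitness(food)
    for l in range(1, max_iter + 1):
        c1 = 2 * math.exp(-((4 * l / max_iter) ** 2))  # exploration -> exploitation
        for i in range(n_salps):
            for j in range(dim):
                if i == 0:
                    # leader: random move around the food source, radius shrinking with c1
                    c2, c3 = rng.random(), rng.random()
                    step = c1 * ((ub - lb) * c2 + lb)
                    salps[i][j] = food[j] + step if c3 >= 0.5 else food[j] - step
                else:
                    # follower: midpoint between itself and the (already updated) salp ahead
                    salps[i][j] = (salps[i][j] + salps[i - 1][j]) / 2
                salps[i][j] = min(max(salps[i][j], lb), ub)  # keep inside the bounds
        for s in salps:
            f = fitness(s)
            if f < food_fit:
                food, food_fit = s[:], f
    return food, food_fit


sphere = lambda x: sum(v * v for v in x)  # toy objective with minimum 0 at the origin
best, best_fit = salp_swarm_minimize(sphere, dim=2, lb=-5.0, ub=5.0)
```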
This paper is concerned with estimating the unknown parameters of the generalized Rayleigh distribution (GRD) model from singly Type I censored samples. The probability density function of the generalized Rayleigh distribution is defined together with its properties. The maximum likelihood method is used to derive point estimates of all unknown parameters through an iterative technique, the Newton-Raphson method, and interval estimates are then derived from the Fisher information matrix. Finally, the fit of the GRD model to a real data set is tested, and the survival and hazard functions are computed for these data.
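The maximum likelihood step can be sketched in Python under assumptions not stated in the abstract: the GRD is taken in its common two-parameter form with shape α and scale λ, F(x) = (1 - exp(-(λx)^2))^α, and censoring happens at a fixed time T, so each observed failure time x_i contributes log f(x_i) and each of the units still surviving at T contributes log(1 - F(T)). For brevity the sketch uses finite-difference derivatives (analytic derivatives could replace them), works with log α and log λ to keep both parameters positive, and adds step halving to the Newton-Raphson iteration. All function names are hypothetical.

```python
import math
import random


def grd_logpdf(x, alpha, lam):
    # log of f(x) = 2*alpha*lam^2*x*exp(-(lam*x)^2) * (1 - exp(-(lam*x)^2))^(alpha - 1)
    z = (lam * x) ** 2
    return (math.log(2 * alpha) + 2 * math.log(lam) + math.log(x) - z
            + (alpha - 1) * math.log(-math.expm1(-z)))


def grd_cdf(x, alpha, lam):
    return (-math.expm1(-(lam * x) ** 2)) ** alpha


def censored_loglik(theta, observed, n_censored, T):
    # theta = (log alpha, log lambda), so both parameters stay positive
    alpha, lam = math.exp(theta[0]), math.exp(theta[1])
    ll = sum(grd_logpdf(x, alpha, lam) for x in observed)
    # each unit still working at the censoring time T contributes log S(T) = log(1 - F(T))
    return ll + n_censored * math.log1p(-grd_cdf(T, alpha, lam))


def grad_hess(f, t, h=1e-4):
    """Central finite-difference gradient and Hessian of f at the 2-vector t."""
    f0 = f(t)
    g, H = [0.0, 0.0], [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        tp, tm = list(t), list(t)
        tp[i] += h
        tm[i] -= h
        fp, fm = f(tp), f(tm)
        g[i] = (fp - fm) / (2 * h)
        H[i][i] = (fp - 2 * f0 + fm) / h ** 2
    H[0][1] = H[1][0] = (f([t[0] + h, t[1] + h]) - f([t[0] + h, t[1] - h])
                         - f([t[0] - h, t[1] + h]) + f([t[0] - h, t[1] - h])) / (4 * h * h)
    return g, H


def newton_raphson_max(f, theta, tol=1e-9, max_iter=200):
    fval = f(theta)
    for _ in range(max_iter):
        g, H = grad_hess(f, theta)
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        if H[0][0] < 0 and det > 0:
            # Hessian negative definite: the Newton step d = -H^{-1} g is an ascent direction
            d = [-(H[1][1] * g[0] - H[0][1] * g[1]) / det,
                 -(H[0][0] * g[1] - H[1][0] * g[0]) / det]
        else:
            d = g  # otherwise fall back to a steepest-ascent direction
        step = 1.0
        while step > 1e-12:  # halve the step until the log-likelihood increases
            cand = [theta[0] + step * d[0], theta[1] + step * d[1]]
            cval = f(cand)
            if cval > fval:
                break
            step /= 2
        else:
            break  # no improving step found: treat the current point as the maximum
        moved = max(abs(cand[0] - theta[0]), abs(cand[1] - theta[1]))
        theta, fval = cand, cval
        if moved < tol:
            break
    return theta


def fit_grd_type1(observed, n_censored, T):
    def f(t):
        try:
            v = censored_loglik(t, observed, n_censored, T)
        except (OverflowError, ValueError, ZeroDivisionError):
            return -math.inf  # parameters far outside the valid region
        return v if math.isfinite(v) else -math.inf
    t = newton_raphson_max(f, [0.0, 0.0])  # start from alpha = lambda = 1
    return math.exp(t[0]), math.exp(t[1])


def simulate_type1(n, alpha, lam, T, seed=1):
    # inverse-CDF sampling, then censor every lifetime longer than T
    rng = random.Random(seed)
    observed, n_censored = [], 0
    for _ in range(n):
        u = rng.uniform(1e-12, 1 - 1e-12)
        x = math.sqrt(-math.log1p(-u ** (1 / alpha))) / lam
        if x <= T:
            observed.append(x)
        else:
            n_censored += 1
    return observed, n_censored
```

For example, `obs, nc = simulate_type1(500, 1.5, 1.0, 1.5)` followed by `fit_grd_type1(obs, nc, 1.5)` should return estimates near (1.5, 1.0). The negative of the Hessian at convergence is the observed information matrix (here in log-parameter space), the usual starting point for Fisher-information-based confidence intervals.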
Binary relations or interactions among bio-entities, such as proteins, form an essential part of any living biological system. Protein-protein interactions are usually represented as a graph called a protein-protein interaction network (PPIN). Analyzing PPINs to detect protein complexes provides knowledge needed to answer many unresolved questions, including how cells are organized and how proteins work. However, complex detection is a non-deterministic polynomial-time hard (NP-hard) problem because of its computational complexity. To cope with the resulting combinatorial explosion, evolutionary algorithms (EAs) have proven to be effective alternatives to heuristics in solvin
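Complex-detection methods, including EA-based ones, typically score a candidate complex with some topological quality measure, and graph density is one common choice. The sketch below does not reproduce the (truncated) paper's method or fitness function; it only shows a PPIN stored as an adjacency map and the density of a candidate protein set, i.e. the fraction of possible pairs that actually interact. The protein names are hypothetical.

```python
from itertools import combinations


def build_ppin(interactions):
    """Undirected PPIN as {protein: set of interacting proteins}."""
    graph = {}
    for a, b in interactions:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def density(graph, nodes):
    """Edges among `nodes` divided by the k*(k-1)/2 possible pairs."""
    nodes = list(nodes)
    k = len(nodes)
    if k < 2:
        return 0.0
    edges = sum(1 for u, v in combinations(nodes, 2) if v in graph.get(u, ()))
    return 2 * edges / (k * (k - 1))


ppin = build_ppin([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(density(ppin, {"A", "B", "C"}))  # 1.0: a fully connected candidate complex
```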
Ultrasound (US) imaging is an accurate and useful modality for assessing gestational age (GA), estimating fetal weight, and monitoring fetal growth during pregnancy. It is a routine part of prenatal care and can greatly affect obstetric management. Estimating GA is important in obstetric care, because appropriate management decisions require an accurate appraisal of GA. Accurate GA estimation may help obstetricians counsel women at risk of preterm delivery about likely neonatal outcomes, and it is essential for evaluating fetal growth and detecting intrauterine growth restriction. Many formulas are used worldwide to estimate fetal GA, but it's not specify fo
A medical-service platform is a mobile application through which patients receive doctors' diagnoses based on information gleaned from medical images. The content of these diagnostic results must not be illegitimately altered during transmission and must be returned to the correct patient. In this paper, we present a solution to these problems using blind, reversible, and fragile watermarking based on authentication of the host image. In our proposed algorithm, the binary version of the Bose-Chaudhuri-Hocquenghem (BCH) code of the patient medical report (PMR) and the binary patient medical image (PMI), after fuzzy exclusive-OR (F-XoR), are used to produce the patient's unique mark using a secret sharing scheme (SSS). The patient’s un
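The abstract does not specify which secret sharing scheme is used. The simplest instance, an (n, n) XOR-based scheme in which all n shares are needed to recover the secret and any n - 1 shares reveal nothing about it, is sketched below as an illustration only; it is not the paper's construction and omits the BCH encoding and F-XoR steps. The byte string standing in for a patient mark is hypothetical.

```python
import secrets
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(secret: bytes, n: int) -> list[bytes]:
    """(n, n) XOR sharing: n - 1 random shares plus one share that XORs back to the secret."""
    shares = [secrets.token_bytes(len(secret)) for _ in range(n - 1)]
    last = reduce(xor_bytes, shares, secret)  # secret XOR share_1 XOR ... XOR share_{n-1}
    return shares + [last]


def combine_shares(shares: list[bytes]) -> bytes:
    """XOR of all shares recovers the secret."""
    return reduce(xor_bytes, shares)


mark = b"hypothetical-patient-mark"
shares = split_secret(mark, 3)
print(combine_shares(shares) == mark)  # True
```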
Early detection of brain tumors is critical for improving treatment options and extending patient survival. Magnetic resonance imaging (MRI) provides more detailed information, such as greater contrast and clarity, than other scanning methods. Manually segmenting brain tumors from the many MRI images collected in clinical practice for cancer diagnosis is a difficult and time-consuming task. Algorithms and machine learning technologies can detect tumors in brain MRI scans, making the process easier for doctors, because MRI images can appear healthy even when the person has a tumor, which may be malignant. Recently, deep learning techniques based on deep convolutional neural networks have been used to analyze med