Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth creates a need for effective techniques that support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional text representation methods model documents as bags of words using term frequency-inverse document frequency (TFIDF) weighting. This approach ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems that are prevalent in textual documents remain unresolved. In this study, these problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. The semantics of the words are then modeled using a dependency graph, and the resulting graph is used in the cluster analysis. The K-means clustering technique was used in this study. The clustering results based on the dependency graph were compared with those of popular text representation methods, namely TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation. The findings indicate that the proposed text representation method leads to more accurate document clustering results.
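The bag-of-words limitation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the plain log(N/df) weighting, and the three toy documents are assumptions made for illustration only. Two sentences with opposite meanings receive identical TFIDF vectors because word order and grammatical roles are discarded.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TFIDF vector (dict: term -> weight) per document.

    Assumptions for this sketch: whitespace tokenization and
    idf = log(N / df), one of several common TFIDF variants.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus: the first two documents use the same words
# but describe opposite events.
docs = [
    "the dog chased the cat",
    "the cat chased the dog",
    "stocks fell sharply on monday",
]
vecs = tfidf_vectors(docs)
same_words = cosine(vecs[0], vecs[1])   # identical bags of words
no_overlap = cosine(vecs[0], vecs[2])   # no shared terms
```

Here `same_words` is 1 up to floating-point rounding even though the two sentences differ in who chased whom, and `no_overlap` is 0 because the documents share no terms. A representation that keeps syntactic relations, such as the dependency graph proposed in the abstract, is one way to distinguish such cases.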
Abstract
In this research we study the wavelet characteristics of the important time series known as Sunspot, with the aim of verifying the periodogram that other researchers obtained using the spectral transform, and of observing the variation in period length on the one hand and the shifting on the other.
A continuous wavelet analysis is performed on this series, and its periodogram is marked first. For greater accuracy, the series is partitioned into its approximation and detail components to five levels. These components are filtered using a fixed threshold in one case and independent thresholds in another, finding the noise series which represents the difference between
The Gumbel distribution has been treated with great care by researchers and statisticians. Traditional methods for estimating the two parameters of the Gumbel distribution include maximum likelihood, the method of moments and, more recently, the resampling method known as the jackknife. However, these methods suffer from some mathematical difficulties when solved analytically. Accordingly, there are other non-traditional methods, such as the nearest-neighbors principle used in computer science, especially in artificial intelligence algorithms, including the genetic algorithm, the artificial neural network algorithm, and others that may be classified as meta-heuristic methods. Moreover, this principle of nearest neighbors has useful statistical featu
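As a brief illustration of one of the traditional estimators named above, the following sketch applies the method of moments to the two Gumbel parameters. It assumes the maximum-type Gumbel distribution with location mu and scale beta, whose mean is mu + gamma * beta and whose variance is pi^2 * beta^2 / 6 (gamma is the Euler-Mascheroni constant). The function names and the simulated data are hypothetical and are not taken from the study.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_moments(sample):
    """Method-of-moments estimates (location, scale) for a Gumbel sample.

    Solves mean = mu + gamma * beta and variance = pi^2 * beta^2 / 6
    for mu and beta, using the sample mean and standard deviation.
    """
    mean = statistics.fmean(sample)
    std = statistics.stdev(sample)
    scale = std * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def gumbel_sample(location, scale, n, rng):
    """Draw n Gumbel variates by inverting the CDF exp(-exp(-(x - mu) / beta))."""
    values = []
    for _ in range(n):
        u = rng.random()
        while u == 0.0:  # avoid log(0); random() returns values in [0, 1)
            u = rng.random()
        values.append(location - scale * math.log(-math.log(u)))
    return values


# Simulated data with known parameters (illustrative values only).
rng = random.Random(0)
data = gumbel_sample(location=2.0, scale=1.5, n=20000, rng=rng)
loc_hat, scale_hat = gumbel_moments(data)
```

With a sample of this size, `loc_hat` and `scale_hat` should fall close to the values used for the simulation. The closed form is what makes the method of moments simple here, whereas maximum likelihood for the Gumbel distribution requires solving a nonlinear equation numerically.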
In the current study, a novel approach for separating ethanol-water mixtures by microbubble distillation technology was investigated. Traditional distillation processes require large amounts of energy to raise the liquid to its boiling point in order to remove volatile components. The concept of microbubble distillation, by comparison, is to heat the gas phase rather than the liquid phase to achieve separation. The removal of ethanol from thermally sensitive fermentation broths was taken as a case study. The results were then compared with those that could be obtained under the equilibrium conditions expected in an “ideal” distillation unit. Microbubble distillation has achieved vapour compositions higher than th
Electrocoagulation is an electrochemical process for treating polluted water in which a sacrificial anode corrodes to release active coagulant (usually aluminum or iron cations) into solution. Accompanying electrolytic reactions evolve gas (usually hydrogen bubbles). The present study investigates the removal of phenol from water by this method. A glass tank with a volume of 1 liter and two electrodes was used to perform the experiments. The electrodes were connected to a D.C. power supply. The effects of various factors on the removal of phenol (initial phenol concentration, electrode size, electrode gap, current density, pH and treatment time) were studied. The results indicated that the removal efficiency decreased as initial phenol concentration
Childhood is characterized by a high degree of specificity in the life of the child across educational institutions worldwide. Based on this specificity, modern education begins with a holistic vision of the child through all developmental aspects (moral, religious, emotional, social, linguistic, physical, health, and mental). This integration can be achieved by taking into consideration the needs and rights of children and by developing curricula that consider these needs and capacities, providing opportunities to develop and support the developmental aspects of the child. Contemporary technological developments in the field of computers and the Internet have brought with them new forms, ideas, and problems for children in recent years
Circular thin-walled structures have a wide range of applications. This type of structure is generally exposed to different types of loads, among which buckling is one of the most important. In this work, the phenomenon of buckling was studied using finite element analysis. The circular thin-walled structure in this study consists of a thin cylindrical shell strengthened by longitudinal stringers and subjected to pure bending in one plane. In addition, the Taguchi method was used to identify the optimum combination of parameters for increasing the critical buckling load, as well as to investigate the most effective parameter. The parameters that have been analyzed were: cylinder shell thickness, shape of stiffeners section an
The Khor Mor gas-condensate processing plant in Iraq is currently facing operational challenges due to foaming in the sweetening tower, caused by highly soluble hydrocarbon liquids entering the tower. The root cause of the problem could be liquid carry-over, as the separation vessels within the plant fail to remove liquid droplets from the gas phase. This study employs Aspen HYSYS v.11 software to investigate the performance of the industrial three-phase horizontal separator, Bravo #2, located upstream of the Khor Mor sweetening tower, under both current and future operational conditions. The simulation results, regarding the size distribution of liquid droplets in the gas product and the efficiency of gas/liquid separation, r
