Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth creates a need for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This representation ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems that are prevalent in textual documents are not resolved. In this study, these problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. The semantics of words are then modeled using a dependency graph, which is used in the cluster analysis. The K-means clustering technique was used in this study. The dependency-graph-based clustering results were compared with those of popular text representation methods, i.e. TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation. The findings indicate that the proposed text representation method leads to more accurate document clustering results.
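The limitation of the bag-of-words baseline described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and the plain weighting tf * log(N / df); the function names and the three toy sentences are hypothetical and are not taken from the study.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n_docs / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy documents (assumed for illustration only).
docs = ["the dog bites the man",
        "the man bites the dog",
        "stocks fell sharply today"]
vecs = tfidf_vectors(docs)
same_words = cosine(vecs[0], vecs[1])  # same terms, opposite meaning
no_overlap = cosine(vecs[0], vecs[2])  # no shared terms
```

The first two sentences contain the same words in a different order, so their TFIDF vectors coincide even though the meanings differ, which is the kind of relational information a dependency graph is intended to keep.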
The educational sector is one of the most important sectors in the world and is considered a means of community development. It is also a means of achieving a country’s renaissance and development, because it is the factory of the thinking minds that bring about change. Like any other sector, it has suffered from a prolonged lack of well-studied scientific planning, which has led to its deterioration, and the problems of education remain diverse and inherited from earlier periods. Hierarchical cluster analysis was applied to postgraduate students in universities in Iraq, excluding the Kurdistan region, and the number of universities that were included in the study was
Millions of lives might be saved if stained tissues could be detected quickly. Image classification algorithms may be used to detect the shape of cancerous cells, which is crucial in determining the severity of the disease. With the rapid advancement of digital technology, digital images now play a critical role, with rapidly growing applications in the medical and visualization fields. Tissue segmentation in whole-slide images is a crucial task in digital pathology, as it is necessary for fast and accurate computer-aided diagnosis. When a tissue image is stained with hematoxylin and eosin, precise tissue segmentation is especially important for a successful diagnosis. This kind of staining aids pathologists in disti
Although the number of stomach tumor patients has fallen markedly in western countries during recent decades, this illness is still one of the main causes of death in developing countries. The aim of this research is to detect the area of a tumor in stomach images based on fuzzy clustering. The proposed methodology consists of three stages. In the first stage, the stomach images are divided into four quarters and features are extracted from each quarter using seven invariant moments. In the second stage, Fuzzy C-Means clustering (FCM) is applied to each quarter to group its features into clusters. In the third stage, the Manhattan distance is calculated among all clusters' centers in all quarters to disclosure of t
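The third stage can be sketched as a pairwise Manhattan distance computation over cluster centers. This is only an illustration: the center values below are made up, shortened to three components instead of seven moment invariants, and the helper names are hypothetical. How the distances are then used to locate the tumor is not shown.

```python
from itertools import combinations

def manhattan(a, b):
    """Manhattan (L1) distance between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pairwise_center_distances(centers_by_quarter):
    """Map ((quarter, cluster), (quarter, cluster)) -> Manhattan distance."""
    labelled = [((q, i), center)
                for q, centers in enumerate(centers_by_quarter)
                for i, center in enumerate(centers)]
    return {(ka, kb): manhattan(ca, cb)
            for (ka, ca), (kb, cb) in combinations(labelled, 2)}

# Hypothetical cluster centers for two quarters, two clusters each.
centers_by_quarter = [
    [[1, 2, 3], [4, 4, 4]],   # quarter 0
    [[1, 2, 5], [9, 9, 9]],   # quarter 1
]
distances = pairwise_center_distances(centers_by_quarter)
```

Each key identifies a pair of centers by quarter index and cluster index, so distances within a quarter and across quarters can be compared in the same table.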
The research aims at integrating the disclosure of business models with the qualitative characteristics of accounting information. To achieve this, the elements of the business model should be identified and disclosed, and the possibility of integrating that disclosure with the qualitative characteristics of accounting information should then be studied.
To achieve this objective, the research relied on the business model disclosure indicators of the International Accounting Standards Board to measure the disclosure of the business model.
The research reached a number of conclusions, the most important of which were as follows:
Fi
A simple, straightforward mathematical method has been developed to cluster grid nodes on a boundary segment of arbitrary geometry that can be fitted by a suitable polynomial. The solution is carried out in two steps. In the first step, the length of the boundary segment is evaluated using the mean value theorem, and the grid nodes are then clustered as desired using suitable linear clustering functions. In the second step, once the positions of the cell nodes along the segment have been computed and the incremental distance between each pair of adjacent nodes has been evaluated, the original coordinates of each node are computed using the same fitted polynomial with the mean value theorem applied in reverse.
The method is utilized to predict
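The two steps can be sketched numerically as follows. This is a sketch under stated assumptions and not the authors' formulation: the arc length is approximated by summing short chords instead of applying the mean value theorem, the "linear clustering function" is assumed to mean cell sizes that grow linearly from one end of the segment to the other, and all function names and the `ratio` parameter are hypothetical.

```python
import math
from bisect import bisect_left

def arc_length_table(poly, x0, x1, samples=2000):
    """Sample the curve y = poly(x) and return xs with cumulative arc length."""
    xs = [x0 + (x1 - x0) * i / samples for i in range(samples + 1)]
    ys = [poly(x) for x in xs]
    s = [0.0]
    for i in range(1, len(xs)):
        s.append(s[-1] + math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]))
    return xs, s

def linear_spacing_fractions(n_nodes, ratio):
    """Arc-length fractions in [0, 1]; cell sizes grow linearly from 1 to ratio."""
    n_cells = n_nodes - 1
    steps = [1 + (ratio - 1) * k / max(n_cells - 1, 1) for k in range(n_cells)]
    total = sum(steps)
    fracs = [0.0]
    for step in steps:
        fracs.append(fracs[-1] + step / total)
    fracs[-1] = 1.0  # guard against rounding drift at the end point
    return fracs

def cluster_nodes(poly, x0, x1, n_nodes, ratio=3.0):
    """Place n_nodes (>= 2) on the curve, clustered toward x0."""
    xs, s = arc_length_table(poly, x0, x1)
    nodes = []
    for frac in linear_spacing_fractions(n_nodes, ratio):
        target = frac * s[-1]
        # Invert the arc-length table by linear interpolation.
        j = min(max(bisect_left(s, target), 1), len(s) - 1)
        t = (target - s[j - 1]) / (s[j] - s[j - 1])
        x = xs[j - 1] + t * (xs[j] - xs[j - 1])
        nodes.append((x, poly(x)))
    return nodes

# Example: six nodes on the parabola y = x^2 between x = 0 and x = 1.
nodes = cluster_nodes(lambda x: x * x, 0.0, 1.0, n_nodes=6)
```

Working in arc length first and mapping back to coordinates afterwards keeps the node spacing measured along the curve itself, so the clustering is not distorted where the boundary is steep.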
The article considers the creolized text as a means of modern communication and describes its key verbal and visual components. The relationship between the concepts of polycode text and creolized text is shown, the universal basic features of the image are named, and kinds of creolized texts are distinguished. It is shown that an effective means of attracting the addressee's attention is the use of expressive font features, which are divided into two groups: topographics (mechanisms for varying the areal syntagmatics of a text) and supragraphics (changes of typeface).
In the 1960s, the care and attention once devoted to structure shifted to the sign: if structure had been the pampered lady of research and studies, the sign has become a pampered lady as well. The relationship between structure and sign, however, was not one of rupture but of integration. Its themes are those of structural analysis, and these are intellectual themes that cannot be bypassed in contemporary research, especially since semiotics has emerged from the linguistic inflection.
We have tried to distinguish between text and discourse, which is a daunting task, as it seems that whenever the difference between them becomes clear, we come back to wonder whether the text is the same as discourse, and is
The volume of sensitive and important data has increased rapidly in recent decades with the tremendous advances in networking infrastructure and communications. Securing this data becomes more necessary as its volume grows, and different cipher techniques and methods are used to ensure the security goals of integrity, confidentiality, and availability. This paper presents a hybrid text cryptography method that encrypts sensitive data using different encryption algorithms: Caesar, Vigenère, Affine, and multiplicative. The hybrid method aims to make the encryption process more secure and effective. The hybrid text cryptography method depends on a circular queue. Using circ
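The four classical ciphers named above can be sketched as follows, assuming the text consists only of uppercase letters A to Z. The function names are hypothetical, and the way the circular queue combines the ciphers is not reproduced here.

```python
A = ord("A")

def caesar(text, shift):
    """Shift every letter by `shift` positions (use -shift to decrypt)."""
    return "".join(chr((ord(c) - A + shift) % 26 + A) for c in text)

def multiplicative(text, key):
    """Multiply every letter index by `key`; key must be coprime with 26."""
    return "".join(chr(((ord(c) - A) * key) % 26 + A) for c in text)

def affine(text, a, b):
    """Encrypt with E(x) = (a*x + b) mod 26; a must be coprime with 26."""
    return "".join(chr(((ord(c) - A) * a + b) % 26 + A) for c in text)

def affine_decrypt(text, a, b):
    """Decrypt with D(y) = a^-1 * (y - b) mod 26 (needs Python 3.8+ for pow)."""
    a_inv = pow(a, -1, 26)
    return "".join(chr((a_inv * (ord(c) - A - b)) % 26 + A) for c in text)

def vigenere(text, key, decrypt=False):
    """Shift the i-th letter by the i-th key letter, repeating the key."""
    sign = -1 if decrypt else 1
    return "".join(
        chr((ord(c) - A + sign * (ord(key[i % len(key)]) - A)) % 26 + A)
        for i, c in enumerate(text)
    )

# Round trips on a sample plaintext (assumed for illustration).
plain = "ATTACKATDAWN"
c1 = caesar(plain, 3)
c2 = vigenere(plain, "LEMON")
c3 = affine(plain, 5, 8)
c4 = multiplicative(plain, 7)
```

Decryption of the multiplicative cipher is the same function called with the modular inverse of the key, for example `multiplicative(c4, pow(7, -1, 26))`.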
In this paper, a method for hiding cipher text in an image file is introduced. The
proposed method hides the cipher text message in the frequency domain of the image.
The method consists of two phases: an embedding phase and an extraction
phase. In the embedding phase, the image is transformed from the spatial domain to the frequency
domain using the discrete wavelet decomposition technique (Haar). The text message is encrypted
using the RSA algorithm; then the Least Significant Bit (LSB) algorithm is used to hide the secret message
in the high-frequency coefficients. The proposed method was tested on different images and showed success in
hiding information according to the Peak Signal to Noise Ratio (PSNR) measure of the
original ima
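The LSB step and the PSNR measure can be sketched in a simplified form. The sketch assumes a flat list of non-negative integers standing in for the quantized high-frequency coefficients; the Haar transform and the RSA encryption are omitted, and the function names and sample values are hypothetical.

```python
import math

def embed_lsb(values, bits):
    """Overwrite the least significant bit of the first len(bits) values."""
    if len(bits) > len(values):
        raise ValueError("message is longer than the cover data")
    stego = list(values)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & ~1) | bit
    return stego

def extract_lsb(values, n_bits):
    """Read back the least significant bit of the first n_bits values."""
    return [v & 1 for v in values[:n_bits]]

def psnr(original, modified, peak=255):
    """Peak Signal to Noise Ratio in dB between two equal-length sequences."""
    mse = sum((a - b) ** 2 for a, b in zip(original, modified)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)

# Toy cover data and message bits (assumed for illustration).
cover = [100, 101, 102, 103]
message_bits = [1, 0, 1, 1]
stego = embed_lsb(cover, message_bits)
recovered = extract_lsb(stego, len(message_bits))
quality = psnr(cover, stego)
```

Because each value changes by at most 1, the mean squared error stays small and the PSNR stays high, which is why LSB embedding tends to be visually unobtrusive.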