Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth raises the need for effective techniques that support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional text representation methods model documents as bags of words weighted by term frequency-inverse document frequency (TF-IDF). This approach ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems that are prevalent in textual documents remain unresolved. In this study, the sparsity and semantic problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is built through a combination of syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. The semantics of words are then modeled using a dependency graph. The resulting dependency graph is then used in the cluster analysis, for which the K-means clustering technique was used. The dependency graph based clustering results were compared with those of popular text representation methods, i.e. TF-IDF and ontology-based text representation. The results show that the dependency graph outperforms both TF-IDF and ontology-based text representation. The findings show that the proposed text representation method leads to more accurate document clustering results.
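The limitation of the bag-of-words representation described in this abstract can be illustrated with a minimal sketch. The toy sentences, the function names `tfidf_vectors` and `cosine`, and the particular weighting (length-normalized term frequency with a natural-log inverse document frequency) are assumptions chosen for illustration, not the study's actual pipeline. The sketch shows that two sentences with opposite meanings but identical words receive identical TF-IDF vectors.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per tokenized document.

    Assumed weighting: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy corpus: the first two sentences differ only in word order.
docs = [
    "the dog chased the cat".split(),
    "the cat chased the dog".split(),
    "stocks fell as markets closed".split(),
]
vecs = tfidf_vectors(docs)

# Word order (who chased whom) is lost, so the first two vectors coincide.
print(round(cosine(vecs[0], vecs[1]), 6))
# Documents with no shared terms have similarity zero.
print(cosine(vecs[0], vecs[2]))
```

Because the representation keeps only term counts, any clustering algorithm applied to these vectors cannot separate the first two sentences, which is the kind of semantic loss a structure-aware representation is meant to reduce.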
In this paper we construct different regular graphs from block designs. In one of our methods, we first transform symmetric block designs into new block designs by using a method called the union method, and then construct various regular graphs from each of them. For symmetric block designs with (which is named finite projective geometry), this method leads to an infinite class of regular graphs. With some examples we show that these graphs can be strongly regular or semi-strongly regular. We also conjecture that if two semi-symmetric block designs are non-isomorphic, then their resultant block graphs are non-isomorphic, too.
The F index (forgotten topological index) of a connected graph is the sum of the cubes of its vertex degrees. The forgotten topological index has been designed for the examination of drug molecular structures, which is extremely useful for pharmaceutical and medical experts in understanding biological activities. Among the topological indices, the forgotten index is based on the degree connectivity of bonds. This paper characterizes the forgotten index of unions of graphs, joins of graphs, and limits on trees and their complements, and measures its accuracy. Co-index values are analyzed for the molecular structures of various chemical compounds.
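The definition above can be made concrete with a small sketch. The function names and the example graph (a path on four vertices) are assumptions for illustration. The co-index is computed here using the definition commonly given in the literature, the sum of deg(u)^2 + deg(v)^2 over all non-adjacent vertex pairs, which the abstract itself does not spell out.

```python
from itertools import combinations

def degrees(n, edges):
    """Degree of each vertex 0..n-1 in a simple undirected graph."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def forgotten_index(n, edges):
    """F index: sum of the cubes of the vertex degrees."""
    return sum(d ** 3 for d in degrees(n, edges))

def forgotten_coindex(n, edges):
    """F co-index (assumed definition): sum of deg(u)^2 + deg(v)^2
    over all pairs of distinct, non-adjacent vertices."""
    deg = degrees(n, edges)
    adjacent = {frozenset(e) for e in edges}
    return sum(
        deg[u] ** 2 + deg[v] ** 2
        for u, v in combinations(range(n), 2)
        if frozenset((u, v)) not in adjacent
    )

# Hypothetical example: the path 0 - 1 - 2 - 3, with degrees 1, 2, 2, 1.
path_edges = [(0, 1), (1, 2), (2, 3)]
print(forgotten_index(4, path_edges))    # 1 + 8 + 8 + 1 = 18
print(forgotten_coindex(4, path_edges))  # (1+4) + (1+1) + (4+1) = 12
```

For this example the two values add up to 30, which equals (n - 1) times the sum of squared degrees, a quick consistency check on both functions.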
Data mining is one of the most popular analysis methods in medical research. It involves finding previously unknown patterns and correlations in datasets. Data mining spans various areas of biomedical research, including data collection, clinical decision support, illness or safety monitoring, public health, and inquiry research. Health analytics frequently uses computational data mining methods such as clustering, classification, and regression. Studies of large numbers of diverse, heterogeneous documents, including biological and electronic information, have provided extensive material for medical and health studies.
Even though image retrieval has been one of the most important research areas of the last two decades, there is still room for improvement, since current systems do not satisfy many users. Two major problems that need to be addressed are the accuracy and the speed of the image retrieval system, in order to achieve user satisfaction and to make the system suitable for all platforms. In this work, the proposed retrieval system uses features with spatial information to analyze the visual content of the image. The feature extraction process is then followed by the fuzzy c-means (FCM) clustering algorithm to reduce the search space and speed up the retrieval process. The experimental results show t
Interest in developing accurate automatic facial emotion recognition methodologies is growing rapidly, and it remains an ever-growing research field in computer vision, artificial intelligence, and automation. However, building an automated system that equals the human ability to recognize facial emotion is challenging because of the lack of an effective facial feature descriptor and the difficulty of choosing a proper classification method. In this paper, a geometry-based feature vector is proposed. For classification, three different types of methods are tested: statistical, artificial neural network (NN), and Support Vector Machine (SVM). A modified K-Means clustering algorithm
The government of Iraq states that despite the massive amounts invested in the power generating sector, the country has been plagued by power outages for more than three decades. One of the most common sources of the problem, with a significant impact on the waste of public funds, lies in contractual processes. The Ministry of Planning issued the sectorial specialized standard bidding documents (SSBD) for Design, Supply, and Installation of the Electromechanical Works (DSIoEW), which are primarily designed to support the Ministry of Electricity (MoE) in developing economic projects and improving the contractual process in order to advance the Iraqi electricity generation field. The research evaluates the impact of applying the SSBD-DSIoEW for
Cohesion is well known as the study of the relationships, whether grammatical or lexical, between the different elements of a particular text through the use of what are commonly called 'cohesive devices'. These devices bring connectivity and bind a text together. Moreover, the nature and number of such cohesive devices usually affect the understanding of the text by making it easier to comprehend. The present study is intended to examine the use of grammatical cohesive devices in relation to narrative techniques. The story of Joseph from the Holy Quran has been selected for examination using Halliday and Hasan's Model of Cohesion (1976, 1989). The aim of the study is to comparatively examine to what extent the type
Cryptography combined with steganography is a practical tool for data security. Hybridizing cryptography with steganography can provide more security by taking advantage of each technique. This work proposes a method for improving the crypto-stego approach by using a proposed dictionary method to modify the ciphertext. The modified ciphertext is then hidden in the text by using the proposed method. For cryptography, the Advanced Encryption Standard (AES) was used to encrypt the message, with a 128-bit block size and a 256-bit key size. The ciphertext characters were then replaced by the characters identified by a dictionary list. The dictionary is time-dependent, where each of the equivalent words shift based o
A graph is a structure consisting of a set of objects in which some pairs of the objects are in some sense related. The objects correspond to mathematical abstractions called vertices (also called nodes or points), and each related pair of vertices is called an edge (also called a link or line). A directed graph is a graph in which the edges have an orientation. A simple graph is a graph that has no more than one edge between any two vertices and no edge that starts and ends at the same vertex. Consider a simple undirected graph G of order n and its complement. Let δ(G) and ∆(G) denote the minimum degree and maximum degree of G, respectively. The complement degree polynomial of G is the polynomial CD[G,x]= , where C
Voice Activity Detection (VAD) is an important pre-processing step in speech processing systems such as speech enhancement, speech recognition, and gender and age identification. VAD helps reduce the time required to process speech data and improves final system accuracy by focusing the work on the voiced part of the speech. An automatic VAD technique based on a fuzzy-neuro approach (FN-AVAD) is presented in this paper. The aim of this work is to alleviate the problem of choosing the best threshold value in traditional VAD methods and to achieve automaticity by combining fuzzy clustering and machine learning techniques. Four features are extracted from each speech segment, which are short term energy, zero-crossing rate, auto