Advances in digital technology and the World Wide Web has led to the increase of digital documents that are used for various purposes such as publishing and digital library. This phenomenon raises awareness for the requirement of effective techniques that can help during the search and retrieval of text. One of the most needed tasks is clustering, which categorizes documents automatically into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends tightly on the selection of the text representation method. Traditional methods of text representation model documents as bags of words using term-frequency index document frequency (TFIDF). This method ignores the relationship and meanings of words in the document. As a result the sparsity and semantic problem that is prevalent in textual document are not resolved. In this study, the problem of sparsity and semantic is reduced by proposing a graph based text representation method, namely dependency graph with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created through an accumulation of syntactic and semantic analysis. A sample of 20 news groups, dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. Then the semantic of words are modeled using dependency graph. The produced dependency graph is then used in the process of cluster analysis. K-means clustering technique was used in this study. The dependency graph based clustering result were compared with the popular text representation method, i.e. TFIDF and Ontology based text representation. The result shows that the dependency graph outperforms both TFIDF and Ontology based text representation. The findings proved that the proposed text representation method leads to more accurate document clustering results.
Assume that G is a finite group and X = tG where t is non-identity element with t3 = 1. The simple graph with node set being X such that a, b ∈ X, are adjacent if ab-1 is an involution element, is called the A4-graph, and designated by A4(G, X). In this article, the construction of A4(G, X) is analyzed for G is the twisted group of Lie type 3D4(3).
The objective of this work is to design and implement a cryptography system that enables the sender to send message through any channel (even if this channel is insecure) and the receiver to decrypt the received message without allowing any intruder to break the system and extracting the secret information. This work modernize the feedforward neural network, so the secret message will be encrypted by unsupervised neural network method to get the cipher text that can be decrypted using the same network to get the original text. The security of any cipher system depends on the security of the related keys (that are used by the encryption and the decryption processes) and their corresponding lengths. In this work, the key is the final weights
... Show MoreThe need for an efficient method to find the furthermost appropriate document corresponding to a particular search query has become crucial due to the exponential development in the number of papers that are now readily available to us on the web. The vector space model (VSM) a perfect model used in “information retrieval”, represents these words as a vector in space and gives them weights via a popular weighting method known as term frequency inverse document frequency (TF-IDF). In this research, work has been proposed to retrieve the most relevant document focused on representing documents and queries as vectors comprising average term term frequency inverse sentence frequency (TF-ISF) weights instead of representing them as v
... Show MoreElectrical distribution system loads are permanently not fixed and alter in value and nature with time. Therefore, accurate consumer load data and models are required for performing system planning, system operation, and analysis studies. Moreover, realistic consumer load data are vital for load management, services, and billing purposes. In this work, a realistic aggregate electric load model is developed and proposed for a sample operative substation in Baghdad distribution network. The model involves aggregation of hundreds of thousands of individual components devices such as motors, appliances, and lighting fixtures. Sana’a substation in Al-kadhimiya area supplies mainly residential grade loads. Measurement-based
... Show MoreEstimating the semantic similarity between short texts plays an increasingly prominent role in many fields related to text mining and natural language processing applications, especially with the large increase in the volume of textual data that is produced daily. Traditional approaches for calculating the degree of similarity between two texts, based on the words they share, do not perform well with short texts because two similar texts may be written in different terms by employing synonyms. As a result, short texts should be semantically compared. In this paper, a semantic similarity measurement method between texts is presented which combines knowledge-based and corpus-based semantic information to build a semantic network that repre
... Show MoreTo achieve safe security to transfer data from the sender to receiver, cryptography is one way that is used for such purposes. However, to increase the level of data security, DNA as a new term was introduced to cryptography. The DNA can be easily used to store and transfer the data, and it becomes an effective procedure for such aims and used to implement the computation. A new cryptography system is proposed, consisting of two phases: the encryption phase and the decryption phase. The encryption phase includes six steps, starting by converting plaintext to their equivalent ASCII values and converting them to binary values. After that, the binary values are converted to DNA characters and then converted to their equivalent complementary DN
... Show More