Advances in digital technology and the World Wide Web has led to the increase of digital documents that are used for various purposes such as publishing and digital library. This phenomenon raises awareness for the requirement of effective techniques that can help during the search and retrieval of text. One of the most needed tasks is clustering, which categorizes documents automatically into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends tightly on the selection of the text representation method. Traditional methods of text representation model documents as bags of words using term-frequency index document frequency (TFIDF). This method ignores the relationship and meanings of words in the document. As a result the sparsity and semantic problem that is prevalent in textual document are not resolved. In this study, the problem of sparsity and semantic is reduced by proposing a graph based text representation method, namely dependency graph with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created through an accumulation of syntactic and semantic analysis. A sample of 20 news groups, dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. Then the semantic of words are modeled using dependency graph. The produced dependency graph is then used in the process of cluster analysis. K-means clustering technique was used in this study. The dependency graph based clustering result were compared with the popular text representation method, i.e. TFIDF and Ontology based text representation. The result shows that the dependency graph outperforms both TFIDF and Ontology based text representation. The findings proved that the proposed text representation method leads to more accurate document clustering results.
With the rapid development of computers and network technologies, the security of information in the internet becomes compromise and many threats may affect the integrity of such information. Many researches are focused theirs works on providing solution to this threat. Machine learning and data mining are widely used in anomaly-detection schemes to decide whether or not a malicious activity is taking place on a network. In this paper a hierarchical classification for anomaly based intrusion detection system is proposed. Two levels of features selection and classification are used. In the first level, the global feature vector for detection the basic attacks (DoS, U2R, R2L and Probe) is selected. In the second level, four local feature vect
... Show MoreText based-image clustering (TBIC) is an insufficient approach for clustering related web images. It is a challenging task to abstract the visual features of images with the support of textual information in a database. In content-based image clustering (CBIC), image data are clustered on the foundation of specific features like texture, colors, boundaries, shapes. In this paper, an effective CBIC) technique is presented, which uses texture and statistical features of the images. The statistical features or moments of colors (mean, skewness, standard deviation, kurtosis, and variance) are extracted from the images. These features are collected in a one dimension array, and then genetic algorithm (GA) is applied for image clustering.
... Show MoreUntil recently, researchers have utilized and applied various techniques for intrusion detection system (IDS), including DNA encoding and clustering that are widely used for this purpose. In addition to the other two major techniques for detection are anomaly and misuse detection, where anomaly detection is done based on user behavior, while misuse detection is done based on known attacks signatures. However, both techniques have some drawbacks, such as a high false alarm rate. Therefore, hybrid IDS takes advantage of combining the strength of both techniques to overcome their limitations. In this paper, a hybrid IDS is proposed based on the DNA encoding and clustering method. The proposed DNA encoding is done based on the UNSW-NB15
... Show MoreThe need for an efficient method to find the furthermost appropriate document corresponding to a particular search query has become crucial due to the exponential development in the number of papers that are now readily available to us on the web. The vector space model (VSM) a perfect model used in “information retrieval”, represents these words as a vector in space and gives them weights via a popular weighting method known as term frequency inverse document frequency (TF-IDF). In this research, work has been proposed to retrieve the most relevant document focused on representing documents and queries as vectors comprising average term term frequency inverse sentence frequency (TF-ISF) weights instead of representing them as v
... Show MoreThe steganography (text in image hiding) methods still considered important issues to the researchers at the present time. The steganography methods were varied in its hiding styles from a simple to complex techniques that are resistant to potential attacks. In current research the attack on the host's secret text problem didn’t considered, but an improved text hiding within the image have highly confidential was proposed and implemented companied with a strong password method, so as to ensure no change will be made in the pixel values of the host image after text hiding. The phrase “highly confidential” denoted to the low suspicious it has been performed may be found in the covered image. The Experimental results show that the covere
... Show MoreThe main objective of this paper is to designed algorithms and implemented in the construction of the main program designated for the determination the tenser product of representation for the special linear group.
Assume that G is a finite group and X = tG where t is non-identity element with t3 = 1. The simple graph with node set being X such that a, b ∈ X, are adjacent if ab-1 is an involution element, is called the A4-graph, and designated by A4(G, X). In this article, the construction of A4(G, X) is analyzed for G is the twisted group of Lie type 3D4(3).