Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth raises the need for effective techniques to support text search and retrieval. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This representation ignores the relationships between words and their meanings, so the sparsity and semantic problems prevalent in textual documents remain unresolved. In this study, these problems are reduced by proposing a graph-based text representation method, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing to identify sentence structure, and the semantics of words are then modeled using a dependency graph. The resulting dependency graph is used in cluster analysis with the K-means clustering technique. The dependency graph-based clustering results were compared with those of popular text representation methods, namely TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation. The findings indicate that the proposed text representation method leads to more accurate document clustering results.
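To make the bag-of-words limitation concrete, the following is a minimal sketch of a TFIDF baseline, not the authors' implementation. The weighting (raw term frequency times log(N/df)), the whitespace tokenizer, and the toy sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document (tf * log(N / df))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Toy corpus (hypothetical): the first two sentences swap subject and object.
docs = [
    "the dog chased the cat",
    "the cat chased the dog",
    "stocks fell sharply today",
]
vecs = tfidf(docs)
same_words = cosine(vecs[0], vecs[1])   # identical bags of words
no_overlap = cosine(vecs[0], vecs[2])   # no shared terms
```

The first two sentences describe opposite events yet receive the same vector, because word order and dependencies are discarded. Each vector also holds only the few terms its document contains, which is the sparsity the abstract refers to.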
Clustering algorithms have recently gained attention in the related literature since they can help current intrusion detection systems in several aspects. This paper proposes genetic algorithm (GA) based clustering, which serves to separate patterns arriving from network traffic packets into normal and attack. Two GA based clustering models for solving the intrusion detection problem are introduced. The first model handles the numeric features of the network packet, whereas the second one considers all features of the network packet. Moreover, a new mutation operator directed at binary and symbolic features is proposed. The basic concept of the proposed mutation operator depends on the most frequent value
An Auto Crop method is used to detect and extract signatures, logos and stamps from a document image. This method improves the performance of security systems based on signature, logo and stamp images, because it extracts these images from the original document image while keeping the content information of the cropped images. The Auto Crop method also reduces the time cost of recognizing document contents. It consists of preprocessing, feature extraction and classification. The HSL color space is used to extract color features from the cropped image, and the k-Nearest Neighbors (KNN) classifier is used for classification.
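The feature extraction and classification steps can be sketched as follows. This is an illustration under stated assumptions, not the paper's pipeline: the feature is taken to be the mean hue, saturation and lightness of the cropped pixels, the distance is Euclidean, and the training samples and labels are hypothetical.

```python
import colorsys
import math
from collections import Counter

def hsl_features(pixels):
    """Mean (hue, saturation, lightness) of RGB pixels with 0-255 channels."""
    h_sum = s_sum = l_sum = 0.0
    for r, g, b in pixels:
        # colorsys returns (hue, lightness, saturation), each in [0, 1].
        h, l, s = colorsys.rgb_to_hls(r / 255, g / 255, b / 255)
        h_sum += h
        s_sum += s
        l_sum += l
    n = len(pixels)
    return (h_sum / n, s_sum / n, l_sum / n)

def knn_predict(train, query, k=3):
    """Majority label among the k training features nearest to the query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training crops: blue ink stamps versus dark pen signatures.
train = [
    (hsl_features([(0, 0, 255)]), "stamp"),
    (hsl_features([(20, 30, 200)]), "stamp"),
    (hsl_features([(0, 0, 0)]), "signature"),
    (hsl_features([(30, 30, 30)]), "signature"),
]
predicted = knn_predict(train, hsl_features([(10, 20, 240)]), k=3)
```

Averaging hue directly ignores that hue is circular, so a real feature extractor would likely use a histogram or a circular mean instead.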
Printed Arabic document image retrieval is an important and much-needed capability for many companies, governments and other users. In this paper, a printed Arabic document image retrieval system based on spotting the header words of official Arabic documents is proposed. The proposed system uses efficient segmentation and preprocessing methods together with an accurate proposed feature extraction method to prepare the document for the classification process. A Support Vector Machine (SVM) is used for classification. Experiments show that the system achieved its best accuracy, 96.8%, using the polynomial kernel of the SVM classifier.
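For reference, the polynomial kernel in its common form is K(x, y) = (gamma * <x, y> + coef0)^degree. The sketch below evaluates it directly. The parameter names follow a widespread convention and the default values are assumptions, since the abstract does not report the degree or other settings used.

```python
def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel (gamma * <x, y> + coef0) ** degree for two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree

# Two hypothetical 2-D feature vectors.
k_value = polynomial_kernel([1.0, 2.0], [3.0, 4.0], degree=2)
```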
In any security system, a high level of security is needed to maintain the secrecy of important data. Steganography is a security technique that hides secret information within a cover (video, image, sound or text) so that an adversary does not suspect the existence of the confidential information. The proposed work hides secret text messages (Arabic or English) in an Arabic cover text. RNA is employed as a tool for encoding the secret information, and non-printed characters are used to hide these codes. Each character (English or Arabic) is represented using only six bits based on secret tables. This operation provides good compression, since each Arabic character needs 16 bits and each English characte
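One way to picture a six-bit, RNA-style encoding is to map each character to a table index below 64 and write that index as three nucleotides of two bits each. The sketch below is purely illustrative: the table contents, the base-to-bits mapping (A=00, C=01, G=10, U=11), and the function names are hypothetical, and the step that hides the codes in non-printed characters is not shown.

```python
BASES = "ACGU"  # assumed 2-bit mapping: A=00, C=01, G=10, U=11

# Hypothetical secret table: a character's position is its 6-bit code (max 64).
SECRET_TABLE = "abcdefghijklmnopqrstuvwxyz .,?!"

def encode(message):
    """Encode each character as three RNA bases (6 bits per character)."""
    out = []
    for ch in message:
        code = SECRET_TABLE.index(ch)  # 0..63; raises ValueError if missing
        out.append(BASES[(code >> 4) & 3] + BASES[(code >> 2) & 3] + BASES[code & 3])
    return "".join(out)

def decode(rna):
    """Invert encode(): read three bases at a time and look up the table."""
    chars = []
    for i in range(0, len(rna), 3):
        code = 0
        for base in rna[i:i + 3]:
            code = (code << 2) | BASES.index(base)
        chars.append(SECRET_TABLE[code])
    return "".join(chars)

encoded = encode("ab")
round_trip = decode(encode("hello world"))
```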
Anomaly detection is still a difficult task. To address this problem, we propose to strengthen the DBSCAN algorithm by converting all data to the graph concept frame (CFG). As is well known, DBSCAN groups data points of the same kind into a cluster, while points that fall outside the behavior of any cluster are treated as noise or anomalies. DBSCAN can thus detect abnormal points that lie farther than a set threshold from any cluster (extreme values). However, not all anomalies are of this kind, that is, unusual or far from a specific group. There is also a type of data that does not occur repeatedly but is considered abnormal for the known group. The analysis showed DBSCAN using the
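The baseline behavior described above, where points that belong to no dense cluster are flagged as noise, can be sketched with a plain DBSCAN. This is a minimal textbook version under assumed parameters (`eps`, `min_pts`) and toy data. It does not include the graph-based extension the abstract proposes.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)

    def neighbors(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # j is a core point, so keep expanding
    return labels

# Two tight groups and one isolated point (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The isolated point receives the label -1, which is the distance-based notion of anomaly that the abstract argues is not sufficient on its own.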
Background: Image processing of medical images is a major method for increasing the reliability of cancer diagnosis.
Methods: The proposed system proceeds in two stages. The first is an enhancement stage, performed using a median filter to reduce the noise and artifacts present in a CT image of a human lung with cancer. The second is the implementation of the k-means clustering algorithm.
Results: The image resulting from the k-means algorithm was compared with the image resulting from the fuzzy c-means (FCM) algorithm.
Conclusion: We found that the time required to run the k-means algorithm is less than that of the FCM algorithm. The MATLAB package (version 7.3) was used in writing the programming code of our w
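The two stages in the Methods can be sketched on a toy grayscale image. The code below is a Python illustration rather than the authors' MATLAB code. The 3x3 window, the replicated borders, the two initial centers, and the pixel values are all assumptions.

```python
from statistics import median

def median_filter(img):
    """3x3 median filter; borders are handled by replicating edge pixels."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [img[min(max(j, 0), h - 1)][min(max(i, 0), w - 1)]
                      for j in range(y - 1, y + 2)
                      for i in range(x - 1, x + 2)]
            row.append(median(window))
        out.append(row)
    return out

def kmeans_1d(values, centers, iters=20):
    """Lloyd's algorithm on scalar intensities; returns (centers, labels)."""
    centers = list(centers)
    labels = []
    for _ in range(iters):
        # Assign every value to its nearest center.
        labels = [min(range(len(centers)), key=lambda c: abs(v - centers[c]))
                  for v in values]
        # Move each center to the mean of its members.
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return centers, labels

# Toy image: dark region on the left, bright region on the right,
# with one bright noise pixel (255) inside the dark region.
image = [
    [10, 10, 200, 200],
    [10, 255, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
filtered = median_filter(image)
flat = [v for row in filtered for v in row]
centers, labels = kmeans_1d(flat, centers=[0, 255])
```

The median filter removes the isolated noise pixel before clustering, so k-means separates the two regions cleanly.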
Messages are an ancient method of exchanging information between people, and there have been many ways to send them with some degree of security.
Encryption and steganography are among the oldest ways of securing messages, but many problems remain in key generation, key distribution, the choice of a suitable cover image and other areas. In this paper we present a proposed algorithm for exchanging secure messages without any encryption and without an image used as a cover for hiding. The proposed algorithm depends on two copies of the same collection images set (CIS), one on the sender side and the other on the receiver side, between which messages are always exchanged.
To send any text message, the sender converts the message to ASCII c
Recently, people have been exposed to radiation from examinations such as computerized tomography (CT) or X-rays. In this investigation, we endeavor to limit the effect of examination hardware. To do this, the medical image is cropped (cut and zoomed), and the vascular network is then represented as a graph in which each contraction is a vertex and each vessel is an edge. The area of the coagulation was processed in earlier work; in the current study, the shortest distance to reach the location of the blood vessel clot is computed
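A shortest-distance computation on such a vessel graph can be sketched with Dijkstra's algorithm. The abstract does not name the algorithm it uses, so Dijkstra is an assumption here. The vertex names and edge lengths are hypothetical.

```python
import heapq

def shortest_distance(graph, source, target):
    """Dijkstra's algorithm on a dict-of-dicts graph with non-negative lengths."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry; a shorter path was already found
        for nbr, length in graph.get(node, {}).items():
            nd = d + length
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return float("inf")  # target not reachable

# Hypothetical vessel network: vertices are points along the vessels,
# edge weights are vessel segment lengths in arbitrary units.
vessels = {
    "entry": {"a": 4.0, "b": 2.0},
    "a": {"entry": 4.0, "b": 1.0, "clot": 5.0},
    "b": {"entry": 2.0, "a": 1.0, "clot": 8.0},
    "clot": {"a": 5.0, "b": 8.0},
}
distance_to_clot = shortest_distance(vessels, "entry", "clot")
```

In this example the shortest route goes entry, b, a, clot, which is shorter than either direct branch.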
Medical imaging is a technique used for the diagnosis and treatment of a large number of diseases. It has therefore become necessary to apply good image processing in order to extract the best possible results and information. In this study, genetic algorithm (GA)-based clustering techniques (K-means and Fuzzy C Means (FCM)) were used to segment thyroid Computed Tomography (CT) images in order to extract the thyroid tumor. Traditional GA, K-means and FCM algorithms were applied separately to the original images and to the images enhanced with an Anisotropic Diffusion Filter (ADF). The resulting cluster centers from K-means and FCM were used as the initial population in GA for the implementation of GAK-Mean and GAFCM. Jaccard index was used to s
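The Jaccard index of two binary masks is the size of their intersection divided by the size of their union. The sketch below computes it for two small masks. The mask values and the convention of returning 1.0 when both masks are empty are assumptions for illustration.

```python
def jaccard_index(mask_a, mask_b):
    """Jaccard index |A and B| / |A or B| for two same-sized binary masks."""
    inter = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            inter += 1 if (a and b) else 0
            union += 1 if (a or b) else 0
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0

# Hypothetical segmentation result and reference mask (1 = tumor pixel).
segmented = [[0, 1, 1],
             [0, 1, 0]]
reference = [[0, 1, 0],
             [0, 1, 1]]
score = jaccard_index(segmented, reference)
```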