Advances in digital technology and the World Wide Web has led to the increase of digital documents that are used for various purposes such as publishing and digital library. This phenomenon raises awareness for the requirement of effective techniques that can help during the search and retrieval of text. One of the most needed tasks is clustering, which categorizes documents automatically into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends tightly on the selection of the text representation method. Traditional methods of text representation model documents as bags of words using term-frequency index document frequency (TFIDF). This method ignores the relationship and meanings of words in the document. As a result the sparsity and semantic problem that is prevalent in textual document are not resolved. In this study, the problem of sparsity and semantic is reduced by proposing a graph based text representation method, namely dependency graph with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created through an accumulation of syntactic and semantic analysis. A sample of 20 news groups, dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. Then the semantic of words are modeled using dependency graph. The produced dependency graph is then used in the process of cluster analysis. K-means clustering technique was used in this study. The dependency graph based clustering result were compared with the popular text representation method, i.e. TFIDF and Ontology based text representation. The result shows that the dependency graph outperforms both TFIDF and Ontology based text representation. The findings proved that the proposed text representation method leads to more accurate document clustering results.
Plagiarism is becoming more of a problem in academics. It’s made worse by the ease with which a wide range of resources can be found on the internet, as well as the ease with which they can be copied and pasted. It is academic theft since the perpetrator has ”taken” and presented the work of others as his or her own. Manual detection of plagiarism by a human being is difficult, imprecise, and time-consuming because it is difficult for anyone to compare their work to current data. Plagiarism is a big problem in higher education, and it can happen on any topic. Plagiarism detection has been studied in many scientific articles, and methods for recognition have been created utilizing the Plagiarism analysis, Authorship identification, and
... Show MoreThe development of microcontroller is used in monitoring and data acquisition recently. This development has born various architectures for spreading and interfacing the microcontroller in network environment. Some of existing architecture suffers from redundant in resources, extra processing, high cost and delay in response. This paper presents flexible concise architecture for building distributed microcontroller networked system. The system consists of only one server, works through the internet, and a set of microcontrollers distributed in different sites. Each microcontroller is connected through the Ethernet to the internet. In this system the client requesting data from certain side is accomplished through just one server that is in
... Show MoreA novel median filter based on crow optimization algorithms (OMF) is suggested to reduce the random salt and pepper noise and improve the quality of the RGB-colored and gray images. The fundamental idea of the approach is that first, the crow optimization algorithm detects noise pixels, and that replacing them with an optimum median value depending on a criterion of maximization fitness function. Finally, the standard measure peak signal-to-noise ratio (PSNR), Structural Similarity, absolute square error and mean square error have been used to test the performance of suggested filters (original and improved median filter) used to removed noise from images. It achieves the simulation based on MATLAB R2019b and the resul
... Show MoreAuthentication is the process of determining whether someone or something is, in fact, who or what it is declared to be. As the dependence upon computers and computer networks grows, the need for user authentication has increased. User’s claimed identity can be verified by one of several methods. One of the most popular of these methods is represented by (something user know), such as password or Personal Identification Number (PIN). Biometrics is the science and technology of authentication by identifying the living individual’s physiological or behavioral attributes. Keystroke authentication is a new behavioral access control system to identify legitimate users via their typing behavior. The objective of this paper is to provide user
... Show MoreImage retrieval is used in searching for images from images database. In this paper, content – based image retrieval (CBIR) using four feature extraction techniques has been achieved. The four techniques are colored histogram features technique, properties features technique, gray level co- occurrence matrix (GLCM) statistical features technique and hybrid technique. The features are extracted from the data base images and query (test) images in order to find the similarity measure. The similarity-based matching is very important in CBIR, so, three types of similarity measure are used, normalized Mahalanobis distance, Euclidean distance and Manhattan distance. A comparison between them has been implemented. From the results, it is conclud
... Show MoreIris research is focused on developing techniques for identifying and locating relevant biometric features, accurate segmentation and efficient computation while lending themselves to compression methods. Most iris segmentation methods are based on complex modelling of traits and characteristics which, in turn, reduce the effectiveness of the system being used as a real time system. This paper introduces a novel parameterized technique for iris segmentation. The method is based on a number of steps starting from converting grayscale eye image to a bit plane representation, selection of the most significant bit planes followed by a parameterization of the iris location resulting in an accurate segmentation of the iris from the origin
... Show More