Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth has raised awareness of the need for effective techniques to support text search and retrieval. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends strongly on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This approach ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems prevalent in textual documents are not resolved. In this study, these problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing to identify the sentence structure. The semantics of words are then modeled using the dependency graph, which is in turn used in cluster analysis with the K-means clustering technique. The dependency graph based clustering results were compared with those of popular text representation methods, i.e. TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both. The findings show that the proposed text representation method leads to more accurate document clustering.
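To make the baseline concrete, the following is a minimal sketch of the bag-of-words TFIDF weighting that the abstract contrasts with the dependency graph. The function name `tfidf` and the toy documents are illustrative assumptions, and the weighting shown (relative term frequency times log(N/df)) is one common variant, not necessarily the one used in the study.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors

# Hypothetical toy corpus: "bank" has a different sense in each document.
docs = [
    "the bank approved the loan",
    "the river bank flooded",
]
vectors = tfidf(docs)
```

In this sketch the term `bank` is a single feature shared by both documents, although it carries a different sense in each. This is the kind of semantic information that a bag-of-words model discards.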
Content-based image retrieval has been actively developed in numerous fields. It provides more active management and retrieval of images than the keyword-based method, and has therefore become one of the liveliest research areas of the past few years. Given a set of objects, information retrieval offers ways to search for those that match a particular description. The objects considered may be documents, images, videos, or sounds. This paper proposes a method to retrieve a multi-view face from a large face database according to color and texture attributes. Some of the features used for retrieval are color attributes such as the mean, the variance, and the color image's bitmap. In add
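The color attributes named above can be sketched as follows. This is only an illustration under stated assumptions: the helper names `channel_features` and `color_features` are hypothetical, the image is a plain dictionary of per-channel pixel lists, and the bitmap rule (1 where a pixel is at least the channel mean) is an assumed definition, since the abstract does not specify one.

```python
def channel_features(channel):
    """Mean, variance and binary bitmap for one color channel.

    `channel` is a list of rows of pixel intensities (0-255).
    """
    pixels = [p for row in channel for p in row]
    mean = sum(pixels) / len(pixels)
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    # Assumed bitmap rule: 1 where the pixel is at least the channel mean.
    bitmap = [[1 if p >= mean else 0 for p in row] for row in channel]
    return mean, variance, bitmap

def color_features(image):
    """`image` maps a channel name ("R", "G", "B") to its 2-D pixel list."""
    return {name: channel_features(ch) for name, ch in image.items()}

# Hypothetical 2x2 image, one pixel grid per channel.
image = {
    "R": [[10, 200], [30, 240]],
    "G": [[50, 50], [50, 50]],
    "B": [[0, 255], [255, 0]],
}
features = color_features(image)
```

Two images could then be compared by the distance between their means and variances and by the number of differing bitmap cells. The texture attributes mentioned in the abstract are not covered by this sketch.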
Plagiarism is a growing problem in academia. It is made worse by the ease with which a wide range of resources can be found on the internet and then copied and pasted. It is academic theft, since the perpetrator has “taken” the work of others and presented it as his or her own. Manual detection of plagiarism is difficult, imprecise, and time-consuming, because it is hard for anyone to compare a given work against all existing material. Plagiarism is a major problem in higher education, and it can occur in any subject. Plagiarism detection has been studied in many scientific articles, and recognition methods have been developed using plagiarism analysis, authorship identification, and
Microcontrollers have recently come into wide use for monitoring and data acquisition. This development has given rise to various architectures for distributing microcontrollers and interfacing them in a network environment. Some existing architectures suffer from redundant resources, extra processing, high cost, and delayed response. This paper presents a flexible, concise architecture for building a distributed networked microcontroller system. The system consists of a single server, which works through the internet, and a set of microcontrollers distributed across different sites. Each microcontroller is connected to the internet through Ethernet. In this system, a client request for data from a certain site is accomplished through just one server that is in
Iris research is focused on developing techniques for identifying and locating relevant biometric features, accurate segmentation, and efficient computation, while lending themselves to compression methods. Most iris segmentation methods are based on complex modelling of traits and characteristics, which in turn reduces the effectiveness of the system when it is used in real time. This paper introduces a novel parameterized technique for iris segmentation. The method proceeds in a number of steps: converting the grayscale eye image to a bit plane representation, selecting the most significant bit planes, and then parameterizing the iris location, resulting in an accurate segmentation of the iris from the origin
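The first two steps described above, bit plane decomposition and selection of the most significant planes, can be sketched as follows. The function names and the tiny sample image are hypothetical, 8-bit grayscale input is assumed, and the parameterization of the iris location is not shown because the abstract does not describe it.

```python
def bit_planes(image):
    """Split an 8-bit grayscale image (list of rows) into 8 binary planes.

    planes[k] holds bit k of every pixel; planes[7] is the most significant.
    """
    return [[[(p >> k) & 1 for p in row] for row in image] for k in range(8)]

def keep_top_planes(image, n):
    """Rebuild the image from its n most significant bit planes only."""
    mask = (0xFF << (8 - n)) & 0xFF
    return [[p & mask for p in row] for row in image]

# Hypothetical 2x2 grayscale patch.
eye = [[12, 200], [130, 255]]
planes = bit_planes(eye)
coarse = keep_top_planes(eye, 2)
```

Keeping only the top planes discards fine intensity detail and leaves the coarse bright and dark regions, which is presumably what makes the later localization step cheaper.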
This study assessed the advantage of using earthworms in combination with punch waste and nutrients in remediating drill cuttings contaminated with hydrocarbons. Analyses were performed on days 0, 7, 14, 21, and 28 of the experiment. Two hydrocarbon concentrations (20000 mg/kg and 40000 mg/kg) were used with three group sizes of five, ten, and twenty earthworms. After 28 days, the total petroleum hydrocarbon (TPH) concentration of 20000 mg/kg was reduced to 13200 mg/kg, 9800 mg/kg, and 6300 mg/kg in treatments with five, ten, and twenty earthworms, respectively. Likewise, the TPH concentration of 40000 mg/kg was reduced to 22000 mg/kg, 10100 mg/kg, and 4200 mg/kg in treatments with the same numbers of earthworms, respectively. The p
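The concentrations quoted above can be turned into removal percentages with a short calculation. The sketch below uses only the figures given in the abstract; the function and variable names are illustrative.

```python
def removal_percent(initial, final):
    """Percentage of the initial TPH concentration removed."""
    return 100.0 * (initial - final) / initial

# Final TPH (mg/kg) after 28 days, keyed by initial TPH, then worm count.
final_tph = {
    20000: {5: 13200, 10: 9800, 20: 6300},
    40000: {5: 22000, 10: 10100, 20: 4200},
}

removal = {
    initial: {
        worms: removal_percent(initial, final)
        for worms, final in by_worms.items()
    }
    for initial, by_worms in final_tph.items()
}
```

With these figures, removal comes to 34%, 51%, and 68.5% at 20000 mg/kg, and 45%, 74.75%, and 89.5% at 40000 mg/kg, for five, ten, and twenty earthworms respectively.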
Today, with the increasing use of social media, many researchers have become interested in topic extraction from Twitter. Tweets are short, unstructured, and messy, which makes finding topics in them difficult. Topic modeling algorithms such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) were originally designed to derive topics from large documents such as articles and books, and they are often less efficient when applied to short text content like tweets. Fortunately, Twitter has many features that represent the interaction between users, and tweets carry rich user-generated hashtags that serve as keywords. In this paper, we exploit the hashtags feature to improve topics learned
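One way hashtags can help with short texts is hashtag pooling, in which tweets sharing a hashtag are merged into a longer pseudo-document before a topic model such as LDA is applied. The sketch below illustrates that general idea only. It is an assumption, not necessarily the method of this paper, and the function name and sample tweets are hypothetical.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")

def pool_by_hashtag(tweets):
    """Merge tweets sharing a hashtag into one longer pseudo-document."""
    pools = defaultdict(list)
    for tweet in tweets:
        # Hashtags are matched case-insensitively.
        tags = {tag.lower() for tag in HASHTAG.findall(tweet)}
        # Strip the '#' marks so hashtag words remain as ordinary tokens.
        text = tweet.replace("#", "")
        for tag in tags:
            pools[tag].append(text)
    return {tag: " ".join(texts) for tag, texts in pools.items()}

# Hypothetical sample tweets.
tweets = [
    "Flooding downtown again #weather",
    "Storm warning issued #Weather #safety",
    "New phone launch today #tech",
]
pseudo_docs = pool_by_hashtag(tweets)
```

Each pseudo-document can then be passed to a topic model in place of the individual tweets. In this sketch, tweets without any hashtag are dropped, and a tweet with several hashtags contributes to several pools.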