Advances in digital technology and the World Wide Web has led to the increase of digital documents that are used for various purposes such as publishing and digital library. This phenomenon raises awareness for the requirement of effective techniques that can help during the search and retrieval of text. One of the most needed tasks is clustering, which categorizes documents automatically into meaningful groups. Clustering is an important task in data mining and machine learning. The accuracy of clustering depends tightly on the selection of the text representation method. Traditional methods of text representation model documents as bags of words using term-frequency index document frequency (TFIDF). This method ignores the relationship and meanings of words in the document. As a result the sparsity and semantic problem that is prevalent in textual document are not resolved. In this study, the problem of sparsity and semantic is reduced by proposing a graph based text representation method, namely dependency graph with the aim of improving the accuracy of document clustering. The dependency graph representation scheme is created through an accumulation of syntactic and semantic analysis. A sample of 20 news groups, dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. Then the semantic of words are modeled using dependency graph. The produced dependency graph is then used in the process of cluster analysis. K-means clustering technique was used in this study. The dependency graph based clustering result were compared with the popular text representation method, i.e. TFIDF and Ontology based text representation. The result shows that the dependency graph outperforms both TFIDF and Ontology based text representation. The findings proved that the proposed text representation method leads to more accurate document clustering results.
Identifying the total number of fruits on trees has long been of interest in agricultural crop estimation work. Yield prediction of fruits in practical environment is one of the hard and significant tasks to obtain better results in crop management system to achieve more productivity with regard to moderate cost. Utilized color vision in machine vision system to identify citrus fruits, and estimated yield information of the citrus grove in-real time. Fruit recognition algorithms based on color features to estimate the number of fruit. In the current research work, some low complexity and efficient image analysis approach was proposed to count yield fruits image in the natural scene. Semi automatic segmentation and yield calculation of fruit
... Show MoreAlzheimer’s disease (AD) is an age-related progressive and neurodegenerative disorder, which is characterized by loss of memory and cognitive decline. It is the main cause of disability among older people. The rapid increase in the number of people living with AD and other forms of dementia due to the aging population represents a major challenge to health and social care systems worldwide. Degeneration of brain cells due to AD starts many years before the clinical manifestations become clear. Early diagnosis of AD will contribute to the development of effective treatments that could slow, stop, or prevent significant cognitive decline. Consequently, early diagnosis of AD may also be valuable in detecting patients with dementia who have n
... Show MoreActivity recognition (AR) is a new interesting and challenging research area with many applications (e.g. healthcare, security, and event detection). Basically, activity recognition (e.g. identifying user’s physical activity) is more likely to be considered as a classification problem. In this paper, a combination of 7 classification methods is employed and experimented on accelerometer data collected via smartphones, and compared for best performance. The dataset is collected from 59 individuals who performed 6 different activities (i.e. walk, jog, sit, stand, upstairs, and downstairs). The total number of dataset instances is 5418 with 46 labeled features. The results show that the proposed method of ensemble boost-based classif
... Show MoreThe concept of Cech fuzzy soft bi-closure space ( ˇ Cfs bi-csp) ( ˇ U, L1, L2, S) is initiated and studied by the authors in [6]. The notion of pairwise fuzzy soft separated sets in Cfs bi-csp is defined in this study, and various features of ˇ this notion are proved. Then, we introduce and investigate the concept of connectedness in both Cfs bi-csps and its ˇ associated fuzzy soft bitopological spaces utilizing the concept of pairwise fuzzy soft separated sets. Furthermore, the concept of pairwise feebly connected is introduced, and the relationship between pairwise connected and pairwise feebly connected is discussed. Finally, we provide various instances to further explain our findings.
Text categorization refers to the process of grouping text or documents into classes or categories according to their content. Text categorization process consists of three phases which are: preprocessing, feature extraction and classification. In comparison to the English language, just few studies have been done to categorize and classify the Arabic language. For a variety of applications, such as text classification and clustering, Arabic text representation is a difficult task because Arabic language is noted for its richness, diversity, and complicated morphology. This paper presents a comprehensive analysis and a comparison for researchers in the last five years based on the dataset, year, algorithms and the accuracy th
... Show MoreThe modern textual study researched the textuality of the texts and specified for that seven well-known standards, relying in all of that on the main elements of the text (the speaker, the text, and the recipient). This study was to investigate the textuality of philology, and the jurisprudence of the science of the text.
This research aims to solve the problem of selection using clustering algorithm, in this research optimal portfolio is formation using the single index model, and the real data are consisting from the stocks Iraqi Stock Exchange in the period 1/1/2007 to 31/12/2019. because the data series have missing values ,we used the two-stage missing value compensation method, the knowledge gap was inability the portfolio models to reduce The estimation error , inaccuracy of the cut-off rate and the Treynor ratio combine stocks into the portfolio that caused to decline in their performance, all these problems required employing clustering technic to data mining and regrouping it within clusters with similar characteristics to outperform the portfolio
... Show More