Advances in digital technology and the World Wide Web have led to an increase in digital documents used for purposes such as publishing and digital libraries. This growth raises the need for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends closely on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This representation ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems that are prevalent in textual documents are not resolved. This study reduces the sparsity and semantic problems by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing in order to identify the sentence structure. The semantics of the words are then modeled using a dependency graph, and the resulting graph is used in the cluster analysis. The K-means clustering technique was used in this study. The dependency-graph-based clustering results were compared with those of popular text representation methods, i.e. TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation. The findings indicate that the proposed text representation method leads to more accurate document clustering results.
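The limitation of the bag-of-words model described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `tfidf_vectors`, the example sentences, and the specific weighting (length-normalized term frequency times `log(N / df)`) are assumptions chosen for illustration, and other TFIDF variants exist. The sketch shows that two sentences with opposite meanings receive identical vectors, and that vectors are mostly zeros for terms a document does not contain.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, one TF-IDF vector per document).

    Hypothetical helper: whitespace tokenization and a plain
    tf * log(N / df) weighting, assumed here for illustration only.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([
            (counts[term] / len(toks)) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors


# Made-up example documents (not taken from the 20 Newsgroups dataset).
docs = [
    "the dog bites the man",
    "the man bites the dog",
    "stock markets fell sharply",
]
vocab, vectors = tfidf_vectors(docs)

# Word order is discarded, so the first two documents are indistinguishable.
print(vectors[0] == vectors[1])
# Count of zero entries per document, a small-scale view of sparsity.
print([sum(1 for w in vec if w == 0) for vec in vectors])
```

A graph-based representation such as the dependency graph keeps the relations between words (for example, which noun is the subject of "bites"), which is exactly the information this vector form loses.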
The present article discusses innovative word-formation processes in Internet texts: the emergence of new derivative words, new affixes, word-formation models, and word-formation methods. Using several neologisms as examples, the article shows both the possibilities of the Internet word-making process and the possibilities of studying a newly established word through Internet communication. The words selected for analysis can be attributed to the keywords of the current time (in particular, the words included in the list of "Words of 2019"). A number of these words are formed by the suffix method, which is the traditional method of Russian word formation. A negation of these words is usually made thro
The article considers the main reason why A. I. Herzen turned to obsolete words: their ability to acquire a stylistic coloring in the context of speech, as well as the possibility of combining them, in some cases, with neutral lexemes of various functional styles. The article presents a certain stylistic effect of these characteristics, as a result of which the stylistic coloring of such words in syntagmatic terms does not coincide with their stylistic coloring in paradigmatic terms; that is, in speech they have a completely stylistic meaning. Attention is focused on the role of outdated vocabulary, which serves to implement such features of the artistic style as imagery, emotionality, and their
The research aims to highlight the significance, the composition, the diversity of meanings, and the Quranic context of intransitive and transitive verbs in Surat (Abs).
This research consists of a preamble and two sections. In the preamble, the researcher addressed the importance of the phenomenon of intransitivity and transitivity, the signs of the intransitive verb, the structure and rules of the verb, and the methods of making a verb transitive, with their divisions and signs.
As for the first section, the researcher addressed the intransitive verbs in Surat Abs, an applied study in terms of grammati