Advances in digital technology and the World Wide Web have led to a growing number of digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support text search and retrieval. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends heavily on the chosen text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This approach ignores the relationships between words and their meanings, so the sparsity and semantic problems prevalent in textual documents remain unresolved. This study reduces these problems by proposing a graph-based text representation method, the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used. The text documents undergo pre-processing and syntactic parsing to identify sentence structure; the semantics of the words are then modeled using a dependency graph, which serves as the input to cluster analysis. The K-means clustering technique was used. The clustering results based on the dependency graph were compared with those of two popular text representation methods, TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both. The findings indicate that the proposed text representation method leads to more accurate document clustering.
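The limitation of the bag-of-words model described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `tfidf_vectors` helper, the three toy sentences, and the hand-written dependency edges are all assumptions introduced for illustration, and the weighting used is the plain tf × log(N/df) variant. The sketch shows that two sentences with opposite meanings receive identical TFIDF vectors, while a set of (head, relation, dependent) edges tells them apart.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: (c / len(tokens)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


# Toy corpus (hypothetical): the first two sentences use the same words
# in a different order and therefore mean different things.
docs = ["dog bites man", "man bites dog", "stocks fall sharply"]
vecs = tfidf_vectors(docs)

# Bag of words: word order is lost, so the two vectors are identical.
same_bag = vecs[0] == vecs[1]

# Hand-annotated dependency edges (hypothetical, not produced by a parser):
# each edge is (head, relation, dependent).
edges_a = {("bites", "nsubj", "dog"), ("bites", "obj", "man")}
edges_b = {("bites", "nsubj", "man"), ("bites", "obj", "dog")}

# Graph view: who does what to whom is preserved, so the edge sets differ.
same_graph = edges_a == edges_b

print(same_bag, same_graph)
```

In a full pipeline the edges would come from a syntactic parser and the resulting graph features would be passed to a clustering algorithm such as K-means; both steps are omitted here to keep the sketch self-contained.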
The present paper stresses the direct effect of the situational dimension, termed “reality”, on authors’ thoughts and attitudes. Every text is placed within a particular situation, which the translator must identify correctly as the first and most important step toward a good translation. Hence, the content of any verbal production reflects some part of reality. Comprehending a text involves comprehending the different dimensions of reality reflected in it and thus illuminating the connections among the features of that reality.
Abstract
The study entitled “Understanding Reality” as a means of full
The article considers the semantic and stylistic motivations for using obsolete vocabulary (historicisms) in a literary text. The specifics of this process are presented against the background of the features of the contemporary Russian literary language. Attention is focused on the fact that the layer of obsolete lexical units belongs to nationally specific vocabulary, the mastery of which shapes an understanding of the nature of the language being actualized. In addition, the semantics of historicisms is culturally commensurate: the de-actualization of linguistic units runs parallel to sociocultural and political changes.
The article states that the Russian verbs of destruction belong to the lexical-semantic group of verbs of physical impact. They include verbs with the meanings “damage” and “destroy”. It is emphasized that each of these groups is relatively independent and that the boundary between them is fuzzy and arbitrary. It is postulated that when the object is completely destroyed the verb has the meaning of “destruction”, and when the object is partially destroyed the verb has the meaning of “damage”. It is this feature that individualizes the meaning of the verbs. The study distinguishes between the groups and the nature of the object as animate or inanimate. The object of the action of “destruction” can only be inan
The present article discusses innovative word-formation processes in Internet texts: the emergence of new derivative words, new affixes, word-formation models, and word-formation methods. Using several neologisms as examples, the article shows both the possibilities of the Internet word-making process and the possibilities of studying a newly established work through Internet communication. The words selected for analysis can be counted among the keywords of the current time (in particular, the words included in the list of "Words of 2019"). A number of these words are formed by suffixation, which is the traditional method of Russian word formation. A negation of these words is usually made thro
The article considers the main reason A. I. Herzen turned to obsolete words: their ability to acquire a stylistic coloring in the context of speech, as well as the possibility of combining them, in some cases, with neutral lexemes of various functional styles. The article presents the stylistic effect of these characteristics, as a result of which the stylistic coloring of such words in syntagmatic terms does not coincide with their stylistic coloring in paradigmatic terms; that is, in speech they have a completely stylistic meaning. Attention is focused on the role of obsolete vocabulary, which serves to realize such features of the artistic style as imagery, emotionality, and their
The research aims to highlight the significance, structure, diversity of meanings, and Quranic context of the intransitive (lāzim) and transitive (mutaʿaddī) verbs in Surat Abs.
This research consists of a preamble and two sections. In the preamble, the researcher addresses the importance of the phenomenon of intransitivity and transitivity, the signs of the intransitive verb, the structure and rules of the verb, and the methods of making a verb transitive, together with its divisions and signs.
As for the first section, the researcher addresses the intransitive verbs in Surat Abs in an applied study in terms of grammati