Advances in digital technology and the World Wide Web have led to a growing number of digital documents used for purposes such as publishing and digital libraries. This growth calls for effective techniques to support the search and retrieval of text. One of the most needed tasks is clustering, which automatically groups documents into meaningful categories. Clustering is an important task in data mining and machine learning, and its accuracy depends strongly on the choice of text representation method. Traditional methods model documents as bags of words weighted by term frequency-inverse document frequency (TFIDF). This representation ignores the relationships between words and their meanings in the document. As a result, the sparsity and semantic problems prevalent in textual documents remain unresolved. In this study, these problems are reduced by proposing a graph-based text representation method, namely the dependency graph, with the aim of improving the accuracy of document clustering. The dependency graph representation is built by combining syntactic and semantic analysis. A sample of the 20 Newsgroups dataset was used in this study. The text documents undergo pre-processing and syntactic parsing to identify sentence structure. The semantics of words are then modeled using the dependency graph, which is in turn used in the cluster analysis. The K-means clustering technique was used in this study. The dependency graph based clustering results were compared with those of popular text representation methods, i.e., TFIDF and ontology-based text representation. The results show that the dependency graph outperforms both TFIDF and ontology-based text representation. These findings indicate that the proposed text representation method leads to more accurate document clustering results.
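To make the bag-of-words baseline concrete, the following is a minimal sketch of TFIDF weighting in plain Python. It is not the study's implementation: the whitespace tokenizer, the unsmoothed weighting tf × log(N/df), and the function names `tfidf` and `cosine` are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (bag-of-words TFIDF).

    Assumed weighting: (count / document length) * log(N / df), unsmoothed.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical toy documents, not taken from the 20 Newsgroups sample.
    docs = ["dog bites man", "man bites dog", "stocks fell sharply"]
    vecs = tfidf(docs)
    # Word order is discarded, so the first two documents get the same vector.
    print(vecs[0] == vecs[1])
    print(cosine(vecs[0], vecs[2]))
```

Because only term counts survive, "dog bites man" and "man bites dog" receive identical vectors in this sketch, which illustrates the loss of word relationships that the abstract attributes to TFIDF.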
Based on the theoretical and practical aspects of the German language department and its educational programs, the present study discusses the semantic relations between the sentences of a text and their role in the science of translation. By clarifying the semantic relationships between the sentences of a text and the methods used to express a news item, a situation or an occurrence, and by setting out the multiple theoretical semantic structures of the text's construction and interrelation, a translator can easily translate a text into the target language.
It is known that language learners face multiple difficulties in writing and creating an inte
Let R be a commutative ring. The pseudo-von Neumann regular graph of the ring R is defined as a graph whose vertex set consists of all elements of R, where any two distinct vertices a and b are adjacent if and only if , this graph is denoted by P-VG(R). In this work we obtain some new results about the chromatic number of P-VG(R).
In this thesis, we study topological structures in graph theory and various related results. Chapter one contains the fundamental concepts of topology and basic definitions of near open sets, gives an account of uncertainty rough set theories, and introduces the concepts of graph theory. Chapter two deals with the main concepts of topological structures built using mixed degree systems in graph theory, namely the M-space. In addition, the m-derived graphs, m-open graphs, m-closed graphs, m-interior operators, m-closure operators and M-subspace are defined and studied. In chapter three we study supra-approximation spaces using mixed degree systems, and the primary objects in this chapter are two topological
Doubts arise about the originality of a document when a change in its writing style is noticed. This evidence of plagiarism allows the intrinsic approach to plagiarism detection to uncover plagiarized passages by analyzing the writing style of the suspicious document when no reference corpus is available for comparison. The proposed work aims at discovering deviations in a document's writing style through several steps. First, the entire document is divided into disjoint segments, each corresponding to a paragraph in the original document. For the entire document and for each segment, center vectors comprising the average weights of their words are constructed. Second, the degree of cl
Steganography can be defined as the art and science of hiding information in data that can be read by a computer. With this technique, the stego-cover cannot be distinguished from the original, whether by eye or by computer analysis of statistical samples. This paper presents a new method to hide text in text characters. The method systematically uses the structure of invisible characters to hide and extract secret texts. The creation of the secret message comprises four main stages, such as using the letters of the original message, selecting a suitable cover text, dividing the cover text into blocks, hiding the secret text using the invisible character, and comparing the cover text and the stego-object. This study uses an invisible character (white space
Let be a non-trivial simple graph. A dominating set in a graph is a set of vertices such that every vertex not in the set is adjacent to at least one vertex in the set. A subset is a minimum neighborhood dominating set if is a dominating set and if for every holds. The minimum cardinality of a minimum neighborhood dominating set of a graph is called the minimum neighborhood dominating number and is denoted by . A minimum neighborhood dominating set is a dominating set where the intersection of the neighborhoods of all vertices in the set is as small as possible, (i.e., ). The minimum neighborhood dominating number, denoted by , is the minimum cardinality of a minimum neighborhood dominating set. In other words, it is the
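The plain dominating-set condition stated above can be checked mechanically. The sketch below covers only that condition and a brute-force domination number for small graphs. It does not implement the additional neighborhood condition, whose symbols are missing from the text. The adjacency-dictionary representation and the names `is_dominating_set` and `domination_number` are assumptions made for illustration.

```python
from itertools import combinations


def is_dominating_set(adjacency, candidate):
    """True if every vertex is in `candidate` or adjacent to a member of it.

    `adjacency` maps each vertex to the set of its neighbours
    (simple undirected graph, assumed representation).
    """
    candidate = set(candidate)
    return all(
        vertex in candidate or bool(adjacency[vertex] & candidate)
        for vertex in adjacency
    )


def domination_number(adjacency):
    """Smallest size of a dominating set, found by exhaustive search.

    Exponential in the number of vertices; intended for tiny examples only.
    """
    vertices = list(adjacency)
    for size in range(1, len(vertices) + 1):
        for subset in combinations(vertices, size):
            if is_dominating_set(adjacency, subset):
                return size
    return len(vertices)


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 (hypothetical example).
    path = {0: {1}, 1: {0, 2}, 2: {1}}
    print(is_dominating_set(path, {1}))  # the middle vertex dominates both ends
    print(is_dominating_set(path, {0}))  # vertex 2 is left undominated
    print(domination_number(path))
```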
Building a system to identify individuals through their speech recordings can find application in diverse areas, such as telephone shopping, voice mail and security control. However, building such systems is a difficult task because of the vast range of differences in the human voice. Selecting strong features is therefore crucial for the recognition system. A speaker recognition system based on new spin-image descriptors (SISR) is proposed in this paper. In the proposed system, circular windows (spins) are extracted from the frequency domain of the spectrogram image of the sound, and a run length matrix is then built for each spin to serve as a base for feature extraction tasks. Five different descriptors are generated fro
This research attempted to take advantage of modern techniques in the study of the suprasegmental phonetic features of spoken text, using phonetic analysis programs to achieve more accurate and objective results that are not limited to self-perception and personal judgment, which vary from person to person.
It should be noted that these phonological features (nabr, waqf, toning) are performance controls that determine the meaning of the word or sentence, but in the modern era they have received little attention, and the little attention that some of them have received came in studies of issues related to composition or style. Therefore, we recommend that more attention be given to the study of