Doubts arise about the originality of a document when noticing a change in its writing style. This evidence to plagiarism has made the intrinsic approach for detecting plagiarism uncover the plagiarized passages through the analysis of the writing style for the suspicious document where a reference corpus to compare with is absent. The proposed work aims at discovering the deviations in document writing style through applying several steps: Firstly, the entire document is segmented into disjointed segments wherein each corresponds to a paragraph in the original document. For the entire document and for each segment, center vectors comprising average weight of their word are constructed. Second, the degree of closeness is calculated through applying Cosine similarity to measure for each segment, the deviation of its center vector from the center vector of the entire document. Additionally, word n-gram length will be investigated to show its effect on the proposed system performance wherein, center vectors are computed considering word n-grams for different values of n (n= 1, 2, and 3). Performance evaluation of the proposed method was accomplished through the use of Precision, Recall, F-measure, Granularity, and Plagdet as evaluation measures. Moreover, PAN-PC-09 and PAN-PC-11 were used for detecting intrinsic plagiarism as evaluation corpora. It is shown that the proposed approach has achieved results that are comparable to the state-of-the-art methods. Positive impact was observed through discovering deviations in document writing style by computing weight vectors dissimilarity rather than calculating the difference between the word n-grams that exist in segments and their corresponding word n-grams in the suspicious document. Furthermore, when considering the length of word n-gram, better results were recorded for system performance when word bi-grams was used compared to word uni-grams and word tri-grams.
Backgound:
Multi-document summarization is an optimization problem demanding optimization of more than one objective function simultaneously. The proposed work regards balancing of the two significant objectives: content coverage and diversity when generating summaries from a collection of text documents.
Any automatic text summarization system has the challenge of producing high quality summary. Despite the existing efforts on designing and evaluating the performance of many text summarization techniques, their formulations lack the introduction of any model that can give an explicit representation of – coverage and diversity – the two contradictory semantics of any summary. In this work, the design of
... Show MoreCrime is a threat to any nation’s security administration and jurisdiction. Therefore, crime analysis becomes increasingly important because it assigns the time and place based on the collected spatial and temporal data. However, old techniques, such as paperwork, investigative judges, and statistical analysis, are not efficient enough to predict the accurate time and location where the crime had taken place. But when machine learning and data mining methods were deployed in crime analysis, crime analysis and predication accuracy increased dramatically. In this study, various types of criminal analysis and prediction using several machine learning and data mining techniques, based o
Although text document images authentication is difficult due to the binary nature and clear separation between the background and foreground but it is getting higher demand for many applications. Most previous researches in this field depend on insertion watermark in the document, the drawback in these techniques lie in the fact that changing pixel values in a binary document could introduce irregularities that are very visually noticeable. In this paper, a new method is proposed for object-based text document authentication, in which I propose a different approach where a text document is signed by shifting individual words slightly left or right from their original positions to make the center of gravity for each line fall in with the m
... Show MoreWriting in English is one of the essential factors for successful EFL learning .Iraqi students at the preparatory schools encounter problems when using their background knowledge in handling subskills of writing(Burhan,2013:164).Therefore, this study aims to investigate the 4thyear preparatory school students’ problems in English composition writing, and find solutions to these pro
... Show MoreWrong behavioral deviations among youth witnessed an obvious growing, some of these deviations are, robbery, drug and internet addiction, and smoking. Additionally, intellectual extremism and the disrespect of the traditions and genuine customs, In light of this issue, the research aims to shed light on the educational and preventative role of the woman in protecting and immunizing children from the wrong behavioral deviations. The woman has a significant and influential role in the society in general and in her family, in particular to the responsibility lies on her. The research shed light on the manifestations of behavioral deviation in children and the mechanisms of prevention and treatment. The research concluded with a set of recom
... Show MoreDocument clustering is the process of organizing a particular electronic corpus of documents into subgroups of similar text features. Formerly, a number of conventional algorithms had been applied to perform document clustering. There are current endeavors to enhance clustering performance by employing evolutionary algorithms. Thus, such endeavors became an emerging topic gaining more attention in recent years. The aim of this paper is to present an up-to-date and self-contained review fully devoted to document clustering via evolutionary algorithms. It firstly provides a comprehensive inspection to the document clustering model revealing its various components with its related concepts. Then it shows and analyzes the principle research wor
... Show MoreMedia writing is accuracy writing. Clarity and concision are its predominant features. It is a writing that goes straight to the essence because it has no time to waste. Furthermore, it must be as accurate as scientific writing. It is destined for the average reader and has to be understood by everyone. However, it can be as elegant as literary writing. The variety in its forms of expression does not prevent media writing from having its own amplitude.
In short, this study is a practical approach that aims at studying different kinds of writing styles and identifying the specificity of media writing using some patterns and examples