Stemming algorithms are no less important for Arabic than for other languages in the Information Retrieval (IR) field. Many algorithms for finding the Arabic root are available, and they fall mainly into two approaches: the light (stem)-based approach and the root-based approach. The latter is somewhat better than the former. A new root-based stemmer is proposed, and its performance is compared with the Khoja stemmer, which is the most efficient root-based stemmer. The accuracy ratio of the proposed stemmer is 99.7, a difference of 1.9 from the Khoja stemmer.
Calculating the similarity between texts written in one language or in multiple languages is still one of the most important challenges facing natural language processing. This work presents several approaches used for text similarity. The proposed system finds the similarity between two Arabic texts using a hybrid of similarity measures: a semantic similarity measure, the cosine similarity measure, and N-grams (using the Dice similarity measure). In the proposed system we design an Arabic SemanticNet that stores the keywords of a specific field (computer science); with this network we can find the semantic similarity between words according to specific equations. Cosine and N-gram similarity measures are used in order t
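Two of the measures named in this abstract, cosine similarity and N-gram overlap scored with the Dice coefficient, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' system: whitespace tokenization, raw term counts, character bigrams, and the function names are all choices made here. The abstract does not say how the measures are combined, so they are shown separately.

```python
import math
from collections import Counter


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between raw term-frequency vectors.

    Assumption: tokens are whitespace-separated words; no stemming
    or stop-word removal is applied.
    """
    tf_a = Counter(text_a.split())
    tf_b = Counter(text_b.split())
    # Only terms present in both texts contribute to the dot product.
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(c * c for c in tf_a.values()))
    norm_b = math.sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def char_ngrams(text, n=2):
    """Set of character n-grams, ignoring spaces (an assumption)."""
    s = text.replace(" ", "")
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def dice_similarity(text_a, text_b, n=2):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) over n-gram sets."""
    grams_a = char_ngrams(text_a, n)
    grams_b = char_ngrams(text_b, n)
    if not grams_a and not grams_b:
        return 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))
```

For example, `dice_similarity("night", "nacht")` compares the bigram sets {ni, ig, gh, ht} and {na, ac, ch, ht}, which share one element out of eight in total, giving 0.25. Both functions work on Python Unicode strings, so Arabic text can be passed in directly.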
In this study, we created a new Arabic dataset annotated according to Ekman’s basic emotions (Anger, Disgust, Fear, Happiness, Sadness and Surprise). The dataset is composed of Facebook posts written in the Iraqi dialect. We evaluated its quality using four external judges, which resulted in an average inter-annotator agreement of 0.751. We then explored six different supervised machine learning methods to test the new dataset. We used the standard Weka classifiers ZeroR, J48, Naïve Bayes, Multinomial Naïve Bayes for Text, and SMO. We also used a further compression-based classifier, PPM, which is not included in Weka. Our study reveals that the PPM classifier significantly outperforms other classifiers such as SVM and N
Keywords provide the reader with a summary of the contents of a document and play a significant role in information retrieval systems, especially in search engine optimization and bibliographic databases. Furthermore, keywords help to classify the document under the related topic. Manual keyword extraction depends on the content of the document or article and the judgment of its author. It is costly, consumes effort and time, and is prone to error. In this research, an automatic Arabic keyword extraction model based on deep learning algorithms is proposed. The model consists of three main steps: preprocessing, feature extraction, and classification to classify the document
Stemming is a pre-processing step in text mining applications and is very important in most Information Retrieval systems. The goal of stemming is to reduce the different grammatical forms of a word, and sometimes its derivationally related forms, to a common base (root or stem) form, for example reducing a noun, adjective, verb, or adverb to its base form. The stem need not be identical to the morphological root of the word; it is usually sufficient that related words map to the same stem, even if this stem is not itself a valid root. As in other languages, there is a need for an effective stemming algorithm for the indexing and retrieval of Arabic documents, yet Arabic stemming algorithms are not widely available.
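The idea that related word forms only need to map to the same stem can be illustrated with a toy light stemmer that strips one common prefix and one common suffix. This is a hedged sketch, not the algorithm of any paper listed here: the affix lists, the minimum stem length, and the name `light_stem` are illustrative assumptions, and real Arabic stemmers use much richer rules.

```python
# Illustrative affix lists (assumptions, far from complete).
# Longer prefixes come first so that "وال" is tried before "ال" or "و".
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي")


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix from an Arabic word.

    An affix is removed only if at least `min_len` characters remain,
    which keeps short words such as "ولد" intact.
    """
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


if __name__ == "__main__":
    # Three surface forms that this sketch reduces to the same stem.
    for form in ("الكتاب", "والكتاب", "كتابه"):
        print(form, "->", light_stem(form))
```

In this sketch the three forms in the usage example all reduce to "كتاب", which is a stem rather than the three-letter morphological root; that is sufficient for indexing as long as related forms agree.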
This research is intended to highlight the uses of political content on foreign Arabic-speaking websites such as "CNN" and "Euro News". The research problem stems from the main question: what is the nature of the websites' use of the political content provided through them? A set of sub-questions covers the aspects of the research, which aims to achieve a set of objectives, including identifying the topics covered by the political content provided through the sample sites during the analysis period. The study uses descriptive research, in which the researcher discovers the phenomenon, describes it accurately, and defines the relations between its components.
The research conducted the des