Spelling correction is a challenging task for resource-scarce languages. Arabic is one such language: it lacks a large spelling correction dataset, so datasets injected with artificial errors are used instead. In this paper, we train the Text-to-Text Transfer Transformer (T5) model on artificial errors to correct Arabic soft spelling mistakes. Our T5 model corrects 97.8% of the artificial errors injected into the test set and achieves a character error rate (CER) of 0.77% on a set that contains real soft spelling mistakes. We achieved these results using a 4-layer T5 model trained with a 90% error injection rate and a maximum sequence length of 300 characters.
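The two quantities named in the abstract above, the error injection rate and the character error rate, can be illustrated with a minimal sketch. It is not the authors' implementation. The confusion sets, the function names, and the reading of the injection rate as a per-character probability over eligible characters are all assumptions made for illustration. CER is computed here in the usual way, as Levenshtein edit distance divided by reference length.

```python
import random

# Hypothetical confusion sets for illustration only; the paper's actual
# definition of "soft" spelling errors may differ.
CONFUSION_SETS = [
    "اأإآ",  # alif and its hamza-carrying forms
    "ةه",    # ta marbuta vs. ha
    "ىي",    # alif maqsura vs. ya
]


def inject_errors(text, rate, rng):
    """Swap each eligible character for another member of its confusion
    set with probability `rate` (assumed per-character interpretation)."""
    out = []
    for ch in text:
        group = next((g for g in CONFUSION_SETS if ch in g), None)
        if group is not None and rng.random() < rate:
            out.append(rng.choice([c for c in group if c != ch]))
        else:
            out.append(ch)
    return "".join(out)


def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute or match
            ))
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance over reference length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    clean = "مدرسة"
    noisy = inject_errors(clean, rate=0.9, rng=random.Random(0))
    print(noisy, cer(clean, noisy))
```

Passing an explicit `random.Random` instance keeps the injection reproducible, which helps when the same noisy training set has to be regenerated.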
Translating culture-specific proverbs (CSPs) is a challenging task since they often occur in a peculiar context. Further, CSPs imply meanings that extend far beyond their literal meaning. As far as English and Arabic are concerned, translators often encounter problems in translating CSPs due to cultural differences between the source language (SL) and the target language (TL), as well as an apparent lack of equivalents for some CSPs.
In view of this, the present study aims at investigating the translation of CSPs in three English-Arabic dictionaries of proverbs, namely Dictionary of Common English Proverbs Translated and Explained (2004), One thousand and One English Pr
Sentiment analysis refers to the task of identifying the polarity (positive or negative) of a text that expresses an opinion. Use of the Arabic language has expanded dramatically in the last decade, especially with the emergence of social websites (e.g. Twitter, Facebook, etc.). Several studies have addressed sentiment analysis for Arabic using various techniques. According to the literature, the most efficient techniques were machine learning techniques, owing to their ability to build a training model. Yet there are still issues facing Arabic sentiment analysis using machine learning techniques. Such issues are related to employing robust features that have the ability to discrimina
Text categorization refers to the process of grouping text or documents into classes or categories according to their content. The text categorization process consists of three phases: preprocessing, feature extraction, and classification. In comparison to English, only a few studies have categorized and classified Arabic text. For a variety of applications, such as text classification and clustering, Arabic text representation is a difficult task because Arabic is noted for its richness, diversity, and complicated morphology. This paper presents a comprehensive analysis and comparison of research from the last five years based on the dataset, year, algorithms and the accuracy th
In the field of data security, the critical challenge of preserving sensitive information during its transmission through public channels takes centre stage. Steganography, a method employed to conceal data within various carrier objects such as text, can be proposed to address these security challenges. Text, owing to its extensive usage and constrained bandwidth, stands out as an optimal medium for this purpose. Despite the richness of the Arabic language in its linguistic features, only a small number of studies have explored Arabic text steganography. Arabic text, characterized by its distinctive script and linguistic features, has gained notable attention as a promising domain for steganographic ventures. Arabic text steganography harn
Language plays a major role in all aspects of life. Communication is regarded as the most important of these aspects, as language is used on a daily basis by humanity in either written or spoken form. Language is also regarded as the main means of exchanging peoples’ cultures and traditions and of handing them down from generation to generation. Thus, language is a fundamental element in identifying peoples’ ideologies and traditions in the past and the present. Despite these facts, feminist linguists object to some language structures, arguing that language is gender-biased in favor of men. That is, language promotes patriarchal values. This has prompted extensive studies to substantiate s
Deep learning convolutional neural networks have been widely used to recognize or classify voice. Various techniques have been used together with convolutional neural networks to prepare voice data before training the classification model. However, not all models produce good classification accuracy, as there are many types of voice or speech. Arabic alphabet pronunciation is one such type, and accurate pronunciation is required in learning to read the Qur’an. Thus, processing the pronunciation and training on the processed data require a specific approach. To overcome this issue, a method based on padding and a deep learning convolutional neural network is proposed to
This study examines the attitudes of UCAS students in Gaza towards learning Arabic grammar online during the Corona pandemic. The researcher adopted a descriptive approach and used a questionnaire as a tool for data collection. The results showed statistically significant differences at the "0.01" level between the average scores of students, in favor of students of the humanities specializations. It was also found that the attitudes of students at the Department of Humanities and Media towards learning Arabic grammar online are positive. Additionally, the results revealed no statistically significant differences due to the variable of UCAS students’ scientific qualifications. The results stressed