Arabic text categorization for pattern recognitions is challenging. We propose for the first time a novel holistic method based on clustering for classifying Arabic writer. The categorization is accomplished stage-wise. Firstly, these document images are sectioned into lines, words, and characters. Secondly, their structural and statistical features are obtained from sectioned portions. Thirdly, F-Measure is used to evaluate the performance of the extracted features and their combination in different linkage methods for each distance measures and different numbers of groups. Finally, experiments are conducted on the standard KHATT dataset of Arabic handwritten text comprised of varying samples from 1000 writers. The results in the generation step are obtained from multiple runs of individual clustering methods for each distance measures. The best results are achieved when intensity, lines slope and their
Nowadays, people's expression on the Internet is no longer limited to text, especially with the rise of the short video boom, leading to the emergence of a large number of modal data such as text, pictures, audio, and video. Compared to single mode data ,the multi-modal data always contains massive information. The mining process of multi-modal information can help computers to better understand human emotional characteristics. However, because the multi-modal data show obvious dynamic time series features, it is necessary to solve the dynamic correlation problem within a single mode and between different modes in the same application scene during the fusion process. To solve this problem, in this paper, a feature extraction framework of
... Show MoreText categorization refers to the process of grouping text or documents into classes or categories according to their content. Text categorization process consists of three phases which are: preprocessing, feature extraction and classification. In comparison to the English language, just few studies have been done to categorize and classify the Arabic language. For a variety of applications, such as text classification and clustering, Arabic text representation is a difficult task because Arabic language is noted for its richness, diversity, and complicated morphology. This paper presents a comprehensive analysis and a comparison for researchers in the last five years based on the dataset, year, algorithms and the accuracy th
... Show MoreText categorization refers to the process of grouping text or documents into classes or categories according to their content. Text categorization process consists of three phases which are: preprocessing, feature extraction and classification. In comparison to the English language, just few studies have been done to categorize and classify the Arabic language. For a variety of applications, such as text classification and clustering, Arabic text representation is a difficult task because Arabic language is noted for its richness, diversity, and complicated morphology. This paper presents a comprehensive analysis and a comparison for researchers in the last five years based on the dataset, year, algorithms and the accu
... Show MoreThe huge amount of documents in the internet led to the rapid need of text classification (TC). TC is used to organize these text documents. In this research paper, a new model is based on Extreme Machine learning (EML) is used. The proposed model consists of many phases including: preprocessing, feature extraction, Multiple Linear Regression (MLR) and ELM. The basic idea of the proposed model is built upon the calculation of feature weights by using MLR. These feature weights with the extracted features introduced as an input to the ELM that produced weighted Extreme Learning Machine (WELM). The results showed a great competence of the proposed WELM compared to the ELM.
The conventional procedures of clustering algorithms are incapable of overcoming the difficulty of managing and analyzing the rapid growth of generated data from different sources. Using the concept of parallel clustering is one of the robust solutions to this problem. Apache Hadoop architecture is one of the assortment ecosystems that provide the capability to store and process the data in a distributed and parallel fashion. In this paper, a parallel model is designed to process the k-means clustering algorithm in the Apache Hadoop ecosystem by connecting three nodes, one is for server (name) nodes and the other two are for clients (data) nodes. The aim is to speed up the time of managing the massive sc
... Show MoreA substantial matter to confidential messages' interchange through the internet is transmission of information safely. For example, digital products' consumers and producers are keen for knowing those products are genuine and must be distinguished from worthless products. Encryption's science can be defined as the technique to embed the data in an images file, audio or videos in a style which should be met the safety requirements. Steganography is a portion of data concealment science that aiming to be reached a coveted security scale in the interchange of private not clear commercial and military data. This research offers a novel technique for steganography based on hiding data inside the clusters that resulted from fuzzy clustering. T
... Show MoreWireless sensor applications are susceptible to energy constraints. Most of the energy is consumed in communication between wireless nodes. Clustering and data aggregation are the two widely used strategies for reducing energy usage and increasing the lifetime of wireless sensor networks. In target tracking applications, large amount of redundant data is produced regularly. Hence, deployment of effective data aggregation schemes is vital to eliminate data redundancy. This work aims to conduct a comparative study of various research approaches that employ clustering techniques for efficiently aggregating data in target tracking applications as selection of an appropriate clustering algorithm may reflect positive results in the data aggregati
... Show MoreDocument clustering is the process of organizing a particular electronic corpus of documents into subgroups of similar text features. Formerly, a number of conventional algorithms had been applied to perform document clustering. There are current endeavors to enhance clustering performance by employing evolutionary algorithms. Thus, such endeavors became an emerging topic gaining more attention in recent years. The aim of this paper is to present an up-to-date and self-contained review fully devoted to document clustering via evolutionary algorithms. It firstly provides a comprehensive inspection to the document clustering model revealing its various components with its related concepts. Then it shows and analyzes the principle research wor
... Show MoreEmotion recognition has important applications in human-computer interaction. Various sources such as facial expressions and speech have been considered for interpreting human emotions. The aim of this paper is to develop an emotion recognition system from facial expressions and speech using a hybrid of machine-learning algorithms in order to enhance the overall performance of human computer communication. For facial emotion recognition, a deep convolutional neural network is used for feature extraction and classification, whereas for speech emotion recognition, the zero-crossing rate, mean, standard deviation and mel frequency cepstral coefficient features are extracted. The extracted features are then fed to a random forest classifier. In
... Show MoreUntil recently, researchers have utilized and applied various techniques for intrusion detection system (IDS), including DNA encoding and clustering that are widely used for this purpose. In addition to the other two major techniques for detection are anomaly and misuse detection, where anomaly detection is done based on user behavior, while misuse detection is done based on known attacks signatures. However, both techniques have some drawbacks, such as a high false alarm rate. Therefore, hybrid IDS takes advantage of combining the strength of both techniques to overcome their limitations. In this paper, a hybrid IDS is proposed based on the DNA encoding and clustering method. The proposed DNA encoding is done based on the UNSW-NB15
... Show More