Data mining is a data analysis process using software to find certain patterns or rules in a large amount of data, which is expected to provide knowledge to support decisions. However, missing value in data mining often leads to a loss of information. The purpose of this study is to improve the performance of data classification with missing values, precisely and accurately. The test method is carried out using the Car Evaluation dataset from the UCI Machine Learning Repository. RStudio and RapidMiner tools were used for testing the algorithm. This study will result in a data analysis of the tested parameters to measure the performance of the algorithm. Using test variations: performance at C5.0, C4.5, and k-NN at 0% missing rate, performance at C5.0, C4.5, and k-NN at 5–50% missing rate, performance at C5.0 + k-NNI, C4.5 + k-NNI, and k-NN + k-NNI classifier at 5–50% missing rate, and performance at C5.0 + CMI, C4.5 + CMI, and k-NN + CMI classifier at 5–50% missing rate, The results show that C5.0 with k-NNI produces better classification accuracy than other tested imputation and classification algorithms. For example, with 35% of the dataset missing, this method obtains 93.40% validation accuracy and 92% test accuracy. C5.0 with k-NNI also offers fast processing times compared with other methods.
Rumors are typically described as remarks whose true value is unknown. A rumor on social media has the potential to spread erroneous information to a large group of individuals. Those false facts will influence decision-making in a variety of societies. In online social media, where enormous amounts of information are simply distributed over a large network of sources with unverified authority, detecting rumors is critical. This research proposes that rumor detection be done using Natural Language Processing (NLP) tools as well as six distinct Machine Learning (ML) methods (Nave Bayes (NB), random forest (RF), K-nearest neighbor (KNN), Logistic Regression (LR), Stochastic Gradient Descent (SGD) and Decision Tree (
... Show MoreThis paper presents a hybrid approach for solving null values problem; it hybridizes rough set theory with intelligent swarm algorithm. The proposed approach is a supervised learning model. A large set of complete data called learning data is used to find the decision rule sets that then have been used in solving the incomplete data problem. The intelligent swarm algorithm is used for feature selection which represents bees algorithm as heuristic search algorithm combined with rough set theory as evaluation function. Also another feature selection algorithm called ID3 is presented, it works as statistical algorithm instead of intelligent algorithm. A comparison between those two approaches is made in their performance for null values estima
... Show MoreEven though image retrieval is considered as one of the most important research areas in the last two decades, there is still room for improvement since it is still not satisfying for many users. Two of the major problems which need to be improved are the accuracy and the speed of the image retrieval system, in order to achieve user satisfaction and also to make the image retrieval system suitable for all platforms. In this work, the proposed retrieval system uses features with spatial information to analyze the visual content of the image. Then, the feature extraction process is followed by applying the fuzzy c-means (FCM) clustering algorithm to reduce the search space and speed up the retrieval process. The experimental results show t
... Show MoreFrequent data in weather records is essential for forecasting, numerical model development, and research, but data recording interruptions may occur for various reasons. So, this study aims to find a way to treat these missing data and know their accuracy by comparing them with the original data values. The mean method was used to treat daily and monthly missing temperature data. The results show that treating the monthly temperature data for the stations (Baghdad, Hilla, Basra, Nasiriya, and Samawa) in Iraq for all periods (1980-2020), the percentage for matching between the original and the treating values did not exceed (80%). So, the period was divided into four periods. It was noted that most of the congruence values increased, re
... Show MoreTwitter popularity has increasingly grown in the last few years, influencing life’s social, political, and business aspects. People would leave their tweets on social media about an event, and simultaneously inquire to see other people's experiences and whether they had a positive/negative opinion about that event. Sentiment Analysis can be used to obtain this categorization. Product reviews, events, and other topics from all users that comprise unstructured text comments are gathered and categorized as good, harmful, or neutral using sentiment analysis. Such issues are called polarity classifications. This study aims to use Twitter data about OK cuisine reviews obtained from the Amazon website and compare the effectiveness
... Show MoreAbstract
Value Added Tax (VAT) is one of the most important types of indirect taxes because of its advantages in achieving financial, economic and financial objectives. The introduction of VAT is part of the reform of the structure of the Lebanese public tax system aimed at reducing the fiscal deficit and resulting inflation, which still lacks a general consumption tax. There is also an urgent need to increase treasury revenues , Because of its broad tax base, as it imposes on the consumption of locally produced and imported goods, in addition to the role played by this tax in support of the local product &nbs
... Show MoreIn this study, we have created a new Arabic dataset annotated according to Ekman’s basic emotions (Anger, Disgust, Fear, Happiness, Sadness and Surprise). This dataset is composed from Facebook posts written in the Iraqi dialect. We evaluated the quality of this dataset using four external judges which resulted in an average inter-annotation agreement of 0.751. Then we explored six different supervised machine learning methods to test the new dataset. We used Weka standard classifiers ZeroR, J48, Naïve Bayes, Multinomial Naïve Bayes for Text, and SMO. We also used a further compression-based classifier called PPM not included in Weka. Our study reveals that the PPM classifier significantly outperforms other classifiers such as SVM and N
... Show MoreThe majority of systems dealing with natural language processing (NLP) and artificial intelligence (AI) can assist in making automated and automatically-supported decisions. However, these systems may face challenges and difficulties or find it confusing to identify the required information (characterization) for eliciting a decision by extracting or summarizing relevant information from large text documents or colossal content. When obtaining these documents online, for instance from social networking or social media, these sites undergo a remarkable increase in the textual content. The main objective of the present study is to conduct a survey and show the latest developments about the implementation of text-mining techniqu
... Show MoreReferral techniques are normally employed in internet business applications. Existing frameworks prescribe things to a particular client according to client inclinations and former high evaluations. Quite a number of methods, such as cooperative filtering and content-based methodologies, dominate the architectural design of referral frameworks. Many referral schemes are domain-specific and cannot be deployed in a general-purpose setting. This study proposes a two-dimensional (User × Item)-space multimode referral scheme, having an enormous client base but few articles on offer. Additionally, the design of the referral scheme is anchored on the and articles, as expressed by a particular client, and is a combination of affi
... Show More