E-mail is an efficient and reliable data exchange service. Spam messages are unsolicited e-mails sent randomly in bulk, usually for commercial purposes. Obfuscated image spam is one of the newer tricks used to bypass text-based and Optical Character Recognition (OCR)-based spam filters. Detecting image spam from visual image features has the advantage of efficiency: it reduces computational cost and improves performance. In this paper, an image spam detection scheme is presented. Suitable image processing techniques were used to capture image features that can differentiate spam images from non-spam ones, and weighted k-nearest neighbor, a simple yet powerful machine learning algorithm, was used as the classifier. The proposed scheme was evaluated on two datasets, and the results confirm its effectiveness. The first dataset is a real benchmark dataset, while the second is a real-like, modern, and more challenging dataset collected from social media and many publicly available image spam datasets. The obtained accuracy was 99.36% on the benchmark dataset and 91% on the proposed dataset.
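As a rough illustration of the classifier step, the Python sketch below implements a distance-weighted k-nearest-neighbor vote in which each neighbor contributes 1 / (distance + eps). The inverse-distance weighting, the Euclidean metric, and the two-dimensional feature vectors are assumptions made for the example; the abstract does not say which weighting or which image features the authors used.

```python
import math
from collections import defaultdict


def weighted_knn_predict(train_X, train_y, query, k=5, eps=1e-9):
    """Classify `query` by a distance-weighted vote of its k nearest neighbours.

    Each neighbour votes with weight 1 / (distance + eps), so closer images
    count more; eps avoids division by zero for exact matches.
    """
    # Euclidean distance from the query to every training feature vector
    neighbours = sorted(
        ((math.dist(x, query), label) for x, label in zip(train_X, train_y)),
        key=lambda pair: pair[0],
    )
    votes = defaultdict(float)
    for dist, label in neighbours[:k]:
        votes[label] += 1.0 / (dist + eps)
    # The label with the largest accumulated weight wins
    return max(votes, key=votes.get)


# Hypothetical 2-D feature vectors (e.g. two colour/texture statistics per image)
train_X = [[0.10, 0.20], [0.15, 0.25], [0.90, 0.80], [0.85, 0.95]]
train_y = ["ham", "ham", "spam", "spam"]
print(weighted_knn_predict(train_X, train_y, [0.80, 0.90], k=3))  # spam
```

With k = 3, the distant "ham" neighbor in the vote above carries far less weight than the two nearby "spam" images, which is the behavior that weighting is meant to add over a plain majority vote.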
Data mining plays a major role in healthcare by discovering hidden relationships in big datasets, especially in breast cancer diagnostics; breast cancer is one of the leading causes of cancer death worldwide. In this paper, two algorithms, decision tree and K-Nearest Neighbour (KNN), are applied to diagnose breast cancer grade in order to reduce its risk to patients. With feature selection, the decision tree using the Gini index gives an accuracy of 87.83%, while using entropy it gives an accuracy of 86.77%. In both cases, Age appeared as the most effective parameter, particularly when Age < 49.5, while Ki67 appeared as the second most effective parameter. Furthermore, K-Nearest Neighbor is based on the minimum err...
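To make the two splitting criteria concrete, the Python sketch below scores a candidate split such as Age < 49.5 with either the Gini index or entropy; a decision tree prefers the split with the lowest size-weighted child impurity. The ages and grades are hypothetical toy values, not the study's data, and the helper functions are illustrative rather than the authors' implementation.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy (bits) of the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_impurity(values, labels, threshold, criterion):
    """Size-weighted impurity of the two children of the split value < threshold."""
    left = [y for v, y in zip(values, labels) if v < threshold]
    right = [y for v, y in zip(values, labels) if v >= threshold]
    n = len(labels)
    return sum(len(part) / n * criterion(part) for part in (left, right) if part)


# Hypothetical toy records: patient ages and tumour grades
ages = [35, 42, 48, 55, 61, 67]
grades = [1, 1, 1, 2, 2, 3]
print(split_impurity(ages, grades, 49.5, gini))     # about 0.222
print(split_impurity(ages, grades, 49.5, entropy))  # about 0.459
```

The two criteria rank splits similarly but not identically, which is consistent with the slightly different accuracies reported for Gini and entropy.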
This paper proposes a new method for functional non-parametric regression analysis with conditional expectation when the covariates are functional; Principal Component Analysis is used to de-correlate the multivariate response variables. The method uses the Nadaraya-Watson estimator in its K-Nearest Neighbour (KNN) form for prediction, with different types of semi-metrics (based on the second derivative and on Functional Principal Component Analysis (FPCA)) for measuring the closeness between curves. The Root Mean Square Error is used to assess this model, which is then compared with the independent response method. R is used to analyse the data. Then, when the cov...
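The Python sketch below shows one way a KNN-type Nadaraya-Watson predictor with a second-derivative semi-metric can work on curves sampled on a common grid. The finite-difference approximation, the quadratic kernel K(u) = 1 - u^2, and the rule that places the bandwidth halfway between the k-th and (k+1)-th nearest distances are assumptions for illustration (the study itself used R, and the FPCA-based semi-metric is not shown).

```python
import math


def second_derivative_semimetric(x1, x2, h):
    """Approximate sqrt( integral of (x1''(t) - x2''(t))^2 dt ) for two curves
    sampled on the same equally spaced grid with step h (central differences)."""
    total = 0.0
    for i in range(1, len(x1) - 1):
        d1 = (x1[i + 1] - 2 * x1[i] + x1[i - 1]) / h ** 2
        d2 = (x2[i + 1] - 2 * x2[i] + x2[i - 1]) / h ** 2
        total += (d1 - d2) ** 2 * h
    return math.sqrt(total)


def knn_nadaraya_watson(curves, responses, new_curve, k, h):
    """Predict a (possibly multivariate) response for `new_curve`.

    The bandwidth lies halfway between the k-th and (k+1)-th smallest
    semi-metric distances, so the k nearest curves get positive weight under
    the quadratic kernel K(u) = 1 - u^2 on [0, 1). Requires k < len(curves).
    """
    dists = [second_derivative_semimetric(c, new_curve, h) for c in curves]
    ordered = sorted(dists)
    # Guard against a zero bandwidth when several curves coincide with new_curve
    bandwidth = (ordered[k - 1] + ordered[k]) / 2 or 1e-12
    weights = [max(0.0, 1.0 - (d / bandwidth) ** 2) for d in dists]
    if sum(weights) == 0:  # all nearest distances tied: plain mean of the tied curves
        weights = [1.0 if d <= bandwidth else 0.0 for d in dists]
    total = sum(weights)
    return [sum(w * y[j] for w, y in zip(weights, responses)) / total
            for j in range(len(responses[0]))]


# Hypothetical curves a * t^2 on an 11-point grid over [0, 1]; response = [a]
h = 0.1
grid = [i * h for i in range(11)]
curves = [[a * t ** 2 for t in grid] for a in (1, 2, 3, 4)]
responses = [[1.0], [2.0], [3.0], [4.0]]
new_curve = [2.1 * t ** 2 for t in grid]
print(knn_nadaraya_watson(curves, responses, new_curve, k=2, h=h))  # about [2.161]
```

Because the semi-metric only looks at second derivatives, adding a straight line to a curve leaves its distances unchanged; this is what makes it a semi-metric rather than a metric.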
Support Vector Machine (SVM) is a supervised machine learning technique that has become popular for e-mail classification because of its high classification accuracy. The proposed method combines gain ratio (GR), a feature selection method, with one-class SVM training to increase the efficiency of the detection process and decrease its cost. The results show an accuracy of up to 100% and a lower error rate while reducing the number of features to 5.
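The feature selection step can be sketched in Python as follows: gain ratio is information gain divided by the split information (the entropy of the feature's own values), and the top-scoring columns are kept. This sketch assumes discrete feature values; continuous e-mail features would need discretizing first, and the toy rows are hypothetical. The selected columns could then be passed to a one-class SVM such as scikit-learn's `sklearn.svm.OneClassSVM`, which is not shown here.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of a discrete feature divided by its split information."""
    n = len(labels)
    groups = {}
    for v, y in zip(feature, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)  # entropy of the feature's own value distribution
    return info_gain / split_info if split_info > 0 else 0.0


def top_k_features(rows, labels, k=5):
    """Indices of the k columns of `rows` with the highest gain ratio."""
    columns = list(zip(*rows))
    scores = [gain_ratio(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical discretized features: column 0 separates the classes perfectly
rows = [[1, 0, "x"], [1, 1, "x"], [0, 0, "x"], [0, 1, "x"]]
labels = ["spam", "spam", "ham", "ham"]
print(top_k_features(rows, labels, k=1))  # [0]
```

Dividing by split information penalizes features with many distinct values, which plain information gain tends to favor.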
A principal problem for any internet user is the growing volume of spam. Spam filtering has therefore become a research focus that attracts the attention of many security researchers and practitioners. Spam filtering can be viewed as a two-class classification problem. To this end, this paper proposes a spam filtering approach based on the Possibilistic c-Means (PCM) algorithm with a weighted distance, coined WFCM, that can efficiently distinguish between spam and legitimate email messages. The objective of the formulated fuzzy problem is to construct two fuzzy clusters: spam and legitimate email. The feature weights are assigned using the information gain measure. Experimental results on spam based benchmark...
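A minimal Python sketch of possibilistic c-means with a feature-weighted squared distance is shown below. It uses the standard PCM typicality t_ij = 1 / (1 + (d_ij^2 / eta_i)^(1/(m-1))) and the centre update v_i = sum_j t_ij^m x_j / sum_j t_ij^m. The fixed eta values, the fixed iteration count, the two-feature data, and the unit weights standing in for information-gain weights are all hypothetical; the paper's exact WFCM formulation may differ.

```python
def weighted_sq_dist(x, v, w):
    """Feature-weighted squared Euclidean distance sum_k w_k (x_k - v_k)^2."""
    return sum(wk * (xk - vk) ** 2 for xk, vk, wk in zip(x, v, w))


def typicalities(data, centers, weights, eta, m):
    """PCM typicality t_ij = 1 / (1 + (d_ij^2 / eta_i)^(1 / (m - 1)))."""
    return [[1.0 / (1.0 + (weighted_sq_dist(x, v, weights) / eta[i]) ** (1.0 / (m - 1.0)))
             for x in data]
            for i, v in enumerate(centers)]


def weighted_pcm(data, centers, weights, eta, m=2.0, iters=50):
    """Alternate typicality and centre updates for a fixed number of iterations."""
    centers = [list(c) for c in centers]
    dim = len(data[0])
    for _ in range(iters):
        T = typicalities(data, centers, weights, eta, m)
        for i, row in enumerate(T):
            tm = [t ** m for t in row]
            s = sum(tm)
            centers[i] = [sum(t * x[k] for t, x in zip(tm, data)) / s for k in range(dim)]
    # Typicalities with respect to the final centres
    return centers, typicalities(data, centers, weights, eta, m)


# Hypothetical 2-feature messages; cluster 0 = legitimate, cluster 1 = spam
data = [[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [5.0, 5.0], [5.2, 5.0], [5.0, 5.2]]
weights = [1.0, 1.0]  # stand-ins for information-gain feature weights
eta = [1.0, 1.0]
centers, T = weighted_pcm(data, [[1.0, 1.0], [4.0, 4.0]], weights, eta)
scores = typicalities([[0.1, 0.1]], centers, weights, eta, 2.0)
label = max(range(2), key=lambda i: scores[i][0])  # 0 means legitimate here
```

Unlike fuzzy c-means, PCM typicalities for a message need not sum to one, so a message that is far from both clusters receives low typicality for each of them.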
Heart disease is a non-communicable disease and the leading cause of death in Indonesia. According to WHO predictions, heart disease would cause 11 million deaths in 2020. Poor lifestyles and the unhealthy consumption patterns of modern society cause this disease, which affects many people. Because people lack knowledge about heart conditions and their potential dangers, heart attacks often occur before any preventive measures are taken. This study aims to produce a heart disease prediction system that helps prevent and reduce the number of deaths caused by heart disease. Technology is widely used in the health sector in many places, and one of the advanced technologies is machine lea...
A -set in the projective line is a set of projectively distinct points. From the fundamental theorem over the projective line, all -sets are projectively equivalent. In this research, the inequivalent -sets in have been computed, and each -set has been classified into its -sets where Also, the has been split into two distinct -sets, equivalent and inequivalent.
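The symbols in the abstract above were lost in extraction, so the Python sketch below only illustrates the general computation under stated assumptions: k-sets on the projective line PG(1, q) over a prime field GF(q), with two k-sets counted as equivalent when a projective map x -> (ax + b)/(cx + d) sends one onto the other. The choice q = 5 is hypothetical. Since the projective group is sharply 3-transitive on PG(1, q), all 3-sets fall into a single class.

```python
from itertools import combinations, product


def mobius(a, b, c, d, x, q):
    """Image of point x of PG(1, q) under x -> (a x + b) / (c x + d), q prime.

    Points are encoded as 0..q-1 for the affine points and q for infinity.
    """
    inf = q
    if x == inf:
        return a * pow(c, -1, q) % q if c % q else inf
    den = (c * x + d) % q
    if den == 0:
        return inf
    return (a * x + b) * pow(den, -1, q) % q


def pgl2(q):
    """All invertible 2x2 matrices over GF(q) (scalar multiples act identically)."""
    return [g for g in product(range(q), repeat=4) if (g[0] * g[3] - g[1] * g[2]) % q]


def count_inequivalent_ksets(q, k):
    """Number of orbits of k-subsets of PG(1, q) under the projective group."""
    group = pgl2(q)
    seen = set()
    orbits = 0
    for kset in combinations(range(q + 1), k):
        if kset in seen:
            continue
        orbits += 1
        # Mark every image of this k-set as belonging to the same orbit
        for g in group:
            seen.add(tuple(sorted(mobius(*g, x, q) for x in kset)))
    return orbits


print(count_inequivalent_ksets(5, 3))  # 1
```

Brute-force orbit enumeration like this is only practical for small q; larger fields call for invariants such as the cross-ratio.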
For many years, reading rate measured as words correct per minute (WCPM) has been investigated by many researchers as an indicator of learners’ oral reading speed, accuracy, and comprehension. The aim of this study is to predict WCPM levels using three machine learning algorithms: Ensemble Classifier (EC), Decision Tree (DT), and K-Nearest Neighbor (KNN). The data were collected from 100 Kurdish EFL students in the 2nd year of the English language department at the University of Duhok in 2021. The results showed that the ensemble classifier (EC) obtained the highest test accuracy, 94%. EC also recorded the highest precision, recall, and F1 scores with values of 0.92 for...
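For readers comparing the reported figures, the Python sketch below shows how accuracy and per-class precision, recall, and F1 are computed from true and predicted labels. The level names and the five predictions are hypothetical and are not the study's data.

```python
def class_scores(y_true, y_pred, positive):
    """Precision, recall and F1 for one class treated as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical WCPM levels for five test students
y_true = ["low", "low", "high", "high", "mid"]
y_pred = ["low", "high", "high", "high", "mid"]
print(accuracy(y_true, y_pred))              # 0.8
print(class_scores(y_true, y_pred, "high"))  # precision 2/3, recall 1.0, F1 approximately 0.8
```

Per-class scores like these are usually averaged (macro or weighted) to give the single precision, recall, and F1 values reported for a classifier.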
The huge number of documents on the internet has created a rapidly growing need for text classification (TC), which is used to organize these documents. In this research paper, a new model based on the Extreme Learning Machine (ELM) is proposed. The proposed model consists of several phases: preprocessing, feature extraction, Multiple Linear Regression (MLR), and ELM. The basic idea of the proposed model is to calculate feature weights using MLR. These feature weights, together with the extracted features, are introduced as input to the ELM, producing a weighted Extreme Learning Machine (WELM). The results showed that the proposed WELM clearly outperformed the standard ELM.
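A minimal NumPy sketch of the weighted ELM idea follows. It makes three assumptions. First, MLR weights are the absolute least-squares coefficients from regressing a numeric class label on the features, scaled so the largest is 1. Second, these weights multiply the features before they enter the network. Third, the ELM is the standard form with random, untrained hidden weights and output weights solved by least squares. The paper's exact weighting scheme may differ, and the toy data are hypothetical.

```python
import numpy as np


def mlr_feature_weights(X, y):
    """Least-squares multiple linear regression of y on X; absolute
    coefficients, scaled so the largest is 1, serve as feature weights."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])  # intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    w = np.abs(coef[:-1])
    return w / w.max() if w.max() > 0 else np.ones_like(w)


def train_welm(X, y, n_hidden=50, seed=0):
    rng = np.random.default_rng(seed)
    w = mlr_feature_weights(X, y.astype(float))
    W = rng.normal(size=(X.shape[1], n_hidden))      # random input weights (never trained)
    b = rng.normal(size=n_hidden)                    # random hidden biases
    H = 1.0 / (1.0 + np.exp(-((X * w) @ W + b)))     # sigmoid hidden layer on weighted features
    classes = np.unique(y)
    T = (y[:, None] == classes[None, :]).astype(float)  # one-hot targets
    beta, *_ = np.linalg.lstsq(H, T, rcond=None)         # output weights by least squares
    return {"w": w, "W": W, "b": b, "beta": beta, "classes": classes}


def predict_welm(model, X):
    H = 1.0 / (1.0 + np.exp(-((X * model["w"]) @ model["W"] + model["b"])))
    return model["classes"][np.argmax(H @ model["beta"], axis=1)]


# Hypothetical two-class toy data standing in for extracted text features
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-3, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
model = train_welm(X, y, n_hidden=20)
print((predict_welm(model, X) == y).mean())  # training accuracy
```

Only `beta` is fitted, in a single linear solve, which is why ELM training is fast. The MLR weights then act as a fixed rescaling that emphasizes the features most correlated with the label.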
With the rapid development of smart devices, people's lives have become easier, especially for visually impaired people and others with special needs. New achievements in machine learning and deep learning help people identify and recognise their surrounding environment. In this study, the efficiency and high performance of deep learning architectures are exploited to build an image classification system for both indoor and outdoor environments. The proposed methodology starts by collecting two datasets (indoor and outdoor) from several separate datasets. In the second step, the collected dataset is split into training, validation, and test sets. The pre-trained GoogLeNet and MobileNet-V2 models are trained using the indoor and outdoor se...
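The data-splitting step can be sketched in Python as a seeded shuffle followed by slicing into training, validation, and test lists. The 70/15/15 ratios, the seed, and the file names are hypothetical; the abstract does not give the proportions used.

```python
import random


def split_dataset(items, train=0.7, val=0.15, seed=42):
    """Shuffle reproducibly and split into train/validation/test lists.

    Whatever remains after the train and validation slices goes to test.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # local RNG so the global state is untouched
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return items[:n_train], items[n_train:n_train + n_val], items[n_train + n_val:]


# Hypothetical image paths for one environment (e.g. the indoor dataset)
files = [f"indoor/img_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(files)
print(len(train_set), len(val_set), len(test_set))  # 70 15 15
```

Splitting each environment separately with a fixed seed keeps the three subsets disjoint and makes the split reproducible when the pre-trained networks are fine-tuned and compared.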