Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant feature from the sparse feature space. Thus, this paper proposed an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high dimensional text classificationhis technique has the ability to measure the feature’s importance in a high-dimensional text document. In addition, it aims to increase the efficiency of the feature selection. Hence, obtaining a promising text classification accuracy. TF-IDF act as a filter approach which measures features importance of the text documents at the first stage. SVM-RFE utilized a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research executes sets of experiments using a text document retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing processes are applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is implemented on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is applied for feature ranking as the next feature selection step. Only top-rank features will be selected for text classification using the SVM classifier. Based on the experiments, it shows that the proposed technique able to achieve 98% accuracy that outperformed other existing techniques. In conclusion, the proposed technique able to select the significant features in the unstructured and high dimensional text document.
Cassava, a significant crop in Africa, Asia, and South America, is a staple food for millions. However, classifying cassava species using conventional color, texture, and shape features is inefficient, as cassava leaves exhibit similarities across different types, including toxic and non-toxic varieties. This research aims to overcome the limitations of traditional classification methods by employing deep learning techniques with pre-trained AlexNet as the feature extractor to accurately classify four types of cassava: Gajah, Manggu, Kapok, and Beracun. The dataset was collected from local farms in Lamongan Indonesia. To collect images with agricultural research experts, the dataset consists of 1,400 images, and each type of cassava has
... Show MoreThe huge amount of documents in the internet led to the rapid need of text classification (TC). TC is used to organize these text documents. In this research paper, a new model is based on Extreme Machine learning (EML) is used. The proposed model consists of many phases including: preprocessing, feature extraction, Multiple Linear Regression (MLR) and ELM. The basic idea of the proposed model is built upon the calculation of feature weights by using MLR. These feature weights with the extracted features introduced as an input to the ELM that produced weighted Extreme Learning Machine (WELM). The results showed a great competence of the proposed WELM compared to the ELM.
The interests toward developing accurate automatic face emotion recognition methodologies are growing vastly, and it is still one of an ever growing research field in the region of computer vision, artificial intelligent and automation. However, there is a challenge to build an automated system which equals human ability to recognize facial emotion because of the lack of an effective facial feature descriptor and the difficulty of choosing proper classification method. In this paper, a geometric based feature vector has been proposed. For the classification purpose, three different types of classification methods are tested: statistical, artificial neural network (NN) and Support Vector Machine (SVM). A modified K-Means clustering algorithm
... Show MoreWith the proliferation of both Internet access and data traffic, recent breaches have brought into sharp focus the need for Network Intrusion Detection Systems (NIDS) to protect networks from more complex cyberattacks. To differentiate between normal network processes and possible attacks, Intrusion Detection Systems (IDS) often employ pattern recognition and data mining techniques. Network and host system intrusions, assaults, and policy violations can be automatically detected and classified by an Intrusion Detection System (IDS). Using Python Scikit-Learn the results of this study show that Machine Learning (ML) techniques like Decision Tree (DT), Naïve Bayes (NB), and K-Nearest Neighbor (KNN) can enhance the effectiveness of an Intrusi
... Show MoreAn evaluation for the performance of model pile embedded in expansive soil was investigated. An extensive testing program was planned to achieve the purpose of this research. Therefore, special manufactured system was prepared for studying the behavior of model pile having different length to diameter ratios (L/D). Two types of piles were used in this research, straight shaft and under reamed piles. The effect of model pile type, L/D ratio and number of wetting drying cycles were studied. It is observed that significant reductions in pile movement when under reamed piles were considered. A proposed design charts was presented for straight shaft and under reamed piles to estimate the length of both types of piles that is requi
... Show MoreHeart disease is a significant and impactful health condition that ranks as the leading cause of death in many countries. In order to aid physicians in diagnosing cardiovascular diseases, clinical datasets are available for reference. However, with the rise of big data and medical datasets, it has become increasingly challenging for medical practitioners to accurately predict heart disease due to the abundance of unrelated and redundant features that hinder computational complexity and accuracy. As such, this study aims to identify the most discriminative features within high-dimensional datasets while minimizing complexity and improving accuracy through an Extra Tree feature selection based technique. The work study assesses the efficac
... Show MoreThe extracting of personal sprite from the whole image faced many problems in separating the sprite edge from the unneeded parts, some image software try to automate this process, but usually they couldn't find the edge or have false result. In this paper, the authors have made an enhancement on the use of Canny edge detection to locate the sprite from the whole image by adding some enhancement steps by using MATLAB. Moreover, remove all the non-relevant information from the image by selecting only the sprite and place it in a transparent background. The results of comparing the Canny edge detection with the proposed method shows improvement in the edge detection.