Text documents are unstructured and high-dimensional, so effective feature selection is required to select the most important and significant features from the sparse feature space. This paper therefore proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique can measure a feature's importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and thereby obtain promising text classification accuracy. In the first stage, TF-IDF acts as a filter approach that measures the importance of features in the text documents. In the second stage, SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets. This research runs sets of experiments on text documents retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. The pre-processed features are then divided into training and testing datasets. Next, feature selection is performed on the training dataset by calculating the TF-IDF score for each feature, and SVM-RFE is applied for feature ranking as the next feature selection step. Only the top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique achieves 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents.
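The two-stage procedure described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy posts, the smoothed IDF variant, the rule of keeping the k terms with the highest peak TF-IDF score, and the small linear SVM trained by batch subgradient descent are all assumptions introduced here, since the abstract does not specify them. The elimination rule (drop the feature with the smallest squared SVM weight in each round) is the usual SVM-RFE ranking criterion.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, rows); rows[i][j] is the TF-IDF of vocab[j] in docs[i]."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    # Assumed smoothed IDF variant: log((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        rows.append([counts[t] / len(toks) * idf[t] for t in vocab])
    return vocab, rows


def tfidf_filter(rows, k):
    """Stage 1 (assumed rule): keep the k columns with the highest peak TF-IDF."""
    n_cols = len(rows[0])
    peak = [max(row[j] for row in rows) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: -peak[j])
    return sorted(ranked[:k])


def train_linear_svm(X, y, cols, lam=0.01, epochs=200, lr=0.1):
    """Fit hinge loss + L2 penalty on the given columns by batch subgradient descent."""
    w = {j: 0.0 for j in cols}
    b = 0.0
    for _ in range(epochs):
        grad_w = {j: lam * w[j] for j in cols}
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(w[j] * xi[j] for j in cols) + b)
            if margin < 1:  # only margin violators contribute to the hinge subgradient
                for j in cols:
                    grad_w[j] -= yi * xi[j] / len(X)
                grad_b -= yi / len(X)
        for j in cols:
            w[j] -= lr * grad_w[j]
        b -= lr * grad_b
    return w, b


def svm_rfe(X, y, cols, n_keep):
    """Stage 2: retrain, drop the feature with the smallest squared weight, repeat."""
    cols = list(cols)
    while len(cols) > n_keep:
        w, _ = train_linear_svm(X, y, cols)
        worst = min(cols, key=lambda j: w[j] ** 2)
        cols.remove(worst)
    return cols


# Hypothetical toy posts; labels are +1 (positive) and -1 (negative).
posts = [
    "i love the great phone",
    "love the great camera today",
    "i hate the terrible phone",
    "hate the terrible camera today",
]
labels = [1, 1, -1, -1]

vocab, X = tfidf_matrix(posts)
filtered = tfidf_filter(X, k=8)                   # TF-IDF filter stage
selected = svm_rfe(X, labels, filtered, n_keep=4)  # SVM-RFE stage

filtered_terms = [vocab[j] for j in filtered]
selected_terms = [vocab[j] for j in selected]
print(selected_terms)
```

In this toy run the filter stage discards the term that occurs in every post, and the elimination stage then removes the terms shared by both classes, leaving the sentiment-bearing words. On a real corpus the selection would be fitted on the training split only, as the abstract describes.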
Abstract:
Al-Marba'aniyah, a long cold wave, was defined by the ancient
Iraqis. It represents the coldest days in Iraq. In this research paper, a new
scale is proposed to define it: the period between the minimum
temperature recorded in December and the minimum temperature
recorded in January is considered to be the period of Al-Marba'aniyah.
The research concluded that Al-Marba'aniyah is not fixed and that the
days on which it occurs change. It was also concluded that the dates of the
beginning and the end of Al-Marba'aniyah are not fixed either. Moreover, it was
found that the Siberian high, the European high, and finally the
subtropical high are the systems responsible for
The bit record is a part of the daily drilling report that contains information about the type and number of the bit used to drill the well, as well as data on the weight on bit (WOB), revolutions per minute (RPM), rate of penetration (ROP), pump pressure, footage drilled, and bit dull grade. In general, the bit record is a rich summary of the bit's life in the hole. The main purpose of this research is to select a suitable bit for drilling the next oil wells, because the right bit selection avoids many problems, whereas the wrong bit selection causes them. Many methods are related to bit selection; this research is concerned with four of thos
Phishing is an internet crime carried out by imitating a legitimate website of a host in order to steal confidential information. Many researchers have developed phishing classification models that are limited in real-time and computational efficiency. This paper presents an ensemble learning model composed of DTree and NBayes, combined by the STACKING method, with DTree as the base learner. The aim is to combine the simplicity and effectiveness of DTree with the lower time complexity of NBayes. The models were integrated and appraised independently for data training, and the probabilities of each class were averaged according to each model's accuracy on the training data during the testing process. The present results of the empirical study on phishing websi
In this paper, a new variable selection method is presented to select the essential variables from large datasets. The new model is a modified version of the Elastic Net model, and it is summarized in an algorithm. It is applied to the Leukemia dataset, which has 3051 variables (genes) and 72 samples. In practice, working with this kind of dataset is difficult due to its large size. The modified model is compared to some standard variable selection methods. The modified Elastic Net model achieves perfect classification and has the best performance among them. All the calculations that have been done for this paper are in
The study aimed to reach the best classification of the observations and variables into groups characterized by qualities and characteristics that are common within each group and that distinguish it from other groups. The purpose is to distinguish between the Iraqi provinces that suffer from deprivation and to identify the status of those provinces early, allowing interested parties and regulators to intervene and take appropriate corrective action in a timely manner. Cluster analysis was used to reach the best classification of those groups of provinces that suffer from problems, where the provinces were classified based on the variables (Edu
Kaizen is considered one of the most important modern techniques adopted by various economic entities, especially manufacturing firms. Its beginnings date back to the middle of the last century, when it was used by companies such as Toshiba, Matsushita Electric, and Toyota, which realized that these modern techniques would bring about a total change in the competitive environment and started qualifying their staff in a way that enables them to keep pace with this unique environment. Continuous improvement (Kaizen) depends on small continuous improvements in the product and the production operations during the production stage. Consequently, the research problem is represented in the improperly of the budg
Mobile Ad hoc Networks (MANETs) are a wireless technology that plays an important role in several modern applications, including military, civil, health, and real-time applications. Providing Quality of Service (QoS) for these applications in a network characterized by node mobility, lack of infrastructure, and limited resources is a critical issue that is receiving growing attention. Moreover, transport protocols have a significant effect on the performance of MANET applications. This study provides an analysis and evaluation of the performance of the TFRC, UDP, and TCP transport protocols in a MANET environment. In order to achieve highly accurate results, the three transport protocols are implemented and simulated with four different network topologies, which are 5, 10
Artificial roughness applied to a Solar Air Heater (SAH) absorber plate is a popular technique for increasing its total thermal efficiency (ηt−th). In this paper, the influence of the geometrical parameters of V-down ribs attached below the corrugated absorbing plate of a SAH on ηt−th was examined. The impacts of key roughness parameters, including relative pitch p/e (6–12), relative height e/D (0.019–0.043), angle of attack α (30–75°), and Re (1000–20,000), were examined under real weather conditions. The ηt−th of the SAH roughened by V-down ribs was predicted using an in-house developed conjugate heat-transfer numerical model. The maximum SAH ηt−th was shown to be 78.8% as predicted under the steady-state condition
The OpenStreetMap (OSM) project aims to establish a free geospatial database for the entire world which is editable by international volunteers. The OSM database contains a wide range of different types of geographical data and characteristics, including highways, buildings, and land use regions. The varying scientific backgrounds of the volunteers can affect the quality of the spatial data that is produced and shared on the internet as an OSM dataset. This study aims to compare the completeness and attribute accuracy of the OSM road networks with the data supplied by a digitizing process for areas in the Baghdad and Thi-Qar governorates. The analyses are primarily based on calculating the portion of the commission (extr