Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique can measure a feature's importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and hence obtain promising text classification accuracy. TF-IDF acts as a filter approach that measures the importance of features in the text documents at the first stage. SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research runs sets of experiments using text documents retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is performed on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is then applied for feature ranking as the next feature selection step. Only top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique achieves 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents.
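The first-stage TF-IDF filter described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `tfidf_scores` and `top_terms`, the toy corpus, the use of the unsmoothed logarithm for the inverse document frequency, and the choice to rank each term by its maximum score across documents are all assumptions made for illustration. The second stage (SVM-RFE) is not shown.

```python
import math
from collections import Counter

def tfidf_scores(docs):
    """Return one {term: tf-idf} dict per tokenized document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # tf = relative frequency in the document, idf = log(N / df).
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

def top_terms(docs, k):
    """Keep the k terms with the highest maximum tf-idf (assumed aggregation rule)."""
    best = {}
    for doc_scores in tfidf_scores(docs):
        for term, score in doc_scores.items():
            best[term] = max(best.get(term, 0.0), score)
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(best, key=lambda t: (-best[t], t))[:k]

# Toy corpus standing in for pre-processed posts (illustrative only).
docs = [
    "the service was great".split(),
    "the service was terrible".split(),
    "the food was great".split(),
]
print(top_terms(docs, 3))
```

Terms that occur in every document, such as "the" and "was" here, receive a score of zero and are filtered out first, which is the behavior the filter stage relies on before a wrapper method ranks the surviving features.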
Abstract: The natural dye curcumin was extracted from Curcuma longa and used as a sensitizer in two types of dye-sensitized solar cell (DSSC), and their characteristics were studied. The absorption spectrum of the dye solutions, as well as the wavelength of maximum absorbance of the dye-loaded TiO2 film, was studied. The X-ray diffraction pattern of the TiO2 film made with the doctor-blading technique showed that the grain size of TiO2 was 40 nm. The electrical performance of the cells was investigated in terms of short-circuit current, open-circuit voltage and power conversion efficiency.
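The electrical quantities named in this abstract are related by the standard definition of power conversion efficiency, which can be sketched as follows. The fill factor, the incident power of 100 mW/cm^2, the function names, and every numeric value are assumptions for illustration; none of them come from the paper.

```python
def fill_factor(v_max, j_max, v_oc, j_sc):
    """Ratio of the maximum power point to the product V_oc * J_sc."""
    return (v_max * j_max) / (v_oc * j_sc)

def conversion_efficiency(v_oc, j_sc, ff, p_in=100.0):
    """Power conversion efficiency in percent.

    v_oc in V, j_sc in mA/cm^2, p_in in mW/cm^2 (100 is an assumed
    standard illumination, not a value reported in the abstract).
    """
    return 100.0 * v_oc * j_sc * ff / p_in

# Hypothetical cell parameters, chosen only to exercise the formula.
ff = fill_factor(v_max=0.4, j_max=0.75, v_oc=0.5, j_sc=1.0)
eta = conversion_efficiency(v_oc=0.5, j_sc=1.0, ff=ff)
print(f"FF = {ff:.2f}, efficiency = {eta:.2f}%")
```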
With the rapid progress of information technology and computer networks, it has become very easy to reproduce and share geospatial data because of its digital form. As a result, the use of geospatial data suffers from various problems such as data authentication, proof of ownership, and illegal copying. These problems can pose a major challenge to future uses of geospatial data. This paper introduces a new watermarking scheme to ensure the copyright protection of digital vector maps. The main idea of the proposed scheme is to transform the digital map to the frequency domain using the Singular Value Decomposition (SVD) in order to determine suitable areas in which to insert the watermark data.
Globally, sustainability is quickly becoming a fundamental requirement of the construction industry as it delivers its projects, whether buildings or infrastructure. Over more than two decades, many modeling schemes, evaluation tools, and rating systems have been introduced en route to realizing sustainable construction. Many of these, however, lack consensus on evaluation criteria and a robust scientific model that captures the logic behind their sustainability performance evaluation, and therefore experience discrepancies between rated results and actual performance. Moreover, very few of the available evaluation tools satisfactorily address infrastructure projects. The res
In this research, the specifications of Iraqi bottled drinking water brands are investigated by comparing local brands against the Saudi Arabian and World Health Organization (WHO) standard specifications for bottled water. These specifications were also compared with the Iraqi tap water standards. To reveal variations between the specifications of Iraqi bottled water and the above-mentioned standards, quality control tools were applied to more than 33% of the different bottled water brands in Iraq (of different origins, such as spring and purified water) by examining the selected quality parameters printed on their labels. Results employing Minitab software (ver. 16) to generate X bar,
Research into useful ways of representing agents and agent-based
systems is ongoing. Unified Modeling Language (UML) is a visual modeling language
used for modeling software and non-software systems. The aim of this paper is to use a UML
class diagram to design a treasury pharmaceuticals agent and explain its internal actions. The
diagram explains the movement of the agent among other nodes to fulfil the user's (external)
requests after it receives them. The paper shows that it is easy to model practical systems
using agent UML when they are used in a complex environment.
The aim of this paper is to estimate a nonlinear regression function of Saudi crude oil exports (in million barrels) as a function of the number of discovered fields.
By studying the behavior of the data, we show that it does not follow a linear pattern and cannot be put in a known form, so no general trend in these exports could be identified.
We use different nonlinear estimators to estimate the regression function: the local linear estimator, a semi-parametric estimator, and an artificial neural network (ANN) estimator.
The results showed that the ANN estimator is the best nonlinear estimator am
Clustering algorithms have recently gained attention in the related literature since
they can help current intrusion detection systems in several aspects. This paper
proposes genetic algorithm (GA) based clustering to separate patterns
in incoming network traffic packets into normal and attack. Two GA-based
clustering models for solving the intrusion detection problem are introduced. The first
model coined as handles numeric features of the network packet, whereas
the second one coined as concerns all features of the network packet.
Moreover, a new mutation operator designed for binary and symbolic features is
proposed. The basic concept of the proposed mutation operator depends on the most
frequent value
This manuscript presents a new approach to accurately calculating the exponential integral function, which arises in many applications such as contamination, groundwater flow, hydrological problems and mathematical physics. The calculation is obtained from easily computed components without any restrictive assumptions.
A detailed comparison of the execution times is performed. The results calculated by the suggested approach are more accurate and converge faster than those calculated by other methods. Error analysis of the calculations is carried out using the absolute error, and high convergence is achieved. The suggested approach outperforms all previous methods used to calculate this function and this decision is
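For orientation, the function in question can be evaluated with the classical convergent power series E1(x) = -gamma - ln(x) - sum over k >= 1 of (-x)^k / (k * k!), valid for x > 0. The sketch below implements that textbook series only; it is not the approach proposed in the manuscript, and the function name, tolerance, and term limit are assumptions. It is the kind of baseline against which absolute error and execution time could be compared.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def exp_integral_e1(x, tol=1e-15, max_terms=200):
    """Evaluate E1(x) for x > 0 with the classical power series.

    Textbook baseline only: the alternating terms cancel badly for
    large x, so this is best suited to small and moderate arguments.
    """
    if x <= 0:
        raise ValueError("x must be positive")
    total = -EULER_GAMMA - math.log(x)
    term = 1.0  # running value of (-x)^k / k!
    for k in range(1, max_terms + 1):
        term *= -x / k
        contribution = -term / k  # this is -(-x)^k / (k * k!)
        total += contribution
        if abs(contribution) < tol * abs(total):
            break
    return total

# Absolute error against the tabulated value E1(1) = 0.2193839343955203
reference = 0.2193839343955203
print(abs(exp_integral_e1(1.0) - reference))
```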
teen sites Baghdad are made. The sites are divided into two groups, one in Karkh and the other in Rusafa. The underground conditions can be assessed by drilling vertical holes, called exploratory borings, into the ground, obtaining disturbed and undisturbed soil samples, and testing these samples in a laboratory (the civil engineering laboratory at the University of Baghdad). For the disturbed samples, the tests involved grain size analysis followed by soil classification, Atterberg limits, and chemical tests (organic content, sulphate content, gypsum content and chloride content). For the undisturbed samples, the tests involved the consolidation test (from this test, the following parameters can be obtained: initial void ratio eo, compression index cc, swel