Text documents are unstructured and high dimensional. Effective feature selection is required to select the most important and significant features from the sparse feature space. Thus, this paper proposes an embedded feature selection technique based on Term Frequency-Inverse Document Frequency (TF-IDF) and Support Vector Machine-Recursive Feature Elimination (SVM-RFE) for unstructured and high-dimensional text classification. This technique can measure a feature's importance in a high-dimensional text document. In addition, it aims to increase the efficiency of feature selection and hence obtain promising text classification accuracy. TF-IDF acts as a filter approach that measures the importance of features in the text documents at the first stage. SVM-RFE uses a backward feature elimination scheme to recursively remove insignificant features from the filtered feature subsets at the second stage. This research executes sets of experiments using text documents retrieved from a benchmark repository comprising a collection of Twitter posts. Pre-processing is applied to extract relevant features. After that, the pre-processed features are divided into training and testing datasets. Next, feature selection is implemented on the training dataset by calculating the TF-IDF score for each feature. SVM-RFE is then applied for feature ranking as the next feature selection step. Only top-ranked features are selected for text classification using the SVM classifier. The experiments show that the proposed technique achieves 98% accuracy, outperforming other existing techniques. In conclusion, the proposed technique is able to select the significant features in unstructured and high-dimensional text documents.
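The two-stage procedure described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the toy corpus, the labels, the smoothed IDF variant, the Pegasos-style training loop and all function names are assumptions made for illustration. TF-IDF is used here only to weight the features (thresholding on the scores is omitted), and the recursive elimination drops, at each round, the feature with the smallest squared weight in a linear SVM.

```python
import math
from collections import Counter

def tfidf_matrix(docs):
    """Return (vocab, rows), where rows[i][j] is the TF-IDF of vocab[j] in docs[i]."""
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenized for t in doc})
    n = len(docs)
    df = Counter(t for doc in tokenized for t in set(doc))
    # Assumed IDF variant: log(n / df) + 1 so that no term gets a zero weight.
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        rows.append([counts[t] / len(doc) * idf[t] for t in vocab])
    return vocab, rows

def train_linear_svm(X, y, epochs=200, lam=0.01):
    """Linear SVM (hinge loss, no bias) fitted by Pegasos-style subgradient steps."""
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]   # regularisation shrink
            if margin < 1:                              # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w

def svm_rfe(X, y, names, keep):
    """Backward elimination: retrain, then drop the feature with the smallest w**2."""
    active = list(range(len(names)))
    while len(active) > keep:
        Xa = [[row[j] for j in active] for row in X]
        w = train_linear_svm(Xa, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[worst]
    return [names[j] for j in active]

# Hypothetical toy corpus standing in for the pre-processed Twitter posts.
docs = [
    "great phone love it",
    "love this great camera",
    "terrible phone hate it",
    "hate this terrible camera",
]
labels = [1, 1, -1, -1]

vocab, X = tfidf_matrix(docs)
selected = svm_rfe(X, labels, vocab, keep=4)
print(selected)
```

In practice the surviving features would then be passed to the final SVM classifier, and the elimination would be run on the training split only, as the abstract describes.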
ABSTRACT
The study aimed to evaluate the information labels of some local pickle products and to estimate their sodium benzoate content. Eighty-five samples of locally made pickles were collected at random from markets in five areas of Baghdad (Al-Shula, Al-Bayaa, Al-Nahrawan, Al-Taji, and Abu Ghraib) and divided into groups P1, P2, P3, P4, and P5, respectively, according to those areas. The information label of each sample was examined and compared with the Iraqi standard specification for the information card of packaged and canned food, IQS 230. The results showed that 25.9% of the samples were devoid of the indication card informa
Background: A carefully planned clinical medical education is critical for the provision of a supportive clinical educational environment. The latter will ensure effective teaching, active learning, and good attitudes and performance at the bedside. The aim of this study was to evaluate the clinical learning environment at AL-Diwaniyah Teaching Hospital. Materials and Methods: A descriptive cross-sectional study involved resident doctors from the Internal Medicine and Surgery departments who had six months or more of residency training in the respective departments. Data were collected using the Postgraduate Hospital Educational Environment Measure. Data were analyzed using the Statistical Package for Social Sciences version 21.0 and presented us
Because oil is the primary source of vanadium in the environment, and crude oil has a correspondingly high vanadium content, vanadium is a crucial indicator of oil contamination. Twenty soil samples were taken from various locations surrounding the East Baghdad oil field in Iraq during February 2022 and then analyzed to determine the effects of industrialization and urbanization-related pollutants. The soil samples were analyzed using spectrophotometry. In soil samples taken from the research area, vanadium concentrations range from 0.26 to 1.2 ppm. The contamination factor (CF), geoaccumulation index (Igeo), and enrichment factor (EF) indicated that all the soil samples are uncontaminated.
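For readers unfamiliar with the three indices, the sketch below uses their commonly cited definitions: CF as the ratio of sample to background concentration, Igeo as log2 of the sample concentration over 1.5 times the background, and EF as a double ratio normalised to a reference element. The abstract does not state which background or reference values were used, so `V_BACKGROUND_PPM` is a hypothetical placeholder and the function names are illustrative.

```python
import math

def contamination_factor(c_sample, c_background):
    """CF = C_sample / C_background."""
    return c_sample / c_background

def geoaccumulation_index(c_sample, c_background):
    """Igeo = log2(C_sample / (1.5 * C_background))."""
    return math.log2(c_sample / (1.5 * c_background))

def enrichment_factor(c_sample, ref_sample, c_background, ref_background):
    """EF = (C / C_ref) in the sample divided by (C / C_ref) in the background."""
    return (c_sample / ref_sample) / (c_background / ref_background)

# Hypothetical background concentration for vanadium (ppm); NOT taken from the study.
V_BACKGROUND_PPM = 130.0

# The two endpoints of the concentration range reported in the abstract.
for c in (0.26, 1.2):
    cf = contamination_factor(c, V_BACKGROUND_PPM)
    igeo = geoaccumulation_index(c, V_BACKGROUND_PPM)
    print(f"V = {c} ppm: CF = {cf:.4f}, Igeo = {igeo:.2f}")
```

Under the usual classification schemes, CF below 1 and Igeo at or below 0 fall in the lowest (uncontaminated) class, which is in line with the conclusion reported above whenever the background value is well above the measured range.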
The acute and subchronic toxicity effects of intraperitoneally injected 25.16 nm zinc oxide nanoparticles (ZnO NPs) were evaluated. Albino male mice were exposed to three different doses (25, 50, and 100 mg/kg), chosen according to the calculated LD50, for 2 and 4 weeks. Considerable changes in organ indexes were observed, in good agreement with the histopathological effects, which ranged from multiple hemorrhagic foci in the liver, mild swelling and dilatation in kidney tubules, thickening of intestinal villi, and moderate interstitial pneumonia, especially with the high dose, to severe necrosis of seminiferous tubules in the testes of all studied groups. Significant changes in both hematological and biochemical parameters as well a
Safe drinking water is essential for the health of present and future generations. This study aims to assess drinking water quality in Baghdad's Al-Rusafa neighborhood. Water samples were taken from 32 neighborhoods on this side. The quality of the examined potable water samples differed depending on the water source. In this investigation, pH, chlorine, EC, TDS, TSS, Cd, and Pb levels were below acceptable ranges. TDS levels in Al-Mada'in exceed the acceptable level for water (>600 ppm). Six communities were polluted with bacteria (Shigella, Salmonella, Escherichia coli, and Klebsiella). The study examined the bacterial quality of drinking water and gram-negative bacteria resistant to chlorine in Baghdad's municipal water supply. Regarding pH, the w
At present, the ability to promote the national economy by adjusting to political, economic, and technological variables is one of the largest challenges faced by organization productivity. This challenge prompts changes in structure and line productivity, given that cash has not been invested. Thus, the management searches for investment opportunities that achieve the optimum value of the annual increases in the total output value of the production line workers in the laboratory. Therefore, a dynamic programming model is applied in this study to address the division of investment expenditures, to cope with market-dumping policy, and to strive for non-stop production at work.
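The abstract does not give the model itself, so the following is only a generic sketch of how dynamic programming can divide a fixed investment budget among production lines. The `returns` table (output gain of each line for each whole number of invested units) is entirely hypothetical, as are the function and variable names.

```python
def allocate_budget(returns, budget):
    """Split `budget` whole units among lines to maximise total gain.

    returns[i][k] is the gain of line i when it receives k units (k = 0..budget).
    Returns (best_total, allocation), where allocation[i] is the units given to line i.
    """
    n = len(returns)
    best = [0] * (budget + 1)   # best[b]: best gain with b units over the lines seen so far
    choice = []                 # choice[i][b]: units given to line i when b units remain
    for i in range(n):
        new_best = [0] * (budget + 1)
        pick = [0] * (budget + 1)
        for b in range(budget + 1):
            candidates = [(returns[i][k] + best[b - k], k) for k in range(b + 1)]
            new_best[b], pick[b] = max(candidates)
        best = new_best
        choice.append(pick)
    # Walk back from the last line to recover the allocation.
    allocation = [0] * n
    b = budget
    for i in range(n - 1, -1, -1):
        allocation[i] = choice[i][b]
        b -= allocation[i]
    return best[budget], allocation

# Hypothetical gains for three production lines and a budget of three units.
returns = [
    [0, 5, 6, 6],
    [0, 3, 8, 9],
    [0, 4, 5, 9],
]
total, allocation = allocate_budget(returns, budget=3)
print(total, allocation)  # prints: 13 [1, 2, 0]
```

The stage-by-stage recursion is what makes the approach attractive for this kind of problem: each line is considered once per budget level, instead of enumerating every possible split.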
Indoor Environmental Quality (IEQ) describes the condition of an indoor space in which wellbeing and comfort are provided for its users. Many researchers have highlighted the importance of adopting IEQ criteria, although they are not yet well defined in the Kurdistan Region. Moreover, environmental quality is not a requirement for contemporary buildings in the Kurdistan Region, and there is no measurement tool in the Region. This research aims to develop an IEQ assessment tool for the Kurdistan Region using a mixed-methods methodology, both qualitative and quantitative. Therefore, the Delphi technique, a method originally developed for systematic, interactive forecasting that relies on a panel of experts, was used. Thirty-five Delphi C
This paper proposes a new strategy to enhance the performance and accuracy of the Spiral dynamic algorithm (SDA) for use in solving real-world problems by hybridizing the SDA with the Bacterial Foraging optimization algorithm (BFA). The dynamic step size of SDA makes it a useful exploitation approach. However, it has limited exploration throughout the diversification phase, which results in getting trapped at local optima. The optimal initialization position for the SDA algorithm has been determined with the help of the chemotactic strategy of the BFA optimization algorithm, which has been utilized to improve the exploration approach of the SDA. The proposed Hybrid Adaptive Spiral Dynamic Bacterial Foraging (HASDBF)
The task of detecting intrinsic plagiarism deals with cases where a reference corpus is absent. This task is entirely based on inconsistencies within a given document. Detection of internal plagiarism has been considered as a classification problem. It can be estimated by taking into consideration self-based information from a given document.
The core contribution of the work proposed in this paper is associated with the document representation. The document and the disjoint segments generated from it are represented as weight vectors that capture their main content. For each element in these vectors, its average weight is considered instead of its frequency.
Th