A substantial portion of today’s multimedia data exists in the form of unstructured text, and this lack of structure makes it challenging to meet users’ information requirements. Text classification (TC) has been extensively employed in text mining to facilitate multimedia data processing. However, accurately categorizing texts becomes difficult as the number of non-informative features in the corpus grows. Several reviews on TC, encompassing various feature selection (FS) approaches to eliminate non-informative features, have been previously published, but they do not adequately cover recently explored FS-based approaches to TC, such as optimization techniques. This study comprehensively analyzes different FS approaches based on optimization algorithms for TC. We begin by introducing the primary phases involved in implementing TC. Subsequently, we explore a wide range of FS approaches for categorizing text documents and organize the existing works into four fundamental approaches: filter, wrapper, hybrid, and embedded. Furthermore, we review four families of optimization algorithms utilized in solving text FS problems: swarm intelligence-based, evolutionary-based, physics-based, and human behavior-related algorithms. We discuss the advantages and disadvantages of state-of-the-art studies that employ optimization algorithms for text FS methods. Additionally, we consider several aspects of each proposed method and thoroughly discuss the challenges associated with datasets, FS approaches, optimization algorithms, machine learning classifiers, and evaluation criteria employed to assess new and existing techniques. Finally, by identifying research gaps and proposing future directions, our review provides valuable guidance to researchers in developing and situating further studies within the current body of literature.
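As a rough illustration of the filter approach named above, the following minimal sketch ranks terms by a chi-square statistic computed from a 2x2 term/class contingency table and keeps the top-scoring ones. The toy corpus, the labels, and the function names `chi_square_scores` and `select_top_k` are hypothetical and are not taken from the reviewed studies; chi-square is assumed here only as one commonly used filter criterion.

```python
def chi_square_scores(docs, labels, positive):
    """Score every term by the chi-square statistic of its 2x2 term/class table."""
    n = len(docs)
    doc_sets = [set(d.lower().split()) for d in docs]
    vocab = set().union(*doc_sets)
    n_pos = sum(1 for y in labels if y == positive)
    scores = {}
    for term in vocab:
        # a: term present, positive class; b: term present, other class
        a = sum(1 for s, y in zip(doc_sets, labels) if term in s and y == positive)
        b = sum(1 for s, y in zip(doc_sets, labels) if term in s and y != positive)
        c = n_pos - a              # term absent, positive class
        d = (n - n_pos) - b        # term absent, other class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A term that occurs in every document (or none) carries no information.
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, used for illustration only.
docs = [
    "the striker scored the winning goal",
    "the keeper saved the goal",
    "the bank reported the quarterly earnings",
    "the fund missed the earnings target",
]
labels = ["sport", "sport", "finance", "finance"]

scores = chi_square_scores(docs, labels, positive="sport")
selected = select_top_k(scores, 2)
```

In this toy setting the word "the" appears in every document and therefore receives a score of zero, which is the kind of non-informative feature a filter is meant to discard. Wrapper and optimization-based methods would instead search over feature subsets using a classifier's performance as the objective.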
Canonical correlation analysis is one of the common methods for analyzing data and identifying the relationship between two sets of variables under study, and it depends on analyzing the variance matrix or the correlation matrix. Researchers use many methods to estimate the canonical correlation (CC); some are sensitive to outliers and others are resistant to them. In addition, there are criteria for checking the efficiency of estimation methods.
In our research, we dealt with robust estimation methods that depend on the correlation matrix in the analysis process to obtain a robust canonical correlation coefficient, which is the method of Biwe
This study aims to determine the prevalence of Entamoeba histolytica, Entamoeba dispar, and Entamoeba moshkovskii using three diagnostic methods (microscopic examination, cultivation, and PCR), which were compared to obtain an accurate diagnosis of Entamoeba spp. during amoebiasis. A total of 150 stool samples were examined: 100 from patients and 50 from healthy controls. The clinically diagnosed stool samples (n = 100) were collected from patients attending the consultant clinics of different hospitals in Basrah between January 2018 and January 2019. The results showed that 60% of the collected samples were positive on direct microscopic examination. All samples were cultivated on different media; the Bra
The exponential distribution is one of the most common distributions in scientific research, with wide application in reliability, engineering, and the analysis of survival functions; researchers have therefore studied its characteristics extensively.
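For orientation, the survival function of the ordinary (untruncated) exponential distribution is S(t) = exp(-λt), and the maximum likelihood estimate of the rate from a complete sample is the sample size divided by the sum of the observations. The sketch below shows only this textbook case; it assumes no truncation and no censoring, so it is not the estimator studied in the research, and the function names and sample values are hypothetical.

```python
import math


def mle_rate(sample):
    """Maximum likelihood estimate of the exponential rate: n / sum(t_i).

    Assumes a complete, untruncated sample of positive lifetimes.
    """
    if not sample or any(t <= 0 for t in sample):
        raise ValueError("sample must contain positive lifetimes")
    return len(sample) / sum(sample)


def survival(t, rate):
    """Exponential survival function S(t) = exp(-rate * t) for t >= 0."""
    return math.exp(-rate * t)


# Hypothetical lifetimes, for illustration only.
lifetimes = [1.0, 2.0, 3.0]
rate_hat = mle_rate(lifetimes)
s_hat = survival(2.0, rate_hat)  # plug-in estimate of S(2)
```

By the invariance property of maximum likelihood, plugging the estimated rate into S(t) gives the maximum likelihood estimate of the survival function in this simplified setting.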
In this research, the survival function of the truncated exponential distribution is estimated by the maximum likelihood method, the first and second Bayes methods, the least squares method, and the Jackknife, which is based first on the maximum likelihood method and then on the first Bayes method; the estimators are then compared using simulation. To accomplish this task, samples of different sizes have been adopted by the researcher us
In the petroleum industry, multiphase flow dynamics within the tubing string have gained significant attention due to associated challenges. Accurately predicting pressure drops and wellbore pressures is crucial for the effective modeling of vertical lift performance (VLP). This study focuses on predicting the multiphase flow behavior in four wells located in the Faihaa oil field in southern Iraq, utilizing PIPESIM software. The process of selecting the most appropriate multiphase correlation was performed by utilizing production test data to construct a comprehensive survey data catalog. Subsequently, the results were compared with the correlations available within the PIPESIM software. The outcomes reveal that the Hagedorn and Brown (H
This study covered the Zakhikhah area in the Al-Anbar desert, which is bounded on the north, east, and west by the Euphrates River and on the south by the Ramadi-Qaim road. Several exploratory field trips were made to the study area. During this time, a semi-detailed survey of the area was carried out based on satellite imagery captured by the American Landsat-7 satellite, topographic maps, and natural vegetation variance. All necessary field tools, including a digital camera and a GPS device, were brought to determine the soil type and collect plant samples. All of these visits were planned to cover the entire Zakhikhah area. All vegetation cover observations, identifying sampling sites and attempting to inventory and collect medicinal plants in t
Introduction: Since the hallmark of gestational trophoblastic disease is trophoblastic proliferation, Ki67 is regarded as the best marker for studying hydatidiform mole. This study was conducted to evaluate the role of this proliferative marker in distinguishing among hydropic abortion, partial hydatidiform mole, and complete hydatidiform mole. Materials and methods: This is a cross-sectional study involving the application of Ki67 to a total of 90 histological samples of curettage material from molar (partial and complete mole) and non-molar hydropic abortions from Iraqi females, so three study groups were created. Immunohistochemical expression in villous cytotrophoblasts, syncytiotrophoblasts, and stromal cells was recorded separately by three i
Big data analysis has important applications in many areas such as sensor networks and connected healthcare. High volume and velocity of big data bring many challenges to data analysis. One possible solution is to summarize the data and provide a manageable data structure to hold a scalable summarization of data for efficient and effective analysis. This research extends our previous work on developing an effective technique to create, organize, access, and maintain summarization of big data and develops algorithms for Bayes classification and entropy discretization of large data sets using the multi-resolution data summarization structure. Bayes classification and data discretization play essential roles in many learning algorithms such a
This study examines the vibrations produced by hydropower operations to improve embankment dam safety. It consists of two parts. In the first part, ANSYS-CFX was used to generate a three-dimensional (3-D) finite volume (FV) model to simulate a vertical Francis turbine unit in the Mosul hydropower plant. The pressure pattern obtained from the turbine model was transferred to the dam body to show how the turbine unit's operation affects the dam's stability. The upstream reservoir conditions, various flow rates, and fully open inlet gates were considered. In the second part of this study, a 3-D finite element (FE) model of the Mosul dam was simulated using an ANSYS program. The operational turbine model's water pressure pattern is conveyed t
Astronomical images are regarded as a main source of information for exploring outer space. To identify the basic contents of the galaxy (the Milky Way), an image of it was classified using the Variable Precision Rough Sets technique to distinguish the different regions within the galaxy according to the different colors in the image. From the classified image, we can determine the percentage of each class and interpret what that percentage means. This technique yields a well-classified image and requires less time to complete the classification process.
The main work of this paper is devoted to a new technique for constructing approximate solutions of linear delay differential equations using power series as basis functions, with the aid of weighted residual methods (the collocation method, Galerkin’s method, and the least squares method).
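To make the collocation idea concrete, the sketch below approximates the solution of one linear delay equation by a power series y(t) ≈ c_0 + c_1 t + ... + c_N t^N and forces the residual to vanish at N collocation points. The test problem y'(t) = 0.5 y(t) + 0.5 e^{t/2} y(t/2), y(0) = 1 on [0, 1] (a proportional-delay equation whose exact solution is e^t), the equally spaced collocation points, and all function names are assumptions chosen for illustration; they are not taken from the paper.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def collocation_coefficients(degree):
    """Power series coefficients c_0..c_degree for the assumed test problem."""
    # Initial condition y(0) = 1 fixes c_0 = 1.
    rows = [[1.0] + [0.0] * degree]
    rhs = [1.0]
    for j in range(1, degree + 1):
        t = j / degree  # equally spaced collocation points in (0, 1]
        row = []
        for k in range(degree + 1):
            d_basis = k * t ** (k - 1) if k > 0 else 0.0   # derivative of t^k
            delayed = (t / 2) ** k                          # t^k evaluated at t/2
            # Residual of basis term k: y' - 0.5 y - 0.5 e^{t/2} y(t/2)
            row.append(d_basis - 0.5 * t ** k - 0.5 * math.exp(t / 2) * delayed)
        rows.append(row)
        rhs.append(0.0)  # residual must vanish at each collocation point
    return solve_linear(rows, rhs)


def evaluate(coeffs, t):
    """Evaluate the power series approximation at t."""
    return sum(c * t ** k for k, c in enumerate(coeffs))


coeffs = collocation_coefficients(6)
approx_at_one = evaluate(coeffs, 1.0)  # compare with math.e
```

Galerkin's method and the least squares method use the same trial function but replace the pointwise conditions with integrals of the residual against weight functions, so only the construction of the linear system changes.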