A substantial portion of today’s multimedia data exists in the form of unstructured text. However, the unstructured nature of text poses a significant challenge in meeting users’ information requirements. Text classification (TC) has been extensively employed in text mining to facilitate multimedia data processing. However, accurately categorizing texts becomes challenging due to the increasing presence of non-informative features within the corpus. Several reviews on TC, encompassing various feature selection (FS) approaches to eliminate non-informative features, have been previously published. However, these reviews do not adequately cover recently explored FS-based approaches to TC, such as optimization techniques. This study comprehensively analyzes different FS approaches based on optimization algorithms for TC. We begin by introducing the primary phases involved in implementing TC. Subsequently, we explore a wide range of FS approaches for categorizing text documents and organize the existing works into four fundamental approaches: filter, wrapper, hybrid, and embedded. Furthermore, we review four categories of optimization algorithms used to solve text FS problems: swarm intelligence-based, evolutionary-based, physics-based, and human behavior-related algorithms. We discuss the advantages and disadvantages of state-of-the-art studies that employ optimization algorithms for text FS methods. Additionally, we consider several aspects of each proposed method and thoroughly discuss the challenges associated with datasets, FS approaches, optimization algorithms, machine learning classifiers, and evaluation criteria employed to assess new and existing techniques. Finally, by identifying research gaps and proposing future directions, our review provides valuable guidance to researchers in developing and situating further studies within the current body of literature.
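As a rough illustration of the filter family of FS approaches mentioned in the abstract above, the sketch below ranks terms by a chi-square score computed from term presence and class label, then keeps the top k. It is not taken from the review itself: the function names `chi_square_scores` and `select_top_k`, the whitespace tokenizer, and the two-class setup are all assumptions made for this example.

```python
from collections import Counter


def chi_square_scores(docs, labels, positive):
    """Score each term with the chi-square statistic of its 2x2
    (term present/absent) x (positive/other class) contingency table."""
    n = len(docs)
    n_pos = sum(1 for y in labels if y == positive)
    df_all = Counter()  # number of documents containing the term
    df_pos = Counter()  # number of positive-class documents containing the term
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # assumption: naive whitespace tokenizer
        df_all.update(terms)
        if y == positive:
            df_pos.update(terms)

    scores = {}
    for term, present in df_all.items():
        a = df_pos[term]       # term present, positive class
        b = present - a        # term present, other class
        c = n_pos - a          # term absent, positive class
        d = n - n_pos - b      # term absent, other class
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(docs, labels, positive, k):
    """Return the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels, positive)
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]
```

A filter of this kind scores each feature independently of any classifier, which is what separates it from wrapper methods that evaluate feature subsets by training a model on them.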
Support vector machine (SVM) is a popular supervised learning algorithm based on margin maximization. It has a high training cost and does not scale well to a large number of data points. We propose a multiresolution algorithm, MRH-SVM, that trains SVM on a hierarchical data aggregation structure, which also serves as a common data input to other learning algorithms. The proposed algorithm learns SVM models using high-level data aggregates and only visits data aggregates at more detailed levels where support vectors reside. In addition to performance improvements, the algorithm has advantages such as the ability to handle data streams and datasets with imbalanced classes. Experimental results show significant performance improvements in compa
Diabetes is one of the deadliest diseases in the world and can lead to stroke, blindness, organ failure, and amputation of lower limbs. Research indicates that diabetes can be controlled if it is detected at an early stage. Scientists are becoming more interested in classification algorithms for diagnosing diseases. In this study, we have analyzed the performance of five classification algorithms, namely naïve Bayes, support vector machine, multilayer perceptron artificial neural network, decision tree, and random forest, using a diabetes dataset that contains information on 2000 female patients. Various metrics were applied in evaluating the performance of the classifiers such as precision, area under the c
Linear discriminant analysis and logistic regression are the most widely used multivariate statistical methods for the analysis of data with categorical outcome variables. Both are appropriate for the development of linear classification models. Linear discriminant analysis assumes that the explanatory variables follow a multivariate normal distribution, while logistic regression makes no assumptions about the distribution of the explanatory data. Hence, logistic regression is assumed to be the more flexible and more robust method when these assumptions are violated.
In this paper, we focus on the comparison between three forms for classification of data that belongs
Objective: Breast cancer is regarded as a deadly disease in women, causing many deaths. Early diagnosis of breast cancer with appropriate tumor biomarkers may facilitate early treatment of the disease, thus reducing the mortality rate. The purpose of the current study is to improve early diagnosis of breast cancer by proposing a two-stage classification of breast tumor biomarkers for a sample of Iraqi women.
Methods: In this study, a two-stage classification system is proposed and tested with four machine learning classifiers. In the first stage, breast features (demographic, blood and salivary-based attributes) are classified into normal or abnormal cases, while in the second stage the abnormal breast cases are
Aim: This study aimed to assess orthodontic knowledge and attitude among general dentists and non-orthodontic specialists. Background: Early detection of orthodontic disorders is essential in motivating patients to intervene before the long-term complications that arise when the disorders are not recognised. Methods: A questionnaire was distributed amongst dentists other than orthodontists. This questionnaire consisted of three sections. The first one aimed to collect demographic, educational level and practice type information. The other two sections consisted of closed-ended questions designed to evaluate knowledge of and attitude towards orthodontics. Results: A total of 313 responses to the survey were submitted. No significant correlation was observed, e
Mapping the permeability distribution in a reservoir is considered one of the most essential steps in building a geologic model, because permeability governs fluid flow through the reservoir and therefore influences history matching more than other parameters. For that reason, it is the petrophysical property most often tuned during history matching. Unfortunately, predicting the relationship between static petrophysics (porosity) and dynamic petrophysics (permeability) from conventional well logs is a difficult problem to solve with conventional statistical methods for heterogeneous formations. This paper therefore examines the ability and performance of the artificial intelligence method in perme
The aim of the present study was to develop an inhalable sustained delivery system for theophylline (TP) by preparing solid lipid microparticles using glyceryl behenate (GB) and poloxamer 188 (PX) as a lipid carrier and a surfactant, respectively. The method involves loading TP nanoparticles into the lipid using a high shear homogenization-ultrasonication technique followed by lyophilization. The compositional variations and interactions were evaluated using response surface methodology with a Box-Behnken design of experiment (DOE). The DOE was constructed using TP (X1), GB (X2) and PX (X3) levels as independent factors. Responses measured were the entrapment efficiency (% EE) (Y1), mass median