A substantial portion of today’s multimedia data exists in the form of unstructured text. However, the unstructured nature of text poses a significant challenge to meeting users’ information requirements. Text classification (TC) has been extensively employed in text mining to facilitate multimedia data processing. However, accurately categorizing texts becomes challenging due to the increasing presence of non-informative features within the corpus. Several reviews on TC, encompassing various feature selection (FS) approaches to eliminate non-informative features, have been published previously. However, these reviews do not adequately cover recently explored approaches to TC problem-solving that utilize FS, such as optimization techniques. This study comprehensively analyzes different FS approaches based on optimization algorithms for TC. We begin by introducing the primary phases involved in implementing TC. Subsequently, we explore a wide range of FS approaches for categorizing text documents and organize the existing works into four fundamental approaches: filter, wrapper, hybrid, and embedded. Furthermore, we review four families of optimization algorithms utilized in solving text FS problems: swarm intelligence-based, evolutionary-based, physics-based, and human behavior-related algorithms. We discuss the advantages and disadvantages of state-of-the-art studies that employ optimization algorithms for text FS methods. Additionally, we consider several aspects of each proposed method and thoroughly discuss the challenges associated with datasets, FS approaches, optimization algorithms, machine learning classifiers, and the evaluation criteria employed to assess new and existing techniques. Finally, by identifying research gaps and proposing future directions, our review provides valuable guidance to researchers in developing and situating further studies within the current body of literature.
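As a minimal illustration of the filter approach named above (not any specific method from the review), the following sketch scores text features with a chi-square test and keeps only the top-ranked terms before classification; the toy corpus, labels, and k value are hypothetical placeholders.

```python
# Filter-style feature selection for text classification: score terms with
# chi-square against the labels, keep the top k, then train a classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

docs = ["spam offer now", "meeting agenda attached",
        "win a free prize", "project status update"]   # toy corpus
labels = [1, 0, 1, 0]                                   # 1 = spam, 0 = not spam

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("select", SelectKBest(chi2, k=5)),   # filter step: keep 5 highest-scoring terms
    ("clf", MultinomialNB()),             # downstream classifier
])
pipeline.fit(docs, labels)
print(pipeline.predict(["free prize offer"]))
```

Wrapper and hybrid approaches differ in that the feature subset is searched (for example, by a swarm or evolutionary algorithm) and evaluated by the classifier itself rather than by a statistic computed once.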
Diabetes is one of the deadliest diseases in the world and can lead to stroke, blindness, organ failure, and amputation of the lower limbs. Research indicates that diabetes can be controlled if it is detected at an early stage. Scientists are becoming increasingly interested in classification algorithms for diagnosing diseases. In this study, we analyze the performance of five classification algorithms, namely naïve Bayes, support vector machine, multilayer perceptron artificial neural network, decision tree, and random forest, using a diabetes dataset that contains the information of 2000 female patients. Various metrics were applied to evaluate the performance of the classifiers, such as precision and the area under the curve (AUC).
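A compact sketch of the comparison described above, using scikit-learn versions of the five classifiers; since the 2000-patient diabetes dataset is not bundled here, a synthetic stand-in is used and all parameter choices are illustrative.

```python
# Compare five classifiers on precision and AUC, mirroring the study's setup.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, roc_auc_score

X, y = make_classification(n_samples=2000, n_features=8, random_state=0)  # placeholder data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "SVM": SVC(probability=True),
    "MLP": MLPClassifier(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]
    print(f"{name}: precision={precision_score(y_te, pred):.3f}, "
          f"AUC={roc_auc_score(y_te, proba):.3f}")
```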
Linear discriminant analysis and logistic regression are among the most widely used multivariate statistical methods for analyzing data with categorical outcome variables. Both are appropriate for developing linear classification models. A key assumption of linear discriminant analysis is that the explanatory variables follow a multivariate normal distribution, while logistic regression makes no assumptions about the distribution of the explanatory data. Hence, logistic regression is considered the more flexible and more robust method when these assumptions are violated. In this paper, we focus on the comparison between three forms for classifying data belonging to …
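To make the comparison concrete, a minimal sketch fitting both models on the same data and reporting cross-validated accuracy; the synthetic data and fold count are assumptions, not the paper's actual design.

```python
# Fit LDA and logistic regression side by side on identical data.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=5, random_state=1)  # placeholder data

for name, model in [("LDA", LinearDiscriminantAnalysis()),
                    ("Logistic regression", LogisticRegression(max_iter=1000))]:
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```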
This research deals with estimating the reliability function of the two-parameter exponential distribution using different estimation methods: maximum likelihood, median-first order statistics, ridge regression, modified Thompson-type shrinkage, and single-stage shrinkage. Comparisons among the estimators were made using Monte Carlo simulation based on the mean squared error (MSE) criterion, and the results indicate that the shrinkage methods perform better than the other methods.
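The Monte Carlo MSE framework referred to above can be sketched as follows for the maximum-likelihood estimator only (the shrinkage estimators from the study are not reproduced here); the parameter values, sample size, and replication count are illustrative assumptions.

```python
# Monte Carlo MSE of the MLE of the reliability function R(t) = exp(-(t - mu)/theta)
# for the two-parameter exponential distribution (t >= mu).
import numpy as np

rng = np.random.default_rng(0)
mu, theta, t0 = 1.0, 2.0, 3.0            # location, scale, evaluation point (assumed)
R_true = np.exp(-(t0 - mu) / theta)      # true reliability at t0

def mle_reliability(sample, t):
    loc_hat = sample.min()               # MLE of the location parameter
    scale_hat = sample.mean() - loc_hat  # MLE of the scale parameter
    return np.exp(-(t - loc_hat) / scale_hat)

n, reps = 30, 5000
estimates = np.array([
    mle_reliability(mu + rng.exponential(theta, n), t0) for _ in range(reps)
])
print("MSE of the MLE-based estimate of R(t0):", np.mean((estimates - R_true) ** 2))
```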
As an important resource, entangled light sources have been used in developing quantum information technologies such as quantum key distribution (QKD). Few experiments have implemented entanglement-based deterministic QKD protocols, since the security of existing protocols may be compromised in lossy channels. In this work, we report on a loss-tolerant deterministic QKD experiment that follows a modified “Ping-Pong” (PP) protocol. The experimental results demonstrate for the first time that a secure deterministic QKD session can be fulfilled in a channel with an optical loss of 9 dB, based on a telecom-band entangled photon source. This exhibits a conceivable prospect of utilizing entangled light sources in real-life fiber-based QKD.
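For context on the 9 dB figure quoted above, a quick conversion from optical loss in decibels to linear channel transmittance (standard dB arithmetic, not part of the reported experiment):

```python
# Convert an optical loss quoted in dB to the fraction of photons transmitted.
loss_db = 9.0
transmittance = 10 ** (-loss_db / 10)   # standard dB-to-linear conversion
print(f"{loss_db} dB loss -> transmittance ≈ {transmittance:.3f}")  # ≈ 0.126
```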
In this article, a Convolutional Neural Network (CNN) is used to detect damage and no-damage images from satellite imagery using different classifiers. These classifiers are well-known models that are used with the CNN to detect and classify images from a specific dataset. The dataset used belongs to the Houston hurricane that caused several damages in nearby areas. In addition, transfer learning is used to store the knowledge (weights) and reuse it in the next task. Moreover, each applied classifier is used to detect the images from the dataset after it is split into training, testing, and validation sets. The Keras library is used to apply the CNN algorithm with each selected classifier to detect the images. Furthermore, the performance …
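A minimal Keras sketch of the transfer-learning setup described above: a pretrained backbone whose stored weights are reused, with a new binary head for the damage / no-damage decision. The backbone choice, input size, and data pipeline are assumptions for illustration, not the article's exact configuration.

```python
# Transfer learning in Keras: freeze a pretrained backbone, add a binary head.
from tensorflow import keras

base = keras.applications.VGG16(include_top=False, weights="imagenet",
                                input_shape=(128, 128, 3), pooling="avg")
base.trainable = False                      # reuse the stored ImageNet weights

model = keras.Sequential([
    base,
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),   # damage vs. no damage
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# train_ds / val_ds would come from e.g. keras.utils.image_dataset_from_directory
# pointed at the training and validation splits of the satellite-image dataset:
# model.fit(train_ds, validation_data=val_ds, epochs=5)
```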
Administrative procedures in various organizations produce numerous crucial records and data. These records and data are also used in other processes, like customer relationship management and accounting operations. It is incredibly challenging to use and extract valuable and meaningful information from these data and records because they are frequently enormous and continuously growing in size and complexity. Data mining is the act of sorting through large data sets to find patterns and relationships that might aid in the data analysis process of resolving business issues. Using data mining techniques, enterprises can forecast future trends and make better business decisions. The Apriori algorithm has been …
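Since the abstract cuts off at the Apriori algorithm, a toy, from-scratch sketch of the Apriori idea follows: grow itemsets level by level and prune anything below a minimum support. The transactions and threshold are made-up placeholders.

```python
# Apriori-style frequent-itemset mining on a handful of toy transactions.
transactions = [
    {"invoice", "payment", "customer_record"},
    {"invoice", "payment"},
    {"customer_record", "support_ticket"},
    {"invoice", "payment", "support_ticket"},
]
min_support = 0.5

def support(itemset):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Level 1: frequent single items.
items = sorted({i for t in transactions for i in t})
frequent = [frozenset([i]) for i in items if support(frozenset([i])) >= min_support]
all_frequent = list(frequent)

# Higher levels: candidates are unions of frequent itemsets from the previous level.
k = 2
while frequent:
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= min_support]
    all_frequent.extend(frequent)
    k += 1

for itemset in all_frequent:
    print(set(itemset), round(support(itemset), 2))
```

Association rules (e.g. "invoice implies payment") are then read off the frequent itemsets by checking a confidence threshold.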