Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance, yet many applications have too little data to train DL frameworks. Labeled data usually has to be produced by manual labeling, which typically involves human annotators with extensive domain knowledge; this annotation process is costly, time-consuming, and error-prone. Every DL framework is usually fed a significant amount of labeled data so that it can learn representations automatically. In general, more data yields a better DL model, although performance is also application dependent. This issue is the main reason many applications dismiss the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models that address three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by related tips on data acquisition prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset. The survey ends with a list of applications that suffer from data scarcity, and several alternatives are proposed for generating more data in each application, including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors' knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.
Abstract
The research compared two methods for estimating the four parameters of the compound exponential Weibull-Poisson distribution: the maximum likelihood method and the Downhill Simplex algorithm. Two data cases were considered: the first assumed the original (uncontaminated) data, while the second assumed data contamination. Simulation experiments were conducted for different sample sizes, different initial parameter values, and different levels of contamination. The Downhill Simplex algorithm was found to be the best method for estimating the parameters, the probability function, and the reliability function of the compound distribution for both uncontaminated and contaminated data.
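As a rough illustration of how a Downhill Simplex (Nelder-Mead) search can be used for parameter estimation, the sketch below minimizes a negative log-likelihood without derivatives. It is not the authors' procedure: the abstract does not give the density of the compound exponential Weibull-Poisson distribution, so an ordinary two-parameter Weibull is used as a stand-in, and the names `weibull_nll`, `nelder_mead`, and `fit_weibull`, the starting point, and the step size are all assumptions made for this example.

```python
import math
import random


def weibull_nll(params, data):
    """Negative log-likelihood of a two-parameter Weibull (stand-in model)."""
    scale, shape = params
    if scale <= 0 or shape <= 0:
        return float("inf")  # reject points outside the parameter space
    n = len(data)
    log_lik = (n * math.log(shape / scale)
               + (shape - 1) * sum(math.log(x / scale) for x in data)
               - sum((x / scale) ** shape for x in data))
    return -log_lik


def nelder_mead(f, start, step=0.5, tol=1e-8, max_iter=2000):
    """Basic Downhill Simplex search: reflect, expand, contract, shrink."""
    n = len(start)
    simplex = [list(start)]
    for i in range(n):
        point = list(start)
        point[i] += step
        simplex.append(point)
    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        if abs(f(worst) - f(best)) < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]
        reflect = [c + (c - w) for c, w in zip(centroid, worst)]
        if f(reflect) < f(best):
            expand = [c + 2 * (c - w) for c, w in zip(centroid, worst)]
            simplex[-1] = expand if f(expand) < f(reflect) else reflect
        elif f(reflect) < f(simplex[-2]):
            simplex[-1] = reflect
        else:
            contract = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            if f(contract) < f(worst):
                simplex[-1] = contract
            else:
                # Shrink every vertex halfway toward the best vertex.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)


def fit_weibull(data, start=(1.0, 1.0)):
    """Return (scale, shape) estimated by simplex search on the likelihood."""
    scale, shape = nelder_mead(lambda p: weibull_nll(p, data), start)
    return scale, shape


if __name__ == "__main__":
    random.seed(0)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
    sample = [random.weibullvariate(2.0, 1.5) for _ in range(1000)]
    print(fit_weibull(sample))
```

A contaminated-data experiment in the spirit of the abstract could be approximated by replacing a fraction of `sample` with draws from a different distribution before calling `fit_weibull`; the contamination scheme used in the study is not described in the abstract.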
The aim of this study is to identify the effect of empowerment on the effectiveness of the work of audit committees in private commercial banks, and to identify the extent of awareness of the importance of empowerment in the work of these committees. This applies especially to the inspection committees that visit private banks from various sources, including committees of the Central Bank of Iraq, committees of the Securities Commission, and committees of the external audit offices. The study analyzes the determinants of empowerment in the performance of the most important tasks of the audit committees, namely: supervising the process of preparing reports, supervising the system of intern...
This study aims to formulate an alternative to formalin for preserving fish as study specimens for long periods. The main reasons for seeking an alternative are to avoid the negative effects of formalin on those who work with it and to better preserve the bodies of the fish. Three new solutions were therefore proposed to replace formalin. Formalin may still be included as a minor component of these solutions to give better results over long storage periods. All solutions prepared in this study were acidic, like formalin. Two solutions succeeded in replacing formalin for preserving fish...
In this study, the photodegradation of Congo red dye (CR) in aqueous solution was investigated using Au-Pd/TiO2 as a photocatalyst. The dye concentration, photocatalyst dosage, amount of H2O2, pH of the medium, and temperature were examined to find their optimum values. The best dye concentration was found to be 28 ppm. The optimum amount of photocatalyst was 0.09 g per 75 mL of dye solution, giving a degradation of about 96% after 12 hours of irradiation, while the best amount of hydrogen peroxide was 7 μL per 75 mL of dye solution, giving a degradation of about 97% after 10 hours of irradiation. The highest degradation percentage was obtained at pH 5. In additio...
The investigation of determining solutions for the Diophantine equation over the Gaussian integer ring for the specific case of is discussed. The discussion includes various preliminary results later used to build the resolvent theory of the Diophantine equation studied. Our findings show the existence of infinitely many solutions. Since the analytical method used here is based on simple algebraic properties, it can be easily generalized to study the behavior and the conditions for the existence of solutions to other Diophantine equations, allowing a deeper understanding, even when no general solution is known.
The uptake of Cd(II) ions from simulated wastewater onto olive pips was modeled using a three-layer artificial neural network (ANN). Based on 112 batch experiments, the effects of contact time (10-240 min), initial pH (2-6), initial concentration (25-250 mg/L), biosorbent dosage (0.05-2 g/100 mL), agitation speed (0-250 rpm), and temperature (20-60 °C) were studied. The maximum uptake of Cd(II) (92%) was achieved at the optimum conditions of 60 min contact time, pH 6, 50 mg/L initial concentration, 1 g/100 mL biosorbent dosage, 250 rpm agitation speed, and 25 °C.
An ANN with a tangent sigmoid transfer function in the hidden layer, a linear transfer function in the output layer, and 7 neurons was sufficient to give good predictions of cadmium removal efficiency, with coefficient of correlatio...
Coagulation-flocculation is a basic chemical engineering method for treating metal-bearing industrial wastewater because it removes colloidal particles, some soluble compounds, and very fine solid suspensions initially present in the wastewater through destabilization and the formation of flocs. This research studied the feasibility of using natural coagulants such as okra and mallow, and a chemical coagulant such as alum, to remove Cu, increase the removal efficiency, and reduce the turbidity of the treated water. Fourier transform infrared (FTIR) spectroscopy was carried out on okra and mallow before and after coagulation to determine their functional groups. Carbonyl and hydroxyl functional groups on the surface of...
In this paper we prove the boundedness of the solutions, and of their derivatives, of the second-order ordinary differential equation x'' + f(x) x' + g(x) = u(t), under certain conditions on f, g, and u. Our results are a generalization of those given in [1].
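To make the boundedness statement concrete, the sketch below integrates one equation of the form x'' + f(x) x' + g(x) = u(t) numerically and records the largest |x| and |x'| along the trajectory. The abstract does not state the conditions on f, g, and u, so the choices f(x) = 1 + x^2, g(x) = x, and u(t) = cos t are purely hypothetical, as are the function names, initial values, and step size. A numerical run like this illustrates the claim but does not prove it.

```python
import math


def f(x):
    """Hypothetical damping coefficient, positive for every x."""
    return 1.0 + x * x


def g(x):
    """Hypothetical restoring term."""
    return x


def u(t):
    """Hypothetical bounded forcing term."""
    return math.cos(t)


def rhs(t, x, v):
    """First-order form of the equation: x' = v, v' = u(t) - f(x) v - g(x)."""
    return v, u(t) - f(x) * v - g(x)


def integrate(x0, v0, t_end, dt=0.01):
    """Classical fourth-order Runge-Kutta; returns a list of (t, x, v)."""
    t, x, v = 0.0, x0, v0
    path = [(t, x, v)]
    for _ in range(int(round(t_end / dt))):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = rhs(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = rhs(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
        path.append((t, x, v))
    return path


if __name__ == "__main__":
    trajectory = integrate(x0=2.0, v0=0.0, t_end=50.0)
    print("max |x|  =", max(abs(x) for _, x, _ in trajectory))
    print("max |x'| =", max(abs(v) for _, _, v in trajectory))
```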
Molecular dynamics (MD) simulations were carried out to investigate the binding mode of axillaridine-A at the active site of the human acetylcholinesterase (AChE) enzyme. MD simulations of 2.0 ns were performed for the protein and for the complex to dynamically explore the active site and the behavior of the ligand at the peripheral AChE binding site. The calculations for the enzyme alone showed that the active site of AChE is located at the bottom of a deep and narrow cavity whose surface is lined with rings of aromatic residues, and that Tyr72 is almost perpendicular to the Trp286 ring and forms a stable π-π interaction. The size of the active site of the complex decreases with time because of the increasing interaction. Axillaridine-A forms...