Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance. Unfortunately, many applications have small or inadequate datasets for training DL frameworks. Manual labeling is usually needed to provide labeled data, and it typically involves human annotators with extensive background knowledge. This annotation process is costly, time-consuming, and error-prone. Every DL framework is usually fed a significant amount of labeled data so that it can learn representations automatically. Ultimately, a larger amount of data would generate a better DL model, although its performance is also application dependent. This issue is the main reason many applications dismiss the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models that overcome three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by related tips on data acquisition prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset.
The survey ends with a list of applications that suffer from data scarcity, and several alternatives are proposed to generate more data in each application, including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.
Abstract
The study aims to examine the relationship between cognitive absorption and e-learning readiness in the preparatory stage. The study sample consisted of 190 students who were chosen randomly. The researcher developed the cognitive absorption and e-learning readiness scales. A correlational descriptive approach was adopted. The research revealed a positive statistical relationship between cognitive absorption and e-learning readiness.
The present study investigates deep eutectic solvents (DESs) as potential media for enzymatic hydrolysis. A series of ternary ammonium- and phosphonium-based DESs were prepared at different molar ratios by mixing with aqueous glycerol (85%). The physicochemical properties, including surface tension, conductivity, density, and viscosity, were measured over a temperature range of 298.15 K to 363.15 K. The eutectic points were highly influenced by the variation of temperature. The eutectic points of choline chloride:glycerol:water (ratio of 1:2.55:2.28) and methyltriphenylphosphonium bromide:glycerol:water (ratio of 1:4.25:3.75) are 213.4 K and 255.8 K, respectively. The stability of the lipase enzyme isolated from porcine pancreas (PPL) a
Simulation is the oldest theory in art. It first appeared in the Greek aesthetic thought of the philosopher Plato, and it recurs among many thinkers and philosophers over a long period of time, up to the present day. Our fascination with art in general, and design art in particular, is due to the creativity and innovations of the artist through simulation, as well as the peculiarities of this simulation, which give objects signs and signals that may have an echo that sometimes does not exist in their physical reality.
The real representation of life and design construction, descriptions of the expression of each of them in the form of intellectual construction and the ideas of producti
The research aims to identify the applications of pedagogy in art education. The research community comprised art education for the primary stage and consisted of 8 main areas in art education, of which two main areas (objectives and content) were chosen as the research sample. The research methodology was descriptive and analytical. The researcher built the research tool (the validity form of the tool) and presented it to a group of experts to indicate its validity as well as to measure its stability. To show the results, the researcher used percentages, and the researcher recommended modifying the curriculum periodically, such as every four years, others
Multilocus haplotype analysis of candidate variants with genome-wide association study (GWAS) data may provide evidence of association with disease, even when the individual loci themselves do not. Unfortunately, when a large number of candidate variants are investigated, identifying risk haplotypes can be very difficult. To meet this challenge, a number of approaches have been put forward in recent years. However, most of them are not directly linked to the disease penetrances of haplotypes and thus may not be efficient. To fill this gap, we propose a mixture model-based approach for detecting risk haplotypes. Under the mixture model, haplotypes are clustered directly according to their estimated d
Region-based association analysis has been proposed to capture the collective behavior of sets of variants by testing the association of each set, instead of individual variants, with the disease. Such an analysis typically involves a list of unphased multiple-locus genotypes with potentially sparse frequencies in cases and controls. To tackle the problem of the sparse distribution, a two-stage approach was proposed in the literature. In the first stage, haplotypes are computationally inferred from genotypes, followed by a haplotype coclassification. In the second stage, the association analysis is performed on the inferred haplotype groups. If a haplotype is unevenly distributed between the case and control samples, this haplotype is labeled
Human behavior is one of the topics that has captured the attention of researchers throughout the ages, and motivation is one of the manifestations of this behavior, which indicates the extent of interest in a particular topic and their unwillingness to rush towards a particular topic. Motivation is one of the important topics of interest to the teacher and coach in the field of sports. The aim of this research was to identify the level of motivation of junior students, and the differences in the dimensions of motivation of junior students in squash lessons. We used the descriptive survey method, and the research sample was chosen randomly. Only male junior students in the College of Phys
Fresh vegetables are an important part of a healthy diet. The consumption of raw vegetables without cooking or thorough washing can be a major route of transmission of parasitic infection. The goal of this study was to determine the intestinal parasitic contamination of fresh vegetables from vegetable markets in Baghdad province during different months of the year. A total of 303 samples of different vegetables were randomly selected from three wholesale markets distributed through different regions of Baghdad (East, West, and South) and were then examined by a flotation method. The present study showed that the collected vegetables were contaminated with 12 species of intestinal parasites, and the total percentage of contam
The Hartha Formation is an overburdened horizon in the X-oilfield that generates a lot of Non-Productive Time (NPT) associated with drilling mud losses. This study was conducted to investigate the loss events in this formation and to provide geological interpretations based on datasets from nine wells in the field of interest. The interpretation was based on different analyses, including wireline logs, cuttings descriptions, image logs, and analog data. Seismic and coherency data were also used to formulate the geological interpretations and calibrate them against the loss events of the Hartha Fm.
The results revealed that the upper part of the Hartha Fm. was identified as an interval capable of creating potentia