Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance, yet many applications have too little data to train DL frameworks. Labeled data usually comes from manual labeling by human annotators with extensive background knowledge, a process that is costly, time-consuming, and error-prone. Every DL framework is typically fed a significant amount of labeled data so that it can learn representations automatically. In general, more data yields a better DL model, although performance also depends on the application. This issue is the main reason many applications dismiss the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models that address three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by tips on data acquisition prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset.
The survey ends with a list of applications that suffer from data scarcity, and several alternatives are proposed to generate more data in each application, including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.
mixtures of cyclohexane + n-decane and cyclohexane + 1-pentanol have been measured at 298.15, 308.15, 318.15, and 328.15 K over the whole mole fraction range. From these results, excess molar volumes, V^E, have been calculated and fitted to the Flory equations. The V^E values are negative and positive over the whole mole fraction range and at all temperatures. The excess refractive indices n^E and excess viscosities η^E have been calculated from experimental refractive index and viscosity measurements at different temperatures and fitted to the mixing-rule equations and the Heric-Coursey equation, respectively, to predict theoretical refractive indices; good agreement was found between them for the binary mixtures in this study. The variation of th
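As an illustration of how an excess molar volume is usually obtained, the standard definition for a binary mixture is sketched below. It assumes that the measured quantities include the mixture density and the pure-component densities, which the truncated abstract does not confirm. The symbols x_i (mole fraction), M_i (molar mass), ρ_i (pure-component density), and ρ (mixture density) are introduced here for illustration only.

```latex
V^{E} = \frac{x_1 M_1 + x_2 M_2}{\rho}
        - \frac{x_1 M_1}{\rho_1}
        - \frac{x_2 M_2}{\rho_2},
\qquad x_1 + x_2 = 1
```

A negative V^E means the mixture occupies less volume than the ideal mixture of its components, and a positive V^E means it occupies more.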
Background: Bilateral cleft lip deformity is much more difficult to correct than unilateral cleft lip deformity. The complexity of the deformity and the sensitive relationships between the arrangement of the muscles and the characteristics of the external lip necessitate a comprehensive preoperative plan for management. The purpose of this study was to evaluate the repair of bilateral cleft lip using the Byrd modification of the traditional Millard and Manchester methods. A key component of this repair technique is reconstruction of the central tubercle.
Methods: Fourteen patients with a mean age of 5.7 months presented with bilateral cleft lip deformity and were operated on using a mod
The aim of the research is to use data envelopment analysis (DEA) to evaluate the performance efficiency of the eight branches of the General Tax Authority located in Baghdad: Karrada, Karkh parties, Karkh Center, Dora, Bayaa, Kadhimiya, New Baghdad, and Rusafa. The inputs are the number of non-accountable taxpayers, broken down by category: professions and commercial business, deduction, transfer of property ownership, real estate, and tenders. The outputs are determined according to a checklist that contains nine dimensions to assess the efficiency of the performance of the investigated branches by investing their available resources T
These days, it is crucial to discern between different types of human behavior, and artificial intelligence techniques play a big part in that. The characteristics of the feedforward artificial neural network (FANN) algorithm and the genetic algorithm have been combined to create a working mechanism that aids in this field. The proposed system can be used for essential tasks such as analysis, automation, control, and recognition. Crossover and mutation are the two primary mechanisms used by the genetic algorithm in the proposed system to replace the backpropagation process in the ANN. While the feedforward artificial neural network technique is focused on input processing, this should be based on the proce
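The idea of replacing backpropagation with crossover and mutation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the XOR task, the 2-2-1 network shape, the population size, the mutation rate, and all function names are hypothetical choices made for this example.

```python
import math
import random

# Hypothetical toy task (XOR); the source abstract does not name a dataset.
DATA = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0), ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
N_HIDDEN = 2
# Each hidden unit has a bias and 2 input weights; the output unit has a bias
# and one weight per hidden unit.
N_WEIGHTS = 3 * N_HIDDEN + (N_HIDDEN + 1)


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, x):
    """Feedforward pass of a 2-N_HIDDEN-1 network with flat weight list w."""
    hidden = []
    for h in range(N_HIDDEN):
        b, w1, w2 = w[3 * h:3 * h + 3]
        hidden.append(sigmoid(b + w1 * x[0] + w2 * x[1]))
    out = w[3 * N_HIDDEN:]
    return sigmoid(out[0] + sum(o * h for o, h in zip(out[1:], hidden)))


def error(w):
    """Mean squared error over the toy data; lower is fitter."""
    return sum((forward(w, x) - y) ** 2 for x, y in DATA) / len(DATA)


def crossover(a, b, rng):
    """Single-point crossover of two weight vectors."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(w, rng, rate=0.2, scale=0.5):
    """Add Gaussian noise to each weight with probability `rate`."""
    return [g + rng.gauss(0.0, scale) if rng.random() < rate else g for g in w]


def evolve(generations=200, pop_size=40, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-1.0, 1.0) for _ in range(N_WEIGHTS)] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=error)
        history.append(error(pop[0]))
        parents = pop[:pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 2)
        ]
        pop = pop[:2] + children  # elitism: the two best survive unchanged
    best = min(pop, key=error)
    history.append(error(best))
    return best, history


if __name__ == "__main__":
    best, history = evolve()
    print("initial best error:", history[0])
    print("final best error:", history[-1])
```

No gradients are computed anywhere: the network is only ever run forward, and the weights improve through selection, crossover, and mutation. Because the two fittest individuals are carried over unchanged, the best error never increases from one generation to the next.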
Quantitative real-time Polymerase Chain Reaction (RT-qPCR) has become a valuable molecular technique in biomedical research. The selection of suitable endogenous reference genes is necessary for normalization of target gene expression in RT-qPCR experiments. The aim of this study was to determine the suitability of 18S rRNA and ACTB as internal control genes for normalization of RT-qPCR data in some human cell lines transfected with small interfering RNA (siRNA). Four cancer cell lines (MCF-7, T47D, MDA-MB-231, and HeLa), along with HEK293 representing an embryonic cell line, were depleted of E2F6 using siRNA specific for E2F6 and compared to negative control cells, which were transfected with siRNA not specific for any gene. Us
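To make the role of a reference gene concrete, one widely used normalization scheme is the comparative threshold-cycle (2^-ΔΔCt) calculation, sketched below. The abstract does not state which normalization method the study used, so this is an assumption for illustration only. It also assumes roughly 100% amplification efficiency. C_t denotes the threshold cycle, "target" would be E2F6, and "ref" would be a candidate reference gene such as 18S rRNA or ACTB.

```latex
\Delta C_t = C_t^{\text{target}} - C_t^{\text{ref}}
\\
\Delta\Delta C_t = \Delta C_t^{\text{siRNA-treated}} - \Delta C_t^{\text{negative control}}
\\
\text{relative expression} = 2^{-\Delta\Delta C_t}
```

The calculation is only meaningful if the reference gene's C_t is itself unaffected by the treatment, which is why the suitability of each candidate reference gene has to be checked.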
Recommender systems are tools for making sense of the huge amount of data available on the internet. Collaborative filtering (CF) is one of the most successfully used knowledge discovery methods in recommender systems. Memory-based collaborative filtering uses facts about existing users to predict new items for the target user. Similarity measures are the core operations in collaborative filtering, and the prediction accuracy depends mostly on the similarity calculations. In this study, weighted parameters are combined with traditional similarity measures to calculate the relationships among users over the MovieLens rating matrix. The advantages and disadvantages of each measure are identified. From the study, a n
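The role of similarity measures in memory-based collaborative filtering can be illustrated with a small sketch. It shows two traditional measures (cosine and Pearson) computed over co-rated items, and a mean-centred neighbour prediction. The toy ratings, user names, and function names are hypothetical, and the weighted combination proposed in the study is not reproduced here because the abstract does not specify it.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not taken from MovieLens.
ratings = {
    "u1": {"A": 5, "B": 3, "C": 4},
    "u2": {"A": 4, "B": 2, "C": 5, "D": 4},
    "u3": {"A": 1, "B": 5, "D": 2},
}


def co_rated(r, u, v):
    """Items rated by both users, in a fixed order."""
    return sorted(set(r[u]) & set(r[v]))


def cosine_sim(r, u, v):
    """Cosine similarity over co-rated items."""
    items = co_rated(r, u, v)
    if not items:
        return 0.0
    dot = sum(r[u][i] * r[v][i] for i in items)
    norm_u = sqrt(sum(r[u][i] ** 2 for i in items))
    norm_v = sqrt(sum(r[v][i] ** 2 for i in items))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def pearson_sim(r, u, v):
    """Pearson correlation over co-rated items."""
    items = co_rated(r, u, v)
    if len(items) < 2:
        return 0.0
    mean_u = sum(r[u][i] for i in items) / len(items)
    mean_v = sum(r[v][i] for i in items) / len(items)
    num = sum((r[u][i] - mean_u) * (r[v][i] - mean_v) for i in items)
    den = sqrt(sum((r[u][i] - mean_u) ** 2 for i in items)) * sqrt(
        sum((r[v][i] - mean_v) ** 2 for i in items)
    )
    return num / den if den else 0.0


def predict(r, user, item, sim=pearson_sim):
    """Mean-centred, similarity-weighted average over users who rated `item`."""
    mean_user = sum(r[user].values()) / len(r[user])
    num = den = 0.0
    for other in r:
        if other == user or item not in r[other]:
            continue
        s = sim(r, user, other)
        mean_other = sum(r[other].values()) / len(r[other])
        num += s * (r[other][item] - mean_other)
        den += abs(s)
    return mean_user + num / den if den else mean_user


if __name__ == "__main__":
    print("pearson(u1, u2):", pearson_sim(ratings, "u1", "u2"))
    print("cosine(u1, u2):", cosine_sim(ratings, "u1", "u2"))
    print("predicted rating of D for u1:", predict(ratings, "u1", "D"))
```

Swapping the `sim` argument of `predict` is enough to compare how different similarity measures change the predicted rating, which is the kind of comparison the study describes.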
Administrative procedures in various organizations produce numerous crucial records and data. These records and data are also used in other processes like customer relationship management and accounting operations. It is incredibly challenging to use and extract valuable and meaningful information from these data and records because they are frequently enormous and continuously growing in size and complexity. Data mining is the act of sorting through large data sets to find patterns and relationships that might aid in the data analysis process of resolving business issues. Using data mining techniques, enterprises can forecast future trends and make better business decisions. The Apriori algorithm has bee
Recent developments in information and communications technology, and the accompanying developments in the global market, have driven users of accounting information to demand more sophistication in corporate financial reporting systems, which led to the emergence of a new type of reporting: financial reporting in real time. Information and communications technology is the mainstay of nations' development and progress. Thanks to technological development, information is transmitted easily and at high speed to all who need it; communication is instantaneous, and the flow of information via the internet has dramatically exceeded temporal and spatial borders anywhere in the w