Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance. Unfortunately, many applications have datasets that are too small or otherwise inadequate to train DL frameworks. Labeled data usually has to be produced by manual labeling, which typically involves human annotators with extensive background knowledge. This annotation process is costly, time-consuming, and error-prone. Every DL framework is usually fed a significant amount of labeled data so that it can learn representations automatically. In general, a larger amount of data yields a better DL model, although performance is also application dependent. This issue is the main barrier that leads many applications to dismiss the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models that address three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by related tips on the data acquisition needed prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset.
The survey ends with a list of applications that suffer from data scarcity, and several alternatives are proposed to generate more data in each application. The applications include Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.
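Among the techniques named in the abstract above, DeepSMOTE builds on the classic Synthetic Minority Oversampling Technique. The sketch below illustrates only that classic interpolation step, not DeepSMOTE itself and not the authors' implementation: each synthetic sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The function name `smote_oversample`, the parameters `k` and `seed`, and the toy data are assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats) from the minority class.
    k: number of nearest minority neighbours considered (hypothetical default).
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen sample (Euclidean distance)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        # random point on the segment between the sample and its neighbour
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


if __name__ == "__main__":
    toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for point in smote_oversample(toy_minority, n_new=3, k=2):
        print(point)
```

Because every synthetic point lies between two existing minority samples, the new data stays inside the region the minority class already occupies; this is also why the approach cannot add information that the original samples lack.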
This study compares the use of some soil minerals (zeolite, bentonite, phosphate rock, and limestone) in the adsorption and release of lead, and their rates of lead removal from aqueous solutions, using adsorption equations. Two laboratory experiments were carried out, one for the adsorption and one for the release of lead. The adsorption experiment used 0.5 g of the above soil minerals. Lead was added as Pb(NO3)2 at levels of 3.0, 2.0, 1.5, 1.0, 0.5, and 0.0 mmol L⁻¹ in a solution containing 0.01 M calcium chloride. There were 72 experimental units. The concentration of dissolved lead in the equilibrium solution was estimated, and the amount of lead adsorbed was calculated. As for the lead release experiment, samples fo
The current paper proposes a new estimator for the parameters of the linear regression model under Big Data circumstances. The diversity of Big Data variables gives rise to many challenges that are of interest to researchers seeking novel methods to estimate the parameters of the linear regression model. The data were collected by the Central Statistical Organization of Iraq, and child labor in Iraq was chosen as the subject of the data. Child labor is among the most critical phenomena from which both society and education suffer, and it affects the future of the next generation. Two methods have been selected to estimate the parameter
Purpose: The research aims to explore the impact of the Business Intelligence System (BIS) and Knowledge Conversion Processes (KCP) on building a Learning Organization (LO) at KOREK Telecom Company in Baghdad city.
Design/methodology/approach: To achieve the objectives of the research, a questionnaire was developed for this purpose, and the research was then tested in the telecommunications sector, represented by one of the telecommunications companies in Baghdad city. KOREK Telecom Company was therefore chosen as the research sample, and the choice was based on the best standard international companies to serve mobile communications in terms o
Selecting the optimum perforation location is an important step in improving well production and hence the reservoir development process, especially for unconventional high-pressure formations such as the formations under study. Reservoir geomechanics is one of the key factors in finding the optimal perforation location. This study aims to detect the optimum perforation location by investigating the changes in geomechanical properties and wellbore stress for high-pressure formations and by studying how the behavior of the different stress types differs between normal and abnormal formations. The calculations are carried out by building a one-dimensional mechanical earth model using the data of four deep abnormal wells located in Southern Iraqi oil fields. The magni
Online learning is not a new concept in education, but it has been used extensively since the COVID-19 pandemic and is still in use now. Every student in the world has gone through this learning process, from the primary to the college level, with both teachers and students conducting instruction online (at home). The goal of the current study is to investigate college students’ attitudes towards online learning. To accomplish this goal, a questionnaire was developed and adjusted before being administered to a sample of 155 students. Its validity and reliability were also established. Some conclusions, recommendations, and suggestions are offered at the end.
The current study aims to examine the level of problems faced by university students in distance learning and to identify the differences in these problems in terms of the availability of internet services, gender, college, GPA, interactions, academic cohort, and family economic status. The study sample consisted of 3,172 students (57.3% females). The researchers developed a questionnaire with 32 items to measure distance learning problems in four areas: psychological (9 items), academic (10 items), technological (7 items), and study environment (6 items). The responses are scored on a 5-point Likert scale ranging from 1 (strongly disagree) to 5 (strongly agree). Means, standard deviations, and Multivariate Analysis of Vari
In this paper, we used four classification methods to classify objects and compared them: K-Nearest Neighbors (KNN), Stochastic Gradient Descent learning (SGD), the Logistic Regression algorithm (LR), and the Multi-Layer Perceptron (MLP). We used the MCOCO dataset for object classification and detection; the dataset images were randomly divided into training and testing datasets at a ratio of 7:3, respectively. The randomly selected training and testing images were converted from color to gray level, the gray images were enhanced using the histogram equalization method, and each dataset image was resized to 20 x 20. Principal component analysis (PCA) was used for feature extraction, and finally apply four classification metho
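The preprocessing steps named in the abstract above (gray-level conversion, histogram equalization, and a random 7:3 split) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the luma weights, the function names `to_gray`, `equalize_histogram`, and `split_7_3`, and the `seed` parameter are all hypothetical, and the resizing, PCA, and classifier stages are omitted.

```python
import random


def to_gray(rgb_image):
    """Convert rows of (r, g, b) tuples to 8-bit gray levels.

    Uses the common 0.299/0.587/0.114 luma weights (an assumption; the
    abstract does not say which conversion was used).
    """
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb_image]


def equalize_histogram(gray, levels=256):
    """Spread gray levels over the full range using the cumulative histogram."""
    flat = [p for row in gray for p in row]
    n = len(flat)
    hist = [0] * levels
    for p in flat:
        hist[p] += 1
    # cumulative distribution of pixel counts
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # constant image: nothing to equalize
        return [row[:] for row in gray]
    scale = (levels - 1) / (n - cdf_min)
    lookup = [max(0, int(round((c - cdf_min) * scale))) for c in cdf]
    return [[lookup[p] for p in row] for row in gray]


def split_7_3(items, seed=0):
    """Randomly split items into training and testing lists at a 7:3 ratio."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    tiny = [[(10, 10, 10), (10, 10, 10)], [(120, 120, 120), (200, 200, 200)]]
    print(equalize_histogram(to_gray(tiny)))
    print(split_7_3(list(range(10))))
```

In a fuller pipeline, each equalized image would be resized and flattened into a feature vector before PCA and the classifiers are applied.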
Television white spaces (TVWSs) refer to the unused part of the spectrum under the very high frequency (VHF) and ultra-high frequency (UHF) bands. TVWSs are frequencies licenced to primary users (PUs) that are not being used and are available for secondary users (SUs). There are several ways of implementing TVWS in communications, one of which is the use of a TVWS database (TVWSDB). The primary purpose of a TVWSDB is to protect PUs from interference by SUs. There are several geolocation databases available for this purpose. However, it is unclear whether those databases have the prediction feature that gives a TVWSDB the capability of decreasing the number of inquiries from SUs. With this in mind, the authors present a reinforcement learning-ba
Machine learning (ML) is a key component within the broader field of artificial intelligence (AI) that employs statistical methods to empower computers with the ability to learn and make decisions autonomously, without the need for explicit programming. It is founded on the concept that computers can acquire knowledge from data, identify patterns, and draw conclusions with minimal human intervention. The main categories of ML include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning. Supervised learning involves training models using labelled datasets and comprises two primary forms: classification and regression. Regression is used for continuous output, while classification is employed