Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance. Unfortunately, many applications have small or inadequate data to train DL frameworks. Usually, manual labeling is needed to provide labeled data, which typically involves human annotators with a vast background of knowledge. This annotation process is costly, time-consuming, and error-prone. Usually, every DL framework is fed by a significant amount of labeled data to automatically learn representations. Ultimately, a larger amount of data would generate a better DL model and its performance is also application dependent. This issue is the main barrier for many applications dismissing the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey on state-of-the-art techniques to deal with training DL models to overcome three challenges including small, imbalanced datasets, and lack of generalization. This survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to address the issue of lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). Then, these solutions were followed by some related tips about data acquisition needed prior to training purposes, as well as recommendations for ensuring the trustworthiness of the training dataset. The survey ends with a list of applications that suffer from data scarcity, several alternatives are proposed in order to generate more data in each application including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical system, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview on strategies to tackle data scarcity in DL.
This paper proposes a better solution for EEG-based brain language signals classification, it is using machine learning and optimization algorithms. This project aims to replace the brain signal classification for language processing tasks by achieving the higher accuracy and speed process. Features extraction is performed using a modified Discrete Wavelet Transform (DWT) in this study which increases the capability of capturing signal characteristics appropriately by decomposing EEG signals into significant frequency components. A Gray Wolf Optimization (GWO) algorithm method is applied to improve the results and select the optimal features which achieves more accurate results by selecting impactful features with maximum relevance
... Show MoreBackground: Appreciation of the crucial role of risk factors in the development of coronary artery disease (CAD) is one of the most significant advances in the understanding of this important disease. Extensive epidemiological research has established cigarette smoking, diabetes, hyperlipidemia, and hypertension as independent risk factors for CADObjective: To determine the prevalence of the 4 conventional risk factors(cigarette smoking, diabetes, hyperlipidemia, and hypertension) among patients with CAD and to determine the correlation of Thrombolysis in Myocardial Infarction (TIMI) risk score with the extent of coronary artery disease (CAD) in patients with unstable angina /non ST elevation myocardial infarction (UA/NSTEMI).Methods: We
... Show MoreDiabetes is one of the increasing chronic diseases, affecting millions of people around the earth. Diabetes diagnosis, its prediction, proper cure, and management are compulsory. Machine learning-based prediction techniques for diabetes data analysis can help in the early detection and prediction of the disease and its consequences such as hypo/hyperglycemia. In this paper, we explored the diabetes dataset collected from the medical records of one thousand Iraqi patients. We applied three classifiers, the multilayer perceptron, the KNN and the Random Forest. We involved two experiments: the first experiment used all 12 features of the dataset. The Random Forest outperforms others with 98.8% accuracy. The second experiment used only five att
... Show MoreThe work in this paper involves the planning, design and implementation of a mobile learning system called Nahrain Mobile Learning System (NMLS). This system provides complete teaching resources, which can be accessed by the students, instructors and administrators through the mobile phones. It presents a viable alternative to Electronic learning. It focuses on the mobility and flexibility of the learning practice, and emphasizes the interaction between the learner and learning content. System users are categorized into three categories: administrators, instructors and students. Different learning activities can be carried out throughout the system, offering necessary communication tools to allow the users to communicate with each other
... Show MoreIn the last two decades, arid and semi-arid regions of China suffered rapid changes in the Land Use/Cover Change (LUCC) due to increasing demand on food, resulting from growing population. In the process of this study, we established the land use/cover classification in addition to remote sensing characteristics. This was done by analysis of the dynamics of (LUCC) in Zhengzhou area for the period 1988-2006. Interpretation of a laminar extraction technique was implied in the identification of typical attributes of land use/cover types. A prominent result of the study indicates a gradual development in urbanization giving a gradual reduction in crop field area, due to the progressive economy in Zhengzhou. The results also reflect degradati
... Show MoreThis paper aims at the analytical level to know the security topics that were used with data journalism, and the expression methods used in the statements of the Security Media Cell, as well as to identify the means of clarification used in data journalism. About the Security Media Cell, and the methods preferred by the public in presenting press releases, especially determining the strength of the respondents' attitude towards the data issued by the Security Media Cell. On the Security Media Cell, while the field study included the distribution of a questionnaire to the public of Baghdad Governorate. The study reached several results, the most important of which is the interest of the security media cell in presenting its data in differ
... Show MoreThe research discusses the need to find the innovative structures and methodologies for developing Human Capital (HC) in Iraqi Universities. One of the most important of these structures is Communities of Practice (CoPs) which contributes to develop HC by using learning, teaching and training through the conversion speed of knowledge and creativity into practice. This research has been used the comparative approach through employing the methodology of Data Envelopment Analysis (DEA) by using (Excel 2010 - Solver) as a field evidence to prove the role of CoPs in developing HC. In light of the given information, a researcher adopted on an archived preliminary data about (23) colleges at Mosul University as a deliberate sample for t
... Show MoreBackground: Generally, genetic disorders are a leading cause of spontaneous abortion, neonatal death, increased morbidity and mortality in children and adults as well. They a significant health care and psychosocial burden for the patient, the family, the healthcare system and the community as a whole. Chromosomal abnormalities occur much more frequently than is generally appreciated. It is estimated that approximately 1 of 200 newborn infants had some form of chromosomal abnormality. The figure is much higher in fetuses that do not survive to term. It is estimated that in 50% of first trimester abortions, the fetus has a chromosomal abnormality. Aim of the study: This study aims to shed some light on the results of chromosomal studies per
... Show MoreNoor oil field is one of smallest fields in Missan province. Twelve well penetrates the Mishrif Formation in Noor field and eight of them were selected for this study. Mishrif formation is one of the most important reservoirs in Noor field and it consists of one anticline dome and bounded by the Khasib formation at the top and the Rumaila formation at the bottom. The reservoir was divided into eight units separated by isolated units according to partition taken by a rounding fields.
In this paper histograms frequency distribution of the porosity, permeability, and water saturation were plotted for MA unit of Mishrif formation in Noor field, and then transformed to the normal distribution by applying the Box-Cox transformation alg
... Show More