A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications

Ali H. Al-Timemy

doi:10.1186/s40537-023-00727-2

Details

Publication Date

Fri Apr 14 2023

Journal Name

Journal of Big Data

Volume

10

DOI

10.1186/s40537-023-00727-2



Abstract

Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance. Unfortunately, many applications have small or inadequate datasets for training DL frameworks. Labeled data usually has to be produced by manual labeling, which typically involves human annotators with extensive background knowledge. This annotation process is costly, time-consuming, and error-prone. A DL framework is typically fed a significant amount of labeled data so that it can learn representations automatically. In general, more data yields a better DL model, although performance is also application dependent. This issue is the main reason many applications dismiss the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models that address three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by related tips on data acquisition prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset. The survey ends with a list of applications that suffer from data scarcity; for each application, several alternatives are proposed for generating more data. The applications include Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.


Publication Date

Mon Mar 31 2025

Journal Name

The Iraqi Geological Journal

Evaluation of Machine Learning Techniques for Missing Well Log Data in Buzurgan Oil Field: A Case Study

Usama


The investigation of machine learning techniques for addressing missing well-log data has garnered considerable interest recently, especially as the oil and gas sector pursues novel approaches to improve data interpretation and reservoir characterization. Conversely, for wells that have been in operation for several years, conventional measurement techniques frequently encounter challenges related to availability, including the lack of well-log data, cost considerations, and precision issues. This study's objective is to enhance reservoir characterization by automating well-log creation using machine-learning techniques. Among the methods are multi-resolution graph-based clustering and the similarity threshold method. By using cutti


Publication Date

Mon Oct 01 2012

Journal Name

Al–bahith Al–a'alami

Dealing of the Providers of Sport Media Content with Crises : (The Department of Media of the Ministry of Youth and Sports a Model)

Providers of Sport Media

Sport Media

Ministry of Youth and Sports

Hadi


According to a 2010 report by Transparency Organization, Iraq has 200 newspapers and magazines, 67 radio stations, and 45 satellite TV channels. These figures grow in a matter of days or weeks rather than months or years. This fact confirms the importance of studying content providers, especially providers of youth sports content, for two reasons. The first is that young people constitute the largest share of Iraqi society, with all the potential that entails for shaping the future. The second is that sport has for years been an important pillar of people's lives, not only as entertainment, as it was seen in the past; rather, sport has an influential presence in politi


Publication Date

Fri Mar 23 2018

Journal Name

Entropy

Methods and Challenges in Shot Boundary Detection: A Review

Sadiq

Abd

M.

Basheera

Syed

Wissam


Publication Date

Tue Dec 20 2022

Journal Name

2022 International Conference on Computer and Applications (ICCA)

Improve Data Mining Techniques with a High-Performance Cluster

Fadhil H.M.


Publication Date

Thu Dec 01 2022

Journal Name

International Journal of Electrical and Computer Engineering (IJECE)

A survey on bio-signal analysis for human-robot interaction

Bio-signals

Health care

Human-robot interaction

Huda

Alia


The use of bio-signal analysis in human-robot interaction (HRI) is rapidly increasing. There is an urgent demand for it in various applications, including health care, rehabilitation, research, technology, and manufacturing. Despite several state-of-the-art bio-signal analyses in HRI research, it is unclear which one is the best. This paper discusses the following topics: first, why robotic systems should be given priority in the rehabilitation and assistance of amputees and disabled people; second, the domains of feature extraction approaches now in use, which are divided into three main categories (time, frequency, and time-frequency). The various domains will be discussed, then a discussion of e


Publication Date

Sun Jan 01 2023

Journal Name

International Journal of Data and Network Science

The effects of big data, artificial intelligence, and business intelligence on e-learning and business performance: Evidence from Jordanian telecommunication firms

Ahmad

Rami

Firas


This study sought to investigate the impacts of big data, artificial intelligence (AI), and business intelligence (BI) on firms' e-learning and business performance in the Jordanian telecommunications industry. After the samples were checked, a total of 269 were collected. All of the information gathered throughout the investigation was analyzed using the PLS software. The results show that a network of interconnections can improve both e-learning and corporate effectiveness. This research concluded that the integration of big data, AI, and BI has a positive impact on e-learning infrastructure development and organizational efficiency. The findings indicate that big data has a positive and direct impact on business performance, including Big


Publication Date

Tue Oct 23 2018

Journal Name

Journal of Economics and Administrative Sciences

Processing of missing values in survey data using Principal Component Analysis and probabilistic Principal Component Analysis methods

قتيبة نبيل

بشرى رحيم


The idea of carrying out research on incomplete data came from the circumstances of our dear country and the horrors of war, which resulted in the loss of many important data in all aspects of life: economic, natural, health, scientific, and so on. The reasons data go missing vary: some lie outside the control of those concerned, while others are deliberate, planned because of cost, risk, or a lack of means for inspection. The missing data in this study were processed with Principal Component Analysis and self-organizing map methods through simulation. The variables of child health and variables affecting children's health were taken into account: breastfeed


Publication Date

Fri Nov 18 2022

Journal Name

International Journal of Nanoscience

Plasma Production and Applications: A Review

N.

Nisreen kh.

Adnan Qahtan


Plasma, the fourth state of matter, is found in large amounts across our galaxy and other galaxies. Of the four states of matter in the cosmos, plasma is the most common. It is formed by heating compressed air or inert gases, which are electrically neutral in their natural state, to create negatively and positively charged particles known as ions. Many scientists are currently focusing their efforts on the development of artificial plasma and the possible advantages it may have for humankind in the near future. In the literature, there is a scarcity of information regarding plasma applications. The goal of this review is to describe particular methods for creating and using plasma, which may be us


Publication Date

Wed Dec 01 2021

Journal Name

Computers & Electrical Engineering

Utilizing different types of deep learning models for classification of series arc in photovoltaics systems

Alaa Hamza

Dalila Mat

Siti Maherah

Sadiq H.

Haidar


Publication Date

Mon Apr 03 2023

Journal Name

Journal of Al-Qadisiyah for Computer Science and Mathematics

A General Overview on the Categories of Image Features Extraction Techniques: A Survey

Pixel-level feature

local feature

global feature

features detection

features description

edge

corner

blob or region

Rafal


In image processing and computer vision, it is important to represent an image by its information. Image information comes from the image’s features, which are extracted using feature detection/extraction techniques and feature description. Features in computer vision define informative data. The human eye readily extracts information from a raw image, but a computer cannot recognize image information directly. This is why various feature extraction techniques have been presented and have progressed rapidly. This paper presents a general overview of the feature extraction categories for images.
