A survey on deep learning tools dealing with data scarcity: definitions, challenges, solutions, tips, and applications

Ali H. Al-Timemy

doi:10.1186/s40537-023-00727-2

Details

Publication Date

Fri Apr 14 2023

Journal Name

Journal Of Big Data

Volume

10

DOI

10.1186/s40537-023-00727-2


...Show More Authors

Abstract

Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance, yet many applications have too little data to train DL frameworks. Labeled data usually has to be produced by manual labeling, which typically requires human annotators with extensive background knowledge. This annotation process is costly, time-consuming, and error-prone. A DL framework is usually fed a significant amount of labeled data so that it can learn representations automatically. In general, a larger amount of data yields a better DL model, although performance is also application dependent. This issue is the main reason many applications dismiss the use of DL. Having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models under three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques. Next, the types of DL architectures are introduced. After that, state-of-the-art solutions to the lack of training data are listed, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Network (PINN), and Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by tips on data acquisition prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset. The survey ends with a list of applications that suffer from data scarcity, and several alternatives are proposed to generate more data in each application, including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors’ knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.


Publication Date

Mon Jun 01 2009

Journal Name

Journal Of Economics And Administrative Sciences

Estimation of the average sample size and defective ratio In a finite individualized inspection with a practical application

اسامة محمد

...Show More Authors

The purpose of this research is to find the estimator of the average proportion of defectives based on attribute samples that have been curtailed either with rejection of a lot on finding the kth defective or with acceptance on finding the kth non-defective.

The maximum likelihood estimator (MLE) is derived. The ASN (average sample number) in single curtailed sampling is also derived, and a simplified formula is obtained. All the notation needed is explained.


Publication Date

Sat Aug 01 2020

Journal Name

Heat Transfer

A parametric study of a photovoltaic panel with cylindrical fins under still and moving air conditions in Iraq

Ammar A.

Mustafa

Ahmed

...Show More Authors


Publication Date

Wed Jul 01 2015

Journal Name

Journal Of Engineering

Cathodic Protection for Above Ground Storage Tank Bottom Using Data Acquisition

cathodic protection

impressed current

above ground tank bottom

control

DAQ.

Naseer Abbood Issa

...Show More Authors

Computer-controlled impressed current cathodic protection offers an ideal response to changes in environmental factors and to long-term coating degradation. The protection potential distribution and the current demand on the anode can be regulated to meet the protection criteria, so that the system is protected effectively.

In this paper, the cathodic protection of an above ground steel storage tank was investigated using an impressed current cathodic protection system with controlled potential to manage the variation in soil resistivity. A corrosion controller was implemented for the above ground tank in LabVIEW, where the tank's bottom-to-soil potential was manipulated to the desired set poi


Publication Date

Fri Jun 01 2018

Journal Name

International Journal Of Computer Science Trends And Technology

Secure Video Data Deduplication in the Cloud Storage Using Compressive Sensing

Video Deduplication

Compressive Sensing

Cloud Computing

video Compression

Qutaiba Mumtaz

Dr. K. Gangadhara

Dr.B.Basaveswara

...Show More Authors

Cloud storage provides scalable, low-cost resources that benefit from economies of scale through a cross-user architecture. Data storage is the most important cloud service. As the amount of outsourced data grows explosively, data deduplication, a technique that eliminates data redundancy, becomes essential. To protect the privacy of the data owner, data are stored in the cloud in encrypted form. However, encrypted data introduce new challenges for cloud data deduplication: traditional deduplication schemes cannot work on encrypted data, and existing solutions for encrypted data deduplication suffer from security weaknesses. This paper proposes a combined compressive sensing and video deduplication to maximize


Publication Date

Wed Jan 01 2014

Journal Name

Proceedings Of The Aintec 2014 On Asian Internet Engineering Conference - Aintec '14

LTE Peak Data Rate Estimation Using Modified alpha-Shannon Capacity Formula

Bin-Salem A.A.

imad j. mohammed

...Show More Authors


Publication Date

Mon Aug 01 2022

Journal Name

Baghdad Science Journal

Perceptually Important Points-Based Data Aggregation Method for Wireless Sensor Networks

Data Aggregation

Energy-Saving

Perceptually Important Points (PIP)

Wireless Sensor Network.

Iman Dakhil Idan

Ali Kadhum M.

...Show More Authors

Transmitting and receiving data consume the most resources in Wireless Sensor Networks (WSNs). The energy supplied by a sensor node's battery is the most important resource affecting the lifespan of a WSN. Because sensor nodes run on limited batteries, saving energy is necessary. Data aggregation is a procedure for eliminating redundant transmissions; it provides fused information to the base stations, which in turn improves energy efficiency and increases the lifespan of energy-constrained WSNs. In this paper, a Perceptually Important Points Based Data Aggregation (PIP-DA) method for Wireless Sensor Networks is suggested to reduce redundant data before sending them to the


Publication Date

Sat Aug 01 2015

Journal Name

2015 Ieee Conference On Computational Intelligence In Bioinformatics And Computational Biology (cibcb)

Granular computing approach for the design of medical data classification systems

M.

...Show More Authors


Publication Date

Mon Oct 09 2023

Journal Name

2023 Ieee 34th International Symposium On Software Reliability Engineering Workshops (issrew)

Semantics-Based, Automated Preparation of Exploratory Data Analysis for Complex Systems

Noor

Attila

Imre

...Show More Authors


Publication Date

Mon Dec 25 2023

Journal Name

Ieee Access

ITor-SDN: Intelligent Tor Networks-Based SDN for Data Forwarding Management

Anonymity

blockchain

ML

SDN

Tor networks

Fouad A.

Nahlah Abdulrahman

Hamed S.

...Show More Authors

The Tor (The Onion Router) network was designed to enable users to browse the Internet anonymously. It is known for the anonymity and privacy it provides against the many agents who wish to observe where users are or to track their browsing habits. This anonymity stems from the encryption and decryption of Tor traffic. That is, the client’s traffic must be encrypted before sending and decrypted after receiving, which leads to delay and even interruption in data flow. The exchange of cryptographic keys between network devices plays a pivotal role in facilitating secure communication and ensuring the integrity of cryptographic procedures. This essential process is time-consuming, which causes del


Publication Date

Fri Apr 26 2019

Journal Name

Journal Of Contemporary Medical Sciences

Breast Cancer Decisive Parameters for Iraqi Women via Data Mining Techniques

CA 15-3

CEA

Breast Cancer

Saliva

MLP

SLR

J48

data mining

OneR

Iraq

Suhad Faisal

Mustafa S.

Iyden Kamil

Maha Mohammed

...Show More Authors

Objective: This research investigates real breast cancer data for Iraqi women, acquired manually from several Iraqi hospitals for the early detection of breast cancer. Data mining techniques are used to discover hidden knowledge, unexpected patterns, and new rules from the dataset, which contains a large number of attributes. Methods: Data mining techniques handle redundant or simply irrelevant attributes in order to discover interesting patterns. The dataset is processed on the Weka (Waikato Environment for Knowledge Analysis) platform. The OneR technique is used as a machine learning classifier to evaluate the worth of each attribute with respect to the class value. Results: The evaluation is performed using
