Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance, yet many applications have too little data to train a DL framework. Labeled data usually comes from manual annotation by human annotators with extensive domain knowledge, a process that is costly, time-consuming, and error-prone. Since a DL framework learns representations automatically from the labeled data it is fed, more data generally yields a better model, although performance also depends on the application. This issue is the main barrier that leads many applications to dismiss DL; having sufficient data is the first step toward any successful and trustworthy DL application. This paper presents a holistic survey of state-of-the-art techniques for training DL models under three challenges: small datasets, imbalanced datasets, and lack of generalization. The survey starts by listing the learning techniques, then introduces the types of DL architectures. After that, it lists state-of-the-art solutions to the lack of training data, such as Transfer Learning (TL), Self-Supervised Learning (SSL), Generative Adversarial Networks (GANs), Model Architecture (MA), Physics-Informed Neural Networks (PINNs), and the Deep Synthetic Minority Oversampling Technique (DeepSMOTE). These solutions are followed by tips on data acquisition prior to training, as well as recommendations for ensuring the trustworthiness of the training dataset. The survey ends with a list of applications that suffer from data scarcity, proposing several alternatives for generating more data in each, including Electromagnetic Imaging (EMI), Civil Structural Health Monitoring, Medical Imaging, Meteorology, Wireless Communications, Fluid Mechanics, Microelectromechanical Systems, and Cybersecurity. To the best of the authors' knowledge, this is the first review that offers a comprehensive overview of strategies to tackle data scarcity in DL.
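As a concrete illustration of the first remedy listed above, the sketch below fine-tunes a pretrained image model on a small target dataset. It is a minimal transfer-learning example in PyTorch, not the survey's method: the backbone choice (ResNet-18), the class count, and the dummy batch are all illustrative assumptions.

```python
# Minimal transfer-learning sketch (hypothetical model/classes, dummy batch).
import torch
import torch.nn as nn
from torchvision import models

num_classes = 5  # hypothetical number of target classes

# Start from ImageNet-pretrained weights and freeze the feature extractor.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False

# Replace the head; only this small layer is trained on the scarce data.
model.fc = nn.Linear(model.fc.in_features, num_classes)

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random stand-in batch.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, num_classes, (8,))
optimizer.zero_grad()
loss = criterion(model(images), labels)
loss.backward()
optimizer.step()
```

Freezing the backbone means only the small classification head is fit to the scarce data, which is what makes TL attractive when labels are few.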
Text categorization refers to the process of grouping text or documents into classes or categories according to their content. The text categorization process consists of three phases: preprocessing, feature extraction, and classification. In comparison to English, only a few studies have addressed categorizing and classifying Arabic text. For a variety of applications, such as text classification and clustering, Arabic text representation is a difficult task because the Arabic language is noted for its richness, diversity, and complicated morphology. This paper presents a comprehensive analysis and comparison of research from the last five years based on the dataset, year, algorithms, and the accuracy they achieved.
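As a minimal sketch of the three-phase pipeline described above (preprocessing, feature extraction, classification), the following scikit-learn example classifies a tiny invented Arabic corpus; real systems would add Arabic-specific preprocessing such as normalization and stemming, which this sketch omits.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny invented Arabic corpus (weather / sports / finance examples).
docs = [
    "الطقس اليوم مشمس",        # "the weather today is sunny"
    "فاز الفريق في المباراة",   # "the team won the match"
    "انخفضت الأسهم اليوم",      # "stocks fell today"
]
labels = ["weather", "sports", "finance"]

clf = Pipeline([
    ("tfidf", TfidfVectorizer()),  # feature extraction: tokens -> TF-IDF vectors
    ("nb", MultinomialNB()),       # classification
])
clf.fit(docs, labels)
print(clf.predict(["هبطت أسعار الأسهم"]))  # "share prices dropped" -> expected: finance
```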
Distributed Denial of Service (DDoS) attacks on Web-based services have grown in both number and sophistication with the rise of advanced wireless technology and modern computing paradigms, and detecting these attacks in the sea of communication packets is very important. Early DDoS attacks were mostly directed at the network and transport layers, but over the past few years attackers have shifted their strategies toward the application layer. Application-layer attacks can be more harmful and stealthier because attack traffic cannot be told apart from normal traffic flows. Distributed attacks are hard to fight because they can affect real computing resources as well as network bandwidth.
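To make the detection problem concrete, here is a toy rate-based heuristic that flags sources sending too many requests in a sliding window. The threshold, window, and log format are invented for illustration; as the abstract notes, application-layer attacks often mimic normal traffic, so real detectors rely on far richer behavioral features.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 10   # sliding window length (assumed)
MAX_REQUESTS = 100    # hypothetical per-source limit within the window

history = defaultdict(deque)  # source IP -> timestamps of its recent requests

def is_suspicious(src_ip, now):
    """Record one request and report whether the source exceeds the rate limit."""
    q = history[src_ip]
    q.append(now)
    while q and now - q[0] > WINDOW_SECONDS:  # evict events outside the window
        q.popleft()
    return len(q) > MAX_REQUESTS

# Simulate a burst: 200 requests from one source within ~4 seconds.
flagged = any(is_suspicious("10.0.0.1", now=t / 50) for t in range(200))
print("suspicious:", flagged)  # True
```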
Artificial intelligence techniques reach us in several forms; some are useful, but they can also be exploited in ways that harm us. One such form is deepfakes, which completely modify video (or image) content to display something that was not originally in it. Deepfake technology endangers society by eroding confidence in everything that is published. Therefore, in this paper, we focus on deepfake detection technology from the view of two concepts: deep learning and forensic tools. The purpose of this survey is to give the reader a deeper overview of i) the environment of deepfake creation and detection, and ii) how deep learning and forensic tools contribute to detecting deepfakes.
The dramatic decrease in the cost of genome sequencing over the last two decades has led to an abundance of genomic data, which has been used in research on discovering genetic diseases and producing medicines. At the same time, the large storage footprint of a genome (2–3 GB) has made it one of the most important sources of big data, prompting genetic research centers to take advantage of the cloud and its services for storing and managing this data. The cloud is a shared storage environment, which makes data stored in it vulnerable to unwanted tampering or disclosure. This leads to serious concerns about securing such data from tampering and unauthorized disclosure.
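One standard answer to the tampering and disclosure concern raised above is client-side encryption before upload, so the cloud never sees plaintext. A minimal sketch using the third-party cryptography package follows; the key handling and file contents are illustrative assumptions, not the paper's scheme.

```python
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # must be kept secret, outside the cloud
cipher = Fernet(key)

plaintext = b"ACGTACGTAC..."       # stand-in bytes for a genome file
token = cipher.encrypt(plaintext)  # authenticated encryption (AES-CBC + HMAC)

# Only `token` is uploaded; decryption also verifies integrity, so any
# tampering raises cryptography.fernet.InvalidToken instead of returning data.
assert cipher.decrypt(token) == plaintext
```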
This paper uses Artificial Intelligence (AI) based algorithm analysis to classify breast cancer Deoxyribonucleic Acid (DNA). The main idea is to focus on the application of machine and deep learning techniques, and a genetic algorithm is used in diagnosing gene expression to reduce the number of misclassified cancers. After patients' genetic data are entered, processing operations fill the missing values using different techniques. The best data for the classification process are chosen by combining the techniques with the genetic algorithm and comparing them in terms of accuracy.
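The following compact sketch shows genetic-algorithm feature selection of the kind the abstract describes: candidate gene subsets are encoded as bit masks, scored by hold-out classifier accuracy, and evolved by selection, crossover, and mutation. The synthetic data, classifier choice, and GA settings are all assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))           # 100 patients, 30 genes (synthetic)
y = (X[:, 0] + X[:, 3] > 0).astype(int)  # labels depend on genes 0 and 3

def fitness(mask):
    """Hold-out accuracy of a classifier trained on the selected genes."""
    if mask.sum() == 0:
        return 0.0
    Xtr, Xte, ytr, yte = train_test_split(X[:, mask.astype(bool)], y, random_state=0)
    return LogisticRegression(max_iter=1000).fit(Xtr, ytr).score(Xte, yte)

pop = rng.integers(0, 2, size=(20, X.shape[1]))  # random gene masks
for _ in range(15):                              # a few generations
    scores = np.array([fitness(m) for m in pop])
    parents = pop[np.argsort(scores)[-10:]]      # selection: keep the fittest half
    children = parents.copy()
    for i in range(0, 10, 2):                    # single-point crossover
        cut = int(rng.integers(1, X.shape[1]))
        children[i, cut:] = parents[i + 1, cut:]
        children[i + 1, cut:] = parents[i, cut:]
    flip = rng.random(children.shape) < 0.05     # mutation: flip a few bits
    children[flip] = 1 - children[flip]
    pop = np.vstack([parents, children])

best = pop[int(np.argmax([fitness(m) for m in pop]))]
print("selected genes:", np.flatnonzero(best))
```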
Thyroid disease is a common disease affecting millions worldwide. Early diagnosis and treatment can help prevent more serious complications and improve long-term health outcomes. However, diagnosing thyroid disease can be challenging due to its variable symptoms and limited diagnostic tests. By processing enormous amounts of data and spotting trends that may not be immediately evident to human doctors, Machine Learning (ML) algorithms may be capable of increasing the accuracy with which thyroid disease is diagnosed. This study seeks to discover the most recent ML-based and data-driven developments and strategies for diagnosing thyroid disease while considering the challenges of imbalanced data in thyroid disease diagnosis.
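As one concrete imbalance-handling strategy of the kind such studies survey, the sketch below uses class weighting in scikit-learn on a synthetic, skewed dataset standing in for thyroid records; the 95/5 class ratio is an invented assumption.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# 95% healthy vs 5% disease cases: plain accuracy would mislead here.
X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

# class_weight="balanced" up-weights minority-class errors during training.
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(Xtr, ytr)
print(classification_report(yte, clf.predict(Xte)))  # per-class precision/recall
```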
Realizing and understanding semantic segmentation is a stringent task, not just for computer vision but also in earth-science research. Semantic segmentation decomposes compound scenes into individual elements: the most common objects in outdoor or indoor civil scenes must be classified and then enriched with semantic information about each object. It is a method for labeling and clustering point clouds automatically. Classifying three-dimensional natural scenes requires a point cloud dataset as the input data representation, and working with 3D data raises many challenges, such as the scarcity, resolution, and accuracy of three-dimensional datasets.
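To ground the idea of per-point labeling, the sketch below classifies every point of a cloud with a shared MLP, the core building block of PointNet-style segmentation networks. Shapes and class names are illustrative, and a real segmentation model would also aggregate global scene context, which this sketch omits.

```python
import torch
import torch.nn as nn

num_classes = 4  # e.g. wall, floor, furniture, clutter (hypothetical)

# Shared MLP applied independently to every point in the cloud.
model = nn.Sequential(
    nn.Linear(3, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, num_classes),
)

cloud = torch.randn(1, 2048, 3)  # one scene of 2048 (x, y, z) points
logits = model(cloud)            # shape (1, 2048, num_classes)
labels = logits.argmax(dim=-1)   # one semantic label per point
print(labels.shape)              # torch.Size([1, 2048])
```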
The need to constantly and consistently improve the quality and quantity of the educational system is essential. E-learning has emerged from the rapid cycle of change and the expansion of new technologies. Advances in information technology have increased network bandwidth and data access speed while reducing data storage costs. In recent years, the implementation of cloud computing in educational settings has garnered the interest of major companies, leading to substantial investments in this area. Cloud computing improves engineering education by providing an environment that can be accessed from anywhere and by allowing access to educational resources on demand. Cloud computing is a term used to describe the provision of hosting services over the Internet.
In this paper, we investigate some of the most recent energy-efficient routing protocols for wireless body area networks. This technology has advanced in recent times: wireless sensors are implanted in the human body to sense and measure parameters such as temperature, heartbeat, and glucose level. These tiny wireless sensors gather body data and send it over a wireless network to the base station, where the measurements are examined by a doctor or physician and a suitable treatment is suggested. The whole communication is carried out by routing protocols in a network environment, and a routing protocol consumes energy while sustaining non-stop communication.
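To illustrate the kind of trade-off such protocols make, the toy rule below picks a next hop by balancing each sensor's residual energy against its distance to the sink. The node positions, energies, and cost weighting are invented for illustration and do not correspond to any specific protocol surveyed.

```python
import math

nodes = {  # node id -> (x, y, residual energy in joules); all values invented
    "temp":    (0.2, 0.9, 0.8),
    "heart":   (0.5, 0.5, 0.3),
    "glucose": (0.7, 0.2, 0.6),
}
sink = (1.0, 0.0)  # base station position

def cost(node):
    """Lower cost = closer to the sink and more energy left."""
    x, y, energy = nodes[node]
    dist = math.hypot(sink[0] - x, sink[1] - y)
    return dist / energy

next_hop = min(nodes, key=cost)
print("forward via:", next_hop)
```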