Beyond the immediate content of speech, the voice can provide rich information about a speaker's demographics, including age and gender. Estimating a speaker's age and gender offers a wide range of applications, spanning from voice forensic analysis to personalized advertising, healthcare monitoring, and human-computer interaction. However, pinpointing precise age remains intricate due to age ambiguity. Specifically, utterances from individuals at adjacent ages are frequently indistinguishable. Addressing this, we propose a novel, end-to-end approach that deploys Mozilla's Common Voice dataset to transform raw audio into high-quality feature representations using Wav2Vec2.0 embeddings. These are then channeled into our self-attention-based convolutional neural network (CNN) model. To address age ambiguity, we evaluate the effects of different loss functions such as focal loss and Kullback-Leibler (KL) divergence loss. Additionally, we evaluate the accuracy of the estimation at different durations of speech. Experimental results from the Common Voice dataset underscore the efficacy of our approach, showcasing an accuracy of 87% for male speakers, 91% for female speakers and 89% overall accuracy, and an accuracy of 99.1% for gender prediction.
Deep learning (DL) plays a significant role in several tasks, especially classification and prediction. Classification tasks can be efficiently achieved via convolutional neural networks (CNN) with a huge dataset, while recurrent neural networks (RNN) can perform prediction tasks due to their ability to remember time series data. In this paper, three models have been proposed to certify the evaluation track for classification and prediction tasks associated with four datasets (two for each task). These models are CNN and RNN, which include two models (Long Short Term Memory (LSTM)) and GRU (Gated Recurrent Unit). Each model is employed to work consequently over the two mentioned tasks to draw a road map of deep learning mod
... Show MorePoetical Words
Recommendation systems are now being used to address the problem of excess information in several sectors such as entertainment, social networking, and e-commerce. Although conventional methods to recommendation systems have achieved significant success in providing item suggestions, they still face many challenges, including the cold start problem and data sparsity. Numerous recommendation models have been created in order to address these difficulties. Nevertheless, including user or item-specific information has the potential to enhance the performance of recommendations. The ConvFM model is a novel convolutional neural network architecture that combines the capabilities of deep learning for feature extraction with the effectiveness o
... Show MoreHierarchical temporal memory (HTM) is a biomimetic sequence memory algorithm that holds promise for invariant representations of spatial and spatio-temporal inputs. This article presents a comprehensive neuromemristive crossbar architecture for the spatial pooler (SP) and the sparse distributed representation classifier, which are fundamental to the algorithm. There are several unique features in the proposed architecture that tightly link with the HTM algorithm. A memristor that is suitable for emulating the HTM synapses is identified and a new Z-window function is proposed. The architecture exploits the concept of synthetic synapses to enable potential synapses in the HTM. The crossbar for the SP avoids dark spots caused by unutil
... Show MoreThe aim of this research is to collect the semantically restricted vocabulary from linguistic vocabulary and make it regular in one wire with an in-depth study. This study is important in detecting the exact meanings of the language. On the genre, as shown in this research, and our purpose to reveal this phenomenon, where it shows the accuracy of Arabic in denoting the meanings, the research has overturned more than sixty-seven words we extracted from the stomachs of the glossaries and books of language, and God ask safety intent and payment of opinion.
In information security, fingerprint verification is one of the most common recent approaches for verifying human identity through a distinctive pattern. The verification process works by comparing a pair of fingerprint templates and identifying the similarity/matching among them. Several research studies have utilized different techniques for the matching process such as fuzzy vault and image filtering approaches. Yet, these approaches are still suffering from the imprecise articulation of the biometrics’ interesting patterns. The emergence of deep learning architectures such as the Convolutional Neural Network (CNN) has been extensively used for image processing and object detection tasks and showed an outstanding performance compare
... Show MoreGender classification is a critical task in computer vision. This task holds substantial importance in various domains, including surveillance, marketing, and human-computer interaction. In this work, the face gender classification model proposed consists of three main phases: the first phase involves applying the Viola-Jones algorithm to detect facial images, which includes four steps: 1) Haar-like features, 2) Integral Image, 3) Adaboost Learning, and 4) Cascade Classifier. In the second phase, four pre-processing operations are employed, namely cropping, resizing, converting the image from(RGB) Color Space to (LAB) color space, and enhancing the images using (HE, CLAHE). The final phase involves utilizing Transfer lea
... Show More