BEYOND WORDS: HARNESSING SPEECH SOUND FOR SPEAKER AGE AND GENDER DETECTION USING 1D CNN ARCHITECTURE WITH SELF-ATTENTION MECHANISM

Umniah Hameed jaid

doi:10.5455/jjcit.71-1703265368


Publication Date

Mon Jan 01 2024

Journal Name

Jordanian Journal Of Computers And Information Technology


Beyond the immediate content of speech, the voice carries rich information about a speaker's demographics, including age and gender. Estimating a speaker's age and gender has a wide range of applications, from voice forensic analysis to personalized advertising, healthcare monitoring, and human-computer interaction. However, pinpointing a precise age remains difficult because of age ambiguity: utterances from individuals of adjacent ages are frequently indistinguishable. To address this, we propose a novel end-to-end approach that uses Mozilla's Common Voice dataset and transforms raw audio into high-quality feature representations using Wav2Vec2.0 embeddings. These embeddings are then fed into our self-attention-based convolutional neural network (CNN) model. To address age ambiguity, we evaluate the effects of different loss functions, such as focal loss and Kullback-Leibler (KL) divergence loss. We also evaluate the estimation accuracy at different speech durations. Experimental results on the Common Voice dataset demonstrate the efficacy of our approach, with an accuracy of 87% for male speakers, 91% for female speakers, and 89% overall, and an accuracy of 99.1% for gender prediction.
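The two loss functions named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the four age classes, the example logits, and the Gaussian-smoothed soft target (`soft_age_labels`, with its `sigma` parameter) are assumptions chosen only to show how each loss treats a prediction that spreads probability onto adjacent age classes.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def soft_age_labels(num_classes, target, sigma=1.0):
    """Hypothetical soft target: a Gaussian bump centred on the true age class."""
    weights = [math.exp(-((k - target) ** 2) / (2.0 * sigma ** 2))
               for k in range(num_classes)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_k p_k * log(p_k / q_k), skipping zero-mass terms of p."""
    return sum(pk * math.log(pk / max(qk, eps))
               for pk, qk in zip(p, q) if pk > 0.0)


# Illustrative logits over four assumed age classes; class 1 is the true class.
logits = [0.2, 1.5, 1.1, -0.3]
probs = softmax(logits)
target = 1

# With gamma = 0 the focal loss reduces to ordinary cross-entropy.
cross_entropy = focal_loss(probs, target, gamma=0.0)
focal = focal_loss(probs, target, gamma=2.0)

# KL divergence between the assumed soft target and the prediction.
soft_target = soft_age_labels(len(logits), target, sigma=1.0)
kl = kl_divergence(soft_target, probs)

print(f"cross-entropy: {cross_entropy:.4f}")
print(f"focal (gamma=2): {focal:.4f}")
print(f"KL(soft target || prediction): {kl:.4f}")
```

The focal term down-weights samples the model already classifies with high probability, while the KL loss against a soft target penalizes probability placed on a neighbouring age class less than probability placed on a distant one. Which label-smoothing scheme the paper pairs with the KL loss is not stated in the abstract.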


Publication Date

Sat Mar 02 2019

Journal Name

Biochem. Cell.arch.

EVALUATION OF PRIMARY IMPLANTS STABILITY IN IMMEDIATE AND DELAYED TREATMENT PROTOCOLS ACCORDING TO BONE DENSITY, JAWS, GENDER AND AGE UTILIZING PERIOTEST M DEVICE

sahad

maha

saif


Publication Date

Fri Jul 17 2026

Journal Name

Journal Of Baghdad College Of Dentistry

Effect of gender, age and tooth loss on the dimensions of incisive canal, and buccal bone anterior to the canal (Computed Tomography study)

Ryaheen G

Ahlam A


Background: The incisive canal is an anatomical structure with an important location in the anterior maxilla; analyzing this canal and its relation to the bone anterior to it is necessary during dental implant placement. The aim of this study was to evaluate the effect of gender, age, and tooth loss in the area of the maxillary central incisors on the dimensions of the incisive canal and the buccal bone anterior to the canal using spiral computed tomography. Materials and Methods: The sample consisted of 156 subjects of both genders in a prospective study, divided into two groups: a dentate group of 120 subjects (60 male and 60 female) with ages ranging from 20 to 70, and an edentate group of 36 subjects with missing maxillary central incisors (18 male and 18 female) with age ranging from (50-70


Publication Date

Sat Feb 02 2019

Journal Name

Journal Of The College Of Education For Women

Poetical Words

Prof. Dr. داود


Publication Date

Mon Jun 05 2023

Journal Name

Journal Of Engineering

Isolated Word Speech Recognition Using Mixed Transform

Mixed Transform

Radon Transform

Discrete Wavelet Transform

Discrete Multicircularlet Transform

Dynamic Time Warping

Sadiq Jassim

Shahad Mujeeb


Methods of speech recognition have been the subject of several studies over the past decade, and speech recognition has been one of the most exciting areas of signal processing. The mixed transform is a useful tool for speech signal processing; it was developed to improve feature extraction. Speech recognition includes three important stages: preprocessing, feature extraction, and classification. Recognition accuracy is strongly affected by the feature extraction stage; therefore, different models of mixed transform for feature extraction were proposed. The recorded isolated word is a 1-D signal, so each 1-D word is converted into a 2-D form. The second step of the word recognizer requires, the


Publication Date

Thu Nov 01 2018

Journal Name

2018 1st Annual International Conference On Information And Sciences (aicis)

Speech Emotion Recognition Using Minimum Extracted Features

Speech emotion recognition

Minimum feature extraction

ZCR

12 MFCC

Random forest

Wisal Hashim

Rafah Shihab

Mohammed Najm


Recognizing speech emotions is an important subject in pattern recognition. This work studies the effect of extracting the minimum possible number of features on a speech emotion recognition (SER) system. In this paper, three experiments were performed to find the approach that gives good accuracy. The first extracts only three features from emotional speech samples: zero crossing rate (ZCR), mean, and standard deviation (SD). The second extracts only the first 12 Mel frequency cepstral coefficient (MFCC) features. The last applies feature fusion between the mentioned features. In all experiments, the features are classified using five types of classification techniques, which are the Random Forest (RF),
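The three features of the first experiment (ZCR, mean, and SD) can be sketched in a few lines of pure Python. This is an illustrative sketch only: the function names are hypothetical, the ZCR is taken as the fraction of adjacent sample pairs whose signs differ, and the SD is the population form; the paper's exact definitions are not given in the abstract.

```python
import math


def zero_crossing_rate(samples):
    """Fraction of adjacent sample pairs whose signs differ (assumed definition)."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings / (len(samples) - 1)


def mean(samples):
    """Arithmetic mean of the samples."""
    return sum(samples) / len(samples)


def std_dev(samples):
    """Population standard deviation (divides by N, an assumption)."""
    mu = mean(samples)
    return math.sqrt(sum((x - mu) ** 2 for x in samples) / len(samples))


def extract_features(samples):
    """Return the three-element feature vector [ZCR, mean, SD]."""
    return [zero_crossing_rate(samples), mean(samples), std_dev(samples)]


# A toy signal that changes sign at every step, and one that never does.
alternating = [1.0, -1.0, 1.0, -1.0]
rising = [1.0, 2.0, 3.0, 4.0]

print(extract_features(alternating))
print(extract_features(rising))
```

A feature vector of this kind would then be passed to a classifier such as the Random Forest mentioned in the abstract.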


Publication Date

Tue May 07 2019

Journal Name

Acm Journal On Emerging Technologies In Computing Systems

Neuromemristive Architecture of HTM with On-Device Learning and Neurogenesis

Abdullah M.

Dhireesha


Hierarchical temporal memory (HTM) is a biomimetic sequence memory algorithm that holds promise for invariant representations of spatial and spatio-temporal inputs. This article presents a comprehensive neuromemristive crossbar architecture for the spatial pooler (SP) and the sparse distributed representation classifier, which are fundamental to the algorithm. There are several unique features in the proposed architecture that tightly link with the HTM algorithm. A memristor that is suitable for emulating the HTM synapses is identified and a new Z-window function is proposed. The architecture exploits the concept of synthetic synapses to enable potential synapses in the HTM. The crossbar for the SP avoids dark spots caused by unutil


Publication Date

Sat Sep 30 2017

Journal Name

College Of Islamic Sciences

Significant Words: Collection and Study

Dr. عمار عيسى


The aim of this research is to collect the semantically restricted vocabulary from the linguistic lexicon and arrange it in a single thread with an in-depth study. This study is important for detecting the exact meanings of the language. Our purpose is to reveal this phenomenon, which shows the accuracy of Arabic in denoting meanings. The research examined more than sixty-seven words extracted from the depths of glossaries and books of language. We ask God for soundness of intent and rightness of opinion.


Publication Date

Tue Dec 05 2023

Journal Name

Baghdad Science Journal

AlexNet Convolutional Neural Network Architecture with Cosine and Hamming Similarity/Distance Measures for Fingerprint Biometric Matching

Biometric Cryptosystem

Convolutional Neural Network

Cosine Similarity

Fingerprint Matching

Information Security

Ahmed Sabah Ahmed

Huda Kadhim

Abeer


In information security, fingerprint verification is one of the most common recent approaches for verifying human identity through a distinctive pattern. The verification process works by comparing a pair of fingerprint templates and identifying the similarity/matching between them. Several research studies have utilized different techniques for the matching process, such as fuzzy vault and image filtering approaches. Yet these approaches still suffer from imprecise articulation of the biometrics' interesting patterns. Deep learning architectures such as the Convolutional Neural Network (CNN) have been used extensively for image processing and object detection tasks and have shown outstanding performance compare
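The two measures named in the title, cosine similarity and Hamming distance, can be sketched as follows for a pair of template feature vectors. This is a hedged illustration rather than the paper's pipeline: the vectors are made-up stand-ins for CNN features, and the `binarize` step with its zero threshold is an assumption introduced only so that a Hamming distance can be computed on bit strings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def binarize(vec, threshold=0.0):
    """Map each component to 1 if above the threshold, else 0 (assumed step)."""
    return [1 if x > threshold else 0 for x in vec]


def hamming_distance(a, b):
    """Number of positions at which two equal-length bit lists differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical feature vectors for an enrolled template and a query template.
enrolled = [0.9, -0.2, 0.4, -0.7]
query = [0.8, 0.1, 0.5, -0.6]

similarity = cosine_similarity(enrolled, query)
distance = hamming_distance(binarize(enrolled), binarize(query))

print(f"cosine similarity: {similarity:.4f}")
print(f"Hamming distance: {distance}")
```

A matcher would typically accept the pair when the similarity is above, or the distance below, a chosen threshold; the threshold values used in the paper are not stated in the abstract.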


Publication Date

Wed Mar 15 2023

Journal Name

Journal Of The Turkish-german Gynecological Association

Obstetric and neonatal complications in large for gestational age pregnancy with late gestational diabetes

Hyperglycemia

macrosomia

third-trimester gestational diabetes

Shaymaa Kadhim

Hayder

Zina Ismaiel

Rand


Publication Date

Tue Dec 16 2025

Journal Name

Radioelectronics. Nanosystems. Information Technologies.

Intelligent Control and Stability Analysis of Smart Grids Using CNN-LSTM Network and Model Predictive Controller

Model Predictive Control (MPC)

Intelligent Control Systems

Residual CNN–LSTM

Real-time Grid Monitoring

SHAP Explainability

Aws


Ensuring real-time stability in smart grids is increasingly important as the integration of renewables and the complexity of these systems grow. In this paper, we present a robust architecture that combines a Residual CNN-LSTM deep neural network predictor, FPGA-accelerated Model Predictive Control (MPC), and SHAP-based explainability. The proposed method achieved 99.8% prediction accuracy on the Electrical Grid Stability Simulated Dataset (UCI) and reduced instability rates by more than 85 percent in all operating conditions. Meeting real-time operating needs, FPGA deployment on a Xilinx Zynq UltraScale+ provided 3.1 ms latency and five times lower energy consumption than CPU processing. By emphasizing bus voltage and frequency as major in
