BEYOND WORDS: HARNESSING SPEECH SOUND FOR SPEAKER AGE AND GENDER DETECTION USING 1D CNN ARCHITECTURE WITH SELF-ATTENTION MECHANISM

Umniah Hameed jaid

doi:10.5455/jjcit.71-1703265368

Details

Publication Date

Mon Jan 01 2024

Journal Name

Jordanian Journal Of Computers And Information Technology

Beyond the immediate content of speech, the voice can provide rich information about a speaker's demographics, including age and gender. Estimating a speaker's age and gender has a wide range of applications, from voice forensic analysis to personalized advertising, healthcare monitoring, and human-computer interaction. However, pinpointing a precise age remains difficult due to age ambiguity: utterances from individuals of adjacent ages are frequently indistinguishable. To address this, we propose a novel end-to-end approach that uses Mozilla's Common Voice dataset and transforms raw audio into high-quality feature representations using Wav2Vec2.0 embeddings. These are then fed into our self-attention-based convolutional neural network (CNN) model. To address age ambiguity, we evaluate the effects of different loss functions, such as focal loss and Kullback-Leibler (KL) divergence loss. Additionally, we evaluate the accuracy of the estimation at different durations of speech. Experimental results on the Common Voice dataset demonstrate the efficacy of our approach, with an accuracy of 87% for male speakers, 91% for female speakers, and 89% overall, and an accuracy of 99.1% for gender prediction.
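The two loss functions named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the abstract does not say how the KL divergence loss is applied, so the Gaussian-smoothed target over age bins (`soft_age_target`, with a hypothetical `sigma`) is an assumption, chosen because it gives partial credit to predictions in adjacent age bins. The focal loss follows its standard single-example form, -(1 - p_t)^gamma * log(p_t).

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    With gamma = 0 this reduces to ordinary cross-entropy; larger gamma
    down-weights examples the model already classifies confidently.
    """
    p_t = softmax(logits)[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def soft_age_target(true_bin, n_bins, sigma=1.0):
    """Assumed soft label: a Gaussian centred on the true age bin."""
    weights = [math.exp(-0.5 * ((k - true_bin) / sigma) ** 2) for k in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(target, logits, eps=1e-12):
    """KL(target || softmax(logits)) for one example."""
    probs = softmax(logits)
    return sum(t * math.log((t + eps) / (p + eps)) for t, p in zip(target, probs) if t > 0)


if __name__ == "__main__":
    n_bins = 8  # hypothetical number of age groups
    target = soft_age_target(true_bin=3, n_bins=n_bins)
    near_miss = [0.0] * n_bins
    near_miss[4] = 5.0  # model favours the adjacent bin
    far_miss = [0.0] * n_bins
    far_miss[7] = 5.0   # model favours a distant bin
    print("KL, adjacent-bin prediction:", kl_divergence(target, near_miss))
    print("KL, distant-bin prediction: ", kl_divergence(target, far_miss))
    print("Focal loss, gamma=2:", focal_loss([2.0, 1.0, 0.1], target=0))
```

Under this assumed soft target, a prediction that lands in a neighbouring age bin is penalised less than one that lands far away, which is one plausible way a KL-based loss could soften the age-ambiguity problem described above.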

Publication Date

Tue Aug 01 2023

Journal Name

International Journal Of Online And Biomedical Engineering (iJOE)

End-to-End Speaker Profiling Using 1D CNN Architectures and Filter Bank Initialization

Umniah

The automatic estimation of speaker characteristics, such as height, age, and gender, has various applications in forensics, surveillance, customer service, and many human-robot interaction applications. These applications are often required to produce a response promptly. This work proposes a novel approach to speaker profiling by combining filter bank initializations, such as continuous wavelets and gammatone filter banks, with one-dimensional (1D) convolutional neural networks (CNN) and residual blocks. The proposed end-to-end model goes from the raw waveform to an estimated height, age, and gender of the speaker by learning speaker representation directly from the audio signal without relying on handcrafted and pre-computed acou

Publication Date

Mon Nov 24 2025

Journal Name

Baghdad Science Journal

Transformer Network on Global Self-Attention Mechanism for Brain Tumor Segmentation

Ammar Awni Abbas

Mohammed

Ammar

Transformers are a class of neural network architectures that often depend on large-scale pre-training and exhibit high computational complexity, so working with them demands a significant commitment of time and computing resources. Their main benefit is that the self-attention mechanism extracts long-range features effectively. In this paper, the Global Self-Attention Transformer module is applied to tackle these issues. The model is based on a segmentation problem called Brain-GS that works as a mechanism and encompasses

Publication Date

Mon Feb 27 2023

Journal Name

TEM Journal

Predicting Age and Gender Using AlexNet

Qaswaa Khaled

Farah khiled

Due to the availability of technology stemming from in-depth research in this sector and the drawbacks of other identification methods, biometrics has drawn considerable attention and established itself as the most reliable alternative for recognition in recent years. Efforts are still being made to develop a user-friendly system that is up to par with security-system requirements and yields more reliable outcomes while safeguarding assets and ensuring privacy. Human age estimation and gender identification are both challenging endeavours. Biomarkers and methods for determining biological age and gender have been extensively researched, and each has advantages and disadvantages. Facial-image-based positioning is crucial for many application

Publication Date

Mon Jun 01 2026

Journal Name

Statistics, Optimization & Information Computing

Predicting Public Budget Surplus and Deficit Using a Hybrid 1D-CNN–LSTM Model

Sulaiman Hussien

Munaf Yousif

Zahraa Yousif

The fiscal position of governments in rentier economies depends heavily on oil revenues. The relationship between oil prices and the budget surplus or deficit is often nonlinear and characterized by complex temporal dependencies, which may limit the predictive capability of conventional econometric models. Accordingly, this study aims to forecast the Iraqi budget surplus and deficit and compare the predictive performance of the ARDL, NARDL, LSTM, 1D-CNN, and hybrid 1D-CNN-LSTM models using oil prices as the primary predictive variable. The hybrid model integrates the feature-extraction capability of One-Dimensional Convolutional Neural Networks (1D-CNN) with the ability of Long Short-Term Memory (LSTM) networks to capture long-term

Publication Date

Thu Feb 07 2019

Journal Name

Journal Of The College Of Education For Women

SPEECH RECOGNITION OF ARABIC WORDS USING ARTIFICIAL NEURAL NETWORKS

Dr. Sadiq jassim

Speech recognition has been studied by many researchers using different methods, with the aim of building fast and accurate systems. Speech signal recognition is a typical classification problem, which generally includes two main parts: feature extraction and classification. In this paper, a new approach to the speech recognition task is proposed that uses transformation techniques for feature extraction, namely the slantlet transform (SLT) and discrete wavelet transforms (DWT) of the Daubechies Db1 and Db4 types. Furthermore, a modified artificial neural network (ANN) with the dynamic time warping (DTW) algorithm is developed to train a speech recognition system for classification and recognition purposes. T

Publication Date

Sun Jan 01 2023

Journal Name

IEEE Access

Fuzzy-Based Ensemble Feature Selection for Automated Estimation of Speaker Height and Age Using Vocal Characteristics

Umniah

Publication Date

Sat Jan 01 2022

Journal Name

Proceedings Of International Conference On Computing And Communication Networks

Speech Gender Recognition Using a Multilayer Feature Extraction Method

Husam

Publication Date

Thu Oct 01 2020

Journal Name

IEEE Transactions On Artificial Intelligence

Recursive Multi-Signal Temporal Fusions With Attention Mechanism Improves EMG Feature Extraction

Rami N.

Angkoon

Ali H.

Erik

Publication Date

Sun Feb 10 2019

Journal Name

Iraqi National Journal Of Nursing Specialties

Self-Esteem and its Relationship with the Age, Gender and academic Achievement among the students of the south Iraq Colleges of Nursing

Academic

Academic Achievement

Self-Esteem

Hayder

Kareem

Abstract
Objectives: This study aims to (1) assess the self-esteem level and academic achievement of students of nursing colleges in southern Iraq; (2) determine the relationship between students' levels of self-esteem and their academic achievement in the first semester; and (3) identify differences in self-esteem by gender and age group.
Methodology: A purposive sample of 426 students was selected, and data were collected using a questionnaire consisting of (I) sociodemographic characteristics for assessing some important aspects of students, (II) Rosenberg's Self-Esteem Scale (RSES), and (III) the Iraq Grading Scale for assessing student achievement. SPSS was used for statistical analysis of the data.
Results: study resu

Publication Date

Sat Jan 01 2022

Journal Name

Proceedings Of International Conference On Computing And Communication Networks

Speech Age Estimation Using a Ranking Convolutional Neural Network

Husam
