Beyond the immediate content of speech, the voice carries rich information about a speaker's demographics, including age and gender. Estimating a speaker's age and gender has a wide range of applications, from voice forensic analysis to personalized advertising, healthcare monitoring, and human-computer interaction. However, estimating precise age remains difficult because of age ambiguity: utterances from individuals of adjacent ages are frequently indistinguishable. To address this, we propose a novel end-to-end approach that uses Mozilla's Common Voice dataset and transforms raw audio into high-quality feature representations using Wav2Vec2.0 embeddings. These embeddings are then fed into our self-attention-based convolutional neural network (CNN) model. To address age ambiguity, we evaluate the effects of different loss functions, such as focal loss and Kullback-Leibler (KL) divergence loss. Additionally, we evaluate the accuracy of the estimation at different durations of speech. Experimental results on the Common Voice dataset demonstrate the efficacy of our approach, with an age estimation accuracy of 87% for male speakers, 91% for female speakers, and 89% overall, and an accuracy of 99.1% for gender prediction.
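The two loss functions named in the abstract above can be illustrated with a minimal sketch. It assumes age is discretized into bins and that the KL divergence target is a Gaussian-smoothed distribution centred on the true bin; the abstract does not specify either choice, and the function names, `gamma`, and `sigma` are hypothetical.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(probs, target, gamma=2.0, eps=1e-12):
    """Focal loss for one sample: -(1 - p_t)^gamma * log(p_t).

    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p_t = max(probs[target], eps)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

def soft_age_target(true_bin, n_bins, sigma=1.0):
    """Assumed soft label: a Gaussian over age bins centred on the true bin."""
    weights = [math.exp(-0.5 * ((k - true_bin) / sigma) ** 2) for k in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]

def kl_divergence(target, probs, eps=1e-12):
    """KL(target || probs) = sum_k target_k * log(target_k / probs_k)."""
    return sum(t * math.log(t / max(p, eps)) for t, p in zip(target, probs) if t > 0.0)
```

Under these assumptions, a prediction that peaks one bin away from the true age is penalized less by the KL term than one that peaks several bins away, which is one way a loss can tolerate ambiguity between adjacent ages.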
Microservice architecture offers many advantages, especially for business applications, due to its flexibility, expandability, and loosely coupled structure, which eases maintenance. However, several disadvantages stem from these same features: because microservices are independent by nature, meaningful communication between them can be hindered and data synchronization becomes more challenging. This paper addresses these issues by proposing containerized microservices in an asynchronous event-driven architecture. The architecture encloses microservices in containers and implements an event manager that keeps track of all events in an event log to reduce errors in the application. Experiment results show a decline in re
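The idea of an event manager that records every event in a log before dispatching it asynchronously can be sketched as follows. This is an in-process illustration only, not the architecture from the paper: the `Event` and `EventManager` classes, the topic name, and the handlers are all hypothetical, and containers and network transport are left out.

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class Event:
    topic: str
    payload: dict

@dataclass
class EventManager:
    log: list = field(default_factory=list)       # append-only event log
    handlers: dict = field(default_factory=dict)  # topic -> list of async handlers

    def subscribe(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    async def publish(self, event):
        # Record the event first, so it is kept even if a handler fails.
        self.log.append(event)
        results = await asyncio.gather(
            *(handler(event) for handler in self.handlers.get(event.topic, [])),
            return_exceptions=True,
        )
        # Return the failures so the caller can retry or replay from the log.
        return [r for r in results if isinstance(r, Exception)]

async def demo():
    manager = EventManager()
    inventory = {}

    async def reserve_stock(event):
        inventory[event.payload["sku"]] = "reserved"

    async def failing_service(event):
        raise RuntimeError("service unavailable")

    manager.subscribe("order.created", reserve_stock)
    manager.subscribe("order.created", failing_service)
    errors = await manager.publish(Event("order.created", {"sku": "A1"}))
    return manager, inventory, errors
```

In this sketch one failing subscriber does not prevent the others from handling the event, and the logged event remains available for a later replay.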
Data-driven models perform poorly on part-of-speech tagging for the square Hmong language, for which only a low-resource corpus is available. This paper designs a weight evaluation function to reduce the influence of unknown words. It proposes an improved harmony search algorithm that uses roulette and local evaluation strategies to handle the square Hmong part-of-speech tagging problem. The experiment shows that the average accuracy of the proposed model is 6% and 8% higher than that of the HMM and BiLSTM-CRF models, respectively. Meanwhile, the average F1 of the proposed model is 6% and 3% higher than that of the HMM and BiLSTM-CRF models, respectively.
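A roulette strategy inside a harmony search improvisation step might look like the sketch below. It is a generic illustration, not the paper's algorithm: the weight evaluation function and the local evaluation strategy are not described in the abstract and are omitted, and `roulette_select`, `improvise`, and the `hmcr` parameter (harmony memory considering rate) are names chosen for this example.

```python
import random

def roulette_select(fitnesses, rng):
    """Return an index with probability proportional to its non-negative fitness."""
    total = sum(fitnesses)
    if total <= 0:
        return rng.randrange(len(fitnesses))
    pick = rng.random() * total  # uniform in [0, total)
    running = 0.0
    for i, f in enumerate(fitnesses):
        running += f
        if pick < running:
            return i
    return len(fitnesses) - 1  # guard against floating-point rounding

def improvise(memory, fitnesses, tagset, rng, hmcr=0.9):
    """Build one new tag sequence, position by position.

    With probability hmcr, the tag is copied from a stored harmony chosen by
    roulette; otherwise it is drawn at random from the tag set.
    """
    new_harmony = []
    for pos in range(len(memory[0])):
        if rng.random() < hmcr:
            donor = memory[roulette_select(fitnesses, rng)]
            new_harmony.append(donor[pos])
        else:
            new_harmony.append(rng.choice(tagset))
    return new_harmony
```

Here each stored harmony is a candidate tag sequence for one sentence, and fitter sequences contribute their tags more often to the new candidate.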
The research aims to determine the impact beyond the defined in the collection. The research population comprised second school students at Baghdad University, and the research sample comprised (63) students, with (27) students in the experimental group and (30) students in the control group. The researcher equated the groups on the variables of student age, educational attainment, the educational level of the parents, and the educational level of mothers. The researcher developed a test consisting of (20) items. The test was validated after it had been submitted to a group of arbitrators. The reliability of the test was established using the test method, and the reliability coefficient was (0.88). As for the statistical methods used by the researcher, they are: Pearson correla
The laboratory experiment was conducted in the laboratories of the Musayyib Bridge Company for Molecular Analyses in 2021-2022 to carry out a molecular analysis of inbred lines and their F1 hybrids, in order to estimate the genetic variation of the flowering gene at the DNA level shown by the selected pure inbred lines and the resulting F1 hybrids. Five pure inbred lines of maize were selected (ZA17WR) Late, ZM74, Late, ZM19, Early ZM49WZ (Zi17WZ, Late, ZM49W3E) and their resulting hybrids, according to the study objective, from fifteen inbred lines differing in flowering time. The five inbred lines were planted for four seasons (spring and fall 2019) and (spring and fall 2
In this study, a structural damage identification method based on changes in the dynamic characteristics (frequencies) of the structure is examined. Stiffness and mass matrices of the curved beam elements (for in-plane and out-of-plane vibration) are formulated using Hamilton's principle. Each node of both of them possesses seven degrees of freedom, including the warping degree of freedom. The curved beam element was derived based on Kang and Yoo’s thin-walled curved beam theory of 1994. A computer program was developed to carry out free vibration analyses of the curved beam as well as the straight beam. The results are compared with the frequencies of other researchers using the general purpose program MATLAB. Fuzzy logic syste
The aim of this study was to investigate the strength of the relationship between self-organized learning strategies and self-competence among talented students. To do this, the researcher employed the descriptive correlational approach, in which a sample of (120) male and female students was selected from various Iraqi cities for the academic year 2015-2016. The researcher set up two scales based on previous studies: one to measure self-organized learning strategies, consisting of (47) items, and the other to measure self-competence, composed of (50) items. Both scales were applied to the targeted sample to collect the required data
Speech recognition is an important application of information processing. In this paper, two hybrid techniques are presented. The first is a 3-level hybrid of the Stationary Wavelet Transform (S) and the Discrete Wavelet Transform (W), and the second is a 3-level hybrid of the Discrete Wavelet Transform (W) and Multi-wavelet Transforms (M). To choose the best 3-level hybrid in each technique, a comparison according to five factors was carried out, and the best results are WWS, WWW, and MWM. Speech recognition is performed on WWS, WWW, and MWM using Euclidean distance (Ecl) and Dynamic Time Warping (DTW). The match performance is (98%) using DTW in MWM, while in WWS and WWW it is (74%) and (78%) respectively, but when using (
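The two matching measures named in the abstract above, Euclidean distance and Dynamic Time Warping, can be sketched as follows. The sketch assumes each utterance has already been reduced to a sequence of fixed-length feature vectors; the wavelet feature extraction itself is not shown, and the `recognize` helper and its template dictionary are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(seq_a, seq_b):
    """Classic DTW cost between two sequences of feature vectors."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] is the best alignment cost of seq_a[:i] with seq_b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

def recognize(query, templates):
    """Return the label of the stored template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```

Unlike a frame-by-frame Euclidean comparison, DTW allows the two sequences to differ in length and speaking rate, since one frame may align with several frames of the other sequence.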
Sound effects in TV drama productions have become very important, not only in terms of function and implementation but, more broadly, in terms of artistic and aesthetic values. They are produced and employed in the world's most important dramatic works using the latest and most prominent technologies and equipment, in accordance with the expressive and dramatic values that these modern digital sound effects convey. Therefore, the researcher chose to study the aesthetic effect of digital sound effects in television drama, in order to identify the aesthetic aspects that digital sound effects provide through their use and their accompaniment of the image.
The researcher, therefore, divided this study into the methodolo