Beyond the immediate content of speech, the voice can provide rich information about a speaker's demographics, including age and gender. Estimating a speaker's age and gender offers a wide range of applications, spanning from voice forensic analysis to personalized advertising, healthcare monitoring, and human-computer interaction. However, pinpointing precise age remains intricate due to age ambiguity. Specifically, utterances from individuals at adjacent ages are frequently indistinguishable. Addressing this, we propose a novel, end-to-end approach that deploys Mozilla's Common Voice dataset to transform raw audio into high-quality feature representations using Wav2Vec2.0 embeddings. These are then channeled into our self-attention-based convolutional neural network (CNN) model. To address age ambiguity, we evaluate the effects of different loss functions such as focal loss and Kullback-Leibler (KL) divergence loss. Additionally, we evaluate the accuracy of the estimation at different durations of speech. Experimental results from the Common Voice dataset underscore the efficacy of our approach, showcasing an accuracy of 87% for male speakers, 91% for female speakers and 89% overall accuracy, and an accuracy of 99.1% for gender prediction.
This research describes a new model inspired by Mobilenetv2 that was trained on a very diverse dataset. The goal is to enable fire detection in open areas to replace physical sensor-based fire detectors and reduce false alarms of fires, to achieve the lowest losses in open areas via deep learning. A diverse fire dataset was created that combines images and videos from several sources. In addition, another self-made data set was taken from the farms of the holy shrine of Al-Hussainiya in the city of Karbala. After that, the model was trained with the collected dataset. The test accuracy of the fire dataset that was trained with the new model reached 98.87%.
Abstract— The growing use of digital technologies across various sectors and daily activities has made handwriting recognition a popular research topic. Despite the continued relevance of handwriting, people still require the conversion of handwritten copies into digital versions that can be stored and shared digitally. Handwriting recognition involves the computer's strength to identify and understand legible handwriting input data from various sources, including document, photo-graphs and others. Handwriting recognition pose a complexity challenge due to the diversity in handwriting styles among different individuals especially in real time applications. In this paper, an automatic system was designed to handwriting recognition
... Show MoreIn modern years, internet and computers were used by many nations all overhead the world in different domains. So the number of Intruders is growing day-by-day posing a critical problem in recognizing among normal and abnormal manner of users in the network. Researchers have discussed the security concerns from different perspectives. Network Intrusion detection system which essentially analyzes, predicts the network traffic and the actions of users, then these behaviors will be examined either anomaly or normal manner. This paper suggested Deep analyzing system of NIDS to construct network intrusion detection system and detecting the type of intrusions in traditional network. The performance of the proposed system was evaluated by using
... Show MoreThe issue of image captioning, which comprises automatic text generation to understand an image’s visual information, has become feasible with the developments in object recognition and image classification. Deep learning has received much interest from the scientific community and can be very useful in real-world applications. The proposed image captioning approach involves the use of Convolution Neural Network (CNN) pre-trained models combined with Long Short Term Memory (LSTM) to generate image captions. The process includes two stages. The first stage entails training the CNN-LSTM models using baseline hyper-parameters and the second stage encompasses training CNN-LSTM models by optimizing and adjusting the hyper-parameters of
... Show MoreThis study aimed to determine the effects of age, gender, and allergen type on serum immunoglobulin E (IgE) levels in asthma (AS) and allergic rhinitis (AR) patients. Sixty AS patients, 52 AR patients, and 61 controls were enrolled in the study. Sera of participants were assessed for total IgE level and specific IgE antibody against four allergen types (animal dander, grasses, mites, and molds). The results revealed that median level of total IgE was significantly increased in AS (218.9 IU/mL; p-value < 0.001) and AR (244.3 IU/mL; p-value < 0.001) patients compared to controls (167.1 IU/mL), while, there was no significant difference between AS and AR patients (p-value = 0.270). Logistic regression
... Show MoreThe interest in Multi social skills and self-concept is extremely important for many of the scholars of education and psychology has taken a great deal in their writings and their interests as they see that social skills training is to make sure of the same, and that whenever enable the individual from acquiring social skills whenever asserted itself.The research aims know social skills and self-concept and their relationship to the children Riyadh age (4-6 years), and the research sample consisted of(200) boys and girls from kindergarten in the city of Baghdad Bjanbey Rusafa second and Karkh second.And to the objectives of the research realized the researcher has built two measures of social skills a
... Show MoreBuilding a system to identify individuals through their speech recording can find its application in diverse areas, such as telephone shopping, voice mail and security control. However, building such systems is a tricky task because of the vast range of differences in the human voice. Thus, selecting strong features becomes very crucial for the recognition system. Therefore, a speaker recognition system based on new spin-image descriptors (SISR) is proposed in this paper. In the proposed system, circular windows (spins) are extracted from the frequency domain of the spectrogram image of the sound, and then a run length matrix is built for each spin, to work as a base for feature extraction tasks. Five different descriptors are generated fro
... Show MoreDBNRSK Sayed, Journal of Strategic Research in Social Science (JoSReSS), 2020