Recognizing speech emotions is an important subject in pattern recognition. This work is about studying the effect of extracting the minimum possible number of features on the speech emotion recognition (SER) system. In this paper, three experiments performed to reach the best way that gives good accuracy. The first one extracting only three features: zero crossing rate (ZCR), mean, and standard deviation (SD) from emotional speech samples, the second one extracting only the first 12 Mel frequency cepstral coefficient (MFCC) features, and the last experiment applying feature fusion between the mentioned features. In all experiments, the features are classified using five types of classification techniques, which are the Random Forest (RF), k-Nearest Neighbor (k-NN), Sequential Minimal Optimization (SMO), Naïve Bayes (NB), and Decision Tree (DT). The performance of the system validated over Surrey Audio-Visual Expressed Emotion (SAVEE) dataset for seven emotions. The results of the experiments showed given good accuracy compared with the previous studies using a fusion of a few numbers of features with the RF classifier.
A new algorithm is proposed to compress speech signals using wavelet transform and linear predictive coding. Signal compression based on the concept of selecting a small number of approximation coefficients after they are compressed by the wavelet decomposition (Haar and db4) at a suitable chosen level and ignored details coefficients, and then approximation coefficients are windowed by a rectangular window and fed to the linear predictor. Levinson Durbin algorithm is used to compute LP coefficients, reflection coefficients and predictor error. The compress files contain LP coefficients and previous sample. These files are very small in size compared to the size of the original signals. Compression ratio is calculated from the size of th
... Show MoreIn this work , a hybrid scheme tor Arabic speech for the recognition
of the speaker verification is presented . The scheme is hybrid as utilizes the traditional digi tal signal processi ng and neural network . Kohonen neural network has been used as a recognizer tor speaker verification after extract spectral features from an acoustic signal by Fast Fourier Transformation Algorithm(FFT) .
The system was im plemented using a PENTIUM processor , I000
MHZ compatible and MS-dos 6.2 .
Speech is the first invented way of communication that human used age before the invention of writing. In this paper, proposed method for speech analyses to extract features by using multiwavelet Transform (Repeated Row Preprocessing).The proposed system depends on the Euclidian differences of the coefficients of the multiwavelet Transform to determine the beast features of speech recognition. Each sample value in the reference file is computed by taking the average value of four samples for the same data (four speakers for the same phoneme). The result of the input data to every frame value in the reference file using the Euclidian distance to determine the frame with the minimum distance is said to be the "Best Match". Simulatio
... Show MoreRoot-finding is an oldest classical problem, which is still an important research topic, due to its impact on computational algebra and geometry. In communications systems, when the impulse response of the channel is minimum phase the state of equalization algorithm is reduced and the spectral efficiency will improved. To make the channel impulse response minimum phase the prefilter which is called minimum phase filter is used, the adaptation of the minimum phase filter need root finding algorithm. In this paper, the VHDL implementation of the root finding algorithm introduced by Clark and Hau is introduced.
VHDL program is used in the work, to find the roots of two channels and make them minimum phase, the obtained output results are
Nowadays, people's expression on the Internet is no longer limited to text, especially with the rise of the short video boom, leading to the emergence of a large number of modal data such as text, pictures, audio, and video. Compared to single mode data ,the multi-modal data always contains massive information. The mining process of multi-modal information can help computers to better understand human emotional characteristics. However, because the multi-modal data show obvious dynamic time series features, it is necessary to solve the dynamic correlation problem within a single mode and between different modes in the same application scene during the fusion process. To solve this problem, in this paper, a feature extraction framework of
... Show MoreAbstract— The growing use of digital technologies across various sectors and daily activities has made handwriting recognition a popular research topic. Despite the continued relevance of handwriting, people still require the conversion of handwritten copies into digital versions that can be stored and shared digitally. Handwriting recognition involves the computer's strength to identify and understand legible handwriting input data from various sources, including document, photo-graphs and others. Handwriting recognition pose a complexity challenge due to the diversity in handwriting styles among different individuals especially in real time applications. In this paper, an automatic system was designed to handwriting recognition
... Show More