Human posture estimation is a crucial topic in the computer vision field and has become a hotspot for research in many human behaviors related work. Human pose estimation can be understood as the human key point recognition and connection problem. The paper presents an optimized symmetric spatial transformation network designed to connect with single-person pose estimation network to propose high-quality human target frames from inaccurate human bounding boxes, and introduces parametric pose non-maximal suppression to eliminate redundant pose estimation, and applies an elimination rule to eliminate similar pose to obtain unique human pose estimation results. The exploratory outcomes demonstrate the way that the proposed technique can pre
... Show MoreNowadays, people's expression on the Internet is no longer limited to text, especially with the rise of the short video boom, leading to the emergence of a large number of modal data such as text, pictures, audio, and video. Compared to single mode data ,the multi-modal data always contains massive information. The mining process of multi-modal information can help computers to better understand human emotional characteristics. However, because the multi-modal data show obvious dynamic time series features, it is necessary to solve the dynamic correlation problem within a single mode and between different modes in the same application scene during the fusion process. To solve this problem, in this paper, a feature extraction framework of
... Show MoreThis paper offers a systemic review of the deep learning methods to detect violence on campus, which is a critical issue in intelligent surveillance to improve the student safety and prompt cut off of violent accidents. The review reviews studies published 2018-2025, concentrating on model structure to detect fights, bullying, vandalism, and aggressive behavior on problematic campuses due to occlusion and light variations and complicated human interactions. The research design includes a comparative study of different deep learning networks, such as CNNs, RNNs, 3D CNNs, attention-based networks, transformers, graph neural networks, neuro-fuzzy, and multimodal systems and federated learning methods. The paper also assesses benchmark
... Show MoreMethods of speech recognition have been the subject of several studies over the past decade. Speech recognition has been one of the most exciting areas of the signal processing. Mixed transform is a useful tool for speech signal processing; it is developed for its abilities of improvement in feature extraction. Speech recognition includes three important stages, preprocessing, feature extraction, and classification. Recognition accuracy is so affected by the features extraction stage; therefore different models of mixed transform for feature extraction were proposed. The properties of the recorded isolated word will be 1-D, which achieve the conversion of each 1-D word into a 2-D form. The second step of the word recognizer requires, the
... Show MoreInformation processing has an important application which is speech recognition. In this paper, a two hybrid techniques have been presented. The first one is a 3-level hybrid of Stationary Wavelet Transform (S) and Discrete Wavelet Transform (W) and the second one is a 3-level hybrid of Discrete Wavelet Transform (W) and Multi-wavelet Transforms (M). To choose the best 3-level hybrid in each technique, a comparison according to five factors has been implemented and the best results are WWS, WWW, and MWM. Speech recognition is performed on WWS, WWW, and MWM using Euclidean distance (Ecl) and Dynamic Time Warping (DTW). The match performance is (98%) using DTW in MWM, while in the WWS and WWW are (74%) and (78%) respectively, but when using (
... Show More