Nowadays, people's expression on the Internet is no longer limited to text. With the rise of short video in particular, large amounts of multi-modal data such as text, pictures, audio, and video have emerged. Compared with single-modal data, multi-modal data typically carry much richer information, and mining this information can help computers better understand human emotional characteristics. However, because multi-modal data show obvious dynamic time-series features, the fusion process must resolve the dynamic correlations both within a single modality and between different modalities in the same application scene. To solve this problem, this paper establishes a three-dimensional dynamic-expansion feature extraction framework based on common multi-modal data such as video, sound, and text. Building on this framework, a multi-modal fusion-matching framework based on spatial and temporal feature enhancement is proposed to address the dynamic correlations within and between modalities, respectively, and to model the short- and long-term dynamic correlation information between different modalities. Multiple groups of experiments on the MOSI dataset show that the emotion recognition model built on the proposed framework makes better use of the complex complementary information between different modalities. Compared with other multi-modal data fusion models, the proposed spatial-temporal attention-based multi-modal data fusion framework significantly improves the emotion recognition rate and accuracy in multi-modal emotion analysis, indicating that it is more feasible and effective.
Abstract
This study aims to identify the extent to which the criteria of the American Council for Teaching Foreign Languages (ACTFL) are reflected in the English language books for fifth and sixth graders. To achieve this objective, a content analysis card was prepared in which language proficiency was classified into five main levels (beginner, intermediate, advanced, superior, and distinguished) for each of the four language skills (listening, speaking, reading, and writing). The content analysis card consisted of 89 indicators distributed across the four language skills as follows: listening (17), speaking (33), reading (15), and writing (26). The study sample consisted of Engl ...
This study seeks to determine whether the property of faithful representation of accounting information can be achieved, and to measure it using a standard approach based on mathematical and statistical equations, by comparing two financial periods before and after the application of IFRS 15 (Revenue from Contracts with Customers) during the period 2014-2018. The comparison covers the financial statements of the mixed joint stock companies listed on the Iraq Stock Exchange. These companies are one of the main pillars of the country's economic structure, as a joint investment between the state and the private sector, and are important in many respects, including support for the projects of public companies, absorption and employment of labor, as well as ra ...
This field experiment was conducted to compare two methods of harvesting potatoes, mechanical and manual, using moldboard and chisel plows for primary tillage and three in-row tuber planting distances (15, 25, and 35 cm) in silty clay loam soil south of Baghdad. The factorial experiment followed a randomized complete block design with three replications, using L.S.D. at the 5 % and 1 % levels. Mechanical harvesting recorded the highest percentage of valid potato tubers (88.78 %), a marketable yield of 31.74 ton ha-1, a lifting efficiency of 95.68 %, and a tuber damage index of 28.41, while speeding up the harvesting process and reducing time and effort. Manual harvesting gave the least damage to potato tubers, 6.02 %, and unlifted potato tubers, 4.32 %. Howe ...