Today with increase using social media, a lot of researchers have interested in topic extraction from Twitter. Twitter is an unstructured short text and messy that it is critical to find topics from tweets. While topic modeling algorithms such as Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA) are originally designed to derive topics from large documents such as articles, and books. They are often less efficient when applied to short text content like Twitter. Luckily, Twitter has many features that represent the interaction between users. Tweets have rich user-generated hashtags as keywords. In this paper, we exploit the hashtags feature to improve topics learned from Twitter content without modifying the basic topic model of LSA and LDA. Users who share the same hashtag at most discuss the same topic. We compare the performance of the two methods (LSA and LDA) using the topic coherence (with and without hashtags). The experiment result on the Twitter dataset showed that LSA has better coherence score with hashtags than that do not incorporate hashtags. In contrast, our experiments show that the LDA has a better coherence score without incorporating hashtags. Finally, LDA has a better coherence score than LSA and the best coherence result obtained from the LDA method was (0.6047) and the LSA method was (0.4744) but the number of topics in LDA was higher than LSA. Thus, LDA may cause the same tweets to discuss the same subject set into different clustering.
Abstract
This study aims to identify the extent to which the criteria of the American Council for Teaching Foreign Languages (ACTFL) are included in the English language books for the fifth and sixth graders. To achieve the objective of the study, a content analysis card was prepared, where the classification of language proficiencies was divided into five main levels (beginner, intermediate, advanced, superior, and distinguished) of the four language skills (listening, speaking, reading, and writing), The content analysis card consisted of (89) indicators distributed at the four levels of language skills as follows: Listening (17), speaking (33), reading (15), and writing (26). The study sample consisted of Engl
... Show MoreA frequently used approach for denoising is the shrinkage of coefficients of the noisy signal representation in a transform domain. This paper proposes an algorithm based on hybrid transform (stationary wavelet transform proceeding by slantlet transform); The slantlet transform is applied to the approximation subband of the stationary wavelet transform. BlockShrink thresholding technique is applied to the hybrid transform coefficients. This technique can decide the optimal block size and thresholding for every wavelet subband by risk estimate (SURE). The proposed algorithm was executed by using MATLAB R2010aminimizing Stein’s unbiased with natural images contaminated by white Gaussian noise. Numerical results show that our algorithm co
... Show More