Twitter data analysis is an emerging field of research that utilizes data collected from Twitter to address many issues such as disaster response, sentiment analysis, and demographic studies. The success of data analysis relies on collecting accurate and representative data of the studied group or phenomena to get the best results. Various twitter analysis applications rely on collecting the locations of the users sending the tweets, but this information is not always available. There are several attempts at estimating location based aspects of a tweet. However, there is a lack of attempts on investigating the data collection methods that are focused on location. In this paper, we investigate the two methods for obtaining location-based data provided by Twitter API, Twitter places and Geocode parameters. We studied these methods to determine their accuracy and their suitability for research. The study concludes that the places method is the more accurate, but it excludes a lot of the data, while the geocode method provides us with more data, but special attention needs to be paid to outliers. Copyright © Research Institute for Intelligent Computer Systems, 2018. All rights reserved.
Big data usually running in large-scale and centralized key management systems. However, the centralized key management systems are increasing the problems such as single point of failure, exchanging a secret key over insecure channels, third-party query, and key escrow problem. To avoid these problems, we propose an improved certificate-based encryption scheme that ensures data confidentiality by combining symmetric and asymmetric cryptography schemes. The combination can be implemented by using the Advanced Encryption Standard (AES) and Elliptic Curve Diffie-Hellman (ECDH). The proposed scheme is an enhanced version of the Certificate-Based Encryption (CBE) scheme and preserves all its advantages. However
... Show MoreThe advancements in Information and Communication Technology (ICT), within the previous decades, has significantly changed people’s transmit or store their information over the Internet or networks. So, one of the main challenges is to keep these information safe against attacks. Many researchers and institutions realized the importance and benefits of cryptography in achieving the efficiency and effectiveness of various aspects of secure communication.This work adopts a novel technique for secure data cryptosystem based on chaos theory. The proposed algorithm generate 2-Dimensional key matrix having the same dimensions of the original image that includes random numbers obtained from the 1-Dimensional logistic chaotic map for given con
... Show MoreToday, problems of spatial data integration have been further complicated by the rapid development in communication technologies and the increasing amount of available data sources on the World Wide Web. Thus, web-based geospatial data sources can be managed by different communities and the data themselves can vary in respect to quality, coverage, and purpose. Integrating such multiple geospatial datasets remains a challenge for geospatial data consumers. This paper concentrates on the integration of geometric and classification schemes for official data, such as Ordnance Survey (OS) national mapping data, with volunteered geographic information (VGI) data, such as the data derived from the OpenStreetMap (OSM) project. Useful descriptions o
... Show MoreThe influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences. This condition places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide unknown cluster data into clusters. The k-means and fuzzy c-means (FCM) algorithms are examples of algorithms that can be used for clustering. Consequently, clustering is a common approach that divides an input space into several homogeneous zones; it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, whic
... Show MoreAttacking a transferred data over a network is frequently happened millions time a day. To address this problem, a secure scheme is proposed which is securing a transferred data over a network. The proposed scheme uses two techniques to guarantee a secure transferring for a message. The message is encrypted as a first step, and then it is hided in a video cover. The proposed encrypting technique is RC4 stream cipher algorithm in order to increase the message's confidentiality, as well as improving the least significant bit embedding algorithm (LSB) by adding an additional layer of security. The improvement of the LSB method comes by replacing the adopted sequential selection by a random selection manner of the frames and the pixels wit
... Show MoreDatabase is characterized as an arrangement of data that is sorted out and disseminated in a way that allows the client to get to the data being put away in a simple and more helpful way. However, in the era of big-data the traditional methods of data analytics may not be able to manage and process the large amount of data. In order to develop an efficient way of handling big-data, this work studies the use of Map-Reduce technique to handle big-data distributed on the cloud. This approach was evaluated using Hadoop server and applied on EEG Big-data as a case study. The proposed approach showed clear enhancement for managing and processing the EEG Big-data with average of 50% reduction on response time. The obtained results provide EEG r
... Show MoreResearchers employ behavior based malware detection models that depend on API tracking and analyzing features to identify suspected PE applications. Those malware behavior models become more efficient than the signature based malware detection systems for detecting unknown malwares. This is because a simple polymorphic or metamorphic malware can defeat signature based detection systems easily. The growing number of computer malwares and the detection of malware have been the concern for security researchers for a large period of time. The use of logic formulae to model the malware behaviors is one of the most encouraging recent developments in malware research, which provides alternatives to classic virus detection methods. To address the l
... Show MoreLongitudinal data is becoming increasingly common, especially in the medical and economic fields, and various methods have been analyzed and developed to analyze this type of data.
In this research, the focus was on compiling and analyzing this data, as cluster analysis plays an important role in identifying and grouping co-expressed subfiles over time and employing them on the nonparametric smoothing cubic B-spline model, which is characterized by providing continuous first and second derivatives, resulting in a smoother curve with fewer abrupt changes in slope. It is also more flexible and can pick up on more complex patterns and fluctuations in the data.
The longitudinal balanced data profile was compiled into subgroup
... Show More