Twitter data analysis is an emerging field of research that utilizes data collected from Twitter to address many issues such as disaster response, sentiment analysis, and demographic studies. The success of data analysis relies on collecting accurate and representative data of the studied group or phenomena to get the best results. Various twitter analysis applications rely on collecting the locations of the users sending the tweets, but this information is not always available. There are several attempts at estimating location based aspects of a tweet. However, there is a lack of attempts on investigating the data collection methods that are focused on location. In this paper, we investigate the two methods for obtaining location-based data provided by Twitter API, Twitter places and Geocode parameters. We studied these methods to determine their accuracy and their suitability for research. The study concludes that the places method is the more accurate, but it excludes a lot of the data, while the geocode method provides us with more data, but special attention needs to be paid to outliers. Copyright © Research Institute for Intelligent Computer Systems, 2018. All rights reserved.
The influx of data in bioinformatics is primarily in the form of DNA, RNA, and protein sequences. This condition places a significant burden on scientists and computers. Some genomics studies depend on clustering techniques to group similarly expressed genes into one cluster. Clustering is a type of unsupervised learning that can be used to divide unknown cluster data into clusters. The k-means and fuzzy c-means (FCM) algorithms are examples of algorithms that can be used for clustering. Consequently, clustering is a common approach that divides an input space into several homogeneous zones; it can be achieved using a variety of algorithms. This study used three models to cluster a brain tumor dataset. The first model uses FCM, whic
... Show MoreGround-based active optical sensors (GBAOS) have been successfully used in agriculture to predict crop yield potential (YP) early in the season and to improvise N rates for optimal crop yield. However, the models were found weak or inconsistent due to environmental variation especially rainfall. The objectives of the study were to evaluate if GBAOS could predict YP across multiple locations, soil types, cultivation systems, and rainfall differences. This study was carried from 2011 to 2013 on corn (Zea mays L.) in North Dakota, and in 2017 in potatoes in Maine. Six N rates were used on 50 sites in North Dakota and 12 N rates on two sites, one dryland and one irrigated, in Maine. Two active GBAOS used for this study were GreenSeeker and Holl
... Show MoreBig data analysis is essential for modern applications in areas such as healthcare, assistive technology, intelligent transportation, environment and climate monitoring. Traditional algorithms in data mining and machine learning do not scale well with data size. Mining and learning from big data need time and memory efficient techniques, albeit the cost of possible loss in accuracy. We have developed a data aggregation structure to summarize data with large number of instances and data generated from multiple data sources. Data are aggregated at multiple resolutions and resolution provides a trade-off between efficiency and accuracy. The structure is built once, updated incrementally, and serves as a common data input for multiple mining an
... Show More
Abstract:
We can notice cluster data in social, health and behavioral sciences, so this type of data have a link between its observations and we can express these clusters through the relationship between measurements on units within the same group.
In this research, I estimate the reliability function of cluster function by using the seemingly unrelate
... Show MoreThis paper proposes a new encryption method. It combines two cipher algorithms, i.e., DES and AES, to generate hybrid keys. This combination strengthens the proposed W-method by generating high randomized keys. Two points can represent the reliability of any encryption technique. Firstly, is the key generation; therefore, our approach merges 64 bits of DES with 64 bits of AES to produce 128 bits as a root key for all remaining keys that are 15. This complexity increases the level of the ciphering process. Moreover, it shifts the operation one bit only to the right. Secondly is the nature of the encryption process. It includes two keys and mixes one round of DES with one round of AES to reduce the performance time. The W-method deals with
... Show MoreFinding orthogonal matrices in different sizes is very complex and important because it can be used in different applications like image processing and communications (eg CDMA and OFDM). In this paper we introduce a new method to find orthogonal matrices by using tensor products between two or more orthogonal matrices of real and imaginary numbers with applying it in images and communication signals processing. The output matrices will be orthogonal matrices too and the processing by our new method is very easy compared to other classical methods those use basic proofs. The results are normal and acceptable in communication signals and images but it needs more research works.
Most of the medical datasets suffer from missing data, due to the expense of some tests or human faults while recording these tests. This issue affects the performance of the machine learning models because the values of some features will be missing. Therefore, there is a need for a specific type of methods for imputing these missing data. In this research, the salp swarm algorithm (SSA) is used for generating and imputing the missing values in the pain in my ass (also known Pima) Indian diabetes disease (PIDD) dataset, the proposed algorithm is called (ISSA). The obtained results showed that the classification performance of three different classifiers which are support vector machine (SVM), K-nearest neighbour (KNN), and Naïve B
... Show MoreIn this study, we review the ARIMA (p, d, q), the EWMA and the DLM (dynamic linear moodelling) procedures in brief in order to accomdate the ac(autocorrelation) structure of data .We consider the recursive estimation and prediction algorithms based on Bayes and KF (Kalman filtering) techniques for correlated observations.We investigate the effect on the MSE of these procedures and compare them using generated data.
Mauddud formation is one of the most prominent formations in Northeastern Iraq due to its significant hydrocarbon reserves, making accurate geomechanical characterization essential for safe drilling operations and informed development planning. This study constructs a calibrated post-drill one dimensional mechanical earth model (1D-MEM) for selected wells, levering Techlog software to integrate rock mechanical data, image logs, multi-arm caliper measurements, conventional well logs, drilling reports, and core analyses. The methodology provides a detailed workflow for estimating geomechanical properties from log and image analysis to model calibration. Validation of the 1-D MEM performed through cross-comparison with direct me
... Show More