Big data analysis is essential for modern applications in areas such as healthcare, assistive technology, intelligent transportation, environment and climate monitoring. Traditional algorithms in data mining and machine learning do not scale well with data size. Mining and learning from big data need time and memory efficient techniques, albeit the cost of possible loss in accuracy. We have developed a data aggregation structure to summarize data with large number of instances and data generated from multiple data sources. Data are aggregated at multiple resolutions and resolution provides a trade-off between efficiency and accuracy. The structure is built once, updated incrementally, and serves as a common data input for multiple mining and learning algorithms. Data mining algorithms are modified to accept the aggregated data as input. Hierarchical data aggregation serves as a paradigm under which novel …
The development of information systems in recent years has contributed to various methods of gathering information to evaluate IS performance. The most common approach used to collect information is called the survey system. This method, however, suffers one major drawback. The decision makers consume considerable time to transform data from survey sheets to analytical programs. As such, this paper proposes a method called ‘survey algorithm based on R programming language’ or SABR, for data transformation from the survey sheets inside R environments by treating the arrangement of data as a relational format. R and Relational data format provide excellent opportunity to manage and analyse the accumulated data. Moreover, a survey syste
... Show MoreIn this research, a simple experiment in the field of agriculture was studied, in terms of the effect of out-of-control noise as a result of several reasons, including the effect of environmental conditions on the observations of agricultural experiments, through the use of Discrete Wavelet transformation, specifically (The Coiflets transform of wavelength 1 to 2 and the Daubechies transform of wavelength 2 To 3) based on two levels of transform (J-4) and (J-5), and applying the hard threshold rules, soft and non-negative, and comparing the wavelet transformation methods using real data for an experiment with a size of 26 observations. The application was carried out through a program in the language of MATLAB. The researcher concluded that
... Show MoreMixed-effects conditional logistic regression is evidently more effective in the study of qualitative differences in longitudinal pollution data as well as their implications on heterogeneous subgroups. This study seeks that conditional logistic regression is a robust evaluation method for environmental studies, thru the analysis of environment pollution as a function of oil production and environmental factors. Consequently, it has been established theoretically that the primary objective of model selection in this research is to identify the candidate model that is optimal for the conditional design. The candidate model should achieve generalizability, goodness-of-fit, parsimony and establish equilibrium between bias and variab
... Show MoreThe smart city concept has attracted high research attention in recent years within diverse application domains, such as crime suspect identification, border security, transportation, aerospace, and so on. Specific focus has been on increased automation using data driven approaches, while leveraging remote sensing and real-time streaming of heterogenous data from various resources, including unmanned aerial vehicles, surveillance cameras, and low-earth-orbit satellites. One of the core challenges in exploitation of such high temporal data streams, specifically videos, is the trade-off between the quality of video streaming and limited transmission bandwidth. An optimal compromise is needed between video quality and subsequently, rec
... Show MoreThe distribution of the intensity of the comet Ison C/2013 is studied by taking its histogram. This distribution reveals four distinct regions that related to the background, tail, coma and nucleus. One dimensional temperature distribution fitting is achieved by using two mathematical equations that related to the coordinate of the center of the comet. The quiver plot of the gradient of the comet shows very clearly that arrows headed towards the maximum intensity of the comet.
أثبتت الشبكات المحددة بالبرمجيات (SDN) تفوقها في معالجة مشاكل الشبكة العادية مثل قابلية التوسع وخفة الحركة والأمن. تأتي هذه الميزة من SDN بسبب فصل مستوى التحكم عن مستوى البيانات. على الرغم من وجود العديد من الأوراق والدراسات التي تركز على إدارة SDN، والرصد، والتحكم، وتحسين QoS، إلا أن القليل منها يركز على تقديم ما يستخدمونه لتوليد حركة المرور وقياس أداء الشبكة. كما أن المؤلفات تفتقر إلى مقارنات بين الأدوات والأ
... Show MoreThis paper presents a hybrid approach for solving null values problem; it hybridizes rough set theory with intelligent swarm algorithm. The proposed approach is a supervised learning model. A large set of complete data called learning data is used to find the decision rule sets that then have been used in solving the incomplete data problem. The intelligent swarm algorithm is used for feature selection which represents bees algorithm as heuristic search algorithm combined with rough set theory as evaluation function. Also another feature selection algorithm called ID3 is presented, it works as statistical algorithm instead of intelligent algorithm. A comparison between those two approaches is made in their performance for null values estima
... Show MoreToday, there are large amounts of geospatial data available on the web such as Google Map (GM), OpenStreetMap (OSM), Flickr service, Wikimapia and others. All of these services called open source geospatial data. Geospatial data from different sources often has variable accuracy due to different data collection methods; therefore data accuracy may not meet the user requirement in varying organization. This paper aims to develop a tool to assess the quality of GM data by comparing it with formal data such as spatial data from Mayoralty of Baghdad (MB). This tool developed by Visual Basic language, and validated on two different study areas in Baghdad / Iraq (Al-Karada and Al- Kadhumiyah). The positional accuracy was asses
... Show MoreIn order to obtain a mixed model with high significance and accurate alertness, it is necessary to search for the method that performs the task of selecting the most important variables to be included in the model, especially when the data under study suffers from the problem of multicollinearity as well as the problem of high dimensions. The research aims to compare some methods of choosing the explanatory variables and the estimation of the parameters of the regression model, which are Bayesian Ridge Regression (unbiased) and the adaptive Lasso regression model, using simulation. MSE was used to compare the methods.