Big data analysis is essential for modern applications in areas such as healthcare, assistive technology, intelligent transportation, and environment and climate monitoring. Traditional algorithms in data mining and machine learning do not scale well with data size. Mining and learning from big data require time- and memory-efficient techniques, albeit at the cost of a possible loss in accuracy. We have developed a data aggregation structure to summarize data with a large number of instances and data generated from multiple data sources. Data are aggregated at multiple resolutions, and the resolution provides a trade-off between efficiency and accuracy. The structure is built once, updated incrementally, and serves as a common data input for multiple mining and learning algorithms. Data mining algorithms are modified to accept the aggregated data as input. Hierarchical data aggregation serves as a paradigm under which novel …
Among metaheuristic algorithms, population-based algorithms are explorative search algorithms that are superior to local search algorithms in exploring the search space to find globally optimal solutions. However, the primary downside of such algorithms is their low exploitative capability, which prevents the expansion of the search space neighborhood for more optimal solutions. The firefly algorithm (FA) is a population-based algorithm that has been widely used in clustering problems. However, FA is prone to premature convergence when no neighborhood search strategies are employed to improve the quality of clustering solutions in the neighborhood region and to explore the global regions of the search space. On the
Information systems and data exchange between government institutions are growing rapidly around the world, and with them, the threats to information within government departments. In recent years, research into the development and construction of secure information systems in government institutions appears to have been very effective. Based on information system principles, this study proposes a model for providing and evaluating security for all departments of government institutions. The requirements of any information system begin with the organization's surroundings and objectives. Most prior techniques did not take into account the organizational component on which the information system runs, despite the relevance of
Cloud computing provides a huge amount of space for data storage, but as the number of users and the size of their data increase, the cloud storage environment faces serious problems such as saving storage space, managing this large volume of data, and ensuring the security and privacy of data. One important method for saving space in cloud storage is data deduplication, a compression technique that keeps only one copy of the data and eliminates the extra copies. To offer security and privacy for sensitive data while supporting deduplication, this work identifies attacks that exploit hybrid cloud deduplication, allowing an attacker to gain access to the files of other users based on very small hash signatures of
Data compression offers an attractive approach to reducing communication costs by using available bandwidth effectively. It makes sense to pursue research on developing algorithms that can use the available network most effectively. It is also important to consider security, since the data being transmitted is vulnerable to attacks. The basic aim of this work is to develop a module that combines compression and encryption on the same set of data so that the two operations are performed simultaneously. This is achieved by embedding encryption into compression algorithms, since cryptographic ciphers and entropy coders bear a certain resemblance in the sense of secrecy. First, in the secure compression module, the given text is p
In this study, we briefly review the ARIMA(p, d, q), EWMA, and DLM (dynamic linear modelling) procedures in order to accommodate the autocorrelation (AC) structure of data. We consider recursive estimation and prediction algorithms based on Bayes and Kalman filtering (KF) techniques for correlated observations. We investigate the effect on the MSE of these procedures and compare them using generated data.
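As a rough illustration of the kind of comparison described above, the following minimal sketch computes one-step-ahead EWMA forecasts on autocorrelated data and scores them by MSE. It is not the study's procedure: the AR(1) generator, the function names (`simulate_ar1`, `ewma_forecasts`, `mse`), the smoothing constant `lam`, and the choice to initialise the level at the first observation are all assumptions made for this example.

```python
import random


def simulate_ar1(n, phi, sigma=1.0, seed=0):
    """Generate n points of an AR(1) series x_t = phi * x_{t-1} + e_t (assumed data model)."""
    rng = random.Random(seed)
    x = 0.0
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma)
        series.append(x)
    return series


def ewma_forecasts(series, lam):
    """One-step-ahead EWMA forecasts using z_t = lam * x_t + (1 - lam) * z_{t-1}."""
    if not 0.0 < lam <= 1.0:
        raise ValueError("lam must be in (0, 1]")
    forecasts = []
    level = series[0]  # assumption: start the level at the first observation
    for x in series:
        forecasts.append(level)  # forecast for x, made before x is observed
        level = lam * x + (1.0 - lam) * level
    return forecasts


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    data = simulate_ar1(n=500, phi=0.7)
    for lam in (0.1, 0.3, 0.6, 0.9):
        print(lam, mse(data, ewma_forecasts(data, lam)))
```

Because the level starts at the first observation, the first forecast error is zero by construction; a more careful comparison would discard a burn-in period before computing the MSE.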
Reliable data transfer and energy efficiency are the essential considerations for network performance in resource-constrained underwater environments. One of the efficient approaches for data routing in underwater wireless sensor networks (UWSNs) is clustering, in which data packets are transferred from sensor nodes to the cluster head (CH). Data packets are then forwarded to a sink node in a single-hop or multi-hop manner, which can increase the energy depletion of the CH compared to other nodes. While several mechanisms have been proposed for cluster formation and CH selection to ensure efficient delivery of data packets, less attention has been given to massive data co
In this study, we compare the LASSO and SCAD methods, two penalty methods for dealing with models in partial quantile regression. The Nadaraya-Watson kernel estimator was used to estimate the non-parametric part; in addition, the rule-of-thumb method was used to estimate the smoothing bandwidth (h). Penalty methods proved to be efficient in estimating the regression coefficients, but the SCAD method was the best according to the mean squared error (MSE) criterion after the missing data were estimated using the mean imputation method.
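The non-parametric step mentioned above can be sketched as follows. This is a minimal illustration only, assuming a Gaussian kernel and the normal-reference rule of thumb h = 1.06 · σ · n^(-1/5); the abstract does not say which kernel or which rule-of-thumb variant was used, and the function names are hypothetical.

```python
import math
import statistics


def rule_of_thumb_bandwidth(xs):
    """Normal-reference bandwidth h = 1.06 * sigma * n^(-1/5) (assumed variant)."""
    n = len(xs)
    sigma = statistics.stdev(xs)  # sample standard deviation
    return 1.06 * sigma * n ** (-1 / 5)


def gaussian_kernel(u):
    """Standard normal density, used here as the kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, h):
    """Nadaraya-Watson estimate at x0: kernel-weighted average of ys."""
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; try a larger h")
    return sum(w * y for w, y in zip(weights, ys)) / total


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
    ys = [math.sin(x) for x in xs]
    h = rule_of_thumb_bandwidth(xs)
    print(h, nadaraya_watson(1.25, xs, ys, h))
```

The estimate is a weighted average of the observed responses, so a larger h gives a smoother but more biased fit, which is why the bandwidth choice matters for the reported MSE.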