Nearly everyone is connected through social media (Facebook, Twitter, LinkedIn, Instagram, etc.), which generates a quantity of data that traditional applications are inadequate to process. Social media are regarded as an important platform through which many subscribers share information, opinions, and knowledge. These characteristics also give rise to many Big Data issues, such as data collection, storage, movement, updating, reviewing, posting, scanning, visualization, and data protection. To deal with all these problems, there is a need for an adequate system that not only prepares the data but also provides meaningful analysis for difficult situations relevant to business, decision-making, health, social media, science, telecommunications, the environment, and more. Through a review of previous studies, the authors note that various analyses, such as real-time sentiment analysis, have been carried out using Hadoop and its tools. However, dealing with such Big Data is a challenging task, and this type of analysis is efficiently possible only through the Hadoop ecosystem. The purpose of this paper is to survey the literature on Big Data analysis of social media using the Hadoop framework, covering the principal analysis tools under the Hadoop umbrella and their orientations, along with their difficulties and the modern methods used to overcome the challenges of Big Data in offline and real-time processing. Real-time analytics accelerates decision-making and provides access to business metrics and reporting. A comparison between Hadoop and Spark is also presented.
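As a concrete illustration of the in-memory, iterative processing style that distinguishes Spark from classic Hadoop MapReduce, the following is a minimal PySpark sketch that counts word frequencies across a collection of social-media posts. The input path `posts.json` and its `text` field are hypothetical placeholders, not artifacts from the surveyed literature.

```python
# A minimal sketch, assuming PySpark is installed and a local Spark
# session suffices; "posts.json" and the "text" column are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("social-media-wordcount").getOrCreate()

# Hypothetical input: one JSON record per post with a "text" field.
posts = spark.read.json("posts.json")

word_counts = (
    posts
    .select(F.explode(F.split(F.lower(F.col("text")), r"\s+")).alias("word"))
    .where(F.col("word") != "")
    .groupBy("word")
    .count()
    .orderBy(F.desc("count"))
)
word_counts.show(10)
spark.stop()
```

The same pipeline expressed as a MapReduce job would require separate mapper and reducer classes and an intermediate write to disk; Spark keeps the intermediate data in memory, which is the basis of the Hadoop-versus-Spark comparison the paper illustrates.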
Measuring the efficiency of postgraduate and undergraduate programs is one of the essential elements of the educational process. In this study, the colleges of Baghdad University and data for the academic year 2011-2012 were chosen to measure the relative efficiencies of postgraduate and undergraduate programs in terms of their inputs and outputs. A relevant method for analyzing these data is Data Envelopment Analysis (DEA). The effect of academic staff on the numbers of enrolled and graduated students in the postgraduate and undergraduate programs is the main focus of the study.
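For readers unfamiliar with DEA, the efficiency of each decision-making unit (DMU) can be computed as a small linear program. The sketch below solves the standard input-oriented CCR envelopment model with SciPy; the staff and student figures are invented placeholders, not the Baghdad University 2011-2012 data.

```python
# A minimal sketch of the input-oriented CCR DEA model as a linear
# program. The data below are made-up illustrations only.
import numpy as np
from scipy.optimize import linprog

# Rows = DMUs (e.g., colleges); columns = inputs / outputs.
X = np.array([[20.0], [35.0], [50.0]])          # input: academic staff
Y = np.array([[300.0, 40.0],                    # outputs: graduates, postgraduates
              [450.0, 70.0],
              [500.0, 60.0]])

def ccr_efficiency(o, X, Y):
    """Input-oriented CCR efficiency of DMU `o` (1.0 = efficient)."""
    n, m, s = X.shape[0], X.shape[1], Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                 # minimize theta
    # sum_j lam_j * x_ij <= theta * x_io  ->  -x_io*theta + X^T lam <= 0
    A_in = np.c_[-X[o].reshape(m, 1), X.T]
    # sum_j lam_j * y_rj >= y_ro          ->  -Y^T lam <= -y_ro
    A_out = np.c_[np.zeros((s, 1)), -Y.T]
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -Y[o]],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.x[0]

for o in range(X.shape[0]):
    print(f"DMU {o}: efficiency = {ccr_efficiency(o, X, Y):.3f}")
```

A score of 1.0 marks a DMU on the efficient frontier; a score below 1.0 is the proportion to which its inputs could be shrunk while still producing its outputs.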
Setting international accounting standards in the form of models and general guidelines that lead economic decision-makers to apply international accounting standards when preparing financial statements and data has become a basic requirement and an urgent necessity for the various parties in today's society, as these standards have proven fruitful in addressing accounting matters at the local, regional, and international levels. A large number of countries, exceeding 150, have adopted these standards, which has resulted in eliminating many of the differences...
This review explores the Knowledge Discovery in Databases (KDD) approach, which helps the bioinformatics domain progress efficiently, and illustrates its relationship with data mining. It is important to draw out the advantages of Data Mining (DM) strategy management, such as its role in cost control, which is a principle of competitive intelligence; its role in information management; and its ability to discover hidden knowledge. However, there are many challenges, such as inaccurate, hand-written data and the analysis of large amounts of varied information to extract useful knowledge using DM strategies. These strategies have been successfully applied in several applications, such as data warehousing.
Big Data typically runs on large-scale, centralized key management systems. However, centralized key management aggravates problems such as a single point of failure, the exchange of secret keys over insecure channels, third-party queries, and the key escrow problem. To avoid these problems, we propose an improved certificate-based encryption scheme that ensures data confidentiality by combining symmetric and asymmetric cryptography. The combination is implemented using the Advanced Encryption Standard (AES) and Elliptic Curve Diffie-Hellman (ECDH). The proposed scheme is an enhanced version of the Certificate-Based Encryption (CBE) scheme and preserves all of its advantages.
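The symmetric/asymmetric combination can be sketched as follows: ECDH establishes a shared secret, a key-derivation function turns it into an AES key, and AES-GCM encrypts the payload. This is a generic AES + ECDH hybrid using the Python `cryptography` package, not the paper's full certificate-based construction; the key names and the `info` label are illustrative.

```python
# A minimal AES + ECDH hybrid sketch; not the paper's CBE scheme.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import ec
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

# Each party generates an EC key pair; only public keys are exchanged.
alice_priv = ec.generate_private_key(ec.SECP256R1())
bob_priv = ec.generate_private_key(ec.SECP256R1())

# Both sides compute the same shared secret from the peer's public key.
shared = alice_priv.exchange(ec.ECDH(), bob_priv.public_key())

# Derive a 256-bit AES key from the raw ECDH secret.
aes_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"cbe-demo").derive(shared)

# Symmetric encryption of the bulk data with AES-GCM.
nonce = os.urandom(12)
ciphertext = AESGCM(aes_key).encrypt(nonce, b"confidential big data record", None)

# The receiver derives the same key from its own private key and decrypts.
bob_key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"cbe-demo").derive(
                   bob_priv.exchange(ec.ECDH(), alice_priv.public_key()))
print(AESGCM(bob_key).decrypt(nonce, ciphertext, None))
```

The design point this illustrates is the one the abstract relies on: the asymmetric step (ECDH) never transmits a secret key over the channel, while the symmetric step (AES) carries the bulk-data workload efficiently.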
This paper deals with how to estimate unmeasured points in spatial data when the spatial sample contains only a few terms, which is unfavorable for estimation: it is well known that the larger the data, the better the estimates at unmeasured points and the smaller the estimation variance. The idea of this paper is therefore to take advantage of secondary (auxiliary) data that are strongly correlated with the primary (basic) data in order to estimate individual unmeasured points, as well as to measure the estimation variance. The Co-kriging technique is used in this field to build spatial predictions, and the idea is then applied to real data.
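For illustration, the linear system at the heart of this approach can be sketched as follows. The snippet solves an ordinary-kriging system for one unmeasured point and reports the kriging variance; full co-kriging extends the same system with cross-covariances to the auxiliary variable. The coordinates, values, and exponential covariance parameters are invented.

```python
# A minimal ordinary-kriging sketch with NumPy. Co-kriging would add
# cross-covariance blocks for the secondary variable to the same system.
import numpy as np

def cov(h, sill=1.0, rng=10.0):
    """Exponential covariance as a function of separation distance h."""
    return sill * np.exp(-h / rng)

# Measured sample locations and values (hypothetical primary data).
pts = np.array([[0.0, 0.0], [5.0, 1.0], [2.0, 6.0], [8.0, 7.0]])
vals = np.array([1.2, 2.3, 1.8, 2.9])
target = np.array([4.0, 4.0])              # unmeasured point to estimate

n = len(pts)
d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=2)

# Ordinary kriging system: sample covariances plus a Lagrange row/column
# that forces the weights to sum to one (unbiasedness).
A = np.ones((n + 1, n + 1))
A[:n, :n] = cov(d)
A[n, n] = 0.0
b = np.ones(n + 1)
b[:n] = cov(np.linalg.norm(pts - target, axis=1))

w = np.linalg.solve(A, b)
estimate = w[:n] @ vals
variance = cov(0.0) - w @ b                # kriging (estimation) variance
print(f"estimate = {estimate:.3f}, kriging variance = {variance:.3f}")
```

The kriging variance printed here is exactly the quantity the paper proposes to shrink by bringing correlated auxiliary data into the system.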
The current paper proposes a new estimator for the parameters of the linear regression model under Big Data circumstances. The diversity of Big Data variables brings many challenges that interest researchers who seek new and novel methods for estimating the parameters of the linear regression model. The data were collected by the Central Statistical Organization of Iraq, and child labor in Iraq was chosen as the case study. Child labor is a critical phenomenon from which both society and education suffer, and it affects the future of the next generation. Two methods have been selected to estimate the parameters.
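The excerpt does not name the two estimation methods, so as a generic illustration the sketch below contrasts a closed-form ordinary least squares fit with mini-batch stochastic gradient descent, a common estimator when the design matrix is too large to process at once. All data here are simulated, not the Central Statistical Organization's child-labor records.

```python
# A hedged sketch of two generic estimators for linear regression under
# large data; the paper's actual pair of methods may differ.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100_000, 3
X = np.c_[np.ones(n), rng.normal(size=(n, p))]   # intercept + 3 predictors
beta_true = np.array([1.0, 2.0, -1.5, 0.5])
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Method 1: ordinary least squares via the normal equations.
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Method 2: mini-batch SGD, touching only a small batch per step.
beta = np.zeros(p + 1)
lr, batch = 0.05, 256
for step in range(2_000):
    idx = rng.integers(0, n, size=batch)
    grad = 2.0 / batch * X[idx].T @ (X[idx] @ beta - y[idx])
    beta -= lr * grad

print("OLS:", np.round(beta_ols, 3))
print("SGD:", np.round(beta, 3))
```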
Data scarcity is a major challenge when training deep learning (DL) models. DL demands a large amount of data to achieve exceptional performance, but many applications have small or inadequate datasets with which to train DL frameworks. Usually, manual labeling is needed to provide labeled data, which typically involves human annotators with extensive background knowledge. This annotation process is costly, time-consuming, and error-prone. Every DL framework must usually be fed a significant amount of labeled data to learn representations automatically; ultimately, more data generally yields a better DL model, although performance is also application-dependent. This issue is the main barrier to adopting DL in such applications.
Summary
In this research, we examined factorial experiments and studied the significance of the main effects, factor interactions, and simple effects using the F-test (ANOVA) to analyze the data of a factorial experiment. Analysis of variance requires several assumptions to hold; therefore, when one of these conditions is violated, we transform the data so that it meets the conditions of the analysis of variance. It has been noted, however, that these transformations do not always produce accurate results, so we resort to non-parametric tests or methods that serve as a solution or alternative to the parametric tests.
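As a brief illustration of this workflow, the sketch below fits a two-factor ANOVA with statsmodels and then applies the Kruskal-Wallis test as one common non-parametric alternative when the ANOVA assumptions fail. The factor levels and responses are simulated, not the experiment's data.

```python
# A minimal two-factor ANOVA sketch with a non-parametric fallback;
# factors A, B and the response y are simulated for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from scipy.stats import kruskal

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "A": np.repeat(["a1", "a2"], 30),
    "B": np.tile(np.repeat(["b1", "b2", "b3"], 10), 2),
})
df["y"] = rng.normal(size=60) + (df["A"] == "a2") * 1.0 + (df["B"] == "b3") * 0.5

# Two-way factorial ANOVA: main effects A and B plus the A:B interaction.
model = ols("y ~ C(A) * C(B)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))

# Non-parametric alternative for factor B if the assumptions are violated.
groups = [g["y"].values for _, g in df.groupby("B")]
print(kruskal(*groups))
```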
The aim of the research is to use the Data Envelopment Analysis (DEA) technique to evaluate the performance efficiency of the eight branches of the General Tax Authority located in Baghdad: Karrada, Karkh Parties, Karkh Center, Dora, Bayaa, Kadhimiya, New Baghdad, and Rusafa. The inputs are defined as the numbers of non-accountable taxpayers across the categories of professions and commercial business, deduction, transfer of property ownership, real estate, and tenders. The outputs are defined according to a checklist containing nine dimensions, in order to assess how efficiently the investigated branches invest their available resources.