Analysing Apache Pig, Apache Hive, and MySQL Query Performance on Large Dataset

Fuad, Ammar and Erwin, Alva and Ipung, Heru Purnomo (2014) Analysing Apache Pig, Apache Hive, and MySQL Query Performance on Large Dataset. Bachelor thesis, Swiss German University.

Abstract

Hadoop is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Because Hadoop can process large data sets in this way, it helps to save computing resources. MySQL Cluster is a widely used database in companies, government, and many other institutions. The problem with MySQL Cluster is that as the data grows larger, processing time increases and additional resources may be needed. With Hadoop and the tools built for it, processing time can be decreased while preserving data consistency. The purpose of this research is to find the point at which Hadoop and its tools outperform MySQL Cluster in processing time while keeping data consistent. Another aspect that makes Hadoop suitable for big data is that adding an extra Hadoop node can reduce processing time further. The research is done by creating dummy datasets with large numbers of rows and running query statements on each dataset. The resulting query times determine when Hadoop overtakes MySQL Cluster.

Item Type: Thesis (Bachelor)
Uncontrolled Keywords: Big Data, Hadoop, Hive, Pig, MySQL, MySQL Cluster, Processing Big Data
Subjects: Q Science > QA Mathematics > QA76 Computer software
Q Science > QA Mathematics > QA76 Computer software > QA76.94 Electronic data processing--Auditing
T Technology > T Technology (General) > T58.5 Information technology
Divisions: Faculty of Engineering and Information Technology > Department of Information Technology
Depositing User: Faisal Ifzaldi
Date Deposited: 04 May 2021 15:29
Last Modified: 04 May 2021 15:29
URI: http://repository.sgu.ac.id/id/eprint/1998
