Hadoop Configuration Tuning For Performance Optimization

Christian, Christian and Eng, Kho I and Ipung, Heru Purnomo (2017) Hadoop Configuration Tuning For Performance Optimization. Bachelor thesis, Swiss German University.

Abstract

Configuration parameter tuning is an essential part of implementing a Hadoop cluster. Each configuration parameter affects the overall performance of the cluster, so reaching an optimal configuration requires learning the characteristics of each parameter and understanding its impact on hardware utilization. The configuration changes we tested include the mapper count, the reducer count, the HDFS block size, and the MapReduce compression codec. The experiments also include rebuilding Hadoop from source to produce a 64-bit build in place of the 32-bit release from Apache. To verify any performance gain, we ran a benchmark for every experiment we conducted. The benchmark suite consists of TeraGen, TeraSort, and TeraValidate. We used data sets of 1 GB, 10 GB, and 50 GB, generated once with TeraGen and reused across all benchmarks. TeraSort is the program that performs the benchmark: we measured the time needed to sort the data set and the CPU utilization during the run. TeraValidate only checks the TeraSort output to ensure that it is correct. The experiments showed significant performance improvements; however, the results may vary between cluster configurations.
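
As an illustration of the benchmark workflow described in the abstract, the sketch below builds TeraGen, TeraSort, and TeraValidate command lines for one configuration. It is a minimal sketch and not the thesis's actual scripts: the jar name, the HDFS paths, the helper names (rows_for, tera_commands), and the example values (8 maps, 4 reduces, 128 MB blocks, Snappy codec) are all assumptions chosen for illustration, and the data sizes are assumed to be decimal gigabytes. The property names are the standard Hadoop 2.x ones; timing and CPU measurement are left out.

```python
ROW_BYTES = 100  # TeraGen writes fixed-size 100-byte rows


def rows_for(size_gb):
    """Rows TeraGen must write for a size in decimal gigabytes (assumed unit)."""
    return size_gb * 10**9 // ROW_BYTES


def tera_commands(size_gb, maps, reduces, block_mb, codec,
                  jar="hadoop-mapreduce-examples.jar", base="/benchmarks"):
    """Return the TeraGen, TeraSort and TeraValidate command lines as argv lists.

    The jar name and base path are hypothetical placeholders.
    """
    data, out, report = (f"{base}/{name}-{size_gb}g"
                         for name in ("input", "sorted", "report"))
    # Options for data generation: map task count and HDFS block size in bytes.
    gen_opts = [f"-Dmapreduce.job.maps={maps}",
                f"-Ddfs.blocksize={block_mb * 1024 * 1024}"]
    # Options for the sort: reducer count and map-output compression codec.
    sort_opts = [f"-Dmapreduce.job.reduces={reduces}",
                 "-Dmapreduce.map.output.compress=true",
                 f"-Dmapreduce.map.output.compress.codec={codec}"]
    run = ["hadoop", "jar", jar]
    return [
        run + ["teragen", *gen_opts, str(rows_for(size_gb)), data],
        run + ["terasort", *sort_opts, data, out],
        run + ["teravalidate", out, report],
    ]


if __name__ == "__main__":
    # Print the commands for the three data sizes named in the abstract.
    for size_gb in (1, 10, 50):
        for argv in tera_commands(size_gb, maps=8, reduces=4, block_mb=128,
                                  codec="org.apache.hadoop.io.compress.SnappyCodec"):
            print(" ".join(argv))
```

Returning argv lists rather than shell strings keeps each command easy to pass to a process launcher and to time individually, which matches the per-experiment measurement the abstract describes.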

Item Type: Thesis (Bachelor)
Uncontrolled Keywords: Apache Hadoop; Computer Cluster; Configuration Tuning; Terasort Benchmark
Subjects: Q Science > QA Mathematics > QA76 Computer software
Q Science > QA Mathematics > QA76 Computer software >
Divisions: Faculty of Engineering and Information Technology > Department of Information Technology
Depositing User: Astuti Kusumaningrum
Date Deposited: 11 May 2020 04:17
Last Modified: 11 May 2020 04:17
URI: http://repository.sgu.ac.id/id/eprint/253
