Logo
  • Ubuntu
  • CentOS
  • Debian
  • Fedora
  • RedHat

Best Practices for Deploying Hadoop Server on CentOS/RHEL 7 – Part 1 - DesignLinux

Nov 02 2020
designlinux 0 Comments

In this series of articles, we are going to cover the entire Cloudera Hadoop Cluster Building building with Vendor and Industrial recommended best practices.

Part 1: Best Practices for Deploying Hadoop Server on CentOS/RHEL 7
Part 2: Setting Up Hadoop Pre-requisites and Security Hardening
Part 3: How to Install and Configure the Cloudera Manager on CentOS/RHEL 7
Part 4: How to Install CDH and Configure Service Placements on CentOS/RHEL 7
Part 5: How to Set Up High Availability for Namenode
Part 6: How to Set Up High Availability for Resource Manager
Part 7: How to Install and Configure Hive with High Availability
Part 8: How to Install and Configure Sentry (Authorization Tool)
Part 9: How to Install Kerberos (Kerberising the Cluster) for Hadoop Authentication
Part 10: How to Tune Cluster (Yarn Tuning) on CentOS/RHEL 7

OS installation and doing OS level Pre-requisites are the first steps to build a Hadoop Cluster. Hadoop can run on the various flavor of Linux platform: CentOS, RedHat, Ubuntu, Debian, SUSE etc., In real-time production, most of the Hadoop Clusters are built on top of RHEL/CentOS, we will use CentOS 7 for demonstration in this series of tutorials.

In an Organization, OS installation can be done using kickstart. If it is a 3 to 4 node cluster, manual installation is possible but if we build a big cluster with more than 10 nodes, it’s tedious to install OS one by one. In this scenario, the Kickstart method comes into the picture, we can proceed with the mass installation using kickstart.

Achieving good performance from a Hadoop Environment is depends on provisioning the correct Hardware & Software. So, building a production Hadoop cluster involves a lot of consideration regarding Hardware and Software.

In this article, we will go through various Benchmarks about OS installation and some best practices for deploying Cloudera Hadoop Cluster Server on CentOS/RHEL 7.

Important Consideration and Best Practices for Deploying Hadoop Server

The following are the best practices for setting up deploying Cloudera Hadoop Cluster Server on CentOS/RHEL 7.

  • Hadoop servers do not require enterprise standard servers to build a cluster, it requires commodity hardware.
  • In the production cluster, having 8 to 12 data disks are recommended. According to the nature of the workload, we need to decide on this. If the cluster is for compute-intensive applications, having 4 to 6 drives is best practice to avoid I/O issues.
  • Data drives should be partitioned individually, for example – starting from /data01 to /data10.
  • RAID configuration is not recommended for worker nodes, because Hadoop itself providing fault-tolerance on data by replicating the blocks into 3 by default. So JBOD is best for worker nodes.
  • For Master Servers, RAID 1 is the best practice.
  • The default filesystem on CentOS/RHEL 7.x is XFS. Hadoop supports XFS, ext3, and ext4. The recommended file-system is ext3 as it is tested for good performance.
  • All the servers should be having the same OS version, at-least same minor release.
  • It is best practice to have homogeneous hardware (all worker nodes should have the same hardware characteristics (RAM, disk space & Core etc).
  • According to the cluster workload (Balanced Workload, Compute Intensive, I/O Intensive) and size, resource (RAM, CPU) planning per server will get differ.

Find the below Example for Disk Partitioning of the servers of 24TB storage.

Disk Partitioning
Disk Partitioning

Installing CentOS 7 for Hadoop Server Deployment

Things you need to know before installing CentOS 7 server for Hadoop Server.

  • Minimal installation is enough for Hadoop Servers (worker nodes), in some cases, GUI can be installed only for Master servers or Management servers where we can use browsers for Web UIs of Management tools.
  • Configuring networks, hostname, and other OS-related settings can be done after OS installation.
  • In real-time, server vendors will be having their own console to interact and manage the servers, for example – Dell servers are having iDRAC which is a device, embedded with servers. Using that iDRAC interface we can install OS with having an OS image in our local system.

In this article, we have installed OS (CentOS 7) in VMware virtual machine. Here, we will not be having multiple disks to perform partitions. CentOS is similar to RHEL (same functionality), so we will see the steps to install CentOS.

1. Begin by downloading the CentOS 7.x ISO image in your local windows system and select it while booting the virtual machine. Select ‘Install CentOS 7‘ as shown.

Install CentOS 7 Boot Menu
Install CentOS 7 Boot Menu

2. Select the Language, default will be English, and click continue.

Select CentOS 7 Language
Select CentOS 7 Language

3. Software Selection – Select the ‘Minimal Installation‘ and click ‘Done‘.

CentOS Software Selection
CentOS Software Selection
CentOS 7 Minimal Installation
CentOS 7 Minimal Installation

4. Set the root password as it will prompt us to set.

Set Root Password
Set Root Password

5. Installation Destination – This is the important step to be cautious. We need to select the disk where the OS has to be installed, dedicated disk should be selected for OS. Click the ‘Installation Destination‘ and select the Disk, in real-time multiple disks will be there, we need to select, preferable ‘sda‘.

Select Installation Destination
Select Installation Destination
Select Disk for CentOS Installation
Select Disk for CentOS Installation

6. Other Storage Options – Choose the second option (I will configure partitioning) to configure OS related partitioning like /var, /var/log, /home, /tmp, /opt, /swap.

Manual CentOS Partitioning
Manual CentOS Partitioning

7. Once done, begin the installation.

Begin CentOS Installation
Begin CentOS Installation
CentOS 7 Installation
CentOS 7 Installation

8. Once the Installation completed, reboot the server.

CentOS 7 Installation Complete
CentOS 7 Installation Complete

9. Login into the server and set the hostname.

# hostnamectl status
# hostnamectl set-hostname tecmint
# hostnamectl status
Set Hostname on CentOS
Set Hostname on CentOS
Summary

In this article, we have gone through OS installation steps and best practices for filesystem partitioning. These are all general guideline, according to the nature of the workload, we may need to concentrate on more nuances to achieve the best performance of the cluster. Cluster planning is art for the Hadoop administrator. We will have deep dive into OS level pre-requisites and security Hardening in the next article.

Related

Tags: CentOS Tips, Hadoop Tips, RHEL Tips

Bash while Loop

Prev Post

Install Odoo 14 on CentOS 8

Next Post
Archives
  • January 2023
  • December 2022
  • November 2022
  • October 2022
  • September 2022
  • July 2022
  • June 2022
  • April 2022
  • March 2022
  • February 2022
  • January 2022
  • December 2021
  • November 2021
  • October 2021
  • September 2021
  • August 2021
  • July 2021
  • June 2021
  • May 2021
  • April 2021
  • March 2021
  • February 2021
  • January 2021
  • December 2020
  • November 2020
  • October 2020
  • September 2020
  • August 2020
  • July 2020
  • June 2020
  • May 2020
Categories
  • AlmaLinux
  • Android
  • Ansible
  • Apache
  • Arch Linux
  • AWS
  • Backups
  • Bash Shell
  • Bodhi Linux
  • CentOS
  • CentOS Stream
  • Chef
  • Cloud Software
  • CMS
  • Commandline Tools
  • Control Panels
  • CouchDB
  • Data Recovery Tools
  • Databases
  • Debian
  • Deepin Linux
  • Desktops
  • Development Tools
  • Docker
  • Download Managers
  • Drupal
  • Editors
  • Elementary OS
  • Encryption Tools
  • Fedora
  • Firewalls
  • FreeBSD
  • FTP
  • GIMP
  • Git
  • Hadoop
  • HAProxy
  • Java
  • Jenkins
  • Joomla
  • Kali Linux
  • KDE
  • Kubernetes
  • KVM
  • Laravel
  • Let's Encrypt
  • LFCA
  • Linux Certifications
  • Linux Commands
  • Linux Desktop
  • Linux Distros
  • Linux IDE
  • Linux Mint
  • Linux Talks
  • Lubuntu
  • LXC
  • Mail Server
  • Manjaro
  • MariaDB
  • MongoDB
  • Monitoring Tools
  • MySQL
  • Network
  • Networking Commands
  • NFS
  • Nginx
  • Nodejs
  • NTP
  • Open Source
  • OpenSUSE
  • Oracle Linux
  • Package Managers
  • Pentoo
  • PHP
  • Podman
  • Postfix Mail Server
  • PostgreSQL
  • Python
  • Questions
  • RedHat
  • Redis Server
  • Rocky Linux
  • Security
  • Shell Scripting
  • SQLite
  • SSH
  • Storage
  • Suse
  • Terminals
  • Text Editors
  • Top Tools
  • Torrent Clients
  • Tutorial
  • Ubuntu
  • Udemy Courses
  • Uncategorized
  • VirtualBox
  • Virtualization
  • VMware
  • VPN
  • VSCode Editor
  • Web Browsers
  • Web Design
  • Web Hosting
  • Web Servers
  • Webmin
  • Windows
  • Windows Subsystem
  • WordPress
  • Zabbix
  • Zentyal
  • Zorin OS
Visits
  • 5
  • 501
  • 612,680

DesignLinux.com © All rights reserved

Go to mobile version