
Best Practices for Deploying Hadoop Server on CentOS/RHEL 7 – Part 1

In this series of articles, we are going to cover the entire process of building a Cloudera Hadoop cluster, following vendor- and industry-recommended best practices.

Part 1: Best Practices for Deploying Hadoop Server on CentOS/RHEL 7
Part 2: Setting Up Hadoop Pre-requisites and Security Hardening
Part 3: How to Install and Configure the Cloudera Manager on CentOS/RHEL 7
Part 4: How to Install CDH and Configure Service Placements on CentOS/RHEL 7
Part 5: How to Set Up High Availability for Namenode
Part 6: How to Set Up High Availability for Resource Manager
Part 7: How to Install and Configure Hive with High Availability
Part 8: How to Install and Configure Sentry (Authorization Tool)
Part 9: How to Install Kerberos (Kerberising the Cluster) for Hadoop Authentication
Part 10: How to Tune Cluster (Yarn Tuning) on CentOS/RHEL 7

Installing the OS and completing the OS-level prerequisites are the first steps in building a Hadoop cluster. Hadoop can run on various flavors of Linux: CentOS, RedHat, Ubuntu, Debian, SUSE, etc. In real-world production, most Hadoop clusters are built on top of RHEL/CentOS, so we will use CentOS 7 for demonstration in this series of tutorials.

In an organization, OS installation can be done using kickstart. For a 3 to 4 node cluster, manual installation is feasible, but for a larger cluster with more than 10 nodes, installing the OS one by one is tedious. In that scenario the kickstart method comes into the picture, allowing us to proceed with mass, unattended installation.
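As a sketch of what such an unattended installation looks like, a minimal kickstart file for CentOS 7 might resemble the following. Note that the hostname, password hash, and disk directives here are illustrative placeholders and not values taken from this article:

```shell
# ks.cfg -- minimal illustrative kickstart for CentOS 7 (all values are placeholders)
install
lang en_US.UTF-8
keyboard us
timezone UTC
# Encrypted root password hash (generate your own, e.g. with python -c 'import crypt; ...')
rootpw --iscrypted $6$PLACEHOLDERHASH
bootloader --location=mbr
# Wipe existing partitions and let the installer partition automatically;
# in production you would typically spell out part/logvol directives instead
clearpart --all --initlabel
autopart
network --bootproto=dhcp --hostname=hadoop-node01
reboot
%packages
@core
%end
```

Before using such a file at scale, it can be sanity-checked with the `ksvalidator` tool from the pykickstart package.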

Achieving good performance from a Hadoop environment depends on provisioning the correct hardware and software. Building a production Hadoop cluster therefore involves many hardware and software considerations.

In this article, we will go through the OS installation process and some best practices for deploying a Cloudera Hadoop cluster server on CentOS/RHEL 7.

Important Considerations and Best Practices for Deploying Hadoop Server

The following are the best practices for deploying a Cloudera Hadoop cluster server on CentOS/RHEL 7.

Below is an example of disk partitioning for servers with 24 TB of storage.

Disk Partitioning
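To make the idea concrete, here is a rough sketch of how such a 24 TB worker node is often laid out. The disk names and sizes below are assumptions for illustration (e.g. twelve 2 TB data disks), not values from the screenshot above:

```shell
# Illustrative layout (assumed): one OS disk plus twelve 2 TB data disks
# /dev/sda              -> OS partitions: /, /var, /var/log, /home, /tmp, /opt, swap
# /dev/sdb ... /dev/sdm -> one filesystem per disk (JBOD, no RAID/LVM),
#                          mounted as /data/1 ... /data/12 and used as
#                          HDFS DataNode data directories
```

Mounting each data disk individually (JBOD) rather than striping them lets HDFS handle redundancy itself and keeps a single disk failure from taking out a large striped volume.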

Installing CentOS 7 for Hadoop Server Deployment

Here is what you need to know before installing CentOS 7 for a Hadoop server.

For this article, we installed the OS (CentOS 7) in a VMware virtual machine, so we do not have multiple disks available for partitioning. Since CentOS provides the same functionality as RHEL, the steps below apply to both.

1. Begin by downloading the CentOS 7.x ISO image to your local system and attach it while booting the virtual machine. Select ‘Install CentOS 7‘ as shown.

Install CentOS 7 Boot Menu

2. Select the language (the default is English) and click ‘Continue‘.

Select CentOS 7 Language

3. Software Selection – Select the ‘Minimal Installation‘ and click ‘Done‘.

CentOS Software Selection
CentOS 7 Minimal Installation

4. Set the root password when the installer prompts for it.

Set Root Password

5. Installation Destination – This is an important step that requires caution. We need to select the disk where the OS will be installed; a dedicated disk should be reserved for the OS. Click ‘Installation Destination‘ and select the disk. In a real production server there will be multiple disks, and ‘sda‘ is usually the preferred choice for the OS.

Select Installation Destination
Select Disk for CentOS Installation

6. Other Storage Options – Choose the second option (‘I will configure partitioning‘) to configure the OS-related partitions such as /var, /var/log, /home, /tmp, /opt, and swap.

Manual CentOS Partitioning
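Once these partitions are created, each one appears as a separate filesystem in /etc/fstab. The following is a sketch of what the resulting entries might look like; the device paths and mount options are assumptions for illustration and will differ on a real system (which would typically use UUIDs):

```shell
# /etc/fstab (illustrative; device paths, filesystem types, and options will differ)
/dev/sda1  /         xfs   defaults               0 0
/dev/sda2  /var      xfs   defaults               0 0
/dev/sda3  /var/log  xfs   defaults               0 0
/dev/sda5  /home     xfs   defaults,nodev         0 0
/dev/sda6  /tmp      xfs   defaults,nodev,nosuid  0 0
/dev/sda7  /opt      xfs   defaults               0 0
/dev/sda8  swap      swap  defaults               0 0
```

Keeping /var/log and /tmp on their own filesystems prevents runaway logs or temp files from filling the root filesystem and destabilizing the node.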

7. Once done, begin the installation.

Begin CentOS Installation
CentOS 7 Installation

8. Once the installation has completed, reboot the server.

CentOS 7 Installation Complete

9. Log in to the server and set the hostname.

# hostnamectl status
# hostnamectl set-hostname tecmint
# hostnamectl status
Set Hostname on CentOS
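After setting the hostname, it is common to also map it in /etc/hosts so that cluster services can resolve the node by name. A sketch of such an entry, where the IP address and domain are placeholders, not values from this article:

```shell
# /etc/hosts (illustrative entry; replace the IP address and names with your own)
192.168.1.10   tecmint.example.com   tecmint
```

In a multi-node cluster, every node generally needs consistent forward and reverse name resolution, whether via /etc/hosts on each host or via DNS.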
Summary

In this article, we went through the OS installation steps and best practices for filesystem partitioning. These are general guidelines; depending on the nature of the workload, you may need to attend to further nuances to get the best performance from the cluster. Cluster planning is an art for the Hadoop administrator. We will dive deeper into OS-level prerequisites and security hardening in the next article.
