How to Set Up High Availability for Namenode – Part 5

Dec 15 2020
designlinux

Hadoop has two core components: HDFS and YARN. HDFS (Hadoop Distributed File System) stores the data, and YARN manages the processing of that data. HDFS runs the Namenode as its master service and the Datanode as its slave service.

The Namenode is the critical component of Hadoop because it stores the metadata for the data held in HDFS. If the Namenode goes down, the entire cluster becomes inaccessible, which makes it a single point of failure (SPOF). Production environments therefore run Namenode High Availability, so that a machine crash, planned maintenance, or a similar event on one Namenode does not cause an outage.

Hadoop 2.x makes it possible to run two Namenodes: one Active Namenode and one Standby Namenode.

  • Active Namenode – It manages all client operations.
  • Standby Namenode – It is a redundant copy of the Active Namenode. If the Active NN goes down, the Standby NN takes over all of its responsibilities.

Automatic failover for Namenode High Availability requires Zookeeper. ZKFC (Zookeeper Failover Controller) is a Zookeeper client that monitors and manages the state of the Namenode.

Requirements

  • Best Practices for Deploying Hadoop Server on CentOS/RHEL 7 – Part 1
  • Setting Up Hadoop Pre-requisites and Security Hardening – Part 2
  • How to Install and Configure the Cloudera Manager on CentOS/RHEL 7 – Part 3
  • How to Install CDH and Configure Service Placements on CentOS/RHEL 7 – Part 4

In this article, we are going to enable Namenode High Availability in Cloudera Manager.

Step 1: Installation of Zookeeper

1. Log in to Cloudera Manager.

http://Your-IP:7180/cmf/home
Cloudera Manager Dashboard

2. In the Cluster (tecmint) action prompt, select “Add Service”.

Add Service in Cloudera Manager

3. Select the service “Zookeeper”.

Zookeeper Service

4. Select the servers on which Zookeeper will be installed.

Add Zookeeper Service

5. We will run 3 Zookeeper servers to form a Zookeeper quorum. Select the servers as shown below.

Create Zookeeper Quorum
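
The choice of 3 servers follows from majority voting: a Zookeeper ensemble stays available only while more than half of its servers can reach each other. The short sketch below works through that arithmetic; the function names are illustrative and not part of Zookeeper or Cloudera Manager.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can fail while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Print ensemble size, majority needed, and failures tolerated.
    for n in (1, 2, 3, 4, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

With 3 servers the quorum is 2, so one server can fail. A fourth server raises the quorum to 3 without tolerating any additional failure, which is why odd ensemble sizes are the usual choice.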

6. Configure the Zookeeper properties; here we keep the defaults. In a production environment, use separate directories or mount points for storing Zookeeper data. Part 1 explains the storage configuration for each service. Click ‘Continue’ to proceed.

Configure Zookeeper Properties

7. The installation will begin, and Zookeeper will be started once it is installed. You can view the background operations here.

Installing Zookeeper Service

8. After the above step completes successfully, the Status will be ‘Finished’.

Zookeeper Installed

9. Zookeeper is now successfully installed and configured. Click ‘Finish’.

Zookeeper Configured

10. You can view the Zookeeper service on the Cloudera Manager Dashboard.

View Zookeeper Service
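
Besides the dashboard, each Zookeeper server can be probed from a script. The sketch below sends Zookeeper's `ruok` four-letter command over a plain TCP socket and expects the reply `imok`. It assumes the default client port 2181; the host name in the usage note is hypothetical. On newer Zookeeper releases the four-letter commands may need to be enabled explicitly, so treat a negative result as a hint rather than proof of failure.

```python
import socket


def zookeeper_is_ok(host: str, port: int = 2181, timeout: float = 3.0) -> bool:
    """Send 'ruok' and report whether the server answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"ruok")
            reply = b""
            # The server closes the connection after replying,
            # so read until recv() returns an empty chunk.
            while True:
                chunk = conn.recv(16)
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Covers refused connections, unresolved hosts, and timeouts.
        return False
    return reply.strip() == b"imok"
```

For example, `zookeeper_is_ok("master1.tecmint.com")` would check one server; running it against all three quorum members gives a quick view of the ensemble.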

Step 2: Enabling Namenode High Availability

11. Go to Cloudera Manager –> HDFS –> Actions –> Enable High Availability.

Enabling High Availability

12. Enter the Nameservice Name as “nameservice1”. This is the common namespace for both the Active and Standby Namenode.

Add Nameservice Name
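
To illustrate what the nameservice means for clients: with High Availability, clients address `hdfs://nameservice1` rather than a single Namenode host, and the HDFS client resolves that name to whichever Namenode is active. Cloudera Manager generates the corresponding settings for you; the sketch below only shows which standard HDFS properties are involved. The Namenode IDs `nn1` and `nn2`, the host `master1.tecmint.com`, and RPC port 8020 are assumptions for illustration, and Cloudera Manager assigns its own IDs.

```python
def ha_client_properties(nameservice, namenodes, rpc_port=8020):
    """Build the HDFS properties that let clients address a nameservice.

    `namenodes` maps a Namenode ID (hypothetical, e.g. "nn1") to its host.
    """
    props = {
        "fs.defaultFS": f"hdfs://{nameservice}",
        "dfs.nameservices": nameservice,
        # Comma-separated list of the Namenode IDs in this nameservice.
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    for nn_id, host in namenodes.items():
        key = f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"
        props[key] = f"{host}:{rpc_port}"
    return props


if __name__ == "__main__":
    example = ha_client_properties(
        "nameservice1",
        {"nn1": "master1.tecmint.com", "nn2": "master2.tecmint.com"},
    )
    for key, value in example.items():
        print(f"{key} = {value}")
```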

13. Select the second Namenode host, which will run the Standby Namenode.

Choose Second Namenode

14. Here we are selecting master2.tecmint.com for the Standby Namenode.

Select Host for Namenode

15. Select the Journal nodes. These are mandatory services for synchronizing the Active and Standby Namenodes.

Select Journal Nodes

16. We are creating a Quorum Journal by placing Journal nodes on 3 servers as shown below. Select 3 servers and click ‘OK’.

Create a Quorum Journal

17. Click ‘Continue’ to proceed.

Assign Roles to Quorum Journal

18. Enter the Journal Node directory path. We only need to specify the path; the service creates the directory automatically during installation. We are using ‘/jn’. Click ‘Continue’ to proceed.

Add Journal Node Path
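
For orientation, the Journal node choices from the previous steps end up in two standard HDFS properties: `dfs.journalnode.edits.dir` (the local path, ‘/jn’ here) and `dfs.namenode.shared.edits.dir` (a `qjournal://` URI listing the Journal nodes). Cloudera Manager sets both on your behalf. The sketch below builds the URI under stated assumptions: the host names are hypothetical and 8485 is the default Journal node RPC port.

```python
def shared_edits_uri(journal_hosts, nameservice, port=8485):
    """Build the qjournal:// URI used for dfs.namenode.shared.edits.dir."""
    # Edits must reach a majority of Journal nodes, so use an odd
    # number of them and never fewer than three.
    if len(journal_hosts) < 3 or len(journal_hosts) % 2 == 0:
        raise ValueError("use an odd number of Journal nodes, at least 3")
    authority = ";".join(f"{host}:{port}" for host in journal_hosts)
    return f"qjournal://{authority}/{nameservice}"


if __name__ == "__main__":
    # Hypothetical hosts, for illustration only.
    hosts = ["master1.tecmint.com", "master2.tecmint.com", "worker1.tecmint.com"]
    print(shared_edits_uri(hosts, "nameservice1"))
```

The Active Namenode writes each edit to the Journal nodes and the Standby Namenode reads from them, which is how the two stay synchronized.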

19. Cloudera Manager will start enabling High Availability.

Enabling High Availability

20. Once all the background processes have completed, the Status will show ‘Finished’.

Finished High Availability

21. Finally, we will get a notification ‘Successfully enabled High Availability’. Click ‘Finish’.

High Availability Enabled

22. Verify the Active and Standby Namenode by going to Cloudera Manager –> HDFS –> Instances.

Verify High Availability

23. Here, you can see two Namenodes: one in the ‘Active’ state and the other in the ‘Standby’ state.

Verify Namenodes
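
The same check can be scripted. Each Namenode exposes a `NameNodeStatus` bean on its web JMX endpoint, and the bean's `State` field reports `active` or `standby`. The sketch below is a minimal example, assuming the Hadoop 2.x default web port 50070 and no authentication on the endpoint; adjust both for your cluster.

```python
import json
from urllib.request import urlopen

STATUS_QUERY = "Hadoop:service=NameNode,name=NameNodeStatus"


def parse_ha_state(jmx_payload: str) -> str:
    """Extract the HA state from a NameNodeStatus JMX response."""
    beans = json.loads(jmx_payload).get("beans", [])
    if not beans or "State" not in beans[0]:
        raise ValueError("no NameNodeStatus bean in JMX response")
    return beans[0]["State"]


def fetch_ha_state(host: str, port: int = 50070, timeout: float = 5.0) -> str:
    """Query one Namenode's JMX endpoint and return its HA state."""
    url = f"http://{host}:{port}/jmx?qry={STATUS_QUERY}"
    with urlopen(url, timeout=timeout) as response:
        return parse_ha_state(response.read().decode("utf-8"))


def is_healthy_pair(states) -> bool:
    """True when exactly one Namenode is active and the other standby."""
    return sorted(states) == ["active", "standby"]
```

Calling `fetch_ha_state` for both Namenode hosts and passing the results to `is_healthy_pair` mirrors what the Instances page shows. On the command line, `hdfs haadmin -getServiceState` followed by a Namenode ID reports the same state; the IDs are cluster-specific.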

Conclusion

In this article, we have gone through the step-by-step process of enabling Namenode High Availability. It is highly recommended to enable Namenode High Availability on every production cluster. Please post your questions if you run into any errors during this process. We will cover Resource Manager High Availability in the next article.
