Hadoop has two core components: HDFS and YARN. HDFS (Hadoop Distributed File System) stores the data, and YARN manages the resources used to process it. HDFS runs the Namenode as its master service and the Datanode as its slave service.
The Namenode is the critical component of Hadoop: it stores the metadata for all data held in HDFS. If the Namenode goes down, the entire cluster becomes inaccessible, which makes it a single point of failure (SPOF). Production environments therefore run Namenode High Availability so that the cluster survives the loss of one Namenode, whether from a machine crash, planned maintenance, or another cause.
Hadoop 2.x introduced the ability to run two Namenodes: one Active Namenode and one Standby Namenode.
- Active Namenode – It manages all client operations.
- Standby Namenode – It is a redundant copy of the Active Namenode. If the Active NN goes down, the Standby NN takes over all of its responsibilities.
Automatic failover between the two Namenodes requires Zookeeper. ZKFC (Zookeeper Failover Controller) is a Zookeeper client that monitors and manages the state of each Namenode.
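To make the Active/Standby handover easier to picture, here is a toy, in-memory model of it. It is only a sketch: `ToyLock`, `ToyNameNode`, and `run_election` are hypothetical names invented for this illustration, and the lock object merely stands in for the ephemeral lock that the failover controllers hold in Zookeeper. It is not how ZKFC is actually implemented.

```python
class ToyLock:
    """In-memory stand-in for the lock the failover controllers hold in Zookeeper."""

    def __init__(self):
        self.holder = None

    def try_acquire(self, name):
        # The lock is granted only if nobody else currently holds it.
        if self.holder is None:
            self.holder = name
        return self.holder == name

    def release(self, name):
        if self.holder == name:
            self.holder = None


class ToyNameNode:
    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.state = "standby"


def run_election(lock, namenodes):
    """One election round: failed nodes lose the lock, a healthy node takes it."""
    for nn in namenodes:
        if not nn.healthy:
            lock.release(nn.name)
            nn.state = "down"
    for nn in namenodes:
        if nn.healthy:
            nn.state = "active" if lock.try_acquire(nn.name) else "standby"
    return {nn.name: nn.state for nn in namenodes}


lock = ToyLock()
nn1, nn2 = ToyNameNode("nn1"), ToyNameNode("nn2")

before = run_election(lock, [nn1, nn2])
nn1.healthy = False  # simulate a machine crash on the Active Namenode
after = run_election(lock, [nn1, nn2])

print(before)
print(after)
```

In this model the first Namenode to take the lock becomes Active, and the other one is promoted only after the lock is released.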
Requirements
- Best Practices for Deploying Hadoop Server on CentOS/RHEL 7 – Part 1
- Setting Up Hadoop Pre-requisites and Security Hardening – Part 2
- How to Install and Configure the Cloudera Manager on CentOS/RHEL 7 – Part 3
- How to Install CDH and Configure Service Placements on CentOS/RHEL 7 – Part 4
In this article, we are going to enable Namenode High Availability in Cloudera Manager.
Step 1: Installation of Zookeeper
1. Log in to Cloudera Manager.
http://Your-IP:7180/cmf/home
![Cloudera Manager Dashboard](https://www.tecmint.com/wp-content/uploads/2020/12/Cloudera-Manager-Dashboard-1.png)
2. In the Cluster (tecmint) action prompt, select “Add Service”.
![Add Service in Cloudera Manager](https://www.tecmint.com/wp-content/uploads/2020/12/Add-Service-in-Cloudera-Manager.png)
3. Select the service “Zookeeper”.
![Zookeeper Service](https://www.tecmint.com/wp-content/uploads/2020/12/Zookeeper.png)
4. Select the servers where we are going to have Zookeeper installed.
![Add Zookeeper Service](https://www.tecmint.com/wp-content/uploads/2020/12/Add-Zookeeper-Service.png)
5. We will run three Zookeeper servers to form a Zookeeper quorum. Select the servers as shown below.
![Create Zookeeper Quorum](https://www.tecmint.com/wp-content/uploads/2020/12/Create-Zookeeper-Quorum.png)
6. Configure the Zookeeper properties; here we keep the defaults. In a production environment, use separate directories or mount points for storing Zookeeper data. Part 1 explains the storage configuration for each service. Click ‘Continue’ to proceed.
![Configure Zookeeper Properties](https://www.tecmint.com/wp-content/uploads/2020/12/Configure-Zookeeper-Properties.png)
7. The installation will begin; once it completes, Zookeeper will be started. You can view the background operations here.
![Installing Zookeeper Service](https://www.tecmint.com/wp-content/uploads/2020/12/Installing-Zookeeper-Service.png)
8. After successful completion of the above step, Status will be ‘Finished’.
![Zookeeper Installed](https://www.tecmint.com/wp-content/uploads/2020/12/Zookeeper-Installed-Finished.png)
9. Zookeeper is now successfully installed and configured. Click ‘Finish’.
![Zookeeper Configured](https://www.tecmint.com/wp-content/uploads/2020/12/Zookeeper-Installation-Finished.png)
10. You can view the Zookeeper service on the Cloudera Manager Dashboard.
![View Zookeeper Service](https://www.tecmint.com/wp-content/uploads/2020/12/View-Zookeeper-Service.png)
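Why three Zookeeper servers? A quorum-based service keeps working only while a strict majority of its servers is up. The short sketch below works through that arithmetic; the helper names `quorum_size` and `tolerated_failures` are made up for this illustration and are not part of any Zookeeper API.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of an ensemble with the given number of servers."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 4, 5):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

Three servers tolerate one failure, and a fourth server adds no extra tolerance, which is why odd-sized ensembles are the usual choice.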
Step 2: Enabling Namenode High Availability
11. Go to Cloudera Manager –> HDFS –> Actions –> Enable High Availability.
![Enabling High Availability](https://www.tecmint.com/wp-content/uploads/2020/12/Enabling-High-Availability.png)
12. Enter the Nameservice Name as “nameservice1”. This is the common namespace shared by the Active and Standby Namenodes.
![Add Nameservice Name](https://www.tecmint.com/wp-content/uploads/2020/12/Add-Nameservice-Name.png)
13. Select the host for the second Namenode, which will act as the Standby Namenode.
![Choose Second Namenode](https://www.tecmint.com/wp-content/uploads/2020/12/Choose-Second-Namenode.png)
14. Here we select master2.tecmint.com for the Standby Namenode.
![Select Host for Namenode](https://www.tecmint.com/wp-content/uploads/2020/12/Select-Host-for-Namenode.png)
15. Select the Journal nodes; these services are mandatory for keeping the Active and Standby Namenodes synchronized.
![Select Journal Nodes](https://www.tecmint.com/wp-content/uploads/2020/12/Select-Jouranal-Nodes.png)
16. We form the Quorum Journal by placing Journal nodes on three servers, as shown below. Select 3 servers and click ‘OK’.
![Create Quorum Journal](https://www.tecmint.com/wp-content/uploads/2020/12/Create-Quorum-Journal.png)
17. Click ‘Continue’ to proceed.
![Assign Roles to Quorum Journal](https://www.tecmint.com/wp-content/uploads/2020/12/Roles-Quorum-Journal.png)
18. Enter the Journal Node directory path. We only need to specify the path; the service creates the directory automatically during installation. Here we use ‘/jn’. Click ‘Continue’ to proceed.
![Add Journal Node Path](https://www.tecmint.com/wp-content/uploads/2020/12/Add-Journal-Node-Path.png)
19. Cloudera Manager will start enabling High Availability.
![Enabling High Availability](https://www.tecmint.com/wp-content/uploads/2020/12/Enable-High-Availability.png)
20. Once all the background processes have completed, the status will show ‘Finished’.
![Finished High Availability](https://www.tecmint.com/wp-content/uploads/2020/12/Finished-High-Availability.png)
21. Finally, we will get a notification ‘Successfully enabled High Availability’. Click ‘Finish’.
![High Availability Enabled](https://www.tecmint.com/wp-content/uploads/2020/12/High-Availability-Enabled.png)
22. Verify the Active and Standby Namenode by going to Cloudera Manager –> HDFS –> Instances.
![Verify High Availability](https://www.tecmint.com/wp-content/uploads/2020/12/Verify-High-Availability.png)
23. Here, you can see two Namenodes: one in the ‘Active’ state and the other in the ‘Standby’ state.
![Verify Namenodes](https://www.tecmint.com/wp-content/uploads/2020/12/Verify-namenodes.png)
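The wizard manages the HDFS configuration for us, but it helps to know which settings tie the nameservice, the Journal nodes, and Zookeeper together. The sketch below assembles the standard Apache Hadoop HA property names for a setup like ours. Treat the inputs as assumptions: the Namenode IDs `nn1` and `nn2`, the hosts master1.tecmint.com and worker1.tecmint.com, and the default ports 8020, 8485, and 2181 are placeholders for illustration, and Cloudera Manager generates its own Namenode IDs and may set additional properties.

```python
def build_ha_properties(nameservice, namenodes, journal_hosts, zk_hosts,
                        journal_dir="/jn"):
    """Return (hdfs_site, core_site) dicts for a two-Namenode HA nameservice.

    namenodes maps a Namenode ID to its host; IDs and hosts are placeholders.
    """
    # Journal nodes listen on port 8485 by default.
    journal_uri = "qjournal://" + ";".join(f"{h}:8485" for h in journal_hosts)
    hdfs_site = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        "dfs.namenode.shared.edits.dir": f"{journal_uri}/{nameservice}",
        "dfs.journalnode.edits.dir": journal_dir,
        "dfs.ha.automatic-failover.enabled": "true",
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha."
            "ConfiguredFailoverProxyProvider",
    }
    # Each Namenode gets its own RPC address (default port 8020).
    for nn_id, host in namenodes.items():
        key = f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"
        hdfs_site[key] = f"{host}:8020"
    # Clients address the nameservice, not an individual Namenode host.
    core_site = {
        "fs.defaultFS": f"hdfs://{nameservice}",
        "ha.zookeeper.quorum": ",".join(f"{h}:2181" for h in zk_hosts),
    }
    return hdfs_site, core_site


# Placeholder hosts: only master2.tecmint.com appears in the steps above.
hosts = ["master1.tecmint.com", "master2.tecmint.com", "worker1.tecmint.com"]
hdfs_site, core_site = build_ha_properties(
    nameservice="nameservice1",
    namenodes={"nn1": "master1.tecmint.com", "nn2": "master2.tecmint.com"},
    journal_hosts=hosts,
    zk_hosts=hosts,
)

for name, value in {**hdfs_site, **core_site}.items():
    print(f"{name} = {value}")
```

From a shell on a cluster node, `hdfs haadmin -getServiceState <namenode-id>` should report the same Active or Standby state that the Instances page shows, using the Namenode ID that Cloudera Manager assigned.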
Conclusion
In this article, we have gone through the step-by-step process to enable Namenode High Availability. It is highly recommended to enable Namenode High Availability on every cluster in a production environment. Please post your questions in the comments if you face any errors during this process. We will cover Resource Manager High Availability in the next article.