Site icon DesignLinux

How to Set Up High Availability for Namenode – Part 5

Hadoop has two core components which are HDFS and YARN. HDFS is for storing the Data, YARN is for processing the Data. HDFS is Hadoop Distributed File System, it has Namenode as Master Service and Datanode as Slave Service.

Namenode is the critical component of Hadoop which is storing the metadata of data stored in HDFS. If the Namenode goes down, the entire cluster will not be accessible, it is the single point of failure (SPOF). So, the production environment will be having Namenode High Availability to avoid the production outage if one Namenode goes down because of various reasons like machine crash, planned maintenance activity, etc.

Hadoop 2.x provides the feasibility where we can have two Namenodes, one will be Active Namenode and another will be Standby Namenode.

Enabling Namenode High Availability requires Zookeeper which is mandatory for automatic failover. ZKFC (Zookeeper Failover Controller) is a Zookeeper client that is used to maintain the state of Namenode.

Requirements

In this article, we are going to enable Namenode High Availability in Cloudera Manager.

Step 1: Installation of Zookeeper

1. Log in to Cloudera Manager.

http://Your-IP:7180/cmf/home
Cloudera Manager Dashboard

2. In the Cluster (tecmint) action prompt, select “Add Service”.

Add Service in Cloudera Manager

3. Select the service “Zookeeper”.

Zookeeper Service

4. Select the servers where we are going to have Zookeeper installed.

Add Zookeeper Service

5. We are going to have 3 Zookeepers to form Zookeeper Quorum. Select the servers as mentioned below.

Create Zookeeper Quorum

6. Configure the Zookeeper properties, here we are having the default ones. In real-time, you have to have separate directory/mount points for storing Zookeeper data. In Part-1, we have explained about storage configuration for each service. Click ‘continue’ to proceed.

Configure Zookeeper Properties

7. Installation will begin, once installed Zookeeper will be started. You can view the background operations here.

Installing Zookeeper Service

8. After successful completion of the above step, Status will be ‘Finished’.

Zookeeper Installed

9. Now, Zookeeper is successfully Installed and Configured. Click the ‘Finish’.

Zookeeper Configured

10. You can view the Zookeeper service on the Cloudera Manager Dashboard.

View Zookeeper Service

Step 2: Enabling Namenode High Availability

11. Go to Cloudera Manager –> HDFS –> Actions –> Enable High Availability.

Enabling High Availability

12. Enter the Nameservice Name as “nameservice1” – This is a common Namespace for both Active and standby Namenode.

Add Nameservice Name

13. Select the Second Namenode where we are going to have standby Namenode.

Choose Second Namenode

14. Here we are selecting master2.tecmint.com for standby Namenode.

Select Host for Namenode

15. Select the Journal nodes, these are mandatory services for synchronizing Active and Standby Namenode.

Select Jouranal Nodes

16. We are making Quorum Journal by placing the Journal node in 3 servers as mentioned below. Select 3 servers and click ‘OK’.

Create a Quorum Journal

17. Click ‘Continue’ to proceed.

Assign Roles to Quorum Journal

18. Enter the Journal Node directory path. Just we need to mention the path while installing this directory will be automatically created by the service itself. We are mentioning as ‘/jn’. Click ‘Continue’ to proceed.

Add Journal Node Path

19. It will start enabling the High Availability.

Enabling High Availability

20. Once completed all the background processes, we will get ‘Finished’ Status.

Finished High Availability

21. Finally, we will get a notification ‘Successfully enabled High Availability’. Click ‘Finish’.

High Availability Enabled

22. Verify the Active and Standby Namenode by going to Cloudera Manager –> HDFS –> Instances.

Verify High Availability

23. Here, you can wee two Namenodes, one will be in the ‘Active’ state and another will be in ‘Standby’ state.

Verify Namenodes
Conclusion

In this article, we have gone through the step by step process to enable Namenode High Availability. It is highly recommended to have Namenode High Availability in all the clusters in a real-time environment. Please post your doubts if you face any error while doing this process. We will see Resource Manager High Availability in the next article.

Exit mobile version