How to Install Hadoop Single Node Cluster (Pseudonode) on CentOS 7


Hadoop is an open-source framework that is widely used to deal with Big Data. Most Big Data and data analytics projects are built on top of the Hadoop ecosystem. Hadoop consists of two layers: one for storing data and one for processing data.

Storage is handled by Hadoop's own filesystem, HDFS (Hadoop Distributed File System), and processing is managed by YARN (Yet Another Resource Negotiator). MapReduce is the default processing engine of the Hadoop ecosystem.

This article describes how to install Hadoop as a pseudonode (single-node) cluster, where all of the daemons (JVMs) run on a single machine, on CentOS 7.

This setup is mainly for beginners learning Hadoop. In production, Hadoop is installed as a multi-node cluster, where data is distributed among the servers as blocks and jobs are executed in parallel.

Prerequisites

  • A minimal installation of CentOS 7 server.
  • Java v1.8 release.
  • Hadoop 2.x stable release.
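
Before you begin, it can help to confirm the OS release and bring existing packages up to date; a minimal sketch:

# cat /etc/redhat-release
# yum -y update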

On this page

  • Installing Java on CentOS 7
  • Configuring Passwordless Login on CentOS 7
  • Installing Hadoop in CentOS 7
  • Configuring Hadoop in CentOS 7
  • Formatting the HDFS File System via the NameNode

Installing Java on CentOS 7

1. Hadoop is written in Java, so Java must be installed on the system before Hadoop can be installed.

# yum install java-1.8.0-openjdk

2. Next, verify the installed version of Java on the system.

# java -version
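
The JVM installation path will be needed later when setting JAVA_HOME in hadoop-env.sh. One way to locate it (assuming the OpenJDK package installed above) is to resolve the symlink behind the java binary:

# readlink -f /usr/bin/java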

Configuring Passwordless Login on CentOS 7

SSH must be configured on the machine, as Hadoop manages its nodes over SSH. The master node uses an SSH connection to its slave nodes to perform operations such as starting and stopping daemons.

We need to set up passwordless SSH so that the master can communicate with the slaves without a password; otherwise, a password has to be entered for every connection.

3. Set up passwordless SSH login using the following commands on the server.

# ssh-keygen
# ssh-copy-id -i localhost
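
If key-based login still prompts for a password, the usual culprit is directory permissions; a hedged fix for the common case:

# chmod 700 ~/.ssh
# chmod 600 ~/.ssh/authorized_keys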

4. After configuring passwordless SSH login, try logging in again; you should be connected without being prompted for a password.

# ssh localhost
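
To verify non-interactively, BatchMode makes ssh fail instead of prompting, so the following should print the hostname without asking for a password:

# ssh -o BatchMode=yes localhost hostname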

Installing Hadoop in CentOS 7

5. Go to the Apache Hadoop website and download the stable release of Hadoop using the following wget command, then extract it.

# wget https://archive.apache.org/dist/hadoop/core/hadoop-2.10.1/hadoop-2.10.1.tar.gz
# tar xvpzf hadoop-2.10.1.tar.gz
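
Optionally, verify the download against the checksum that Apache publishes alongside the tarball. This assumes the companion .sha512 file exists at the same URL and is in a sha512sum-compatible format:

# wget https://archive.apache.org/dist/hadoop/core/hadoop-2.10.1/hadoop-2.10.1.tar.gz.sha512
# sha512sum -c hadoop-2.10.1.tar.gz.sha512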

6. Next, add the Hadoop environment variables to the ~/.bashrc file as shown. JAVA_HOME is defined here as well, since it is exported on the last line.

JAVA_HOME=/usr/lib/jvm/java-1.8.0/jre
HADOOP_PREFIX=/root/hadoop-2.10.1
PATH=$PATH:$HADOOP_PREFIX/bin
export PATH JAVA_HOME HADOOP_PREFIX

7. After adding the environment variables to the ~/.bashrc file, source it and verify Hadoop by running the following commands.

# source ~/.bashrc
# cd $HADOOP_PREFIX
# bin/hadoop version

Configuring Hadoop in CentOS 7

We need to edit the Hadoop configuration files below to fit your machine. In Hadoop, each service has its own port number and its own directory to store data.

  • Hadoop Configuration Files – core-site.xml, hdfs-site.xml, mapred-site.xml & yarn-site.xml

8. First, we need to update the JAVA_HOME and Hadoop path in the hadoop-env.sh file as shown.

# cd $HADOOP_PREFIX/etc/hadoop
# vi hadoop-env.sh

Enter the following lines at the beginning of the file.

export JAVA_HOME=/usr/lib/jvm/java-1.8.0/jre
export HADOOP_PREFIX=/root/hadoop-2.10.1
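
The exact JVM directory can differ between OpenJDK builds, so it is worth confirming that the JAVA_HOME path above actually exists on your system:

# ls -d /usr/lib/jvm/java-1.8.0/jre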

9. Next, modify the core-site.xml file.

# cd $HADOOP_PREFIX/etc/hadoop
# vi core-site.xml

Paste the following between the <configuration> tags as shown.

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>
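
After saving the file, you can sanity-check that Hadoop picks up the new value with the stock hdfs getconf tool:

# $HADOOP_PREFIX/bin/hdfs getconf -confKey fs.defaultFS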

10. Create the following directories under the tecmint user's home directory, which will be used for NameNode (NN) and DataNode (DN) storage.

# mkdir -p /home/tecmint/hdata/name
# mkdir -p /home/tecmint/hdata/data

11. Next, modify the hdfs-site.xml file.

# cd $HADOOP_PREFIX/etc/hadoop
# vi hdfs-site.xml

Paste the following between the <configuration> tags as shown.

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/tecmint/hdata/name</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/home/tecmint/hdata/data</value>
        </property>
</configuration>

12. Again, modify the mapred-site.xml file.

# cd $HADOOP_PREFIX/etc/hadoop
# cp mapred-site.xml.template mapred-site.xml
# vi mapred-site.xml

Paste the following between the <configuration> tags as shown.

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

13. Lastly, modify the yarn-site.xml file.

# cd $HADOOP_PREFIX/etc/hadoop
# vi yarn-site.xml

Paste the following between the <configuration> tags as shown.

<configuration>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

Formatting the HDFS File System via the NameNode

14. Before starting the cluster, we need to format the Hadoop NameNode on the local system where it has been installed. This is normally done once, before starting the cluster for the first time.

Formatting the NameNode destroys the metadata in the NameNode metastore, so be cautious: never format the NameNode while the cluster is running unless data loss is intended.

# cd $HADOOP_PREFIX
# bin/hadoop namenode -format
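
In Hadoop 2.x, the same action is also exposed through the hdfs command; the hadoop namenode form above still works but prints a deprecation notice:

# bin/hdfs namenode -format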

15. Start the NameNode and DataNode daemons (the NameNode web UI listens on port 50070).

# cd $HADOOP_PREFIX
# sbin/start-dfs.sh

16. Start the ResourceManager and NodeManager daemons (the ResourceManager web UI listens on port 8088).

# sbin/start-yarn.sh
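
To confirm that all five daemons are up, jps (shipped with the JDK; on CentOS it may require the java-1.8.0-openjdk-devel package) lists the running JVMs. On a healthy pseudonode you should see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager:

# jps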
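
As a quick smoke test, create a directory in HDFS and list the filesystem root; the /test path here is just an example:

# cd $HADOOP_PREFIX
# bin/hdfs dfs -mkdir /test
# bin/hdfs dfs -ls /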

17. To stop all the services.

# sbin/stop-dfs.sh
# sbin/stop-yarn.sh

Summary

In this article, we have gone through the step-by-step process of setting up a Hadoop pseudonode (single-node) cluster. With basic Linux knowledge, following these steps will get the cluster up in about 40 minutes.

This is very useful for beginners who want to start learning and practicing Hadoop, and this vanilla version of Hadoop can also be used for development purposes. For a production-like cluster, you need either at least three physical servers on hand or cloud-provisioned infrastructure with multiple servers.
