How to Install and Set Up Apache Spark on Ubuntu/Debian

May 04 2021
designlinux

Apache Spark is an open-source distributed computing framework designed for fast data processing. It is an in-memory engine: it keeps working data in memory where possible instead of writing intermediate results to disk.

Spark provides APIs for streaming, graph processing, SQL, and machine learning (MLlib), and it supports Java, Python, Scala, and R. Spark is mostly installed in Hadoop clusters, but you can also install and configure Spark in standalone mode.

This article shows how to install Apache Spark on Debian and Ubuntu-based distributions.

Install Java and Scala in Ubuntu

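To run Apache Spark on Ubuntu, you need Java installed on your machine. The prebuilt Spark packages bundle the Scala libraries they need, so a system-wide Scala installation is optional, but this article installs it as well. Some distributions come with Java installed by default, and you can verify it using the following command.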

$ java -version

If the command reports that java is not found, run the following commands to install Java on Ubuntu and Debian-based distributions.

$ sudo apt update
$ sudo apt install default-jre
$ java -version
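
The Spark 3.1 documentation lists Java 8 and 11 as supported, so it can be worth checking the major version rather than just whether java exists. The following is a minimal sketch, not part of the original steps; the function name java_major_version is an illustrative choice, and it assumes the version is the quoted field on the first line of the java -version output.

```shell
#!/usr/bin/env bash
# Extract the major Java version from a version string such as
# "1.8.0_292" (Java 8) or "11.0.11" (Java 11).
# java_major_version is a hypothetical helper name, not a standard tool.
java_major_version() {
  local version="$1"
  local major="${version%%.*}"        # text before the first dot
  if [ "$major" = "1" ]; then
    # Legacy numbering: 1.8.x means Java 8
    local rest="${version#1.}"
    major="${rest%%.*}"
  fi
  printf '%s\n' "$major"
}

if command -v java >/dev/null 2>&1; then
  # java -version writes to stderr; take the quoted field on line 1
  installed="$(java -version 2>&1 | head -n 1 | cut -d'"' -f2)"
  echo "Java major version: $(java_major_version "$installed")"
else
  echo "java not found on PATH"
fi
```

If the reported major version is neither 8 nor 11, installing a specific package such as openjdk-11-jre instead of default-jre is one option.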

Next, search the apt repository for Scala and install it by running the following commands.

$ sudo apt search scala   # Search for the package
$ sudo apt install scala  # Install the package

To verify the installation of Scala, run the following command.

$ scala -version 

Scala code runner version 2.11.12 -- Copyright 2002-2017, LAMP/EPFL

Install Apache Spark in Ubuntu

Now go to the official Apache Spark download page and grab the latest version, which is 3.1.1 at the time of writing. Alternatively, you can use the wget command to download the file directly from the terminal.

$ wget https://apachemirror.wuchna.com/spark/spark-3.1.1/spark-3.1.1-bin-hadoop2.7.tgz
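
Because the file comes from a mirror, it is sensible to compare its SHA-512 checksum with the value published for the release on the Apache site before extracting it. The sketch below is one way to do that, assuming sha512sum from coreutils is available; verify_sha512 is a hypothetical helper name, and the expected value must be copied from the official download page by hand.

```shell
#!/usr/bin/env bash
# Compare a file's SHA-512 digest with an expected hex string.
# Whitespace and letter case in the expected value are ignored, since
# published checksums are sometimes printed in grouped upper-case form.
# verify_sha512 is a hypothetical helper name.
verify_sha512() {
  local file="$1" expected actual
  expected="$(printf '%s' "$2" | tr -d '[:space:]' | tr 'A-F' 'a-f')"
  actual="$(sha512sum "$file" | cut -d' ' -f1)"
  [ "$expected" = "$actual" ]
}

# Usage (replace <published-checksum> with the value from the Apache site):
#   verify_sha512 spark-3.1.1-bin-hadoop2.7.tgz "<published-checksum>" \
#     && echo "checksum OK" || echo "checksum MISMATCH"
```

A mismatch usually means a truncated or corrupted download, in which case downloading the file again is the first thing to try.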

Now open your terminal, switch to the directory that contains the downloaded file, and run the following command to extract the Apache Spark tar file.

$ tar -xvzf spark-3.1.1-bin-hadoop2.7.tgz

Finally, move the extracted Spark directory to the /opt directory.

$ sudo mv spark-3.1.1-bin-hadoop2.7 /opt/spark

Configure Environment Variables for Spark

Now you have to set a few environment variables in your .profile file before starting Spark.

$ echo 'export SPARK_HOME=/opt/spark' >> ~/.profile
$ echo 'export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin' >> ~/.profile
$ echo 'export PYSPARK_PYTHON=/usr/bin/python3' >> ~/.profile

To make these new environment variables available in the current shell and to Apache Spark, run the following command to apply the changes.

$ source ~/.profile
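
Running the echo commands above more than once appends the same lines to ~/.profile again. As a hedged alternative, the sketch below appends a line only if the file does not already contain it; append_once is a hypothetical helper name, not a standard command.

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already there.
# append_once is a hypothetical helper name.
append_once() {
  local line="$1" file="$2"
  # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage with the variables from this article (single quotes keep $PATH
# unexpanded until the profile is actually read):
#   append_once 'export SPARK_HOME=/opt/spark' ~/.profile
#   append_once 'export PATH=$PATH:/opt/spark/bin:/opt/spark/sbin' ~/.profile
#   append_once 'export PYSPARK_PYTHON=/usr/bin/python3' ~/.profile
```

This keeps the profile clean when the setup steps are repeated, for example after reinstalling Spark.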

All the Spark scripts that start and stop the services are in the sbin folder under /opt/spark.

$ ls -l /opt/spark

Start Apache Spark in Ubuntu

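Run the following commands to start the Spark master service and a worker service on this machine. The start-worker.sh script starts a single local worker and takes the master URL as its argument; start-workers.sh instead launches workers over SSH on every host listed in conf/workers.

$ start-master.sh
$ start-worker.sh spark://localhost:7077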
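
To confirm from the terminal that the master came up, you can probe its ports: 7077 is the port used in the master URL above, and 8080 serves the web page. This is a minimal sketch that assumes bash (for its /dev/tcp feature) and the coreutils timeout command; port_open is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Return success if a TCP connection to host:port can be opened.
# Relies on bash's /dev/tcp pseudo-device and coreutils `timeout`.
# port_open is a hypothetical helper name.
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# 7077: master port from the spark:// URL, 8080: master web page
for port in 7077 8080; do
  if port_open localhost "$port"; then
    echo "port $port: open"
  else
    echo "port $port: closed"
  fi
done
```

If a port stays closed, the log files that the start scripts write under /opt/spark/logs are a good place to look for the reason.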

Once the services are started, open a browser and go to the following URL to access the Spark web page. The page shows that the master and worker services are running.

http://localhost:8080/
OR
http://127.0.0.1:8080

You can also check that spark-shell works by launching it from the terminal.

$ spark-shell
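
For a non-interactive check, the Spark distribution also ships a run-example script in its bin folder that can run the bundled SparkPi example. The sketch below assumes Spark lives in /opt/spark as set up above (or wherever SPARK_HOME points) and simply reports if it is not found.

```shell
#!/usr/bin/env bash
# Run the bundled SparkPi example as a quick smoke test.
# Falls back to /opt/spark when SPARK_HOME is not set.
spark_home="${SPARK_HOME:-/opt/spark}"

if [ -x "$spark_home/bin/run-example" ]; then
  # SparkPi estimates pi; 10 is the number of partitions to use.
  # Log output goes to stderr, so only the result line is kept.
  "$spark_home/bin/run-example" SparkPi 10 2>/dev/null | grep "Pi is roughly"
else
  echo "Spark not found under $spark_home"
fi
```

A line starting with "Pi is roughly" indicates that Spark can launch and finish a job on this machine.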

That’s it for this article. We will catch you with another interesting article very soon.
