Building a Hadoop cluster (Cloudera Manager on Ubuntu 18.04)

TL;DR

Environment

The cluster consists of eight servers in total: one server that acts as both the management node and the master node, and seven other servers. The OS is Ubuntu 18.04 LTS Server, and Cloudera Manager 6.3.0 is used.

Cloudera Manager and the management node, which is assigned many roles, consume a lot of memory, so add as much memory as possible, or split the roles across dedicated nodes.

In this article, the master node is written as amdahl01.example.jp, and the remaining slave nodes as amdahl[02-08].example.jp.

Installation

The work basically follows the Cloudera Installation Guide.

Create an account called cloudera during installation. Any account with sudo privileges will do; just make sure that sudo can be run without a password.

sudo update-alternatives --config editor #Change default editor
sudo visudo #Allow cloudera users to run sudo without a password

/etc/sudoers


cloudera ALL=(ALL) NOPASSWD: ALL
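
To confirm that passwordless sudo works for the cloudera user, a quick check such as the following should print OK without asking for a password:

sudo -k                     # drop any cached sudo credentials
sudo -n true && echo OK     # -n fails instead of prompting if a password would be required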

Install Ubuntu 18.04 Server on all nodes. Adjust the network settings and so on to match your environment; this article uses netplan. The mirror server is set to http://jp.archive.ubuntu.com/ubuntu/. After installation, update the packages to the latest versions.

sudo apt update -y && sudo apt upgrade -y
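
For reference, a minimal netplan configuration with a static address might look like the following. This is only a sketch for amdahl01; the interface name (eno1), gateway, and DNS addresses are assumptions, so adjust them to your environment and apply the change with netplan apply.

/etc/netplan/01-netcfg.yaml

network:
  version: 2
  ethernets:
    eno1:                              # interface name is an assumption
      addresses: [192.168.141.141/24]  # amdahl01
      gateway4: 192.168.141.1          # assumed default gateway
      nameservers:
        addresses: [192.168.141.1]     # assumed DNS server

sudo netplan apply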

Network

Set the hostname and FQDN appropriately on each node. Specify the hostname during installation, or set it afterwards with a command.

sudo hostnamectl set-hostname amdahl01.example.jp

Make sure the IP address and FQDN of each node can be resolved. Since there is no need to publish them externally, edit /etc/hosts and map them there. You may also want to remove the line that associates the hostname with localhost.

/etc/hosts


127.0.0.1 localhost

192.168.141.141 amdahl01.example.jp amdahl01
192.168.141.142 amdahl02.example.jp amdahl02
192.168.141.143 amdahl03.example.jp amdahl03
192.168.141.144 amdahl04.example.jp amdahl04
192.168.141.145 amdahl05.example.jp amdahl05
192.168.141.146 amdahl06.example.jp amdahl06
192.168.141.147 amdahl07.example.jp amdahl07
192.168.141.148 amdahl08.example.jp amdahl08
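
To check that name resolution works as expected, query the hosts database on each node, for example:

getent hosts amdahl02.example.jp   # should return 192.168.141.142
hostname -f                        # should return this node's own FQDN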

Preparation for installation

Disable the firewall. Here ufw is managed as a systemd service, so it can be stopped with the following commands.

sudo systemctl stop ufw
sudo systemctl disable ufw
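
You can confirm that the firewall is no longer active with:

sudo ufw status   # should report "Status: inactive"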

If SELinux is enabled, disable it. It is not installed in this environment, so that step is omitted here.

Time synchronization is already handled by systemd (systemd-timesyncd), so switch it over to ntp.

sudo systemctl stop systemd-timesyncd
sudo systemctl disable systemd-timesyncd

Install ntp and edit /etc/ntp.conf to configure the NTP servers. Here ntp.nict.jp is used as a safe choice.

sudo apt install -y ntp ntpdate
sudo systemctl start ntp
sudo systemctl enable ntp

sudo vi /etc/ntp.conf
sudo ntpdate -u ntp.nict.jp
sudo hwclock --systohc
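
As a reference, the relevant part of /etc/ntp.conf might look like the following (assuming ntp.nict.jp as the upstream server), and ntpq can be used to confirm that synchronization works:

/etc/ntp.conf (excerpt)

server ntp.nict.jp iburst

ntpq -p   # the upstream server should appear in the peer list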

Install the required software.

sudo apt install -y vim tmux git curl wget build-essential
sudo apt install -y make xz-utils file liblzma-dev patch zlib1g-dev g++ gcc 

MariaDB / MySQL / PostgreSQL installation (optional)

Install the database software. The installer sets up an embedded PostgreSQL server when Cloudera Manager is installed, so set up an external database only if your environment requires it. For details, refer to the official guide.

The following is an example of setting up MySQL; the databases and users shown in the table below are created.

Service                  Database  User
Cloudera Manager Server  scm       scm
Activity Monitor         amon      amon
Hue                      hue       hue
Hive                     hive      hive
Oozie                    oozie     oozie

sudo apt install -y mysql-server libmysql-java
sudo /usr/bin/mysql_secure_installation
mysql -u root -p
create database <database> default character set utf8 default collate utf8_general_ci;
GRANT ALL ON <database>.* TO '<user>'@'%' IDENTIFIED BY '<password>';
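
For example, the Cloudera Manager Server entry from the table above could be created as follows. The password 'scm' is only chosen to match the scm_prepare_database.sh example below; use a stronger one in practice.

CREATE DATABASE scm DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm';  -- example password, matches the script call below
FLUSH PRIVILEGES;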

Set up the database used by Cloudera Manager with the following script; the arguments are the database type, database name, user, and password. The databases for the other ecosystem components are configured from the web GUI when creating the cluster.

sudo /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm scm

If you do not use the embedded database, the following services are not required; if they are running, stop and disable them.

sudo systemctl stop cloudera-scm-server-db.service
sudo systemctl disable cloudera-scm-server-db.service

Install Cloudera Manager

Install Cloudera Manager (on the management node only). Download the installer and give it execute permission. When you run it, a TUI installer starts, but there is nothing tricky, so just follow the prompts. If the installation completes normally, you can access Cloudera Manager from a browser at amdahl01.example.jp:7180.

wget https://archive.cloudera.com/cm6/6.3.0/cloudera-manager-installer.bin
chmod u+x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin
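
If you want to confirm that the server has come up before opening the browser, the usual Cloudera Manager 6 service and log names are as follows (treat them as assumptions for other versions):

sudo systemctl status cloudera-scm-server
sudo tail -f /var/log/cloudera-scm-server/cloudera-scm-server.log   # wait until "Started Jetty server" appears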

Cluster installation

Log in to the Cloudera Manager web console. The default username/password is admin/admin. Follow the on-screen instructions to proceed with the setup; as long as you keep the default settings, you should not get lost.

The managed hosts (the targets on which the ecosystem is installed) are specified by FQDN. This time, amdahl[01-08].example.jp were searched for and selected so that the software is installed on all nodes. To install the software on each node, the management node must be able to SSH into the other nodes and execute the sudo command without a password, so it is a good idea to use the cloudera account created earlier. For the SSH credentials, specify the password or key.
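
If key-based SSH from the management node has not been set up yet, it can be prepared roughly as follows. This is a sketch run as the cloudera user on amdahl01; the key path and host names are just this article's example values.

ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa -N ""         # generate a key pair without a passphrase
for host in amdahl{02..08}.example.jp; do
    ssh-copy-id -i ~/.ssh/id_rsa.pub cloudera@${host}     # copy the public key to each slave node
done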

In addition, specify the ecosystem components, databases, and so on according to the environment you want. A fast network is recommended, as downloading and installing the ecosystem takes a considerable amount of time.

When the setup is complete, the screen changes to the dashboard. Many warnings will probably be displayed, so review the settings one by one.

(Screenshot: Cloudera Manager dashboard after setup)

This completes the cluster setup. Thank you for your hard work.
