TL;DR
The cluster consists of eight servers in total: one server that serves as both the management node and the master node, and seven other servers (slave nodes). The OS is Ubuntu 18.04 Server, and Cloudera Manager 6.3.0 is used.
Cloudera Manager and the management node, which is assigned many roles, consume a lot of memory, so add as much memory as possible, or split the roles onto dedicated nodes.
In this article, the master node is written as amdahl01.example.jp and the other slave nodes as amdahl[02-08].example.jp.
The work is basically based on the Cloudera Installation Guide.
Create an account called cloudera during OS installation. Any account with sudo privileges will do, but make sure it can run sudo without a password.
sudo update-alternatives --config editor #Change default editor
sudo visudo #Allow cloudera users to run sudo without a password
/etc/sudoers
cloudera ALL=(ALL) NOPASSWD: ALL
Install Ubuntu 18.04 Server on all nodes. Adjust the network settings and so on to match your environment; this article uses netplan (a sample configuration is shown below). The mirror server is set to http://jp.archive.ubuntu.com/ubuntu/. After installation, update the packages to the latest versions.
sudo apt update -y && sudo apt upgrade -y
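For reference, here is a minimal netplan sketch for amdahl01. The file name, interface name (eno1), gateway, and DNS server are assumptions for this example, so adjust them to your environment.
/etc/netplan/01-netcfg.yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      dhcp4: no
      addresses: [192.168.141.141/24]
      gateway4: 192.168.141.1
      nameservers:
        addresses: [192.168.141.1]
sudo netplan apply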
Set the host name and FQDN appropriately for each node. Specify the host name at the time of installation or set it with a command.
sudo hostnamectl set-hostname amdahl01.example.jp
Make sure the IP address and FQDN of each node can be resolved from every node. There is no need to publish them externally, so edit /etc/hosts and add the mappings. You may also want to remove any line that associates the host name with localhost.
/etc/hosts
127.0.0.1 localhost
192.168.141.141 amdahl01.example.jp amdahl01
192.168.141.142 amdahl02.example.jp amdahl02
192.168.141.143 amdahl03.example.jp amdahl03
192.168.141.144 amdahl04.example.jp amdahl04
192.168.141.145 amdahl05.example.jp amdahl05
192.168.141.146 amdahl06.example.jp amdahl06
192.168.141.147 amdahl07.example.jp amdahl07
192.168.141.148 amdahl08.example.jp amdahl08
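As a quick check, confirm that the names resolve as expected; the target host below is just one example.
getent hosts amdahl02.example.jp  # should return the address from /etc/hosts
ping -c 1 amdahl08.example.jp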
Disable the firewall. Here ufw is managed as a systemd service, so it can be stopped with the following commands.
sudo systemctl stop ufw
sudo systemctl disable ufw
If SELinux is enabled, disable it. It is not installed in this environment, so the steps are omitted here.
Time synchronization is handled by systemd-timesyncd by default, so switch it to ntp.
sudo systemctl stop systemd-timesyncd
sudo systemctl disable systemd-timesyncd
Install ntp.
Edit /etc/ntp.conf to configure the upstream NTP server. Here ntp.nict.jp is used as a safe choice.
sudo apt install -y ntp ntpdate
sudo systemctl start ntp
sudo systemctl enable ntp
sudo vi /etc/ntp.conf
sudo systemctl restart ntp
sudo ntpdate -u ntp.nict.jp
sudo hwclock --systohc
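In /etc/ntp.conf, replacing the default Ubuntu pool entries with a line like the following is one way to point at ntp.nict.jp; which pool lines you comment out depends on your distribution defaults.
/etc/ntp.conf (excerpt)
# comment out the default "pool ..." lines and add:
server ntp.nict.jp iburst
You can then check synchronization status with ntpq -p.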
Install the required software.
sudo apt install -y vim tmux git curl wget build-essential
sudo apt install -y make xz-utils file liblzma-dev patch zlib1g-dev g++ gcc
Install the database software. The embedded PostgreSQL server is set up when Cloudera Manager is installed, so configure a database to suit your environment. For details, refer to the official guide.
The following is an example of setting up MySQL; the databases and users shown in the table are created.
Service | Database | User |
---|---|---|
Cloudera Manager Server | scm | scm |
Activity Monitor | amon | amon |
Hue | hue | hue |
Hive | hive | hive |
Oozie | oozie | oozie |
sudo apt install -y mysql-server libmysql-java
sudo /usr/bin/mysql_secure_installation
mysql -u root -p
create database <database> default character set utf8 default collate utf8_general_ci;
GRANT ALL ON <database>.* TO '<user>'@'%' IDENTIFIED BY '<password>';
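For example, the Cloudera Manager Server database from the table would be created as follows (the password scm_password is an assumption for illustration; repeat the same pattern for amon, hue, hive, and oozie).
create database scm default character set utf8 default collate utf8_general_ci;
GRANT ALL ON scm.* TO 'scm'@'%' IDENTIFIED BY 'scm_password';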
Set the database used for Cloudera Manager with the following script. Other ecosystem databases are set using the Web GUI when creating a cluster.
sudo /opt/cloudera/cm/schema/scm_prepare_database.sh mysql scm scm scm
If you do not use the embedded database, the following service is not required; if it is running, you can stop it.
sudo systemctl stop cloudera-scm-server-db.service
sudo systemctl disable cloudera-scm-server-db.service
Install Cloudera Manager (management node only).
Download the installer and give it execute permission. Running it starts a TUI installer; there is nothing tricky, so just proceed through it. When the installation completes successfully, you can access the console from your browser at amdahl01.example.jp:7180.
wget https://archive.cloudera.com/cm6/6.3.0/cloudera-manager-installer.bin
chmod u+x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin
Log in to the Cloudera Manager web console. The default username / password is admin / admin.
Follow the instructions on the screen to proceed with the setup. As long as you leave the settings at their defaults, you should not get lost.
The managed hosts (the targets on which the ecosystem is installed) are specified by FQDN. This time, I searched for amdahl[01-08].example.jp and selected all nodes so that the software is installed everywhere.
In order to install the software on each node, the management node must be able to SSH to the other nodes and run sudo there without a password. It is a good idea to use the cloudera account created at the beginning. For the SSH credentials, specify that account's password or its key.
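If you prefer key-based login, a minimal sketch for distributing the cloudera user's key from the management node is shown below; the key type and host range are assumptions based on this article's naming, so adjust as needed.
ssh-keygen -t rsa -b 4096  # run as the cloudera user on the management node
for host in amdahl0{1..8}.example.jp; do ssh-copy-id cloudera@$host; done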
In addition, specify the ecosystem components, databases, and so on according to the desired environment. A fast network is recommended, as downloading and installing the ecosystem takes a considerable amount of time.
When the setup is completed, the screen changes to the dashboard. Many warnings will probably be displayed, so review the settings one by one.
This completes the cluster setup. Thank you for your hard work.