Verification environment

Item     Version
OS       Ubuntu 18.04
OpenSSH  7.6p1

Apply for a GPU server (via Tellus)


About the GPU server

Item    Spec
OS      Ubuntu 18.04 (64-bit)
GPU     NVIDIA Tesla V100 (32 GB) ×1
CPU     Xeon, 4 cores, 3.7 GHz, 1 CPU
Disk    MLC SSD 480 GB ×2
Memory  64 GB

Application flow

  1. Register as a Tellus member (free of charge), then apply for a development environment.
  2. Select a usage period of 1 month, 3 months, or longer (consultation required).
  3. Some time after you apply, the operations team will contact you with your login ID.

Environment construction (GPU)

The steps below basically follow the official CUDA Toolkit / GPU card driver installation procedure.

Server information

The connection details are listed on the development environment page of the Tellus account dashboard.

Item              Where to find it
Server IP         Environment host name / IP
Login ID          Emailed by the operations team
Initial password  Token information / SSHPW information


Connect to server

Add the following entry to ~/.ssh/config on the client so that the server can be reached under the alias tellus:


Host tellus
     HostName [Environment host name / IP]
     User [Login ID]
     IdentityFile ~/.ssh/id_rsa

Package update and installation

Preparation before installing the GPU driver

sudo apt update
sudo apt upgrade
sudo apt install build-essential
sudo apt install dkms
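
Before moving on to the driver, it can help to confirm that the build tools actually landed on the PATH. The following is a minimal sketch, not part of the original procedure; the list of commands checked (gcc, make, dkms) is an assumption about what build-essential and dkms provide.

```python
import shutil

# Commands expected after the apt steps above; this list is an assumption.
REQUIRED_TOOLS = ("gcc", "make", "dkms")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all build tools found")
```

An empty result means every listed command resolves; anything reported as missing is a hint to rerun the corresponding apt install.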

CUDA Toolkit

sudo sh
chmod +x
sudo ./ --toolkit --samples --samplespath=/usr/local/cuda-samples --no-opengl-libs


export CUDA_HOME="/usr/local/cuda" 
export PATH="$CUDA_HOME/bin:$PATH" 
export LD_LIBRARY_PATH="/usr/local/lib:$CUDA_HOME/lib64:$LD_LIBRARY_PATH" 
export CPATH="/usr/local/include:$CUDA_HOME/include:$CPATH" 
export INCLUDE_PATH="$CUDA_HOME/include" 
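
A quick way to confirm that the exports took effect in a new shell is to inspect the environment from Python. This is a sketch under the assumption that only CUDA_HOME, PATH, and LD_LIBRARY_PATH matter for the check; the function name is hypothetical.

```python
import os


def cuda_env_problems(env=None):
    """Return human-readable problems found in the CUDA-related variables."""
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        return ["CUDA_HOME is not set"]

    problems = []
    # PATH must contain $CUDA_HOME/bin so that nvcc can be found.
    if f"{cuda_home}/bin" not in env.get("PATH", "").split(":"):
        problems.append("PATH does not contain $CUDA_HOME/bin")
    # LD_LIBRARY_PATH must contain $CUDA_HOME/lib64 for the shared libraries.
    if f"{cuda_home}/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append("LD_LIBRARY_PATH does not contain $CUDA_HOME/lib64")
    return problems


if __name__ == "__main__":
    for problem in cuda_env_problems() or ["CUDA environment looks consistent"]:
        print(problem)
```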


CUDA Driver

chmod +x
sudo ./ --no-opengl-files --no-libglx-indirect --dkms

cuDNN

Copy the cuDNN archive from the client to the server, then extract it on the server and move its headers and libraries under /usr/local/cuda.



scp -r cudnn-10.2-linux-x64-v8.0.3.33.tgz tellus:~/


tar xvzf cudnn-10.2-linux-x64-v8.0.3.33.tgz
sudo mv cuda/include/cudnn*.h /usr/local/cuda/include/
sudo mv cuda/lib64/* /usr/local/cuda/lib64/
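
To confirm which cuDNN version ended up under /usr/local/cuda, the version macros can be read from the header. This sketch assumes the cuDNN 8 layout, where CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL are defined in cudnn_version.h, and that this header was moved alongside cudnn.h; the function name is hypothetical.

```python
import re
from pathlib import Path


def parse_cudnn_version(header_text):
    """Extract (major, minor, patch) from the text of a cuDNN version header."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"^#define\s+{name}\s+(\d+)", header_text, re.MULTILINE)
        if match is None:
            return None  # macro not present in this header
        parts.append(int(match.group(1)))
    return tuple(parts)


if __name__ == "__main__":
    # Assumed location; adjust if CUDA is installed elsewhere.
    header = Path("/usr/local/cuda/include/cudnn_version.h")
    if header.exists():
        print(parse_cudnn_version(header.read_text()))
    else:
        print(f"{header} not found")
```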

Installation confirmation
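
One possible way to confirm the installation is to call nvidia-smi (installed with the driver) and nvcc --version (installed with the toolkit) and check that both succeed. The wrapper below is a sketch; the function names are hypothetical, and it simply reports None for a tool that is missing or fails.

```python
import shutil
import subprocess


def tool_report(command):
    """Return the stdout of `command`, or None if it is unavailable or fails."""
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(command, capture_output=True, text=True)
    except OSError:
        return None
    return result.stdout if result.returncode == 0 else None


def installation_report():
    """Collect the output of the driver and toolkit version commands."""
    return {
        "driver": tool_report(["nvidia-smi"]),
        "toolkit": tool_report(["nvcc", "--version"]),
    }


if __name__ == "__main__":
    for name, output in installation_report().items():
        print(f"== {name} ==")
        print(output if output is not None else "not available")
```

If nvcc is reported as not available even though the toolkit was installed, the PATH export from the previous step is the first thing to check.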


Environment construction (Python)


sudo bash
conda update -n base conda


export PYTHONPATH="/home/[Login ID]/anaconda3/envs/py38/lib/python3.8:/home/[Login ID]/anaconda3/envs/py38/lib/python3.8/site-packages:$PYTHONPATH"


conda install pytorch torchvision cudatoolkit=10.2 -c pytorch
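
After the install, it is worth checking that PyTorch can actually see the GPU. The sketch below uses torch.cuda.is_available(), torch.version.cuda, torch.backends.cudnn.version(), and torch.cuda.get_device_name(); the function name and the shape of the returned dictionary are illustrative assumptions.

```python
import importlib.util


def torch_cuda_summary():
    """Summarize what PyTorch reports about CUDA, without failing if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return {"torch_installed": False}

    import torch

    summary = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
        "built_for_cuda": torch.version.cuda,             # None for CPU-only builds
        "cudnn_version": torch.backends.cudnn.version(),  # None if cuDNN is unusable
    }
    if summary["cuda_available"]:
        summary["device_name"] = torch.cuda.get_device_name(0)
    return summary


if __name__ == "__main__":
    for key, value in torch_cuda_summary().items():
        print(f"{key}: {value}")
```

On this server, cuda_available should be True; a False value usually points back at the driver or at a CPU-only PyTorch build.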


conda install -c conda-forge mlflow


Host tellus
     HostName [Environment host name / IP]
     User [Login ID]
     IdentityFile ~/.ssh/id_rsa
     LocalForward [Client side port number] localhost:5000
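
The LocalForward line maps a port on the client to port 5000 on the server, which is the default port of the MLflow UI. As a sketch of how the entry is assembled, the helper below renders the same block from parameters; the function name is hypothetical, and the host, user, and port values in the example are placeholders rather than real settings.

```python
def render_ssh_host(alias, hostname, user, identity_file="~/.ssh/id_rsa",
                    local_port=None, remote_port=5000):
    """Render an ~/.ssh/config Host entry, optionally with a LocalForward line."""
    lines = [
        f"Host {alias}",
        f"     HostName {hostname}",
        f"     User {user}",
        f"     IdentityFile {identity_file}",
    ]
    if local_port is not None:
        # Forward client:local_port to localhost:remote_port as seen from the server.
        lines.append(f"     LocalForward {local_port} localhost:{remote_port}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(render_ssh_host("tellus", "203.0.113.10", "user01", local_port=8080))
```

With such an entry in place, opening http://localhost:[Client side port number] on the client reaches the service listening on port 5000 of the server for as long as the SSH session is open.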


conda install -c conda-forge qgis=3.10.8

Operation check

GPU training

import os

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.models as models
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import CIFAR10
from tqdm import tqdm

batch = 1024
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def dataloader(is_train: bool, transform: transforms.Compose) -> DataLoader:
    dataset = CIFAR10(root='./data', train=is_train, download=True, transform=transform)
    return DataLoader(dataset, batch_size=batch, shuffle=is_train, num_workers=os.cpu_count())

def model() -> nn.Module:
    model = models.resnet18(pretrained=True)
    # replace the final layer to output the 10 CIFAR-10 classes
    model.fc = nn.Linear(512, 10)
    return model.to(device)

def training(net: nn.Module, trainloader: DataLoader, epochs: int) -> None:
    # loss function & optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

    for epoch in range(epochs):  # loop over the dataset multiple times
        running_loss = 0.0
        bar = tqdm(trainloader, desc="training model [epoch:{:02d}]".format(epoch), total=len(trainloader))
        for data in bar:
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data[0].to(device), data[1].to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            running_loss += loss.item()
            bar.set_postfix(device=device, batch=batch, loss=(running_loss / len(trainloader)))

    print('Finished Training')

transform = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainloader = dataloader(True, transform)
net = model()
training(net, trainloader, 3)
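
To put a number on the CPU and GPU comparison, the training call can be wrapped in a small timer. This is a sketch, not the method used for the screenshots; the stopwatch helper is hypothetical. CUDA kernels run asynchronously, so on the GPU the clock should only be read after torch.cuda.synchronize() has returned, which is what the optional sync argument is for.

```python
import time
from contextlib import contextmanager


@contextmanager
def stopwatch(label, results, sync=None):
    """Store the wall-clock seconds spent inside the with-block in results[label]."""
    if sync is not None:
        sync()  # drain already queued work before starting the clock
    start = time.perf_counter()
    try:
        yield
    finally:
        if sync is not None:
            sync()  # wait for work launched inside the block to finish
        results[label] = time.perf_counter() - start


if __name__ == "__main__":
    timings = {}
    with stopwatch("demo", timings):
        sum(range(1_000_000))
    print(timings)
```

With the script above, a GPU run could be timed as `with stopwatch("gpu", timings, sync=torch.cuda.synchronize): training(net, trainloader, 1)`, while a CPU run would leave sync unset.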

CPU results: [images: cpu_batch_1024.png, cpu_batch_1024_smi.png]

GPU results: [images: gpu_batch_1024.png, gpu_batch_1024_smi.png]

MLflow


from math import pi, sin

import mlflow

amplitude = 2.0

with mlflow.start_run() as _:
    mlflow.log_param('amplitude', amplitude)
    for i in range(360):
        sin_val = amplitude * sin(i * pi / 180.)
        mlflow.log_metric('sin wave', sin_val, step=i)
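
As a sanity check on what the MLflow chart should show, the same values can be computed without MLflow. This sketch reuses the formula from the script above; the helper name is hypothetical.

```python
from math import pi, sin


def sin_wave(amplitude, steps=360):
    """Return the values the script above logs for steps 0..steps-1."""
    return [amplitude * sin(i * pi / 180.0) for i in range(steps)]


values = sin_wave(2.0)
# Steps at which the logged metric reaches its maximum and minimum.
peak_step = max(range(len(values)), key=values.__getitem__)
trough_step = min(range(len(values)), key=values.__getitem__)
print(peak_step, values[peak_step])
print(trough_step, values[trough_step])
```

The curve in the UI should therefore rise to the amplitude a quarter of the way through the run and fall to its negative three quarters of the way through.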


mlflow ui
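
If the browser on the client cannot reach the UI, a first step is to check whether anything is listening on the forwarded port. The sketch below is a generic TCP check, not something MLflow provides; port 5000 is the MLflow UI default, and on the client the [Client side port number] from the SSH entry should be used instead.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # On the server, check the MLflow UI itself.
    print("MLflow UI reachable:", port_is_open("localhost", 5000))
```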

Result images: [images: mlflow_localforward.png, mlflow_test.png, mlflow_sinwave.png]


ssh -X tellus

Using VS Code

conda install -c conda-forge ipykernel

Conclusion

Reference pages

Tellus FAQ
Bamboo shoot blog: Building a PyTorch environment from the Tellus GPU server

