I looked through the official documentation. The syntax is quite similar to Chainer's, but be aware that some functions differ slightly.
PyTorch is a deep learning framework (strong on images) led by Facebook and New York University. It appears to have been forked from Chainer. Torch7 is likewise led by Facebook and New York University.
Torch7 is Lua-based and not very abstracted, so the low-level functions are exposed; PyTorch is considerably more abstract and reduces the amount of code you have to write.
Developers as of March 2017
Adam Paszke http://apaszke.github.io/posts.html Soumith Chintala http://soumith.ch/
As of March 2017. I'm not sure a GitHub commit graph really measures momentum, but I was curious, so I compared chainer, pytorch, keras, tensorflow, torch7, caffe, caffe2, theano, deeplearning4j, and cntk.
As expected, caffe and torch7 are not updated much these days. cntk was a surprise...
2018/6: It was mentioned in a forum that cntk, tensorflow, theano, and mxnet are mostly used wrapped in Keras, whereas PyTorch is already a high-level framework, so the prevailing view is that it will not be wrapped. I was surprised to see the Keras author ask, in a cntk issue, whether cntk would also be wrapped by Keras, and to see a cntk developer give the comment a "like".
For reference, the number of GitHub search hits for the various APIs.
conda is recommended. Keep pip and numpy up to date. Official site: http://pytorch.org/
conda install pytorch torchvision -c soumith
On Windows:
conda install -c peterjc123 pytorch
If desired, you can reuse Python packages such as numpy, scipy, and Cython to extend PyTorch.
package | Description |
---|---|
torch | A Tensor library like NumPy, with strong GPU support |
torch.autograd | A tape-based automatic differentiation library that supports all differentiable Tensor operations in torch |
torch.nn | A neural network library deeply integrated with autograd, designed for maximum flexibility |
torch.optim | An optimization package to be used with torch.nn, with standard optimization methods such as SGD, RMSProp, LBFGS, Adam |
torch.multiprocessing | Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training |
torch.utils | DataLoader, Trainer and other utility functions |
torch.legacy(.nn/.optim) | Legacy code ported over from Torch for backward compatibility |
These are the PyTorch features you will use most often; they come up again later in this article.

- requires_grad: specifies whether a gradient should be computed.
- backward: computes gradients.
- nn.Module: the class you inherit from to define a network.
- Dataset and DataLoader: used to load data in batches.
- datasets.ImageFolder: reads images easily when they are arranged in one folder per class; the result can then be fed to a DataLoader and processed batch by batch.
- transforms: preprocessing for image data.
- make_grid: lays a batch of images out in a grid for display.

A short sketch of how these pieces fit together follows.
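As a quick illustration (the directory path, image size, and batch size below are placeholders of my own, not values from the article):

```python
import torch
from torchvision import datasets, transforms
from torchvision.utils import make_grid

# preprocessing applied to every image
transform = transforms.Compose([
    transforms.Scale(64),        # older torchvision; newer versions call this Resize
    transforms.CenterCrop(64),
    transforms.ToTensor(),
])

# images laid out as root/class_name/xxx.png are labeled by folder automatically
dataset = datasets.ImageFolder('path/to/images', transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

images, labels = next(iter(loader))
grid = make_grid(images, nrow=4)  # single tensor with the batch arranged as a grid
```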
If you have used NumPy, then you have essentially already used Tensors (a.k.a. ndarray).
PyTorch provides Tensors that reside on either the CPU or the GPU, accelerating huge amounts of computation. We offer a variety of tensor routines to accelerate and adapt your scientific computing needs, including slicing, indexing, mathematical operations, linear algebra, and reduction.
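For instance (a small illustration of my own, not taken from the documentation):

```python
import torch

a = torch.randn(4, 5)       # 4x5 tensor drawn from a standard normal
b = a[:, 1:3]               # slicing / indexing
c = torch.mm(a, a.t())      # linear algebra: product with the transpose -> 4x4
s = a.sum()                 # reduction
if torch.cuda.is_available():
    a = a.cuda()            # the same routines run on the GPU
```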
PyTorch has a unique way of building neural networks: using and replaying a tape recorder.
Most frameworks, such as TensorFlow, Theano, Caffe, and CNTK, have a static view of the world: you have to build a neural network and then reuse the same structure again and again. Changing how the network behaves means starting over from scratch.
PyTorch lets you change the way your network behaves arbitrarily, with zero lag or overhead, using a technique called reverse-mode automatic differentiation. The inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, and Chainer.
Since it is said to be forked from Chainer, it makes the same claim of dynamic networks that Chainer does. Presumably this means you can change the network on the fly; a small illustration follows.
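This toy module is my own example, not from the article: because the graph is rebuilt on every forward call, ordinary Python control flow can change the architecture from one call to the next.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class DynamicNet(nn.Module):
    def __init__(self):
        super(DynamicNet, self).__init__()
        self.fc_in = nn.Linear(10, 20)
        self.fc_mid = nn.Linear(20, 20)
        self.fc_out = nn.Linear(20, 2)

    def forward(self, x):
        x = F.relu(self.fc_in(x))
        # a plain Python loop with a random depth: the graph differs on every call
        for _ in range(random.randint(0, 3)):
            x = F.relu(self.fc_mid(x))
        return self.fc_out(x)

net = DynamicNet()
out = net(Variable(torch.randn(5, 10)))  # old Variable API, as used in this article
```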
Check the version.
python
import torch
print(torch.__version__)
Data acquisition
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.batch_size, shuffle=True, **kwargs)
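The snippet above is lifted from the official MNIST example and assumes its argparse setup; to run it standalone you would need roughly the following (the defaults are the official example's, as far as I recall):

```python
import argparse
import torch
from torchvision import datasets, transforms

parser = argparse.ArgumentParser()
parser.add_argument('--batch-size', type=int, default=64)
parser.add_argument('--epochs', type=int, default=10)
parser.add_argument('--lr', type=float, default=0.01)
parser.add_argument('--momentum', type=float, default=0.5)
parser.add_argument('--log-interval', type=int, default=10)
parser.add_argument('--no-cuda', action='store_true', default=False)
args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

# extra DataLoader options that only matter on the GPU
kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}
```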
Model definition
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
self.conv2_drop = nn.Dropout2d()
self.fc1 = nn.Linear(320, 50)
self.fc2 = nn.Linear(50, 10)
def forward(self, x):
x = F.relu(F.max_pool2d(self.conv1(x), 2))
x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
x = x.view(-1, 320)
x = F.relu(self.fc1(x))
x = F.dropout(x, training=self.training)
        x = self.fc2(x)
return F.log_softmax(x)
Create the model and set up the optimizer
model = Net()
if args.cuda:
model.cuda()
optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)
Training
def train(epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
if args.cuda:
data, target = data.cuda(), target.cuda()
data, target = Variable(data), Variable(target)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.data[0]))
Run the training
for epoch in range(1, args.epochs + 1):
train(epoch)
test(epoch)
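The test() called here also comes from the official example; a minimal sketch of it, under the same assumptions (args, test_loader, and the old Variable API), would look roughly like this:

```python
def test(epoch):
    model.eval()                      # switch dropout to evaluation mode
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += F.nll_loss(output, target).data[0]
        pred = output.data.max(1)[1]  # index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()
    test_loss /= len(test_loader)     # nll_loss already averages over the batch
    print('Test Epoch: {}\tAverage loss: {:.4f}\tAccuracy: {}/{}'.format(
        epoch, test_loss, correct, len(test_loader.dataset)))
```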
It really is just like Chainer, so there is essentially no learning cost.
https://github.com/pytorch/tutorials/blob/master/Introduction%20to%20PyTorch%20for%20former%20Torchies.ipynb
import torch
a = torch.FloatTensor(10, 20)
# creates tensor of size (10 x 20) with uninitialized memory
a = torch.randn(10, 20)
# initializes a tensor randomized with a normal distribution with mean=0, var=1
a.size()
Since torch.Size is actually a tuple, it supports the same operations.
Functions whose names end with _ modify the tensor in place.
a.fill_(3.5)
# a has now been filled with the value 3.5
b = a.add(4.0)
# a is still filled with 3.5
# new tensor b is returned with values 3.5 + 4.0 = 7.5
b = a[0,3] #1st row and 4th column
b = a[:,3:5] #4th and 5th columns
Functions are no longer camelCase as they were in torch7. For example, indexAdd is now index_add_.
x = torch.ones(5, 5)
print(x)
z = torch.Tensor(5, 2)
z[:,0] = 10
z[:,1] = 100
print(z)
x.index_add_(1, torch.LongTensor([4,0]), z)
print(x)
Conversion from torch tensor to numpy array
a = torch.ones(5)
b = a.numpy()
a.add_(1)
print(a)
print(b)
Convert numpy array to torch Tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)
# let us run this cell only if CUDA is available
if torch.cuda.is_available():
# creates a LongTensor and transfers it
# to GPU as torch.cuda.LongTensor
a = torch.LongTensor(10).fill_(3).cuda()
print(type(a))
b = a.cpu()
# transfers it to CPU, back to
# being a torch.LongTensor
Autograd
Autograd introduces the Variable class, which is a thin wrapper around a Tensor.
from torch.autograd import Variable
x = Variable(torch.ones(2, 2), requires_grad = True)
x.data
x.grad
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()
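After backward(), x.grad holds d(out)/dx. Since out = (1/4) * sum of 3(x_i + 2)^2 and x is all ones, each entry of the gradient is 1.5 * (1 + 2) = 4.5, which you can confirm directly:

```python
print(x.grad)  # a 2x2 gradient, every entry equal to 4.5
```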
import torch.nn as nn
The state is not held in the module, but in the network graph.
Create class
import torch.nn.functional as F
class MNISTConvNet(nn.Module):
def __init__(self):
super(MNISTConvNet, self).__init__()
self.conv1 = nn.Conv2d(1, 10, 5)
self.pool1 = nn.MaxPool2d(2,2)
self.conv2 = nn.Conv2d(10, 20, 5)
self.pool2 = nn.MaxPool2d(2, 2)
self.fc1 = nn.Linear(320, 50)
self.fc2 = nn.Linear(50, 10)
def forward(self, input):
x = self.pool1(F.relu(self.conv1(input)))
x = self.pool2(F.relu(self.conv2(x)))
x = x.view(x.size(0), -1)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
return x
Create an instance of the class
net = MNISTConvNet()
print(net)
input = Variable(torch.randn(1, 1, 28, 28))
out = net(input)
print(out.size())
# define a dummy target label
target = Variable(torch.LongTensor([3]))
# create a loss function
loss_fn = nn.CrossEntropyLoss() # LogSoftmax + ClassNLL Loss
err = loss_fn(out, target)
print(err)
err.backward()
The output of the ConvNet is a Variable. Computing the loss with it gives err, which is also a Variable. Calling .backward() on err then propagates gradients all the way back through the ConvNet to its weights.
Access the weights and gradients of individual layers.
print(net.conv1.weight.grad.size())
print(net.conv1.weight.data.norm()) # norm of the weight
print(net.conv1.weight.grad.data.norm()) # norm of the gradients
We have looked at the weights and gradients. But what about inspecting or modifying a layer's output and grad_output? Hooks are introduced for this purpose.
You can register a function on a Module or a Variable. A hook can be a forward hook or a backward hook. A forward hook is executed when forward is called; a backward hook runs during the backward phase. Let's look at an example.
# We register a forward hook on conv2 and print some information
def printnorm(self, input, output):
# input is a tuple of packed inputs
# output is a Variable. output.data is the Tensor we are interested
print('Inside ' + self.__class__.__name__ + ' forward')
print('')
print('input: ', type(input))
print('input[0]: ', type(input[0]))
print('output: ', type(output))
print('')
print('input size:', input[0].size())
print('output size:', output.data.size())
print('output norm:', output.data.norm())
net.conv2.register_forward_hook(printnorm)
out = net(input)
# We register a backward hook on conv2 and print some information
def printgradnorm(self, grad_input, grad_output):
print('Inside ' + self.__class__.__name__ + ' backward')
print('Inside class:' + self.__class__.__name__)
print('')
print('grad_input: ', type(grad_input))
print('grad_input[0]: ', type(grad_input[0]))
print('grad_output: ', type(grad_output))
print('grad_output[0]: ', type(grad_output[0]))
print('')
print('grad_input size:', grad_input[0].size())
print('grad_output size:', grad_output[0].size())
print('grad_input norm:', grad_input[0].data.norm())
net.conv2.register_backward_hook(printgradnorm)
out = net(input)
err = loss_fn(out, target)
err.backward()
Next, let's see how to build a recurrent net with PyTorch. Since the state of the network is held in the graph and not in the layers, you can simply create an nn.Linear and reuse it over and over for the recurrence.
class RNN(nn.Module):
# you can also accept arguments in your model constructor
def __init__(self, data_size, hidden_size, output_size):
super(RNN, self).__init__()
self.hidden_size = hidden_size
input_size = data_size + hidden_size
self.i2h = nn.Linear(input_size, hidden_size)
self.h2o = nn.Linear(hidden_size, output_size)
def forward(self, data, last_hidden):
input = torch.cat((data, last_hidden), 1)
hidden = self.i2h(input)
output = self.h2o(hidden)
return hidden, output
rnn = RNN(50, 20, 10)
loss_fn = nn.MSELoss()
batch_size = 10
TIMESTEPS = 5
# Create some fake data
batch = Variable(torch.randn(batch_size, 50))
hidden = Variable(torch.zeros(batch_size, 20))
target = Variable(torch.zeros(batch_size, 10))
loss = 0
for t in range(TIMESTEPS):
# yes! you can reuse the same network several times,
# sum up the losses, and call backward!
hidden, output = rnn(batch, hidden)
loss += loss_fn(output, target)
loss.backward()
By default PyTorch has a seamless CuDNN integration for ConvNets and Recurrent Nets.
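These backends are exposed as flags under torch.backends.cudnn; for instance, benchmark mode lets CuDNN pick the fastest convolution algorithms when input sizes are fixed (a small illustration of my own, not from the article):

```python
import torch.backends.cudnn as cudnn

cudnn.enabled = True    # use CuDNN kernels when available (the default)
cudnn.benchmark = True  # autotune conv algorithms; helps when input sizes don't change
```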
Data parallelism means splitting a mini-batch of samples into several smaller mini-batches and running the computation for each of them in parallel. Data parallelism is implemented with torch.nn.DataParallel: you wrap a Module in DataParallel and it is parallelized over multiple GPUs along the batch dimension.
Data parallel
class DataParallelModel(nn.Module):
def __init__(self):
super().__init__()
self.block1=nn.Linear(10, 20)
# wrap block2 in DataParallel
self.block2=nn.Linear(20, 20)
self.block2 = nn.DataParallel(self.block2)
self.block3=nn.Linear(20, 20)
def forward(self, x):
x = self.block1(x)
x = self.block2(x)
x = self.block3(x)
return x
No need to change code in CPU mode.
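For reference, the more common pattern is to wrap an entire model rather than a single block; a minimal sketch of my own, assuming more than one GPU is visible:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # each input batch is split across the visible GPUs
if torch.cuda.is_available():
    model = model.cuda()
```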
In general, PyTorch's nn.parallel primitives can also be used independently. They implement simple MPI-like collectives:

- replicate: replicate a Module onto multiple devices
- scatter: distribute the input along the first dimension
- gather: gather and concatenate inputs along the first dimension
- parallel_apply: apply a set of already-distributed inputs to a set of already-distributed models

For clarity, here is the function data_parallel composed from these primitives.
def data_parallel(module, input, device_ids, output_device=None):
if not device_ids:
return module(input)
if output_device is None:
output_device = device_ids[0]
replicas = nn.parallel.replicate(module, device_ids)
inputs = nn.parallel.scatter(input, device_ids)
replicas = replicas[:len(inputs)]
outputs = nn.parallel.parallel_apply(replicas, inputs)
return nn.parallel.gather(outputs, output_device)
Let's look at a small example of implementing a network where part of it runs on the CPU and part on the GPU.
class DistributedModel(nn.Module):
def __init__(self):
super().__init__(
embedding=nn.Embedding(1000, 10),
rnn=nn.Linear(10, 10).cuda(0),
)
def forward(self, x):
# Compute embedding on CPU
x = self.embedding(x)
# Transfer to GPU
x = x.cuda(0)
# Compute RNN on GPU
x = self.rnn(x)
return x
There is sample code for things like image-generation systems; in fact there is far more than that. There is so much that there's no point listing it all here, so if you want something, search GitHub. The world turned out to be a big place.
- pix2pix https://github.com/mrzhu-cool/pix2pix-pytorch
- densenet https://github.com/bamos/densenet.pytorch
- animeGAN https://github.com/jayleicn/animeGAN
- yolo2 https://github.com/longcw/yolo2-pytorch
- gan https://github.com/devnag/pytorch-generative-adversarial-networks
- List of generative models https://github.com/wiseodd/generative-models
- Functional models https://github.com/szagoruyko/functional-zoo
- Simple sample list https://github.com/pytorch/examples/
https://github.com/pytorch/tutorials/blob/master/Deep%20Learning%20with%20PyTorch.ipynb
Make sure the torch and torchvision packages are installed.
conda install torchvision -c soumith
or
pip install torchvision
Tensors are similar to numpy's ndarray, but Tensors can also be used on the GPU.
from __future__ import print_function
import torch
x = torch.Tensor(5, 3) # construct a 5x3 matrix, uninitialized
x = torch.rand(5, 3) # construct a randomly initialized matrix
x.size()
y = torch.rand(5, 3)
# addition: syntax 1
x + y
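The same notebook-era API also offers other forms of addition; for reference:

```python
# addition: syntax 2
torch.add(x, y)

# addition: writing into a given output tensor
result = torch.Tensor(5, 3)
torch.add(x, y, out=result)

# addition: in place (note the trailing underscore)
y.add_(x)
```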
Conversion from torch tensor to numpy array
a = torch.ones(5)
b = a.numpy()
a.add_(1)
print(a)
print(b) # see how the numpy array changed in value
Convert numpy array to torch Tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b) # see how changing the np array changed the torch Tensor automatically
~ Under construction ~
Data type | dtype | CPU tensor | GPU tensor |
---|---|---|---|
64-bit floating point | torch.float64 or torch.double | torch.DoubleTensor | torch.cuda.DoubleTensor |
32-bit floating point | torch.float32 or torch.float | torch.FloatTensor | torch.cuda.FloatTensor |
16-bit floating point | torch.float16 or torch.half | torch.HalfTensor | torch.cuda.HalfTensor |
8-bit integer (unsigned) | torch.uint8 | torch.ByteTensor | torch.cuda.ByteTensor |
8-bit integer (signed) | torch.int8 | torch.CharTensor | torch.cuda.CharTensor |
16-bit integer (signed) | torch.int16 or torch.short | torch.ShortTensor | torch.cuda.ShortTensor |
32-bit integer (signed) | torch.int32 or torch.int | torch.IntTensor | torch.cuda.IntTensor |
64-bit integer (signed) | torch.int64 or torch.long | torch.LongTensor | torch.cuda.LongTensor |
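Converting a tensor between these types is done with the corresponding methods; a small sketch (the last line assumes CUDA is available):

```python
import torch

a = torch.ones(3)            # torch.FloatTensor by default
b = a.double()               # torch.DoubleTensor
c = a.long()                 # torch.LongTensor
d = a.type(torch.IntTensor)  # explicit conversion via type()
if torch.cuda.is_available():
    e = a.cuda()             # torch.cuda.FloatTensor
```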
torch7: an MNIST training example https://github.com/torch/demos/blob/master/train-a-digit-classifier/train-on-mnist.lua
Model definition
-- define model to train
model = nn.Sequential()
model:add(nn.Reshape(1024))
model:add(nn.Linear(1024,#classes))
model:add(nn.LogSoftMax())
Create the criterion and load the data.
criterion = nn.ClassNLLCriterion()
trainData = mnist.loadTrainSet(nbTrainingPatches, geometry)
trainData:normalizeGlobal(mean, std)
Define the training function
-- training function
function train(dataset)
-- epoch tracker
epoch = epoch or 1
-- (omitted)
gradParameters:zero()
-- evaluate function for complete mini batch
local outputs = model:forward(inputs)
local f = criterion:forward(outputs, targets)
-- estimate df/dW
local df_do = criterion:backward(outputs, targets)
model:backward(inputs, df_do)
-- (omitted)
end
Run training
while true do
-- train/test
train(trainData)
-- (omitted)
end
torchnet https://github.com/torchnet/torchnet/blob/master/example/mnist.lua
mnist.lua
-- load torchnet:
local tnt = require 'torchnet'
-- use GPU or not:
-- (omitted)
-- function that sets of dataset iterator:
local function getIterator(mode)
-- (omitted)
end
-- set up logistic regressor:
local net = nn.Sequential():add(nn.Linear(784,10))
local criterion = nn.CrossEntropyCriterion()
-- set up training engine:
local engine = tnt.SGDEngine()
-- (omitted)
end
-- set up GPU training:
-- (omitted)
-- train the model:
engine:train{
network = net,
iterator = getIterator('train'),
criterion = criterion,
lr = 0.2,
maxepoch = 5,
}
-- measure test loss and error:
-- (omitted)
print(string.format('test loss: %2.4f; test error: %2.4f',
meter:value(), clerr:value{k = 1}))
I wondered what to do when I want to do transfer learning with Torch or PyTorch; I'd like to try it someday. A rough PyTorch sketch follows.
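For PyTorch, the usual starting point would be torchvision's pretrained models; this is only a sketch of my own (assuming a new 10-class task), freezing the feature extractor and replacing the final layer:

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(pretrained=True)        # downloads ImageNet weights

# freeze the feature extractor
for param in model.parameters():
    param.requires_grad = False

# replace the final fully connected layer with a fresh 10-class head
model.fc = nn.Linear(model.fc.in_features, 10)  # new parameters require grad by default

# optimize only the new layer's parameters
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
```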
http://toxweblog.toxbe.com/2016/12/22/chainer-alexnet-fine-tuning/
Conversion
# path of the Caffe model to load and of the pkl file to save
loadpath = "bvlc_alexnet.caffemodel"
savepath = "./chainermodels/alexnet.pkl"
from chainer.links.caffe import CaffeFunction
alexnet = CaffeFunction(loadpath)
import _pickle as pickle
pickle.dump(alexnet, open(savepath, 'wb'))
Read
if ext == ".caffemodel":
print('Loading Caffe model file %s...' % args.model, file=sys.stderr)
func = caffe.CaffeFunction(args.model)
print('Loaded', file=sys.stderr)
elif ext == ".pkl":
print('Loading Caffe model file %s...' % args.model, file=sys.stderr)
func = pickle.load(open(args.model, 'rb'))
print('Loaded', file=sys.stderr)
def predict(x):
y, = func(inputs={'data': x}, outputs=['fc8'], train=False)
return F.softmax(y)
Save keras model
hogehoge_model.save_weights('model.h5', overwrite=True)
Loading keras model
hogehoge_model.load_weights('model.h5')
Save as pkl
import _pickle as pickle
pickle.dump(hogehoge_model, open('model.pkl', 'wb'))
Read pkl
hogehoge_model = pickle.load(open('model.pkl', 'rb'))
https://github.com/ethereon/caffe-tensorflow
def convert(def_path, caffemodel_path, data_output_path, code_output_path, phase):
try:
transformer = TensorFlowTransformer(def_path, caffemodel_path, phase=phase)
print_stderr('Converting data...')
if caffemodel_path is not None:
data = transformer.transform_data()
print_stderr('Saving data...')
with open(data_output_path, 'wb') as data_out:
np.save(data_out, data)
if code_output_path:
print_stderr('Saving source...')
with open(code_output_path, 'wb') as src_out:
src_out.write(transformer.transform_source())
print_stderr('Done.')
except KaffeError as err:
fatal_error('Error encountered: {}'.format(err))
https://github.com/Cadene/tensorflow-model-zoo.torch
python3 inceptionv4/tensorflow_dump.py
th inceptionv4/torch_load.lua
or
python3 inceptionv4/pytorch_load.py
torch-hdf5 https://github.com/deepmind/torch-hdf5 This package allows you to read and write Torch data to and from HDF5 files. The format is fast and flexible and is supported by a wide range of other software including MATLAB, Python, and R.
How to use it: https://github.com/deepmind/torch-hdf5/blob/master/doc/usage.md
For Ubuntu 14.04 and above:
sudo apt-get install libhdf5-serial-dev hdf5-tools
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec LIBHDF5_LIBDIR="/usr/lib/x86_64-linux-gnu/"
I modified the benchmark code slightly and ran it.
require 'hdf5'
print("Size\t\t", "torch.save\t\t", "hdf5\t")
n = 1
local size = math.pow(2, n)
local data = torch.rand(size)
local t = torch.tic()
torch.save("out.t7", data)
local normalTime = torch.toc(t)
t = torch.tic()
local hdf5file = hdf5.open("out.h5", 'w')
hdf5file["foo"] = data
hdf5file:close()
local hdf5time = torch.toc(t)
print(n, "\t", normalTime,"\t", hdf5time)
Jenkins setup for PyTorch https://github.com/pytorch/builder
QA
I tried various things, such as calling .cuda(2) instead of .cuda() or specifying devices with torch.nn.DataParallel, but in the end I settled on the following. Because this works at the level of GPU visibility, it behaves the same with other libraries such as TensorFlow, which otherwise grab memory on every visible GPU without asking; think of it as masking which GPUs the process is allowed to see.
CUDA_VISIBLE_DEVICES=2 python main.py
http://www.acceleware.com/blog/cudavisibledevices-masking-gpus
http://qiita.com/kikusumk3/items/907565559739376076b9
http://qiita.com/ballforest/items/3f21bcf34cba8f048f1e
With 8 or more GPUs, it seems this is not enough and you need proper clustering.
http://qiita.com/YusukeSuzuki@github/items/aa5fcc4b4d06c116c3e8