I looked through the official documentation. The syntax is quite similar to Chainer's, but be aware that some functions differ slightly.
PyTorch is a deep learning framework (strong on images) led by Facebook and New York University. It appears to have been forked from Chainer. Torch7 is likewise led by Facebook and New York University.
Torch7 is Lua-based and not very abstracted, so the low-level functions are exposed; PyTorch is considerably more abstract and reduces the amount of code you have to write.
Developers as of March 2017
Adam Paszke http://apaszke.github.io/posts.html Soumith Chintala http://soumith.ch/
As of March 2017. I'm not sure a GitHub commit graph really measures momentum, but I was curious, so I compared chainer, pytorch, keras, tensorflow, torch7, caffe, caffe2, theano, deeplearning4j, and cntk.
As expected, caffe and torch7 are not updated much these days. cntk was a surprise...
2018/6: It was mentioned in a forum that cntk, tensorflow, theano, and mxnet are mostly used wrapped in Keras, whereas PyTorch is already a high-level framework, so the prevailing view is that it will not be wrapped. I was surprised to see the Keras author ask, in a cntk issue, whether cntk would also be wrapped by Keras, and to see a cntk developer give the comment a "like".
For reference, the number of GitHub search hits for the various APIs.
conda is recommended. Keep pip and numpy up to date. Official site: http://pytorch.org/
conda install pytorch torchvision -c soumith
On Windows:
conda install -c peterjc123 pytorch
If desired, you can reuse Python packages such as numpy, scipy, and Cython to extend PyTorch.
package | Description |
---|---|
torch | A Tensor library like NumPy, with strong GPU support |
torch.autograd | A tape-based automatic differentiation library that supports all differentiable Tensor operations in torch |
torch.nn | A neural network library deeply integrated with autograd, designed for maximum flexibility |
torch.optim | An optimization package to be used with torch.nn, with standard optimization methods such as SGD, RMSProp, LBFGS, Adam |
torch.multiprocessing | Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training |
torch.utils | DataLoader, Trainer and other utility functions |
torch.legacy(.nn/.optim) | Legacy code ported over from Torch for backward compatibility |
These are the PyTorch features you will use most often; they come up again later in this article.

- requires_grad: specifies whether a gradient should be computed.
- backward: computes gradients.
- nn.Module: the class you inherit from to define a network.
- Dataset and DataLoader: used to load data in batches.
- datasets.ImageFolder: reads images easily when they are arranged in one folder per class; the result can then be fed to a DataLoader and processed batch by batch.
- transforms: preprocessing for image data.
- make_grid: lays a batch of images out in a grid for display.

A short sketch of how these pieces fit together follows.
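As a quick illustration (the directory path, image size, and batch size below are placeholders of my own, not values from the article):

```python
import torch
from torchvision import datasets, transforms
from torchvision.utils import make_grid

# preprocessing applied to every image
transform = transforms.Compose([
    transforms.Scale(64),        # older torchvision; newer versions call this Resize
    transforms.CenterCrop(64),
    transforms.ToTensor(),
])

# images laid out as root/class_name/xxx.png are labeled by folder automatically
dataset = datasets.ImageFolder('path/to/images', transform=transform)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

images, labels = next(iter(loader))
grid = make_grid(images, nrow=4)  # single tensor with the batch arranged as a grid
```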
If you have used NumPy, then you have essentially already used Tensors (a.k.a. ndarray).
PyTorch provides Tensors that reside on either the CPU or the GPU, accelerating huge amounts of computation. We offer a variety of tensor routines to accelerate and adapt your scientific computing needs, including slicing, indexing, mathematical operations, linear algebra, and reduction.
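For instance (a small illustration of my own, not taken from the documentation):

```python
import torch

a = torch.randn(4, 5)       # 4x5 tensor drawn from a standard normal
b = a[:, 1:3]               # slicing / indexing
c = torch.mm(a, a.t())      # linear algebra: product with the transpose -> 4x4
s = a.sum()                 # reduction
if torch.cuda.is_available():
    a = a.cuda()            # the same routines run on the GPU
```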
PyTorch has a unique way of building neural networks: using and replaying a tape recorder.
Most frameworks, such as TensorFlow, Theano, Caffe, and CNTK, have a static view of the world: you have to build a neural network and then reuse the same structure again and again. Changing how the network behaves means starting over from scratch.
PyTorch lets you change the way your network behaves arbitrarily, with zero lag or overhead, using a technique called reverse-mode automatic differentiation. The inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, and Chainer.
Since it is said to be forked from Chainer, it makes the same claim of dynamic networks that Chainer does. Presumably this means you can change the network on the fly; a small illustration follows.
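This toy module is my own example, not from the article: because the graph is rebuilt on every forward call, ordinary Python control flow can change the architecture from one call to the next.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

class DynamicNet(nn.Module):
    def __init__(self):
        super(DynamicNet, self).__init__()
        self.fc_in = nn.Linear(10, 20)
        self.fc_mid = nn.Linear(20, 20)
        self.fc_out = nn.Linear(20, 2)

    def forward(self, x):
        x = F.relu(self.fc_in(x))
        # a plain Python loop with a random depth: the graph differs on every call
        for _ in range(random.randint(0, 3)):
            x = F.relu(self.fc_mid(x))
        return self.fc_out(x)

net = DynamicNet()
out = net(Variable(torch.randn(5, 10)))  # old Variable API, as used in this article
```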
Check the version.
python
import torch
print(torch.__version__)
Data acquisition
train_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=True, download=True,
transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.batch_size, shuffle=True, **kwargs)
test_loader = torch.utils.data.DataLoader(
datasets.MNIST('../data', train=False, transform=transforms.Compose([
transforms.ToTensor(),
transforms.Normalize((0.1307,), (0.3081,))
])),
batch_size=args.batch_size, shuffle=True, **kwargs)
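The snippet above is lifted from the official MNIST example and assumes its argparse setup; to run it standalone you would need roughly the following (the defaults are the official example's, as far as I recall):

```python
import argparse
import torch
from torchvision import datasets, transforms

parser = argparse.ArgumentParser()
parser.add_argument('--batch-size', type=int, default=64)
parser.add_argument('--epochs', type=int, default=10)
parser.add_argument('--lr', type=float, default=0.01)
parser.add_argument('--momentum', type=float, default=0.5)
parser.add_argument('--log-interval', type=int, default=10)
parser.add_argument('--no-cuda', action='store_true', default=False)
args = parser.parse_args()
args.cuda = not args.no_cuda and torch.cuda.is_available()

# extra DataLoader options that only matter on the GPU
kwargs = {'num_workers': 1, 'pin_memory': True} if args.cuda else {}
```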
Model definition
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
self.conv2_drop = nn.Dropout2d()
self.fc1 = nn.Linear(320, 50)
self.fc2 = nn.Linear(50, 10)
def forward(self, x):
x = F.relu(F.max_pool2d(self.conv1(x), 2))
x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
x = x.view(-1, 320)
x = F.relu(self.fc1(x))
x = F.dropout(x, training=self.training)
        x = self.fc2(x)
return F.log_softmax(x)
Create the model and set up the optimizer
model = Net()
if args.cuda:
model.cuda()
optimizer = optim.SGD(model.parameters(), lr=args.lr, momentum=args.momentum)
Training
def train(epoch):
model.train()
for batch_idx, (data, target) in enumerate(train_loader):
if args.cuda:
data, target = data.cuda(), target.cuda()
data, target = Variable(data), Variable(target)
optimizer.zero_grad()
output = model(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % args.log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.data[0]))
Run the training
for epoch in range(1, args.epochs + 1):
train(epoch)
test(epoch)
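The test() called here also comes from the official example; a minimal sketch of it, under the same assumptions (args, test_loader, and the old Variable API), would look roughly like this:

```python
def test(epoch):
    model.eval()                      # switch dropout to evaluation mode
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        if args.cuda:
            data, target = data.cuda(), target.cuda()
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        test_loss += F.nll_loss(output, target).data[0]
        pred = output.data.max(1)[1]  # index of the max log-probability
        correct += pred.eq(target.data).cpu().sum()
    test_loss /= len(test_loader)     # nll_loss already averages over the batch
    print('Test Epoch: {}\tAverage loss: {:.4f}\tAccuracy: {}/{}'.format(
        epoch, test_loss, correct, len(test_loader.dataset)))
```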
It really is just like Chainer, so there is essentially no learning cost.
https://github.com/pytorch/tutorials/blob/master/Introduction%20to%20PyTorch%20for%20former%20Torchies.ipynb
import torch
a = torch.FloatTensor(10, 20)
# creates tensor of size (10 x 20) with uninitialized memory
a = torch.randn(10, 20)
# initializes a tensor randomized with a normal distribution with mean=0, var=1
a.size()
Since torch.Size is actually a tuple, it supports the same operations.
Functions whose names end with _ modify the tensor in place.
a.fill_(3.5)
# a has now been filled with the value 3.5
b = a.add(4.0)
# a is still filled with 3.5
# new tensor b is returned with values 3.5 + 4.0 = 7.5
b = a[0,3] #1st row and 4th column
b = a[:,3:5] #4th and 5th columns
Functions are no longer camelCase as they were in torch7. For example, indexAdd is now index_add_.
x = torch.ones(5, 5)
print(x)
z = torch.Tensor(5, 2)
z[:,0] = 10
z[:,1] = 100
print(z)
x.index_add_(1, torch.LongTensor([4,0]), z)
print(x)
Conversion from torch tensor to numpy array
a = torch.ones(5)
b = a.numpy()
a.add_(1)
print(a)
print(b)
Convert numpy array to torch Tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b)
# let us run this cell only if CUDA is available
if torch.cuda.is_available():
# creates a LongTensor and transfers it
# to GPU as torch.cuda.LongTensor
a = torch.LongTensor(10).fill_(3).cuda()
print(type(a))
b = a.cpu()
# transfers it to CPU, back to
# being a torch.LongTensor
Autograd
Autograd introduces the Variable class, which is a thin wrapper around a Tensor.
from torch.autograd import Variable
x = Variable(torch.ones(2, 2), requires_grad = True)
x.data
x.grad
y = x + 2
z = y * y * 3
out = z.mean()
out.backward()
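After backward(), x.grad holds d(out)/dx. Since out = (1/4) * sum of 3(x_i + 2)^2 and x is all ones, each entry of the gradient is 1.5 * (1 + 2) = 4.5, which you can confirm directly:

```python
print(x.grad)  # a 2x2 gradient, every entry equal to 4.5
```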
import torch.nn as nn
The state is not held in the module, but in the network graph.
Create class
import torch.nn.functional as F
class MNISTConvNet(nn.Module):
def __init__(self):
super(MNISTConvNet, self).__init__()
self.conv1 = nn.Conv2d(1, 10, 5)
self.pool1 = nn.MaxPool2d(2,2)
self.conv2 = nn.Conv2d(10, 20, 5)
self.pool2 = nn.MaxPool2d(2, 2)
self.fc1 = nn.Linear(320, 50)
self.fc2 = nn.Linear(50, 10)
def forward(self, input):
x = self.pool1(F.relu(self.conv1(input)))
x = self.pool2(F.relu(self.conv2(x)))
x = x.view(x.size(0), -1)
x = F.relu(self.fc1(x))
x = F.relu(self.fc2(x))
return x
Create an instance of the class
net = MNISTConvNet()
print(net)
input = Variable(torch.randn(1, 1, 28, 28))
out = net(input)
print(out.size())
# define a dummy target label
target = Variable(torch.LongTensor([3]))
# create a loss function
loss_fn = nn.CrossEntropyLoss() # LogSoftmax + ClassNLL Loss
err = loss_fn(out, target)
print(err)
err.backward()
The output of the ConvNet is a Variable. Computing the loss with it gives err, which is also a Variable. Calling .backward() on err then propagates gradients all the way back through the ConvNet to its weights.
Access the weights and gradients of individual layers.
print(net.conv1.weight.grad.size())
print(net.conv1.weight.data.norm()) # norm of the weight
print(net.conv1.weight.grad.data.norm()) # norm of the gradients
We have looked at the weights and gradients. But what about inspecting or modifying a layer's output and grad_output? Hooks are introduced for this purpose.
You can register a function on a Module or a Variable. A hook can be a forward hook or a backward hook. A forward hook is executed when forward is called; a backward hook runs during the backward phase. Let's look at an example.
# We register a forward hook on conv2 and print some information
def printnorm(self, input, output):
# input is a tuple of packed inputs
# output is a Variable. output.data is the Tensor we are interested
print('Inside ' + self.__class__.__name__ + ' forward')
print('')
print('input: ', type(input))
print('input[0]: ', type(input[0]))
print('output: ', type(output))
print('')
print('input size:', input[0].size())
print('output size:', output.data.size())
print('output norm:', output.data.norm())
net.conv2.register_forward_hook(printnorm)
out = net(input)
# We register a backward hook on conv2 and print some information
def printgradnorm(self, grad_input, grad_output):
print('Inside ' + self.__class__.__name__ + ' backward')
print('Inside class:' + self.__class__.__name__)
print('')
print('grad_input: ', type(grad_input))
print('grad_input[0]: ', type(grad_input[0]))
print('grad_output: ', type(grad_output))
print('grad_output[0]: ', type(grad_output[0]))
print('')
print('grad_input size:', grad_input[0].size())
print('grad_output size:', grad_output[0].size())
print('grad_input norm:', grad_input[0].data.norm())
net.conv2.register_backward_hook(printgradnorm)
out = net(input)
err = loss_fn(out, target)
err.backward()
Next, let's see how to build a recurrent net with PyTorch. Since the state of the network is held in the graph and not in the layers, you can simply create an nn.Linear and reuse it over and over for the recurrence.
class RNN(nn.Module):
# you can also accept arguments in your model constructor
def __init__(self, data_size, hidden_size, output_size):
super(RNN, self).__init__()
self.hidden_size = hidden_size
input_size = data_size + hidden_size
self.i2h = nn.Linear(input_size, hidden_size)
self.h2o = nn.Linear(hidden_size, output_size)
def forward(self, data, last_hidden):
input = torch.cat((data, last_hidden), 1)
hidden = self.i2h(input)
output = self.h2o(hidden)
return hidden, output
rnn = RNN(50, 20, 10)
loss_fn = nn.MSELoss()
batch_size = 10
TIMESTEPS = 5
# Create some fake data
batch = Variable(torch.randn(batch_size, 50))
hidden = Variable(torch.zeros(batch_size, 20))
target = Variable(torch.zeros(batch_size, 10))
loss = 0
for t in range(TIMESTEPS):
# yes! you can reuse the same network several times,
# sum up the losses, and call backward!
hidden, output = rnn(batch, hidden)
loss += loss_fn(output, target)
loss.backward()
By default PyTorch has a seamless CuDNN integration for ConvNets and Recurrent Nets.
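These backends are exposed as flags under torch.backends.cudnn; for instance, benchmark mode lets CuDNN pick the fastest convolution algorithms when input sizes are fixed (a small illustration of my own, not from the article):

```python
import torch.backends.cudnn as cudnn

cudnn.enabled = True    # use CuDNN kernels when available (the default)
cudnn.benchmark = True  # autotune conv algorithms; helps when input sizes don't change
```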
Data parallelism means splitting a mini-batch of samples into several smaller mini-batches and running the computation for each of them in parallel. Data parallelism is implemented with torch.nn.DataParallel: you wrap a Module in DataParallel and it is parallelized over multiple GPUs along the batch dimension.
Data parallel
class DataParallelModel(nn.Module):
def __init__(self):
super().__init__()
self.block1=nn.Linear(10, 20)
# wrap block2 in DataParallel
self.block2=nn.Linear(20, 20)
self.block2 = nn.DataParallel(self.block2)
self.block3=nn.Linear(20, 20)
def forward(self, x):
x = self.block1(x)
x = self.block2(x)
x = self.block3(x)
return x
No need to change code in CPU mode.
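For reference, the more common pattern is to wrap an entire model rather than a single block; a minimal sketch of my own, assuming more than one GPU is visible:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 20), nn.ReLU(), nn.Linear(20, 5))
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)  # each input batch is split across the visible GPUs
if torch.cuda.is_available():
    model = model.cuda()
```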
In general, PyTorch's nn.parallel primitives can also be used independently. They implement simple MPI-like collectives:

- replicate: replicate a Module onto multiple devices
- scatter: distribute the input along the first dimension
- gather: gather and concatenate inputs along the first dimension
- parallel_apply: apply a set of already-distributed inputs to a set of already-distributed models

For clarity, here is the function data_parallel composed from these primitives.
def data_parallel(module, input, device_ids, output_device=None):
if not device_ids:
return module(input)
if output_device is None:
output_device = device_ids[0]
replicas = nn.parallel.replicate(module, device_ids)
inputs = nn.parallel.scatter(input, device_ids)
replicas = replicas[:len(inputs)]
outputs = nn.parallel.parallel_apply(replicas, inputs)
return nn.parallel.gather(outputs, output_device)
Let's look at a small example of implementing a network where part of it runs on the CPU and part on the GPU.
class DistributedModel(nn.Module):
def __init__(self):
super().__init__(
embedding=nn.Embedding(1000, 10),
rnn=nn.Linear(10, 10).cuda(0),
)
def forward(self, x):
# Compute embedding on CPU
x = self.embedding(x)
# Transfer to GPU
x = x.cuda(0)
# Compute RNN on GPU
x = self.rnn(x)
return x
There is sample code for things like image-generation systems; in fact there is far more than that. There is so much that there's no point listing it all here, so if you want something, search GitHub. The world turned out to be a big place.
- pix2pix https://github.com/mrzhu-cool/pix2pix-pytorch
- densenet https://github.com/bamos/densenet.pytorch
- animeGAN https://github.com/jayleicn/animeGAN
- yolo2 https://github.com/longcw/yolo2-pytorch
- gan https://github.com/devnag/pytorch-generative-adversarial-networks
- List of generative models https://github.com/wiseodd/generative-models
- Functional models https://github.com/szagoruyko/functional-zoo
- Simple sample list https://github.com/pytorch/examples/
https://github.com/pytorch/tutorials/blob/master/Deep%20Learning%20with%20PyTorch.ipynb
Make sure the torch and torchvision packages are installed.
conda install torchvision -c soumith
or
pip install torchvision
Tensors are similar to numpy's ndarray, but Tensors can also be used on the GPU.
from __future__ import print_function
import torch
x = torch.Tensor(5, 3) # construct a 5x3 matrix, uninitialized
x = torch.rand(5, 3) # construct a randomly initialized matrix
x.size()
y = torch.rand(5, 3)
# addition: syntax 1
x + y
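The same notebook-era API also offers other forms of addition; for reference:

```python
# addition: syntax 2
torch.add(x, y)

# addition: writing into a given output tensor
result = torch.Tensor(5, 3)
torch.add(x, y, out=result)

# addition: in place (note the trailing underscore)
y.add_(x)
```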
Conversion from torch tensor to numpy array
a = torch.ones(5)
b = a.numpy()
a.add_(1)
print(a)
print(b) # see how the numpy array changed in value
Convert numpy array to torch Tensor
import numpy as np
a = np.ones(5)
b = torch.from_numpy(a)
np.add(a, 1, out=a)
print(a)
print(b) # see how changing the np array changed the torch Tensor automatically
~ Under construction ~
Data type | dtype | CPU tensor | GPU tensor |
---|---|---|---|
64-bit floating point | torch.float64 or torch.double | torch.DoubleTensor | torch.cuda.DoubleTensor |
32-bit floating point | torch.float32 or torch.float | torch.FloatTensor | torch.cuda.FloatTensor |
16-bit floating point | torch.float16 or torch.half | torch.HalfTensor | torch.cuda.HalfTensor |
8-bit integer (unsigned) | torch.uint8 | torch.ByteTensor | torch.cuda.ByteTensor |
8-bit integer (signed) | torch.int8 | torch.CharTensor | torch.cuda.CharTensor |
16-bit integer (signed) | torch.int16 or torch.short | torch.ShortTensor | torch.cuda.ShortTensor |
32-bit integer (signed) | torch.int32 or torch.int | torch.IntTensor | torch.cuda.IntTensor |
64-bit integer (signed) | torch.int64 or torch.long | torch.LongTensor | torch.cuda.LongTensor |
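Converting a tensor between these types is done with the corresponding methods; a small sketch (the last line assumes CUDA is available):

```python
import torch

a = torch.ones(3)            # torch.FloatTensor by default
b = a.double()               # torch.DoubleTensor
c = a.long()                 # torch.LongTensor
d = a.type(torch.IntTensor)  # explicit conversion via type()
if torch.cuda.is_available():
    e = a.cuda()             # torch.cuda.FloatTensor
```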
torch7: an MNIST training example https://github.com/torch/demos/blob/master/train-a-digit-classifier/train-on-mnist.lua
Model definition
-- define model to train
model = nn.Sequential()
model:add(nn.Reshape(1024))
model:add(nn.Linear(1024,#classes))
model:add(nn.LogSoftMax())
Create the criterion and load the data.
criterion = nn.ClassNLLCriterion()
trainData = mnist.loadTrainSet(nbTrainingPatches, geometry)
trainData:normalizeGlobal(mean, std)
Define the training function
-- training function
function train(dataset)
-- epoch tracker
epoch = epoch or 1
-- (omitted)
gradParameters:zero()
-- evaluate function for complete mini batch
local outputs = model:forward(inputs)
local f = criterion:forward(outputs, targets)
-- estimate df/dW
local df_do = criterion:backward(outputs, targets)
model:backward(inputs, df_do)
-- (omitted)
end
Run training
while true do
-- train/test
train(trainData)
-- (omitted)
end
torchnet https://github.com/torchnet/torchnet/blob/master/example/mnist.lua
mnist.lua
-- load torchnet:
local tnt = require 'torchnet'
-- use GPU or not:
-- (omitted)
-- function that sets of dataset iterator:
local function getIterator(mode)
-- (omitted)
end
-- set up logistic regressor:
local net = nn.Sequential():add(nn.Linear(784,10))
local criterion = nn.CrossEntropyCriterion()
-- set up training engine:
local engine = tnt.SGDEngine()
-- (omitted)
end
-- set up GPU training:
-- (omitted)
-- train the model:
engine:train{
network = net,
iterator = getIterator('train'),
criterion = criterion,
lr = 0.2,
maxepoch = 5,
}
-- measure test loss and error:
-- (omitted)
print(string.format('test loss: %2.4f; test error: %2.4f',
meter:value(), clerr:value{k = 1}))
I wondered what to do when I want to do transfer learning with Torch or PyTorch; I'd like to try it someday. A rough PyTorch sketch follows.
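For PyTorch, the usual starting point would be torchvision's pretrained models; this is only a sketch of my own (assuming a new 10-class task), freezing the feature extractor and replacing the final layer:

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(pretrained=True)        # downloads ImageNet weights

# freeze the feature extractor
for param in model.parameters():
    param.requires_grad = False

# replace the final fully connected layer with a fresh 10-class head
model.fc = nn.Linear(model.fc.in_features, 10)  # new parameters require grad by default

# optimize only the new layer's parameters
optimizer = optim.SGD(model.fc.parameters(), lr=0.001, momentum=0.9)
```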
http://toxweblog.toxbe.com/2016/12/22/chainer-alexnet-fine-tuning/
Conversion
# path of the Caffe model to load and of the pkl file to save
loadpath = "bvlc_alexnet.caffemodel"
savepath = "./chainermodels/alexnet.pkl"
from chainer.links.caffe import CaffeFunction
alexnet = CaffeFunction(loadpath)
import _pickle as pickle
pickle.dump(alexnet, open(savepath, 'wb'))
Read
if ext == ".caffemodel":
print('Loading Caffe model file %s...' % args.model, file=sys.stderr)
func = caffe.CaffeFunction(args.model)
print('Loaded', file=sys.stderr)
elif ext == ".pkl":
print('Loading Caffe model file %s...' % args.model, file=sys.stderr)
func = pickle.load(open(args.model, 'rb'))
print('Loaded', file=sys.stderr)
def predict(x):
y, = func(inputs={'data': x}, outputs=['fc8'], train=False)
return F.softmax(y)
Save keras model
hogehoge_model.save_weights('model.h5', overwrite=True)
Loading keras model
hogehoge_model.load_weights('model.h5')
Save as pkl
import _pickle as pickle
pickle.dump(hogehoge_model, open('model.pkl', 'wb'))
Read pkl
hogehoge_model = pickle.load(open('model.pkl', 'rb'))
https://github.com/ethereon/caffe-tensorflow
def convert(def_path, caffemodel_path, data_output_path, code_output_path, phase):
try:
transformer = TensorFlowTransformer(def_path, caffemodel_path, phase=phase)
print_stderr('Converting data...')
if caffemodel_path is not None:
data = transformer.transform_data()
print_stderr('Saving data...')
with open(data_output_path, 'wb') as data_out:
np.save(data_out, data)
if code_output_path:
print_stderr('Saving source...')
with open(code_output_path, 'wb') as src_out:
src_out.write(transformer.transform_source())
print_stderr('Done.')
except KaffeError as err:
fatal_error('Error encountered: {}'.format(err))
https://github.com/Cadene/tensorflow-model-zoo.torch
python3 inceptionv4/tensorflow_dump.py
th inceptionv4/torch_load.lua
or
python3 inceptionv4/pytorch_load.py
torch-hdf5 https://github.com/deepmind/torch-hdf5 This package allows you to read and write Torch data to and from HDF5 files. The format is fast and flexible and is supported by a wide range of other software including MATLAB, Python, and R.
How to use it: https://github.com/deepmind/torch-hdf5/blob/master/doc/usage.md
For Ubuntu 14.04 and above:
sudo apt-get install libhdf5-serial-dev hdf5-tools
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec LIBHDF5_LIBDIR="/usr/lib/x86_64-linux-gnu/"
I modified the benchmark code slightly and ran it.
require 'hdf5'
print("Size\t\t", "torch.save\t\t", "hdf5\t")
n = 1
local size = math.pow(2, n)
local data = torch.rand(size)
local t = torch.tic()
torch.save("out.t7", data)
local normalTime = torch.toc(t)
t = torch.tic()
local hdf5file = hdf5.open("out.h5", 'w')
hdf5file["foo"] = data
hdf5file:close()
local hdf5time = torch.toc(t)
print(n, "\t", normalTime,"\t", hdf5time)
Jenkins setup for PyTorch https://github.com/pytorch/builder
QA
I tried various things, such as calling .cuda(2) instead of .cuda() or specifying devices with torch.nn.DataParallel, but in the end I settled on the following. Because this works at the level of GPU visibility, it behaves the same with other libraries such as TensorFlow, which otherwise grab memory on every visible GPU without asking; think of it as masking which GPUs the process is allowed to see.
CUDA_VISIBLE_DEVICES=2 python main.py
http://www.acceleware.com/blog/cudavisibledevices-masking-gpus
http://qiita.com/kikusumk3/items/907565559739376076b9
http://qiita.com/ballforest/items/3f21bcf34cba8f048f1e
With 8 or more GPUs, it seems this is not enough and you need proper clustering.
http://qiita.com/YusukeSuzuki@github/items/aa5fcc4b4d06c116c3e8