A memo working through the MXNet tutorials in order (doing my best to make it to the end ...)
This time, the second part: Symbol -- neural network graphs and auto-differentiation.
Symbol
Since scientific computation is already possible with just the NDArray from the previous section, you might wonder whether that is all you need.
MXNet provides a Symbol API that lets you write computations symbolically. In the symbolic style, instead of executing calculations step by step, you first define a computation graph. The graph contains placeholders for inputs and outputs; it is then compiled and executed as a function that takes NDArrays and produces NDArrays. The Symbol API is similar to Caffe's network configuration and Theano's symbolic style.
Symbolic is almost synonymous with declarative.
Another advantage of the symbolic approach is optimization. With imperative code, it is not known at each step what will be needed later. With symbolic code, the outputs are declared in advance, so intermediate memory can be reused during the computation. The memory requirement is also smaller for the same network.
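To make the contrast concrete, here is a small sketch of my own (not from the tutorial), using the `eval` method that appears later in this chapter.
import mxnet as mx
# imperative (NDArray): each statement is executed immediately
x = mx.nd.ones((2, 3))
y = x * 2 + 1  # computed right away
# symbolic (Symbol): only the graph is defined here
v = mx.sym.Variable('v')
w = v * 2 + 1  # nothing is computed yet
print(w.eval(ctx=mx.cpu(), v=mx.nd.ones((2, 3)))[0].asnumpy())  # evaluated on demand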
Which style to write in is discussed here.
For now, this chapter describes the Symbol API.
How to express `a + b`: first, create placeholders with `mx.sym.Variable` (giving each a name at creation). Then combine them with `+` to define `c`, which is named automatically.
import mxnet as mx
a = mx.sym.Variable('a')  # omitting the name raises an error
b = mx.sym.Variable('b')
c = a + b
(a, b, c)  # c is automatically named _plus0
OUT
(<Symbol a>, <Symbol b>, <Symbol _plus0>)
Most NDArray operations can also be applied to Symbol.
# element-wise multiplication
d = a * b
# matrix product
e = mx.sym.dot(a, b)
# reshape
f = mx.sym.Reshape(d+e, shape=(1,4))
# broadcast
g = mx.sym.broadcast_to(f, shape=(2,4))
mx.viz.plot_network(symbol=g) #Network visualization
Inputs are supplied with `bind`, and the graph is then evaluated (details later).
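As a preview (my own sketch, not in the tutorial text), binding `g` with small arrays looks like this; the executor details are covered further down.
ex = g.bind(ctx=mx.cpu(), args={'a': mx.nd.ones((2, 2)), 'b': mx.nd.ones((2, 2))})
ex.forward()
print(ex.outputs[0].asnumpy())  # d + e = 1 + 2 = 3 everywhere, broadcast to shape (2, 4)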
Symbols are also provided for neural network layers. Below is an example describing a two-layer fully connected network.
#The output graph may vary
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=net, name='fc1', num_hidden=128)
net = mx.sym.Activation(data=net, name='relu1', act_type="relu")
net = mx.sym.FullyConnected(data=net, name='fc2', num_hidden=10)
net = mx.sym.SoftmaxOutput(data=net, name='out')
mx.viz.plot_network(net, shape={'data':(100,200)})
Each Symbol has a unique name. Both NDArray and Symbol represent a single tensor, and operators represent computations between tensors. An operator takes Symbols (or NDArrays) as input and, in some cases, also accepts hyperparameters such as the number of hidden units (`num_hidden`) or the type of activation function (`act_type`), and then produces an output.
You can also view a Symbol as a function of several arguments, and list those arguments with the following call.
net.list_arguments()
OUT
['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias', 'out_label']
mx.sym.Variable('data')
OUT
<Symbol data>
Each Symbol requires the following inputs and parameters (a quick shape check follows the list):
- `data`: the data fed into the variable data
- `fc1_weight`, `fc1_bias`: weight and bias of the first fully connected layer fc1
- `fc2_weight`, `fc2_bias`: weight and bias of the second fully connected layer fc2
- `out_label`: the label required by the loss
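As a quick check (my addition, assuming an input of 100 samples with 200 features), the parameter shapes can be inferred from the input shape:
arg_shapes, out_shapes, _ = net.infer_shape(data=(100, 200))
print(dict(zip(net.list_arguments(), arg_shapes)))
# fc1_weight should come out as (128, 200), fc2_weight as (10, 128), out_label as (100,)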
The weight can also be declared explicitly:
net = mx.symbol.Variable('data')
w = mx.symbol.Variable('myweight')
net = mx.symbol.FullyConnected(data=net, weight=w, name='fc1', num_hidden=128)
net.list_arguments()
OUT
['data', 'myweight', 'fc1_bias']
The tutorial says that in the example above there are three inputs to FullyConnected: data, weight, and bias. The bias does not appear explicitly in the code, though, and the abbreviation `sym` is not used here either ...
MXNet offers Symbols optimized for layers commonly used in deep learning. New operators can also be defined in Python.
In the following example, two Symbols are added element-wise and then passed to a fully connected layer.
lhs = mx.symbol.Variable('data1')
rhs = mx.symbol.Variable('data2')
net = mx.symbol.FullyConnected(data=lhs + rhs, name='fc1', num_hidden=128)
"""Isn't this an ordinary operation?"""
net.list_arguments()
OUT
['data1', 'data2', 'fc1_weight', 'fc1_bias']
Construction is not limited to one direction; more flexible composition is also possible.
data = mx.symbol.Variable('data')
net1 = mx.symbol.FullyConnected(data=data, name='fc1', num_hidden=10)
print(net1.list_arguments())
net2 = mx.symbol.Variable('data2')
net2 = mx.symbol.FullyConnected(data=net2, name='fc2', num_hidden=10)
composed = net2(data2=net1, name='composed') #use net as a function
print(composed.list_arguments())
OUT
['data', 'fc1_weight', 'fc1_bias']
['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias']
In this example, net2 is applied as a function to the existing net1, so the resulting `composed` has the arguments of both net1 and net2.
You can use `mx.name.Prefix` if you want to add a common prefix to symbol names.
data = mx.sym.Variable("data")
net = data
n_layer = 2
for i in range(n_layer):
    with mx.name.Prefix("layer%d_" % (i + 1)):  # apply the prefix
        net = mx.sym.FullyConnected(data=net, name="fc", num_hidden=100)
net.list_arguments()
OUT
['data',
'layer1_fc_weight',
'layer1_fc_bias',
'layer2_fc_weight',
'layer2_fc_bias']
Writing deep networks like Google's Inception layer by layer is tedious, so blocks are modularized and reused.
The following example first defines a factory function for one block of convolution, batch normalization, and ReLU.
# Output may vary
def ConvFactory(data, num_filter, kernel, stride=(1,1), pad=(0, 0), name=None, suffix=''):
    conv = mx.symbol.Convolution(data=data, num_filter=num_filter, kernel=kernel, stride=stride, pad=pad, name='conv_%s%s' %(name, suffix))
    bn = mx.symbol.BatchNorm(data=conv, name='bn_%s%s' %(name, suffix))
    act = mx.symbol.Activation(data=bn, act_type='relu', name='relu_%s%s' %(name, suffix))
    return act
#Define one unit: convolution → batch norm (normalization for each batch) → activation with ReLU
prev = mx.symbol.Variable(name="Previos Output")
conv_comp = ConvFactory(data=prev, num_filter=64, kernel=(7,7), stride=(2, 2))  # slide a 7x7 filter with stride 2 and no padding => 11x11 output
shape = {"Previos Output" : (128, 3, 28, 28)}
mx.viz.plot_network(symbol=conv_comp, shape=shape)
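A quick sanity check of the 11x11 note above (my addition), using the shape inference described later in this chapter:
arg_shapes, out_shapes, _ = conv_comp.infer_shape(**shape)
print(out_shapes)  # expect (128, 64, 11, 11): floor((28 - 7) / 2) + 1 = 11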
Use this to build an Inception module.
def InceptionFactoryA(data, num_1x1, num_3x3red, num_3x3, num_d3x3red, num_d3x3, pool, proj, name):
    # 1x1
    c1x1 = ConvFactory(data=data, num_filter=num_1x1, kernel=(1, 1), name=('%s_1x1' % name))
    # 3x3 reduce + 3x3
    c3x3r = ConvFactory(data=data, num_filter=num_3x3red, kernel=(1, 1), name=('%s_3x3' % name), suffix='_reduce')
    c3x3 = ConvFactory(data=c3x3r, num_filter=num_3x3, kernel=(3, 3), pad=(1, 1), name=('%s_3x3' % name))
    # double 3x3 reduce + double 3x3
    cd3x3r = ConvFactory(data=data, num_filter=num_d3x3red, kernel=(1, 1), name=('%s_double_3x3' % name), suffix='_reduce')
    cd3x3 = ConvFactory(data=cd3x3r, num_filter=num_d3x3, kernel=(3, 3), pad=(1, 1), name=('%s_double_3x3_0' % name))
    cd3x3 = ConvFactory(data=cd3x3, num_filter=num_d3x3, kernel=(3, 3), pad=(1, 1), name=('%s_double_3x3_1' % name))
    # pool + proj
    pooling = mx.symbol.Pooling(data=data, kernel=(3, 3), stride=(1, 1), pad=(1, 1), pool_type=pool, name=('%s_pool_%s_pool' % (pool, name)))
    cproj = ConvFactory(data=pooling, num_filter=proj, kernel=(1, 1), name=('%s_proj' % name))
    # concat
    concat = mx.symbol.Concat(*[c1x1, c3x3, cd3x3, cproj], name='ch_concat_%s_chconcat' % name)
    return concat
prev = mx.symbol.Variable(name="Previos Output")
in3a = InceptionFactoryA(prev, 64, 64, 64, 64, 96, "avg", 32, name="in3a")
mx.viz.plot_network(symbol=in3a, shape=shape)
A complete example can be found here.
When building a neural network with multiple loss layers, they can be grouped with `mx.sym.Group`.
net = mx.sym.Variable('data')
fc1 = mx.sym.FullyConnected(data=net, name='fc1', num_hidden=128)
net = mx.sym.Activation(data=fc1, name='relu1', act_type="relu")
out1 = mx.sym.SoftmaxOutput(data=net, name='softmax')
out2 = mx.sym.LinearRegressionOutput(data=net, name='regression')
group = mx.sym.Group([out1, out2])
group.list_outputs()
OUT
['softmax_output', 'regression_output']
NDArray provides an imperative interface: calculations are evaluated statement by statement.
Symbol is closer to declarative programming: you first declare the computation structure and then evaluate it on data, much like regular expressions or SQL.
Advantages of NDArray
- Simple
- Easy to use language features such as for and if-else, and libraries such as NumPy
- Easy to debug step by step
Benefits of Symbol
- Provides almost all of NDArray's functions (+, *, sin, reshape, etc.)
- Easy to save, load, and visualize
- Easy for the backend to optimize computation and memory usage
The differences between Symbol and NDArray are as described above, but a Symbol can also be manipulated directly. Keep in mind, however, that it is mostly wrapped by the `module` package.
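As a rough sketch of what that wrapping looks like (my own example; the Module API is the subject of the next chapter, and `train_iter` is a hypothetical data iterator):
net = mx.sym.Variable('data')
net = mx.sym.FullyConnected(data=net, name='fc1', num_hidden=128)
net = mx.sym.SoftmaxOutput(data=net, name='softmax')
mod = mx.mod.Module(symbol=net, context=mx.cpu(),
                    data_names=['data'], label_names=['softmax_label'])
# mod.fit(train_iter, num_epoch=10)  # training goes through the Module instead of bind/forward directly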
Shape inference: for each Symbol you can query its arguments, auxiliary information, and outputs. The output shape and type can be inferred from the input shapes and argument types, which makes memory allocation easier.
# recall that c = a + b
arg_name = c.list_arguments()  # names of the inputs
out_name = c.list_outputs()  # names of the outputs
#Estimate the shape of the output from the input
arg_shape, out_shape, _ = c.infer_shape(a=(2,3), b=(2,3))
#Estimate output type from input
arg_type, out_type, _ = c.infer_type(a='float32', b='float32')
print({'input' : dict(zip(arg_name, arg_shape)),
'output' : dict(zip(out_name, out_shape))})
print({'input' : dict(zip(arg_name, arg_type)),
'output' : dict(zip(out_name, out_type))})
OUT
{'output': {'_plus0_output': (2, 3)}, 'input': {'b': (2, 3), 'a': (2, 3)}}
{'output': {'_plus0_output': <class 'numpy.float32'>}, 'input': {'b': <class 'numpy.float32'>, 'a': <class 'numpy.float32'>}}
To evaluate symbol `c`, you need to feed it data. This is done with the `bind` method, which takes a context and a dictionary mapping free variable names to NDArrays and returns an executor. Evaluation is then run with the executor's `forward` method, and the results are fetched from its `outputs` attribute.
ex = c.bind(ctx=mx.cpu(), args={'a' : mx.nd.ones([2,3]),
'b' : mx.nd.ones([2,3])})
ex.forward()
print('number of outputs = %d\nthe first output = \n%s' % (
len(ex.outputs), ex.outputs[0].asnumpy()))
OUT
number of outputs = 1
the first output =
[[ 2. 2. 2.]
[ 2. 2. 2.]]
The same Symbol can be evaluated with a different context (e.g. GPU) and different data.
ex_gpu = c.bind(ctx=mx.gpu(), args={'a' : mx.nd.ones([3,4], mx.gpu())*2,
'b' : mx.nd.ones([3,4], mx.gpu())*3})
ex_gpu.forward()
ex_gpu.outputs[0].asnumpy()
OUT
array([[ 5., 5., 5., 5.],
[ 5., 5., 5., 5.],
[ 5., 5., 5., 5.]], dtype=float32)
Evaluation with `eval` is also possible; it bundles `bind` and `forward`.
ex = c.eval(ctx = mx.cpu(), a = mx.nd.ones([2,3]), b = mx.nd.ones([2,3]))
print('number of outputs = %d\nthe first output = \n%s' % (
len(ex), ex[0].asnumpy()))
OUT
number of outputs = 1
the first output =
[[ 2. 2. 2.]
[ 2. 2. 2.]]
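The chapter title mentions auto-differentiation; gradients can be obtained through the same executor by also passing gradient buffers to `bind` and calling `backward`. A minimal sketch, under my reading of the executor API (not part of the tutorial text here):
a_grad = mx.nd.zeros((2, 3))
b_grad = mx.nd.zeros((2, 3))
ex = c.bind(ctx=mx.cpu(),
            args={'a': mx.nd.ones((2, 3)), 'b': mx.nd.ones((2, 3))},
            args_grad={'a': a_grad, 'b': b_grad})
ex.forward(is_train=True)
ex.backward(out_grads=mx.nd.ones((2, 3)))  # head gradient of the output
print(a_grad.asnumpy())  # dc/da is 1 everywhere, so this just holds the head gradient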
Like NDArray, a Symbol can be pickled, or saved and loaded directly. A Symbol, however, is a graph, and the graph is a chain of computations that is implicitly represented by the output Symbol, so it is the graph of the output Symbol that gets serialized. Serializing to JSON improves readability; use `tojson` for this.
print(c.tojson())
c.save('symbol-c.json')
c2 = mx.symbol.load('symbol-c.json')
c.tojson() == c2.tojson()
OUT
{
"nodes": [
{
"op": "null",
"name": "a",
"inputs": []
},
{
"op": "null",
"name": "b",
"inputs": []
},
{
"op": "elemwise_add",
"name": "_plus0",
"inputs": [[0, 0, 0], [1, 0, 0]]
}
],
"arg_nodes": [0, 1],
"node_row_ptr": [0, 1, 2, 3],
"heads": [[2, 0, 0]],
"attrs": {"mxnet_version": ["int", 1000]}
}
True
Operators like `mx.sym.Convolution` and `mx.sym.Reshape` are implemented in C++ for performance.
MXNet also lets you create new operators in front-end languages like Python; see here for more information.
It appears to work by inheriting the operator base classes and implementing, say, a Softmax on the front-end side.
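Based on my reading of the custom-operator API (`mx.operator.CustomOp` / `CustomOpProp`; the class names below are my own, and details may differ from the official example), the Softmax case looks roughly like this:
import numpy as np

class NumpySoftmax(mx.operator.CustomOp):
    def forward(self, is_train, req, in_data, out_data, aux):
        # softmax computed in NumPy on the Python side
        x = in_data[0].asnumpy()
        y = np.exp(x - x.max(axis=1, keepdims=True))
        y /= y.sum(axis=1, keepdims=True)
        self.assign(out_data[0], req[0], mx.nd.array(y))

    def backward(self, req, out_grad, in_data, out_data, in_grad, aux):
        # gradient of the cross-entropy loss w.r.t. the logits: softmax(x) - one_hot(label)
        label = in_data[1].asnumpy().ravel().astype(np.int64)
        y = out_data[0].asnumpy()
        y[np.arange(label.shape[0]), label] -= 1.0
        self.assign(in_grad[0], req[0], mx.nd.array(y))

@mx.operator.register("numpy_softmax")
class NumpySoftmaxProp(mx.operator.CustomOpProp):
    def __init__(self):
        super(NumpySoftmaxProp, self).__init__(need_top_grad=False)  # a loss layer needs no gradient from above
    def list_arguments(self):
        return ['data', 'label']
    def list_outputs(self):
        return ['output']
    def infer_shape(self, in_shape):
        data_shape = in_shape[0]
        label_shape = (in_shape[0][0],)
        return [data_shape, label_shape], [data_shape], []
    def create_operator(self, ctx, shapes, dtypes):
        return NumpySoftmax()

net = mx.sym.Custom(data=mx.sym.Variable('data'), op_type='numpy_softmax', name='softmax')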
Normally 32-bit floating point is used, but a lower-precision type can also be used for speed.
Type conversion is done with `mx.sym.Cast`.
a = mx.sym.Variable('data')
b = mx.sym.Cast(data=a, dtype='float16')
arg, out, _ = b.infer_type(data='float32')
print({'input':arg, 'output':out})
c = mx.sym.Cast(data=a, dtype='uint8')
arg, out, _ = c.infer_type(data='int32')
print({'input':arg, 'output':out})
OUT
{'output': [<class 'numpy.float16'>], 'input': [<class 'numpy.float32'>]}
{'output': [<class 'numpy.uint8'>], 'input': [<class 'numpy.int32'>]}
Data can be shared between Symbols by binding them to the same array.
a = mx.sym.Variable('a')
b = mx.sym.Variable('b')
c = mx.sym.Variable('c')
d = a + b * c
data = mx.nd.ones((2,3))*2
ex = d.bind(ctx=mx.cpu(), args={'a':data, 'b':data, 'c':data}) #Share data as input value
ex.forward()
ex.outputs[0].asnumpy()
OUT
array([[ 6., 6., 6.],
[ 6., 6., 6.]], dtype=float32)
- Working in a Python 3 + Ubuntu + GPU environment
- This is not a full translation, just a memo
- Since I only lightly edited the output from Jupyter, the layout falls apart in places ...
Has the tutorial settled on how to refer to the module? I got the impression it has not yet decided (sometimes `sym`, sometimes `symbol`).
Next up is Module.