`conv` – Ops for convolutional neural nets

Note

Two similar implementations exist for conv2d:

signal.conv2d and nnet.conv2d.

The former implements a traditional 2D convolution, while the latter implements the convolutional layers present in convolutional neural networks (where filters are 3D and pool over several input channels).

The recommended user interfaces are:

theano.tensor.nnet.conv2d() for 2d convolution
theano.tensor.nnet.conv3d() for 3d convolution

With these new interfaces, Theano will automatically use the fastest implementation in many cases. On the CPU, the implementation is GEMM-based. On the GPU, there are a GEMM-based version and a cuDNN version.

By default on the GPU, cuDNN is used if it is available; otherwise Theano falls back to the GEMM-based version, which is slower than cuDNN in most cases and uses more memory. To get an error if cuDNN cannot be used, supply the Theano flag dnn.enable=True.

Either the cuDNN or the GEMM version can be disabled using the Theano flags optimizer_excluding=conv_dnn and optimizer_excluding=conv_gemm, respectively. If both are disabled, Theano raises an error.

For the cuDNN version, there are different algorithms with different memory/speed trade-offs. Selecting the right one manually is very difficult, as it depends on the shapes and the hardware, so the best choice can differ for each layer. An auto-tuning mode exists and can be activated with these flags: dnn.conv.algo_fwd=time_once, dnn.conv.algo_bwd_data=time_once and dnn.conv.algo_bwd_filter=time_once. Note that these settings are useful mostly when the shapes do not change.
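
As a minimal sketch, the auto-tuning flags named above can be joined into the THEANO_FLAGS environment variable, which has to be set before Theano is imported. The helper name `build_theano_flags` is hypothetical, and the snippet only builds and exports the string; it does not import Theano.

```python
import os


def build_theano_flags(flags):
    """Join flag names and values into a comma-separated THEANO_FLAGS string.

    Hypothetical helper for illustration; it is not part of Theano.
    """
    return ",".join("{}={}".format(name, value) for name, value in flags.items())


# Flags taken from the text above: let cuDNN time its algorithms once.
autotune_flags = {
    "dnn.conv.algo_fwd": "time_once",
    "dnn.conv.algo_bwd_data": "time_once",
    "dnn.conv.algo_bwd_filter": "time_once",
}

flags_string = build_theano_flags(autotune_flags)

# The variable must be set before `import theano` for the flags to apply.
os.environ["THEANO_FLAGS"] = flags_string
print(flags_string)
```

The same string can be exported from the shell before launching the script.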

This auto-tuning has the drawback that the first call is much slower, as it tries and times each available implementation. So if you benchmark, it is important to exclude the first call from your timing.
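
One way to keep the first call out of a benchmark is sketched below. The helper `time_excluding_first_call` is a hypothetical name, and the timed callable is a stand-in; in practice it would be a compiled Theano function.

```python
import time


def time_excluding_first_call(fn, repeats=5):
    """Run fn once as a warm-up, then return the mean time of `repeats` calls."""
    fn()  # warm-up: with auto-tuning, this call includes the algorithm search
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


# Stand-in workload that records how often it was called.
calls = []
mean_seconds = time_excluding_first_call(lambda: calls.append(1), repeats=5)
print("calls made:", len(calls))
print("mean seconds per timed call:", mean_seconds)
```

The workload runs six times in total, but only the last five calls contribute to the reported mean.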

Also, a meta-optimizer has been implemented for the GPU convolution implementations to automatically choose the fastest implementation for each specific convolution in your graph. For each instance, it compiles and benchmarks each applicable implementation and chooses the fastest one. It can be enabled using optimizer_including=conv_meta. The meta-optimizer can also selectively disable the cuDNN and GEMM versions using the Theano flags metaopt.optimizer_excluding=conv_dnn and metaopt.optimizer_excluding=conv_gemm, respectively.

Note

Theano had older user interfaces such as theano.tensor.nnet.conv.conv2d. Do not use them anymore. They give you slower code and do not allow easy switching between CPU and GPU computation. They also support fewer types of convolution.

Implementation Details

This section gives more implementation details. Most of the time you do not need to read it, as Theano will select the implementation for you.

Implemented operators for neural network 2D / image convolution:
- nnet.conv.conv2d. old 2d convolution. DO NOT USE ANYMORE.
- GpuCorrMM This is a GPU-only 2d correlation implementation taken from caffe’s CUDA implementation. It does not flip the kernel.
  
  For each element in a batch, it first creates a Toeplitz matrix in a CUDA kernel. Then, it performs a gemm call to multiply this Toeplitz matrix and the filters (hence the name: MM is for matrix multiplication). It needs extra memory for the Toeplitz matrix, which is a 2D matrix of shape (no of channels * filter width * filter height, output width * output height).
- CorrMM This is a CPU-only 2d correlation implementation taken from caffe’s cpp implementation. It does not flip the kernel.
- dnn_conv GPU-only convolution using NVIDIA’s cuDNN library.
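
The matrix-multiplication approach described for GpuCorrMM can be illustrated in plain Python for a single batch element. This is only a sketch of the idea, not Theano's or Caffe's code: the names `im2col`, `matmul` and `direct_correlation` are chosen for illustration, and the plain-Python `matmul` stands in for the gemm call.

```python
def im2col(image, kh, kw):
    """Unroll a (channels, rows, cols) image into a matrix of shape
    (channels * kh * kw, out_rows * out_cols) for a 'valid' correlation."""
    channels, rows, cols = len(image), len(image[0]), len(image[0][0])
    out_rows, out_cols = rows - kh + 1, cols - kw + 1
    matrix = []
    for c in range(channels):
        for i in range(kh):
            for j in range(kw):
                # One matrix row per (channel, filter row, filter column).
                matrix.append([image[c][y + i][x + j]
                               for y in range(out_rows)
                               for x in range(out_cols)])
    return matrix


def matmul(a, b):
    """Plain matrix product standing in for the gemm call."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def direct_correlation(image, filters):
    """Reference 'valid' cross-correlation (the kernel is not flipped)."""
    kh, kw = len(filters[0][0]), len(filters[0][0][0])
    out_rows = len(image[0]) - kh + 1
    out_cols = len(image[0][0]) - kw + 1
    return [[sum(f[c][i][j] * image[c][y + i][x + j]
                 for c in range(len(image))
                 for i in range(kh) for j in range(kw))
             for y in range(out_rows) for x in range(out_cols)]
            for f in filters]


# One input with 2 channels of 3x3 pixels, and 2 filters of size 2x2.
image = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
         [[9, 8, 7], [6, 5, 4], [3, 2, 1]]]
filters = [[[[1, 0], [0, 1]], [[0, 1], [1, 0]]],
           [[[1, 1], [1, 1]], [[0, 0], [0, 0]]]]

columns = im2col(image, 2, 2)
# Flatten each filter to a row of length channels * kh * kw.
flat_filters = [[v for ch in f for row in ch for v in row] for f in filters]
gemm_output = matmul(flat_filters, columns)

print(len(columns), len(columns[0]))  # 8 4
print(gemm_output == direct_correlation(image, filters))  # True
```

The unrolled matrix has 2 * 2 * 2 = 8 rows and 2 * 2 = 4 columns, which matches the shape formula given above and shows where the extra memory goes.
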
Implemented operators for neural network 3D / video convolution:
- GpuCorr3dMM This is a GPU-only 3d correlation relying on a Toeplitz matrix and gemm implementation (see GpuCorrMM). It needs extra memory for the Toeplitz matrix, which is a 2D matrix of shape (no of channels * filter width * filter height * filter depth, output width * output height * output depth).
- Corr3dMM This is a CPU-only 3d correlation implementation based on the 2d version (CorrMM). It does not flip the kernel. As it provides a gradient, you can use it as a replacement for nnet.conv3d. For convolutions done on CPU, nnet.conv3d will be replaced by Corr3dMM.
- dnn_conv3d GPU-only 3D convolution using NVIDIA’s cuDNN library (as dnn_conv but for 3d).
  
  If cuDNN is available, by default, Theano will replace all nnet.conv3d operations with dnn_conv.
- conv3d2d Another conv3d implementation that uses the conv2d with data reshaping. It is faster in some corner cases than conv3d. It flips the kernel.
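
Several entries above note whether an operator flips the kernel. The difference can be sketched for a single-channel 2D case in plain Python; the function names are illustrative and this is not Theano code.

```python
def correlate2d_valid(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel without flipping it."""
    kh, kw = len(kernel), len(kernel[0])
    out_rows = len(image) - kh + 1
    out_cols = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[y + i][x + j]
                 for i in range(kh) for j in range(kw))
             for x in range(out_cols)] for y in range(out_rows)]


def flip_kernel(kernel):
    """Reverse the kernel along both rows and columns."""
    return [row[::-1] for row in kernel[::-1]]


def convolve2d_valid(image, kernel):
    """'Valid' 2D convolution, written as correlation with a flipped kernel."""
    return correlate2d_valid(image, flip_kernel(kernel))


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, 2]]

print(correlate2d_valid(image, kernel))  # [[11, 14], [20, 23]]
print(convolve2d_valid(image, kernel))   # [[7, 10], [16, 19]]
```

For a kernel that is symmetric under this flip, the two results coincide.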

theano.tensor.nnet.conv2d(input, filters, input_shape=None, filter_shape=None, border_mode='valid', subsample=(1, 1), filter_flip=True, image_shape=None, filter_dilation=(1, 1), num_groups=1, unshared=False, **kwargs)

This function will build the symbolic graph for convolving a mini-batch of a stack of 2D inputs with a set of 2D filters. The implementation is modelled after Convolutional Neural Networks (CNN).

Parameters

input (symbolic 4D tensor) – Mini-batch of feature map stacks, of shape (batch size, input channels, input rows, input columns). See the optional parameter input_shape.
filters (symbolic 4D or 6D tensor) – Set of filters used in CNN layer of shape (output channels, input channels, filter rows, filter columns) for normal convolution and (output channels, output rows, output columns, input channels, filter rows, filter columns) for unshared convolution. See the optional parameter filter_shape.
input_shape (None, tuple/list of len 4 or 6 of int or Constant variable) – The shape of the input parameter. Optional, possibly used to choose an optimal implementation. You can give None for any element of the list to specify that this element is not known at compile time.
filter_shape (None, tuple/list of len 4 or 6 of int or Constant variable) – The shape of the filters parameter. Optional, possibly used to choose an optimal implementation. You can give None for any element of the list to specify that this element is not known at compile time.
border_mode (str, int or a tuple of two ints or pairs of ints) – Either of the following:

'valid': apply filter wherever it completely overlaps with the input. Generates output of shape: input shape - filter shape + 1

'full': apply filter wherever it partly overlaps with the input. Generates output of shape: input shape + filter shape - 1

'half': pad input with a symmetric border of filter rows // 2 rows and filter columns // 2 columns, then perform a valid convolution. For filters with an odd number of rows and columns, this leads to the output shape being equal to the input shape.

int: pad input with a symmetric border of zeros of the given width, then perform a valid convolution.

(int1, int2): (for 2D) pad input with a symmetric border of int1, int2, then perform a valid convolution.

(int1, (int2, int3)) or ((int1, int2), int3): (for 2D) pad input with one symmetric border of int1 or int3, and one asymmetric border of (int2, int3) or (int1, int2).
subsample (tuple of len 2) – Factor by which to subsample the output. Also called strides elsewhere.
filter_flip (bool) – If True, will flip the filter rows and columns before sliding them over the input. This operation is normally referred to as a convolution, and this is the default. If False, the filters are not flipped and the operation is referred to as a cross-correlation.
image_shape (None, tuple/list of len 4 of int or Constant variable) – Deprecated alias for input_shape.
filter_dilation (tuple of len 2) – Factor by which to subsample (stride) the input. Also called dilation elsewhere.
num_groups (int) – Divides the image, kernel and output tensors into num_groups separate groups, each of which carries out its convolutions separately.
unshared (bool) – If true, then unshared or ‘locally connected’ convolution will be performed. A different filter will be used for each region of the input.
kwargs – Any other keyword arguments are accepted for backwards compatibility, but will be ignored.
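
The output sizes stated for the border_mode options can be checked with a small per-axis calculation. This is a sketch under stated assumptions: the 'valid', 'full' and 'half' cases follow the descriptions above, while the handling of subsample and filter_dilation uses the usual strided and dilated convolution arithmetic and is not taken from Theano's source. The function name `conv_output_length` is hypothetical.

```python
def conv_output_length(input_len, filter_len, border_mode, subsample=1, dilation=1):
    """Output length along one axis (hypothetical helper, not a Theano function).

    border_mode is 'valid', 'full', 'half', an int (symmetric padding),
    or a (before, after) pair of ints (asymmetric padding).
    """
    # Assumption: a dilated filter covers (filter_len - 1) * dilation + 1 inputs.
    dilated_len = (filter_len - 1) * dilation + 1
    if border_mode == "valid":
        pad_before = pad_after = 0
    elif border_mode == "full":
        pad_before = pad_after = dilated_len - 1
    elif border_mode == "half":
        pad_before = pad_after = dilated_len // 2
    elif isinstance(border_mode, tuple):
        pad_before, pad_after = border_mode
    else:
        pad_before = pad_after = int(border_mode)
    padded_len = input_len + pad_before + pad_after
    # Assumption: subsample keeps every `subsample`-th output position.
    return (padded_len - dilated_len) // subsample + 1


for mode in ("valid", "full", "half", 1, (1, 2)):
    print(mode, conv_output_length(32, 5, mode))
print("valid, subsample 2:", conv_output_length(32, 5, "valid", subsample=2))
```

For an input of length 32 and a filter of length 5, 'valid' gives 32 - 5 + 1 = 28, 'full' gives 32 + 5 - 1 = 36, and 'half' keeps the length at 32 because the filter length is odd.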

Returns

Set of feature maps generated by convolutional layer. Tensor is of shape (batch size, output channels, output rows, output columns)

Return type

Symbolic 4D tensor

Notes

If cuDNN is available, it will be used on the GPU. Otherwise, the CorrMM convolution (“Caffe-style convolution”) will be used.

This is only supported in Theano 0.8 or the development version until it is released.

The parameter filter_dilation is an implementation of dilated convolution.
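
One common way to picture dilated convolution is as an ordinary convolution whose kernel has zeros inserted between its entries. The sketch below builds such a kernel in plain Python. It illustrates the concept only and makes no claim about how Theano implements filter_dilation; `dilate_kernel` is a hypothetical name.

```python
def dilate_kernel(kernel, dilation):
    """Spread the kernel entries apart, filling the gaps with zeros.

    `dilation` is a (rows, cols) pair; a value of 1 leaves that axis unchanged.
    """
    d_rows, d_cols = dilation
    rows, cols = len(kernel), len(kernel[0])
    out = [[0] * ((cols - 1) * d_cols + 1)
           for _ in range((rows - 1) * d_rows + 1)]
    for i in range(rows):
        for j in range(cols):
            out[i * d_rows][j * d_cols] = kernel[i][j]
    return out


kernel = [[1, 2],
          [3, 4]]
dilated = dilate_kernel(kernel, (2, 2))
for row in dilated:
    print(row)
# [1, 0, 2]
# [0, 0, 0]
# [3, 0, 4]
```

A 2x2 kernel with dilation (2, 2) therefore covers a 3x3 region of the input while still having only four non-zero weights.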

theano.tensor.nnet.conv2d_transpose(input, filters, output_shape, filter_shape=None, border_mode='valid', input_dilation=(1, 1), filter_flip=True, filter_dilation=(1, 1), num_groups=1, unshared=False)

This function will build the symbolic graph for applying a transposed convolution over a mini-batch of a stack of 2D inputs with a set of 2D filters.

Parameters

input (symbolic 4D tensor) – Mini-batch of feature map stacks, of shape (batch size, input channels, input rows, input columns). See the optional parameter input_shape.
filters (symbolic 4D tensor) – Set of filters used in CNN layer of shape (input channels, output channels, filter rows, filter columns). See the optional parameter filter_shape. Note: the order of output channels and input channels is reversed with respect to conv2d.
output_shape (tuple/list of len 4 of int or Constant variable) – The shape of the output of conv2d_transpose. The last two elements are allowed to be tensor.scalar variables.
filter_shape (None, tuple/list of len 4 of int or Constant variable) – The shape of the filters parameter. Optional, possibly used to choose an optimal implementation. You can give None for any element of the list to specify that this element is not known at compile time.
border_mode (str, int or tuple of two int) – Refers to the border_mode argument of the corresponding forward (non-transposed) convolution. See the argument description in conv2d. What was padding for the forward convolution means cropping the output of the transposed one. valid corresponds to no cropping, full to maximal cropping.
input_dilation (tuple of len 2) – Corresponds to subsample (also called strides elsewhere) in the non-transposed convolution.
filter_flip (bool) – If True, will flip the filter rows and columns before sliding them over the input. This operation is normally referred to as a convolution, and this is the default. If False, the filters are not flipped and the operation is referred to as a cross-correlation.
filter_dilation (tuple of len 2) – Factor by which to subsample (stride) the input. Also called dilation elsewhere.
num_groups (int) – Divides the image, kernel and output tensors into num_groups separate groups, each of which carries out its convolutions separately.
unshared (bool) – If true, then unshared or ‘locally connected’ convolution will be performed. A different filter will be used for each region of the input. Grouped unshared convolution is supported.

Returns

Set of feature maps generated by the transposed convolution. Tensor is of shape (batch size, output channels, output rows, output columns)

Return type

Symbolic 4D tensor

Notes

If cuDNN is available, it will be used on the GPU. Otherwise, the CorrMM convolution (“Caffe-style convolution”) will be used.

This operation is also sometimes called “deconvolution”.

The parameter filter_dilation is an implementation of dilated convolution.
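
A plausible reason why output_shape has to be given explicitly is that a strided forward convolution can map several input sizes to the same output size, so the transposed operation cannot always infer its output size. The sketch below shows this for one axis with border_mode 'valid'. The helper names are hypothetical, and the arithmetic is the usual strided-convolution formula rather than code from Theano.

```python
def forward_length(input_len, filter_len, stride):
    """Output length of a 'valid' forward convolution along one axis."""
    return (input_len - filter_len) // stride + 1


def transposed_candidates(length, filter_len, stride):
    """All forward-input lengths that a 'valid' convolution maps to `length`.

    Each candidate is a possible output length of the transposed convolution.
    """
    smallest = (length - 1) * stride + filter_len
    return [n for n in range(smallest, smallest + stride)
            if forward_length(n, filter_len, stride) == length]


# The forward stride corresponds to input_dilation in conv2d_transpose.
print(transposed_candidates(3, filter_len=3, stride=1))  # [5]
print(transposed_candidates(3, filter_len=3, stride=2))  # [7, 8]
```

With a stride of 1 the size is unique, while with a stride of 2 both 7 and 8 are consistent, and output_shape selects between them.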

theano.tensor.nnet.conv3d(input, filters, input_shape=None, filter_shape=None, border_mode='valid', subsample=(1, 1, 1), filter_flip=True, filter_dilation=(1, 1, 1), num_groups=1)

This function will build the symbolic graph for convolving a mini-batch of a stack of 3D inputs with a set of 3D filters. The implementation is modelled after Convolutional Neural Networks (CNN).

Parameters

input (symbolic 5D tensor) – Mini-batch of feature map stacks, of shape (batch size, input channels, input depth, input rows, input columns). See the optional parameter input_shape.
filters (symbolic 5D tensor) – Set of filters used in CNN layer of shape (output channels, input channels, filter depth, filter rows, filter columns). See the optional parameter filter_shape.
input_shape (None, tuple/list of len 5 of int or Constant variable) – The shape of the input parameter. Optional, possibly used to choose an optimal implementation. You can give None for any element of the list to specify that this element is not known at compile time.
filter_shape (None, tuple/list of len 5 of int or Constant variable) – The shape of the filters parameter. Optional, possibly used to choose an optimal implementation. You can give None for any element of the list to specify that this element is not known at compile time.
border_mode (str, int or tuple of three int) – Either of the following:

'valid': apply filter wherever it completely overlaps with the input. Generates output of shape: input shape - filter shape + 1

'full': apply filter wherever it partly overlaps with the input. Generates output of shape: input shape + filter shape - 1

'half': pad input with a symmetric border of filter // 2, then perform a valid convolution. For filters with an odd number of slices, rows and columns, this leads to the output shape being equal to the input shape.

int: pad input with a symmetric border of zeros of the given width, then perform a valid convolution.

(int1, int2, int3): pad input with a symmetric border of int1, int2 and int3 columns, then perform a valid convolution.
subsample (tuple of len 3) – Factor by which to subsample the output. Also called strides elsewhere.
filter_flip (bool) – If True, will flip the filter x, y and z dimensions before sliding them over the input. This operation is normally referred to as a convolution, and this is the default. If False, the filters are not flipped and the operation is referred to as a cross-correlation.
filter_dilation (tuple of len 3) – Factor by which to subsample (stride) the input. Also called dilation elsewhere.
num_groups (int) – Divides the image, kernel and output tensors into num_groups separate groups, each of which carries out its convolutions separately.

Returns

Set of feature maps generated by convolutional layer. Tensor is of shape (batch size, output channels, output depth, output rows, output columns).

Return type

Symbolic 5D tensor

Notes

If cuDNN is available, it will be used on the GPU. Otherwise, the Corr3dMM convolution (“Caffe-style convolution”) will be used.

This is only supported in Theano 0.8 or the development version until it is released.

theano.tensor.nnet.conv3d2d.conv3d(signals, filters, signals_shape=None, filters_shape=None, border_mode='valid')

Convolve spatio-temporal filters with a movie.

It flips the filters.

Parameters

signals – Timeseries of images whose pixels have color channels. Shape: [Ns, Ts, C, Hs, Ws].
filters – Spatio-temporal filters. Shape: [Nf, Tf, C, Hf, Wf].
signals_shape – None or a tuple/list with the shape of signals.
filters_shape – None or a tuple/list with the shape of filters.
border_mode – One of ‘valid’, ‘full’ or ‘half’.

Notes

Another way to define signals: (batch, time, in channel, row, column).

Another way to define filters: (out channel, time, in channel, row, column).

For the GPU, use nnet.conv3d.
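
When moving from conv3d2d.conv3d to nnet.conv3d, note that the two functions document different axis orders: (batch, time, in channel, row, column) here versus (batch size, input channels, input depth, input rows, input columns) for nnet.conv3d. The sketch below swaps the two axes on a shape tuple, assuming that the "time" axis plays the role of "depth". That correspondence is an assumption, not something stated in the text, and the helper name is hypothetical.

```python
# Axis order that swaps axes 1 and 2 (time/depth and channels).
SWAP_TIME_AND_CHANNEL = (0, 2, 1, 3, 4)


def permute_shape(shape, order=SWAP_TIME_AND_CHANNEL):
    """Reorder a 5-element shape tuple according to `order`."""
    return tuple(shape[axis] for axis in order)


# conv3d2d layout: (batch, time, in channel, row, column)
signals_shape = (8, 16, 3, 64, 48)
# Assumed nnet.conv3d layout: (batch, channels, depth, rows, columns)
conv3d_shape = permute_shape(signals_shape)
print(conv3d_shape)  # (8, 3, 16, 64, 48)

# The swap is its own inverse, so applying it twice restores the shape.
print(permute_shape(conv3d_shape) == signals_shape)  # True
```

The same axis order would apply to the filters, whose layouts differ in the same two positions.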
