TF Notes (4), Deconvolution


This post is a small exercise with just two goals:

  1. Understand what deconvolution is, and how to use it in TensorFlow
  2. Implement a CNN AutoEncoder: the Encoder uses conv2d, the Decoder uses conv2d_transpose

What is deconvolution?

To get straight to the point: deconvolution is simply convolution with the transposed kernel. Borrowing a slide from Prof. Hung-yi Lee's (李宏毅) lecture, shown below:

The figure is already quite clear, so no further explanation is needed.
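To make the "transpose" concrete, here is a small NumPy sketch (toy sizes of my own, not from the lecture): a stride-1, no-padding 1D convolution written as a matrix multiply $y = Cx$, and the corresponding deconvolution as multiplication by $C^T$, which maps the feature map back to the input's shape:

```python
import numpy as np

# Toy sizes (assumed, not from the lecture): input length 4, kernel length 3,
# stride 1, no padding -> convolution output length 2.
w = np.array([1.0, 2.0, 3.0])
x = np.array([4.0, 5.0, 6.0, 7.0])

# Convolution as a matrix multiply: each row of C is the kernel shifted by one.
C = np.array([[w[0], w[1], w[2], 0.0],
              [0.0,  w[0], w[1], w[2]]])
y = C @ x
assert np.allclose(y, [1*4 + 2*5 + 3*6, 1*5 + 2*6 + 3*7])  # [32., 38.]

# "Deconvolution" applies the transposed matrix: the length-2 feature map
# is mapped back to the length-4 input shape (only the shape is recovered,
# not the original values of x).
x_up = C.T @ y
assert x_up.shape == (4,)
```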

Also, in TensorFlow, suppose our kernel $W$ has W.shape = (img_h, img_w, dim1, dim2). Then tf.nn.conv2d(in_tensor,W,stride,padding) treats (dim1,dim2) as (in_dim, out_dim), while tf.nn.conv2d_transpose(in_tensor,W,output_shape,stride) treats (dim1,dim2) as (out_dim, in_dim); note that the order is reversed. Two points deserve extra mention:

  1. tf.nn.conv2d_transpose automatically transposes $W$ before the convolution, so we do not need to transpose it ourselves.
  2. tf.nn.conv2d_transpose additionally requires output_shape to be specified.
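The channel-axis swap can be sketched in NumPy with a 1x1 kernel, so only the channel mixing matters (dim1=4, dim2=8 are toy sizes of my own): conv2d contracts the input channels against dim1 and emits dim2 channels, while conv2d_transpose reads the same array the other way around:

```python
import numpy as np

# A kernel stored as in the post: W.shape = (kh, kw, dim1, dim2).
# A 1x1 kernel keeps the spatial part trivial; dim1=4, dim2=8 are toy sizes.
W = np.random.rand(1, 1, 4, 8)

# conv2d reads (dim1, dim2) as (in_dim, out_dim): 4 channels in, 8 channels out.
x = np.random.rand(2, 5, 5, 4)                   # (batch, h, w, in_dim=dim1)
y = np.einsum('bhwi,io->bhwo', x, W[0, 0])
assert y.shape == (2, 5, 5, 8)

# conv2d_transpose reads the SAME array as (out_dim, in_dim):
# it consumes 8 channels and produces 4, going back the other way.
z = np.random.rand(2, 5, 5, 8)                   # (batch, h, w, in_dim=dim2)
x_back = np.einsum('bhwi,oi->bhwo', z, W[0, 0])  # contract against dim2
assert x_back.shape == (2, 5, 5, 4)
```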

For more combinations of conv/transpose_conv/dilated_conv with stride/padding, there is an excellent set of visualizations; see this github.


CNN AutoEncoder

The architecture is shown in the figure below.

Compressing the embedding directly down to 2 dimensions, the per-class distribution looks like this:

With a 128-dimensional embedding, projected down to 2 dimensions with tSNE for plotting:

The Encoder:

```python
def Encoder(x):
    print('Input x got shape=', x.shape)  # (None,28,28,1)
    # Layer 1 encode: Input = (batch_num, img_height, img_width, cNum).
    # Output = (batch_num, img_height/2, img_width/2, layer_dim['conv1'])
    layer1_en = tf.nn.relu(tf.nn.conv2d(x, weights['conv1'], strides=[1, 1, 1, 1], padding='SAME'))
    # Avg Pooling
    layer1_en = tf.nn.avg_pool(layer1_en, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    print('After Layer 1, got shape=', layer1_en.shape)  # (None,14,14,32)
    # Layer 2 encode: Input = (batch_num, img_height/2, img_width/2, layer_dim['conv1']).
    # Output = (batch_num, img_height/4, img_width/4, layer_dim['conv2'])
    layer2_en = tf.nn.relu(tf.nn.conv2d(layer1_en, weights['conv2'], strides=[1, 1, 1, 1], padding='SAME'))
    # Avg Pooling
    layer2_en = tf.nn.avg_pool(layer2_en, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
    print('After Layer 2, got shape=', layer2_en.shape)  # (None,7,7,64)
    # Layer embedded: Input = (batch_num, img_height/4 * img_width/4 * layer_dim['conv2']).
    # Output = (batch_num, layer_dim['embedded'])
    flatten_in = flatten(layer2_en)
    embedded = tf.matmul(flatten_in, weights['embedded'])
    print('embedded has shape=', embedded.shape)
    return embedded
```

The Decoder:

```python
def Decoder(embedded):
    # API: tf.nn.conv2d_transpose(value, filter, output_shape, strides, padding='SAME', ...)
    bsize = tf.shape(embedded)[0]
    # Layer embedded decode: Input = (batch_num, layer_dim['embedded']).
    # Output = (batch_num, in_dim_for_embedded)
    embedded_t = tf.matmul(embedded, weights['embedded'], transpose_b=True)
    embedded_t = tf.reshape(embedded_t, [-1, 7, 7, layer_dim['conv2']])
    print('embedded_t has shape=', embedded_t.shape)
    # Layer 2 decode: Input = (batch_num, 7, 7, layer_dim['conv2']).
    # Output = (batch_num, 14, 14, layer_dim['conv1'])
    layer2_t = tf.nn.relu(tf.nn.conv2d_transpose(embedded_t, weights['conv2t'],
                                                 [bsize, 14, 14, layer_dim['conv1']], [1, 2, 2, 1]))
    print('layer2_t has shape=', layer2_t.shape)
    # Layer 1 decode: Input = (batch_num, 14, 14, layer_dim['conv1']).
    # Output = (batch_num, 28, 28, cNum)
    layer1_t = tf.nn.relu(tf.nn.conv2d_transpose(layer2_t, weights['conv1t'],
                                                 [bsize, 28, 28, cNum], [1, 2, 2, 1]))
    print('layer1_t has shape=', layer1_t.shape)
    # Layer reconstruct: Input = (batch_num, 28, 28, cNum). Output = (batch_num, 28, 28, cNum).
    reconstruct = tf.nn.relu(tf.nn.conv2d(layer1_t, weights['reconstruct'], strides=[1, 1, 1, 1], padding='SAME')) - 0.5
    print('reconstruct has shape=', reconstruct.shape)
    return reconstruct
```
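Note that the Decoder reuses the Encoder's embedding matrix via transpose_b=True (tied weights), then reshapes back to the conv feature map. A NumPy sketch of that shape round trip, using the shapes from the code above (7x7x64 feature map, the 128-dim embedding from the tSNE experiment; batch size 32 is my own assumption):

```python
import numpy as np

# Shapes from the Encoder/Decoder above; batch size 32 is an assumption.
W_emb = np.random.rand(7 * 7 * 64, 128)       # plays the role of weights['embedded']

flat = np.random.rand(32, 7 * 7 * 64)         # flattened (7,7,64) feature maps
emb = flat @ W_emb                            # Encoder: tf.matmul(flatten_in, W)
assert emb.shape == (32, 128)

flat_back = emb @ W_emb.T                     # Decoder: tf.matmul(..., transpose_b=True)
decoded = flat_back.reshape(-1, 7, 7, 64)     # tf.reshape(..., [-1, 7, 7, layer_dim['conv2']])
assert decoded.shape == (32, 7, 7, 64)
```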

Chaining them into the AutoEncoder is easy:

```python
def AutoEncoder(x):
    embedded = Encoder(x)
    reconstruct = Decoder(embedded)
    return [embedded, reconstruct]
```

For the complete source code, see the references below.


Reference

  1. Hung-yi Lee's (李宏毅) explanation of deconvolution
  2. tf.nn.conv2d_transpose documentation
  3. Visualizations of conv/transpose_conv/dilated_conv with stride/padding: github
  4. Complete source code for this post