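A Auto-Encoder Neural Network

A0 Overview of Auto-Encoder Neural Networks

See 240104 AutoEncoder.ipynb

An auto-encoder neural network is an unsupervised learning algorithm.
- It is trained to make the output (the reconstructed sample) $\boldsymbol x'$ as close as possible to the input (the original sample) $\boldsymbol x$. The loss function is therefore usually defined as the reconstruction error, i.e. some distance between the reconstructed and original samples, typically $\| \boldsymbol x' - \boldsymbol x \|^2$.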
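The reconstruction error can be made concrete with a small sketch. The function names `reconstruction_error` and `mean_reconstruction_error` and the toy vectors are illustrative assumptions, not taken from the notebook; samples are plain Python lists standing in for flattened images.

```python
def reconstruction_error(x, x_rec):
    """Squared Euclidean distance ||x' - x||^2 between two equal-length vectors."""
    if len(x) != len(x_rec):
        raise ValueError("x and x_rec must have the same length")
    return sum((r - o) ** 2 for o, r in zip(x, x_rec))


def mean_reconstruction_error(batch, batch_rec):
    """Average the per-sample error over a batch (one common reduction choice)."""
    errors = [reconstruction_error(x, r) for x, r in zip(batch, batch_rec)]
    return sum(errors) / len(errors)


x = [0.0, 0.5, 1.0]      # original sample (toy values)
x_rec = [0.0, 0.0, 1.0]  # reconstructed sample (toy values)
print(reconstruction_error(x, x_rec))
```

In PyTorch, `nn.MSELoss()` with its default `reduction='mean'` measures the same kind of error, except that it averages over all elements instead of summing per sample, so the two differ only by a constant factor.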

A1 FC-Layer-Based Auto-Encoder/Decoder Architecture for Fashion MNIST

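import torch.nn as nn

R = C = 28          # image height and width in pixels
ENCODED_DIMS = 16   # size of the bottleneck code

class AutoEncoder(nn.Module):
    def __init__(self):
        super(AutoEncoder, self).__init__()
        # Encoder: 784 -> 128 -> 64 -> 32 -> 16
        self.encoder = nn.Sequential(
            nn.Linear(R * C, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 32),
            nn.ReLU(inplace=True),
            nn.Linear(32, ENCODED_DIMS),
        )
        # Decoder mirrors the encoder: 16 -> 32 -> 64 -> 128 -> 784
        self.decoder = nn.Sequential(
            nn.Linear(ENCODED_DIMS, 32),
            nn.ReLU(inplace=True),
            nn.Linear(32, 64),
            nn.ReLU(inplace=True),
            nn.Linear(64, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, R * C),
            nn.Tanh(),  # bounds the reconstruction to [-1, 1]
        )

    def forward(self, x):
        # Assumed forward pass (not shown in the original notes):
        # x is a flattened batch of shape (N, R * C).
        return self.decoder(self.encoder(x))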
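To get a feel for the size of this fully connected model, the sketch below counts parameters layer by layer. It relies on the fact that a linear layer with bias has `n_in * n_out + n_out` parameters; the helper names `linear_params` and `mlp_params` are illustrative assumptions.

```python
R = C = 28
ENCODED_DIMS = 16


def linear_params(n_in, n_out):
    """Parameter count of one fully connected layer with bias."""
    return n_in * n_out + n_out


def mlp_params(widths):
    """Total parameters of a stack of linear layers with the given widths."""
    return sum(linear_params(a, b) for a, b in zip(widths, widths[1:]))


encoder_widths = [R * C, 128, 64, 32, ENCODED_DIMS]
decoder_widths = encoder_widths[::-1]  # the decoder mirrors the encoder

encoder_params = mlp_params(encoder_widths)
decoder_params = mlp_params(decoder_widths)
compression = (R * C) / ENCODED_DIMS  # input values per code value

print(encoder_params, decoder_params, compression)
```

Almost all of the parameters sit in the first encoder layer and the last decoder layer, which touch the 784-dimensional image. The bottleneck itself is a 49x reduction in dimensionality.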

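A2 CNN-Based Auto-Encoder Architecture for Fashion MNIST

The ConvTranspose2d layer performs transposed convolution (often loosely called deconvolution). In practice it is used to upsample and generate images, and its arguments mirror those of the matching Conv2d layer: the input and output channel counts are swapped.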
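The mirrored relationship can be checked with the output-size formulas for the two layer types. The sketch below is pure Python and assumes dilation 1 and square inputs; the helper names `conv2d_out` and `conv_transpose2d_out` are illustrative, and the layer settings are the ones used in this section's model.

```python
def conv2d_out(size, kernel, stride=1, padding=0):
    """Output length of Conv2d along one spatial axis (dilation 1 assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose2d_out(size, kernel, stride=1, padding=0, output_padding=0):
    """Output length of ConvTranspose2d along one spatial axis (dilation 1 assumed)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


# Encoder-style downsampling: (kernel, stride, padding) per layer
sizes = [28]
for kernel, stride, padding in [(3, 2, 1), (3, 2, 1), (3, 2, 0), (3, 1, 0)]:
    sizes.append(conv2d_out(sizes[-1], kernel, stride, padding))
print(sizes)

# Decoder-style upsampling: the same settings in reverse order,
# plus output_padding where the stride-2 convolution rounded down
up = [1]
for kernel, stride, padding, output_padding in [
    (3, 1, 0, 0), (3, 2, 0, 0), (3, 2, 1, 1), (3, 2, 1, 1)
]:
    up.append(conv_transpose2d_out(up[-1], kernel, stride, padding, output_padding))
print(up)
```

The `output_padding=1` entries are needed because a stride-2 convolution with padding 1 maps both 13 and 14 to 7. Without `output_padding`, the transposed layer would produce 13 rather than 14.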

import torch.nn as nn

# ENCODED_DIMS_c2 (the bottleneck size) is defined elsewhere in the notebook;
# its value is not given in these notes.

class AutoEncoder_c2(nn.Module):
    def __init__(self):
        super(AutoEncoder_c2, self).__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), # (28, 28) -> (30, 30) -> (14, 14)
            nn.ReLU(inplace=True),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), # (14, 14) -> (16, 16) -> (7, 7)
            nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 3, stride=2), # (7, 7) -> (3, 3)
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3), # (3, 3) -> (1, 1)
            nn.Flatten(), # (64, 1, 1) -> 64
            nn.ReLU(inplace=True),
            nn.Linear(64, 32),
            nn.ReLU(inplace=True),
            nn.Linear(32, ENCODED_DIMS_c2),
        )
        self.decoder = nn.Sequential(
            nn.Linear(ENCODED_DIMS_c2, 32),
            nn.ReLU(inplace=True),
            nn.Linear(32, 64),
            nn.ReLU(inplace=True),
            nn.Unflatten(1, (64, 1, 1)), # 64 -> (64, 1, 1)
            nn.ConvTranspose2d(64, 32, 3), # (1, 1) -> (3, 3)
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(32, 16, 3, stride=2), # (3, 3) -> (7, 7)
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(16, 8, 3, stride=2, padding=1, output_padding=1), # (7, 7) -> (14, 14)
            nn.ReLU(inplace=True),
            nn.ConvTranspose2d(8, 1, 3, stride=2, padding=1, output_padding=1), # (14, 14) -> (28, 28)
            nn.Tanh(), # bounds the reconstruction to [-1, 1]
        )

    def forward(self, x):
        # Assumed forward pass (not shown in the original notes):
        # x is an image batch of shape (N, 1, 28, 28).
        return self.decoder(self.encoder(x))

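The results are shown in the figure.

A3 Applications and Significance

Auto-encoders are commonly used for feature extraction, visual analysis, information retrieval, and similar tasks.
- For example, once images or audio clips are compressed into short codes, comparison and retrieval can be done in the code space, which is naturally faster than searching over the original data and usually more meaningful as well.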
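Retrieval in the code space can be sketched as a nearest-neighbour search over the short codes. Everything here is a hypothetical illustration: the item ids, the 4-dimensional codes, and the `nearest` helper are made up, and in practice the codes would come from a trained encoder.

```python
import math


def nearest(query_code, database, k=1):
    """Return the k item ids whose codes are closest to query_code (Euclidean)."""
    ranked = sorted(
        database,
        key=lambda item_id: math.dist(query_code, database[item_id]),
    )
    return ranked[:k]


# Hypothetical codes; real ones would be encoder outputs.
database = {
    "shirt_01": [0.9, 0.1, 0.0, 0.2],
    "shirt_02": [0.8, 0.2, 0.1, 0.1],
    "boot_01": [0.0, 0.9, 0.8, 0.1],
}
query = [0.9, 0.1, 0.05, 0.2]
print(nearest(query, database, k=2))
```

Each comparison touches only as many numbers as the code has dimensions (16 in the FC model of this section) instead of all 784 pixels, which is where the speed-up comes from. This linear scan is the simplest option; larger collections would typically use an index structure instead.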

B Variational Auto-Encoder (VAE)

万字长文带你入门变分自编码器 (a long-form introduction to variational auto-encoders)

https://www.bilibili.com/video/BV1Jb4y1g7q6/

B1 Kullback-Leibler Divergence (K-L Divergence / Relative Entropy)