A Deep Dive into the VGG Convolutional Neural Network

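I. Innovations of VGG

(1) Smaller convolution kernels

Before VGG, most networks, such as AlexNet, used large convolution kernels to extract features. VGG instead stacks small kernels to cover the same receptive field as a large one. This has two notable advantages:

1. A stack of three 3*3 convolution layers covers the same 7*7 receptive field as a single 7*7 layer, but with fewer parameters. With C input channels and C output channels, and ignoring biases:

Parameters of one 7*7 layer: 7*7*C*C = 49*C*C
Parameters of three 3*3 layers: 3*(3*3*C*C) = 27*C*C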

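2. Because the large kernel is split into several small-kernel layers, a nonlinear activation function can be placed after each layer. This strengthens the model's learning capacity and its ability to form abstract features.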
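The two points above can be checked with a few lines of arithmetic. The following is a minimal sketch in plain Python; the helper names `conv_params` and `stacked_receptive_field` and the channel count `C = 64` are illustrative assumptions, not part of VGG itself.

```python
def conv_params(kernel_size: int, channels: int, num_layers: int = 1) -> int:
    """Weights in a stack of square convs with `channels` in and out (biases ignored)."""
    return num_layers * kernel_size * kernel_size * channels * channels


def stacked_receptive_field(kernel_size: int, num_layers: int) -> int:
    """Receptive field of `num_layers` stacked stride-1 convs with the same kernel size."""
    rf = 1
    for _ in range(num_layers):
        rf += kernel_size - 1  # each stride-1 layer widens the field by k - 1
    return rf


C = 64  # hypothetical channel count, chosen only for illustration
single_7x7 = conv_params(7, C)               # 49 * C * C
three_3x3 = conv_params(3, C, num_layers=3)  # 27 * C * C

print(single_7x7, three_3x3)          # 200704 110592
print(stacked_receptive_field(3, 3))  # 7
```

The stacked version uses 27/49 of the weights, roughly 55%, and it also contains three activation points instead of one.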

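(2) Smaller pooling kernels

AlexNet's pooling layers use 3*3 kernels, while VGG's pooling layers use 2*2 kernels with stride 2. The smaller window is generally considered to preserve finer detail in the feature maps.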

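(3) Deeper layers, wider feature maps

Two VGG variants are in common use: VGG16 and VGG19. VGG16 has 16 layers and a simple structure that is easy to modify. VGG19 has 19 layers and reaches higher accuracy, but has more parameters and a higher computational cost. In these networks, the convolutions are responsible for increasing the number of channels and the pooling layers are responsible for shrinking height and width, which lets the architecture grow deeper and wider within an acceptable computational budget.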
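The division of labour between convolution and pooling can be traced with simple arithmetic. This is a rough sketch in plain Python, assuming a 224x224 input, 3*3 convolutions with stride 1 and padding 1, and 2*2 pooling with stride 2; the names `VGG16_STAGES` and `trace_shapes` are hypothetical.

```python
# (output channels, number of 3x3 convs) for each VGG16 stage.
VGG16_STAGES = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def trace_shapes(height: int, width: int, stages=VGG16_STAGES):
    """Return (channels, height, width) after each conv stage and its 2x2 pooling."""
    shapes = []
    for out_channels, _num_convs in stages:
        # 3x3 convs with stride 1 and padding 1 leave height and width unchanged,
        # so within a stage only the channel count changes.
        # The 2x2 max pooling with stride 2 then halves height and width.
        height, width = height // 2, width // 2
        shapes.append((out_channels, height, width))
    return shapes


for shape in trace_shapes(224, 224):
    print(shape)
# (64, 112, 112)
# (128, 56, 56)
# (256, 28, 28)
# (512, 14, 14)
# (512, 7, 7)
```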

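(4) Fully connected layers implemented as convolutions

At test time, the three fully connected layers used during training are replaced with three convolutional layers that reuse the trained parameters. The resulting fully convolutional network is no longer bound to the fixed input size that fully connected layers require, so it can accept input of any width and height. What does this change in practice?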

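# Suppose the input feature map is 7x7x5 and the converted layer uses a 7x7 kernel with stride=1:
# the output has shape [1,1,1,1000], which can be squeezed to [1,1000].
# Suppose the input feature map is 14x14x5:
# the output has shape [1,8,8,1000], because (14-7)/1+1 = 8,
# which gives a 1x8x8x1000 score map.
# Average each 8x8 map over its spatial positions;
# after averaging and squeezing, the result is again [1,1000].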

This change follows OverFeat, which can accept images of any resolution.
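The conversion can be sketched for a single layer in PyTorch. The example below is illustrative only: it converts one fully connected layer over 7x7x5 features into a 7x7 convolution, whereas the full network converts all three layers (the remaining two become 1x1 convolutions). Note that PyTorch orders tensors as [N, C, H, W], so a score map appears as [1, 1000, H, W] rather than [1, H, W, 1000]. With a 7x7 kernel and stride 1, a 14x14 input produces an 8x8 score map.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

channels, size, num_classes = 5, 7, 1000  # 7x7x5 features, 1000 classes

# A fully connected layer as it would be trained on flattened 7x7x5 features.
fc = nn.Linear(channels * size * size, num_classes)

# The equivalent convolution: one 5x7x7 filter per class, stride 1, no padding.
conv = nn.Conv2d(channels, num_classes, kernel_size=size, stride=1)
with torch.no_grad():
    # Reuse the trained FC parameters by reshaping them into conv filters.
    conv.weight.copy_(fc.weight.view(num_classes, channels, size, size))
    conv.bias.copy_(fc.bias)

with torch.no_grad():
    # Same-size input: the conv output is 1x1 and matches the FC output.
    x_small = torch.randn(1, channels, 7, 7)
    fc_out = fc(x_small.flatten(1))        # shape [1, 1000]
    conv_out = conv(x_small)               # shape [1, 1000, 1, 1]

    # Larger input: the filter slides over the map and yields a score map.
    x_large = torch.randn(1, channels, 14, 14)
    score_map = conv(x_large)              # shape [1, 1000, 8, 8]
    scores = score_map.mean(dim=(2, 3))    # spatial average -> [1, 1000]

print(conv_out.shape, score_map.shape, scores.shape)
```

Averaging the score map is what turns a variable-size output back into one fixed-length vector of class scores.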

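II. Reproducing the VGG16 network

VGG16 has 16 weight layers in total: 13 convolutional layers and 3 fully connected layers. The input first passes through two convolutions with 64 kernels, followed by one pooling layer; then two convolutions with 128 kernels and another pooling layer; then three convolutions with 256 kernels and a pooling layer; then two blocks of three convolutions with 512 kernels, each block followed by pooling; and finally three fully connected layers.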

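import torch.nn as nn


class vgg(nn.Module):
    def __init__(self):
        super(vgg, self).__init__()
        # 13 conv layers (3x3, stride 1, padding 1) in five blocks,
        # each block followed by 2x2 max pooling with stride 2.
        self.Conv = nn.Sequential(
            # Block 1: 2 x 64
            nn.Conv2d(3, 64, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # Block 2: 2 x 128
            nn.Conv2d(64, 128, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # Block 3: 3 x 256
            nn.Conv2d(128, 256, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # Block 4: 3 x 512
            nn.Conv2d(256, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
            # Block 5: 3 x 512
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(512, 512, 3, 1, 1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2, 2),
        )
        # Three fully connected layers, sized as in the parameter table in section III.
        # Assumes a 224x224 input, so the conv output is 512x7x7.
        self.Fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, 1000),
        )

    def forward(self, x):
        # Convolutional feature extractor followed by the classifier.
        return self.Fc(self.Conv(x))


model = vgg()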
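A quick way to sanity-check a hand-written layer list like the one above is to build the same convolutional stack from a compact configuration and inspect the output shape. The following is a sketch under that assumption; `VGG16_CFG` and `make_features` are hypothetical names introduced here, and only the convolutional part is built to keep the check light.

```python
import torch
import torch.nn as nn

# Output channels of each 3x3 conv; "M" marks a 2x2 max pooling layer.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def make_features(cfg=VGG16_CFG, in_channels: int = 3) -> nn.Sequential:
    """Build the convolutional part of VGG16 from a compact layer list."""
    layers = []
    for item in cfg:
        if item == "M":
            layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        else:
            layers.append(nn.Conv2d(in_channels, item, kernel_size=3, stride=1, padding=1))
            layers.append(nn.ReLU(inplace=True))
            in_channels = item
    return nn.Sequential(*layers)


features = make_features()
num_convs = sum(isinstance(m, nn.Conv2d) for m in features)

with torch.no_grad():
    out = features(torch.randn(1, 3, 224, 224))

print(num_convs)  # 13
print(out.shape)  # torch.Size([1, 512, 7, 7])
```

The 512x7x7 output is what fixes the input size of the first fully connected layer at 7*7*512.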

III. Parameter count of VGG16

INPUT:     [224x224x3]    memory: 224*224*3=150K     weights: 0
CONV3-64:  [224x224x64]   memory: 224*224*64=3.2M    weights: (3*3*3)*64 = 1,728
CONV3-64:  [224x224x64]   memory: 224*224*64=3.2M    weights: (3*3*64)*64 = 36,864
POOL2:     [112x112x64]   memory: 112*112*64=800K    weights: 0
CONV3-128: [112x112x128]  memory: 112*112*128=1.6M   weights: (3*3*64)*128 = 73,728
CONV3-128: [112x112x128]  memory: 112*112*128=1.6M   weights: (3*3*128)*128 = 147,456
POOL2:     [56x56x128]    memory: 56*56*128=400K     weights: 0
CONV3-256: [56x56x256]    memory: 56*56*256=800K     weights: (3*3*128)*256 = 294,912
CONV3-256: [56x56x256]    memory: 56*56*256=800K     weights: (3*3*256)*256 = 589,824
CONV3-256: [56x56x256]    memory: 56*56*256=800K     weights: (3*3*256)*256 = 589,824
POOL2:     [28x28x256]    memory: 28*28*256=200K     weights: 0
CONV3-512: [28x28x512]    memory: 28*28*512=400K     weights: (3*3*256)*512 = 1,179,648
CONV3-512: [28x28x512]    memory: 28*28*512=400K     weights: (3*3*512)*512 = 2,359,296
CONV3-512: [28x28x512]    memory: 28*28*512=400K     weights: (3*3*512)*512 = 2,359,296
POOL2:     [14x14x512]    memory: 14*14*512=100K     weights: 0
CONV3-512: [14x14x512]    memory: 14*14*512=100K     weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512]    memory: 14*14*512=100K     weights: (3*3*512)*512 = 2,359,296
CONV3-512: [14x14x512]    memory: 14*14*512=100K     weights: (3*3*512)*512 = 2,359,296
POOL2:     [7x7x512]      memory: 7*7*512=25K        weights: 0
FC:        [1x1x4096]     memory: 4096               weights: 7*7*512*4096 = 102,760,448
FC:        [1x1x4096]     memory: 4096               weights: 4096*4096 = 16,777,216
FC:        [1x1x1000]     memory: 1000               weights: 4096*1000 = 4,096,000

TOTAL memory: 24M * 4 bytes ~= 93MB / image (only forward! ~*2 for bwd)
TOTAL params: 138M parameters
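The per-layer figures can be reproduced with a short script. This is a minimal sketch in plain Python that ignores biases, as the table does; `STAGES`, `FC_SIZES` and `count_vgg16` are hypothetical names. The weight total agrees with the 138M figure. Summing the activation counts of the rows listed above gives about 15.2M values, which is lower than the 24M quoted in the TOTAL line; the script only reproduces the per-layer sums.

```python
# (output channels, number of 3x3 convs) per stage; each stage ends in 2x2 pooling.
STAGES = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]
FC_SIZES = [4096, 4096, 1000]


def count_vgg16(height=224, width=224, in_channels=3):
    """Return (activation_count, weight_count) for one forward pass, biases ignored."""
    activations = height * width * in_channels  # the input image itself
    weights = 0
    channels = in_channels
    for out_channels, num_convs in STAGES:
        for _ in range(num_convs):
            weights += 3 * 3 * channels * out_channels
            channels = out_channels
            activations += height * width * channels
        height, width = height // 2, width // 2  # 2x2 pooling, stride 2
        activations += height * width * channels
    features = height * width * channels         # 7*7*512 once flattened
    for size in FC_SIZES:
        weights += features * size
        activations += size
        features = size
    return activations, weights


activations, weights = count_vgg16()
print(f"weights: {weights:,}")          # weights: 138,344,128
print(f"activations: {activations:,}")  # activations: 15,237,608
```

Almost 90% of the weights (123,633,664 of 138,344,128) sit in the three fully connected layers, while most of the activation memory is used by the early convolutional layers.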