ResNet Model Application Scenarios in 2025 (ResNet Module)

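ResNet (Residual Neural Network) was proposed by Kaiming He and three other Chinese researchers at Microsoft Research. Using the ResNet unit, they successfully trained a 152-layer neural network and won the ILSVRC 2015 competition, with fewer parameters than VGGNet and outstanding results. The ResNet structure greatly speeds up the training of neural networks and also brings a sizable improvement in model accuracy.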

Zero-padding pads the input with a pad of (3,3)
Stage 1:

The 2D Convolution has 64 filters of shape (7,7) and uses a stride of (2,2). Its name is “conv1”.
BatchNorm is applied to the channels axis of the input.
MaxPooling uses a (3,3) window and a (2,2) stride.

Stage 2:

The convolutional block uses three sets of filters of size [64,64,256], “f” is 3, “s” is 1 and the block is “a”.
The 2 identity blocks use three sets of filters of size [64,64,256], “f” is 3 and the blocks are “b” and “c”.

Stage 3:

The convolutional block uses three sets of filters of size [128,128,512], “f” is 3, “s” is 2 and the block is “a”.
The 3 identity blocks use three sets of filters of size [128,128,512], “f” is 3 and the blocks are “b”, “c” and “d”.

Stage 4:

The convolutional block uses three sets of filters of size [256, 256, 1024], “f” is 3, “s” is 2 and the block is “a”.
The 5 identity blocks use three sets of filters of size [256, 256, 1024], “f” is 3 and the blocks are “b”, “c”, “d”, “e” and “f”.

Stage 5:

The convolutional block uses three sets of filters of size [512, 512, 2048], “f” is 3, “s” is 2 and the block is “a”.
The 2 identity blocks use three sets of filters of size [512, 512, 2048], “f” is 3 and the blocks are “b” and “c”.
The 2D Average Pooling uses a window of shape (2,2) and its name is “avg_pool”.
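The stage list above can be sanity-checked with a little arithmetic. The sketch below is plain Python, not Keras code. It traces the spatial size through the stages and counts the main-path convolutions. The 64x64 input, the "valid" padding, and the average-pooling stride equal to its window are assumptions that the text above does not state.

```python
def conv_out(size, kernel, stride, pad=0):
    """Output length of a 'valid' convolution or pooling along one spatial axis."""
    return (size + 2 * pad - kernel) // stride + 1

# (stage name, output channels, stride "s" of the conv block, number of identity blocks)
STAGES = [
    ("stage2", 256, 1, 2),
    ("stage3", 512, 2, 3),
    ("stage4", 1024, 2, 5),
    ("stage5", 2048, 2, 2),
]

def trace(input_size=64):
    """Return ([(name, spatial_size, channels), ...], number_of_main_path_convs)."""
    size = conv_out(input_size, kernel=7, stride=2, pad=3)  # zero-pad (3,3), then conv1
    size = conv_out(size, kernel=3, stride=2)               # max pooling
    conv_layers = 1                                         # conv1
    shapes = [("stage1", size, 64)]
    for name, channels, s, n_identity in STAGES:
        # Only the conv block's strided 1x1 convolution changes the spatial size.
        size = conv_out(size, kernel=1, stride=s)
        # One conv block plus n identity blocks, three main-path convs each.
        conv_layers += 3 * (1 + n_identity)
        shapes.append((name, size, channels))
    size = conv_out(size, kernel=2, stride=2)               # avg_pool (stride assumed = window)
    shapes.append(("avg_pool", size, 2048))
    return shapes, conv_layers

shapes, conv_layers = trace(64)
for row in shapes:
    print(row)
print("main-path conv layers:", conv_layers)
```

Under these assumptions the count comes to 49 main-path convolutions. Adding one final fully connected layer, which the list above does not mention, gives the 50 layers usually associated with this layout.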

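Installing the required environment with "pip install keras" and so on is skipped here; see the reference material for details. The initial import code is as follows:

Here, from datasource is a custom training data source and is omitted. However, from nets.resnet_utils import * caused trouble: "ModuleNotFoundError: No module named 'resnets_utils'". The fix involves an egg file; the key output is as follows: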

D:\06Study\models-master\research\slim>python setup.py build
running build
running build_py
creating build
error: could not create 'build': Cannot create a file when that file already exists.

Final tip: copy the egg file into the Python system directory.

Therefore, a generator-based solution will be used, as explained further below.

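3.3.1. Setting the image data format

In the Keras code, "K.set_image_data_format('channels_last')" sets the layout of image tensors: with channels_last, each image is shaped (height, width, channels).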

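3.3.2. How to stop training when the validation loss stops decreasing?

You can define an EarlyStopping callback to end training early. (Note: copied from the web, not yet tried in practice.)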

from keras.callbacks import EarlyStopping
# Stop once val_loss has failed to improve for 2 consecutive epochs.
early_stopping = EarlyStopping(monitor='val_loss', patience=2)
model.fit(X, y, validation_split=0.2, callbacks=[early_stopping])
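To make the meaning of patience concrete, here is a minimal plain-Python sketch of the stopping rule. It is an illustration of the idea, not Keras's implementation. It assumes a lower loss is better and that any decrease counts as an improvement. The function name and the loss values are made up for the example.

```python
def early_stop_epoch(val_losses, patience=2):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses, one per epoch.
losses = [0.90, 0.70, 0.60, 0.65, 0.62, 0.50]
print(early_stop_epoch(losses, patience=2))
```

With these values the best loss is reached at epoch 2. Epochs 3 and 4 bring no improvement, so training stops at epoch 4 and the 0.50 at epoch 5 is never seen. This is the usual trade-off when choosing a small patience.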

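3.3.3. How to record the training/test loss and accuracy after each epoch?

When model.fit finishes, it returns a History object whose history attribute holds the loss values and the other metrics recorded during training.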
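The history attribute is a dictionary that maps each metric name to a list with one value per epoch. The sketch below works on a hand-written dictionary of that shape so that it runs without Keras. The helper name and the numbers are made up for illustration. The "val_loss" key is only present when validation data was passed to fit.

```python
def best_val_epoch(history_dict):
    """Return (epoch_index, val_loss) for the epoch with the lowest validation loss."""
    val_losses = history_dict["val_loss"]
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return best_epoch, val_losses[best_epoch]

# Made-up values; in a real run this would be `model.fit(...).history`.
example_history = {
    "loss": [1.20, 0.80, 0.60],
    "val_loss": [1.10, 0.90, 0.95],
}

for epoch, (loss, val_loss) in enumerate(zip(example_history["loss"],
                                             example_history["val_loss"])):
    print(f"epoch {epoch}: loss={loss:.2f} val_loss={val_loss:.2f}")
print(best_val_epoch(example_history))
```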

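3.3.4. Solving out-of-memory problems with large datasets

Use a Python generator to produce the data one batch at a time and train on each batch.

3.3.4.1. Defining your own generator

generator: a generator function whose output should be:

A tuple of the form (inputs, targets), in which every returned element contains the same number of samples. The generator loops over the dataset indefinitely. An epoch ends once the number of samples seen by the model reaches samples_per_epoch (the Keras 1 argument; Keras 2 replaced it with steps_per_epoch, described below).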
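A minimal sketch of such a generator is shown here. Plain Python lists stand in for the image arrays and labels so that the example runs without any dependencies. In real training the slices would typically be NumPy arrays, loaded from disk inside the loop. The function name and the toy data are made up for illustration.

```python
import itertools

def batch_generator(inputs, targets, batch_size):
    """Yield (inputs, targets) batches forever, cycling over the dataset."""
    n = len(inputs)
    while True:  # never exhausts; the training loop decides when an epoch ends
        for start in range(0, n, batch_size):
            end = start + batch_size
            # Both slices cover the same index range, so they hold the same number of samples.
            yield inputs[start:end], targets[start:end]

# Toy data: 10 samples with binary labels.
xs = list(range(10))
ys = [x % 2 for x in xs]

gen = batch_generator(xs, ys, batch_size=4)
first_four = list(itertools.islice(gen, 4))
for batch_inputs, batch_targets in first_four:
    print(batch_inputs, batch_targets)
```

The last batch of each pass is shorter when the dataset size is not a multiple of the batch size, and the fourth batch wraps around to the start of the data.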

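3.3.4.2. Feeding training data to the model through a generator

steps_per_epoch: integer. An epoch ends, and the next one begins, once the generator has returned data steps_per_epoch times. It is the number of batches that make up one epoch: with N=35000 training samples and steps_per_epoch=700, each batch holds batch_size=50 samples.
epochs: integer, the number of passes over the data.
verbose: logging mode. 0 prints nothing to standard output, 1 prints a progress bar, 2 prints one line per epoch.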
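The relationship between the sample count, the batch size, and steps_per_epoch is simple arithmetic. The helper below is a made-up name for illustration. Rounding up, so that a final partial batch still counts as a step, is an assumption about how one would want to cover every sample.

```python
import math

def steps_for(num_samples, batch_size):
    """Number of generator batches needed to see every sample once."""
    return math.ceil(num_samples / batch_size)

print(steps_for(35000, 50))  # the example from the text
print(steps_for(35001, 50))  # one extra sample costs one extra, nearly empty, step
```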

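3.3.4.3. About yield

yield is a Python keyword rather than a function. Any function whose body contains yield is a generator function: calling it returns a generator, and iterating that generator runs the body from one yield to the next.

yield resembles return: each iteration runs until it reaches a yield and hands back the value to its right. The key point is that the next iteration resumes from the code right after the yield reached last time. As for return inside a generator: it ends the iteration. Without one, the body simply runs to the end of the function, and the values the caller receives are the ones that were yielded.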
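The resume-after-yield behavior is easiest to see by logging what the generator body does between calls. This is a small self-contained sketch; the function and variable names are made up for the example.

```python
def countdown(n, log):
    log.append("start")
    while n > 0:
        yield n                                  # pause here and hand n to the caller
        log.append(f"resumed after yielding {n}")
        n -= 1
    log.append("done")
    return "finished"                            # ends iteration; carried by StopIteration

log = []
gen = countdown(2, log)    # nothing runs yet, the body starts on the first next()
first = next(gen)          # runs up to the first yield
second = next(gen)         # resumes right after that yield
try:
    next(gen)              # body runs to the end and raises StopIteration
except StopIteration as stop:
    result = stop.value

print(first, second, result)
print(log)
```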

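3.3.5. Epochs and training time

As the number of epochs grows, each later epoch takes longer. For example, with 60000 image samples, 70% of them used as the training set, an epoch took a little over 5 minutes at the start and quickly rose to 6 or 7 minutes. I started training on this large batch of data the day before the Dragon Boat Festival. Because I had not prepared well, by a little after 8 pm I was too hungry and dizzy to continue and had to abandon that round of training.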

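Adam

The default parameters follow the values given in the original paper.

Parameters

lr: float >= 0. Learning rate.
beta_1: float, 0 < beta < 1. Usually close to 1.
beta_2: float, 0 < beta < 1. Usually close to 1.
epsilon: float >= 0. Fuzz factor. If None, defaults to K.epsilon().
decay: float >= 0. Learning rate decay applied after each parameter update.
amsgrad: boolean. Whether to apply the AMSGrad variant of this algorithm, from the paper "On the Convergence of Adam and Beyond".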
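To show how lr, beta_1, beta_2, and epsilon interact, here is a single-parameter sketch of one Adam update as formulated in the original paper. It is not the Keras implementation, which may arrange the same terms differently. It leaves out decay and amsgrad. The epsilon default of 1e-7 is an assumption about the usual K.epsilon() value.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad          # running mean of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta_1 ** t)                 # bias correction for the zero initialization
    v_hat = v / (1 - beta_2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + epsilon)
    return theta, m, v

theta, m, v = adam_step(theta=1.0, grad=2.0, m=0.0, v=0.0, t=1)
print(theta, m, v)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly lr regardless of the gradient's magnitude.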

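If the batch size is set too large, an OOM problem can occur. For example, with a batch size of 100, a GPU out-of-memory (OOM) error was reported after training had run for a while. Setting batch_size=50 then fixed it.

The GPU has 4 GB of memory.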

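Machine learning relies on an important assumption, the IID (independent and identically distributed) assumption: training data and test data are assumed to follow the same distribution. This is a basic guarantee that a model obtained from the training data can perform well on the test set. So what does BatchNorm do? During the training of a deep neural network, BatchNorm keeps the inputs to each layer at a stable distribution by normalizing them over each mini-batch.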
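The normalization step itself is short. The sketch below applies it to one feature across a mini-batch, in plain Python. It covers only the training-time computation. The gamma, beta, and eps defaults are illustrative assumptions, and a real layer would also track running statistics for inference.

```python
import math

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale by gamma and shift by beta."""
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in values]

out = batch_norm([2.0, 4.0, 6.0, 8.0])
out_mean = sum(out) / len(out)
out_var = sum((x - out_mean) ** 2 for x in out) / len(out)
print(out)
print(out_mean, out_var)
```

Whatever the scale of the incoming values, the output has mean close to 0 and variance close to 1 before gamma and beta are applied.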

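mean_squared_error, or mse
mean_absolute_error, or mae
mean_absolute_percentage_error, or mape
mean_squared_logarithmic_error, or msle
squared_hinge
hinge
binary_crossentropy (also known as log loss, logloss)
categorical_crossentropy: also known as multi-class log loss. Note that with this objective the labels must be converted to binary arrays of shape (nb_samples, nb_classes).
sparse_categorical_crossentropy: as above, but accepts sparse labels. Note that the labels must still have the same number of dimensions as the outputs, so you may need to add a dimension to the label data: np.expand_dims(y,-1)
kullback_leibler_divergence: the information gain from the predicted probability distribution Q to the true probability distribution P, used to measure the difference between the two distributions.
cosine_proximity: the negative of the mean cosine similarity between the predictions and the true labels.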
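The difference between categorical_crossentropy and sparse_categorical_crossentropy lies only in how the labels are encoded. The plain-Python sketch below handles a single sample and shows that both give the same loss. The function names mirror the Keras loss names for readability but are local stand-ins, not the library functions. The eps clipping value is an assumption.

```python
import math

def one_hot(label, num_classes):
    """Turn an integer class label into a binary vector of length num_classes."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Cross-entropy for one sample with a one-hot label vector."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(y_true, y_pred))

def sparse_categorical_crossentropy(label, y_pred, eps=1e-7):
    """Same loss, taking the integer class index directly."""
    return -math.log(max(y_pred[label], eps))

probs = [0.1, 0.2, 0.7]  # hypothetical softmax output for one sample
dense_loss = categorical_crossentropy(one_hot(2, 3), probs)
sparse_loss = sparse_categorical_crossentropy(2, probs)
print(one_hot(2, 3), dense_loss, sparse_loss)
```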

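VGG, GoogLeNet, and ResNet are widely used networks today, with ResNet performing slightly better. Applications include FaceNet and Im2Tx.

The design of the network layers themselves and connections that skip over several layers will remain research hot spots for some time.

Improving the paths along which features and gradients flow;
Focusing on three aspects: network depth, network width, and residual connections.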