Understanding model.named_parameters() and model.parameters() in PyTorch

https://www.devze.com 2022-11-29 09:20 Source: web Author: 不想秃顶还想当程序猿
Contents
  • Understanding model.named_parameters() and model.parameters()
    • model.named_parameters()
    • model.parameters()
  • Differences between state_dict(), named_parameters(), and parameters()
    • Test code setup
    • Two concepts
    • named_parameters()
    • parameters()
    • state_dict()

Understanding model.named_parameters() and model.parameters()

model.named_parameters()

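Iterating over model.named_parameters() yields a (name, param) pair on every step, so the loop below prints each parameter's name together with its requires_grad flag.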

model = DarkNet([1, 2, 8, 8, 4])
for name, param in model.named_parameters():
  print(name,param.requires_grad)
  param.requires_grad = False

Output:

conv1.weight True

bn1.weight True

bn1.bias True

layer1.ds_conv.weight True

layer1.ds_bn.weight True

layer1.ds_bn.bias True

layer1.residual_0.conv1.weight True

layer1.residual_0.bn1.weight True

layer1.residual_0.bn1.bias True

layer1.residual_0.conv2.weight True

layer1.residual_0.bn2.weight True

layer1.residual_0.bn2.bias True

layer2.ds_conv.weight True

layer2.ds_bn.weight True

layer2.ds_bn.bias True

layer2.residual_0.conv1.weight True

layer2.residual_0.bn1.weight True

layer2.residual_0.bn1.bias True

....

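The same loop can change whether a parameter is trainable. The first pass prints True for every parameter; because the loop also sets requires_grad = False, a second pass over the model would print False.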
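The two-pass behaviour can be checked on a small stand-in model. This is a minimal sketch: `TinyNet` is a hypothetical module invented for illustration, not the DarkNet used above.

```python
import torch.nn as nn

# Hypothetical stand-in for DarkNet, kept small so the output is easy to read.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, bias=False)
        self.bn1 = nn.BatchNorm2d(8)

tiny = TinyNet()

# First pass: record the flag, then freeze the parameter.
first_pass = []
for name, param in tiny.named_parameters():
    first_pass.append((name, param.requires_grad))
    param.requires_grad = False

# Second pass: the flags now reflect the change made in the first pass.
second_pass = [(name, param.requires_grad) for name, param in tiny.named_parameters()]

print(first_pass)
print(second_pass)
```

The change persists because named_parameters() yields the model's own Parameter objects rather than copies.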

model.parameters()

Iterating over model.parameters() yields only the param on every step, without its name. That is the difference from named_parameters(). Both can be used to change the requires_grad attribute.

for index, param in enumerate(model.parameters()):
  print(param.shape)

Output:

torch.Size([32, 3, 3, 3])

torch.Size([32])

torch.Size([32])

torch.Size([64, 32, 3, 3])

torch.Size([64])

torch.Size([64])

torch.Size([32, 64, 1, 1])

torch.Size([32])

torch.Size([32])

torch.Size([64, 32, 3, 3])

torch.Size([64])

torch.Size([64])

torch.Size([128, 64, 3, 3])

torch.Size([128])

torch.Size([128])

torch.Size([64, 128, 1, 1])

torch.Size([64])

torch.Size([64])

torch.Size([128, 64, 3, 3])

torch.Size([128])

torch.Size([128])

torch.Size([64, 128, 1, 1])

torch.Size([64])

torch.Size([64])

torch.Size([128, 64, 3, 3])

torch.Size([128])

torch.Size([128])

torch.Size([256, 128, 3, 3])

torch.Size([256])

torch.Size([256])

torch.Size([128, 256, 1, 1])

....

To get the index, the layer name, and the param in a single loop, wrap named_parameters() in enumerate:

for index, (name, param) in enumerate(model.named_parameters()):
  print(index)
  print(name, param.shape)

Output:

0

conv1.weight torch.Size([32, 3, 3, 3])

1

bn1.weight torch.Size([32])

2

bn1.bias torch.Size([32])

3

layer1.ds_conv.weight torch.Size([64, 32, 3, 3])

4

layer1.ds_bn.weight torch.Size([64])

5

layer1.ds_bn.bias torch.Size([64])

6

layer1.residual_0.conv1.weight torch.Size([32, 64, 1, 1])

7

layer1.residual_0.bn1.weight torch.Size([32])

8

layer1.residual_0.bn1.bias torch.Size([32])

9

layer1.residual_0.conv2.weight torch.Size([64, 32, 3, 3])
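The index lines up with parameters() because both iterators walk the same Parameter objects in the same order. The sketch below checks this on a hypothetical `nn.Sequential` model (`model_demo` is an illustrative name, not part of the original code).

```python
import torch.nn as nn

# Hypothetical small model, used only to illustrate the ordering.
model_demo = nn.Sequential(nn.Linear(4, 3), nn.BatchNorm1d(3), nn.Linear(3, 2))

names = [name for name, _ in model_demo.named_parameters()]
params_named = [param for _, param in model_demo.named_parameters()]
params_plain = list(model_demo.parameters())

# Both iterators yield the very same objects, position by position.
same_objects = len(params_named) == len(params_plain) and all(
    a is b for a, b in zip(params_named, params_plain)
)

for index, (name, param) in enumerate(model_demo.named_parameters()):
    print(index, name, tuple(param.shape))
print(same_objects)
```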

Differences between state_dict(), named_parameters(), and parameters()

PyTorch has three methods that look very similar: model.parameters(), model.named_parameters(), and model.state_dict(). The rest of this article looks at how they differ.

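They differ mainly in three ways:

  • The type of the return value
  • The kinds of model parameters they contain
  • The requires_grad attribute of the values they return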
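The three differences can be seen side by side in a short sketch. It assumes a hypothetical two-layer model (`demo`) that has one layer with buffers, so the difference in contents is visible.

```python
from collections import OrderedDict
import types

import torch.nn as nn

# Hypothetical model: BatchNorm1d carries buffers, Linear does not.
demo = nn.Sequential(nn.BatchNorm1d(3), nn.Linear(3, 2))

# 1. Return types: two generators and one OrderedDict.
parameters_is_generator = isinstance(demo.parameters(), types.GeneratorType)
named_is_generator = isinstance(demo.named_parameters(), types.GeneratorType)
state_is_ordered_dict = isinstance(demo.state_dict(), OrderedDict)

# 2. Contents: state_dict() also holds the BatchNorm buffers.
param_names = {name for name, _ in demo.named_parameters()}
state_names = set(demo.state_dict().keys())
only_in_state_dict = sorted(state_names - param_names)

# 3. requires_grad: set on the parameters, cleared on the state_dict() tensors.
params_require_grad = all(p.requires_grad for p in demo.parameters())
state_requires_grad = any(t.requires_grad for t in demo.state_dict().values())

print(parameters_is_generator, named_is_generator, state_is_ordered_dict)
print(only_in_state_dict)
print(params_require_grad, state_requires_grad)
```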

Test code setup

import torch
import torch.nn as nn
import torch.optim as optim
import random
import os
import numpy as np

def seed_torch(seed=1029):
    random.seed(seed)
    # Intended to disable hash randomization for reproducibility. Python reads
    # PYTHONHASHSEED at interpreter startup, so setting it here only affects
    # subprocesses; export it before launching Python to fix hashing here too.
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed) # if you are using multi-GPU.
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

seed_torch() # fix the random seeds

# Define a network
class net(nn.Module):
    def __init__(self, num_class=10):
        super(net, self).__init__()
        self.pool1 = nn.AvgPool1d(2)
        self.bn1 = nn.BatchNorm1d(3)
        self.fc1 = nn.Linear(12, 4)

    def forward(self, x):
        x = self.pool1(x)             # (N, 3, 8) -> (N, 3, 4)
        x = self.bn1(x)
        x = x.reshape(x.size(0), -1)  # flatten to (N, 12)
        x = self.fc1(x)

        return x


# Instantiate the network
model = net()

# Define the loss
loss_fn = nn.CrossEntropyLoss()

# Define the optimizer
optimizer = optim.SGD(model.parameters(), lr=1e-2)

# Define the training data
x = torch.randn((3, 3, 8))

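Two concepts

Learnable parameters

Learnable parameters, also called model parameters, are the ones that take part in learning and get updated. "Updated" here means updated in the optimizer's optimizer.step() call, i.e. these are the parameters that receive gradients from backpropagation.

Variables created with nn.parameter.Parameter() are learnable parameters (model parameters).

Every learnable parameter in a model has the type nn.parameter.Parameter.

For a model, optimizer.step() only updates the nn.parameter.Parameter objects that were passed to the optimizer.

nn.parameter.Parameter defaults to requires_grad=True, so any tensor that needs gradients during training should be wrapped in it.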
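The registration rule can be illustrated with a small sketch. `Scale`, `gain`, and `offset` are hypothetical names chosen for this example; the point is only that a tensor wrapped in nn.Parameter is picked up by the module, while a plain tensor attribute is not.

```python
import torch
import torch.nn as nn

# Hypothetical module: one tensor wrapped in nn.Parameter, one left as a plain tensor.
class Scale(nn.Module):
    def __init__(self):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(3))  # registered as a parameter
        self.offset = torch.zeros(3)             # plain attribute, not registered

    def forward(self, x):
        return x * self.gain + self.offset

scale = Scale()
registered = [name for name, _ in scale.named_parameters()]

print(registered)
print(type(scale.gain).__name__, scale.gain.requires_grad)
print(type(scale.offset).__name__, scale.offset.requires_grad)
```

Only `gain` would be handed to an optimizer built from `scale.parameters()`; `offset` stays invisible to it.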

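Example:

In the network defined above, the parameters of the self.fc1 layer (weight and bias) are learnable parameters that are learned and updated during training.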

print(type(model.fc1.weight))
(bbn) jyzhang@admin2-X10DAi:~/test$ python net.py
<class 'torch.nn.parameter.Parameter'>

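Non-learnable parameters

Non-learnable parameters (buffers) do not take part in learning and are not updated by the optimizer, i.e. they need no backpropagation.

They are registered in self._buffers through Module.register_buffer(), not Module.register_parameter(). self._buffers is a dictionary (an OrderedDict in the PyTorch version used here).

Example: in the model defined above, running_mean, running_var, and num_batches_tracked of the self.bn1 layer are all non-learnable. BatchNorm registers them roughly like this:

self.register_buffer('running_mean', torch.zeros(num_features))

Buffers stored in self._buffers cannot be updated by optimizer.step(), but they can still change: the buffers of self.bn1 are updated during forward when the module is in training mode.

Example:

In the network defined above, running_mean of the self.bn1 layer is a non-learnable parameter.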

print(type(model.bn1.running_mean))
(bbn) jyzhang@admin2-X10DAi:~/test$ python net.py
<class 'torch.Tensor'>
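The claim that buffers change in forward but not in the optimizer step can be checked directly. The following is a sketch on a standalone BatchNorm1d layer; `bn` and `optimizer_demo` are illustrative names, separate from the `model` and `optimizer` defined earlier.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(3)
optimizer_demo = torch.optim.SGD(bn.parameters(), lr=0.1)

buffer_names = [name for name, _ in bn.named_buffers()]
mean_before = bn.running_mean.clone()
bias_before = bn.bias.detach().clone()

# In training mode the forward pass alone updates the running statistics.
bn.train()
out = bn(torch.randn(8, 3) + 5.0)
mean_after_forward = bn.running_mean.clone()

# The optimizer step only touches the parameters it was given (bias moves here);
# the buffer is left alone.
out.sum().backward()
optimizer_demo.step()
mean_after_step = bn.running_mean.clone()

print(buffer_names)
print(torch.equal(mean_before, mean_after_forward))
print(torch.equal(mean_after_forward, mean_after_step))
```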

named_parameters()

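Overview

model.named_parameters() returns a generator. It yields only the learnable parameters, the ones an optimizer can update, as (name, parameter) pairs. You can loop over it to print each name and parameter (see code example 1).

Because it yields the parameters themselves, it can be used to change their requires_grad attribute, for example to freeze certain layers so they are not updated during training (see code example 2).

Code example 1

# Using model.named_parameters()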
print(type(model.named_parameters()))

for name, param in model.named_parameters():
  print(name)
  print(param)

Result

(bbn) jyzhang@admin2-X10DAi:~/test$ python net.py

<class 'generator'>

bn1.weight

Parameter containing:

tensor([1., 1., 1.], requires_grad=True)

bn1.bias

Parameter containing:

tensor([0., 0., 0.], requires_grad=True)

fc1.weight

Parameter containing:

tensor([[ 0.0036,  0.1960,  0.2315, -0.2408,  0.1217,  0.2579, -0.0676, -0.1880,

         -0.2855, -0.1587,  0.0409,  0.0312],

        [ 0.1057,  0.1348, -0.0590, -0.1538,  0.2505,  0.0651, -0.2461, -0.1856,

          0.2498, -0.1969,  0.0013,  0.1979],

        [-0.1812,  0.1153,  0.2723, -0.2190,  0.0371, -0.0341,  0.2282,  0.1461,

          0.1890,  0.1762,  0.2657, -0.0827],

        [-0.0188,  0.0081, -0.2674, -0.1858,  0.1296,  0.1728, -0.0770,  0.1444,

         -0.2360, -0.1793,  0.1921, -0.2791]], requires_grad=True)

fc1.bias

Parameter containing:

tensor([-0.0020,  0.0985,  0.1859, -0.0175], requires_grad=True)

Code example 2

print(model.fc1.weight.requires_grad) # requires_grad of the learnable parameter fc1.weight

for name, param in model.named_parameters():
  if ("fc1" in name):
    param.requires_grad = False

print(model.fc1.weight.requires_grad) # requires_grad of fc1.weight after the change

Result

(bbn) jyzhang@admin2-X10DAi:~/test$ python net.py

True

False
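Freezing by name is usually paired with handing only the still-trainable parameters to the optimizer. The sketch below shows one common way to do that; the two-layer `demo` model and the name prefix "0." are assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical two-layer model: layer "0" is frozen, layer "1" stays trainable.
demo = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))

for name, param in demo.named_parameters():
    if name.startswith("0."):
        param.requires_grad = False

# Hand only the trainable parameters to the optimizer.
trainable = [p for p in demo.parameters() if p.requires_grad]
optimizer_demo = torch.optim.SGD(trainable, lr=0.1)

frozen_before = demo[0].weight.clone()
head_before = demo[1].weight.detach().clone()

loss = demo(torch.randn(5, 4)).pow(2).mean()
loss.backward()
optimizer_demo.step()

print(torch.equal(frozen_before, demo[0].weight))          # frozen layer
print(torch.equal(head_before, demo[1].weight.detach()))   # trainable layer
```

The frozen layer never receives a gradient, so its `.grad` stays `None` and its weights keep their initial values.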

parameters()

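Overview

model.parameters() returns a generator. It yields only the learnable parameters, the ones an optimizer can update, and only the parameters themselves. You can loop over it to print them (see code example 1).

Compared with model.named_parameters(), model.parameters() does not carry the parameter names.

It can also be used to change the requires_grad attribute of the learnable parameters. Because it yields parameters without names, it is less convenient than model.named_parameters() when only specific layers should be changed (see code example 2).

Code example 1

# Using model.parameters()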
print(type(model.parameters()))

for param in model.parameters():
  print(param)

Result

(bbn) jyzhang@admin2-X10DAi:~/test$ python net.py

<class 'generator'>

Parameter containing:

tensor([1., 1., 1.], requires_grad=True)

Parameter containing:

tensor([0., 0., 0.], requires_grad=True)

Parameter containing:

tensor([[ 0.0036,  0.1960,  0.2315, -0.2408,  0.1217,  0.2579, -0.0676, -0.1880,

         -0.2855, -0.1587,  0.0409,  0.0312],

        [ 0.1057,  0.1348, -0.0590, -0.1538,  0.2505,  0.0651, -0.2461, -0.1856,

          0.2498, -0.1969,  0.0013,  0.1979],

        [-0.1812,  0.1153,  0.2723, -0.2190,  0.0371, -0.0341,  0.2282,  0.1461,

          0.1890,  0.1762,  0.2657, -0.0827],

        [-0.0188,  0.0081, -0.2674, -0.1858,  0.1296,  0.1728, -0.0770,  0.1444,

         -0.2360, -0.1793,  0.1921, -0.2791]], requires_grad=True)

Parameter containing:

tensor([-0.0020,  0.0985,  0.1859, -0.0175], requires_grad=True)

Code example 2

print(model.fc1.weight.requires_grad)

for param in model.parameters():
  param.requires_grad = False

print(model.fc1.weight.requires_grad)

Result

(bbn) jyzhang@admin2-X10DAi:~/test$ python net.py

True

False
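When names are not needed, parameters() is handy for whole-model operations such as counting. A minimal sketch, assuming a hypothetical `demo` model built with the same layer sizes as bn1 and fc1 in the test network:

```python
import torch.nn as nn

# Same layer sizes as the bn1 and fc1 layers of the test network.
demo = nn.Sequential(nn.BatchNorm1d(3), nn.Linear(12, 4))

# bn: 3 weights + 3 biases; fc: 12 * 4 weights + 4 biases.
total = sum(p.numel() for p in demo.parameters())

# Freeze everything, then count again using the requires_grad flag.
for p in demo.parameters():
    p.requires_grad = False
trainable = sum(p.numel() for p in demo.parameters() if p.requires_grad)

print(total, trainable)
```

Buffers such as running_mean are not included in either count, because parameters() does not yield them.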

state_dict()

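Overview

model.state_dict() returns an OrderedDict that maps every parameter name to its value. "Every parameter" covers both the learnable parameters and the non-learnable ones (buffers), and you can loop over the dictionary to print them. This makes it the method to use for saving a model: the non-learnable parameters are saved as well, and they are restored when the model is loaded (see code example 1).

model.state_dict() is normally called with its default arguments. In the PyTorch version used here, its source looks like this: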

# torch.nn.modules.module.py
class Module(object):
  def state_dict(self, destination=None, prefix='', keep_vars=False):
    if destination is None:
      destination = OrderedDict()
      destination._metadata = OrderedDict()
    destination._metadata[prefix[:-1]] = local_metadata = dict(version=self._version)
    for name, param in self._parameters.items():
      if param is not None:
        destination[prefix + name] = param if keep_vars else param.data
    for name, buf in self._buffers.items():
      if buf is not None:
        destination[prefix + name] = buf if keep_vars else buf.data
    for name, module in self._modules.items():
      if module is not None:
        module.state_dict(destination, prefix + name + '.', keep_vars=keep_vars)
    for hook in self._state_dict_hooks.values():
      hook_result = hook(self, destination, prefix, local_metadata)
      if hook_result is not None:
        destination = hook_result
    return destination

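With the default arguments, model.state_dict() stores only the data of each parameter (a Tensor), not its requires_grad setting. The tensors it returns therefore have requires_grad=False, and setting requires_grad on them does not change the model's parameters. Changing requires_grad has to go through one of the two methods above (see code example 2).

model.state_dict() is essentially a shallow copy. The OrderedDict it returns is a new object, but each value in it still refers to the same memory as the data of the corresponding model parameter. Because a Tensor is mutable, modifying a value in place also changes the model parameter (see code example 3).

Code example 1

# Using model.state_dict()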
print(model.state_dict())

for name, param in model.state_dict().items():
  print(name)
  print(param)
  print(param.requires_grad)

Result

(bbn) jyzhang@admin2-X10DAi:~/test$ python net.py

OrderedDict([('bn1.weight', tensor([1., 1., 1.])), ('bn1.bias', tensor([0., 0., 0.])), ('bn1.running_mean', tensor([0., 0., 0.])), ('bn1.running_var', tensor([1., 1., 1.])), ('bn1.num_batches_tracked', tensor(0)), ('fc1.weight', tensor([[ 0.0036,  0.1960,  0.2315, -0.2408,  0.1217,  0.2579, -0.0676, -0.1880,

         -0.2855, -0.1587,  0.0409,  0.0312],

        [ 0.1057,  0.1348, -0.0590, -0.1538,  0.2505,  0.0651, -0.2461, -0.1856,

          0.2498, -0.1969,  0.0013,  0.1979],

        [-0.1812,  0.1153,  0.2723, -0.2190,  0.0371, -0.0341,  0.2282,  0.1461,

          0.1890,  0.1762,  0.2657, -0.0827],

        [-0.0188,  0.0081, -0.2674, -0.1858,  0.1296,  0.1728, -0.0770,  0.1444,

         -0.2360, -0.1793,  0.1921, -0.2791]])), ('fc1.bias', tensor([-0.0020,  0.0985,  0.1859, -0.0175]))])

bn1.weight

tensor([1., 1., 1.])

False

bn1.bias

tensor([0., 0., 0.])

False

bn1.running_mean

tensor([0., 0., 0.])

False

bn1.running_var

tensor([1., 1., 1.])

False

bn1.num_batches_tracked

tensor(0)

False

fc1.weight

tensor([[ 0.0036,  0.1960,  0.2315, -0.2408,  0.1217,  0.2579, -0.0676, -0.1880,

         -0.2855, -0.1587,  0.0409,  0.0312],

        [ 0.1057,  0.1348, -0.0590, -0.1538,  0.2505,  0.0651, -0.2461, -0.1856,

          0.2498, -0.1969,  0.0013,  0.1979],

        [-0.1812,  0.1153,  0.2723, -0.2190,  0.0371, -0.0341,  0.2282,  0.1461,

          0.1890,  0.1762,  0.2657, -0.0827],

        [-0.0188,  0.0081, -0.2674, -0.1858,  0.1296,  0.1728, -0.0770,  0.1444,

         -0.2360, -0.1793,  0.1921, -0.2791]])

False

fc1.bias

tensor([-0.0020,  0.0985,  0.1859, -0.0175])

False

Code example 2

# requires_grad set on a state_dict() entry does not reach the model
print(model.bn1.weight.requires_grad)
model.bn1.weight.requires_grad = False
print(model.bn1.weight.requires_grad)

for name, param in model.state_dict().items():
  if (name == "bn1.weight"):
    param.requires_grad = True

print(model.bn1.weight.requires_grad)

Result

(bbn) jyzhang@admin2-X10DAi:~/test$ python net.py

True

False

False
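The behaviour above belongs to the default keep_vars=False. As a sketch of the other setting, the snippet below calls state_dict(keep_vars=True) on a standalone hypothetical `layer`; in that mode the entries are the Parameter objects themselves.

```python
import torch.nn as nn

layer = nn.Linear(2, 1)

detached = layer.state_dict()                # default: keep_vars=False
attached = layer.state_dict(keep_vars=True)  # entries are the Parameters themselves

default_flag = detached["weight"].requires_grad
keep_vars_flag = attached["weight"].requires_grad
is_same_object = attached["weight"] is layer.weight

# Even the detached entry shares storage with the live parameter.
shares_storage = detached["weight"].data_ptr() == layer.weight.data_ptr()

print(default_flag, keep_vars_flag, is_same_object, shares_storage)
```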

Code example 3

# An in-place write to a state_dict() entry changes the model parameter
print(model.bn1.weight)

for name, param in model.state_dict().items():
  if (name == "bn1.weight"):
    param[0] = 1000

print(model.bn1.weight)

Result

(bbn) jyzhang@admin2-X10DAi:~/test$ python net.py

Parameter containing:

tensor([1., 1., 1.], requires_grad=True)

Parameter containing:

tensor([1000.,    1.,    1.], requires_grad=True)
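Because the returned tensors alias the model, a snapshot that should survive later changes needs a deep copy. The sketch below takes such a snapshot, modifies the model, and restores it with load_state_dict(); the in-memory `io.BytesIO` round trip stands in for saving to a file, and `bn` is a standalone illustrative layer.

```python
import copy
import io

import torch
import torch.nn as nn

bn = nn.BatchNorm1d(3)

# Deep-copy the state dict so the snapshot no longer aliases the model.
snapshot = copy.deepcopy(bn.state_dict())

# Change a learnable parameter and a buffer in place.
with torch.no_grad():
    bn.weight[0] = 1000.0
    bn.running_mean.fill_(7.0)

# Round-trip through an in-memory buffer, as with a checkpoint file.
buffer = io.BytesIO()
torch.save(snapshot, buffer)
buffer.seek(0)
restored = torch.load(buffer)

# Loading restores both the learnable parameters and the buffers.
bn.load_state_dict(restored)
print(bn.weight)
print(bn.running_mean)
```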

The above is based on personal experience; I hope it serves as a useful reference.
