Freezing layers in PyTorch so that their parameters are not updated during training

Defining the network

import torch
import torch.nn as nn
import torch.optim as optim

# Define a simple two-layer network
class net(nn.Module):
  def __init__(self, num_class=10):
    super(net, self).__init__()
    self.fc1 = nn.Linear(8, 4)
    self.fc2 = nn.Linear(4, num_class)

  def forward(self, x):
    return self.fc2(self.fc1(x))

Case 1: no layers frozen

Code

model = net()

# Case 1: no parameters frozen
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2) # all parameters are passed to the optimizer

# Model parameters before training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)

for epoch in range(10):
  x = torch.randn((3, 8))
  label = torch.randint(0,10,[3]).long()
  output = model(x)

  loss = loss_fn(output, label)
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

# Model parameters after training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)

Result

(bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"
model.fc1.weight Parameter containing:
tensor([[ 0.3362, -0.2676, -0.3497, -0.3009, -0.1013, -0.2316, -0.0189, 0.1430],
[-0.2486, 0.2900, -0.1818, -0.0942, 0.1445, 0.2410, -0.1407, -0.3176],
[-0.3198, 0.2039, -0.2249, 0.2819, -0.3136, -0.2794, -0.3011, -0.2270],
[ 0.3376, -0.0842, 0.2747, -0.0232, 0.0768, 0.3160, -0.1185, 0.2911]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[ 0.4277, 0.0945, 0.1768, 0.3773],
[-0.4595, -0.2447, 0.4701, 0.2873],
[ 0.3281, -0.1861, -0.2202, 0.4413],
[-0.1053, -0.1238, 0.0275, -0.0072],
[-0.4448, -0.2787, -0.0280, 0.4629],
[ 0.4063, -0.2091, 0.0706, 0.3216],
[-0.2287, -0.1352, -0.0502, 0.3434],
[-0.2946, -0.4074, 0.4926, -0.0832],
[-0.2608, 0.0165, 0.0501, -0.1673],
[ 0.2507, 0.3006, 0.0481, 0.2257]], requires_grad=True)
model.fc1.weight Parameter containing:
tensor([[ 0.3316, -0.2628, -0.3391, -0.2989, -0.0981, -0.2178, -0.0056, 0.1410],
[-0.2529, 0.2991, -0.1772, -0.0992, 0.1447, 0.2480, -0.1370, -0.3186],
[-0.3246, 0.2055, -0.2229, 0.2745, -0.3158, -0.2750, -0.2994, -0.2295],
[ 0.3366, -0.0877, 0.2693, -0.0182, 0.0807, 0.3117, -0.1184, 0.2946]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[ 0.4189, 0.0985, 0.1723, 0.3804],
[-0.4593, -0.2356, 0.4772, 0.2784],
[ 0.3269, -0.1874, -0.2173, 0.4407],
[-0.1061, -0.1248, 0.0309, -0.0062],
[-0.4322, -0.2868, -0.0319, 0.4647],
[ 0.4048, -0.2150, 0.0692, 0.3228],
[-0.2252, -0.1353, -0.0433, 0.3396],
[-0.2936, -0.4118, 0.4875, -0.0782],
[-0.2625, 0.0192, 0.0509, -0.1670],
[ 0.2474, 0.3056, 0.0418, 0.2265]], requires_grad=True)

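Conclusion

When no layer is frozen, the parameters of every learnable layer change as training proceeds: both fc1.weight and fc2.weight differ before and after the loop.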

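Case 2: freezing fc1 with method 1

Method 1

Pass all parameters to the optimizer

optimizer = optim.SGD(model.parameters(), lr=1e-2) # all parameters are passed to the optimizer

Set requires_grad to False on the parameters of the layer to be frozen

for name, param in model.named_parameters():
  if "fc1" in name:
    param.requires_grad = False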
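As a shorthand for the loop above, `nn.Module.requires_grad_()` sets the flag on every parameter of a submodule in one call. The following is a minimal sketch, assuming the same layer sizes as the network in this article; the class name `Net` is used only for illustration.

```python
import torch.nn as nn


class Net(nn.Module):
    """Illustrative copy of the article's two-layer network."""

    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


model = Net()

# Freeze every parameter of fc1 (weight and bias) in a single call
model.fc1.requires_grad_(False)

# Collect the flag of each parameter to confirm what was frozen
flags = {name: p.requires_grad for name, p in model.named_parameters()}
print(flags)
```

Matching on the parameter name, as the loop does, is still handy when the layers to freeze do not live under a single submodule.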

Code

model = net()

# Case 2: freezing fc1 with method 1
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2) # all parameters are passed to the optimizer

# Model parameters before training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)

# Freeze the parameters of fc1
for name, param in model.named_parameters():
  if "fc1" in name:
    param.requires_grad = False

for epoch in range(10):
  x = torch.randn((3, 8))
  label = torch.randint(0,10,[3]).long()
  output = model(x)

  loss = loss_fn(output, label)
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

# Model parameters after training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)

Result

(bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"
model.fc1.weight Parameter containing:
tensor([[ 0.3163, -0.1592, -0.2360, 0.1436, 0.1158, 0.0406, -0.0627, 0.0566],
[-0.1688, 0.3519, 0.2464, -0.2693, 0.1284, 0.0544, -0.0188, 0.2404],
[ 0.0738, 0.2013, 0.0868, 0.1396, -0.2885, 0.3431, -0.1109, 0.2549],
[ 0.1222, -0.1877, 0.3511, 0.1951, 0.2147, -0.0427, -0.3374, -0.0653]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[-0.1830, -0.3147, -0.1698, 0.3235],
[-0.1347, 0.3096, 0.4895, 0.1221],
[ 0.2735, -0.2238, 0.4713, -0.0683],
[-0.3150, -0.1905, 0.3645, 0.3766],
[-0.0340, 0.3212, 0.0650, 0.1380],
[-0.2500, 0.1128, -0.3338, -0.4151],
[ 0.0446, -0.4776, -0.3655, 0.0822],
[-0.1871, -0.0602, -0.4855, -0.3604],
[-0.3296, 0.0523, -0.3424, 0.2151],
[-0.2478, 0.1424, 0.4547, -0.1969]], requires_grad=True)
model.fc1.weight Parameter containing:
tensor([[ 0.3163, -0.1592, -0.2360, 0.1436, 0.1158, 0.0406, -0.0627, 0.0566],
[-0.1688, 0.3519, 0.2464, -0.2693, 0.1284, 0.0544, -0.0188, 0.2404],
[ 0.0738, 0.2013, 0.0868, 0.1396, -0.2885, 0.3431, -0.1109, 0.2549],
[ 0.1222, -0.1877, 0.3511, 0.1951, 0.2147, -0.0427, -0.3374, -0.0653]])
model.fc2.weight Parameter containing:
tensor([[-0.1821, -0.3155, -0.1637, 0.3213],
[-0.1353, 0.3130, 0.4807, 0.1245],
[ 0.2731, -0.2206, 0.4687, -0.0718],
[-0.3138, -0.1925, 0.3561, 0.3809],
[-0.0344, 0.3152, 0.0606, 0.1332],
[-0.2501, 0.1154, -0.3267, -0.4137],
[ 0.0400, -0.4723, -0.3586, 0.0808],
[-0.1823, -0.0667, -0.4854, -0.3543],
[-0.3285, 0.0547, -0.3388, 0.2166],
[-0.2497, 0.1410, 0.4551, -0.2008]], requires_grad=True)

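Conclusion

The experiment shows that once requires_grad=False is set, only the parameters of layers with requires_grad=True are updated, even though all of the model's parameters were passed to the optimizer. The frozen parameters never receive a gradient, so optimizer.step() has nothing to apply to them. Note also that the second printout of fc1.weight no longer shows requires_grad=True.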
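The mechanism behind this result can be checked directly instead of comparing printed tensors by eye. The sketch below is an assumption-laden illustration rather than the article's own script: it rebuilds the same two layers with `nn.Sequential` for brevity, fixes a seed, runs one step, and then tests whether the frozen layer's `.grad` is still `None` and whether its weights are unchanged.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed, only so the sketch is repeatable

# Same layer sizes as the article's network, built with nn.Sequential for brevity
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 10))
fc1, fc2 = model[0], model[1]

# Method 1: the optimizer receives every parameter, fc1 is frozen via requires_grad
optimizer = optim.SGD(model.parameters(), lr=1e-2)
for param in fc1.parameters():
    param.requires_grad = False

# Snapshots of the weights before the update
fc1_before = fc1.weight.detach().clone()
fc2_before = fc2.weight.detach().clone()

x = torch.randn(3, 8)
label = torch.randint(0, 10, (3,))
loss = nn.CrossEntropyLoss()(model(x), label)
optimizer.zero_grad()
loss.backward()
optimizer.step()

fc1_grad_is_none = fc1.weight.grad is None           # no gradient was ever stored
fc1_unchanged = torch.equal(fc1.weight, fc1_before)  # frozen weights did not move
fc2_changed = not torch.equal(fc2.weight, fc2_before)
print(fc1_grad_is_none, fc1_unchanged, fc2_changed)
```

Because the frozen parameters have no gradient, the optimizer skips them even though it holds a reference to them.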

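Case 3: freezing fc1 with method 2

Method 2

Pass only the parameters of the unfrozen fc2 layer to the optimizer

optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2) # only the parameters of fc2 are passed to the optimizer

Note: with this method there is no need to set requires_grad to False on the parameters of the frozen layer.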

Code

model = net()

# Case 3: freezing fc1 with method 2
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2) # only the parameters of fc2 are passed to the optimizer

# Model parameters before training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)

for epoch in range(10):
  x = torch.randn((3, 8))
  label = torch.randint(0,3,[3]).long()
  output = model(x)

  loss = loss_fn(output, label)
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

# Model parameters after training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)

Result

model.fc1.weight Parameter containing:
tensor([[ 0.2519, -0.1772, -0.2229, 0.0711, -0.1681, 0.1233, -0.3217, -0.0412],
[ 0.2032, -0.2045, 0.2723, 0.3272, 0.1034, 0.1519, -0.0587, -0.3436],
[ 0.0470, 0.2379, 0.0590, 0.2400, 0.2280, 0.2045, -0.0229, -0.3484],
[-0.3023, -0.1195, 0.1792, -0.2173, -0.0492, 0.2640, -0.3511, -0.2845]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[-0.3263, -0.2938, -0.3516, -0.4578],
[-0.4549, -0.0060, 0.4696, -0.0174],
[-0.4841, 0.2861, 0.2658, 0.4483],
[-0.3093, 0.0977, -0.2735, 0.1033],
[-0.2421, 0.4489, -0.4649, 0.0110],
[-0.3671, 0.0182, -0.1027, -0.4441],
[ 0.0205, -0.0659, 0.4183, -0.2068],
[-0.1846, 0.1741, -0.2302, -0.1745],
[-0.3423, -0.2642, 0.2796, 0.4976],
[-0.0770, -0.3766, -0.0512, -0.2105]], requires_grad=True)
model.fc1.weight Parameter containing:
tensor([[ 0.2519, -0.1772, -0.2229, 0.0711, -0.1681, 0.1233, -0.3217, -0.0412],
[ 0.2032, -0.2045, 0.2723, 0.3272, 0.1034, 0.1519, -0.0587, -0.3436],
[ 0.0470, 0.2379, 0.0590, 0.2400, 0.2280, 0.2045, -0.0229, -0.3484],
[-0.3023, -0.1195, 0.1792, -0.2173, -0.0492, 0.2640, -0.3511, -0.2845]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[-0.3253, -0.2973, -0.3707, -0.4560],
[-0.4566, 0.0015, 0.4655, -0.0166],
[-0.4796, 0.2931, 0.2592, 0.4661],
[-0.3097, 0.0966, -0.2695, 0.1002],
[-0.2433, 0.4455, -0.4587, 0.0063],
[-0.3669, 0.0171, -0.0988, -0.4452],
[ 0.0198, -0.0679, 0.4203, -0.2088],
[-0.1854, 0.1717, -0.2241, -0.1781],
[-0.3429, -0.2653, 0.2822, 0.4938],
[-0.0773, -0.3765, -0.0464, -0.2127]], requires_grad=True)

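Conclusion

When the optimizer is given only the parameters of the layers to be updated, only those parameters are updated. Gradients are still computed for the parameters that were not passed in (fc1 still reports requires_grad=True), but those parameters are never updated.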
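To make the "gradients are still computed" part concrete, the following sketch (again using `nn.Sequential` and a fixed seed as illustrative assumptions) checks that fc1 ends up with a populated `.grad` while its weights stay put. It also shows a side effect worth knowing: `optimizer.zero_grad()` only clears the parameters the optimizer holds, so the gradient stored on fc1 is left untouched.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # fixed seed, only so the sketch is repeatable

# Same layer sizes as the article's network, built with nn.Sequential for brevity
model = nn.Sequential(nn.Linear(8, 4), nn.Linear(4, 10))
fc1, fc2 = model[0], model[1]

# Method 2: only fc2's parameters are handed to the optimizer
optimizer = optim.SGD(fc2.parameters(), lr=1e-2)
fc1_before = fc1.weight.detach().clone()

x = torch.randn(3, 8)
label = torch.randint(0, 10, (3,))
loss = nn.CrossEntropyLoss()(model(x), label)
optimizer.zero_grad()
loss.backward()
optimizer.step()

fc1_has_grad = fc1.weight.grad is not None           # backward still computed it
fc1_unchanged = torch.equal(fc1.weight, fc1_before)  # but no update was applied

# zero_grad() on this optimizer does not touch fc1, so its gradient remains
optimizer.zero_grad()
fc1_grad_survives = fc1.weight.grad is not None
print(fc1_has_grad, fc1_unchanged, fc1_grad_survives)
```

So method 2 on its own pays for gradient computation and gradient storage on the frozen layer without ever using the result.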

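Comparing method 1 and method 2

During training you may need to keep part of a model's parameters fixed and update only the rest.

There are two ways to do this: set requires_grad=False on the parameters of the layers that should not be updated, or pass only the parameters to be updated when constructing the optimizer.

The best practice is to give the optimizer only the parameters with requires_grad=True, which uses a little less memory and is somewhat more efficient.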

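Recommended approach

Set requires_grad to False on the parameters that should not be updated, and also keep those parameters out of the optimizer.

Set requires_grad to False on the parameters that should not be updated

# Freeze the parameters of fc1
for name, param in model.named_parameters():
  if "fc1" in name:
    param.requires_grad = False

Do not pass the frozen parameters to the optimizer

# Use a filter so that only parameters with requires_grad=True are passed in
optimizer = optim.SGD(filter(lambda p : p.requires_grad, model.parameters()), lr=1e-2)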
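The filter is evaluated when the optimizer is constructed, so the freezing loop has to run first. Here is a small sketch of the same idea written as a list comprehension, with a count of how many values the optimizer actually holds; the class name `Net` and the variable names are illustrative assumptions.

```python
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    """Illustrative copy of the article's two-layer network."""

    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


model = Net()

# Freeze fc1 BEFORE building the optimizer, otherwise the filter sees it as trainable
for name, param in model.named_parameters():
    if "fc1" in name:
        param.requires_grad = False

# Equivalent to filter(lambda p: p.requires_grad, model.parameters())
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=1e-2)

# Count the scalar values held by the optimizer versus the whole model
n_in_optimizer = sum(p.numel() for g in optimizer.param_groups for p in g["params"])
n_total = sum(p.numel() for p in model.parameters())
print(n_in_optimizer, n_total)  # 50 86
```

Only fc2's 4*10 weights and 10 biases reach the optimizer; fc1's 8*4 weights and 4 biases are left out.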

Code

model = net()

# Recommended approach
loss_fn = nn.CrossEntropyLoss()

# Model parameters before training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)
print("model.fc1.weight.requires_grad:", model.fc1.weight.requires_grad)
print("model.fc2.weight.requires_grad:", model.fc2.weight.requires_grad)

# Freeze the parameters of fc1
for name, param in model.named_parameters():
  if "fc1" in name:
    param.requires_grad = False

# Use a filter so that only parameters with requires_grad=True are passed in
optimizer = optim.SGD(filter(lambda p : p.requires_grad, model.parameters()), lr=1e-2)

for epoch in range(10):
  x = torch.randn((3, 8))
  label = torch.randint(0,3,[3]).long()
  output = model(x)

  loss = loss_fn(output, label)
  optimizer.zero_grad()
  loss.backward()
  optimizer.step()

# Model parameters after training
print("model.fc1.weight", model.fc1.weight)
print("model.fc2.weight", model.fc2.weight)
print("model.fc1.weight.requires_grad:", model.fc1.weight.requires_grad)
print("model.fc2.weight.requires_grad:", model.fc2.weight.requires_grad)

Result

(bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"
model.fc1.weight Parameter containing:
tensor([[-0.1193, 0.2354, 0.2520, 0.1187, 0.2699, -0.2301, 0.1622, -0.0478],
[-0.2862, -0.1716, 0.2865, 0.2615, -0.2205, -0.2046, -0.0983, -0.1564],
[-0.3143, -0.2248, 0.2198, 0.2338, 0.1184, -0.2033, -0.3418, 0.1434],
[ 0.3107, -0.0411, -0.3016, 0.1924, -0.1756, -0.2881, 0.0528, -0.0444]],
requires_grad=True)
model.fc2.weight Parameter containing:
tensor([[-0.2548, 0.2107, -0.1293, -0.2562],
[-0.1989, -0.2624, 0.2226, 0.4861],
[-0.1501, 0.2516, 0.4311, -0.1650],
[ 0.0334, -0.0963, -0.1731, 0.1706],
[ 0.2451, -0.2102, 0.0499, 0.0497],
[-0.1464, -0.2973, 0.3692, 0.0523],
[ 0.1192, 0.3575, -0.1911, 0.1457],
[-0.0990, 0.2059, 0.2072, -0.2013],
[-0.4397, 0.4036, -0.3402, -0.0417],
[ 0.0379, 0.0128, -0.3212, -0.0867]], requires_grad=True)
model.fc1.weight.requires_grad: True
model.fc2.weight.requires_grad: True
model.fc1.weight Parameter containing:
tensor([[-0.1193, 0.2354, 0.2520, 0.1187, 0.2699, -0.2301, 0.1622, -0.0478],
[-0.2862, -0.1716, 0.2865, 0.2615, -0.2205, -0.2046, -0.0983, -0.1564],
[-0.3143, -0.2248, 0.2198, 0.2338, 0.1184, -0.2033, -0.3418, 0.1434],
[ 0.3107, -0.0411, -0.3016, 0.1924, -0.1756, -0.2881, 0.0528, -0.0444]])
model.fc2.weight Parameter containing:
tensor([[-0.2637, 0.2073, -0.1293, -0.2422],
[-0.2027, -0.2641, 0.2152, 0.4897],
[-0.1543, 0.2504, 0.4188, -0.1576],
[ 0.0356, -0.0947, -0.1698, 0.1669],
[ 0.2474, -0.2081, 0.0536, 0.0456],
[-0.1445, -0.2962, 0.3708, 0.0500],
[ 0.1219, 0.3574, -0.1876, 0.1404],
[-0.0961, 0.2058, 0.2091, -0.2046],
[-0.4368, 0.4039, -0.3376, -0.0450],
[ 0.0398, 0.0143, -0.3181, -0.0897]], requires_grad=True)
model.fc1.weight.requires_grad: False
model.fc2.weight.requires_grad: True

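Conclusion

The recommended approach both saves GPU memory and speeds up training:

Less memory: the parameters that are not updated are not passed to the optimizer, and no gradient tensors are stored for them

Faster: setting requires_grad to False on the parameters that are not updated saves the time otherwise spent computing their gradients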
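The speed claim can be made tangible without timing anything. In the sketch below (layer sizes taken from this article, variable names are illustrative assumptions), the output of a frozen first layer carries no autograd history at all, so the backward pass has nothing to do for that layer.

```python
import torch
import torch.nn as nn

# The two layers of the article's network, kept separate to inspect the hidden output
fc1 = nn.Linear(8, 4)
fc2 = nn.Linear(4, 10)
fc1.requires_grad_(False)  # freeze the first layer

x = torch.randn(3, 8)
hidden = fc1(x)       # nothing upstream requires grad, so no graph is recorded here
output = fc2(hidden)  # fc2 is trainable, so the graph starts at this layer

hidden_tracked = hidden.requires_grad
output_tracked = output.requires_grad
print(hidden_tracked, hidden.grad_fn, output_tracked)
```

This full saving applies because fc1 is the first layer. A frozen layer that sits after trainable layers still has to pass gradients back to them; it only skips the gradients of its own parameters.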

Summary

The above is based on personal experience; I hope it serves as a useful reference.
