PyTorch: Freezing Certain Layers During Training So They Are Not Updated (No Gradient Updates)

Site: https://www.devze.com | 2022-11-29 09:21 | Source: the web | Author: Jiyang@UESTC
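Contents
  • Defining the network
  • Case 1: no layers frozen
    • Conclusion
  • Case 2: freezing fc1 with Method 1
    • Method 1
    • Conclusion
  • Case 3: freezing fc1 with Method 2
    • Method 2
    • Conclusion
  • Comparison of Method 1 and Method 2
    • Best practice
      • Conclusion
    • Summary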

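      The parameters of a deep learning network are learned by computing gradients through backpropagation and letting the optimizer use them to update the parameters. Sometimes, however, we want to fix the parameters of certain layers so that they are not updated.

      For example, when fine-tuning, we may want to keep the parameters loaded from a pretrained model fixed and update only the final classifier layer. How can this be done?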

      Defining the network

      import torch
      import torch.nn as nn
      import torch.optim as optim

      # Define a simple two-layer network
      class net(nn.Module):
        def __init__(self, num_class=10):
          super(net, self).__init__()
          self.fc1 = nn.Linear(8, 4)
          self.fc2 = nn.Linear(4, num_class)

        def forward(self, x):
          return self.fc2(self.fc1(x))
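The freezing code later in the article selects parameters by name, so it helps to see what names PyTorch assigns. The following is a minimal sketch that repeats the network definition so it runs on its own; the variable `param_info` is introduced here only for illustration.

```python
import torch.nn as nn


class net(nn.Module):  # same two-layer network as defined above
    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


model = net()

# Collect (name, shape, requires_grad) for every learnable parameter.
param_info = [(name, tuple(p.shape), p.requires_grad)
              for name, p in model.named_parameters()]
for name, shape, requires_grad in param_info:
    print(name, shape, requires_grad)
# fc1.weight (4, 8) True
# fc1.bias (4,) True
# fc2.weight (10, 4) True
# fc2.bias (10,) True
```

Every parameter starts with requires_grad=True, and each name is prefixed with the attribute name of its layer, which is what a check such as `"fc1" in name` relies on.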

      Case 1: no layers frozen

      Code

      model = net()
      
      # Case 1: no parameters frozen
      loss_fn = nn.CrossEntropyLoss()
      optimizer = optim.SGD(model.parameters(), lr=1e-2) # all parameters are passed in
      
      # Model parameters before training
      print("model.fc1.weight", model.fc1.weight)
      print("model.fc2.weight", model.fc2.weight)
      
      for epoch in range(10):
        x = torch.randn((3, 8))
        label = torch.randint(0,10,[3]).long()
        output = model(x)
       
        loss = loss_fn(output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
      
      # Model parameters after training
      print("model.fc1.weight", model.fc1.weight)
      print("model.fc2.weight", model.fc2.weight)

      Output

      (bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"

      model.fc1.weight Parameter containing:

      tensor([[ 0.3362, -0.2676, -0.3497, -0.3009, -0.1013, -0.2316, -0.0189,  0.1430],

              [-0.2486,  0.2900, -0.1818, -0.0942,  0.1445,  0.2410, -0.1407, -0.3176],

              [-0.3198,  0.2039, -0.2249,  0.2819, -0.3136, -0.2794, -0.3011, -0.2270],

              [ 0.3376, -0.0842,  0.2747, -0.0232,  0.0768,  0.3160, -0.1185,  0.2911]],

             requires_grad=True)

      model.fc2.weight Parameter containing:

      tensor([[ 0.4277,  0.0945,  0.1768,  0.3773],

              [-0.4595, -0.2447,  0.4701,  0.2873],

              [ 0.3281, -0.1861, -0.2202,  0.4413],

              [-0.1053, -0.1238,  0.0275, -0.0072],

              [-0.4448, -0.2787, -0.0280,  0.4629],

              [ 0.4063, -0.2091,  0.0706,  0.3216],

              [-0.2287, -0.1352, -0.0502,  0.3434],

              [-0.2946, -0.4074,  0.4926, -0.0832],

              [-0.2608,  0.0165,  0.0501, -0.1673],

              [ 0.2507,  0.3006,  0.0481,  0.2257]], requires_grad=True)

      model.fc1.weight Parameter containing:

      tensor([[ 0.3316, -0.2628, -0.3391, -0.2989, -0.0981, -0.2178, -0.0056,  0.1410],

              [-0.2529,  0.2991, -0.1772, -0.0992,  0.1447,  0.2480, -0.1370, -0.3186],

              [-0.3246,  0.2055, -0.2229,  0.2745, -0.3158, -0.2750, -0.2994, -0.2295],

              [ 0.3366, -0.0877,  0.2693, -0.0182,  0.0807,  0.3117, -0.1184,  0.2946]],

             requires_grad=True)

      model.fc2.weight Parameter containing:

      tensor([[ 0.4189,  0.0985,  0.1723,  0.3804],

              [-0.4593, -0.2356,  0.4772,  0.2784],

              [ 0.3269, -0.1874, -0.2173,  0.4407],

              [-0.1061, -0.1248,  0.0309, -0.0062],

              [-0.4322, -0.2868, -0.0319,  0.4647],

              [ 0.4048, -0.2150,  0.0692,  0.3228],

              [-0.2252, -0.1353, -0.0433,  0.3396],

              [-0.2936, -0.4118,  0.4875, -0.0782],

              [-0.2625,  0.0192,  0.0509, -0.1670],

              [ 0.2474,  0.3056,  0.0418,  0.2265]], requires_grad=True)

      Conclusion

      When no layer is frozen, the parameters of every learnable layer change as training proceeds.
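Comparing printed tensors by eye is error-prone, so the same observation can be checked programmatically. This is a minimal sketch rather than the author's script: the fixed seed and the `before` / `changed` dictionaries are assumptions added for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class net(nn.Module):  # same two-layer network as in the article
    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable
model = net()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2)

# Snapshot every parameter before training.
before = {name: p.detach().clone() for name, p in model.named_parameters()}

for _ in range(10):
    x = torch.randn(3, 8)
    label = torch.randint(0, 10, (3,))
    loss = loss_fn(model(x), label)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# True means the parameter differs from its snapshot.
changed = {name: not torch.equal(before[name], p.detach())
           for name, p in model.named_parameters()}
print(changed)
```

With nothing frozen, every entry of `changed` should be True.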

      Case 2: freezing fc1 with Method 1

      Method 1

      Pass all parameters to the optimizer:

      optimizer = optim.SGD(model.parameters(), lr=1e-2) # all parameters are passed in

      Set requires_grad to False on the parameters of the layer to be frozen:

      for name, param in model.named_parameters():
        if "fc1" in name:
          param.requires_grad = False

      Code

      # Case 2: freezing fc1 with Method 1
      model = net()  # fresh model for this experiment
      loss_fn = nn.CrossEntropyLoss()
      optimizer = optim.SGD(model.parameters(), lr=1e-2) # all parameters are passed to the optimizer
      
      # Model parameters before training (printed before freezing, so fc1 still shows requires_grad=True)
      print("model.fc1.weight", model.fc1.weight)
      print("model.fc2.weight", model.fc2.weight)
      
      # Freeze the parameters of fc1
      for name, param in model.named_parameters():
        if "fc1" in name:
          param.requires_grad = False
      
      for epoch in range(10):
        x = torch.randn((3, 8))
        label = torch.randint(0,10,[3]).long()
        output = model(x)
      
        loss = loss_fn(output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
      
      # Model parameters after training
      print("model.fc1.weight", model.fc1.weight)
      print("model.fc2.weight", model.fc2.weight)

      Output

      (bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"

      model.fc1.weight Parameter containing:

      tensor([[ 0.3163, -0.1592, -0.2360,  0.1436,  0.1158,  0.0406, -0.0627,  0.0566],

              [-0.1688,  0.3519,  0.2464, -0.2693,  0.1284,  0.0544, -0.0188,  0.2404],

              [ 0.0738,  0.2013,  0.0868,  0.1396, -0.2885,  0.3431, -0.1109,  0.2549],

              [ 0.1222, -0.1877,  0.3511,  0.1951,  0.2147, -0.0427, -0.3374, -0.0653]],

             requires_grad=True)

      model.fc2.weight Parameter containing:

      tensor([[-0.1830, -0.3147, -0.1698,  0.3235],

              [-0.1347,  0.3096,  0.4895,  0.1221],

              [ 0.2735, -0.2238,  0.4713, -0.0683],

              [-0.3150, -0.1905,  0.3645,  0.3766],

              [-0.0340,  0.3212,  0.0650,  0.1380],

              [-0.2500,  0.1128, -0.3338, -0.4151],

              [ 0.0446, -0.4776, -0.3655,  0.0822],

              [-0.1871, -0.0602, -0.4855, -0.3604],

              [-0.3296,  0.0523, -0.3424,  0.2151],

              [-0.2478,  0.1424,  0.4547, -0.1969]], requires_grad=True)

      model.fc1.weight Parameter containing:

      tensor([[ 0.3163, -0.1592, -0.2360,  0.1436,  0.1158,  0.0406, -0.0627,  0.0566],

              [-0.1688,  0.3519,  0.2464, -0.2693,  0.1284,  0.0544, -0.0188,  0.2404],

              [ 0.0738,  0.2013,  0.0868,  0.1396, -0.2885,  0.3431, -0.1109,  0.2549],

              [ 0.1222, -0.1877,  0.3511,  0.1951,  0.2147, -0.0427, -0.3374, -0.0653]])

      model.fc2.weight Parameter containing:

      tensor([[-0.1821, -0.3155, -0.1637,  0.3213],

              [-0.1353,  0.3130,  0.4807,  0.1245],

              [ 0.2731, -0.2206,  0.4687, -0.0718],

              [-0.3138, -0.1925,  0.3561,  0.3809],

              [-0.0344,  0.3152,  0.0606,  0.1332],

              [-0.2501,  0.1154, -0.3267, -0.4137],

              [ 0.0400, -0.4723, -0.3586,  0.0808],

              [-0.1823, -0.0667, -0.4854, -0.3543],

              [-0.3285,  0.0547, -0.3388,  0.2166],

              [-0.2497,  0.1410,  0.4551, -0.2008]], requires_grad=True)

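      Conclusion

      The experiment shows that once requires_grad=False is set, only the parameters of layers with requires_grad=True are updated, even though all of the model's parameters were passed to the optimizer. No gradient is computed for the frozen parameters, so optimizer.step() has nothing to apply to them.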
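To see why Method 1 works, it helps to inspect the `.grad` attribute after a backward pass. The sketch below is an illustration under assumptions (fixed seed, a single training step, and a stricter `startswith("fc1.")` name check), not the author's original script.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class net(nn.Module):  # same two-layer network as in the article
    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable
model = net()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=1e-2)  # all parameters passed in

# Method 1: freeze fc1 by turning off gradient tracking for its parameters.
for name, param in model.named_parameters():
    if name.startswith("fc1."):
        param.requires_grad = False

fc1_before = model.fc1.weight.detach().clone()
fc2_before = model.fc2.weight.detach().clone()

# One training step on random data.
x = torch.randn(3, 8)
label = torch.randint(0, 10, (3,))
loss = loss_fn(model(x), label)
optimizer.zero_grad()
loss.backward()
optimizer.step()

print(model.fc1.weight.grad)              # None: no gradient was computed for fc1
print(model.fc2.weight.grad is not None)  # True: fc2 received a gradient
print(torch.equal(fc1_before, model.fc1.weight))  # True: fc1 is unchanged
```

Because the frozen parameters never receive a gradient, the optimizer skips them even though they are in its parameter list.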

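      Case 3: freezing fc1 with Method 2

      Method 2

      Pass only the parameters of the unfrozen fc2 layer to the optimizer:

      optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2) # only fc2's parameters are passed to the optimizer

      Note: with this method, requires_grad does not need to be set to False on the frozen layer's parameters.

      Code

      # Case 3: freezing fc1 with Method 2
      model = net()  # fresh model for this experiment
      loss_fn = nn.CrossEntropyLoss()
      optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2) # only fc2's parameters are passed to the optimizer

      # Model parameters before training
      print("model.fc1.weight", model.fc1.weight)
      print("model.fc2.weight", model.fc2.weight)
      
      for epoch in range(10):
        x = torch.randn((3, 8))
        label = torch.randint(0,3,[3]).long()
        output = model(x)
      
        loss = loss_fn(output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
      
      # Model parameters after training
      print("model.fc1.weight", model.fc1.weight)
      print("model.fc2.weight", model.fc2.weight)

      Output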

      model.fc1.weight Parameter containing:

      tensor([[ 0.2519, -0.1772, -0.2229,  0.0711, -0.1681,  0.1233, -0.3217, -0.0412],

              [ 0.2032, -0.2045,  0.2723,  0.3272,  0.1034,  0.1519, -0.0587, -0.3436],

              [ 0.0470,  0.2379,  0.0590,  0.2400,  0.2280,  0.2045, -0.0229, -0.3484],

              [-0.3023, -0.1195,  0.1792, -0.2173, -0.0492,  0.2640, -0.3511, -0.2845]],

             requires_grad=True)

      model.fc2.weight Parameter containing:

      tensor([[-0.3263, -0.2938, -0.3516, -0.4578],

              [-0.4549, -0.0060,  0.4696, -0.0174],

              [-0.4841,  0.2861,  0.2658,  0.4483],

              [-0.3093,  0.0977, -0.2735,  0.1033],

              [-0.2421,  0.4489, -0.4649,  0.0110],

              [-0.3671,  0.0182, -0.1027, -0.4441],

              [ 0.0205, -0.0659,  0.4183, -0.2068],

              [-0.1846,  0.1741, -0.2302, -0.1745],

              [-0.3423, -0.2642,  0.2796,  0.4976],

              [-0.0770, -0.3766, -0.0512, -0.2105]], requires_grad=True)

      model.fc1.weight Parameter containing:

      tensor([[ 0.2519, -0.1772, -0.2229,  0.0711, -0.1681,  0.1233, -0.3217, -0.0412],

              [ 0.2032, -0.2045,  0.2723,  0.3272,  0.1034,  0.1519, -0.0587, -0.3436],

              [ 0.0470,  0.2379,  0.0590,  0.2400,  0.2280,  0.2045, -0.0229, -0.3484],

              [-0.3023, -0.1195,  0.1792, -0.2173, -0.0492,  0.2640, -0.3511, -0.2845]],

             requires_grad=True)

      model.fc2.weight Parameter containing:

      tensor([[-0.3253, -0.2973, -0.3707, -0.4560],

              [-0.4566,  0.0015,  0.4655, -0.0166],

              [-0.4796,  0.2931,  0.2592,  0.4661],

              [-0.3097,  0.0966, -0.2695,  0.1002],

              [-0.2433,  0.4455, -0.4587,  0.0063],

              [-0.3669,  0.0171, -0.0988, -0.4452],

              [ 0.0198, -0.0679,  0.4203, -0.2088],

              [-0.1854,  0.1717, -0.2241, -0.1781],

              [-0.3429, -0.2653,  0.2822,  0.4938],

              [-0.0773, -0.3765, -0.0464, -0.2127]], requires_grad=True)

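      Conclusion

      When only the parameters of the layers to be updated are passed to the optimizer, only those parameters are updated. Gradients are still computed for the parameters that were not passed in (their requires_grad is still True), but those parameters are never updated.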
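The difference from Method 1 is visible in the `.grad` attribute: with Method 2 the frozen layer still receives a gradient, it is just never applied. The following is a minimal sketch under assumptions (fixed seed, a single training step), not the author's original script.

```python
import torch
import torch.nn as nn
import torch.optim as optim


class net(nn.Module):  # same two-layer network as in the article
    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


torch.manual_seed(0)  # assumption: fixed seed so the run is repeatable
model = net()
loss_fn = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.fc2.parameters(), lr=1e-2)  # Method 2: fc2 only

fc1_before = model.fc1.weight.detach().clone()
fc2_before = model.fc2.weight.detach().clone()

# One training step on random data.
x = torch.randn(3, 8)
label = torch.randint(0, 10, (3,))
loss = loss_fn(model(x), label)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# fc1 still has requires_grad=True, so autograd computed and stored its gradient.
print(model.fc1.weight.grad is not None)          # True
# The optimizer does not know about fc1, so the weights stay the same.
print(torch.equal(fc1_before, model.fc1.weight))  # True
```

Note that `optimizer.zero_grad()` only clears gradients of the parameters the optimizer holds, so in a longer loop the gradient stored on fc1 keeps accumulating unless `model.zero_grad()` is called instead.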

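      Comparison of Method 1 and Method 2

      During training we may need to keep part of the model's parameters fixed and update only the rest.

      There are two ways to achieve this: set requires_grad to False on the layers whose parameters should not be updated, or pass only the parameters to be updated when constructing the optimizer.

      The best approach is to pass the optimizer only the parameters with requires_grad=True; this uses slightly less memory and is more efficient.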

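      Best practice

      Set requires_grad to False on the parameters that should not be updated, and also leave those parameters out of the optimizer.

      Set requires_grad to False on the parameters that should not be updated:

      # Freeze the parameters of fc1
      for name, param in model.named_parameters():
        if "fc1" in name:
          param.requires_grad = False

      Do not pass the frozen parameters to the optimizer:

      # Use a filter so that only parameters with requires_grad=True are passed in
      optimizer = optim.SGD(filter(lambda p : p.requires_grad, model.parameters()), lr=1e-2)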
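The two steps can be wrapped in a small helper. The sketch below is one possible way to do it: `freeze_and_build_sgd` and its `frozen_prefixes` argument are hypothetical names introduced here, and it matches on the prefix `"fc1."` rather than the substring `"fc1"` so that a layer named, say, `fc10` would not be frozen by accident.

```python
import torch.nn as nn
import torch.optim as optim


class net(nn.Module):  # same two-layer network as in the article
    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


def freeze_and_build_sgd(model, frozen_prefixes, lr=1e-2):
    """Hypothetical helper: freeze layers by name prefix and return an SGD
    optimizer that holds only the remaining trainable parameters."""
    for name, param in model.named_parameters():
        if any(name.startswith(prefix + ".") for prefix in frozen_prefixes):
            param.requires_grad = False
    trainable = [p for p in model.parameters() if p.requires_grad]
    return optim.SGD(trainable, lr=lr)


model = net()
optimizer = freeze_and_build_sgd(model, ["fc1"])

# The optimizer's single parameter group holds fc2.weight and fc2.bias only.
optimizer_params = optimizer.param_groups[0]["params"]
print(len(optimizer_params))  # 2
```

The order matters: the freeze must happen before the optimizer is built, otherwise the filter sees every parameter as trainable. For freezing a whole submodule, `model.fc1.requires_grad_(False)` is a shorter alternative to the name-based loop.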

      Code

      # Best practice
      model = net()  # fresh model for this experiment
      loss_fn = nn.CrossEntropyLoss()
      
      # Model parameters before training
      print("model.fc1.weight", model.fc1.weight)
      print("model.fc2.weight", model.fc2.weight)
      print("model.fc1.weight.requires_grad:", model.fc1.weight.requires_grad)
      print("model.fc2.weight.requires_grad:", model.fc2.weight.requires_grad)
      
      # Freeze the parameters of fc1
      for name, param in model.named_parameters():
        if "fc1" in name:
          param.requires_grad = False
      
      # Use a filter so that only parameters with requires_grad=True are passed in
      optimizer = optim.SGD(filter(lambda p : p.requires_grad, model.parameters()), lr=1e-2)
      
      for epoch in range(10):
        x = torch.randn((3, 8))
        label = torch.randint(0,3,[3]).long()
        output = model(x)
      
        loss = loss_fn(output, label)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
      
      # Model parameters after training
      print("model.fc1.weight", model.fc1.weight)
      print("model.fc2.weight", model.fc2.weight)
      print("model.fc1.weight.requires_grad:", model.fc1.weight.requires_grad)
      print("model.fc2.weight.requires_grad:", model.fc2.weight.requires_grad)

      Output

      (bbn) jyzhang@admin2-X10DAi:~/test$ python -u "/home/jyzhang/test/net.py"

      model.fc1.weight Parameter containing:

      tensor([[-0.1193,  0.2354,  0.2520,  0.1187,  0.2699, -0.2301,  0.1622, -0.0478],

              [-0.2862, -0.1716,  0.2865,  0.2615, -0.2205, -0.2046, -0.0983, -0.1564],

              [-0.3143, -0.2248,  0.2198,  0.2338,  0.1184, -0.2033, -0.3418,  0.1434],

              [ 0.3107, -0.0411, -0.3016,  0.1924, -0.1756, -0.2881,  0.0528, -0.0444]],

             requires_grad=True)

      model.fc2.weight Parameter containing:

      tensor([[-0.2548,  0.2107, -0.1293, -0.2562],

              [-0.1989, -0.2624,  0.2226,  0.4861],

              [-0.1501,  0.2516,  0.4311, -0.1650],

              [ 0.0334, -0.0963, -0.1731,  0.1706],

              [ 0.2451, -0.2102,  0.0499,  0.0497],

              [-0.1464, -0.2973,  0.3692,  0.0523],

              [ 0.1192,  0.3575, -0.1911,  0.1457],

              [-0.0990,  0.2059,  0.2072, -0.2013],

              [-0.4397,  0.4036, -0.3402, -0.0417],

              [ 0.0379,  0.0128, -0.3212, -0.0867]], requires_grad=True)

      model.fc1.weight.requires_grad: True

      model.fc2.weight.requires_grad: True

      model.fc1.weight Parameter containing:

      tensor([[-0.1193,  0.2354,  0.2520,  0.1187,  0.2699, -0.2301,  0.1622, -0.0478],

              [-0.2862, -0.1716,  0.2865,  0.2615, -0.2205, -0.2046, -0.0983, -0.1564],

              [-0.3143, -0.2248,  0.2198,  0.2338,  0.1184, -0.2033, -0.3418,  0.1434],

              [ 0.3107, -0.0411, -0.3016,  0.1924, -0.1756, -0.2881,  0.0528, -0.0444]])

      model.fc2.weight Parameter containing:

      tensor([[-0.2637,  0.2073, -0.1293, -0.2422],

              [-0.2027, -0.2641,  0.2152,  0.4897],

              [-0.1543,  0.2504,  0.4188, -0.1576],

              [ 0.0356, -0.0947, -0.1698,  0.1669],

              [ 0.2474, -0.2081,  0.0536,  0.0456],

              [-0.1445, -0.2962,  0.3708,  0.0500],

              [ 0.1219,  0.3574, -0.1876,  0.1404],

              [-0.0961,  0.2058,  0.2091, -0.2046],

              [-0.4368,  0.4039, -0.3376, -0.0450],

              [ 0.0398,  0.0143, -0.3181, -0.0897]], requires_grad=True)

      model.fc1.weight.requires_grad: False

      model.fc2.weight.requires_grad: True

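      Conclusion

      The best practice can save GPU memory and speed up training:

      Memory: parameters that are not updated are not passed to the optimizer, so it does not track them.

      Speed: setting requires_grad to False on the parameters that are not updated skips the computation of their gradients.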
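A quick way to confirm that a model is set up this way is to count parameters. The sketch below uses `nn.Module.requires_grad_` to freeze fc1 instead of the name-based loop; the variable names `total`, `trainable` and `in_optimizer` are introduced here only for illustration.

```python
import torch.nn as nn
import torch.optim as optim


class net(nn.Module):  # same two-layer network as in the article
    def __init__(self, num_class=10):
        super().__init__()
        self.fc1 = nn.Linear(8, 4)
        self.fc2 = nn.Linear(4, num_class)

    def forward(self, x):
        return self.fc2(self.fc1(x))


model = net()
model.fc1.requires_grad_(False)  # freeze every parameter of fc1
optimizer = optim.SGD(
    filter(lambda p: p.requires_grad, model.parameters()), lr=1e-2
)

# Count scalar parameters in the model and in the optimizer.
total = sum(p.numel() for p in model.parameters())
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
in_optimizer = sum(
    p.numel() for group in optimizer.param_groups for p in group["params"]
)
print(total, trainable, in_optimizer)  # 86 50 50
```

Here fc1 contributes 8×4 + 4 = 36 frozen values and fc2 contributes 4×10 + 10 = 50 trainable ones; the trainable count and the optimizer count should always agree.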

      Summary

      The above is based on personal experience; I hope it serves as a useful reference.
