1. Train Model

  • The model makes predictions by computing with random parameters,
    i.e. it is basically guessing (randomly);

  • To fix this, we can update the model's internal parameters
    (also referred to as patterns): the weight and bias values we set
    randomly via nn.Parameter() and torch.randn(), so that they
    represent the data better;

  • We could hard-code these (since we know the defaults, weight=0.7 and bias=0.3),
    but most of the time we don't know a model's ideal parameters;
    instead, it's much more fun to write code that finds the best values;

  • For the model to update its parameters on its own, we add two more
    ingredients to the recipe: a loss function and an optimizer,
    both created in PyTorch;

1.1. Loss Function and Optimizer

Function | What It Does | Where It Lives in PyTorch | Common Values
-------- | ------------ | ------------------------- | -------------
Loss function | Measures how wrong the model's predictions (e.g. y_pred) are compared to the truth labels (e.g. y_test); the lower the better | PyTorch has plenty of built-in loss functions in torch.nn | MAE for regression problems: torch.nn.L1Loss(); binary cross entropy for binary classification problems: torch.nn.BCELoss()
Optimizer | Tells the model how to update its internal parameters to best lower the loss | Various optimization function implementations live in torch.optim | SGD (stochastic gradient descent): torch.optim.SGD(); the Adam optimizer: torch.optim.Adam()

  • SGD: stochastic gradient descent;

  • MAE: mean absolute error;

  • We create a loss function and an optimizer to help improve the model;
    the type of problem you are working on determines which loss function
    and which optimizer to use;

  • There are some common choices, though, that are known to work well:
    SGD (stochastic gradient descent) or the Adam optimizer;
    the MAE (mean absolute error) loss function for regression problems
    (predicting a number); or the binary cross entropy loss function
    for classification problems;

  • For our problem, since we are predicting a number, we choose MAE
    (torch.nn.L1Loss() in PyTorch) as the loss function;

  • SGD: torch.optim.SGD(params, lr):

  • params: the target model parameters to optimize, e.g. the weight
    and bias values we set randomly earlier;

  • lr: the learning rate at which the optimizer should update the parameters;
    the higher it is, the larger the updates the optimizer will attempt
    (too large, and the optimizer may fail to work), and the lower it is,
    the smaller the updates (too small, and the optimizer will take
    too long to find the ideal values);

  • The learning rate is considered a hyperparameter because it is set
    by the ML engineer; common starting values are 0.01, 0.001, and 0.0001,
    but these can also be adjusted over time, which is called
    learning rate scheduling (see the sketch below);
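As a minimal illustration of learning rate scheduling (a sketch only, not part of the original recipe; the step_size and gamma values below are arbitrary choices), PyTorch's torch.optim.lr_scheduler.StepLR shrinks the learning rate by a fixed factor every few epochs:

import torch
from torch import nn

# a throwaway parameter and optimizer, just to demonstrate the scheduler API
param = nn.Parameter(torch.randn(1))
optimizer = torch.optim.SGD(params=[param], lr=0.01)

# StepLR multiplies lr by gamma once every step_size epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(60):
    # ... the usual training steps would go here ...
    optimizer.step()
    scheduler.step()  # advance the schedule once per epoch
    if epoch % 20 == 0:
        print(f"epoch {epoch}: lr = {scheduler.get_last_lr()[0]}")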

# create the loss function; MAE loss is the same as nn.L1Loss
loss_fn = nn.L1Loss()

# create the optimizer
# params: parameters of the target model to optimize
# lr (learning rate): how much the optimizer should change the parameters at each step;
# higher = bigger updates (less stable), lower = smaller updates (might take a long time)
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)
[Figure: MaeLoss]
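The MaeLoss figure illustrates the mean absolute error calculation. As a quick sanity check (a minimal sketch with made-up tensors, not from the original notebook), nn.L1Loss() matches a hand-computed MAE, and nn.BCELoss() can be probed the same way:

import torch
from torch import nn

y_pred = torch.tensor([0.0, 0.5, 1.0])
y_true = torch.tensor([0.1, 0.4, 0.8])

# MAE: the mean of the absolute differences
manual_mae = torch.mean(torch.abs(y_pred - y_true))
torch_mae = nn.L1Loss()(y_pred, y_true)
print(manual_mae, torch_mae)  # both print tensor(0.1333)

# binary cross entropy expects probabilities in [0, 1] and float labels
probs = torch.tensor([0.9, 0.2, 0.8])
labels = torch.tensor([1.0, 0.0, 1.0])
print(nn.BCELoss()(probs, labels))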

1.2. Optimization Loop

  • The training loop involves the model going through the training data
    and learning the relationships between the features and the labels;

  • The testing loop involves going through the test data and evaluating
    how good the patterns the model learned on the training data are
    (the model never sees the test data during training);

  • "Loop": we want the model to look at (loop through) every sample
    in each dataset;

1.3. Training Loop

Number | Step Name | Code | What It Does
------ | --------- | ---- | ------------
1 | Forward pass | model(x_train) | The model goes through all of the training data once, performing its forward() calculations
2 | Calculate the loss | loss = loss_fn(y_pred, y_train) | The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are
3 | Zero the gradients | optimizer.zero_grad() | The optimizer's gradients are set to zero (they accumulate by default) so they can be recalculated for this particular training step
4 | Perform backpropagation on the loss | loss.backward() | Computes the gradient of the loss with respect to every model parameter to be updated (every parameter with requires_grad=True); this is known as backpropagation, hence "backward"
5 | Step the optimizer (gradient descent) | optimizer.step() | Updates the parameters that have requires_grad=True with respect to the loss gradients, in order to improve them

[Figure: PytorchTrainingLoop]
  • Note: the ordering and descriptions of the steps in the figure above
    are just one example; as rules of thumb:

  • Calculate the loss (loss = ...) before performing backpropagation
    on it (loss.backward());

  • Zero the gradients (optimizer.zero_grad()) before stepping them
    (optimizer.step()), since gradients accumulate by default
    (see the sketch after this list);

  • Step the optimizer (optimizer.step()) after performing
    backpropagation on the loss (loss.backward());
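Why zeroing matters: PyTorch accumulates gradients across backward() calls by default, which is exactly what optimizer.zero_grad() resets. A minimal sketch with a made-up scalar parameter (not from the original notebook):

import torch

w = torch.tensor(2.0, requires_grad=True)

# first backward pass: d(3*w)/dw = 3
(3 * w).backward()
print(w.grad)  # tensor(3.)

# without zeroing, a second backward pass adds to the existing gradient
(3 * w).backward()
print(w.grad)  # tensor(6.)

# zeroing resets the accumulation, as optimizer.zero_grad() does for all parameters
w.grad.zero_()
(3 * w).backward()
print(w.grad)  # tensor(3.)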

1.4. Testing Loop

Number | Step Name | Code | What It Does
------ | --------- | ---- | ------------
1 | Forward pass | model(x_test) | The model goes through all of the test data once, performing its forward() calculations
2 | Calculate the loss | loss = loss_fn(y_pred, y_test) | The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are
3 | Calculate evaluation metrics (optional) | custom functions | Alongside the loss value, you may also want to calculate other evaluation metrics, such as accuracy on the test set

[Figure: PytorchTestingLoop]
  • Note: the testing loop does not include performing backpropagation
    (loss.backward()) or stepping the optimizer (optimizer.step());

  • This is because no parameters in the model are changed during testing
    (they have already been calculated); for testing, we are only interested
    in the output of the forward pass through the model (see the sketch below);
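Accordingly, the test forward pass runs under torch.inference_mode(), which turns off gradient tracking since nothing will be updated. A minimal sketch with a throwaway linear layer (not from the original notebook):

import torch
from torch import nn

layer = nn.Linear(in_features=1, out_features=1)
x = torch.randn(3, 1)

out = layer(x)
print(out.requires_grad)  # True: gradients are tracked for training

with torch.inference_mode():
    out = layer(x)
print(out.requires_grad)  # False: no gradient tracking during inference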

  • Let's put all of the above together and train the model for 100 epochs
    (forward passes through the data), evaluating it every 10 epochs;

torch.manual_seed(42)

# set the number of epochs (how many times the model will pass over the training data)
epochs = 100

# create empty loss lists to track values
train_loss_value = []
test_loss_value = []
epoch_count = []

for epoch in range(epochs):
    ### Training
    # put the model in training mode (the default state of a model)
    model_0.train()

    # 1: forward pass on the train data using the forward() method inside
    y_pred = model_0(X_train)

    # 2: calculate the loss (how different the model's predictions are from the ground truth)
    loss = loss_fn(y_pred, y_train)

    # 3: zero the optimizer's gradients
    optimizer.zero_grad()

    # 4: perform backpropagation on the loss
    loss.backward()

    # 5: step the optimizer (perform gradient descent)
    optimizer.step()

    ### Testing
    # put the model in evaluation mode
    model_0.eval()

    with torch.inference_mode():
        # 1: forward pass on the test data
        test_pred = model_0(X_test)

        # 2: calculate the loss on the test data; predictions come out in torch.float,
        # so the comparison needs to be done with tensors of the same type
        test_loss = loss_fn(test_pred, y_test.type(torch.float))

        # print out what's happening
        if epoch % 10 == 0:
            epoch_count.append(epoch)
            train_loss_value.append(loss.detach().numpy())
            test_loss_value.append(test_loss.detach().numpy())
            print(f"Epoch: {epoch} | MAE Train Loss: {loss} | MAE Test Loss: {test_loss} ")

# plot the loss curves
plt.plot(epoch_count, train_loss_value, label="Train Loss")
plt.plot(epoch_count, test_loss_value, label="Test Loss")
plt.title("Training and Testing Loss Curve")
plt.ylabel("Loss")
plt.xlabel("Epoch")
plt.legend();

plt.savefig("TrainingTestingLossCurve.svg")
Epoch: 0 | MAE Train Loss: 0.31288138031959534 | MAE Test Loss: 0.48106518387794495
Epoch: 10 | MAE Train Loss: 0.1976713240146637 | MAE Test Loss: 0.3463551998138428
Epoch: 20 | MAE Train Loss: 0.08908725529909134 | MAE Test Loss: 0.21729660034179688
Epoch: 30 | MAE Train Loss: 0.053148526698350906 | MAE Test Loss: 0.14464017748832703
Epoch: 40 | MAE Train Loss: 0.04543796554207802 | MAE Test Loss: 0.11360953003168106
Epoch: 50 | MAE Train Loss: 0.04167863354086876 | MAE Test Loss: 0.09919948130846024
Epoch: 60 | MAE Train Loss: 0.03818932920694351 | MAE Test Loss: 0.08886633068323135
Epoch: 70 | MAE Train Loss: 0.03476089984178543 | MAE Test Loss: 0.0805937647819519
Epoch: 80 | MAE Train Loss: 0.03132382780313492 | MAE Test Loss: 0.07232122868299484
Epoch: 90 | MAE Train Loss: 0.02788739837706089 | MAE Test Loss: 0.06473556160926819
[Figure: TrainingTestingLossCurve]
  • Note: avoid re-running the training cell on its own; to train again,
    re-run everything from scratch (re-initializing the model and optimizer),
    otherwise the model continues from its already-trained state and the
    loss curves come out flat;

  • The loss curves show the loss decreasing over time; the loss measures
    how wrong the model is, so lower is better;

  • The loss decreases because, thanks to the loss function and the optimizer,
    the model's internal parameters (weights and bias) are updated to better
    reflect the underlying patterns in the data;

  • Let's inspect the model's state_dict() to see how close the model gets
    to the original values we set for the weight and bias;

# find the model's learned parameters
print("The model learned the following values for weight and bias:")
print(model_0.state_dict())
print("\nThe original values for weight and bias:")
print(f"weight: {weight}, bias: {bias}")
The model learned the following values for weight and bias:
OrderedDict({'weight': tensor([0.5784]), 'bias': tensor([0.3513])})

The original values for weight and bias:
weight: 0.7, bias: 0.3
  • The model gets very close to the exact original values for the weight
    and bias; it would likely get even closer if trained for longer;

  • Try changing epochs above to 200: what happens to the model's
    loss curves and to the weight and bias values?
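To try this without the flat-loss-curve issue noted above, re-create the model and optimizer before training again. A minimal sketch, assuming the nn.Module subclass defined earlier in these notes (LinearRegressionModel here is a placeholder name for it):

torch.manual_seed(42)

# re-initialize the model so training starts from fresh random parameters
# (LinearRegressionModel is a placeholder for the model class defined earlier)
model_0 = LinearRegressionModel()

# re-create the optimizer too, so it points at the new model's parameters
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)

epochs = 200  # then re-run the training and testing loop above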