1. Train Model

  • The model makes predictions by computing with random parameters,
    i.e. it is basically guessing (randomly);

  • To fix this, we can update the model's internal parameters
    (also referred to as patterns): the weight and bias values we set
    randomly via nn.Parameter() and torch.randn(), so that they
    represent the data better;

  • We could hard-code these (since we know the defaults, weight=0.7 and bias=0.3),
    but most of the time we don't know a model's ideal parameters;
    instead, it's much more fun to write code that finds the best values;

  • For the model to update its parameters on its own, we add two more
    ingredients to the recipe: a loss function and an optimizer,
    both created in PyTorch;

1.1. Loss Function and Optimizer

Function | What It Does | Where It Lives in PyTorch | Common Values
-------- | ------------ | ------------------------- | -------------
Loss function | Measures how wrong the model's predictions (e.g. y_pred) are compared to the truth labels (e.g. y_test); the lower the better | PyTorch has plenty of built-in loss functions in torch.nn | MAE for regression problems: torch.nn.L1Loss(); binary cross entropy for binary classification problems: torch.nn.BCELoss()
Optimizer | Tells the model how to update its internal parameters to best lower the loss | Various optimization function implementations live in torch.optim | SGD (stochastic gradient descent): torch.optim.SGD(); the Adam optimizer: torch.optim.Adam()

  • SGD: stochastic gradient descent;

  • MAE: mean absolute error;

  • We create a loss function and an optimizer to help improve the model;
    the type of problem you are working on determines which loss function
    and which optimizer to use;

  • There are some common choices, though, that are known to work well:
    SGD (stochastic gradient descent) or the Adam optimizer;
    the MAE (mean absolute error) loss function for regression problems
    (predicting a number); or the binary cross entropy loss function
    for classification problems;

  • For our problem, since we are predicting a number, we choose MAE
    (torch.nn.L1Loss() in PyTorch) as the loss function;

  • SGD: torch.optim.SGD(params, lr):

  • params: the target model parameters to optimize, e.g. the weight
    and bias values we set randomly earlier;

  • lr: the learning rate at which the optimizer should update the parameters;
    the higher it is, the larger the updates the optimizer will attempt
    (too large, and the optimizer may fail to work), and the lower it is,
    the smaller the updates (too small, and the optimizer will take
    too long to find the ideal values);

  • The learning rate is considered a hyperparameter because it is set
    by the ML engineer; common starting values are 0.01, 0.001, and 0.0001,
    but these can also be adjusted over time, which is called
    learning rate scheduling (see the sketch below);
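As a minimal illustration of learning rate scheduling (a sketch only, not part of the original recipe; the step_size and gamma values below are arbitrary choices), PyTorch's torch.optim.lr_scheduler.StepLR shrinks the learning rate by a fixed factor every few epochs:

import torch
from torch import nn

# a throwaway parameter and optimizer, just to demonstrate the scheduler API
param = nn.Parameter(torch.randn(1))
optimizer = torch.optim.SGD(params=[param], lr=0.01)

# StepLR multiplies lr by gamma once every step_size epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(60):
    # ... the usual training steps would go here ...
    optimizer.step()
    scheduler.step()  # advance the schedule once per epoch
    if epoch % 20 == 0:
        print(f"epoch {epoch}: lr = {scheduler.get_last_lr()[0]}")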

# create the loss function; MAE loss is the same as nn.L1Loss
loss_fn = nn.L1Loss()

# create the optimizer
# params: parameters of the target model to optimize
# lr (learning rate): how much the optimizer should change the parameters at each step;
# higher = bigger updates (less stable), lower = smaller updates (might take a long time)
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)
[Figure: MaeLoss]
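The MaeLoss figure illustrates the mean absolute error calculation. As a quick sanity check (a minimal sketch with made-up tensors, not from the original notebook), nn.L1Loss() matches a hand-computed MAE, and nn.BCELoss() can be probed the same way:

import torch
from torch import nn

y_pred = torch.tensor([0.0, 0.5, 1.0])
y_true = torch.tensor([0.1, 0.4, 0.8])

# MAE: the mean of the absolute differences
manual_mae = torch.mean(torch.abs(y_pred - y_true))
torch_mae = nn.L1Loss()(y_pred, y_true)
print(manual_mae, torch_mae)  # both print tensor(0.1333)

# binary cross entropy expects probabilities in [0, 1] and float labels
probs = torch.tensor([0.9, 0.2, 0.8])
labels = torch.tensor([1.0, 0.0, 1.0])
print(nn.BCELoss()(probs, labels))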

1.2. Optimization Loop

  • The training loop involves the model going through the training data
    and learning the relationships between the features and the labels;

  • The testing loop involves going through the test data and evaluating
    how good the patterns the model learned on the training data are
    (the model never sees the test data during training);

  • "Loop": we want the model to look at (loop through) every sample
    in each dataset;

1.3. Training Loop

Number | Step Name | Code | What It Does
------ | --------- | ---- | ------------
1 | Forward pass | model(x_train) | The model goes through all of the training data once, performing its forward() calculations
2 | Calculate the loss | loss = loss_fn(y_pred, y_train) | The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are
3 | Zero the gradients | optimizer.zero_grad() | The optimizer's gradients are set to zero (they accumulate by default) so they can be recalculated for this particular training step
4 | Perform backpropagation on the loss | loss.backward() | Computes the gradient of the loss with respect to every model parameter to be updated (every parameter with requires_grad=True); this is known as backpropagation, hence "backward"
5 | Step the optimizer (gradient descent) | optimizer.step() | Updates the parameters that have requires_grad=True with respect to the loss gradients, in order to improve them

[Figure: PytorchTrainingLoop]
  • Note: the ordering and descriptions of the steps in the figure above
    are just one example; as rules of thumb:

  • Calculate the loss (loss = ...) before performing backpropagation
    on it (loss.backward());

  • Zero the gradients (optimizer.zero_grad()) before stepping them
    (optimizer.step()), since gradients accumulate by default
    (see the sketch after this list);

  • Step the optimizer (optimizer.step()) after performing
    backpropagation on the loss (loss.backward());
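Why zeroing matters: PyTorch accumulates gradients across backward() calls by default, which is exactly what optimizer.zero_grad() resets. A minimal sketch with a made-up scalar parameter (not from the original notebook):

import torch

w = torch.tensor(2.0, requires_grad=True)

# first backward pass: d(3*w)/dw = 3
(3 * w).backward()
print(w.grad)  # tensor(3.)

# without zeroing, a second backward pass adds to the existing gradient
(3 * w).backward()
print(w.grad)  # tensor(6.)

# zeroing resets the accumulation, as optimizer.zero_grad() does for all parameters
w.grad.zero_()
(3 * w).backward()
print(w.grad)  # tensor(3.)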

1.4. Testing Loop

Number | Step Name | Code | What It Does
------ | --------- | ---- | ------------
1 | Forward pass | model(x_test) | The model goes through all of the test data once, performing its forward() calculations
2 | Calculate the loss | loss = loss_fn(y_pred, y_test) | The model's outputs (predictions) are compared to the ground truth and evaluated to see how wrong they are
3 | Calculate evaluation metrics (optional) | custom functions | Alongside the loss value, you may also want to calculate other evaluation metrics, such as accuracy on the test set

[Figure: PytorchTestingLoop]
  • Note: the testing loop does not include performing backpropagation
    (loss.backward()) or stepping the optimizer (optimizer.step());

  • This is because no parameters in the model are changed during testing
    (they have already been calculated); for testing, we are only interested
    in the output of the forward pass through the model (see the sketch below);
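Accordingly, the test forward pass runs under torch.inference_mode(), which turns off gradient tracking since nothing will be updated. A minimal sketch with a throwaway linear layer (not from the original notebook):

import torch
from torch import nn

layer = nn.Linear(in_features=1, out_features=1)
x = torch.randn(3, 1)

out = layer(x)
print(out.requires_grad)  # True: gradients are tracked for training

with torch.inference_mode():
    out = layer(x)
print(out.requires_grad)  # False: no gradient tracking during inference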

  • Let's put all of the above together and train the model for 100 epochs
    (forward passes through the data), evaluating it every 10 epochs;

torch.manual_seed(42)

# set the number of epochs (how many times the model will pass over the training data)
epochs = 100

# create empty loss lists to track values
train_loss_value = []
test_loss_value = []
epoch_count = []

for epoch in range(epochs):
    ### Training
    # put the model in training mode (the default state of a model)
    model_0.train()

    # 1: forward pass on the train data using the forward() method inside
    y_pred = model_0(X_train)

    # 2: calculate the loss (how different the model's predictions are from the ground truth)
    loss = loss_fn(y_pred, y_train)

    # 3: zero the optimizer's gradients
    optimizer.zero_grad()

    # 4: perform backpropagation on the loss
    loss.backward()

    # 5: step the optimizer (perform gradient descent)
    optimizer.step()

    ### Testing
    # put the model in evaluation mode
    model_0.eval()

    with torch.inference_mode():
        # 1: forward pass on the test data
        test_pred = model_0(X_test)

        # 2: calculate the loss on the test data; predictions come out in torch.float,
        # so the comparison needs to be done with tensors of the same type
        test_loss = loss_fn(test_pred, y_test.type(torch.float))

        # print out what's happening
        if epoch % 10 == 0:
            epoch_count.append(epoch)
            train_loss_value.append(loss.detach().numpy())
            test_loss_value.append(test_loss.detach().numpy())
            print(f"Epoch: {epoch} | MAE Train Loss: {loss} | MAE Test Loss: {test_loss} ")

# plot the loss curves
plt.plot(epoch_count, train_loss_value, label="Train Loss")
plt.plot(epoch_count, test_loss_value, label="Test Loss")
plt.title("Training and Testing Loss Curve")
plt.ylabel("Loss")
plt.xlabel("Epoch")
plt.legend();

plt.savefig("TrainingTestingLossCurve.svg")
Epoch: 0 | MAE Train Loss: 0.31288138031959534 | MAE Test Loss: 0.48106518387794495
Epoch: 10 | MAE Train Loss: 0.1976713240146637 | MAE Test Loss: 0.3463551998138428
Epoch: 20 | MAE Train Loss: 0.08908725529909134 | MAE Test Loss: 0.21729660034179688
Epoch: 30 | MAE Train Loss: 0.053148526698350906 | MAE Test Loss: 0.14464017748832703
Epoch: 40 | MAE Train Loss: 0.04543796554207802 | MAE Test Loss: 0.11360953003168106
Epoch: 50 | MAE Train Loss: 0.04167863354086876 | MAE Test Loss: 0.09919948130846024
Epoch: 60 | MAE Train Loss: 0.03818932920694351 | MAE Test Loss: 0.08886633068323135
Epoch: 70 | MAE Train Loss: 0.03476089984178543 | MAE Test Loss: 0.0805937647819519
Epoch: 80 | MAE Train Loss: 0.03132382780313492 | MAE Test Loss: 0.07232122868299484
Epoch: 90 | MAE Train Loss: 0.02788739837706089 | MAE Test Loss: 0.06473556160926819
[Figure: TrainingTestingLossCurve]
  • Note: avoid re-running the training cell on its own; to train again,
    re-run everything from scratch (re-initializing the model and optimizer),
    otherwise the model continues from its already-trained state and the
    loss curves come out flat;

  • The loss curves show the loss decreasing over time; the loss measures
    how wrong the model is, so lower is better;

  • The loss decreases because, thanks to the loss function and the optimizer,
    the model's internal parameters (weights and bias) are updated to better
    reflect the underlying patterns in the data;

  • Let's inspect the model's state_dict() to see how close the model gets
    to the original values we set for the weight and bias;

# find the model's learned parameters
print("The model learned the following values for weight and bias:")
print(model_0.state_dict())
print("\nThe original values for weight and bias:")
print(f"weight: {weight}, bias: {bias}")
The model learned the following values for weight and bias:
OrderedDict({'weight': tensor([0.5784]), 'bias': tensor([0.3513])})

The original values for weight and bias:
weight: 0.7, bias: 0.3
  • The model gets very close to the exact original values for the weight
    and bias; it would likely get even closer if trained for longer;

  • Try changing epochs above to 200: what happens to the model's
    loss curves and to the weight and bias values?
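To try this without the flat-loss-curve issue noted above, re-create the model and optimizer before training again. A minimal sketch, assuming the nn.Module subclass defined earlier in these notes (LinearRegressionModel here is a placeholder name for it):

torch.manual_seed(42)

# re-initialize the model so training starts from fresh random parameters
# (LinearRegressionModel is a placeholder for the model class defined earlier)
model_0 = LinearRegressionModel()

# re-create the optimizer too, so it points at the new model's parameters
optimizer = torch.optim.SGD(params=model_0.parameters(), lr=0.01)

epochs = 200  # then re-run the training and testing loop above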