1. Build Model Part

  • With the data prepared, the next step is to build a model. This breaks down into the following parts:

  • A: Set up device-agnostic code;

  • B: Subclass nn.Module to construct the model;

  • C: Define a loss function and an optimizer;

  • D: Create a training loop;

import torch
from torch import nn

# make device agnostic code
device = "cuda" if torch.cuda.is_available() else "cpu"
device
'cuda'  (or 'cpu' if CUDA is not available)
  • With the device set, any data or model we create will be processed by PyTorch on the CPU (the default) or on the GPU if one is available;

  • Create the model: we want a model that can take the X data as input and produce the y data as output;

  • In other words: given X (the features), we want the model to predict y (the labels);

  • This kind of setup, where we have both features and labels, is called supervised learning,
    because the data tells us what output should be produced for a certain input;

  • To create such a model, it needs to handle the input and output shapes of X and y
    (a quick shape check follows below);
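
A minimal sketch of checking these shapes, assuming X and y are the tensors prepared in the data section of these notes:

# Minimal sketch: inspect the input and output shapes the model must handle
# (assumes X and y were created earlier, e.g. from the circle dataset)
print(f"X shape: {X.shape}")  # each sample has 2 features
print(f"y shape: {y.shape}")  # each sample has 1 label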

2. Build Model Steps

  • A: Subclass nn.Module (almost all PyTorch models are subclasses of nn.Module);

  • B: Create 2 nn.Linear layers in the constructor, capable of handling
    the input and output shapes of X and y;

  • C: Define a forward() method containing the model's forward pass computation;

  • D: Instantiate the model class and send it to the target device;

# 1. Construct a model class that subclasses nn.Module
class CircleModelV0(nn.Module):
    def __init__(self):
        super().__init__()
        # 2. Create 2 nn.Linear layers capable of handling the input and output shapes of X and y

        # Takes in 2 features (X), produces 5 features
        self.layer_1 = nn.Linear(in_features=2,out_features=5)

        # Takes in 5 features, produces 1 feature (y)
        self.layer_2 = nn.Linear(in_features=5,out_features=1)

    # 3. Define a forward() method containing the forward pass computation
    def forward(self,x):
        # Return the output of layer_2: a single feature, the same shape as y

        # Computation goes through layer_1 first, then the output of layer_1 goes through layer_2
        return self.layer_2(self.layer_1(x))

# 4. Create an instance of the model and send it to the target device
model_0 = CircleModelV0().to(device)
model_0
CircleModelV0(
  (layer_1): Linear(in_features=2, out_features=5, bias=True)
  (layer_2): Linear(in_features=5, out_features=1, bias=True)
)
  • Key terms: neuron, hyperparameter, hidden unit, linear computation;

  • What happens between self.layer_1 and self.layer_2: self.layer_1 takes 2 input features
    (in_features=2) and produces 5 output features (out_features=5); this is known as having 5 hidden units or neurons;

  • This layer turns the input data from 2 features into 5 features, which lets the model learn patterns
    from 5 numbers rather than only 2, potentially leading to better outputs ("potentially" because sometimes it doesn't work);

  • The number of hidden units in a neural network layer is a hyperparameter (a value you set yourself),
    and there is no fixed value you must use; generally more is better, but there is also such a thing as too many,
    and the right number depends on the model type and dataset you are working with;

  • Since our dataset is small and simple, we keep the number small. The one rule with hidden units:
    the next layer, self.layer_2, must have the same in_features as the previous layer's out_features;

  • That is why self.layer_2 has in_features=5: it takes the out_features=5 from self.layer_1,
    performs a linear computation on them, and turns them into out_features=1 (the same shape as y);
    the sketch below traces these shapes;
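
A minimal sketch tracing the 2 -> 5 -> 1 shape change with a dummy batch (the dummy tensor and standalone layers here are for illustration only):

# Minimal sketch: trace shapes through two freshly created layers with a dummy batch
dummy_x = torch.rand(8, 2).to(device)                          # 8 samples, 2 features each (like X)
layer_1 = nn.Linear(in_features=2, out_features=5).to(device)
layer_2 = nn.Linear(in_features=5, out_features=1).to(device)
hidden = layer_1(dummy_x)                                      # shape [8, 5]: 5 hidden units
output = layer_2(hidden)                                       # shape [8, 1]: same shape as y
print(hidden.shape, output.shape)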

3. TF Playground

TensorFlow Playground (linear activation)
# Replicate CircleModelV0 with nn.Sequential
model_0 = nn.Sequential(
    nn.Linear(in_features=2, out_features=5),
    nn.Linear(in_features=5, out_features=1)
).to(device)

model_0
Sequential(
  (0): Linear(in_features=2, out_features=5, bias=True)
  (1): Linear(in_features=5, out_features=1, bias=True)
)
  • Comparing the two:

    • nn.Sequential is fantastic for straightforward computation, but it always runs
      in sequential order;

    • So if the task is more than straightforward sequential computation, you will need
      to define your own custom nn.Module subclass (see the sketch after this list);
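
A minimal sketch of a forward pass that a plain nn.Sequential chain cannot express (a hypothetical model with a skip connection, not used for the circle problem):

# Minimal sketch: a custom nn.Module whose forward pass is not a simple sequential chain
class SkipModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer_1 = nn.Linear(in_features=2, out_features=5)
        self.layer_2 = nn.Linear(in_features=5, out_features=2)

    def forward(self, x):
        # The input is added back to the layers' output (a skip connection),
        # which cannot be written as nn.Sequential(layer_1, layer_2) alone
        return x + self.layer_2(self.layer_1(x))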

# make prediction with model
untrained_pred = model_0(X_test.to(device))
print(f"Length of prediction: {len(untrained_pred)}, Shape: {untrained_pred.shape}")
print(f"Length of test sample: {len(y_test)}, Shape: {y_test.shape}")
print(f"\nFirst 10 prediction:\n{untrained_pred[:10]}")
print(f"\nFirst 10 test label:\n{y_test[:10]}")
Length of predictions: 200, Shape: torch.Size([200, 1])
Length of test samples: 200, Shape: torch.Size([200])

First 10 predictions:
tensor([[0.2625],
        [0.2737],
        [0.3577],
        [0.2350],
        [0.5554],
        [0.5607],
        [0.4355],
        [0.5032],
        [0.3492],
        [0.2766]], grad_fn=<SliceBackward0>)

First 10 test labels:
tensor([1., 0., 1., 0., 1., 1., 0., 0., 1., 0.])

4. Setup Loss Function and Optimizer

  • The loss function is also called a criterion or cost function; different problem types call for different loss functions;

  • For example, a regression problem (predicting a number) could use MAE loss;

  • For a binary classification problem (like ours), binary cross entropy is
    commonly used as the loss function;

  • The same optimizer functions, however, can often be used across different problem spaces,
    e.g. the stochastic gradient descent optimizer torch.optim.SGD()
    and the Adam optimizer torch.optim.Adam();

  • MAE: mean absolute error;

  • MSE: mean squared error;

  • SGD: stochastic gradient descent;

Loss Function/Optimizer     | Problem Type                        | PyTorch Code

SGD Optimizer               | Classification, Regression, others  | torch.optim.SGD()
Adam Optimizer              | Classification, Regression, others  | torch.optim.Adam()
Binary Cross Entropy Loss   | Binary Classification               | torch.nn.BCELoss or torch.nn.BCEWithLogitsLoss
Cross Entropy Loss          | Multi-Class Classification          | torch.nn.CrossEntropyLoss
MAE or L1 Loss              | Regression                          | torch.nn.L1Loss
MSE or L2 Loss              | Regression                          | torch.nn.MSELoss

  • Recall: a loss function measures how wrong the model's predictions are; the higher the loss, the worse the model;

  • The PyTorch documentation often refers to a loss function as a "loss criterion" or simply "criterion";

  • PyTorch has two implementations of binary cross entropy:

    • torch.nn.BCELoss(): creates a loss function that measures the binary cross entropy
      between the target (label) and the input (features);

    • torch.nn.BCEWithLogitsLoss(): the same as torch.nn.BCELoss(),
      but with a sigmoid layer (nn.Sigmoid) built in;

  • Which to use: torch.nn.BCEWithLogitsLoss() is described as more numerically stable
    than using torch.nn.BCELoss() after an nn.Sigmoid layer;

  • So BCEWithLogitsLoss() is generally the better choice, except for advanced usage
    where you need to separate the combination of nn.Sigmoid and torch.nn.BCELoss()
    (see the sketch below);
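
A minimal sketch of the difference (the logits and labels here are made-up numbers for illustration): BCEWithLogitsLoss is applied to raw logits, while BCELoss expects the sigmoid to have been applied already:

# Minimal sketch: the two binary cross entropy variants, with made-up logits and labels
logits = torch.tensor([0.7, -1.2, 2.3])
labels = torch.tensor([1.0, 0.0, 1.0])

loss_with_logits = nn.BCEWithLogitsLoss()(logits, labels)     # takes raw logits directly
loss_plain = nn.BCELoss()(torch.sigmoid(logits), labels)      # needs sigmoid applied first
# Both compute the same binary cross entropy; the logits version is more numerically stable
print(loss_with_logits, loss_plain)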

# create loss function
# loss_fn = nn.BCELoss() # BCELoss = no sigmoid built-in
loss_fn = nn.BCEWithLogitsLoss() # BCEWithLogitsLoss = sigmoid built-in

# create optimizer
optimizer = torch.optim.SGD(params=model_0.parameters(),lr=0.1)

5. Evaluation Metric

  • If a loss function measures how wrong the model is, an evaluation metric measures how right it is;

  • Accuracy: measured as the total number of correct predictions divided by the total number of predictions;

  • For example, if 99 out of 100 COVID (nucleic acid) test results are true negatives or true positives, the model's accuracy is 99%;

# calculate accuracy (classification metric)
def accuracy_fn(y_true, y_pred):
    # torch.eq() calculates element-wise where the two tensors are equal
    correct = torch.eq(y_true,y_pred).sum().item()
    accuracy = (correct / len(y_pred)) * 100
    return accuracy
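
A quick sanity check of accuracy_fn (a minimal sketch with made-up labels, mirroring the 99-out-of-100 example above):

# Minimal sketch: 100 predictions where 99 match the true labels -> 99% accuracy
y_true_demo = torch.zeros(100)
y_pred_demo = torch.zeros(100)
y_pred_demo[0] = 1.0                              # one wrong prediction
print(accuracy_fn(y_true_demo, y_pred_demo))      # prints 99.0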