1. Reproducibility

  • Reproducibility: trying to take the random out of random;

  • Computers are fundamentally deterministic, i.e. every step is predictable,
    so the randomness they produce can simply be thought of as simulated randomness,
    or pseudorandomness; see the sketch below;
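
  • A minimal sketch of that determinism (only torch is assumed, and the variable
    names are illustrative): PyTorch's random number generator is just hidden state,
    so saving it with torch.get_rng_state() and restoring it with
    torch.set_rng_state() replays exactly the same "random" numbers;

import torch

# save the current state of the global random number generator
rng_state = torch.get_rng_state()
first_draw = torch.rand(3)

# restore the saved state and draw again -> identical "random" numbers
torch.set_rng_state(rng_state)
second_draw = torch.rand(3)

print(first_draw)
print(second_draw)
print(torch.equal(first_draw, second_draw))  # True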

  • Neural networks and deep learning: a neural network starts from random numbers
    that describe the patterns in the data, then tries to improve those random
    numbers with tensor operations so that they describe the patterns better;
    a sketch of one such update step follows below;
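
  • A minimal, hypothetical sketch of that idea (the toy data y = 2 * x and the
    learning rate 0.1 are made up for illustration): a single weight starts as a
    random number, and one gradient step, i.e. a few tensor operations, nudges it
    toward the pattern in the data;

import torch

# hypothetical toy data: the pattern is y = 2 * x
x = torch.rand(10, 1)
y = 2 * x

# start from a random number that (badly) describes the pattern
weight = torch.rand(1, 1, requires_grad=True)

# tensor operations: guess, measure the error, get the gradient
pred = x @ weight                   # current guess
loss = torch.mean((pred - y) ** 2)  # how wrong the guess is
loss.backward()                     # gradient of the loss w.r.t. the weight

# nudge the random number toward the pattern (one improvement step)
with torch.no_grad():
    weight -= 0.1 * weight.grad

print(weight)  # still random, but a little closer to 2.0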

  • Randomness is nice and powerful, but sometimes you want as little randomness
    as possible, so that experiments can be repeated, i.e. reproducibility;
    below is a reproducibility example:

import torch

# create two random tensors
tensor_ra = torch.rand(3, 4)
tensor_rb = torch.rand(3, 4)

print(f"Tensor A:\n{tensor_ra}\n")
print(f"Tensor B:\n{tensor_rb}\n")
print("Tensor A equal Tensor B?")
tensor_ra == tensor_rb
  • As above, we create two random tensors and expect them to be different. But what
    if we wanted two random tensors with the same values, i.e. tensors that still
    contain random values but share the same flavour?

  • That is where torch.manual_seed(seed) comes in: seed is an integer, such as 42,
    that flavours the randomness;

import torch

# set the random seed; try changing the seed and watch what happens to the numbers
RANDOM_SEED = 44
torch.manual_seed(seed=RANDOM_SEED)
tensor_rc = torch.rand(3, 4)

# the seed has to be reset before every new rand() call, otherwise tensor_rd
# would differ from tensor_rc; try commenting this line out and see what happens
torch.random.manual_seed(seed=RANDOM_SEED)
tensor_rd = torch.rand(3, 4)

print(f"Tensor C:\n{tensor_rc}\n")
print(f"Tensor D:\n{tensor_rd}\n")
print("Tensor C equal Tensor D?")
tensor_rc == tensor_rd
Tensor C:
tensor([[0.7196, 0.7307, 0.8278, 0.1343],
        [0.6280, 0.7297, 0.2882, 0.2112],
        [0.9836, 0.8722, 0.9650, 0.7837]])

Tensor D:
tensor([[0.7196, 0.7307, 0.8278, 0.1343],
        [0.6280, 0.7297, 0.2882, 0.2112],
        [0.9836, 0.8722, 0.9650, 0.7837]])

Tensor C equal Tensor D?

tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
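
  • If resetting the global seed before every call feels clumsy, a dedicated
    torch.Generator can carry its own seeded state instead; a minimal sketch
    (the variable names and the seed 44 are illustrative):

import torch

# two generators seeded identically, independent of the global RNG
gen_a = torch.Generator().manual_seed(44)
gen_b = torch.Generator().manual_seed(44)

tensor_re = torch.rand(3, 4, generator=gen_a)
tensor_rf = torch.rand(3, 4, generator=gen_b)

print(torch.equal(tensor_re, tensor_rf))  # True: same seed, same values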