1. Tensor on GPU

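  • Deep learning algorithms require a large amount of numerical computation. By default
    these operations run on the CPU, but a GPU is usually much faster at the specific
    types of operations neural networks need (e.g. matrix multiplication);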

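  • Note: in these notes, "GPU" means a CUDA-enabled Nvidia GPU.
    CUDA is a computing platform and API that supports general-purpose computing on GPUs, not just graphics;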

  • https://developer.nvidia.com/cuda-gpus
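
The speed difference on matrix multiplication can be checked with a rough timing sketch. The helper name time_matmul and the matrix size are arbitrary choices for illustration, not part of the original notes. Actual timings depend on the hardware, and on a machine without a CUDA GPU only the CPU timing runs.

```python
import time
import torch

def time_matmul(device: str, n: int = 1024) -> float:
    """Return the seconds taken by one n x n matrix multiplication on `device`."""
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # make sure setup work is finished before timing
    start = time.perf_counter()
    c = a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait for the result
    return time.perf_counter() - start

cpu_seconds = time_matmul("cpu")
print(f"cpu:  {cpu_seconds:.4f} s")

if torch.cuda.is_available():
    time_matmul("cuda")  # warm-up run so one-time CUDA startup cost is not timed
    gpu_seconds = time_matmul("cuda")
    print(f"cuda: {gpu_seconds:.4f} s")
```

The torch.cuda.synchronize() calls matter: without them the timer may stop before the GPU has actually finished the multiplication.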

2. Getting GPU

  • A: Own machine: https://pytorch.org/get-started/locally

  • B: Google Colab: Edit ☞ Notebook settings, then choose a GPU as the hardware accelerator;

!nvidia-smi
Sun May 26 03:26:55 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   78C    P0              35W /  70W |   1215MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+


3. Getting PyTorch

  • getting PyTorch to run on GPU;

  • PyTorch uses the torch.cuda package to store data (tensors) on the GPU and to compute on it (perform operations on tensors);

import torch

torchVersion = torch.__version__

# True if PyTorch can see a CUDA-capable GPU (this is a flag, not a version number)
cudaAvailable = torch.cuda.is_available()

deviceType = "cuda" if cudaAvailable else "cpu"

# number of GPUs visible to PyTorch
deviceNumber = torch.cuda.device_count()

print('torchVersion:{}\ncudaAvailable:{}\ndeviceType:{}\ndeviceNumber:{}'
  .format(torchVersion, cudaAvailable, deviceType, deviceNumber))
torchVersion:2.3.0+cu121
cudaAvailable:True
deviceType:cuda
deviceNumber:1
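
Beyond availability, torch can also report the CUDA version it was built against and the name of each visible GPU. This is a minimal sketch, assuming the same kind of environment as above. On a CPU-only build torch.version.cuda is None and the loop prints nothing.

```python
import torch

# CUDA version this PyTorch build was compiled with (None on CPU-only builds)
print("built with CUDA:", torch.version.cuda)

# name of every GPU visible to PyTorch, indexed the same way as cuda:0, cuda:1, ...
gpu_names = [torch.cuda.get_device_name(i) for i in range(torch.cuda.device_count())]
for i, name in enumerate(gpu_names):
    print(f"cuda:{i} -> {name}")
```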

4. Putting Tensors and Models on GPU

  • putting tensor and model on GPU;

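  • Call to(device) on a tensor (or model) to put it on a specific device,
    where device is the target device the tensor or model should go to;

  • GPUs offer much faster numerical computation than CPUs. With device-agnostic code,
    the same code uses the GPU when one is available and falls back to the CPU otherwise;
    https://pytorch.org/docs/master/notes/cuda.html#device-agnostic-code

  • Note: putting a tensor on the GPU with to(device), e.g. tensor.to(device), returns a copy
    of that tensor (when it is not already on that device), so the same tensor then exists
    on both CPU and GPU. To overwrite the tensor, reassign it: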

  • some_tensor = some_tensor.to(device)

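  • In the output below, the second tensor has device cuda:0, meaning it is stored on the 0th available GPU;

  • GPU indices start at 0: with two GPUs available they are cuda:0 and cuda:1, and so on up to cuda:n;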

import torch

# create tensor (on CPU by default)
tensor = torch.tensor([1,2,3])

# tensor not on GPU
print(tensor,tensor.device)

# move tensor to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_gpu = tensor.to(device)
tensor_on_gpu
tensor([1, 2, 3]) cpu

tensor([1, 2, 3], device='cuda:0')
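
The example above moves only a tensor. A model moves the same way, with one difference: nn.Module.to(device) moves the module's parameters in place and returns the module itself, so reassignment is optional for models but required for tensors. This is a minimal sketch; the nn.Linear layer and its sizes are made up for illustration.

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

# hypothetical tiny model: 3 inputs -> 1 output
model = nn.Linear(in_features=3, out_features=1)
model.to(device)  # parameters are moved in place, no reassignment needed

# inputs must live on the same device as the model's parameters
x = torch.tensor([[1.0, 2.0, 3.0]]).to(device)
y = model(x)

print(next(model.parameters()).device)
print(y.device, y.shape)
```

If the input were left on the CPU while the model sat on the GPU, the forward pass would raise a device-mismatch error, which is why both use the same device variable.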

5. Moving Back

  • moving tensor back to CPU

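  • A tensor has to be moved back to the CPU when, for example, you want to work with it in NumPy,
    because NumPy does not use the GPU. Calling torch.Tensor.numpy() directly on a GPU tensor raises an error;

  • Instead, to get the tensor back on the CPU and usable with NumPy, call Tensor.cpu(),
    which copies the tensor to CPU memory;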

import torch

# create tensor (on CPU by default)
tensor = torch.tensor([1,2,3])

# tensor not on GPU
print(tensor,tensor.device)

# move tensor to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_gpu = tensor.to(device)
print(tensor_on_gpu)

# if tensor on GPU, can't transform it to NumPy (exception)
# TypeError: can't convert cuda:0 device type tensor to numpy.
# Use Tensor.cpu() to copy the tensor to host memory first.
# tensor_on_gpu.numpy()

# copy tensor back to cpu instead
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
print(tensor_back_on_cpu)
  • This returns a copy of the GPU tensor in CPU memory; the original tensor is still on the GPU

tensor([1, 2, 3]) cpu
tensor([1, 2, 3], device='cuda:0')
[1 2 3]
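
To confirm that the original tensor stays where it was, its device can be checked after the copy. The sketch below also goes the other way, from a NumPy array back to the device with torch.from_numpy. That return trip is not covered in the notes above and is included only as an illustrative round trip.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_device = torch.tensor([1, 2, 3]).to(device)

# .cpu() copies a GPU tensor to host memory
# (on a CPU-only machine it simply returns the same tensor)
array = tensor_on_device.cpu().numpy()
print(type(array), array)
print(tensor_on_device.device)  # unchanged by the copy

# NumPy -> tensor -> device
doubled = array * 2  # plain NumPy arithmetic on the host
back_on_device = torch.from_numpy(doubled).to(device)
print(back_on_device)
```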