
1. Tensor on GPU
2. Getting GPU
3. Getting PyTorch
4. Putting Tensors and Models on GPU
5. Moving Back

1. Tensor on GPU

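Deep learning algorithms require a large number of numerical operations. By default these operations run on the CPU, but GPUs are usually much faster than CPUs at the specific kinds of operations neural networks need (e.g. matrix multiplication).
Note: in this article, "GPU" means a CUDA-enabled NVIDIA GPU.
CUDA is a computing platform and API that enables general-purpose computing on the GPU, not just graphics.
https://developer.nvidia.com/cuda-gpus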
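The speed difference is easiest to see on a matrix multiplication. A rough timing sketch, assuming PyTorch is installed; the helper name time_matmul, the matrix size, and the repeat count are illustrative choices, and the timings you get depend entirely on your hardware:

```python
import time

import torch


def time_matmul(device: str, size: int = 512, repeats: int = 10) -> float:
    """Return the average seconds per (size x size) matrix multiplication on `device`."""
    a = torch.rand(size, size, device=device)
    b = torch.rand(size, size, device=device)
    torch.matmul(a, b)  # warm-up run (CUDA initialisation, kernel selection)
    if device == "cuda":
        torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait before starting the clock
    start = time.perf_counter()
    for _ in range(repeats):
        torch.matmul(a, b)
    if device == "cuda":
        torch.cuda.synchronize()  # make sure all queued kernels have finished
    return (time.perf_counter() - start) / repeats


cpu_time = time_matmul("cpu")
print(f"CPU:  {cpu_time * 1000:.2f} ms per matmul")
if torch.cuda.is_available():
    gpu_time = time_matmul("cuda")
    print(f"CUDA: {gpu_time * 1000:.2f} ms per matmul")
```

The torch.cuda.synchronize() calls matter: without them the timer would only measure how long it takes to queue the GPU work, not to finish it.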

2. Getting GPU

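A: Own machine: install PyTorch with CUDA support, following https://pytorch.org/get-started/locally
B: Google Colab: Edit ☞ Notebook settings, then select a GPU as the hardware accelerator.

Check which GPU you have access to (the leading ! runs a shell command from a notebook cell):

!nvidia-smi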

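Without a GPU runtime, the command is not available:

/bin/bash: line 1: nvidia-smi: command not found

With a GPU runtime enabled (here a Tesla T4), the output looks like this:

Sun May 26 03:26:55 2024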
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   78C    P0              35W /  70W |   1215MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
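The same check can be done from a plain Python script instead of a notebook cell. A minimal sketch using only the standard library; the function name nvidia_smi_report is hypothetical, and it simply treats a missing or failing nvidia-smi as "no NVIDIA GPU visible":

```python
import shutil
import subprocess
from typing import Optional


def nvidia_smi_report() -> Optional[str]:
    """Return the output of `nvidia-smi`, or None if no NVIDIA driver/GPU is visible."""
    if shutil.which("nvidia-smi") is None:
        # Same situation as "/bin/bash: line 1: nvidia-smi: command not found"
        return None
    result = subprocess.run(["nvidia-smi"], capture_output=True, text=True)
    if result.returncode != 0:
        # The binary exists but could not communicate with a driver/GPU
        return None
    return result.stdout


report = nvidia_smi_report()
print(report if report is not None else "No NVIDIA GPU detected (nvidia-smi unavailable)")
```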


3. Getting PyTorch

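Getting PyTorch to run on the GPU:
PyTorch uses the torch.cuda package to store data (tensors) on the GPU and to compute with it (perform operations on tensors). First check the PyTorch version and whether CUDA is available: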

import torch

# PyTorch version (the +cu121 suffix means a build compiled against CUDA 12.1)
torchVersion = torch.__version__

# True if PyTorch can see a CUDA-capable GPU
cudaAvailable = torch.cuda.is_available()

# device-agnostic choice of device
deviceType = "cuda" if cudaAvailable else "cpu"

# number of visible GPUs
deviceNumber = torch.cuda.device_count()

print('torchVersion:{}\ncudaAvailable:{}\ndeviceType:{}\ndeviceNumber:{}'
      .format(torchVersion, cudaAvailable, deviceType, deviceNumber))

torchVersion:2.3.0+cu121
cudaAvailable:True
deviceType:cuda
deviceNumber:1
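When deviceNumber is greater than zero, you can also ask PyTorch what each GPU is. A small sketch; the helper name describe_cuda_devices is hypothetical, while torch.cuda.get_device_name and torch.cuda.get_device_properties are existing torch.cuda functions. On a CPU-only machine the loop simply does nothing, because device_count() returns 0:

```python
import torch


def describe_cuda_devices() -> list:
    """Return one (index, name, total memory in MiB) tuple per visible CUDA device."""
    devices = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        devices.append((i, torch.cuda.get_device_name(i), props.total_memory // (1024 ** 2)))
    return devices


for index, name, mem_mib in describe_cuda_devices():
    print(f"cuda:{index}  {name}  {mem_mib} MiB")
```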

4. Putting Tensors and Models on GPU

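Putting tensors and models on the GPU:
You can put a tensor (or a model) on a specific device by calling to(device) on it, where device is the target device you want the tensor or model to move to.
GPUs provide much faster numerical computation than CPUs. With device-agnostic code, the same script uses the GPU when one is available and falls back to the CPU otherwise:
https://pytorch.org/docs/master/notes/cuda.html#device-agnostic-code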
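A minimal device-agnostic sketch, assuming only that PyTorch is installed. Besides moving tensors with to(device), most tensor factory functions accept a device= argument, so tensors can be created directly on the target device instead of being created on the CPU and then copied:

```python
import torch

# Device-agnostic setup: pick the GPU if PyTorch can see one, otherwise the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create tensors directly on the target device (no CPU-to-GPU copy needed).
x = torch.ones(3, device=device)
y = torch.arange(3, device=device)

# Both operands live on the same device, so the result does too.
z = x + y
print(z, z.device)
```

Operands on different devices (for example one on cuda:0 and one on the CPU) generally raise a RuntimeError, which is why it helps to pick device once and reuse it everywhere.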

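Note: putting a tensor on the GPU with to(device), e.g. tensor.to(device), returns a copy of the tensor (unless it is already on that device), so the same tensor then exists on both the CPU and the GPU. To overwrite the tensor, reassign it:
some_tensor = some_tensor.to(device)
In the output below, the second tensor's device is cuda:0, meaning it is stored on GPU 0, the first available GPU.
GPU indices start at 0: with two GPUs available they are cuda:0 and cuda:1, and in general cuda:0 to cuda:(n-1) for n GPUs.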
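The copy-versus-reassign behaviour described in the note can be checked directly. A short sketch, assuming PyTorch is installed; the variable name moved is illustrative:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

some_tensor = torch.tensor([1, 2, 3])   # created on the CPU
moved = some_tensor.to(device)          # new tensor on `device` (or the same object if already there)
print(some_tensor.device, moved.device) # the original tensor is unchanged

some_tensor = some_tensor.to(device)    # reassign to keep only the moved tensor
print(some_tensor.device)

# A tensor already on the target device is returned as-is rather than copied.
print(some_tensor.to(device) is some_tensor)

# Device strings carry an index: cuda:0 ... cuda:(n-1) for n visible GPUs.
gpu_names = [f"cuda:{i}" for i in range(torch.cuda.device_count())]
print(gpu_names)
```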

import torch

# create tensor,default on CPU
tensor = torch.tensor([1,2,3])

# tensor not on GPU
print(tensor,tensor.device)

# move tensor to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_gpu = tensor.to(device)
tensor_on_gpu

tensor([1, 2, 3]) cpu

tensor([1, 2, 3], device='cuda:0')
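The example above only moves a tensor; models are moved the same way. A sketch assuming a tiny nn.Linear layer as a stand-in model (the layer sizes are arbitrary). One difference worth knowing: nn.Module.to() modifies the module in place and returns the module itself, whereas Tensor.to() returns a new tensor:

```python
import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

torch.manual_seed(0)
model = nn.Linear(in_features=3, out_features=1)  # parameters start on the CPU

# Module.to() moves parameters and buffers in place and returns the module,
# so reassignment (model = model.to(device)) is optional here.
model.to(device)

# Inputs must be on the same device as the model's parameters.
inputs = torch.rand(2, 3, device=device)
outputs = model(inputs)

print(next(model.parameters()).device, outputs.shape, outputs.device)
```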

5. Moving Back

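Moving a tensor back to the CPU:
You need to do this when, for example, you want to work with the tensor in NumPy, since NumPy does not use the GPU. Calling torch.Tensor.numpy() directly on a GPU tensor raises an error.
Instead, call Tensor.cpu() first to copy the tensor into CPU memory so that it can be used on the CPU: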

import torch

# create tensor,default on CPU
tensor = torch.tensor([1,2,3])

# tensor not on GPU
print(tensor,tensor.device)

# move tensor to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
tensor_on_gpu = tensor.to(device)
print(tensor_on_gpu)

# if tensor on GPU, can't transform it to NumPy (exception)
# TypeError: can't convert cuda:0 device type tensor to numpy.
# Use Tensor.cpu() to copy the tensor to host memory first.
# tensor_on_gpu.numpy()

# copy tensor back to cpu instead
tensor_back_on_cpu = tensor_on_gpu.cpu().numpy()
print(tensor_back_on_cpu)

Tensor.cpu() returns a copy of the GPU tensor in CPU memory; the original tensor stays on the GPU. Output:

tensor([1, 2, 3]) cpu
tensor([1, 2, 3], device='cuda:0')
[1 2 3]
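In practice, tensors that come out of a model often also track gradients, and numpy() refuses those too. A common idiom is detach().cpu().numpy(); the helper name to_numpy below is hypothetical. Note that for a tensor already on the CPU, cpu() does not copy, so the resulting array shares memory with the tensor:

```python
import numpy as np
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"


def to_numpy(t: torch.Tensor) -> np.ndarray:
    """Convert a tensor to NumPy: drop autograd history, then move to host memory.

    For a tensor already on the CPU, the array shares memory with the tensor.
    """
    return t.detach().cpu().numpy()


weights = torch.tensor([1.0, 2.0, 3.0], device=device, requires_grad=True)
# weights.numpy() would fail here: on the GPU because of the device,
# and on the CPU because the tensor requires grad.
arr = to_numpy(weights)
print(arr)

# The reverse direction: NumPy array -> CPU tensor -> target device.
back = torch.from_numpy(arr).to(device)
print(back, back.device)
```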