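This article describes how to enable vGPU on a GTX 1060 graphics card. Consumer graphics cards do not support NVIDIA GRID vGPU. In early 2021 the pandemic sparked hackers' creativity and produced a patch called vgpu_unlock, which lets consumer cards support vGPU. Getting vgpu_unlock to work with Proxmox, however, involves quite a few pitfalls.

First, Proxmox is not among the hypervisors officially supported by NVIDIA vGPU, and not every Proxmox release works reliably with it. My previous article, "Enabling vGPU on the NVIDIA Tesla P4 in a DoraCloud for Proxmox desktop cloud", covered enabling vGPU for a Tesla P4 on Proxmox 5.4. Why Proxmox 5.4? Mainly because the kernel in that release works fairly stably with NVIDIA vGPU. PVE 6.4 and PVE 7.x did not seem to work stably with NVIDIA vGPU.

Second, once Proxmox is downgraded to 5.4, the base system is Debian 9, which ships Python 3.5. Python 3.5 is incompatible with the Python part of vgpu_unlock, which needs Python 3.6 or 3.7. Installing Python 3.6/3.7 on PVE 5.4 is very laborious. I did get vgpu_unlock working that way, but inexplicable problems kept turning up. On top of that, running mdevctl requires the Rust toolchain and the cargo package manager. It felt like a small patch had dragged out a whole ox.

After some exploration I found that vgpu_unlock has spawned both a C version and a Rust version. I therefore chose the Rust version, vgpu_unlock-rs, to replace the Python part of vgpu_unlock that configures the vGPU services.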

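The overall installation is similar to the earlier procedure for P4 vGPU on Proxmox 5.4.

Compared with the P4 installation, the unlock adds the following steps:

1. Download the vgpu_unlock project and apply only the part that patches the NVIDIA DKMS driver. Do not apply the part that modifies the vGPU services; vgpu_unlock-rs takes care of that.

Mirror in China: https://gitee.com/deskpool/vgpu_unlock.git

2. Download and install the Rust toolchain and the cargo package manager, and switch them to the USTC mirror.

3. Download the vgpu_unlock-rs project, build it, and configure the vGPU services to use vgpu_unlock-rs for the unlock.

Mirror in China: https://gitee.com/deskpool/vgpu_unlock-rs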

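1. Download and install Proxmox 5.4-1

I recommend downloading the ISO from the USTC (University of Science and Technology of China) mirror and then using Rufus to create a bootable USB drive.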

https://mirrors.ustc.edu.cn/proxmox/iso/proxmox-ve_5.4-1.iso

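2. Switch to the USTC mirrors and update

Replace the package sources with the USTC mirrors, then update and upgrade the system.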

cp /etc/apt/sources.list /etc/apt/sources.list.backup

sed -i 's|^deb http://ftp.debian.org|deb https://mirrors.ustc.edu.cn|g' /etc/apt/sources.list

sed -i 's|^deb http://security.debian.org|deb https://mirrors.ustc.edu.cn/debian-security|g' /etc/apt/sources.list

mv /etc/apt/sources.list.d/pve-enterprise.list /etc/apt/sources.list.d/pve-enterprise.list.bak

CODENAME=`cat /etc/os-release |grep PRETTY_NAME |cut -f 2 -d "(" |cut -f 1 -d ")"`

echo "deb https://mirrors.ustc.edu.cn/proxmox/debian $CODENAME pve-no-subscription" > /etc/apt/sources.list.d/pve-no-subscription.list

# Update

apt update && apt dist-upgrade -y
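
The codename lookup and the two mirror substitutions above can be tried on sample text before touching the real files. The following is only a minimal sketch: the helper names `codename_from_pretty` and `rewrite_sources` are hypothetical, and the exact contents of a stock sources.list are an assumption.

```shell
# Extract the release codename from a PRETTY_NAME line such as
# PRETTY_NAME="Debian GNU/Linux 9 (stretch)" (same cut logic as above).
codename_from_pretty() {
    printf '%s\n' "$1" | cut -f 2 -d "(" | cut -f 1 -d ")"
}

# Apply the two mirror substitutions to text on stdin and print the
# result on stdout, leaving the real /etc/apt/sources.list untouched.
rewrite_sources() {
    sed -e 's|^deb http://ftp.debian.org|deb https://mirrors.ustc.edu.cn|' \
        -e 's|^deb http://security.debian.org|deb https://mirrors.ustc.edu.cn/debian-security|'
}

# Preview on the host (not run here):
#   rewrite_sources < /etc/apt/sources.list
```

If the preview shows unchanged lines, the stock file probably uses different host names than the sed patterns expect, and the patterns need adjusting.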

Install the packages that DKMS depends on:

apt install pve-headers dkms git pve-headers-4.15.18-12-pve -y

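3. Enable IOMMU

This server has an Intel processor, so the script below enables IOMMU the Intel way. The configuration differs on AMD processors.

# Copy the following script to enable IOMMU

# Add intel_iommu=on iommu=pt to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub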

sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="quiet"/GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"/g' /etc/default/grub

update-grub

# Load the four modules vfio, vfio_iommu_type1, vfio_pci and vfio_virqfd

echo vfio >> /etc/modules

echo vfio_iommu_type1 >> /etc/modules

echo vfio_pci >> /etc/modules

echo vfio_virqfd >> /etc/modules

echo "options vfio_iommu_type1 allow_unsafe_interrupts=1" > /etc/modprobe.d/iommu_unsafe_interrupts.conf

echo "options kvm ignore_msrs=1" > /etc/modprobe.d/kvm.conf

echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf

update-initramfs -u

reboot

The script reboots the server when it finishes. After the reboot, check the kernel log to confirm that IOMMU is enabled.

root@pveserver:~# dmesg | grep -e DMAR -e IOMMU

[ 0.000000] ACPI: DMAR 0x0000000079A48648 0000A8 (v01 INTEL EDK2 00000002 01000013)

[ 0.000000] DMAR: IOMMU enabled

[ 0.004000] DMAR: Host address width 39

[ 0.004000] DMAR: DRHD base: 0x000000fed90000 flags: 0x0

[ 0.004000] DMAR: dmar0: reg_base_addr fed90000 ver 1:0 cap 1c0000c40660462 ecap 19e2ff0505e

[ 0.004000] DMAR: DRHD base: 0x000000fed91000 flags: 0x1

[ 0.004000] DMAR: dmar1: reg_base_addr fed91000 ver 1:0 cap d2008c40660462 ecap f050da
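
The same check can be scripted. This is a minimal sketch, assuming the Intel setup used above; `cmdline_has_iommu` and `log_has_iommu` are hypothetical helper names, and the log line they look for is the one shown in the dmesg output.

```shell
# Succeeds if the given kernel command line contains intel_iommu=on.
cmdline_has_iommu() {
    case " $1 " in
        *" intel_iommu=on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeeds if kernel log text on stdin reports that the IOMMU is enabled.
log_has_iommu() {
    grep -q 'DMAR: IOMMU enabled'
}

# On the host (not run here):
#   cmdline_has_iommu "$(cat /proc/cmdline)" && echo "cmdline ok"
#   dmesg | log_has_iommu && echo "IOMMU enabled"
```

If the command-line check fails, the sed edit of /etc/default/grub probably did not match, which happens when GRUB_CMDLINE_LINUX_DEFAULT was not exactly "quiet" beforehand.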

4. Install the NVIDIA kernel driver with DKMS

# Download the NVIDIA drivers

wget http://www1.deskpool.com:9000/software/NVIDIA-Linux-x86_64-460.32.03-grid.run

wget http://www1.deskpool.com:9000/software/NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run

chmod +x NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run

# Install the driver

./NVIDIA-Linux-x86_64-460.32.04-vgpu-kvm.run -dkms

To let the DoraCloud desktop cloud system use NVIDIA GPU resources, Proxmox 5.4 needs a patch that extends the Proxmox API.

wget http://www1.deskpool.com:9000/software/patch.tar.gz

tar -zxvf patch.tar.gz  -C /

After the NVIDIA driver has been installed successfully, run the following commands to restart the Proxmox server.

systemctl daemon-reload

reboot

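If the GPU is a professional model that supports NVIDIA GRID, the vGPU driver is now fully installed. Because the GPU here is a consumer model, vgpu_unlock still has to be applied.

5. Install vgpu_unlock and patch the NVIDIA driver source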

git clone https://gitee.com/deskpool/vgpu_unlock.git

chmod -R +x vgpu_unlock

sed -i 's/#include "nv-time.h"/#include "nv-time.h"\n\n#include "\/root\/vgpu_unlock\/vgpu_unlock_hooks.c"/g' /usr/src/nvidia-460.32.04/nvidia/os-interface.c

echo "ldflags-y += -T /root/vgpu_unlock/kern.ld" >>/usr/src/nvidia-460.32.04/nvidia/nvidia.Kbuild

dkms remove -m nvidia -v 460.32.04 --all

dkms install -m nvidia -v 460.32.04
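
Because the `echo ... >>` line appends unconditionally, running this step a second time would add the linker flag twice. Below is a minimal sketch of an append that is safe to repeat; `append_once` is a hypothetical helper name, not part of vgpu_unlock.

```shell
# Append a line to a file only if the file does not already contain
# exactly that line, so the step can be repeated safely.
append_once() {
    line=$1
    file=$2
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# On the host (not run here):
#   append_once "ldflags-y += -T /root/vgpu_unlock/kern.ld" \
#       /usr/src/nvidia-460.32.04/nvidia/nvidia.Kbuild
```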

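6. Install the Rust toolchain and the cargo package manager

The cargo version that ships with Proxmox 5.4 is too old and causes problems, so Rust has to be installed with rustup, the official installer. The commands below do this through the USTC mirror.

Run the following script and choose Y to install Rust and cargo through rustup.

# Set the rustup source to USTC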

export RUSTUP_DIST_SERVER=https://mirrors.ustc.edu.cn/rust-static

export RUSTUP_UPDATE_ROOT=https://mirrors.ustc.edu.cn/rust-static/rustup

wget -qO- https://cdn.jsdelivr.net/gh/rust-lang-nursery/rustup.rs/rustup-init.sh |sh

Set the cargo mirror to USTC

# Load the cargo environment variables

source ~/.cargo/env

# Set the cargo registry mirror to USTC

cat >>~/.cargo/config <<EOF

[source.crates-io]

replace-with = 'ustc'

[source.ustc]

registry = "git://mirrors.ustc.edu.cn/crates.io-index"

EOF
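
The `cat >>` form appends on every run, and a config file with the same TOML table defined twice is invalid. The following is a minimal sketch of a repeatable version, assuming the same mirror settings as above; `write_cargo_mirror` is a hypothetical helper name.

```shell
# Write the USTC mirror settings to a cargo config file unless a
# [source.ustc] table is already present, so reruns do not duplicate it.
write_cargo_mirror() {
    cfg=$1
    if grep -qF '[source.ustc]' "$cfg" 2>/dev/null; then
        return 0
    fi
    mkdir -p "$(dirname "$cfg")"
    cat >> "$cfg" <<'EOF'
[source.crates-io]
replace-with = 'ustc'

[source.ustc]
registry = "git://mirrors.ustc.edu.cn/crates.io-index"
EOF
}

# On the host (not run here):
#   write_cargo_mirror ~/.cargo/config
```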

7. Build vgpu_unlock-rs and configure the unlock services

# Download the vgpu_unlock-rs project
git clone https://gitee.com/deskpool/vgpu_unlock-rs

cd vgpu_unlock-rs/

cargo build --release

# Unlock drop-in for the nvidia-vgpud service

mkdir /etc/systemd/system/nvidia-vgpud.service.d

cat >>/etc/systemd/system/nvidia-vgpud.service.d/vgpu_unlock.conf <<EOF

[Service]

Environment=LD_PRELOAD=/root/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so

EOF

# Unlock drop-in for the nvidia-vgpu-mgr service

mkdir /etc/systemd/system/nvidia-vgpu-mgr.service.d

cat >>/etc/systemd/system/nvidia-vgpu-mgr.service.d/vgpu_unlock.conf <<EOF

[Service]

Environment=LD_PRELOAD=/root/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so

EOF
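
The two drop-ins are identical apart from the service name, so they can also be written in a loop. This is a minimal sketch under that assumption; `write_unlock_dropins` is a hypothetical helper name, and the library path is the one produced by the cargo build above when the project is cloned under /root.

```shell
# Create the LD_PRELOAD drop-in for each NVIDIA vGPU service under the
# given systemd directory (normally /etc/systemd/system).
write_unlock_dropins() {
    unit_dir=$1
    lib=$2
    for svc in nvidia-vgpud nvidia-vgpu-mgr; do
        mkdir -p "$unit_dir/$svc.service.d"
        # '>' overwrites, so repeated runs leave a single [Service] block.
        printf '[Service]\nEnvironment=LD_PRELOAD=%s\n' "$lib" \
            > "$unit_dir/$svc.service.d/vgpu_unlock.conf"
    done
}

# On the host (not run here):
#   write_unlock_dropins /etc/systemd/system \
#       /root/vgpu_unlock-rs/target/release/libvgpu_unlock_rs.so
#   systemctl daemon-reload
```

Using `mkdir -p` and `>` instead of `mkdir` and `>>` keeps the step repeatable: the directory may already exist, and the file is replaced rather than extended.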

# Profile override file; controls the vGPU's display settings and CUDA

mkdir /etc/vgpu_unlock

cat >>/etc/vgpu_unlock/profile_override.toml <<EOF

[profile.nvidia-55]

num_displays = 1

display_width = 1920

display_height = 1080

max_pixels = 2073600

cuda_enabled = 1

frl_enabled = 0

EOF
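
In the override above, max_pixels is simply display_width times display_height (1920 x 1080 = 2073600). The following minimal sketch derives it instead of typing it by hand; `print_profile` is a hypothetical helper name, and the remaining keys are copied unchanged from the file above.

```shell
# Print a profile override for one vGPU type; max_pixels is derived from
# the resolution (width * height) instead of being typed by hand.
print_profile() {
    profile=$1
    width=$2
    height=$3
    cat <<EOF
[profile.$profile]
num_displays = 1
display_width = $width
display_height = $height
max_pixels = $((width * height))
cuda_enabled = 1
frl_enabled = 0
EOF
}

# On the host (not run here):
#   print_profile nvidia-55 1920 1080 > /etc/vgpu_unlock/profile_override.toml
```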

After the server has rebooted, check the syslog to confirm whether vgpu_unlock has started.

root@pveserver:~# cat /var/log/syslog |grep unlock
Dec 17 15:28:22 pveserver nvidia-vgpud: PID file unlocked.
Dec 17 15:41:36 pveserver nvidia-vgpud: PID file unlocked.

Next, look at the vGPU types that were detected. The GTX 1060 is now presented as a P40.

root@pveserver:~# cat /var/log/syslog |grep VGPU |grep GRID
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x3e: GRID P40-1B Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x2e: GRID P40-1Q Class: Quadro
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x2f: GRID P40-2Q Class: Quadro
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x30: GRID P40-3Q Class: Quadro
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x31: GRID P40-4Q Class: Quadro
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x32: GRID P40-6Q Class: Quadro
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x33: GRID P40-8Q Class: Quadro
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x34: GRID P40-12Q Class: Quadro
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x35: GRID P40-24Q Class: Quadro
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x36: GRID P40-1A Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x37: GRID P40-2A Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x38: GRID P40-3A Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x39: GRID P40-4A Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x3a: GRID P40-6A Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x3b: GRID P40-8A Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x3c: GRID P40-12A Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x3d: GRID P40-24A Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x9c: GRID P40-2B Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0xd7: GRID P40-2B4 Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0xf1: GRID P40-1B4 Class: NVS
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x11f: GRID P40-24C Class: Compute
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x11b: GRID P40-4C Class: Compute
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x11c: GRID P40-6C Class: Compute
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x11d: GRID P40-8C Class: Compute
Dec 17 15:41:36 pveserver nvidia-vgpud: VGPU Type 0x11e: GRID P40-12C Class: Compute
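
To get a compact list of profile names and classes out of these log lines, a small sed filter is enough. This is a minimal sketch that assumes the log format shown above; `list_vgpu_types` is a hypothetical helper name.

```shell
# Read nvidia-vgpud syslog lines on stdin and print "<name> <class>"
# for every "VGPU Type ...: GRID <name> Class: <class>" entry.
list_vgpu_types() {
    sed -n 's/.*VGPU Type 0x[0-9a-f]*: GRID \([^ ]*\) Class: \(.*\)$/\1 \2/p'
}

# On the host (not run here):
#   list_vgpu_types < /var/log/syslog | sort -u
```

Lines that do not describe a vGPU type, such as the "PID file unlocked." messages, are dropped by the `-n` and `p` combination.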

Use nvidia-smi to look at the physical card's own information.

root@pveserver:~# nvidia-smi
Fri Dec 17 16:28:13 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.04    Driver Version: 460.32.04    CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 106...  On   | 00000000:01:00.0 Off |                  N/A |
| 21%   29C    P8     7W / 130W |     19MiB /  6143MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

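Verify that the Proxmox API supports vGPU. The name field in the output is an enhancement added by the patch installed above (patch.tar.gz), which makes the API easier for DoraCloud to use.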

root@pveserver:~# pvesh get /nodes/pveserver/hardware/pci/01:00.0/mdev
┌───────────┬──────────────────────────────────────────────────────────────────────────────────────────┬──────────────┬────────────┐
│ available │ description                                                                              │ name         │ type       │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         1 │ num_heads=1, frl_config=60, framebuffer=24576M, max_resolution=4096x2160, max_instance=1 │ GRID P40-24C │ nvidia-287 │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         1 │ num_heads=4, frl_config=60, framebuffer=24576M, max_resolution=7680x4320, max_instance=1 │ GRID P40-24Q │ nvidia-53  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         1 │ num_heads=1, frl_config=60, framebuffer=24576M, max_resolution=1280x1024, max_instance=1 │ GRID P40-24A │ nvidia-61  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         2 │ num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=4096x2160, max_instance=2 │ GRID P40-12C │ nvidia-286 │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         2 │ num_heads=4, frl_config=60, framebuffer=12288M, max_resolution=7680x4320, max_instance=2 │ GRID P40-12Q │ nvidia-52  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         2 │ num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=1280x1024, max_instance=2 │ GRID P40-12A │ nvidia-60  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         3 │ num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=3  │ GRID P40-8A  │ nvidia-59  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         3 │ num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=4096x2160, max_instance=3  │ GRID P40-8C  │ nvidia-285 │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         3 │ num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=7680x4320, max_instance=3  │ GRID P40-8Q  │ nvidia-51  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         4 │ num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=1280x1024, max_instance=4  │ GRID P40-6A  │ nvidia-58  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         4 │ num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=4096x2160, max_instance=4  │ GRID P40-6C  │ nvidia-284 │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         4 │ num_heads=4, frl_config=60, framebuffer=6144M, max_resolution=7680x4320, max_instance=4  │ GRID P40-6Q  │ nvidia-50  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         6 │ num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6  │ GRID P40-4Q  │ nvidia-49  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         6 │ num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=1280x1024, max_instance=6  │ GRID P40-4A  │ nvidia-57  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         6 │ num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=4096x2160, max_instance=6  │ GRID P40-4C  │ nvidia-283 │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         8 │ num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8  │ GRID P40-3Q  │ nvidia-48  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│         8 │ num_heads=1, frl_config=60, framebuffer=3072M, max_resolution=1280x1024, max_instance=8  │ GRID P40-3A  │ nvidia-56  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│        12 │ num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12 │ GRID P40-2B  │ nvidia-156 │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│        12 │ num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12 │ GRID P40-2Q  │ nvidia-47  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│        12 │ num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=12 │ GRID P40-2A  │ nvidia-55  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│        12 │ num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12 │ GRID P40-2B4 │ nvidia-215 │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│        24 │ num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24 │ GRID P40-1B4 │ nvidia-241 │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│        24 │ num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24 │ GRID P40-1Q  │ nvidia-46  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│        24 │ num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=24 │ GRID P40-1A  │ nvidia-54  │
├───────────┼──────────────────────────────────────────────────────────────────────────────────────────┼──────────────┼────────────┤
│        24 │ num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24 │ GRID P40-1B  │ nvidia-62  │
└───────────┴──────────────────────────────────────────────────────────────────────────────────────────┴──────────────┴────────────┘
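
In this table, max_instance works out as 24576M (the largest framebuffer listed) divided by each profile's framebuffer. The minimal sketch below does that arithmetic; `max_instances` is a hypothetical helper name, and the rule itself is only inferred from the numbers above.

```shell
# Number of vGPU instances that fit: total framebuffer divided by the
# per-instance framebuffer, both in MiB (integer division).
max_instances() {
    total_mb=$1
    profile_mb=$2
    echo $((total_mb / profile_mb))
}

# Example: max_instances 24576 4096   # the P40-4Q row lists max_instance=6
```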

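8. Use vGPU

At this point vGPU has been enabled successfully on the Proxmox virtualization platform. Next you still need to create VMs and install the NVIDIA GRID guest driver in them. You also need to apply for an NVIDIA GRID license. For details, see the Proxmox and NVIDIA vGPU documentation.

If you manage desktops with DoraCloud, you can deploy DoraCloud directly and download desktop templates through it. The templates already include the NVIDIA vGPU driver.

See part 4 of my previous article, which covers deploying DoraCloud: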
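
As a rough illustration of the VM side, a vGPU type from the table is attached to a VM as a mediated device on a hostpci entry. The sketch below only prints the command. The VM ID 100 is hypothetical, the PCI address 01:00.0 and type nvidia-55 come from the output above, and the `-hostpci0 ...,mdev=...` form is my understanding of the Proxmox syntax, so check it against `man qm` on your release. `print_attach_cmd` is a hypothetical helper name.

```shell
# Print (not run) the qm command that would attach one mediated device
# of the given vGPU type to a VM.
print_attach_cmd() {
    vmid=$1
    pci_addr=$2
    mdev_type=$3
    printf 'qm set %s -hostpci0 %s,mdev=%s\n' "$vmid" "$pci_addr" "$mdev_type"
}

# Example (VM ID 100 is a placeholder):
#   print_attach_cmd 100 01:00.0 nvidia-55
```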

https://www.cnblogs.com/doracloud/p/proxmox_doracloud_telsa_p4.html

Installing DoraCloud for Proxmox:

https://www.doracloud.cn/downloads/2-cn.html

