Installing the NVIDIA Virtual GPU Manager
Preparation
Prerequisites for installing the NVIDIA Virtual GPU Manager on the physical host:
The following packages must be installed on the Linux KVM server:
the x86_64 build of the GNU Compiler Collection (GCC)
Linux kernel headers
sudo apt install gcc linux-headers-$(uname -r)
Installing the Virtual GPU Manager Package for Linux KVM
Note
My setup is Ubuntu Linux 22.04 with an Nvidia Tesla P10 GPU compute card. The official documentation provides "Installing and Configuring the NVIDIA Virtual GPU Manager for Ubuntu", so I switched to following that part of the documentation for this exercise.
The NVIDIA documentation is extremely detailed (and tedious); check your hardware and software environment carefully to find the section that best matches it.
The installation itself is simple: it just means running the NVIDIA host driver installer:
chmod +x NVIDIA-Linux-x86_64-510.85.03-vgpu-kvm.run
sudo sh ./NVIDIA-Linux-x86_64-510.85.03-vgpu-kvm.run
The installer runs quickly; it compiles the kernel modules and completes the installation.
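To quickly confirm the build succeeded, a minimal sketch (it only assumes the installer above finished without errors):
# The vGPU host installer ships the nvidia-vgpu-vfio module; modinfo fails if the build did not land
modinfo nvidia-vgpu-vfio | head -n 3
# The host driver also answers nvidia-smi queries once loaded
nvidia-smi --query-gpu=driver_version --format=csv,noheader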
Warning
I noticed that the "Installing the Virtual GPU Manager Package for Ubuntu" section of "Installing and Configuring the NVIDIA Virtual GPU Manager for Ubuntu" installs from a .deb package, and after that installation lsmod | grep vfio shows the device with the mdev module as well.
That differs from the result I got here by installing the vGPU Manager for Linux KVM on the host, which was confusing.
At this point, "Was the vfio_mdev module removed from the 5.15 kernel?" gave me a pointer: starting with kernel 5.15, the mdev module replaces vfio_mdev, and vfio can still be used through mdev on kernel 5.15.
Proxmox 7 vGPU – v2 provides detailed guidance.
Installing the vGPU Manager for Linux KVM as above adds symlinks under /etc/systemd/system/multi-user.target.wants, which effectively enables the following two vGPU services:
nvidia-vgpud.service -> /lib/systemd/system/nvidia-vgpud.service
nvidia-vgpu-mgr.service -> /lib/systemd/system/nvidia-vgpu-mgr.service
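A quick way to confirm both host daemons after a reboot, a small sketch using the unit names listed above:
# Show both vGPU host services in one call
systemctl status nvidia-vgpud.service nvidia-vgpu-mgr.service --no-pager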
However, in my environment nvidia-vgpud.service did not run correctly; see the "nvidia-vgpud and nvidia-vgpu-mgr services" section below.
Reboot the server, then check the vfio modules after the reboot:
lsmod | grep vfio
Only two vfio-related modules show up here; the vfio_mdev module from the documentation is not present (reason: from kernel 5.15 onward, mdev replaces vfio_mdev):
Output of lsmod for vfio-related modules (no vfio_mdev here):
nvidia_vgpu_vfio 57344 0
mdev 28672 1 nvidia_vgpu_vfio
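To double-check which mediated-device module your kernel provides, a quick sketch (independent of this particular host):
# vfio_mdev was dropped around kernel 5.15; on newer kernels only mdev remains
uname -r
modinfo vfio_mdev 2>/dev/null || echo "vfio_mdev not available on this kernel"
modinfo mdev | head -n 3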
Note that "Verifying the Installation of the NVIDIA vGPU Software for Red Hat Enterprise Linux KVM or RHV" (the official document referenced here) shows vfio_mdev, which belongs to kernels before 5.15. Ubuntu Linux 22.04 uses the 5.15 kernel series, where the mdev module has replaced vfio_mdev.
lsmod output of vfio-related modules as shown in that documentation (note the vfio_mdev entries):
nvidia_vgpu_vfio 27099 0
nvidia 12316924 1 nvidia_vgpu_vfio
vfio_mdev 12841 0
mdev 20414 2 vfio_mdev,nvidia_vgpu_vfio
vfio_iommu_type1 22342 0
vfio 32331 3 vfio_mdev,nvidia_vgpu_vfio,vfio_iommu_type1
To check which driver the device has loaded, use:
lspci -vvvnnn -s 82:00.0 | grep -i kernel
The output shows that the nvidia driver is in use:
 Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_vgpu_vfio, nvidia
At this point nvidia-smi shows only one physical GPU:
sudo nvidia-smi
The output lists a single GPU card:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.03 Driver Version: 510.85.03 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA Graphics... Off | 00000000:82:00.0 Off | 0 |
| N/A 40C P0 42W / 150W | 50MiB / 23040MiB | 1% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Preparing the KVM hypervisor: obtaining the GPU's BDF and domain
Get the PCI bus/device/function (BDF) of the physical GPU:
lspci | grep NVIDIA
The physical GPU shows up as:
82:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P10] (rev a1)
The 82:00.0 in this output is the GPU's PCI BDF.
Next, derive the GPU's full identifier from its PCI BDF. Note that 82:00.0 must be converted to 82_00_0 (the so-called transformed-bdf).
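A small sketch of the conversion (the BDF value is the one from the lspci output above):
# Turn 82:00.0 into the "transformed BDF" 82_00_0 that virsh nodedev-list matches on
BDF="82:00.0"
TRANSFORMED_BDF=$(echo "$BDF" | tr ':.' '__')
virsh nodedev-list --cap pci | grep "$TRANSFORMED_BDF"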
Use virsh nodedev-list to obtain the full GPU identifier:
virsh nodedev-list --cap pci | grep 82_00_0
The output is:
pci_0000_82_00_0
Record this full PCI device identifier, pci_0000_82_00_0; we will use it to obtain the GPU's domain, bus, slot, and function as used by virsh.
Get the GPU's full virsh configuration (domain, bus, slot, and function):
virsh nodedev-dumpxml pci_0000_82_00_0 | egrep 'domain|bus|slot|function'
The output is:
 <domain>0</domain>
<bus>130</bus>
<slot>0</slot>
<function>0</function>
<address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
Record this output for later use (especially the last line).
Creating NVIDIA vGPUs on the KVM hypervisor
There are two ways to create NVIDIA vGPUs for a KVM hypervisor:
Legacy NVIDIA vGPU, i.e. time-sliced vGPUs (covered in this article)
NVIDIA Multi-Instance GPU (MIG) vGPUs, based on the newer Ampere microarchitecture (not covered here)
Legacy NVIDIA vGPU (time-sliced vGPU)
Warning
My first attempt on Ubuntu Linux 22.04 failed because the Nvidia Tesla P10 compute card ships with NVIDIA Virtual GPU (vGPU) support disabled. The configuration in this section only works after vgpu_unlock has been used to unlock the vGPU functionality.
First, change into the physical GPU's mdev_supported_types directory; its full path is built from the domain, bus, slot, and function obtained above:
# Values taken from the output of: virsh nodedev-dumpxml pci_0000_82_00_0 | egrep 'domain|bus|slot|function'
# <address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
domain=0000
bus=82
slot=00
function=0
cd /sys/class/mdev_bus/${domain}\:${bus}\:${slot}.${function}/mdev_supported_types/
Here I ran into a problem: the /sys/class/mdev_bus/ directory did not exist, so there was no mdev_supported_types directory for the physical GPU to enter. Why? => The cause was found later: the Nvidia Tesla P10 compute card needs vgpu_unlock to unlock NVIDIA Virtual GPU (vGPU) support.
How to fix this depends on the device:
For NVIDIA Multi-Instance GPU (MIG) devices (Single Root I/O Virtualization, SR-IOV), run
sudo /usr/lib/nvidia/sriov-manage -e ALL
(see "/sys/class/mdev_bus/ Can't Found"). For legacy GPU devices, NVIDIA uses what it calls VFIO mediated devices: when the NVIDIA GPU supports NVIDIA Virtual GPU (vGPU), an mdev_bus entry is created under /sys/class/. (The /sys/class/mdev_bus/ entry also appears to be creatable by a vdsm-hook-vfio-mdev hook, a package provided in the oVirt repository; see "vGPU in oVirt". That package is built for RedHat Linux, however, and I could not find it for Ubuntu Linux.)
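Before going further it is worth checking whether the mdev bus actually exists for the GPU; a minimal sketch (the PCI address is the Tesla P10 from above):
# If this directory is missing, the host driver has not exposed vGPU (mdev) support for the GPU
ls -d /sys/class/mdev_bus/0000:82:00.0 2>/dev/null \
  || echo "no mdev_bus entry: vGPU support is not enabled on this GPU"
# When present, the supported vGPU profiles live underneath it
ls /sys/class/mdev_bus/0000:82:00.0/mdev_supported_types 2>/dev/null | head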
Inspect the GPU device details:
lspci -v -s 82:00.0
The output shows:
82:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P10] (rev a1)
Subsystem: NVIDIA Corporation GP102GL [Tesla P10]
Physical Slot: 3
Flags: bus master, fast devsel, latency 0, IRQ 183, NUMA node 1, IOMMU group 80
Memory at c8000000 (32-bit, non-prefetchable) [size=16M]
Memory at 3b000000000 (64-bit, prefetchable) [size=32G]
Memory at 3b800000000 (64-bit, prefetchable) [size=32M]
Capabilities: [60] Power Management version 3
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
Capabilities: [78] Express Endpoint, MSI 00
Capabilities: [100] Virtual Channel
Capabilities: [128] Power Budgeting <?>
Capabilities: [420] Advanced Error Reporting
Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
Capabilities: [900] Secondary PCI Express
Kernel driver in use: nvidia
Kernel modules: nvidiafb, nouveau, nvidia_vgpu_vfio, nvidia
Odd: my Tesla P10 card really is installed in physical slot 3, so why did the earlier virsh nodedev-dumpxml output show slot=0x00? How are the two related? (They are different things: the slot in the BDF 82:00.0 is the PCI device number on bus 0x82, whereas the "Physical Slot: 3" reported by lspci is the chassis slot number exposed by the platform firmware.)
Check the vGPU status:
nvidia-smi vgpu
The output shows just the physical GPU, with no vGPU instances:
Wed Jun 7 23:32:40 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.03 Driver Version: 510.85.03 |
|---------------------------------+------------------------------+------------+
| GPU Name | Bus-Id | GPU-Util |
| vGPU ID Name | VM ID VM Name | vGPU-Util |
|=================================+==============================+============|
| 0 NVIDIA Graphics Device | 00000000:82:00.0 | 0% |
+---------------------------------+------------------------------+------------+
nvidia-vgpud and nvidia-vgpu-mgr services
Check the nvidia-vgpu-mgr service:
systemctl status nvidia-vgpu-mgr.service
The nvidia-vgpu-mgr service is running normally:
● nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
Loaded: loaded (/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2023-06-08 23:29:10 CST; 8s ago
Process: 12170 ExecStart=/usr/bin/nvidia-vgpu-mgr (code=exited, status=0/SUCCESS)
Main PID: 12171 (nvidia-vgpu-mgr)
Tasks: 1 (limit: 464054)
Memory: 260.0K
CPU: 4ms
CGroup: /system.slice/nvidia-vgpu-mgr.service
└─12171 /usr/bin/nvidia-vgpu-mgr
Jun 08 23:29:10 zcloud.staging.huatai.me systemd[1]: Starting NVIDIA vGPU Manager Daemon...
Jun 08 23:29:10 zcloud.staging.huatai.me systemd[1]: Started NVIDIA vGPU Manager Daemon.
Jun 08 23:29:10 zcloud.staging.huatai.me nvidia-vgpu-mgr[12171]: notice: vmiop_env_log: nvidia-vgpu-mgr daemon started
However, checking the nvidia-vgpud service:
systemctl status nvidia-vgpud.service
shows that nvidia-vgpud failed to start:
× nvidia-vgpud.service - NVIDIA vGPU Daemon
Loaded: loaded (/lib/systemd/system/nvidia-vgpud.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2023-06-08 23:29:40 CST; 3s ago
Process: 12179 ExecStart=/usr/bin/nvidia-vgpud (code=exited, status=0/SUCCESS)
Process: 12181 ExecStopPost=/bin/rm -rf /var/run/nvidia-vgpud (code=exited, status=0/SUCCESS)
Main PID: 12180 (code=exited, status=6)
CPU: 35ms
Jun 08 23:29:40 zcloud.staging.huatai.me nvidia-vgpud[12180]: vGPU types: 613
Jun 08 23:29:40 zcloud.staging.huatai.me nvidia-vgpud[12180]:
Jun 08 23:29:40 zcloud.staging.huatai.me nvidia-vgpud[12180]: pciId of gpu [0]: 0:82:0:0
Jun 08 23:29:40 zcloud.staging.huatai.me nvidia-vgpud[12180]: GPU not supported by vGPU at PCI Id: 0:82:0:0 DevID: 0x10de / 0x1b39 / 0x10de / 0x1217
Jun 08 23:29:40 zcloud.staging.huatai.me nvidia-vgpud[12180]: error: failed to send vGPU configuration info to RM: 6
Jun 08 23:29:40 zcloud.staging.huatai.me nvidia-vgpud[12180]: PID file unlocked.
Jun 08 23:29:40 zcloud.staging.huatai.me nvidia-vgpud[12180]: PID file closed.
Jun 08 23:29:40 zcloud.staging.huatai.me nvidia-vgpud[12180]: Shutdown (12180)
Jun 08 23:29:40 zcloud.staging.huatai.me systemd[1]: nvidia-vgpud.service: Main process exited, code=exited, status=6/NOTCONFIGURED
Jun 08 23:29:40 zcloud.staging.huatai.me systemd[1]: nvidia-vgpud.service: Failed with result 'exit-code'.
Why did nvidia-vgpud fail to start with error: failed to send vGPU configuration info to RM: 6 ?
In "Hacking NVidia Cards into their Professional Counterparts", a user posted a comparison of the startup logs of a Tesla P4 and a GTX 1080 (which uses the same GP104 die as the Tesla P4). Unfortunately, the startup log of my Nvidia Tesla P10 matches that of the GTX 1080, which does not support vGPU. <= Indeed, it was later confirmed that, just like consumer cards, the Nvidia Tesla P10 needs vgpu_unlock before its NVIDIA Virtual GPU (vGPU) functionality can be used.
I also asked GPT-3.5, which likewise concluded: "according to the log, the nvidia-vgpud service failed to start because the GPU does not support vGPU". GPT-3.5 even told me that the NVIDIA Tesla P10 does not support vGPU and suggested upgrading to a Tesla P40.
Could my Nvidia Tesla P10, this stealth card, really be a Tesla neutered by Jensen's famously precise knife work? I refuse to accept that: prop me up, I can still fight!
Note
nvidia-vgpud only runs correctly after vgpu_unlock has unlocked the NVIDIA Virtual GPU (vGPU) functionality.
nvidia-smi also provides a query mode:
nvidia-smi -q
The query output shows that the GPU supports vGPU:
==============NVSMI LOG==============
Timestamp : Fri Jun 9 00:12:17 2023
Driver Version : 510.85.03
CUDA Version : Not Found
Attached GPUs : 1
GPU 00000000:82:00.0
Product Name : NVIDIA Graphics Device
Product Brand : Tesla
Product Architecture : Pascal
Display Mode : Enabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Enabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0000000000000
GPU UUID : GPU-794d1de5-b8c7-9b49-6fe3-f96f8fd98a19
Minor Number : 0
VBIOS Version : 86.02.4B.00.01
MultiGPU Board : No
Board ID : 0x8200
GPU Part Number : 000-00000-0000-000
Module ID : 0
Inforom Version
Image Version : G000.0000.00.00
OEM Object : 1.1
ECC Object : 4.1
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : Host VGPU
Host VGPU Mode : Non SR-IOV
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x82
Device : 0x00
Domain : 0x0000
Device Id : 0x1B3910DE
Bus Id : 00000000:82:00.0
Sub System Id : 0x121710DE
GPU Link Info
PCIe Generation
Max : 3
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : N/A
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 23040 MiB
Reserved : 0 MiB
Used : 50 MiB
Free : 22989 MiB
BAR1 Memory Usage
Total : 32768 MiB
Used : 2 MiB
Free : 32766 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Aggregate
Single Bit
Device Memory : 0
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : 0
Retired Pages
Single Bit ECC : 0
Double Bit ECC : 0
Pending Page Blacklist : No
Remapped Rows : N/A
Temperature
GPU Current Temp : 40 C
GPU Shutdown Temp : 95 C
GPU Slowdown Temp : 92 C
GPU Max Operating Temp : N/A
GPU Target Temperature : N/A
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 18.88 W
Power Limit : 150.00 W
Default Power Limit : 150.00 W
Enforced Power Limit : 150.00 W
Min Power Limit : 75.00 W
Max Power Limit : 150.00 W
Clocks
Graphics : 544 MHz
SM : 544 MHz
Memory : 405 MHz
Video : 544 MHz
Applications Clocks
Graphics : 1025 MHz
Memory : 3008 MHz
Default Applications Clocks
Graphics : 1025 MHz
Memory : 3008 MHz
Max Clocks
Graphics : 1531 MHz
SM : 1531 MHz
Memory : 3008 MHz
Video : 1544 MHz
Max Customer Boost Clocks
Graphics : 1531 MHz
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes : None
This shows that the card supports Host VGPU in non-SR-IOV (Single Root I/O Virtualization) mode.
Query the vGPU state further:
nvidia-smi vgpu -q
The output shows that no vGPUs are active yet:
GPU 00000000:82:00.0
Active vGPUs : 0
Solution: use vgpu_unlock
Sure enough, the Nvidia Tesla P10 is a compute card whose vGPU capability NVIDIA has disabled, much like a consumer GPU, and vgpu_unlock is needed to unlock it. After applying vgpu_unlock, checking again shows the nvidia-vgpud service running normally:
○ nvidia-vgpud.service - NVIDIA vGPU Daemon
Loaded: loaded (/lib/systemd/system/nvidia-vgpud.service; enabled; vendor preset: enabled)
Active: inactive (dead) since Sat 2023-06-10 00:14:29 CST; 1s ago
Process: 3815 ExecStart=/opt/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpud (code=exited, status=0/SUCCESS)
Process: 3855 ExecStopPost=/bin/rm -rf /var/run/nvidia-vgpud (code=exited, status=0/SUCCESS)
Main PID: 3819 (code=exited, status=0/SUCCESS)
CPU: 449ms
Jun 10 00:14:29 zcloud.staging.huatai.me nvidia-vgpud[3839]: BAR1 Length: 0x4000
Jun 10 00:14:29 zcloud.staging.huatai.me nvidia-vgpud[3839]: Frame Rate Limiter enabled: 0x1
Jun 10 00:14:29 zcloud.staging.huatai.me nvidia-vgpud[3839]: Number of Displays: 1
Jun 10 00:14:29 zcloud.staging.huatai.me nvidia-vgpud[3839]: Max pixels: 8847360
Jun 10 00:14:29 zcloud.staging.huatai.me nvidia-vgpud[3839]: Display: width 4096, height 2160
Jun 10 00:14:29 zcloud.staging.huatai.me nvidia-vgpud[3839]: License: NVIDIA-vComputeServer,9.0;Quadro-Virtual-DWS,5.0
Jun 10 00:14:29 zcloud.staging.huatai.me nvidia-vgpud[3839]: PID file unlocked.
Jun 10 00:14:29 zcloud.staging.huatai.me nvidia-vgpud[3839]: PID file closed.
Jun 10 00:14:29 zcloud.staging.huatai.me nvidia-vgpud[3839]: Shutdown (3839)
Jun 10 00:14:29 zcloud.staging.huatai.me systemd[1]: nvidia-vgpud.service: Deactivated successfully.
Continuing: creating NVIDIA vGPU devices for the KVM hypervisor
Creating mdev devices manually
After changing into the physical GPU's mdev_supported_types directory:
# Values taken from the output of: virsh nodedev-dumpxml pci_0000_82_00_0 | egrep 'domain|bus|slot|function'
# <address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
domain=0000
bus=82
slot=00
function=0
cd /sys/class/mdev_bus/${domain}\:${bus}\:${slot}.${function}/mdev_supported_types/
Listing this directory shows device entries like the following:
nvidia-156 nvidia-241 nvidia-284 nvidia-286 nvidia-46 nvidia-48 nvidia-50 nvidia-52 nvidia-54 nvidia-56 nvidia-58 nvidia-60 nvidia-62
nvidia-215 nvidia-283 nvidia-285 nvidia-287 nvidia-47 nvidia-49 nvidia-51 nvidia-53 nvidia-55 nvidia-57 nvidia-59 nvidia-61
So which of these should we use?
The mdevctl command can scan these entries and print the NVIDIA Virtual GPU (vGPU) profiles they correspond to:
mdevctl types
The output lists each vGPU profile name together with its configuration:
0000:82:00.0
nvidia-156
Available instances: 12
Device API: vfio-pci
Name: GRID P40-2B
Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
nvidia-215
Available instances: 12
Device API: vfio-pci
Name: GRID P40-2B4
Description: num_heads=4, frl_config=45, framebuffer=2048M, max_resolution=5120x2880, max_instance=12
nvidia-241
Available instances: 24
Device API: vfio-pci
Name: GRID P40-1B4
Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
nvidia-283
Available instances: 6
Device API: vfio-pci
Name: GRID P40-4C
Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=4096x2160, max_instance=6
nvidia-284
Available instances: 4
Device API: vfio-pci
Name: GRID P40-6C
Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=4096x2160, max_instance=4
nvidia-285
Available instances: 3
Device API: vfio-pci
Name: GRID P40-8C
Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=4096x2160, max_instance=3
nvidia-286
Available instances: 2
Device API: vfio-pci
Name: GRID P40-12C
Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=4096x2160, max_instance=2
nvidia-287
Available instances: 1
Device API: vfio-pci
Name: GRID P40-24C
Description: num_heads=1, frl_config=60, framebuffer=24576M, max_resolution=4096x2160, max_instance=1
nvidia-46
Available instances: 24
Device API: vfio-pci
Name: GRID P40-1Q
Description: num_heads=4, frl_config=60, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
nvidia-47
Available instances: 12
Device API: vfio-pci
Name: GRID P40-2Q
Description: num_heads=4, frl_config=60, framebuffer=2048M, max_resolution=7680x4320, max_instance=12
nvidia-48
Available instances: 8
Device API: vfio-pci
Name: GRID P40-3Q
Description: num_heads=4, frl_config=60, framebuffer=3072M, max_resolution=7680x4320, max_instance=8
nvidia-49
Available instances: 6
Device API: vfio-pci
Name: GRID P40-4Q
Description: num_heads=4, frl_config=60, framebuffer=4096M, max_resolution=7680x4320, max_instance=6
nvidia-50
Available instances: 4
Device API: vfio-pci
Name: GRID P40-6Q
Description: num_heads=4, frl_config=60, framebuffer=6144M, max_resolution=7680x4320, max_instance=4
nvidia-51
Available instances: 3
Device API: vfio-pci
Name: GRID P40-8Q
Description: num_heads=4, frl_config=60, framebuffer=8192M, max_resolution=7680x4320, max_instance=3
nvidia-52
Available instances: 2
Device API: vfio-pci
Name: GRID P40-12Q
Description: num_heads=4, frl_config=60, framebuffer=12288M, max_resolution=7680x4320, max_instance=2
nvidia-53
Available instances: 1
Device API: vfio-pci
Name: GRID P40-24Q
Description: num_heads=4, frl_config=60, framebuffer=24576M, max_resolution=7680x4320, max_instance=1
nvidia-54
Available instances: 24
Device API: vfio-pci
Name: GRID P40-1A
Description: num_heads=1, frl_config=60, framebuffer=1024M, max_resolution=1280x1024, max_instance=24
nvidia-55
Available instances: 12
Device API: vfio-pci
Name: GRID P40-2A
Description: num_heads=1, frl_config=60, framebuffer=2048M, max_resolution=1280x1024, max_instance=12
nvidia-56
Available instances: 8
Device API: vfio-pci
Name: GRID P40-3A
Description: num_heads=1, frl_config=60, framebuffer=3072M, max_resolution=1280x1024, max_instance=8
nvidia-57
Available instances: 6
Device API: vfio-pci
Name: GRID P40-4A
Description: num_heads=1, frl_config=60, framebuffer=4096M, max_resolution=1280x1024, max_instance=6
nvidia-58
Available instances: 4
Device API: vfio-pci
Name: GRID P40-6A
Description: num_heads=1, frl_config=60, framebuffer=6144M, max_resolution=1280x1024, max_instance=4
nvidia-59
Available instances: 3
Device API: vfio-pci
Name: GRID P40-8A
Description: num_heads=1, frl_config=60, framebuffer=8192M, max_resolution=1280x1024, max_instance=3
nvidia-60
Available instances: 2
Device API: vfio-pci
Name: GRID P40-12A
Description: num_heads=1, frl_config=60, framebuffer=12288M, max_resolution=1280x1024, max_instance=2
nvidia-61
Available instances: 1
Device API: vfio-pci
Name: GRID P40-24A
Description: num_heads=1, frl_config=60, framebuffer=24576M, max_resolution=1280x1024, max_instance=1
nvidia-62
Available instances: 24
Device API: vfio-pci
Name: GRID P40-1B
Description: num_heads=4, frl_config=45, framebuffer=1024M, max_resolution=5120x2880, max_instance=24
NVIDIA Virtual GPU (vGPU) device types (each profile name under mdev_supported_types ends with an A/B/C/Q type suffix):
Type | Recommended use
---|---
A | Virtual Applications (vApps)
B | Virtual Desktops (vPC)
C | AI / machine learning / training (vCS or vWS)
Q | Virtual Workstations (vWS)
I have two plans for carving up the card:
Two 12GB instances for machine learning => P40-12C
Four 6GB instances to try Microsoft Flight Simulator on Windows 10 and Blender on a Linux desktop => P40-6Q
The 6GB-framebuffer P40-6Q profile:
Here I prepare the 6GB P40-6Q profile for Microsoft Flight Simulator.
Look up the matching type under the mdev_supported_types directory:
# P40-6Q is the GRID name of the 6GB-framebuffer profile found in the mdevctl types output
# Name: GRID P40-6Q
domain=0000
bus=82
slot=00
function=0
cd /sys/class/mdev_bus/${domain}\:${bus}\:${slot}.${function}/mdev_supported_types/
grep -l P40-6Q nvidia-*/name
The output shows:
nvidia-50/name
Check how many instances this vGPU type still allows:
cat nvidia-50/available_instances
The result is:
4
Note
The available_instances value decreases as vGPUs are allocated. For example, the Nvidia Tesla P10 can host four 6GB instances; every time a P40-6Q mdev device is created, available_instances drops by 1 until it reaches 0.
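A short sketch that makes this countdown visible (it assumes the P40-6Q type nvidia-50 and the Tesla P10's PCI address from above):
cd /sys/class/mdev_bus/0000:82:00.0/mdev_supported_types
cat nvidia-50/available_instances   # e.g. 4 before any P40-6Q vGPU exists
echo "$(uuidgen)" > nvidia-50/create
cat nvidia-50/available_instances   # now one less, e.g. 3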
To create a vGPU device, write a random UUID into the create file under the chosen profile's directory:
UUID=`uuidgen`
echo "$UUID" > nvidia-284/create
Now check the mdev devices:
# ls -lh /sys/bus/mdev/devices/
lrwxrwxrwx 1 root root 0 Jun 10 14:33 e991023e-0f0e-484a-8763-df6b6874b82e -> ../../../devices/pci0000:80/0000:80:02.0/0000:82:00.0/e991023e-0f0e-484a-8763-df6b6874b82e
A new symlink for the virtual vGPU device has been added under /sys/bus/mdev/devices/.
Repeat this three more times to create four vGPU instances in total.
Check the vGPU (mdev) instances with mdevctl:
mdevctl list
The output shows that there are now four vGPUs:
e991023e-0f0e-484a-8763-df6b6874b82e 0000:82:00.0 nvidia-284
23501256-ff15-439a-98b1-e4f6d01e459f 0000:82:00.0 nvidia-284
58fe7cf4-e9de-41f4-ae4b-c424a2a81193 0000:82:00.0 nvidia-284
e19fa267-ff3a-4ce8-bcf6-6ae402871085 0000:82:00.0 nvidia-284
Managing mdev devices with mdevctl (create and destroy)
The manual approach above requires creating and checking devices by poking at files in the /sys filesystem, which is tedious. The mdevctl tool provides a complete workflow for creating, inspecting, and deleting vGPU devices. Below, the steps above are redone with mdevctl, which is far more convenient.
First, as before, use mdevctl types to list all the types the system's GPU supports and pick suitable profiles (not repeated here). From that list I chose P40-12C and P40-6Q, which correspond to nvidia-286 and nvidia-50 respectively.
Earlier, following the official documentation, I used virsh nodedev-dumpxml to obtain the GPU's full configuration (domain, bus, slot, and function). There is actually a simpler way: nvidia-smi already provides this information, it just needs a small transformation. Its output contains a Bus-Id of 00000000:82:00.0; drop the four leading zeros and you get the 0000:82:00.0 we actually want.
Generate four random UUIDs:
uuid -n 4
This prints four random UUIDs:
334852fe-079b-11ee-9fc7-77463608f467
3348556a-079b-11ee-9fc8-7fb0c612aedd
334855e2-079b-11ee-9fc9-83e0dccb6713
33485650-079b-11ee-9fca-8f6415d2734c
These will be used as the mdev device identifiers.
Create the vGPU profiles with the mdevctl start command:
mdevctl start -u 334852fe-079b-11ee-9fc7-77463608f467 -p 0000:82:00.0 -t nvidia-50
mdevctl start -u 3348556a-079b-11ee-9fc8-7fb0c612aedd -p 0000:82:00.0 -t nvidia-50
mdevctl start -u 334855e2-079b-11ee-9fc9-83e0dccb6713 -p 0000:82:00.0 -t nvidia-50
mdevctl start -u 33485650-079b-11ee-9fca-8f6415d2734c -p 0000:82:00.0 -t nvidia-50
Running mdevctl list now shows the four vGPU devices:
334855e2-079b-11ee-9fc9-83e0dccb6713 0000:82:00.0 nvidia-50
33485650-079b-11ee-9fca-8f6415d2734c 0000:82:00.0 nvidia-50
334852fe-079b-11ee-9fc7-77463608f467 0000:82:00.0 nvidia-50
3348556a-079b-11ee-9fc8-7fb0c612aedd 0000:82:00.0 nvidia-50
To make the profiles persistent, simply use mdevctl define -a -u UUID:
mdevctl define -a -u 334855e2-079b-11ee-9fc9-83e0dccb6713
mdevctl define -a -u 33485650-079b-11ee-9fca-8f6415d2734c
mdevctl define -a -u 334852fe-079b-11ee-9fc7-77463608f467
mdevctl define -a -u 3348556a-079b-11ee-9fc8-7fb0c612aedd
OK, it is that simple.
Removing a vGPU device is just as easy with mdevctl stop -u UUID, for example:
mdevctl stop -u 334855e2-079b-11ee-9fc9-83e0dccb6713
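The whole lifecycle can also be scripted; a compact sketch (UUIDs are generated on the fly, profile and PCI address match the example above):
# Create four P40-6Q vGPUs, persist them, then list the result
for i in 1 2 3 4; do
  u=$(uuidgen)
  mdevctl start -u "$u" -p 0000:82:00.0 -t nvidia-50
  mdevctl define -a -u "$u"      # persist the running device so it can be auto-started
done
mdevctl list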
Adding vGPU devices to a virtual machine (failed)
Note
For this section I followed the SUSE documentation, but the VM failed to start, so I switched to the NVIDIA manual (NVIDIA Docs Hub > NVIDIA AI Enterprise > Red Hat Enterprise Linux with KVM Deployment Guide > Setting Up NVIDIA vGPU Devices); see the next section.
Get the GPU's full virsh configuration (already done above):
virsh nodedev-dumpxml pci_0000_82_00_0 | egrep 'domain|bus|slot|function'
which gave us:
 <domain>0</domain>
<bus>130</bus>
<slot>0</slot>
<function>0</function>
<address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
So the configuration for the four vGPU devices we are now assembling is as follows:
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='334855e2-079b-11ee-9fc9-83e0dccb6713'/>
</source>
<address type='pci' domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='33485650-079b-11ee-9fca-8f6415d2734c'/>
</source>
<address type='pci' domain='0x0000' bus='0x82' slot='0x00' function='0x1'/>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='334852fe-079b-11ee-9fc7-77463608f467'/>
</source>
<address type='pci' domain='0x0000' bus='0x82' slot='0x00' function='0x2'/>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='3348556a-079b-11ee-9fc8-7fb0c612aedd'/>
</source>
<address type='pci' domain='0x0000' bus='0x82' slot='0x00' function='0x3'/>
</hostdev>
Note
For the Q series (virtual workstations), set display='on'; for the C series (machine learning), set display='off'.
I ran into two errors here:
error: XML error: Attempted double use of PCI Address 0000:82:00.0
Cause: I had given every vGPU device the same PCI address, <address type='pci' domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>. After some experimentation, giving each device a different function= value fixed it.
error: unsupported configuration: graphics device is needed for attribute value 'display=on' in <hostdev>
I got this while configuring the Q series with 'display=on'; for now I changed it to 'display=off'.
With those two errors resolved, I started the y-k8s-n-1 VM (with the four vGPUs above attached), and it failed with:
error: Failed to start domain 'y-k8s-n-1'
error: internal error: qemu unexpectedly closed the monitor: 2023-06-10T15:22:45.243840Z qemu-system-x86_64: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/334855e2-079b-11ee-9fc9-83e0dccb6713,display=off,bus=pci.130,multifunction=on,addr=0x0: warning: vfio 334855e2-079b-11ee-9fc9-83e0dccb6713: Could not enable error recovery for the device
2023-06-10T15:22:45.272339Z qemu-system-x86_64: -device vfio-pci,id=hostdev1,sysfsdev=/sys/bus/mdev/devices/33485650-079b-11ee-9fca-8f6415d2734c,display=off,bus=pci.130,addr=0x0.0x1: vfio 33485650-079b-11ee-9fca-8f6415d2734c: error getting device from group 126: Input/output error
Verify all devices in group 126 are bound to vfio-<bus> or pci-stub and not already in use
Adding vGPU devices to a virtual machine (not fully successful)
Warning
I have an unresolved problem: after splitting one GPU into four vGPUs, mdevctl shows the devices starting normally, but when several vGPUs are added to the same VM the attach succeeds and yet starting the VM fails.
Adding just one vGPU to a VM, however, works fine.
Note
Reference: the NVIDIA manual, NVIDIA Docs Hub > NVIDIA AI Enterprise > Red Hat Enterprise Linux with KVM Deployment Guide > Setting Up NVIDIA vGPU Devices.
Use virsh nodedev-dumpxml to print the full details of the mdev-capable PCI device (essentially an XML rendition of what mdevctl list reports):
virsh nodedev-dumpxml pci_0000_82_00_0
This time the output is complete (earlier we filtered it):
<device>
<name>pci_0000_82_00_0</name>
<path>/sys/devices/pci0000:80/0000:80:02.0/0000:82:00.0</path>
<parent>pci_0000_80_02_0</parent>
<driver>
<name>nvidia</name>
</driver>
<capability type='pci'>
<class>0x030200</class>
<domain>0</domain>
<bus>130</bus>
<slot>0</slot>
<function>0</function>
<product id='0x1b39'>GP102GL [Tesla P10]</product>
<vendor id='0x10de'>NVIDIA Corporation</vendor>
<capability type='mdev_types'>
<type id='nvidia-241'>
<name>GRID P40-1B4</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-58'>
<name>GRID P40-6A</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-48'>
<name>GRID P40-3Q</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-286'>
<name>GRID P40-12C</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-56'>
<name>GRID P40-3A</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-46'>
<name>GRID P40-1Q</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-284'>
<name>GRID P40-6C</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-54'>
<name>GRID P40-1A</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-62'>
<name>GRID P40-1B</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-52'>
<name>GRID P40-12Q</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-60'>
<name>GRID P40-12A</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-50'>
<name>GRID P40-6Q</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-156'>
<name>GRID P40-2B</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-59'>
<name>GRID P40-8A</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-49'>
<name>GRID P40-4Q</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-287'>
<name>GRID P40-24C</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-57'>
<name>GRID P40-4A</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-47'>
<name>GRID P40-2Q</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-285'>
<name>GRID P40-8C</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-55'>
<name>GRID P40-2A</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-283'>
<name>GRID P40-4C</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-53'>
<name>GRID P40-24Q</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-215'>
<name>GRID P40-2B4</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-61'>
<name>GRID P40-24A</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
<type id='nvidia-51'>
<name>GRID P40-8Q</name>
<deviceAPI>vfio-pci</deviceAPI>
<availableInstances>0</availableInstances>
</type>
</capability>
<iommuGroup number='80'>
<address domain='0x0000' bus='0x82' slot='0x00' function='0x0'/>
</iommuGroup>
<numa node='1'/>
<pci-express>
<link validity='cap' port='0' speed='8' width='16'/>
<link validity='sta' speed='2.5' width='16'/>
</pci-express>
</capability>
</device>
The <iommuGroup> element identifies a set of devices that, thanks to the IOMMU and the PCI bus topology, are isolated from all other devices. These devices must be unbound from the host drivers (the host must not use them) before they can be assigned to a guest VM.
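The IOMMU group of the physical GPU can be inspected directly from sysfs; a quick sketch (group 80 is the number shown in the dump above):
# Every device in this group must be released by host drivers before passthrough
ls /sys/bus/pci/devices/0000:82:00.0/iommu_group/devices/
readlink /sys/bus/pci/devices/0000:82:00.0/iommu_group   # path ends in .../iommu_groups/80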
Based on the mdevctl list output:
e991023e-0f0e-484a-8763-df6b6874b82e 0000:82:00.0 nvidia-284
23501256-ff15-439a-98b1-e4f6d01e459f 0000:82:00.0 nvidia-284
58fe7cf4-e9de-41f4-ae4b-c424a2a81193 0000:82:00.0 nvidia-284
e19fa267-ff3a-4ce8-bcf6-6ae402871085 0000:82:00.0 nvidia-284
I wrote the following files, vgpu_1.yaml through vgpu_4.yaml, one per vGPU:
<device>
<parent>pci_0000_82_00_0</parent>
<capability type="mdev">
<type id="nvidia-50"/>
<uuid>334855e2-079b-11ee-9fc9-83e0dccb6713</uuid>
</capability>
</device>
<device>
<parent>pci_0000_82_00_0</parent>
<capability type="mdev">
<type id="nvidia-50"/>
<uuid>33485650-079b-11ee-9fca-8f6415d2734c</uuid>
</capability>
</device>
<device>
<parent>pci_0000_82_00_0</parent>
<capability type="mdev">
<type id="nvidia-50"/>
<uuid>334852fe-079b-11ee-9fc7-77463608f467</uuid>
</capability>
</device>
<device>
<parent>pci_0000_82_00_0</parent>
<capability type="mdev">
<type id="nvidia-50"/>
<uuid>3348556a-079b-11ee-9fc8-7fb0c612aedd</uuid>
</capability>
</device>
Define the first vGPU device:
virsh nodedev-define vgpu_1.yaml
The output reports:
Node device 'mdev_334855e2_079b_11ee_9fc9_83e0dccb6713_0000_82_00_0' defined from 'vgpu_1.yaml'
Then define the 2nd, 3rd, and 4th vGPUs:
virsh nodedev-define vgpu_2.yaml
virsh nodedev-define vgpu_3.yaml
virsh nodedev-define vgpu_4.yaml
Check the active mediated devices (add --inactive to list the inactive ones):
virsh nodedev-list --cap mdev
Set the vGPU devices to start automatically:
virsh nodedev-autostart mdev_334852fe_079b_11ee_9fc7_77463608f467_0000_82_00_0
virsh nodedev-autostart mdev_3348556a_079b_11ee_9fc8_7fb0c612aedd_0000_82_00_0
virsh nodedev-autostart mdev_334855e2_079b_11ee_9fc9_83e0dccb6713_0000_82_00_0
virsh nodedev-autostart mdev_33485650_079b_11ee_9fca_8f6415d2734c_0000_82_00_0
Add the vGPU devices to the VM:
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='334855e2-079b-11ee-9fc9-83e0dccb6713'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='33485650-079b-11ee-9fca-8f6415d2734c'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='334852fe-079b-11ee-9fc7-77463608f467'/>
</source>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='3348556a-079b-11ee-9fc8-7fb0c612aedd'/>
</source>
</hostdev>
Note that no detailed PCI address configuration is set here.
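As an aside, instead of editing the domain XML by hand, each <hostdev> block above can also be attached with virsh; a small sketch (the file name vgpu-hostdev-1.xml is hypothetical and would contain one of those blocks):
# Persistently attach a single vGPU hostdev definition to the VM
virsh attach-device y-k8s-n-1 vgpu-hostdev-1.xml --config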
Argh, the same error again:
error: Failed to start domain 'y-k8s-n-1'
error: internal error: qemu unexpectedly closed the monitor: 2023-06-10T16:13:50.247914Z qemu-system-x86_64: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/334855e2-079b-11ee-9fc9-83e0dccb6713,display=off,bus=pci.7,addr=0x0: warning: vfio 334855e2-079b-11ee-9fc9-83e0dccb6713: Could not enable error recovery for the device
2023-06-10T16:13:50.272484Z qemu-system-x86_64: -device vfio-pci,id=hostdev1,sysfsdev=/sys/bus/mdev/devices/33485650-079b-11ee-9fca-8f6415d2734c,display=off,bus=pci.8,addr=0x0: vfio 33485650-079b-11ee-9fca-8f6415d2734c: error getting device from group 126: Input/output error
Verify all devices in group 126 are bound to vfio-<bus> or pci-stub and not already in use
Check the system log with dmesg -T:
[Sun Jun 11 00:13:50 2023] [nvidia-vgpu-vfio] 334855e2-079b-11ee-9fc9-83e0dccb6713: vGPU migration disabled
[Sun Jun 11 00:13:50 2023] [nvidia-vgpu-vfio] 33485650-079b-11ee-9fca-8f6415d2734c: start failed. status: 0x0
The second vGPU had already failed at startup.
Inspecting with virsh edit y-k8s-n-1 shows that libvirt automatically assigned each NVIDIA Virtual GPU (vGPU) a complete PCI configuration (domain, bus, slot, and function):
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='334855e2-079b-11ee-9fc9-83e0dccb6713'/>
</source>
<address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='33485650-079b-11ee-9fca-8f6415d2734c'/>
</source>
<address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='334852fe-079b-11ee-9fc7-77463608f467'/>
</source>
<address type='pci' domain='0x0000' bus='0x09' slot='0x00' function='0x0'/>
</hostdev>
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='3348556a-079b-11ee-9fc8-7fb0c612aedd'/>
</source>
<address type='pci' domain='0x0000' bus='0x0a' slot='0x00' function='0x0'/>
</hostdev>
But it looks like this approach still leads to a PCI device conflict.
Note
Following the NVIDIA document, libvirt expands the simplified configuration into the form above as expected, but the startup error was not resolved.
However, one vGPU per VM works
Since y-k8s-n-1 fails to start with multiple vGPUs attached (it actually fails while starting the second vGPU, reporting that vfio-<bus> or pci-stub is already in use), will attaching just a single vGPU to a VM work?
Revise y-k8s-n-1 to contain only one block (one vGPU):
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='334855e2-079b-11ee-9fc9-83e0dccb6713'/>
</source>
</hostdev>
Sure enough, this time virsh start y-k8s-n-1 comes up normally.
Since one vGPU per VM works, what about adding the second vGPU to another VM? The answer: that works too.
Revise y-k8s-n-2 to add the second vGPU:
<hostdev mode='subsystem' type='mdev' managed='no' model='vfio-pci' display='off'>
<source>
<address uuid='33485650-079b-11ee-9fca-8f6415d2734c'/>
</source>
</hostdev>
The second VM also starts normally.
Now the nvidia-smi output shows the system running two vGPUs:
Sun Jun 11 22:41:23 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.03 Driver Version: 510.85.03 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA Graphics... On | 00000000:82:00.0 Off | 0 |
| N/A 41C P8 18W / 150W | 11474MiB / 23040MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 11656 C+G vgpu 5712MiB |
| 0 N/A N/A 13194 C+G vgpu 5712MiB |
+-----------------------------------------------------------------------------+
Of the physical GPU's 23040MiB of memory, 11474MiB (roughly 12GB) is in use, and there are two GPU processes, both named vgpu.
nvidia-smi vgpu now shows detailed vGPU information:
Sun Jun 11 22:41:38 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.03 Driver Version: 510.85.03 |
|---------------------------------+------------------------------+------------+
| GPU Name | Bus-Id | GPU-Util |
| vGPU ID Name | VM ID VM Name | vGPU-Util |
|=================================+==============================+============|
| 0 NVIDIA Graphics Device | 00000000:82:00.0 | 0% |
| 3251634243 GRID P40-6Q | 10a1... y-k8s-n-1 | 0% |
| 3251634251 GRID P40-6Q | b102... y-k8s-n-2 | 0% |
+---------------------------------+------------------------------+------------+
Two virtual machines, y-k8s-n-1 and y-k8s-n-2, each hold one GRID P40-6Q NVIDIA display device, i.e. two vGPUs in total.
Note
In other words, so far creating vGPUs and assigning them one per VM works, and they can be added to VMs; what remains unsolved is how to use multiple vGPUs inside a single VM.
Trying again to add multiple vGPUs to one VM (success, with a catch)
A detail mentioned in "Please ensure all devices within the iommu_group are bound to their vfio bus driver Error" reminded me of my much earlier exercise on OVMF-based passthrough of a GPU and NVMe storage, where the PCIe pass-through requirements included:
An IOMMU Group is the smallest set of physical devices that can be passed through to a virtual machine; an entire IOMMU Group must be handed to a single VM.
I found "Error when allocating multiple vGPUs in a single VM with Ubuntu KVM hypervisor", which describes exactly my situation, but the original thread never solved it.
Earlier I created four vGPUs with mdevctl, and the IOMMU records for them appear in the system log:
dmesg | grep -i -e DMAR -e IOMMU
The trailing "Adding to iommu group 123", "Removing from iommu group 123", and renewed "Adding to iommu group 123" entries in this output are traces of my earlier creating, deleting, and re-creating of mdev devices.
[Fri Jun 9 23:43:37 2023] Command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-73-generic root=UUID=caa4193b-9222-49fe-a4b3-89f1cb417e6a ro intel_iommu=on iommu=pt vfio-pci.ids=144d:a80a intel_pstate=enable processor.max_cstate=1 intel_idle.max_cstate=1 rd.driver.blacklist=nouveau,rivafb,nvidiafb,rivatv
[Fri Jun 9 23:43:37 2023] ACPI: DMAR 0x000000007B7E7000 000294 (v01 HP ProLiant 00000001 HP 00000001)
[Fri Jun 9 23:43:37 2023] ACPI: Reserving DMAR table memory at [mem 0x7b7e7000-0x7b7e7293]
[Fri Jun 9 23:43:38 2023] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.15.0-73-generic root=UUID=caa4193b-9222-49fe-a4b3-89f1cb417e6a ro intel_iommu=on iommu=pt vfio-pci.ids=144d:a80a intel_pstate=enable processor.max_cstate=1 intel_idle.max_cstate=1 rd.driver.blacklist=nouveau,rivafb,nvidiafb,rivatv
[Fri Jun 9 23:43:38 2023] DMAR: IOMMU enabled
[Fri Jun 9 23:43:39 2023] DMAR: Host address width 46
[Fri Jun 9 23:43:39 2023] DMAR: DRHD base: 0x000000fbffc000 flags: 0x0
[Fri Jun 9 23:43:39 2023] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap d2078c106f0466 ecap f020df
[Fri Jun 9 23:43:39 2023] DMAR: DRHD base: 0x000000c7ffc000 flags: 0x1
[Fri Jun 9 23:43:39 2023] DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap d2078c106f0466 ecap f020df
[Fri Jun 9 23:43:39 2023] DMAR: RMRR base: 0x00000079174000 end: 0x00000079176fff
[Fri Jun 9 23:43:39 2023] DMAR: RMRR base: 0x000000791f4000 end: 0x000000791f7fff
[Fri Jun 9 23:43:39 2023] DMAR: RMRR base: 0x000000791de000 end: 0x000000791f3fff
[Fri Jun 9 23:43:39 2023] DMAR: RMRR base: 0x000000791cb000 end: 0x000000791dbfff
[Fri Jun 9 23:43:39 2023] DMAR: RMRR base: 0x000000791dc000 end: 0x000000791ddfff
[Fri Jun 9 23:43:39 2023] DMAR-IR: IOAPIC id 10 under DRHD base 0xfbffc000 IOMMU 0
[Fri Jun 9 23:43:39 2023] DMAR-IR: IOAPIC id 8 under DRHD base 0xc7ffc000 IOMMU 1
[Fri Jun 9 23:43:39 2023] DMAR-IR: IOAPIC id 9 under DRHD base 0xc7ffc000 IOMMU 1
[Fri Jun 9 23:43:39 2023] DMAR-IR: HPET id 0 under DRHD base 0xc7ffc000
[Fri Jun 9 23:43:39 2023] DMAR-IR: Queued invalidation will be enabled to support x2apic and Intr-remapping.
[Fri Jun 9 23:43:39 2023] DMAR-IR: Enabled IRQ remapping in x2apic mode
[Fri Jun 9 23:43:40 2023] iommu: Default domain type: Passthrough (set via kernel command line)
[Fri Jun 9 23:43:40 2023] DMAR: No ATSR found
[Fri Jun 9 23:43:40 2023] DMAR: No SATC found
[Fri Jun 9 23:43:40 2023] DMAR: dmar0: Using Queued invalidation
[Fri Jun 9 23:43:40 2023] DMAR: dmar1: Using Queued invalidation
[Fri Jun 9 23:43:40 2023] pci 0000:00:00.0: Adding to iommu group 0
[Fri Jun 9 23:43:40 2023] pci 0000:00:01.0: Adding to iommu group 1
[Fri Jun 9 23:43:40 2023] pci 0000:00:01.1: Adding to iommu group 2
[Fri Jun 9 23:43:40 2023] pci 0000:00:02.0: Adding to iommu group 3
...
[Fri Jun 9 23:43:40 2023] pci 0000:ff:1f.2: Adding to iommu group 94
[Fri Jun 9 23:43:40 2023] DMAR: Intel(R) Virtualization Technology for Directed I/O
[Fri Jun 9 23:43:44 2023] pci 0000:04:10.0: Adding to iommu group 95
[Fri Jun 9 23:43:44 2023] pci 0000:04:10.4: Adding to iommu group 96
...
[Fri Jun 9 23:43:44 2023] pci 0000:04:13.2: Adding to iommu group 120
[Fri Jun 9 23:43:44 2023] pci 0000:04:13.1: Adding to iommu group 121
[Fri Jun 9 23:43:44 2023] pci 0000:04:13.3: Adding to iommu group 122
[Sat Jun 10 14:33:53 2023] vfio_mdev e991023e-0f0e-484a-8763-df6b6874b82e: Adding to iommu group 123
[Sat Jun 10 14:39:16 2023] vfio_mdev 58fe7cf4-e9de-41f4-ae4b-c424a2a81193: Adding to iommu group 124
[Sat Jun 10 14:39:25 2023] vfio_mdev e19fa267-ff3a-4ce8-bcf6-6ae402871085: Adding to iommu group 125
[Sat Jun 10 14:39:27 2023] vfio_mdev 23501256-ff15-439a-98b1-e4f6d01e459f: Adding to iommu group 126
[Sat Jun 10 20:47:23 2023] vfio_mdev e19fa267-ff3a-4ce8-bcf6-6ae402871085: Removing from iommu group 125
[Sat Jun 10 20:47:23 2023] vfio_mdev e19fa267-ff3a-4ce8-bcf6-6ae402871085: MDEV: detaching iommu
[Sat Jun 10 22:28:09 2023] vfio_mdev 58fe7cf4-e9de-41f4-ae4b-c424a2a81193: Removing from iommu group 124
[Sat Jun 10 22:28:09 2023] vfio_mdev 58fe7cf4-e9de-41f4-ae4b-c424a2a81193: MDEV: detaching iommu
[Sat Jun 10 22:28:17 2023] vfio_mdev 23501256-ff15-439a-98b1-e4f6d01e459f: Removing from iommu group 126
[Sat Jun 10 22:28:17 2023] vfio_mdev 23501256-ff15-439a-98b1-e4f6d01e459f: MDEV: detaching iommu
[Sat Jun 10 22:28:23 2023] vfio_mdev e991023e-0f0e-484a-8763-df6b6874b82e: Removing from iommu group 123
[Sat Jun 10 22:28:23 2023] vfio_mdev e991023e-0f0e-484a-8763-df6b6874b82e: MDEV: detaching iommu
[Sat Jun 10 22:33:39 2023] vfio_mdev 334852fe-079b-11ee-9fc7-77463608f467: Adding to iommu group 123
[Sat Jun 10 22:33:46 2023] vfio_mdev 3348556a-079b-11ee-9fc8-7fb0c612aedd: Adding to iommu group 124
[Sat Jun 10 22:33:52 2023] vfio_mdev 334855e2-079b-11ee-9fc9-83e0dccb6713: Adding to iommu group 125
[Sat Jun 10 22:33:59 2023] vfio_mdev 33485650-079b-11ee-9fca-8f6415d2734c: Adding to iommu group 126
The groups 123 through 126 added here are the IOMMU groups of the four vGPU devices.
The corresponding entries can be found in the kernel's sysfs:
for iommu_group in {123..126}; do ls -lh /sys/kernel/iommu_groups/${iommu_group}/devices/ | grep -v total; done
All of these vGPU devices live under /sys/devices/pci0000:80/0000:80:02.0/0000:82:00.0/:
lrwxrwxrwx 1 root root 0 Jun 15 08:53 334852fe-079b-11ee-9fc7-77463608f467 -> ../../../../devices/pci0000:80/0000:80:02.0/0000:82:00.0/334852fe-079b-11ee-9fc7-77463608f467
lrwxrwxrwx 1 root root 0 Jun 15 08:55 3348556a-079b-11ee-9fc8-7fb0c612aedd -> ../../../../devices/pci0000:80/0000:80:02.0/0000:82:00.0/3348556a-079b-11ee-9fc8-7fb0c612aedd
lrwxrwxrwx 1 root root 0 Jun 15 08:55 334855e2-079b-11ee-9fc9-83e0dccb6713 -> ../../../../devices/pci0000:80/0000:80:02.0/0000:82:00.0/334855e2-079b-11ee-9fc9-83e0dccb6713
lrwxrwxrwx 1 root root 0 Jun 14 02:16 33485650-079b-11ee-9fca-8f6415d2734c -> ../../../../devices/pci0000:80/0000:80:02.0/0000:82:00.0/33485650-079b-11ee-9fca-8f6415d2734c
I noticed that in the Ubuntu documentation "Virtualisation with QEMU", the systemctl status nvidia-vgpu-mgr output differs from mine. That document gives example log entries showing what a working vGPU pass-through to a guest looks like:
$ systemctl status nvidia-vgpu-mgr
Loaded: loaded (/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2021-09-14 07:30:19 UTC; 3min 58s ago
Process: 1559 ExecStart=/usr/bin/nvidia-vgpu-mgr (code=exited, status=0/SUCCESS)
Main PID: 1564 (nvidia-vgpu-mgr)
Tasks: 1 (limit: 309020)
Memory: 1.1M
CGroup: /system.slice/nvidia-vgpu-mgr.service
└─1564 /usr/bin/nvidia-vgpu-mgr
Sep 14 07:30:19 node-watt systemd[1]: Starting NVIDIA vGPU Manager Daemon...
Sep 14 07:30:19 node-watt systemd[1]: Started NVIDIA vGPU Manager Daemon.
Sep 14 07:30:20 node-watt nvidia-vgpu-mgr[1564]: notice: vmiop_env_log: nvidia-vgpu-mgr daemon started
# Entries when a guest gets a vGPU passed
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: (0x0): gpu-pci-id : 0x4100
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: (0x0): vgpu_type : Quadro
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: (0x0): Framebuffer: 0x1dc000000
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: (0x0): Virtual Device Id: 0x1db4:0x1252
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: (0x0): FRL Value: 60 FPS
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: ######## vGPU Manager Information: ########
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: Driver Version: 470.68
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: (0x0): vGPU supported range: (0x70001, 0xb0001)
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: (0x0): vGPU migration enabled
Sep 14 08:29:50 node-watt nvidia-vgpu-mgr[2866]: notice: vmiop_log: display_init inst: 0 successful
# Entries when a guest grabs a license
Sep 15 06:55:50 node-watt nvidia-vgpu-mgr[4260]: notice: vmiop_log: (0x0): vGPU license state: Unlicensed (Unrestricted)
Sep 15 06:55:52 node-watt nvidia-vgpu-mgr[4260]: notice: vmiop_log: (0x0): vGPU license state: Licensed
# In the guest the card is then fully recognized and enabled
$ nvidia-smi -a | grep -A 2 "Licensed Product"
vGPU Software Licensed Product
Product Name : NVIDIA RTX Virtual Workstation
License Status : Licensed
Checking my own host's nvidia-vgpu-mgr log, the service that previously started cleanly now shows a series of error messages:
● nvidia-vgpu-mgr.service - NVIDIA vGPU Manager Daemon
Loaded: loaded (/lib/systemd/system/nvidia-vgpu-mgr.service; enabled; vendor preset: enabled)
Active: active (running) since Sat 2023-06-10 00:13:12 CST; 5 days ago
Process: 3760 ExecStart=/opt/vgpu_unlock/vgpu_unlock /usr/bin/nvidia-vgpu-mgr (code=exited, status=0/SUCCESS)
Main PID: 3764 (vgpu_unlock)
Tasks: 11 (limit: 464054)
Memory: 41.0M
CPU: 5min 18.731s
CGroup: /system.slice/nvidia-vgpu-mgr.service
├─3764 /bin/python3 /opt/vgpu_unlock/vgpu_unlock -f /usr/bin/nvidia-vgpu-mgr
└─3784 /usr/bin/nvidia-vgpu-mgr
Jun 14 11:03:21 zcloud.staging.huatai.me nvidia-vgpu-mgr[26737]: error: vmiop_log: (0x1): Thread for engine 0x0 could not join with error 0x5
Jun 14 11:03:21 zcloud.staging.huatai.me nvidia-vgpu-mgr[26737]: error: vmiop_log: (0x1): Failed to free thread event for engine 0x0. Error: 0x5
Jun 14 11:03:21 zcloud.staging.huatai.me nvidia-vgpu-mgr[26737]: error: vmiop_log: (0x1): Thread for engine 0x4 could not join with error 0x5
Jun 14 11:03:21 zcloud.staging.huatai.me nvidia-vgpu-mgr[26737]: error: vmiop_log: (0x1): Failed to free thread event for engine 0x4. Error: 0x5
Jun 14 11:03:21 zcloud.staging.huatai.me nvidia-vgpu-mgr[26737]: error: vmiop_log: (0x1): Thread for engine 0x5 could not join with error 0x5
Jun 14 11:03:21 zcloud.staging.huatai.me nvidia-vgpu-mgr[26737]: error: vmiop_log: (0x1): Failed to free thread event for engine 0x5. Error: 0x5
Jun 14 11:03:21 zcloud.staging.huatai.me nvidia-vgpu-mgr[26737]: error: vmiop_log: display_init failed for inst: 1
Jun 14 11:03:21 zcloud.staging.huatai.me nvidia-vgpu-mgr[26737]: error: vmiop_env_log: (0x1): vmiope_process_configuration failed with 0x1f
Jun 14 11:03:21 zcloud.staging.huatai.me nvidia-vgpu-mgr[26737]: error: vmiop_env_log: (0x1): plugin_initialize failed with error:0x1f
Jun 14 11:03:25 zcloud.staging.huatai.me nvidia-vgpu-mgr[26737]: notice: vmiop_log: (0x0): Srubbing completed but notification missed
It occurred to me that my VMs do not yet have the guest GRID packages installed, nor are they configured to reach a license server (see "Installing the NVIDIA license server"). Could that be why the second vGPU cannot be added?
On second thought, no: the vfio device error occurs while the VM is initializing at startup, before the guest OS has booted, so the guest GRID software inside the VM cannot be a factor yet. Headache...
I added the four vGPUs back to the y-k8s-n-1 VM; startup still failed. This time I checked the output of journalctl -u nvidia-vgpu-mgr --no-pager:
Jun 15 09:03:58 zcloud.staging.huatai.me systemd[1]: Stopping NVIDIA vGPU Manager Daemon...
Jun 15 09:03:58 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Deactivated successfully.
Jun 15 09:03:58 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Unit process 3784 (nvidia-vgpu-mgr) remains running after unit stopped.
Jun 15 09:03:58 zcloud.staging.huatai.me systemd[1]: Stopped NVIDIA vGPU Manager Daemon.
Jun 15 09:03:58 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Consumed 5min 18.755s CPU time.
Jun 15 09:03:58 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Found left-over process 3784 (nvidia-vgpu-mgr) in control group while starting unit. Ignoring.
Jun 15 09:03:58 zcloud.staging.huatai.me systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jun 15 09:03:58 zcloud.staging.huatai.me systemd[1]: Starting NVIDIA vGPU Manager Daemon...
Jun 15 09:03:58 zcloud.staging.huatai.me systemd[1]: Started NVIDIA vGPU Manager Daemon.
Jun 15 09:03:58 zcloud.staging.huatai.me bash[30237]: vgpu_unlock loaded.
Jun 15 09:03:58 zcloud.staging.huatai.me nvidia-vgpu-mgr[30237]: vgpu_unlock loaded.
Jun 15 09:03:58 zcloud.staging.huatai.me nvidia-vgpu-mgr[30253]: vgpu_unlock loaded.
Jun 15 09:03:58 zcloud.staging.huatai.me nvidia-vgpu-mgr[30253]: notice: vmiop_env_log: nvidia-vgpu-mgr daemon started
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid 334855e2-079b-11ee-9fc9-83e0dccb6713 GPU PCI id 00:82:00.0 config params vgpu_type_id=50
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=50
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_env_log: Successfully updated env symbols!
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x0): gpu-pci-id : 0x8200
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x0): vgpu_type : Quadro
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x0): Framebuffer: 0x164000000
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x0): Virtual Device Id: 0x1b38:0x11ec
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x0): FRL Value: 60 FPS
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: ######## vGPU Manager Information: ########
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: Driver Version: 510.85.03
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x0): vGPU supported range: (0x70001, 0xd0001)
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x0): Detected ECC enabled on physical GPU.
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x0): Guest usable FB size is reduced due to ECC.
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31019]: vgpu_unlock loaded.
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: vgpu_unlock loaded.
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_env_log: vmiop-env: guest_max_gpfn:0x0
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_env_log: (0x0): Received start call from nvidia-vgpu-vfio module: mdev uuid 334855e2-079b-11ee-9fc9-83e0dccb6713 GPU PCI id 00:82:00.0 config params vgpu_type_id=50
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_env_log: (0x0): pluginconfig: vgpu_type_id=50
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_env_log: Successfully updated env symbols!
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x0): vGPU migration enabled
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: op_type: 0xa0810115 failed.
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): gpu-pci-id : 0x8200
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): vgpu_type : Quadro
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): Framebuffer: 0x164000000
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): Virtual Device Id: 0x1b38:0x11ec
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): FRL Value: 60 FPS
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: ######## vGPU Manager Information: ########
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: Driver Version: 510.85.03
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: display_init inst: 0 successful
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): vGPU supported range: (0x70001, 0xd0001)
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): Detected ECC enabled on physical GPU.
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): Guest usable FB size is reduced due to ECC.
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): Init frame copy engine: syncing...
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_env_log: (0x1): Received start call from nvidia-vgpu-vfio module: mdev uuid 33485650-079b-11ee-9fca-8f6415d2734c GPU PCI id 00:82:00.0 config params vgpu_type_id=50
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_env_log: (0x1): pluginconfig: vgpu_type_id=50
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x1): gpu-pci-id : 0x8200
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x1): vgpu_type : Quadro
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x1): Framebuffer: 0x164000000
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x1): Virtual Device Id: 0x1b38:0x11ec
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: (0x1): FRL Value: 60 FPS
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: ######## vGPU Manager Information: ########
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: notice: vmiop_log: Driver Version: 510.85.03
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_log: (0x1): init_device_instance failed for inst 1 with error 1 (multiple vGPUs in a VM not supported)
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_log: (0x1): Initialization: init_device_instance failed error 1
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_log: (0x1): Thread for engine 0x0 could not join with error 0x5
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_log: (0x1): Failed to free thread event for engine 0x0. Error: 0x5
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_log: (0x1): Thread for engine 0x4 could not join with error 0x5
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_log: (0x1): Failed to free thread event for engine 0x4. Error: 0x5
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_log: (0x1): Thread for engine 0x5 could not join with error 0x5
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_log: (0x1): Failed to free thread event for engine 0x5. Error: 0x5
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_log: display_init failed for inst: 1
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_env_log: (0x1): vmiope_process_configuration failed with 0x1f
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31016]: error: vmiop_env_log: (0x1): plugin_initialize failed with error:0x1f
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): vGPU migration enabled
Jun 15 10:04:14 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: display_init inst: 0 successful
Jun 15 10:04:18 zcloud.staging.huatai.me nvidia-vgpu-mgr[31035]: notice: vmiop_log: (0x0): Srubbing completed but notification missed
What a blunder
It turns out the error log was perfectly clear: multiple vGPUs in a VM not supported
Assigning multiple vGPUs to a single VM is subject to hardware restrictions.
According to Virtual GPU Software R525 for Ubuntu Release Notes #Multiple vGPU Support, the hardware requirements are very strict: on the NVIDIA Pascal GPU architecture (which my Nvidia Tesla P10 GPU compute card belongs to), only two Tesla P40 vGPU profiles support assigning multiple vGPUs to the same VM: P40-24Q and P40-24C (NVIDIA, are you kidding me? 24Q and 24C are each a whole P40 card). Genuinely useful multi-vGPU, i.e. attaching multiple vGPUs of any Q / C profile to the same VM, is only supported from the NVIDIA Volta GPU architecture onwards.
Sigh. After several days of tinkering, it turns out my Nvidia Tesla P10 GPU compute card is simply too low-end to support multiple vGPUs in a single VM. Frustrating...
Cleaning up the environment and starting over
I have finally finished wrestling with NVIDIA Virtual GPU (vGPU), leaving behind a pile of meandering, on-and-off notes...
I have decided to split the Nvidia Tesla P10 GPU compute card into 2 NVIDIA Virtual GPUs (vGPU) to build GPU Kubernetes; the consolidated steps are collected in vGPU快速起步 (vGPU quick start). Here I first clean up the vGPU environment left over from the repeated experiments in this document, so I can start fresh:
# Remove the persistent mdev definitions (profiles)
mdevctl undefine -u 334855e2-079b-11ee-9fc9-83e0dccb6713
mdevctl undefine -u 33485650-079b-11ee-9fca-8f6415d2734c
mdevctl undefine -u 334852fe-079b-11ee-9fc7-77463608f467
mdevctl undefine -u 3348556a-079b-11ee-9fc8-7fb0c612aedd
# Stop (remove) the active mdev devices
mdevctl stop -u 334855e2-079b-11ee-9fc9-83e0dccb6713
mdevctl stop -u 33485650-079b-11ee-9fca-8f6415d2734c
mdevctl stop -u 334852fe-079b-11ee-9fc7-77463608f467
mdevctl stop -u 3348556a-079b-11ee-9fc8-7fb0c612aedd
# Finally, list mdev devices to confirm everything has been cleaned up
mdevctl list
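Rather than hard-coding every UUID, the same cleanup can be expressed generically. A minimal sketch, assuming the UUID is the first column of mdevctl list output:

# Stop and undefine every mdev device known to mdevctl (sketch, not the exact commands I ran above)
for uuid in $(mdevctl list --defined | awk '{print $1}'); do
    mdevctl stop -u "$uuid"       # remove the active device if it is running
    mdevctl undefine -u "$uuid"   # drop the persistent definition
done
mdevctl list --defined            # should print nothing once cleanup succeeded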
Warning
At the moment I actually run the dual-vGPU setup built per vGPU快速起步 (12 GB of framebuffer per vGPU).
nvidia-smi cleanup
After the cleanup above I followed vGPU快速起步 again, but still hit an error when starting y-k8s-n-1:

nvidia-smi state not cleaned up: y-k8s-n-1 fails to start
error: Failed to start domain 'y-k8s-n-1'
error: internal error: qemu unexpectedly closed the monitor: 2023-06-15T07:08:55.663867Z qemu-system-x86_64: -device vfio-pci,id=hostdev0,sysfsdev=/sys/bus/mdev/devices/3eb9d560-0b31-11ee-91a9-bb28039c61eb,display=off,bus=pci.7,addr=0x0: vfio 3eb9d560-0b31-11ee-91a9-bb28039c61eb: error getting device from group 123: Input/output error
Verify all devices in group 123 are bound to vfio-<bus> or pci-stub and not already in use
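The qemu/vfio error above complains about "group 123", i.e. the VFIO/IOMMU group of the mdev device. As a hedged diagnostic sketch (the group number is taken from the error message and will differ on other hosts), one can list what belongs to that group and which driver each member is bound to:

# Inspect the IOMMU group named in the vfio error (123 comes from the message above)
GROUP=123
for dev in /sys/kernel/iommu_groups/$GROUP/devices/*; do
    name=$(basename "$dev")
    if [ -e "$dev/driver" ]; then
        drv=$(basename "$(readlink "$dev/driver")")
    else
        drv=none
    fi
    echo "$name -> driver: $drv"
done
mdevctl list    # active mdev devices and their parent GPUs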
At this point I found that nvidia-smi vgpu still listed the 2 P40-6Q vGPUs configured earlier (that startup attempt had failed, but the configuration lingered):

nvidia-smi vgpu shows 2 leftover P40-6Q vGPUs that were never cleaned up
Thu Jun 15 15:12:03 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.03 Driver Version: 510.85.03 |
|---------------------------------+------------------------------+------------+
| GPU Name | Bus-Id | GPU-Util |
| vGPU ID Name | VM ID VM Name | vGPU-Util |
|=================================+==============================+============|
| 0 NVIDIA Graphics Device | 00000000:82:00.0 | 0% |
| 3251634329 GRID P40-6Q | 10a1... y-k8s-n-1 | 0% |
| 3251634341 GRID P40-6Q | 10a1... y-k8s-n-1 | 0% |
+---------------------------------+------------------------------+------------+
Moreover, at this point nvidia-smi also showed one leftover P40-6Q vGPU that had been allocated back then:

nvidia-smi shows 1 leftover P40-6Q that was never cleaned up
Thu Jun 15 15:18:05 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.03 Driver Version: 510.85.03 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA Graphics... On | 00000000:82:00.0 Off | 0 |
| N/A 39C P8 18W / 150W | 5762MiB / 23040MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 31016 C+G vgpu 5712MiB |
+-----------------------------------------------------------------------------+
Run systemctl restart nvidia-vgpu-mgr and then check journalctl -u nvidia-vgpu-mgr; sure enough, there are leftovers:

nvidia-vgpu-mgr shows 4 leftover processes (from the 2 previously used P40-6Q mdev devices)
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: Stopping NVIDIA vGPU Manager Daemon...
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Deactivated successfully.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Unit process 3784 (nvidia-vgpu-mgr) remains running after unit stopped.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Unit process 30232 (vgpu_unlock) remains running after unit stopped.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Unit process 30253 (nvidia-vgpu-mgr) remains running after unit stopped.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Unit process 31016 (nvidia-vgpu-mgr) remains running after unit stopped.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: Stopped NVIDIA vGPU Manager Daemon.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Consumed 18.441s CPU time.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Found left-over process 3784 (nvidia-vgpu-mgr) in control group while starting unit. Ignoring.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Found left-over process 30232 (vgpu_unlock) in control group while starting unit. Ignoring.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Found left-over process 30253 (nvidia-vgpu-mgr) in control group while starting unit. Ignoring.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: nvidia-vgpu-mgr.service: Found left-over process 31016 (nvidia-vgpu-mgr) in control group while starting unit. Ignoring.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies.
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: Starting NVIDIA vGPU Manager Daemon...
Jun 15 15:20:57 zcloud.staging.huatai.me systemd[1]: Started NVIDIA vGPU Manager Daemon.
Jun 15 15:20:57 zcloud.staging.huatai.me bash[34344]: vgpu_unlock loaded.
Jun 15 15:20:57 zcloud.staging.huatai.me nvidia-vgpu-mgr[34344]: vgpu_unlock loaded.
Jun 15 15:20:57 zcloud.staging.huatai.me nvidia-vgpu-mgr[34360]: vgpu_unlock loaded.
Jun 15 15:20:57 zcloud.staging.huatai.me nvidia-vgpu-mgr[34360]: notice: vmiop_env_log: nvidia-vgpu-mgr daemon started
Check the processes:

ps aux | grep nvidia-vgpu-mgr

and indeed the corresponding PIDs show up:

ps shows the leftover nvidia-vgpu-mgr processes corresponding to the 2 previously used P40-6Q mdev devices
root 3784 0.0 0.0 429044 2228 ? Ss Jun10 0:01 /usr/bin/nvidia-vgpu-mgr
root 30232 0.0 0.0 446312 49452 ? Sl 09:03 0:09 /bin/python3 /opt/vgpu_unlock/vgpu_unlock -f /usr/bin/nvidia-vgpu-mgr
root 30253 0.0 0.0 466376 9208 ? Ssl 09:03 0:00 /usr/bin/nvidia-vgpu-mgr
root 34340 0.0 0.0 438116 49256 ? Sl 15:20 0:00 /bin/python3 /opt/vgpu_unlock/vgpu_unlock -f /usr/bin/nvidia-vgpu-mgr
root 34360 0.0 0.0 474572 8456 ? Ssl 15:20 0:00 /usr/bin/nvidia-vgpu-mgr
The problem is that the vGPUs belonging to the already-destroyed mdev devices remain active. Running nvidia-smi vgpu -q shows the details:

nvidia-smi vgpu -q shows that the vGPUs of already-destroyed mdev devices are still active, so their resources on GPU 00000000:82:00.0 are never released
Active vGPUs : 2
vGPU ID : 3251634329
VM UUID : 10a12241-1e83-4b70-bc59-a33d7c6d063c
VM Name : y-k8s-n-1
vGPU Name : GRID P40-6Q
vGPU Type : 50
vGPU UUID : ed1f9055-0b20-11ee-90a2-c79b496fe3f9
MDEV UUID : 334855e2-079b-11ee-9fc9-83e0dccb6713
Guest Driver Version : N/A
License Status : N/A (Expiry: N/A)
GPU Instance ID : N/A
Accounting Mode : N/A
ECC Mode : Disabled
Accounting Buffer Size : 4000
Frame Rate Limit : N/A
PCI
Bus Id : 00000000:00:00.0
FB Memory Usage
Total : 6144 MiB
Used : 0 MiB
Free : 6144 MiB
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
vGPU ID : 3251634341
VM UUID : 10a12241-1e83-4b70-bc59-a33d7c6d063c
VM Name : y-k8s-n-1
vGPU Name : GRID P40-6Q
vGPU Type : 50
vGPU UUID : 00000000-0000-0000-0000-000000000000
MDEV UUID : 33485650-079b-11ee-9fca-8f6415d2734c
Guest Driver Version : N/A
License Status : N/A (Expiry: N/A)
GPU Instance ID : N/A
Accounting Mode : N/A
ECC Mode : Disabled
Accounting Buffer Size : 4000
Frame Rate Limit : N/A
PCI
Bus Id : 00000000:00:00.0
FB Memory Usage
Total : 6144 MiB
Used : 0 MiB
Free : 6144 MiB
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
This suggests the key lies in the state reported by nvidia-smi vgpu: the leftovers need to be cleared.
Checking nvidia-smi vgpu -h shows a matching option, -caa:

[-caa | --clear-accounted-apps]: Clears accounting information of the vGPU instance that have already terminated.

It can be used to clear the accounting information of vGPU instances that have already terminated.
Clear the accounting information of the terminated vGPU instances:

Clearing accounting information of terminated vGPU instances
nvidia-smi vgpu -caa

The leftover vGPU accounting information is reported as cleared:

Accounting information of the terminated vGPU instances has been cleared
Cleared Accounted PIDs for vGPU 3251634329
Cleared Accounted PIDs for vGPU 3251634341
But this did not solve the problem.
Blunder: I tried echoing 1 into /sys/class/mdev_bus/0000:82:00.0/reset, after which nvidia-smi could no longer detect the device at all:

Unable to determine the device handle for GPU 0000:82:00.0: Unknown Error
I tried rmmod on the nvidia-related kernel modules, but they were reported as in use. I ran

lsof | grep nvidia | awk '{print $2}' | sort -u

to find all processes holding them and killed them (a concrete sketch of this step follows below); only the [nvidia] kernel thread could not be killed. At this point lsmod | grep nvidia shows the modules are essentially no longer in use:

nvidia_vgpu_vfio       57344  0
nvidia              39174144  2
mdev                   28672  1 nvidia_vgpu_vfio
drm                   622592  4 drm_kms_helper,nvidia,mgag200
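A concrete version of that kill step might look like the following sketch; the [nvidia] kernel thread cannot be killed this way and is expected to survive:

# Kill every user-space process that still holds an nvidia device or library open
lsof 2>/dev/null | grep nvidia | awk '{print $2}' | sort -u | xargs -r kill -9
lsmod | grep nvidia    # check the remaining module reference counts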
Now the kernel modules can be unloaded in order:

rmmod nvidia_vgpu_vfio
rmmod nvidia

after which all nvidia-related modules are unloaded.
Load the nvidia module again:

# modprobe nvidia
# lsmod | grep nvidia
nvidia              39174144  0
drm                   622592  4 drm_kms_helper,nvidia,mgag200
At this point nvidia-smi no longer throws an error, but it reports that there are no devices:
No devices were found
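Before (or instead of) reinstalling anything, a few basic checks can help narrow down why nvidia-smi reports No devices were found. A minimal sketch of generic diagnostics, not steps from the original workflow:

# Basic checks when nvidia-smi reports "No devices were found"
dmesg | grep -iE 'nvrm|nvidia' | tail -n 50    # driver initialization errors usually show up here
ls /proc/driver/nvidia/gpus/                   # GPUs that the loaded nvidia driver has actually claimed
lspci -nnk -s 82:00.0                          # confirm the GPU is visible and which driver is bound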
I went through vgpu_unlock again (to reinstall the driver and reload the kernel modules); afterwards the kernel modules show up again:

nvidia_vgpu_vfio       57344  0
nvidia              39145472  2
mdev                   28672  1 nvidia_vgpu_vfio
drm                   622592  4 drm_kms_helper,nvidia,mgag200
However, nvidia-smi still reports No devices were found, even though the output of lspci -v -s 82:00.0 shows nothing abnormal:

82:00.0 3D controller: NVIDIA Corporation GP102GL [Tesla P10] (rev a1)
        Subsystem: NVIDIA Corporation GP102GL [Tesla P10]
        Physical Slot: 3
        Flags: bus master, fast devsel, latency 0, IRQ 16, NUMA node 1, IOMMU group 80
        Memory at c8000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3b000000000 (64-bit, prefetchable) [size=32G]
        Memory at 3b800000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_vgpu_vfio, nvidia
Note
Never mind. I gave up on this for now and stopped fiddling; in the end, the simplest fix is just to reboot the server...
References
Proxmox 7 vGPU – v2: an up-to-date guide covering vGPU configuration on the 5.15 kernel, and it actually works. Recommended.
Virtual GPU Software User Guide : Installing the Virtual GPU Manager Package for Linux KVM
Configuring the vGPU Manager for a Linux with KVM Hypervisor
Configuring NVIDIA Virtual GPU (vGPU) in a Linux VM on Lenovo ThinkSystem Servers
Ubuntu 22.04 LTS mdevctl manual: mdevctl, lsmdev - Mediated device management utility
Error when allocating multiple vGPUs in a single VM with Ubuntu KVM hypervisor