Installing the NVIDIA Virtual GPU Guest Driver
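The NVIDIA Virtual GPU Manager has been installed on the physical host (see Installing NVIDIA Virtual GPU Manager), and an NVIDIA Virtual GPU (vGPU) device has been added to the virtual machine through the Libvirt virtual machine manager. The next step is to actually use the vGPU, which means installing the guest driver inside the VM.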
Preparation
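Inside the virtual machine, install GCC and the Linux kernel headers as in the "Install GCC and Linux Kernel Headers" step of Installing NVIDIA Virtual GPU Manager. As with vgpu_unlock, also install dkms: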
sudo apt install gcc linux-headers-$(uname -r) dkms
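Because the driver's kernel module is compiled against the running kernel, it can help to confirm these prerequisites are really present before installing. The Python sketch below is a hypothetical helper (`missing_build_prereqs` is not part of any NVIDIA tool). It assumes Ubuntu's layout, where `linux-headers-$(uname -r)` provides `/lib/modules/<release>/build`.

```python
import os
import shutil


def missing_build_prereqs(kernel_release=None, modules_root="/lib/modules"):
    """Return a list of missing build prerequisites (hypothetical helper)."""
    if kernel_release is None:
        # Same value that `uname -r` prints
        kernel_release = os.uname().release
    missing = []
    # gcc compiles the kernel module; dkms rebuilds it after kernel upgrades
    for tool in ("gcc", "dkms"):
        if shutil.which(tool) is None:
            missing.append(tool)
    # Assumption: the headers package provides /lib/modules/<release>/build
    headers = os.path.join(modules_root, kernel_release, "build")
    if not os.path.isdir(headers):
        missing.append("linux-headers-" + kernel_release)
    return missing


if __name__ == "__main__":
    problems = missing_build_prereqs()
    if problems:
        print("missing:", " ".join(problems))
    else:
        print("all build prerequisites present")
```

Run it inside the VM. An empty result means gcc, dkms and the headers for the running kernel were all found.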
Installing the nvidia-linux-grid Guest Driver
On Ubuntu, install the nvidia-linux-grid guest driver:
sudo dpkg -i nvidia-linux-grid-510_510.85.02_amd64.deb
Then reboot the virtual machine.
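After the reboot, a quick check is whether the `nvidia` kernel modules are loaded. Running `nvidia-smi` is the usual way to do this. Another option is `/proc/modules`, which holds the same information `lsmod` prints, with the module name as the first field of each line. The following is a minimal Python sketch that assumes the relevant module names start with `nvidia`. The helper name is hypothetical.

```python
import os


def nvidia_modules_loaded(proc_modules_text):
    """Return names of loaded kernel modules whose name starts with 'nvidia'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # The first field of each /proc/modules line is the module name
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


if __name__ == "__main__" and os.path.exists("/proc/modules"):
    with open("/proc/modules") as f:
        loaded = nvidia_modules_loaded(f.read())
    print("loaded NVIDIA modules:", ", ".join(loaded) if loaded else "none")
```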
Configuring the License
In the Ubuntu virtual machine, edit /etc/nvidia/gridd.conf:
# Description: Set License Server Address
# Data type: string
# Format: "<address>"
ServerAddress=192.168.6.248
# Description: Set License Server port number
# Data type: integer
# Format: <port>, default is 7070
ServerPort=7070
# Description: Set Feature to be enabled
# Data type: integer
# Possible values:
# 0 => for unlicensed state
# 1 => for NVIDIA vGPU (Optional, autodetected as per vGPU type)
# 2 => for NVIDIA RTX Virtual Workstation
# 4 => for NVIDIA Virtual Compute Server
# All other values reserved
FeatureType=4
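The file consists of plain `KEY=VALUE` lines with `#` comments, so mistakes such as an unsupported `FeatureType` are easy to catch before starting the daemon. The Python sketch below is a hypothetical checker, not an NVIDIA tool. The allowed `FeatureType` values and the default port 7070 come from the comments in the configuration above.

```python
import os

# FeatureType values listed in the gridd.conf comments above
FEATURE_TYPES = {
    0: "unlicensed",
    1: "NVIDIA vGPU",
    2: "NVIDIA RTX Virtual Workstation",
    4: "NVIDIA Virtual Compute Server",
}


def parse_gridd_conf(text):
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Values may be quoted, e.g. ServerAddress="<address>"
        conf[key.strip()] = value.strip().strip('"')
    return conf


def check_gridd_conf(conf):
    """Return a list of human-readable problems (empty list means OK)."""
    problems = []
    if not conf.get("ServerAddress"):
        problems.append("ServerAddress is not set")
    port = conf.get("ServerPort", "7070")  # 7070 is the documented default
    if not port.isdigit() or not 0 < int(port) < 65536:
        problems.append("ServerPort is not a valid port: %r" % port)
    feature = conf.get("FeatureType")
    if feature is not None and (
        not feature.isdigit() or int(feature) not in FEATURE_TYPES
    ):
        problems.append("FeatureType must be one of %s" % sorted(FEATURE_TYPES))
    return problems


if __name__ == "__main__" and os.path.exists("/etc/nvidia/gridd.conf"):
    with open("/etc/nvidia/gridd.conf") as f:
        conf = parse_gridd_conf(f.read())
    for problem in check_gridd_conf(conf) or ["gridd.conf looks OK"]:
        print(problem)
```

Note that this only checks the file's syntax. It cannot tell whether the License Server actually offers the requested feature.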
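One catch: the nvidia-gridd service cannot start in a virtual machine that has no vGPU attached. So I went back to Installing NVIDIA Virtual GPU Manager and added a vGPU to the virtual machine y-k8s-n-1.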
Start nvidia-gridd:
systemctl start nvidia-gridd
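The License Server must support the license feature that the client requests. For example, suppose the License Server only provides Quadro-Virtual-DWS, but the client is configured with FeatureType=4 and therefore requests Virtual Compute Server. After the client starts nvidia-gridd, the log shows an error similar to the following: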
Jun 16 00:14:46 y-k8s-n-1 systemd[1]: Starting NVIDIA Grid Daemon...
Jun 16 00:14:46 y-k8s-n-1 systemd[1]: Started NVIDIA Grid Daemon.
Jun 16 00:14:46 y-k8s-n-1 nvidia-gridd[23795]: Started (23795)
Jun 16 00:14:46 y-k8s-n-1 nvidia-gridd[23795]: vGPU Software package (0)
Jun 16 00:14:46 y-k8s-n-1 nvidia-gridd[23795]: Ignore service provider licensing
Jun 16 00:14:46 y-k8s-n-1 nvidia-gridd[23795]: Unable to fetch the client configuration token file
Jun 16 00:14:47 y-k8s-n-1 nvidia-gridd[23795]: Service provider detection complete.
Jun 16 00:14:47 y-k8s-n-1 nvidia-gridd[23795]: Calling load_byte_array(tra)
Jun 16 00:14:47 y-k8s-n-1 nvidia-gridd[23795]: Acquiring license. (Info: http://192.168.6.248:7070/request; NVIDIA Virtual Compute Server)
Jun 16 00:14:47 y-k8s-n-1 nvidia-gridd[23795]: Calling load_byte_array(tra)
Jun 16 00:14:48 y-k8s-n-1 nvidia-gridd[23795]: Failed to acquire/renew license from license server. (Info: http://192.168.6.248:7070/request; NVIDIA Virtual Compute Server - Error: [1,7E2,2,0[7000000B,0,702C7]] Requested feature was not found.)
Jun 16 00:14:48 y-k8s-n-1 nvidia-gridd[23795]: Calling load_byte_array(tra)
Jun 16 00:14:49 y-k8s-n-1 nvidia-gridd[23795]: License acquired successfully. (Info: http://192.168.6.248:7070/request, NVIDIA Virtual Compute Server; Expiry: 2023-6-16 16:14:59 GMT)
In practice, however, a Quadro-Virtual-DWS license appears to be interchangeable with Virtual Compute Server. The nvidia-gridd log shows the license being acquired successfully in the end.
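When reading the log, the last license message is the one that matters. Here, a failure is followed by a successful acquisition. The Python sketch below uses hypothetical helpers to extract each acquire or fail event together with the feature name. Its regular expression was written only against the sample lines above, so other driver versions may word these messages differently.

```python
import re

# Matches the two outcome messages seen in the nvidia-gridd log above and
# captures the feature name that follows the license server URL.
_OUTCOME = re.compile(
    r"(?P<status>License acquired successfully|Failed to acquire/renew license)"
    r".*?\(Info: \S+?[;,] (?P<feature>[^;)]+?)(?: - Error|;|\))"
)


def license_events(log_text):
    """Return (ok, feature) tuples in log order; ok is True on success."""
    events = []
    for line in log_text.splitlines():
        m = _OUTCOME.search(line)
        if m:
            ok = m.group("status").startswith("License acquired")
            events.append((ok, m.group("feature")))
    return events


def currently_licensed(log_text):
    """True if the most recent outcome in the log is a successful acquisition."""
    events = license_events(log_text)
    return bool(events) and events[-1][0]
```

On the VM, the input can come from `journalctl -u nvidia-gridd --no-pager`.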
At this point, the License Feature Usage page on the License Server shows that the license count has dropped by one. In other words, one license is now held by the vGPU client.