GPU acceleration for linuxserver/calibre

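While working through the troubleshooting case "HTTP/2 multiplexing / persistent connections conflicting with the local Safari browser cache", I discovered by accident that the graphics stack running inside the container (a Wayland display server) was not using the GPU at all: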

Log showing CPU software encoding (GPU not used) and a Segmentation fault
INFO:data_websocket:Successfully stopped all streams for display 'primary'.
INFO:data_websocket:Preparing to start capture for display='primary': Res=2560x1256, Offset=0x0
INFO:data_websocket:Registered Wayland cursor callback for 'primary'
INFO:data_websocket:Video chunk sender for 'primary' cancelled.
[Wayland] Configuring Output: 2560x1256 @ 60.00 FPS (Scale 1.00)
INFO:data_websocket:Video chunk sender for 'primary' finished.
INFO:data_websocket:Video chunk sender started for display 'primary'.
INFO:data_websocket:SUCCESS: Capture started for 'primary'.
[Wayland] CPU encoding selected (use_cpu=true or vaapi_node=-1).
[Wayland] Decision: No GPU Encoder available -> Using CPU Software Encoding.
Stream settings active -> Res: 2560x1256 | FPS: 60.0 | Stripes: 8 | Mode: JPEG | Quality: 60 | PaintOver Q: 90 (Trigger: 15f) | Damage Thresh: 10f | Damage Dur: 20f
/defaults/startwm_wayland.sh: line 18:   669 Segmentation fault         (core dumped) labwc > /dev/null 2>&1
No desktop processes found to terminate.
[svc-de] Wayland mode: Waiting for socket at /config/.XDG/wayland-1...
[svc-de] /config/.XDG/wayland-1 found launching de

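The log above shows the following anomaly: pure CPU software encoding under a very high encoding load. The line Decision: No GPU Encoder available -> Using CPU Software Encoding. confirms that no GPU encoder is in use, while the rendered resolution is as high as 2560x1256 @ 60 FPS.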
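A minimal sketch for spotting this decision without reading the whole log. It only looks for two message strings that appear in the logs on this page; the function name encoder_path is my own illustration, and other Selkies versions may word these messages differently.

```shell
# encoder_path: read container log text on stdin and print the encoder path
# named by the last matching log line: "nvenc", "cpu" or "unknown".
# Both patterns are copied from the log excerpts on this page (assumption:
# the messages keep this wording).
encoder_path() {
    awk '
        /NVENC Encoder initialized successfully/ { result = "nvenc" }
        /Using CPU Software Encoding/            { result = "cpu" }
        END { print (result == "" ? "unknown" : result) }
    '
}
```

It can be fed from the container log, for example: docker logs calibre-backend 2>&1 | encoder_path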

The CPU in my server is an Intel Xeon E-2274G with built-in Intel UHD Graphics 630, and the system also has an Nvidia Tesla P10 GPU compute card installed, so /dev/dri contains the nodes of two devices:

Two devices under /dev/dri
$ ls -lh /dev/dri
total 0
drwxr-xr-x 2 root root        120 Jun  9 14:55 by-path
crw-rw---- 1 root video  226,   0 Jun  9 14:55 card0
crw-rw---- 1 root video  226,   1 Jun  9 14:55 card1
crw-rw---- 1 root render 226, 128 Jun  9 14:55 renderD128
crw-rw---- 1 root render 226, 129 Jun  9 14:55 renderD129

$ ls -l /sys/class/drm/card*/device/
...
/sys/class/drm/card0/device/:
...
lrwxrwxrwx 1 root root         0 Jun  9 14:55 driver -> ../../../bus/pci/drivers/i915
...
/sys/class/drm/card1/device/:
...
lrwxrwxrwx 1 root root           0 Jun  9 14:55 driver -> ../../../../bus/pci/drivers/nvidia
...

This shows that card0 is the Intel GPU and card1 is the NVIDIA GPU.
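The same lookup can be scripted instead of reading the ls output by eye. This is a sketch under the assumption that every DRM node exposes a device/driver symlink in sysfs, as in the listing above; the function name list_drm_drivers and its optional directory argument are hypothetical helpers added for illustration.

```shell
# list_drm_drivers [SYSFS_DRM_DIR]
# Print "<node> <driver>" for each DRM card or render node that has a bound
# kernel driver. The directory argument is an illustrative addition so the
# function can be pointed at a test tree; it defaults to /sys/class/drm.
list_drm_drivers() (
    drm_dir="${1:-/sys/class/drm}"
    for node in "$drm_dir"/card[0-9]* "$drm_dir"/renderD[0-9]*; do
        # Skip connector entries such as card0-HDMI-A-1.
        case "${node##*/}" in
            *-*) continue ;;
        esac
        # Skip nodes without a driver symlink (also covers unmatched globs).
        [ -L "$node/device/driver" ] || continue
        driver="$(basename "$(readlink "$node/device/driver")")"
        printf '%s %s\n' "${node##*/}" "$driver"
    done
)
```

On the host described here, calling list_drm_drivers with no argument should pair card0 with i915 and card1 with nvidia, matching the symlinks shown above.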

Intel GPU

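Warning

I configured the Intel GPU with guidance from Gemini, but it did not work. The log showed Failed to create processing pipeline config: 12 (the requested VAProfile is not supported).

NVIDIA's NVENC/NVDEC hardware driver is far more flexible and tolerant than Intel's strict iHD driver when handling non-standard resolutions, high-DPI screens and color space conversion (I420/NV12).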

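Because the NVIDIA GPU is more powerful and more fully featured, I intended to keep it for Machine Learning and use the lightweight Intel GPU to accelerate the graphics stack here. That would make the best use of each device and also save power.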

  calibre-backend:
    image: lscr.io/linuxserver/calibre:latest
    container_name: calibre-backend
    restart: always

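    #privileged: true # 👈 Key: grants kernel privileges so the container can run mount/umount internally (too broad, so I eventually dropped it)
    #
    # ** I no longer map /dev/sda into the container; the host mounts sda at /mnt/kobo and that directory is mapped instead, which avoids high-risk container privileges
    #ipc: host         # 👈 Core fix 1: share the host's IPC namespace
    #cap_add:
    #  - SYS_ADMIN     # 👈 Core fix 2: allow the container to mount/umount the physical Kobo device internally
    #  - SYS_RAWIO     # 👈 Core fix 3: allow the container raw I/O on the passed-through /dev/sda block device
    #devices:
    #  - "/dev/sda:/dev/sda" # 👈 Core: pass the host's entire Kobo disk through to the container
    # ************

    # ports: # 👈 Note: do not map 8080 to the host here; Nginx owns host port 8080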

However, after merely mapping the dri devices into the container, docker logs calibre-backend showed this error log:

VAAPI initialization fails: the Intel GPU is not configured successfully and CPU software encoding is still used
...
INFO:data_websocket:Video chunk sender started for display 'primary'.
[pcmflux] Attempting to connect to PulseAudio device: output.monitor with latency: 10ms
INFO:data_websocket:pcmflux audio capture started successfully.
INFO:data_websocket:pcmflux audio chunk broadcasting task started.
[pcmflux] ERROR: pa_simple_new() failed: No such entity
  (Could not find the device named: 'output.monitor')
[Wayland] Encode Node Index: 0 | Driver: ../../../bus/pci/drivers/i915
[Wayland] Initializing Unified VAAPI Encoder...
[Parsed_scale_vaapi_1 @ 0x7e3185189800] Failed to create processing pipeline config: 12 (the requested VAProfile is not supported).
[Parsed_scale_vaapi_1 @ 0x7e3185189800] Failed to configure output pad on Parsed_scale_vaapi_1
[Wayland] Failed to init VAAPI: Failed to config filter graph. Falling back to CPU.
[Wayland] Decision: No GPU Encoder available -> Using CPU Software Encoding.
...

The log line Driver: ../../../bus/pci/drivers/i915 shows that device passthrough succeeded: the container really did obtain a hardware handle to the Intel integrated GPU, Intel UHD Graphics 630.

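However, Failed to create processing pipeline config: 12 (the requested VAProfile is not supported) indicates that when FFmpeg/KasmVNC inside the container tried to use a particular color space scaling or H.264 encoding configuration at the hardware level (for example a specific Low-Power mode or a specific color bit depth), the Intel Media Driver rejected the request because the VAProfile of this integrated GPU does not support that pipeline.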
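One way to see what the driver actually offers is to list its profiles and entrypoints with vainfo (from libva-utils, which may need to be installed separately in the container). The sketch below summarizes that listing; it assumes the usual "VAProfile... : VAEntrypoint..." line format, and the function name vaapi_summary is hypothetical. It only reports what the driver advertises, so it is a diagnostic aid rather than a fix.

```shell
# vaapi_summary: read `vainfo` output on stdin and report whether the driver
# lists an H.264 encode entrypoint (EncSlice or EncSliceLP) and a
# video-processing entrypoint (VideoProc, used by VAAPI scaling filters).
# Assumption: lines look like "VAProfileH264Main : VAEntrypointEncSlice".
vaapi_summary() {
    awk '
        /VAProfileH264/ && /VAEntrypointEncSlice/ { h264_enc = 1 }
        /VAEntrypointVideoProc/                   { vpp = 1 }
        END {
            print "h264_encode=" (h264_enc ? "yes" : "no")
            print "video_proc=" (vpp ? "yes" : "no")
        }
    '
}
```

A possible invocation against the Intel render node is: vainfo --display drm --device /dev/dri/renderD128 | vaapi_summary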

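This error usually arises because the open-source VA-API driver version that the Linuxserver container ships by default, or the color space format used inside the container, conflicts with the hardware acceleration pipeline. The suggested fix was to add three environment variables to the environment section of docker-compose.yml to select the device.

The color space mode also needs adjusting to unblock the scale_vaapi pipeline. The log shows that the container is currently using Colorspace: I420 (Limited Range), and Intel's hardware scaling filter (scale_vaapi) readily fails with the VAProfile is not supported pipeline configuration error when it is handed I420 frames, which are not a hardware-native format. Intel integrated GPUs prefer the native NV12 format. So one more low-level KasmVNC/Selkies video stream parameter is appended to environment, forcing hardware encoding to use the full color range, which is the most compatible setting: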

  calibre-backend:
    image: lscr.io/linuxserver/calibre:latest
    container_name: calibre-backend
    restart: always
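    #privileged: true # 👈 Key: grants kernel privileges so the container can run mount/umount internally (too broad, so I eventually dropped it)
    ipc: host         # 👈 Core fix 1: share the host's IPC namespace
    cap_add:
      - SYS_ADMIN     # 👈 Core fix 2: allow the container to mount/umount the physical Kobo device internally
      - SYS_RAWIO     # 👈 Core fix 3: allow the container raw I/O on the passed-through /dev/sda block device
    devices:
      - "/dev/sda:/dev/sda" # 👈 Core: pass the host's entire Kobo disk through to the container
      # 🪐 Targeted passthrough: expose only the Intel integrated GPU nodes to the container, where it then appears as the primary card
      - "/dev/dri/card0:/dev/dri/card0" # 👈 Core fix 4: pass through the host GPU's DRM card node (needed to set up a DRM session even with software encoding)
      - "/dev/dri/renderD128:/dev/dri/renderD128"
    # ports: # 👈 Note: do not map 8080 to the host here; Nginx owns host port 8080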
    expose:
      - "8080"

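Note

The UHD P630 integrated GPU built into the Intel Xeon E-2274G has a hardware video encoder (MFE/MFX) whose hardware profile (VAProfile) does not support running the scale_vaapi hardware scaling filter directly on I420 frames at a non-standard resolution (a height such as 1256 is unaligned because it is not divisible by 16 or 32). The driver returns 12 (Unsupported), the FFmpeg filter pipeline fails, and encoding immediately falls back to the CPU.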
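The alignment arithmetic is easy to check. In the sketch below the helper names align_down, align_up and is_aligned are my own; whether Selkies rounds up or down when it aligns a resolution is not stated in the logs, so both directions are shown.

```shell
# align_down VALUE STEP: largest multiple of STEP that is <= VALUE.
align_down() { echo $(( $1 / $2 * $2 )); }

# align_up VALUE STEP: smallest multiple of STEP that is >= VALUE.
align_up() { echo $(( ($1 + $2 - 1) / $2 * $2 )); }

# is_aligned VALUE STEP: succeed if VALUE is an exact multiple of STEP.
is_aligned() { [ $(( $1 % $2 )) -eq 0 ]; }
```

For the stream in question, the width 2560 is already a multiple of 16, while the height 1256 leaves a remainder of 8; the nearest 16-aligned heights are 1248 and 1264.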

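So, to avoid Failed to create processing pipeline config: 12 (the requested VAProfile is not supported), the idea is to use the resolution alignment option built into the Selkies container (force_aligned_resolution) to constrain the high-DPI resolution to a standard hardware size that is divisible by 16.

This requires adding the following two environment variables: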

Forcing 16-pixel alignment
calibre-backend:
    # ... 其余不变 ...
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Asia/Shanghai
      - LIBVA_DRIVER_NAME=iHD
      - render_node=/dev/dri/renderD128
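      # 🪐 Core setting 1: force the client screen resolution to align to 16 pixels (2560x1256 is constrained to a standard hardware size)
      - force_aligned_resolution=true
      # 🪐 Core setting 2: force full color range H.264 encoding to bypass the I420 scaling limitation
      - h264_fullcolor=true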

Warning

Frustratingly, it still failed in exactly the same way!

NVIDIA GPU

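Since I could not get the Intel GPU working and did not want to keep fighting it, I switched to the Nvidia Tesla P10 GPU compute card, whose NVENC/NVDEC hardware driver is far more tolerant than Intel's strict iHD driver when handling non-standard resolutions, high-DPI screens and color space conversion (I420/NV12).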

After some trial and error, I completed the configuration with guidance from Gemini:

Configuring the NVIDIA P10 as the rendering hardware
  calibre-backend:
    image: lscr.io/linuxserver/calibre:latest
    container_name: calibre-backend
    restart: always

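    #privileged: true # 👈 Key: grants kernel privileges so the container can run mount/umount internally (too broad, so I eventually dropped it)
    #
    # ** I no longer map /dev/sda into the container; the host mounts sda at /mnt/kobo and that directory is mapped instead, which avoids high-risk container privileges
    #ipc: host         # 👈 Core fix 1: share the host's IPC namespace
    #cap_add:
    #  - SYS_ADMIN     # 👈 Core fix 2: allow the container to mount/umount the physical Kobo device internally
    #  - SYS_RAWIO     # 👈 Core fix 3: allow the container raw I/O on the passed-through /dev/sda block device
    #devices:
    #  - "/dev/sda:/dev/sda" # 👈 Core: pass the host's entire Kobo disk through to the container
    # ************

    # ports: # 👈 Note: do not map 8080 to the host here; Nginx owns host port 8080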
    expose:
      - "8080"
    networks:
      - calibre-network
    environment:
      - PUID=1000
      - PGID=1000
      - TZ=Asia/Shanghai
      # 🪐 Core setting 1: make the NVIDIA GPU visible to the container
      - NVIDIA_VISIBLE_DEVICES=GPU-794d1de5-b8c7-9b49-6fe3-f96f8fd98a19 # GPU UUID as reported by nvidia-smi --query-gpu=index,name,uuid --format=csv
      - NVIDIA_DRIVER_CAPABILITIES=all,graphics,utility,video
      # 🪐 Core setting 2: point Selkies at the render node that belongs to the Tesla P10 (renderD129 on this host)
      - DRI_NODE=/dev/dri/renderD129
    # 🪐 Core setting 3: request the GPU through the NVIDIA container runtime
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              # 🪐 Core fix 3: allocate precisely, injecting only the GPU with this UUID into the container
              device_ids: ['GPU-794d1de5-b8c7-9b49-6fe3-f96f8fd98a19']
              capabilities: [gpu, graphics, video]
    volumes:
      - ./calibre-config:/config
      - ./library:/library       # Key: the full book library and the auto-generated metadata.db live here
      - ./import:/import         # Separate directory for automatic book import
      - /mnt/kobo:/media/kobo:shared # Map the Kobo directory that the host has already mounted to /media/kobo in the container
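The UUID used in the configuration above comes from the nvidia-smi query quoted in its comment. A small sketch for pulling the UUID out of that CSV output, assuming the usual "index, name, uuid" header followed by one comma-separated row per GPU; the function name gpu_uuid_by_index is hypothetical.

```shell
# gpu_uuid_by_index INDEX: read the output of
#   nvidia-smi --query-gpu=index,name,uuid --format=csv
# on stdin and print the UUID of the GPU with the given index.
# Assumption: fields are separated by ", " and GPU names contain no commas.
gpu_uuid_by_index() {
    awk -F', *' -v want="$1" 'NR > 1 && $1 == want { print $3 }'
}
```

For example: nvidia-smi --query-gpu=index,name,uuid --format=csv | gpu_uuid_by_index 0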

Then docker logs calibre-backend shows the following log:

The NVIDIA GPU is initialized successfully
...
INFO:data_websocket:Starting display reconfiguration...
INFO:data_websocket:Calculating new extended desktop layout from ALL clients...
INFO:data_websocket:Layout calculated: Total Size=2560x1256. Layouts: {'primary': {'x': 0, 'y': 0, 'w': 2560, 'h': 1256}}
INFO:data_websocket:Starting separate capture instances for each ACTIVE display region...
INFO:data_websocket:Client 'primary' is active. Starting its capture.
INFO:data_websocket:Preparing to start capture for display='primary': Res=2560x1256, Offset=0x0
INFO:main:Parsed DRI node '/dev/dri/renderD129' to index 1.
INFO:data_websocket:Registered Wayland cursor callback for 'primary'
[Wayland] Configuring Output: 2560x1256 @ 60.00 FPS (Scale 1.00)
INFO:data_websocket:SUCCESS: Capture started for 'primary'.
[Wayland] Encode Node Index: 1 | Driver: ../../../../bus/pci/drivers/nvidia
[Wayland] Nvidia Encoder detected. Initializing NVENC...
[NVENC] Initializing...
INFO:data_websocket:Broadcasting primary stream resolution to all clients: {"type": "stream_resolution", "width": 2560, "height": 1256}
INFO:data_websocket:Broadcasting display config update: DISPLAY_CONFIG_UPDATE,{"type": "display_config_update", "displays": ["primary"]}
INFO:data_websocket:Display reconfiguration finished successfully.
INFO:data_websocket:Reconfiguration process complete (state unlocked).
INFO:data_websocket:Initial client settings message processed by ws_handler.
INFO:data_websocket:Initial setup: Primary client connected, audio not active, attempting start.
INFO:data_websocket:Starting pcmflux audio pipeline...
INFO:data_websocket:pcmflux settings: device='output.monitor', bitrate=320000, channels=2
INFO:data_websocket:Video chunk sender started for display 'primary'.
[pcmflux] Attempting to connect to PulseAudio device: output.monitor with latency: 10ms
INFO:data_websocket:pcmflux audio capture started successfully.
INFO:data_websocket:pcmflux audio chunk broadcasting task started.
[pcmflux] ERROR: pa_simple_new() failed: No such entity
  (Could not find the device named: 'output.monitor')
INFO:data_websocket:Received redundant resize request for primary (2560x1256). No action taken.
INFO:data_websocket:Received START_AUDIO command from client for server-to-client audio.
INFO:data_websocket:START_AUDIO: pcmflux audio pipeline already active.
[NVENC] Found 1 CUDA devices:
[NVENC]   Device 0: NVIDIA Graphics Device
[NVENC] Bound to CUDA device via PCI Bus ID: 0000:03:00.0
[NVENC] Initialized successfully (4:4:4 mode: false).
[Wayland] NVENC Encoder initialized successfully.
[Wayland] Decision: Zero-Copy path active.
Stream settings active -> Res: 2560x1256 | FPS: 60.0 | Stripes: 1 | Mode: H264 (NVENC) FullFrame | CRF: 25 | PaintOver CRF: 18 (Burst: 5f) | Colorspace: I420 (Limited Range) | Damage Thresh: 10f | Damage Dur: 20f

The log ends with NVENC Encoder initialized successfully.
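The log also reports Parsed DRI node '/dev/dri/renderD129' to index 1, while the earlier Intel attempt showed Encode Node Index: 0. That matches the DRM convention that render nodes are numbered from 128 upward. A sketch of the mapping, with render_node_index as a hypothetical helper name:

```shell
# render_node_index PATH: print the zero-based index of a DRM render node,
# e.g. /dev/dri/renderD129 -> 1. Assumes the usual renderD<128+N> naming.
render_node_index() (
    minor="${1##*renderD}"
    echo $(( $minor - 128 ))
)
```

So renderD128 (the Intel GPU here) maps to index 0 and renderD129 (the Tesla P10) to index 1.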

Running nvidia-smi on the host now shows one process using the GPU:

nvidia-smi shows one process of type C (compute) using the GPU
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.159.03             Driver Version: 580.159.03     CUDA Version: 13.0     |
+-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA Graphics Device         Off |   00000000:03:00.0 Off |                    0 |
| N/A   37C    P0             38W /  150W |     295MiB /  23040MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A          112766      C   /lsiopy/bin/python3                     292MiB |
+-----------------------------------------------------------------------------------------+
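For a scripted check instead of reading the table, nvidia-smi can print the compute processes as CSV. The sketch below tests whether a process whose name matches a pattern is present; the function name gpu_has_process is hypothetical, and the input is assumed to be in "pid, process_name, used memory" order with no header.

```shell
# gpu_has_process PATTERN: read the output of
#   nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
# on stdin and succeed (exit 0) if any process name matches PATTERN.
# PATTERN is treated as an awk regular expression.
gpu_has_process() {
    awk -F', *' -v pat="$1" '$2 ~ pat { found = 1 } END { exit !found }'
}
```

For example: nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader | gpu_has_process python3 && echo "encoder process is on the GPU"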