HP DL360 Gen9 Large BAR Memory (Tesla P10)

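GPU card requests more memory-mapped I/O than is available

When I first installed an Nvidia Tesla P10 GPU compute card in an HPE ProLiant DL360 Gen9 server, the BIOS power-on self-test reported this error at boot: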

../../../../_images/gpu_mem_map_err.png
276 Option Card Configuration Error. An option card is requesting more memory
mapped I/O than is available.
Action: Remove the option card to allow the system to boot.

Cause

NVIDIA compute accelerator cards

Note

NVIDIA GPGPU Adapters system memory addressing limitations - IBM Systems also points out that, because of a product design limitation on memory addressing, NVIDIA Grid, NVIDIA Tesla, and NVIDIA Quadro cards cannot be used in systems with more than 1 TB of memory.

Advisory: HPE ProLiant Servers - On Systems Configured with an NVIDIA GPU With More Than 1 TB of Server Host Memory, GPU Options Will Not Function Properly gives the same advice.
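
The 1 TB limit from these advisories can be checked against a Linux host's memory. The following is only a minimal sketch: the function names `mem_total_kb` and `exceeds_1tb` are hypothetical, it assumes 1 TB means 1024 GiB, and it relies on the `MemTotal` line of `/proc/meminfo`, which reports slightly less than the installed memory.

```python
import re

# Assumption: 1 TB is taken as 1024 GiB, expressed here in kB (KiB) as in /proc/meminfo.
ONE_TB_KB = 1024 ** 3


def mem_total_kb(meminfo_text: str) -> int:
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    match = re.search(r"^MemTotal:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    if match is None:
        raise ValueError("MemTotal line not found")
    return int(match.group(1))


def exceeds_1tb(meminfo_text: str) -> bool:
    """True if the reported host memory is above the assumed 1 TB limit."""
    return mem_total_kb(meminfo_text) > ONE_TB_KB


# Illustrative samples, not taken from a real host.
SAMPLE_128G = "MemTotal:       131912345 kB\nMemFree:         1000000 kB\n"
SAMPLE_2T = "MemTotal:       2147483648 kB\n"
```

On a real host the text would come from reading `/proc/meminfo`; the samples above are only there to keep the sketch self-contained.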

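VMware ESX configuration recommendations

  • The virtual machine runs a 64-bit operating system

  • Both the physical host and the virtual machine boot in EFI mode

  • If the GPU needs 16 GB or more of memory mapping (BAR1 Memory), enable GPU passthrough in the physical host's BIOS. The setting is usually named one of: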

    • Above 4G decoding
    • Memory mapped I/O above 4GB
    • PCI 64-bit resource handling above 4G
  • Enable 64-bit Memory Mapped I/O (MMIO) in the virtual machine's vmx file

    pciPassthru.use64bitMMIO="TRUE"
    
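  • Memory Mapped I/O (MMIO) sizing: the recommended size is (n × GPU memory) rounded up to the next power of two:

    • Two GPUs with 16 GB of memory each: 2 x 16 GB = 32 GB; 32 is already a power of two, so rounding up to the next power of two gives 64 GB

    • Three GPUs with 16 GB of memory each: 3 x 16 GB = 48 GB; rounding 48 GB up to the next power of two gives 64 GB

    • Alternatively, simply set it to twice the total memory of all GPUs assigned to the virtual machine, 2 × n × GPU memory (in GB)

    • Example setting: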

      pciPassthru.64bitMMIOSizeGB="64"
      
  • The virtual machine's memory should be at least 1.5 times the total memory of all GPUs assigned to it
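
The sizing rules in this list can be sketched in Python. This is a minimal sketch rather than VMware tooling: the function names `mmio_size_gb`, `min_vm_memory_gb` and `vmx_lines` are hypothetical, and it assumes "next power of two" means strictly greater than the total, which is what the 2 x 16 GB = 32 GB giving 64 GB example implies.

```python
import math


def mmio_size_gb(gpu_count: int, gpu_mem_gb: int) -> int:
    """Total GPU memory rounded up to the next power of two (strictly greater)."""
    total = gpu_count * gpu_mem_gb
    if total <= 0:
        raise ValueError("gpu_count and gpu_mem_gb must be positive")
    size = 1
    while size <= total:  # '<=' so that 32 becomes 64, as in the example above
        size *= 2
    return size


def min_vm_memory_gb(gpu_count: int, gpu_mem_gb: int) -> int:
    """Suggested minimum VM memory: 1.5 times the total GPU memory, rounded up."""
    return math.ceil(1.5 * gpu_count * gpu_mem_gb)


def vmx_lines(gpu_count: int, gpu_mem_gb: int) -> list[str]:
    """Build the two vmx entries shown in this section for the given GPUs."""
    return [
        'pciPassthru.use64bitMMIO="TRUE"',
        f'pciPassthru.64bitMMIOSizeGB="{mmio_size_gb(gpu_count, gpu_mem_gb)}"',
    ]


if __name__ == "__main__":
    for line in vmx_lines(2, 16):
        print(line)
```

For a single 16 GB GPU the strict rounding gives 32 GB, which matches the "twice the total GPU memory" shortcut.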

HP DL360 Gen9 BIOS settings

The VMware documentation advises:

Your host BIOS must be configured to support the large memory regions needed by these high-end PCI devices.
To enable this, find the host BIOS setting for “above 4G decoding” or “memory mapped I/O above 4GB” or “PCI 64 bit resource handing above 4G” and enable it.
The exact wording of this option varies by system vendor, though the option is often found in the PCI section of the BIOS menu.
Consult your system provider if necessary to enable this option.

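However, I went through the BIOS configuration several times and could not find a PCI configuration section.

In enable large BAR support, though, someone asked a similar question about finding the BIOS setting that supports ‘64-bit IO’, and the term Large BAR came up. Sure enough, the HPE documentation uses the term 64-Bit Addressing (Large BAR). Searching for hpe dl360 gen9 enable large BAR support turns up the support document Advisory: (Revision) HP ProLiant SL250s Gen8 and ProLiant SL270s Gen8 Servers - Servers Configured with a Large Number Of NVIDIA Tesla or Intel Xeon GPU Computing Modules Require the System ROM to Support 64-Bit Addressing (Large BAR) Support, which gives the following steps: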

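  • Boot the server and, at the BIOS prompt, press F9 to enter the ROM-Based Setup Utility (RBSU)
  • In RBSU, press Ctrl + A to open the Service Options menu. WOW, this opens up a whole new world: it turns out many options are hidden here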
../../../../_images/rbsu_service_options.png
  • In Service Options, use the up and down arrow keys to move the highlight to PCI Express 64-Bit BAR Support. The option defaults to Disabled; press Enter to edit it and change the value to Enabled
../../../../_images/rbsu_enable_large_bar.png
  • Save and exit, then reboot the server. Large BAR is now enabled

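Note

According to the HPE documentation, Large BAR is always enabled while System Maintenance Switch 9 is set to ON, so to turn Large BAR off in RBSU you must first set System Maintenance Switch 9 to the OFF position.

The newly added NVIDIA device now shows up: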

dl360_gen9_large_bar_memory/lspci_tesla_p10.txt
08:00.0 3D controller: NVIDIA Corporation Device 1b39 (rev a1)
        Subsystem: NVIDIA Corporation Device 1217
        Physical Slot: 1
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        Interrupt: pin A routed to IRQ 75
        NUMA node: 0
        Region 0: Memory at 93000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 39000000000 (64-bit, prefetchable) [size=32G]
        Region 3: Memory at 39800000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
                Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-)
                Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
                Address: 00000000fee00718  Data: 0000
        Capabilities: [78] Express (v2) Endpoint, MSI 00
                DevCap: MaxPayload 256 bytes, PhantFunc 0, Latency L0s unlimited, L1 <64us
                        ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset- SlotPowerLimit 0.000W
                DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
                        RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+
                        MaxPayload 256 bytes, MaxReadReq 4096 bytes
                DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
                LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM not supported
                        ClockPM+ Surprise- LLActRep- BwNot- ASPMOptComp+
                LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+
                        ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)
                        TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                DevCap2: Completion Timeout: Range AB, TimeoutDis+, NROPrPrP-, LTR+
                         10BitTagComp-, 10BitTagReq-, OBFF Via message, ExtFmt-, EETLPPrefix-
                         EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
                         FRS-, TPHComp-, ExtTPHComp-
                         AtomicOpsCap: 32bit- 64bit- 128bitCAS-
                DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled
                         AtomicOpsCtl: ReqEn-
                LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
                         Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                         Compliance De-emphasis: -6dB
                LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+
                         EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-
        Capabilities: [100 v1] Virtual Channel
                Caps:   LPEVC=0 RefClk=100ns PATEntryBits=1
                Arb:    Fixed- WRR32- WRR64- WRR128-
                Ctrl:   ArbSelect=Fixed
                Status: InProgress-
                VC0:    Caps:   PATOffset=00 MaxTimeSlots=1 RejSnoopTrans-
                        Arb:    Fixed- WRR32- WRR64- WRR128- TWRR128- WRR256-
                        Ctrl:   Enable+ ID=0 ArbSelect=Fixed TC/VC=ff
                        Status: NegoPending- InProgress-
        Capabilities: [250 v1] Latency Tolerance Reporting
                Max snoop latency: 0ns
                Max no snoop latency: 0ns
        Capabilities: [128 v1] Power Budgeting <?>
        Capabilities: [420 v2] Advanced Error Reporting
                UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
                UESvrt: DLP+ SDES+ TLP- FCP+ CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
                CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
                CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
                AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
                        MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
                HeaderLog: 00000000 00000000 00000000 00000000
        Capabilities: [600 v1] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900 v1] Secondary PCI Express
                LnkCtl3: LnkEquIntrruptEn-, PerformEqu-
                LaneErrStat: 0
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb, nouveau
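
The Region lines in this output show why Large BAR matters: a 32 GiB BAR cannot fit inside the 4 GiB window addressable with 32 bits, so it has to be mapped above 4 GiB. The sketch below parses those lines and checks where each BAR landed. It is an illustration only; `parse_regions` and `above_4g` are hypothetical helper names, and the regular expression assumes the exact `lspci -vv` line format shown here.

```python
import re

# Matches lines such as:
#   Region 1: Memory at 39000000000 (64-bit, prefetchable) [size=32G]
REGION_RE = re.compile(
    r"Region (\d+): Memory at ([0-9a-f]+) "
    r"\((32|64)-bit, (non-)?prefetchable\) \[size=(\d+)([KMGT]?)\]"
)
UNIT = {"": 1, "K": 2**10, "M": 2**20, "G": 2**30, "T": 2**40}


def parse_regions(lspci_text: str) -> list[dict]:
    """Extract index, address, width and size (bytes) of each memory BAR."""
    regions = []
    for match in REGION_RE.finditer(lspci_text):
        index, addr, bits, nonpref, num, unit = match.groups()
        regions.append({
            "index": int(index),
            "address": int(addr, 16),
            "bits": int(bits),
            "prefetchable": nonpref is None,
            "size": int(num) * UNIT[unit],
        })
    return regions


def above_4g(region: dict) -> bool:
    """True if the BAR is mapped at or above the 4 GiB boundary."""
    return region["address"] >= 2**32


# The three Region lines from the lspci output above.
SAMPLE = """\
        Region 0: Memory at 93000000 (32-bit, non-prefetchable) [size=16M]
        Region 1: Memory at 39000000000 (64-bit, prefetchable) [size=32G]
        Region 3: Memory at 39800000000 (64-bit, prefetchable) [size=32M]
"""

if __name__ == "__main__":
    for region in parse_regions(SAMPLE):
        print(region["index"], region["size"], above_4g(region))
```

With the sample above, Region 0 stays below 4 GiB while the two 64-bit prefetchable regions are mapped above it.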

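Note

NVIDIA devices need the proprietary driver provided by NVIDIA, which the default Ubuntu software repositories do not ship. Linux view GPU information display? What kind of video card is this? describes the usual way to install the graphics driver:

  • Add the Ubuntu graphics drivers PPA, Proprietary GPU Drivers: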

    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt-get update
    
  • Install the driver:

    sudo apt install nvidia-driver-XXX
    
  • Download and install CUDA
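
One way to confirm which driver is bound to the card is to read the "Kernel driver in use" line of the lspci output, which reports nouveau in the listing earlier on this page. The sketch below is a minimal illustration; `kernel_driver_in_use` is a hypothetical helper, and the expectation that the line reads `nvidia` once the proprietary driver is loaded is an assumption, not something shown in this document.

```python
import re
from typing import Optional


def kernel_driver_in_use(lspci_text: str) -> Optional[str]:
    """Return the driver named on the 'Kernel driver in use' line, if any."""
    match = re.search(
        r"^\s*Kernel driver in use:\s*(\S+)", lspci_text, re.MULTILINE
    )
    return match.group(1) if match else None


# The last two lines of the lspci output shown earlier.
SAMPLE = """\
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb, nouveau
"""

if __name__ == "__main__":
    driver = kernel_driver_in_use(SAMPLE)
    # Assumption: the proprietary driver is reported as "nvidia".
    print("proprietary driver loaded" if driver == "nvidia" else f"driver: {driver}")
```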