How to select device on server with multiple GPU's?

frijsdijk
Posts: 7
Joined: Mon Nov 11, 2024 10:31 am

How to select device on server with multiple GPU's?

Post by frijsdijk »

Description of problem or question:

I can't find a way to balance the load across a server that has 4 Tesla V100S GPUs (devices 0, 1, 2, 3).


Steps to reproduce the problem (If Applicable):

When starting HandBrake, encoding works fine, but it always runs on device 0.



HandBrake version (e.g., 1.0.0):

1.8.2 (CLI)



Operating system and version (e.g., Ubuntu 16.04 LTS, macOS 10.13 High Sierra, Windows 10 Creators Update):

Ubuntu 22.04 LTS



HandBrake Activity Log ***required*** (see How-to get an activity log)

Since HandBrake works fine, I don't think a full log is needed, but perhaps this is useful?

Excerpt from the activity log showing that I'm using hardware encoding:

[12:11:46] encavcodecInit: H.265 (Nvidia NVENC)
Drivers installed:

hi  libnvidia-cfg1-550-server:amd64        550.90.07-0ubuntu0.22.04.1              amd64        NVIDIA binary OpenGL/GLX configuration library
hi  libnvidia-compute-550-server:amd64     550.90.07-0ubuntu0.22.04.1              amd64        NVIDIA libcompute package
hi  libnvidia-decode-550-server:amd64      550.90.07-0ubuntu0.22.04.1              amd64        NVIDIA Video Decoding runtime libraries
hi  libnvidia-encode-550-server:amd64      550.90.07-0ubuntu0.22.04.1              amd64        NVENC Video Encoding runtime library
hi  nvidia-compute-utils-550-server        550.90.07-0ubuntu0.22.04.1              amd64        NVIDIA compute utilities
hi  nvidia-dkms-550-server                 550.90.07-0ubuntu0.22.04.1              amd64        NVIDIA DKMS package
hi  nvidia-firmware-550-server-550.90.07   550.90.07-0ubuntu0.22.04.1              amd64        Firmware files used by the kernel module
hi  nvidia-headless-550-server             550.90.07-0ubuntu0.22.04.1              amd64        NVIDIA headless metapackage
hi  nvidia-headless-no-dkms-550-server     550.90.07-0ubuntu0.22.04.1              amd64        NVIDIA headless metapackage - no DKMS
hi  nvidia-kernel-common-550-server        550.90.07-0ubuntu0.22.04.1              amd64        Shared files used with the kernel module
hi  nvidia-kernel-source-550-server        550.90.07-0ubuntu0.22.04.1              amd64        NVIDIA kernel source package
lspci -v output:

3b:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] (rev a1)
        Subsystem: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB]
        Flags: bus master, fast devsel, latency 0, IRQ 335, NUMA node 0
        Memory at b7000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3af000000000 (64-bit, prefetchable) [size=32G]
        Memory at 3af800000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Capabilities: [ac0] Designated Vendor-Specific: Vendor=10de ID=0001 Rev=1 Len=12 <?>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

5d:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode])
--
5e:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] (rev a1)
        Subsystem: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB]
        Flags: bus master, fast devsel, latency 0, IRQ 336, NUMA node 0
        Memory at c4000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3bf000000000 (64-bit, prefetchable) [size=32G]
        Memory at 3bf800000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Capabilities: [ac0] Designated Vendor-Specific: Vendor=10de ID=0001 Rev=1 Len=12 <?>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

80:04.0 System peripheral: Intel Corporation Sky Lake-E CBDMA Registers (rev 07)
--
86:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] (rev a1)
        Subsystem: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB]
        Flags: bus master, fast devsel, latency 0, IRQ 337, NUMA node 1
        Memory at df000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3df000000000 (64-bit, prefetchable) [size=32G]
        Memory at 3df800000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Capabilities: [ac0] Designated Vendor-Specific: Vendor=10de ID=0001 Rev=1 Len=12 <?>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia

ae:00.0 PCI bridge: Intel Corporation Sky Lake-E PCI Express Root Port A (rev 07) (prog-if 00 [Normal decode])
--
d8:00.0 3D controller: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB] (rev a1)
        Subsystem: NVIDIA Corporation GV100GL [Tesla V100S PCIe 32GB]
        Flags: bus master, fast devsel, latency 0, IRQ 338, NUMA node 1
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at 3ff000000000 (64-bit, prefetchable) [size=32G]
        Memory at 3ff800000000 (64-bit, prefetchable) [size=32M]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024 <?>
        Capabilities: [900] Secondary PCI Express
        Capabilities: [ac0] Designated Vendor-Specific: Vendor=10de ID=0001 Rev=1 Len=12 <?>
        Kernel driver in use: nvidia
        Kernel modules: nvidiafb, nouveau, nvidia_drm, nvidia
rollin_eng
Veteran User
Posts: 4997
Joined: Wed May 04, 2011 11:06 pm

Re: How to select device on server with multiple GPU's?

Post by rollin_eng »

I believe handbrake will just select the first device not in use.

There is no way to use multiple cards for one single encode, but you should be able to run a single encode on each card.
frijsdijk
Posts: 7
Joined: Mon Nov 11, 2024 10:31 am

Re: How to select device on server with multiple GPU's?

Post by frijsdijk »

rollin_eng wrote: Mon Nov 11, 2024 1:16 pm I believe handbrake will just select the first device not in use.

There is no way to use multiple cards for one single encode, but you should be able to run a single encode on each card.
Oh, sorry for not specifying, but I was planning on launching enough encode processes to load all 4 GPUs evenly and optimally. I have a ton of video to transcode. But as it stands, it seems I can only load one of the 4 available devices in my server. With ffmpeg I could select the device each process would land on.
Ritsuka
HandBrake Team
Posts: 1747
Joined: Fri Jan 12, 2007 11:29 am

Re: How to select device on server with multiple GPU's?

Post by Ritsuka »

Setting gpu=1 (or whichever device number you want) in the advanced options should work.
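
On the command line, one way to apply this is to pass the option through HandBrakeCLI's advanced encoder options flag. The following is a minimal sketch, assuming that `-x` (`--encopts`) forwards `gpu=N` to the NVENC encoder in the same way the advanced options field does. The helper name `hb_cmd_for_gpu` and the file names are hypothetical, and the helper only prints the command so it can be checked before anything is run.

```shell
# Hypothetical helper: print (not run) a HandBrakeCLI command that targets one GPU.
# Assumption: -x/--encopts accepts gpu=N for the NVENC encoders, as suggested above.
hb_cmd_for_gpu() {
    gpu="$1"      # NVENC device index, e.g. 0..3 on a four-GPU server
    input="$2"    # source file
    output="$3"   # destination file
    printf 'HandBrakeCLI -i %s -o %s -e nvenc_h265 -x gpu=%s\n' \
        "$input" "$output" "$gpu"
}

# Example: show the command that would encode on device 1.
hb_cmd_for_gpu 1 test.mp4 test-out.mp4
```

Whether the encode really landed on the intended device is best confirmed with a GPU monitor rather than taken from the command alone.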
frijsdijk
Posts: 7
Joined: Mon Nov 11, 2024 10:31 am

Re: How to select device on server with multiple GPU's?

Post by frijsdijk »

Great, I'll test this!
frijsdijk
Posts: 7
Joined: Mon Nov 11, 2024 10:31 am

Re: How to select device on server with multiple GPU's?

Post by frijsdijk »

Works, nice. Thanks!
frijsdijk
Posts: 7
Joined: Mon Nov 11, 2024 10:31 am

Re: How to select device on server with multiple GPU's?

Post by frijsdijk »

Following up on this: is it possible to control the number of CPU cores HandBrakeCLI uses while running? I'm transcoding on a machine that has 2 CPUs (Intel(R) Xeon(R) Gold 6226R CPU @ 2.90GHz), so 64 logical cores with HT as seen in /proc/cpuinfo, plus 4 Tesla V100S GPUs. I can only run about 4-5 HandBrake processes before the CPUs start to smoke (load > 64, idle ~ 0%) while the Teslas are picking their noses.

Here's a copy of the preset I've created: https://pastebin.com/QfkJHUEw

I have 4 presets like this, each with a different "VideoOptionExtra" value (gpu=0, gpu=1, gpu=2, gpu=3) to spread the load across the 4 available GPUs in the server.

I then run HandBrake like this:

fr.handbrake.HandBrakeCLI -i test.mp4 -o test-out.mp4 --preset-import-file /home/ubuntu/handbrake-presets/mypreset-${GPU}.json -Z mypreset
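
One way to fill in `${GPU}` for a whole queue is to assign files to GPUs round-robin. The sketch below is only an illustration under that assumption: the helper names `gpu_for_job` and `plan_jobs`, the input file names, and the `out-` output prefix are all hypothetical, and the script prints the commands instead of running them.

```shell
# Hypothetical helper: map a zero-based job number to a GPU index, round-robin.
gpu_for_job() {
    job="$1"      # zero-based position of the file in the queue
    ngpus="$2"    # number of GPUs (4 in the server described above)
    echo $(( job % ngpus ))
}

# PRESET_DIR mirrors the path used in the post; adjust it to your own layout.
PRESET_DIR=/home/ubuntu/handbrake-presets

# Print one command per input file, cycling through the per-GPU preset files.
plan_jobs() {
    ngpus="$1"; shift
    job=0
    for input in "$@"; do
        gpu=$(gpu_for_job "$job" "$ngpus")
        echo "fr.handbrake.HandBrakeCLI -i $input -o out-$input" \
             "--preset-import-file $PRESET_DIR/mypreset-$gpu.json -Z mypreset"
        job=$(( job + 1 ))
    done
}

# Five files on four GPUs: the fifth file wraps around to GPU 0.
plan_jobs 4 a.mp4 b.mp4 c.mp4 d.mp4 e.mp4
```

Printing the plan first makes it easy to check the GPU assignment before launching anything; running the jobs concurrently would still need some limit on how many start at once.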
rollin_eng
Veteran User
Posts: 4997
Joined: Wed May 04, 2011 11:06 pm

Re: How to select device on server with multiple GPU's?

Post by rollin_eng »

I’m not sure what you want. You have 4 encodes running, one on each graphics card, right?

So 4 encodes are running at a time. Are your CPUs being maxed out by this?

Can you please post an encode log?
frijsdijk
Posts: 7
Joined: Mon Nov 11, 2024 10:31 am

Re: How to select device on server with multiple GPU's?

Post by frijsdijk »

Sure, here's an example: https://pastebin.com/0FwSW8nT

I can see the HandBrakeCLI processes landing on the correct GPUs (in nvtop), and I launch about 4-5 at a time (so one of the GPUs will have 2 HandBrake processes, but its load is still quite low). Each HandBrakeCLI process in top takes somewhere around 500-1000% CPU:

top - 02:23:11 up 45 days, 49 min, 22 users,  load average: 56.15, 53.36, 47.95
Tasks: 1012 total,   1 running, 1011 sleeping,   0 stopped,   0 zombie
%Cpu(s): 49.1 us,  4.5 sy,  0.1 ni, 42.2 id,  3.8 wa,  0.0 hi,  0.4 si,  0.0 st
MiB Mem : 1546800.+total,   9876.1 free,  23888.5 used, 1513035.+buff/cache
MiB Swap:   1024.0 total,    967.6 free,     56.4 used. 1514477.+avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
2404645 ubuntu    20   0   24.1g 969248 297500 S 901.3   0.1  59:09.95 HandBrakeCLI
2402445 ubuntu    20   0   24.6g   1.4g 303288 S 721.1   0.1  69:30.43 HandBrakeCLI
2401736 ubuntu    20   0   24.2g 868060 293800 S 592.1   0.1  49:06.93 HandBrakeCLI
2403278 ubuntu    20   0   31.2g   1.0g 299148 S 443.8   0.1  47:32.26 HandBrakeCLI
2406385 ubuntu    20   0   24.8g 666048 284148 S 376.6   0.0   8:34.51 HandBrakeCLI
As you can see, I try not to load the system to 100%. Before HB I was using plain old ffmpeg (with CUDA); I could launch about 12-16 of those, and the CPU load was nowhere near what I'm seeing with HB.
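
To put the per-process figures in context, the %CPU column can be summed and compared with the number of logical CPUs. This is a small sketch of that arithmetic using the five values from the top output above; the helper names `sum_cpu` and `cpu_share` are hypothetical, and 64 is the logical CPU count mentioned earlier in the thread.

```shell
# Sum a column of per-process %CPU values read from standard input.
sum_cpu() {
    awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Convert a summed %CPU figure into a share of the whole machine.
# $1 = summed %CPU (100 = one logical CPU), $2 = number of logical CPUs.
cpu_share() {
    awk -v total="$1" -v ncpu="$2" 'BEGIN { printf "%.1f\n", total / ncpu }'
}

# The five HandBrakeCLI %CPU values from the top output above.
total=$(printf '%s\n' 901.3 721.1 592.1 443.8 376.6 | sum_cpu)
echo "HandBrakeCLI total: ${total}% (100% = one logical CPU)"
echo "Share of 64 logical CPUs: $(cpu_share "$total" 64)%"

# On a live system the values could come from ps instead, for example:
#   ps -C HandBrakeCLI -o %cpu= | sum_cpu
# Note that ps reports an average over each process's lifetime, unlike top.
```

For this snapshot the five processes add up to 3034.9%, or about 47.4% of the machine, which is in line with the 49.1% user time shown in the top header.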
frijsdijk
Posts: 7
Joined: Mon Nov 11, 2024 10:31 am

Re: How to select device on server with multiple GPU's?

Post by frijsdijk »

To answer your question: I'd like to lower the CPU usage per process if possible.
rollin_eng
Veteran User
Posts: 4997
Joined: Wed May 04, 2011 11:06 pm

Re: How to select device on server with multiple GPU's?

Post by rollin_eng »

I’m not sure what happens when you run multiple encodes on the same graphics card; I didn’t think it was possible. I don’t use hardware encoding, so hopefully someone who does can answer this.

Regarding the CPU usage: only the video encoding is done on the graphics card, so everything else (decoding, filtering, audio encoding, etc.) is done on the CPU. Looking at your log, you could turn off your filters unless they are needed; that might help.
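
Separately from anything HandBrake offers, Linux itself can confine a process to a fixed set of cores with `taskset` from util-linux. The sketch below is an operating-system-level workaround and not a HandBrake feature discussed in this thread: it gives each encode "slot" its own block of logical CPUs. The helper names `cpu_range` and `pinned_cmd` and the choice of 8 CPUs per slot are assumptions, and the command is only printed, not run.

```shell
# Hypothetical helper: return the CPU list for one encode "slot".
# Slot 0 gets CPUs 0..(n-1), slot 1 gets n..(2n-1), and so on.
cpu_range() {
    slot="$1"      # zero-based slot number
    per_slot="$2"  # logical CPUs to give each encode
    first=$(( slot * per_slot ))
    last=$(( first + per_slot - 1 ))
    echo "${first}-${last}"
}

# Print (not run) an encode command pinned to its slot's CPUs with taskset.
pinned_cmd() {
    slot="$1"; per_slot="$2"; shift 2
    echo "taskset -c $(cpu_range "$slot" "$per_slot") $*"
}

# Example: slot 2 with 8 CPUs per slot is pinned to logical CPUs 16-23.
pinned_cmd 2 8 HandBrakeCLI -i test.mp4 -o test-out.mp4
```

Pinning only caps how many cores a process may occupy; it does not reduce the decoding and filtering work itself, so each encode may simply run slower.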