Parallel queue implementation

Bright Spark User
Posts: 249
Joined: Fri Jul 24, 2015 4:13 am

Re: Parallel queue implementation

Post by nhyone »

andrewk89 wrote:
Mon Jan 20, 2020 5:33 pm
The simulation software we use at work is reported to hit CPU starvation somewhere between 2-3 cores per memory channel. Beyond that, there isn't enough RAM bandwidth to keep the CPUs busy. A colleague recently bought a 48-core 384 GB workstation and it kind of runs like a turd (chipset gives 12 memory channels the way his RAM is installed). Unfortunately, he found this research only after making the purchase.
I would think the vendor should know the proper memory configuration. See, for example, Intel Xeon Scalable Family Balanced Memory Configurations (from Lenovo).

In the example, the system can take 12 DIMMs. The good configurations:
2 DIMM, single 2-channel interleave set: 35%
4 DIMM, single 4-channel interleave set: 67%
6 DIMM, single 6-channel interleave set: 97%
12 DIMM: 100%

Maybe CPU starvation is due to something else, maybe cache thrashing or I/O?

Posts: 58
Joined: Thu Jun 13, 2013 4:29 pm

Re: Parallel queue implementation

Post by andrewk89 »

nhyone wrote:
Wed Jan 22, 2020 4:25 am
Maybe CPU starvation is due to something else, maybe cache thrashing or I/O?
Here is the reference he found on the topic.

384 GB = 12 x 32 GB, so all 12 channels are populated, which works out to 1 memory channel per 4 cores. He put the temp drive on a RAM disk to eliminate I/O, but it didn't really help. If you believe the conclusion given in the paper, the RAM disk is counter-productive, since there wasn't enough RAM bandwidth in the first place.
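
To make the arithmetic above explicit, here is a minimal sketch. The function names are made up for illustration, and the limit of 3 cores per channel is simply the upper end of the "2-3 cores per memory channel" figure quoted earlier in the thread, not a measured value.

```python
def cores_per_channel(cores: int, channels: int) -> float:
    """Return how many CPU cores share each populated memory channel."""
    if channels <= 0:
        raise ValueError("channels must be positive")
    return cores / channels


def exceeds_reported_limit(cores: int, channels: int, limit: float = 3.0) -> bool:
    """True if the cores-per-channel ratio is above the assumed limit.

    The default limit (3.0) is an assumption taken from the upper end of
    the 2-3 cores per channel range reported for the simulation software.
    """
    return cores_per_channel(cores, channels) > limit


if __name__ == "__main__":
    # The workstation discussed above: 48 cores, 12 x 32 GB DIMMs.
    print(12 * 32)                          # total RAM in GB
    print(cores_per_channel(48, 12))        # cores sharing one channel
    print(exceeds_reported_limit(48, 12))   # above the assumed limit?
```

With 48 cores on 12 channels the ratio is 4 cores per channel, which is above the reported 2-3 range even with every channel populated.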

Posts: 7
Joined: Mon Jan 13, 2020 1:06 pm

Re: Parallel queue implementation

Post by Umadevi »

Dear all,
As we mentioned earlier, we were observing low CPU utilization on large machines with HandBrake. We have managed to bring up a stable version of our proposed parallel implementation of HandBrake, and the initial results are promising. We are currently testing our implementation on an AMD Threadripper 2990WX (with 64 threads), and our parallel version is showing higher CPU and memory utilization than the sequential version.
Please find attached our average execution time ( ... sp=sharing), CPU utilization ( ... sp=sharing) and memory utilization ( ... sp=sharing) for serial, dynamic parallel and static parallel modes. In static parallel mode, the user selects the number of jobs to run in parallel. The charts show results for 3, 4 and 5 parallel jobs (marked as parallel 3, parallel 4 and parallel 5 respectively). In dynamic parallel mode (marked as just parallel in the charts), our online dynamic cost function framework decides how many jobs to run in parallel.

The queues were selected at random to cover as many use cases as possible.

We are still actively working on improving our parallel logic. In the meantime, it would be great if you could try out our parallel implementation and share your suggestions.
Please find our CLI version of parallel HandBrake here: ... sp=sharing
Sequential mode: HandBrakeCLI.exe --queue-import-file file.json (the execution time in this mode is similar to traditional HandBrake)
Dynamic parallel mode: HandBrakeCLI.exe --queue-import-file file.json --parallel
Static parallel mode: HandBrakeCLI.exe --queue-import-file file.json --parallel=n (where n is the number of jobs to run in parallel)
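
For anyone scripting test runs, the three invocations above could be assembled along these lines. This is only a sketch: the helper name build_command and its parameters are hypothetical, and the --parallel and --parallel=n options are taken from this post's experimental build, so they should not be assumed to exist in a stock HandBrakeCLI.

```python
import subprocess
from typing import List, Optional, Union


def build_command(queue_file: str,
                  parallel: Optional[Union[str, int]] = None,
                  exe: str = "HandBrakeCLI.exe") -> List[str]:
    """Build the argument list for one of the three modes.

    parallel=None      -> sequential mode (no extra flag)
    parallel="dynamic" -> dynamic parallel mode (--parallel)
    parallel=<int n>   -> static parallel mode (--parallel=n)

    The --parallel flags are assumed to exist only in the experimental
    build described in this post.
    """
    cmd = [exe, "--queue-import-file", queue_file]
    if parallel is None:
        return cmd
    if parallel == "dynamic":
        return cmd + ["--parallel"]
    if isinstance(parallel, int) and not isinstance(parallel, bool) and parallel >= 1:
        return cmd + [f"--parallel={parallel}"]
    raise ValueError("parallel must be None, 'dynamic', or a positive integer")


def run(queue_file: str, parallel: Optional[Union[str, int]] = None) -> int:
    """Run the encode and return the process exit code."""
    return subprocess.run(build_command(queue_file, parallel)).returncode


if __name__ == "__main__":
    # Print the commands for each mode without running anything.
    for mode in (None, "dynamic", 3, 4, 5):
        print(" ".join(build_command("file.json", mode)))
```

Passing the arguments as a list avoids shell quoting problems if the queue file path contains spaces.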
