Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post your testing results with HandBrake.
mike693
Posts: 44
Joined: Wed Feb 14, 2007 5:42 am

Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by mike693 »

I am shopping for a new Mac.

How many physical cores can a single instance of HB utilize with the "Apple 2160p60 4K HEVC Surround" preset?

I understand that 264 encoding performance tops out at around 8 cores. I heard 265/HEVC can utilize more, before hitting the point of diminishing returns. My goal is to reduce compute time of a single instance.

(Apologies if this was asked and answered elsewhere. I could not find it, if it was.)

Thank you!

mduell
Veteran User
Posts: 7017
Joined: Sat Apr 21, 2007 8:54 pm

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by mduell »

Diminishing returns aren't the same as topping out, so perhaps you can clarify what you're actually asking.

For x264 diminishing returns start (trivially) at 2 (maybe 3 because of video decoder and other threads), but don't become particularly noticeable until ~6. Topping out is north of 32 even for 1080p content and 4K should scale higher.

While x265 is more computationally demanding, which could potentially allow it to scale higher, some of the things H.265/HEVC allows for efficiency also make it less suitable for parallelism. The net result is about the same for diminishing returns and topping out.

mduell
Veteran User
Posts: 7017
Joined: Sat Apr 21, 2007 8:54 pm

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by mduell »

So I figured I'd do a little experiment to show one possibility of HEVC scaling:
i7-5960X (8C/16T) with 2133 MHz DDR4
A 100 second 1080p film (24fps) clip with moderate motion and detail
x265 10-bit/Main10 preset slow with rect=0 at RF 23
HB 1.3.1 on Windows

Used pools to limit x265 encoder threads:
1 thread: 1.32 fps
2 threads: 2.08 fps
4 threads: 5.35 fps
8 threads: 11.9 fps
16 threads: 13.3 fps

Linear or better-than-linear gains in there, except from 1 to 2 threads, and then HT buys another 10-15% going from 8 to 16 threads.
Now this wasn't very scientific, and I wouldn't be surprised if Turbo Boost's behavior is playing games with the results.
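
To make the scaling easier to eyeball, here is a minimal Python sketch that turns the fps figures listed above into a speed ratio for each doubling of threads. The names `RESULTS` and `doubling_gains` are made up for illustration; only the fps numbers come from the test.

```python
# Average fps per x265 thread count, copied from the results listed above.
RESULTS = {1: 1.32, 2: 2.08, 4: 5.35, 8: 11.9, 16: 13.3}


def doubling_gains(results):
    """Return (from_threads, to_threads, speed_ratio) for each adjacent pair."""
    counts = sorted(results)
    return [(lo, hi, results[hi] / results[lo]) for lo, hi in zip(counts, counts[1:])]


if __name__ == "__main__":
    for lo, hi, ratio in doubling_gains(RESULTS):
        # A ratio of 2.0 would be perfect scaling for a doubling of threads.
        print(f"{lo:>2} -> {hi:>2} threads: {ratio:.2f}x")
```

A ratio of 2.0 is perfect scaling for a doubling. The 8 to 16 step works out to roughly 1.12x, which is the Hyper-Threading gain mentioned above.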

At 4K, you may see good scaling to 16 cores, although I'd expect some diminishing returns at 32 cores.

edit: Reran 1 thread test, got better result. I suspect this is Turbo Boost playing games.
Last edited by mduell on Sat Jan 18, 2020 2:26 pm, edited 2 times in total.

rollin_eng
Veteran User
Posts: 3370
Joined: Wed May 04, 2011 11:06 pm

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by rollin_eng »

Would be interesting to see the same test done on a 16 core machine to see if that changes anything.

BradleyS
Moderator
Posts: 1814
Joined: Thu Aug 09, 2007 12:16 pm

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by BradleyS »

Some of the figures in this article might also be interesting: https://handbrake.fr/docs/en/latest/tec ... mance.html

I'd say mduell's are most applicable in this scenario.

Can say from experience, x265 does work well with > 16 cores. Xeon 22 core is great for x265.

If you're working with fully progressive video sources, disable the interlace detection and deinterlacing filters to avoid bottlenecking there. Not typically a big issue until you get a quite fast machine (or are using a fast hardware encoder), but it sounds like you're in the market.

nhyone
Bright Spark User
Posts: 249
Joined: Fri Jul 24, 2015 4:13 am

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by nhyone »

mike693 wrote:
Fri Jan 17, 2020 11:55 pm
I understand that 264 encoding performance tops out at around 8 cores. I heard 265/HEVC can utilize more, before hitting the point of diminishing returns. My goal is to reduce compute time of a single instance.

For x264, there was some recent discussion about limiting encodes to 8 threads for "maximum" efficiency.

What it means is that it is better to encode 2 videos concurrently using 8 threads each than to encode them sequentially using 16 threads (assuming you have 16 cores, or 8 cores plus Hyper-Threading). There are three reasons:
  • More threads are faster, but the speedup is not linear. 16 threads is not 2x the speed of 8 threads, but maybe only 1.5x.
  • Perhaps related to the above, the cores are not fully loaded at 100%. In fact, load can range from 80% to as low as 30% (probably bottlenecked by decoding/filtering).
  • Lastly, the more threads you use, the less space-efficient the output is. (*) This does not affect video quality, just file size.
(*) I don't know whether this is true for x265.
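
To put numbers on the first point, here is a small Python sketch of the arithmetic. It assumes the guessed 1.5x figure from above, and it also assumes that two 8-thread encodes running side by side each go as fast as a lone 8-thread encode (no contention), which is optimistic. The function name `compare_two_encodes` is made up for illustration.

```python
def compare_two_encodes(time_at_8_threads, speedup_16_vs_8=1.5):
    """Wall-clock time to finish two videos on a 16-thread machine.

    time_at_8_threads: time for one video encoded with 8 threads.
    speedup_16_vs_8:   how much faster one video is with 16 threads
                       (1.5 is the guess from the post, not a measurement).
    """
    # Sequential: each video gets all 16 threads, one after the other.
    sequential = 2 * time_at_8_threads / speedup_16_vs_8
    # Concurrent: both videos run at once with 8 threads each
    # (assumes they do not slow each other down).
    concurrent = time_at_8_threads
    return sequential, concurrent


if __name__ == "__main__":
    seq, conc = compare_two_encodes(60.0)
    print(f"sequential: {seq:.1f} min, concurrent: {conc:.1f} min")
```

With a 60-minute encode at 8 threads, the sequential route takes 80 minutes for both videos and the concurrent route takes 60. Concurrency only stops winning when 16 threads really is 2x the speed of 8.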

BradleyS
Moderator
Posts: 1814
Joined: Thu Aug 09, 2007 12:16 pm

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by BradleyS »

For the most part, x265 overcomes those performance/scaling shortcomings.

mike693
Posts: 44
Joined: Wed Feb 14, 2007 5:42 am

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by mike693 »

All

Thanks to each of you for your informed and insightful responses!

It appears that 265 does not top out at 22 cores. Good to know!

It appears that the rate of fps/core improvement for 265 at 1080p may decrease somewhere between 8 and 16 cores.

I am probably looking at a 1080p workflow for the next few years.

Therefore, the best “bang for my buck” (fps/$) is likely to be an 8 core at this point. When the time comes to go 4K, I will look for the best per core cost for a 16-32 core system.

Kind regards
Mike693

BradleyS
Moderator
Posts: 1814
Joined: Thu Aug 09, 2007 12:16 pm

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by BradleyS »

Remember, mduell's 16 thread result wasn't much faster than the 8 thread result because the CPU only has 8 physical cores. The minor speed boost is due to Hyper-Threading. 16 physical cores would likely be much faster.

nhyone
Bright Spark User
Posts: 249
Joined: Fri Jul 24, 2015 4:13 am

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by nhyone »

I'm getting some strange results.

First, my testbed:

Code:

[09:52:45] NVENC version not supported. Disabling feature.
[09:52:45] hb_init: starting libhb thread
[09:52:45] thread 7fd6c8737700 started ("libhb")
HandBrake 1.3.0 (2019111000) - Linux x86_64 - https://handbrake.fr
40 CPUs detected
Opening test.mkv...
[09:52:45] CPU: Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
[09:52:45]  - Intel microarchitecture Haswell
[09:52:45]  - logical processor count: 40

(An old CPU; Haswell is the oldest generation that supports AVX2, which x265 uses.)

Calling HB:

Code:

HandBrakeCLI --encoder x265 --encoder-preset slow --encopts 'pools=X' -q 20 -a none --input test.mkv --output out.mkv --start-at seconds:600 --stop-at seconds:180

slow is the slowest I can bear.
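
For anyone who wants to repeat the runs, here is a rough Python sketch that builds the same command line for each pools value. It only prints the commands and does not run HandBrakeCLI. The flags are copied from the command above; `POOL_SETTINGS` and `build_command` are made-up names, and leaving out --encopts for the "no pools" run is an assumption.

```python
import shlex

# pools values to try; None means no pools option at all (x265 default).
POOL_SETTINGS = [None, "20", "10,10", "10", "10,0", "5,0"]


def build_command(pools, source="test.mkv", output="out.mkv"):
    """Build the HandBrakeCLI argument list for one pools setting."""
    cmd = ["HandBrakeCLI", "--encoder", "x265", "--encoder-preset", "slow"]
    if pools is not None:
        cmd += ["--encopts", f"pools={pools}"]
    cmd += ["-q", "20", "-a", "none",
            "--input", source, "--output", output,
            "--start-at", "seconds:600", "--stop-at", "seconds:180"]
    return cmd


if __name__ == "__main__":
    for pools in POOL_SETTINGS:
        # shlex.join quotes arguments so the line can be pasted into a shell.
        print(shlex.join(build_command(pools)))
```

Every command writes to the same out.mkv, so rename the output between runs if you want to compare file sizes.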

Normal (no pools), CPU usage 30%:

Code:

x265 [info]: Thread pool 0 using 40 threads on numa nodes 0,1
[09:22:16] work: average encoding speed for job is 8.107468 fps

20 threads (X = 20), CPU usage ~30% [used cores only]:

Code:

[09:31:07]      + options: pools=20
x265 [info]: Thread pool 0 using 20 threads on numa nodes 0,1
[09:40:11] work: average encoding speed for job is 8.009732 fps

20 threads (X = 10,10), CPU usage ~55% [used cores only]:

Code:

[10:36:57]      + options: pools=10,10
x265 [info]: Thread pool 0 using 10 threads on numa nodes 0
x265 [info]: Thread pool 1 using 10 threads on numa nodes 1
[10:45:46] work: average encoding speed for job is 8.235546 fps

10 threads (X = 10), CPU usage ~40% [used cores only]:

Code:

[09:41:07]      + options: pools=10
x265 [info]: Thread pool 0 using 10 threads on numa nodes 0,1
[09:51:18] work: average encoding speed for job is 7.143307 fps

10 threads (X = 10,0), CPU usage ~80% [used cores only]:

Code:

[09:52:46]      + options: pools=10,0
x265 [info]: Thread pool 0 using 10 threads on numa nodes 0
[10:02:02] work: average encoding speed for job is 7.853088 fps

5 threads (X = 5,0), CPU usage ~95% [used cores only]:

Code:

[10:22:54]      + options: pools=5,0
x265 [info]: Thread pool 0 using 5 threads on numa nodes 0
[10:36:44] work: average encoding speed for job is 5.265279 fps

x265 does not respect a preset CPU affinity. You have to use pools, but that does not let you pick designated cores.

I'd say x265 still suffers from a scaling issue. (Oops, I didn't use --pmode or --pme.)

However, it does not suffer from the file size efficiency problem. All output files are the same size (within 17 bytes of each other).
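
In case it helps anyone compare runs, here is a rough Python sketch that pulls the pools setting and the average fps out of log lines like the ones above and then compares a few of the runs. The log text is copied from this post; `parse_runs` and the regex names are made up for illustration, and the parser assumes each "options: pools=" line is followed by its own "average encoding speed" line.

```python
import re

# Log lines copied from the runs above (only the runs that set pools).
LOG = """\
[09:31:07]      + options: pools=20
[09:40:11] work: average encoding speed for job is 8.009732 fps
[10:36:57]      + options: pools=10,10
[10:45:46] work: average encoding speed for job is 8.235546 fps
[09:41:07]      + options: pools=10
[09:51:18] work: average encoding speed for job is 7.143307 fps
[09:52:46]      + options: pools=10,0
[10:02:02] work: average encoding speed for job is 7.853088 fps
[10:22:54]      + options: pools=5,0
[10:36:44] work: average encoding speed for job is 5.265279 fps
"""

OPTIONS_RE = re.compile(r"\+ options: pools=(\S+)")
FPS_RE = re.compile(r"average encoding speed for job is ([\d.]+) fps")


def parse_runs(log):
    """Map each pools setting to the average fps reported after it."""
    runs = {}
    current = None
    for line in log.splitlines():
        match = OPTIONS_RE.search(line)
        if match:
            current = match.group(1)
            continue
        match = FPS_RE.search(line)
        if match and current is not None:
            runs[current] = float(match.group(1))
            current = None
    return runs


if __name__ == "__main__":
    runs = parse_runs(LOG)
    # 5 -> 10 threads on one NUMA node, then 10 -> 20 threads across both nodes.
    print(f"5,0 -> 10,0:   {runs['10,0'] / runs['5,0']:.2f}x")
    print(f"10,0 -> 10,10: {runs['10,10'] / runs['10,0']:.2f}x")
```

Going from 5 to 10 threads on one node gives about 1.49x, while adding a second pool of 10 on the other node adds only about 5%.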
