Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post your testing results with HandBrake.
mike693
Posts: 46
Joined: Wed Feb 14, 2007 5:42 am

Post by mike693 »

I am shopping for a new Mac.

How many physical cores can a single instance of HB utilize with the "Apple 2160p60 4K HEVC Surround" preset?

I understand that H.264 encoding performance tops out at around 8 cores. I heard H.265/HEVC can utilize more before hitting the point of diminishing returns. My goal is to reduce the compute time of a single instance.

(Apologies if this was asked and answered elsewhere. I could not find it, if it was.)

Thank you!
mduell
Veteran User
Posts: 7211
Joined: Sat Apr 21, 2007 8:54 pm

Post by mduell »

Diminishing returns aren't the same as topping out, so perhaps you can clarify what you're actually asking.

For x264, diminishing returns start (trivially) at 2 threads (maybe 3, because of the video decoder and other threads), but don't become particularly noticeable until ~6. Topping out is north of 32 even for 1080p content, and 4K should scale higher.

While x265 is more computationally demanding, which could potentially allow it to scale higher, some of the things H.265/HEVC does for efficiency also make it less suitable for parallelism. The net result is about the same for diminishing returns and topping out.
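To make the difference between diminishing returns and topping out concrete, here is a sketch using Amdahl's law. This is a textbook model that nobody in this thread invoked, and the parallel fraction p = 0.95 is an arbitrary assumption, not a measured x264 or x265 value. Under the model, each doubling of threads gains less than the one before (diminishing returns start immediately), but the speedup only flattens out completely at the ceiling 1 / (1 - p).

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Ideal speedup on n threads when a fraction p of the work parallelizes (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)


P = 0.95  # assumed parallel fraction, purely illustrative

if __name__ == "__main__":
    prev = amdahl_speedup(P, 1)
    for n in (1, 2, 4, 8, 16, 32, 64):
        s = amdahl_speedup(P, n)
        # The gain from each doubling shrinks right away (diminishing returns)...
        print(f"{n:>2} threads: speedup {s:5.2f}x, gain vs previous {s / prev:4.2f}x")
        prev = s
    # ...but the curve only tops out at 1 / (1 - p).
    print(f"ceiling: {1.0 / (1.0 - P):.1f}x")
```

Real encoders don't follow this model exactly; mduell's measurements later in the thread even show better-than-linear steps.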
mduell
Veteran User
Posts: 7211
Joined: Sat Apr 21, 2007 8:54 pm

Post by mduell »

So I figured I'd do a little experiment to show one possibility of HEVC scaling:
i7-5960X (8C/16T) with 2133 MHz DDR4
A 100-second 1080p film clip (24 fps) with moderate motion and detail
x265 10-bit/Main10 preset slow with rect=0 at RF 23
HB 1.3.1 on Windows

Used pools to limit x265 encoder threads:
1 thread: 1.32 fps
2 threads: 2.08 fps
4 threads: 5.35 fps
8 threads: 11.9 fps
16 threads: 13.3 fps

Scaling is linear or better than linear, except from 1 to 2 threads, and then HT buys another 10-15%.
Now this wasn't very scientific, and I wouldn't be surprised if Turbo Boost's behavior is playing games with the result.
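The numbers above can be turned into speedup and parallel-efficiency figures with plain arithmetic on the reported fps. Efficiencies above 100% correspond to the better-than-linear steps noted above; since the 1-thread baseline was noisy (see the edit note below), treat them as rough.

```python
# Encoding speeds (fps) reported above for x265 on an i7-5960X (8 cores / 16 threads)
RESULTS = {1: 1.32, 2: 2.08, 4: 5.35, 8: 11.9, 16: 13.3}


def scaling_table(results: dict[int, float]) -> list[tuple[int, float, float]]:
    """Return (threads, speedup vs 1 thread, parallel efficiency) for each entry."""
    base = results[1]
    rows = []
    for threads, fps in sorted(results.items()):
        speedup = fps / base
        rows.append((threads, speedup, speedup / threads))
    return rows


if __name__ == "__main__":
    for threads, speedup, eff in scaling_table(RESULTS):
        print(f"{threads:>2} threads: {speedup:5.2f}x speedup, {eff:6.1%} efficiency")
    # Extra throughput from the 8 Hyper-Threading siblings (16 vs 8 threads)
    print(f"HT gain: {RESULTS[16] / RESULTS[8] - 1:.1%}")
```

The HT line works out to about 12%, inside the 10-15% range quoted above.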

At 4K, you may see good scaling to 16 cores, although I'd expect some diminishing returns at 32 cores.
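One plausible reason 4K should scale further (an interpretation, not something stated in this post): x265 parallelizes within a frame using wavefront parallel processing (WPP), roughly one job per row of coding tree units (CTUs), so taller frames expose more row-level work. Assuming the 64x64 maximum CTU size that x265 reports in the log posted later in this thread ("max CU size ... 64"), the row counts work out as below; the 13 rows for an 800-pixel-tall frame match the `wpp(13 rows)` line in that same log.

```python
import math

CTU_SIZE = 64  # assumed max CU size, as reported by x265's slow preset in the thread's log


def ctu_grid(width: int, height: int, ctu: int = CTU_SIZE) -> tuple[int, int]:
    """Return (columns, rows) of CTUs covering a frame; partial CTUs at the edges count."""
    return math.ceil(width / ctu), math.ceil(height / ctu)


if __name__ == "__main__":
    frames = {"1920x800": (1920, 800), "1080p": (1920, 1080), "2160p": (3840, 2160)}
    for label, (w, h) in frames.items():
        cols, rows = ctu_grid(w, h)
        print(f"{label}: {cols} x {rows} CTUs -> {rows} WPP rows per frame")
```

x265 also encodes several frames at once (frame threads), so row count is only one of the factors that bound parallelism.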

edit: Reran the 1-thread test and got a better result. I suspect this is Turbo Boost playing games.
Last edited by mduell on Sat Jan 18, 2020 2:26 pm, edited 2 times in total.
rollin_eng
Veteran User
Posts: 3554
Joined: Wed May 04, 2011 11:06 pm

Post by rollin_eng »

Would be interesting to see the same test done on a 16 core machine to see if that changes anything.
BradleyS
Moderator
Posts: 1859
Joined: Thu Aug 09, 2007 12:16 pm

Post by BradleyS »

Some of the figures in this article might also be interesting: https://handbrake.fr/docs/en/latest/tec ... mance.html

I'd say mduell's are most applicable in this scenario.

I can say from experience that x265 does work well with > 16 cores. A 22-core Xeon is great for x265.

If you're working with fully progressive video sources, disable the interlace detection and deinterlacing filters to avoid bottlenecking there. It's not typically a big issue until you have a fairly fast machine (or are using a fast hardware encoder), but it sounds like you're in the market for one.
nhyone
Bright Spark User
Posts: 249
Joined: Fri Jul 24, 2015 4:13 am

Post by nhyone »

mike693 wrote: Fri Jan 17, 2020 11:55 pm I understand that 264 encoding performance tops out at around 8 cores. I heard 265/HEVC can utilize more, before hitting the point of diminishing returns. My goal is to reduce compute time of a single instance.
For x264, there was some recent discussion about limiting it to 8 threads for "maximum" efficiency.

What it means is that it is better to encode 2 videos concurrently using 8 threads each than encoding them sequentially using 16 (assuming you have 16 cores, or 8 cores + 8 HyperThread). There are three reasons:
  • It is faster the more cores you use, but the speedup is not linear. So 16 threads is not 2x as fast as 8 threads, but maybe only 1.5x
  • Perhaps related to the above, the cores are not fully loaded at 100%. In fact, they can range from 80% to as low as 30% (bottlenecked by decoding/filtering, probably)
  • Lastly, the more threads you use, the less space efficient the output is. (*) This does not affect video quality, just file size
(*) I don't know if this is true for x265 or not.
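The first bullet can be turned into arithmetic. This sketch assumes, as in that bullet, that 16 threads run only 1.5x as fast as 8 threads, and that two concurrent 8-thread encodes don't slow each other down (optimistic, since they still share memory bandwidth and caches). The clip length and base fps are hypothetical placeholders; only their ratio matters.

```python
def batch_time(jobs: int, concurrent: int, fps: float, frames: int) -> float:
    """Wall-clock seconds to finish `jobs` equal encodes, running `concurrent` at a time.

    Assumes each running encode holds `fps` regardless of how many run together.
    """
    rounds = -(-jobs // concurrent)  # ceiling division: number of back-to-back batches
    return rounds * frames / fps


FRAMES = 10_000          # hypothetical clip length in frames
FPS_8T = 10.0            # hypothetical speed of one encode at 8 threads
FPS_16T = 1.5 * FPS_8T   # "maybe only 1.5x" faster at 16 threads

if __name__ == "__main__":
    sequential = batch_time(jobs=2, concurrent=1, fps=FPS_16T, frames=FRAMES)
    parallel = batch_time(jobs=2, concurrent=2, fps=FPS_8T, frames=FRAMES)
    print(f"2 jobs one after another at 16 threads: {sequential:.0f} s")
    print(f"2 jobs side by side at 8 threads each:  {parallel:.0f} s")
```

Under these assumptions the side-by-side approach needs 0.75x the wall-clock time; the advantage shrinks as the 16-thread speedup approaches 2x.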
BradleyS
Moderator
Posts: 1859
Joined: Thu Aug 09, 2007 12:16 pm

Post by BradleyS »

For the most part, x265 overcomes those performance/scaling shortcomings.
mike693
Posts: 46
Joined: Wed Feb 14, 2007 5:42 am

Post by mike693 »

All

Thanks to each of you for your informed and insightful responses!

It appears that 265 does not top out at 22 cores. Good to know!

It appears that the rate of fps/core improvement for 265 at 1080p may decrease somewhere between 8 and 16 cores.

I am probably looking at a 1080p workflow for the next few years.

Therefore, the best “bang for my buck” (fps/$) is likely to be an 8 core at this point. When the time comes to go 4K, I will look for the best per core cost for a 16-32 core system.

Kind regards
Mike693
BradleyS
Moderator
Posts: 1859
Joined: Thu Aug 09, 2007 12:16 pm

Post by BradleyS »

Remember, mduell's 16 thread result wasn't much faster than the 8 thread result because the CPU only has 8 physical cores. The minor speed boost is due to hyperthreading. 16 physical cores would likely be much faster.
nhyone
Bright Spark User
Posts: 249
Joined: Fri Jul 24, 2015 4:13 am

Post by nhyone »

I'm getting some strange results.

First, my testbed:

[09:52:45] NVENC version not supported. Disabling feature.
[09:52:45] hb_init: starting libhb thread
[09:52:45] thread 7fd6c8737700 started ("libhb")
HandBrake 1.3.0 (2019111000) - Linux x86_64 - https://handbrake.fr
40 CPUs detected
Opening test.mkv...
[09:52:45] CPU: Intel(R) Xeon(R) CPU E5-2660 v3 @ 2.60GHz
[09:52:45]  - Intel microarchitecture Haswell
[09:52:45]  - logical processor count: 40
(An old CPU; the oldest one that supports AVX2 used by x265.)

Calling HB:

HandBrakeCLI --encoder x265 --encoder-preset slow --encopts 'pools=X' -q 20 -a none --input test.mkv --output out.mkv --start-at seconds:600 --stop-at seconds:180
slow is the slowest I can bear.
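A small script can generate the same command for each pools setting tried below, which makes the comparison easier to repeat. This is a sketch: it only assembles and prints the argument lists. The flags are copied from the command above; the helper name and the output file naming scheme are hypothetical.

```python
import shlex

POOLS = ["20", "10,10", "10", "10,0", "5,0"]  # the pools settings compared in the results below


def handbrake_cmd(pools: str, output: str) -> list[str]:
    """Argument list for one test encode; mirrors the HandBrakeCLI call above."""
    return [
        "HandBrakeCLI",
        "--encoder", "x265",
        "--encoder-preset", "slow",
        "--encopts", f"pools={pools}",
        "-q", "20",
        "-a", "none",
        "--input", "test.mkv",
        "--output", output,
        "--start-at", "seconds:600",
        "--stop-at", "seconds:180",
    ]


if __name__ == "__main__":
    for pools in POOLS:
        out = f"out_pools_{pools.replace(',', '_')}.mkv"  # hypothetical naming scheme
        cmd = handbrake_cmd(pools, out)
        print(shlex.join(cmd))
        # To actually encode: import subprocess, then subprocess.run(cmd, check=True)
```

Keeping the arguments as a list (rather than one shell string) avoids quoting problems if file names contain spaces.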

Normal (no pools), CPU usage 30%:

x265 [info]: Thread pool 0 using 40 threads on numa nodes 0,1
[09:22:16] work: average encoding speed for job is 8.107468 fps
20 threads (X = 20), CPU usage ~30% [used cores only]:

[09:31:07]      + options: pools=20
x265 [info]: Thread pool 0 using 20 threads on numa nodes 0,1
[09:40:11] work: average encoding speed for job is 8.009732 fps
20 threads (X = 10,10), CPU usage ~55% [used cores only]:

[10:36:57]      + options: pools=10,10
x265 [info]: Thread pool 0 using 10 threads on numa nodes 0
x265 [info]: Thread pool 1 using 10 threads on numa nodes 1
[10:45:46] work: average encoding speed for job is 8.235546 fps
10 threads (X = 10), CPU usage ~40% [used cores only]:

[09:41:07]      + options: pools=10
x265 [info]: Thread pool 0 using 10 threads on numa nodes 0,1
[09:51:18] work: average encoding speed for job is 7.143307 fps
10 threads (X = 10,0), CPU usage ~80% [used cores only]:

[09:52:46]      + options: pools=10,0
x265 [info]: Thread pool 0 using 10 threads on numa nodes 0
[10:02:02] work: average encoding speed for job is 7.853088 fps
5 threads (X = 5,0), CPU usage ~95% [used cores only]:

[10:22:54]      + options: pools=5,0
x265 [info]: Thread pool 0 using 5 threads on numa nodes 0
[10:36:44] work: average encoding speed for job is 5.265279 fps
x265 does not respect a CPU affinity set beforehand. You have to use pools instead, but pools don't let you designate specific cores.

I'll say x265 still suffers from a scaling issue. (Oops, I didn't use --pmode or --pme.)

However, it does not suffer from the file size efficiency problem: all output files are the same size (within 17 bytes).
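Dividing each average fps from the logs above by the number of pool threads shows the scaling issue more directly: going from 5 to 40 threads (8x) buys only about 1.5x the speed. The numbers are copied from the logs; the table layout is just for illustration.

```python
# Average fps from the logs above (dual Xeon E5-2660 v3, 40 logical CPUs),
# keyed by pools setting -> (total pool threads, fps)
RESULTS = {
    "default (40)": (40, 8.107468),
    "20": (20, 8.009732),
    "10,10": (20, 8.235546),
    "10": (10, 7.143307),
    "10,0": (10, 7.853088),
    "5,0": (5, 5.265279),
}


def fps_per_thread(threads: int, fps: float) -> float:
    """Encoding speed delivered per pool thread."""
    return fps / threads


if __name__ == "__main__":
    for label, (threads, fps) in RESULTS.items():
        print(f"pools={label:<13} {threads:>2} threads  {fps:5.2f} fps  "
              f"{fps_per_thread(threads, fps):.3f} fps/thread")
```

Note also that keeping a pool on one NUMA node ("10,0") or giving each node its own pool ("10,10") did slightly better than the same thread count spread across both nodes.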
8Ringer
Posts: 14
Joined: Sun Jun 21, 2020 7:16 pm

Post by 8Ringer »

nhyone, I just put together a FreeNAS rig with dual 2620v3 Xeons, very similar to your setup above but 6-core/12-thread parts. I'm currently seeing ~60% utilization with my HEVC encodes (using a custom exported preset based loosely on the Apple HEVC 4k preset, with a minor tweak to audio settings and RF20 or something).

As a troubleshooting step, I ran the same encode with no preset specified (so the CLI default, x264), and it saturated all cores. If I run the two encodes in parallel, I nearly saturate all cores (~1100% CPU in htop for each encode, so essentially 11 threads fully utilized per encode, which is close enough to 100% for government work). I'm okay with 90% or even 95%, but ~60% utilization for a single encode is pretty weak; I'd like to fully utilize the rig I've put together.

The bummer about x265 being finicky about threading is that I was intending to upgrade the rig later with 2660s or 2670s (maybe even v4s), but if x265 is going to see such diminishing returns at high thread counts, then there's really no point in getting a 24-core, 48-thread 2P machine if it's only going to use 1/4 of the CPU...

Were you ever able to nail down a config that properly saturated the CPUs?
8Ringer
Posts: 14
Joined: Sun Jun 21, 2020 7:16 pm

Post by 8Ringer »

Just as a quick followup, I misunderstood what you were trying to accomplish with the thread numbers: you were trying to diagnose the root cause, not get full utilization.

Anyway, I've played around, and adding --encopts 'pmode=1:pme=1' has kicked CPU utilization up to 1800%. Further adding 'frame-threads=5' kicked that up to almost 2000% but actually seems to have reduced throughput (fps). Do those options add any quality to the resulting stream, or are they just burning more resources for no real gain?
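HandBrake takes x265 settings as one colon-separated list of key=value pairs, the same form as the `strong-intra-smoothing=0:rect=0:aq-mode=1` options string in the preset log later in the thread. A small helper (hypothetical, not part of HandBrake) that builds that string makes it easy to toggle `pmode`, `pme`, and `frame-threads` one at a time and compare fps, rather than changing several settings at once.

```python
def encopts(options: dict[str, object]) -> str:
    """Join x265 options into HandBrake's colon-separated key=value form."""
    return ":".join(f"{key}={value}" for key, value in options.items())


BASE = {"pmode": 1, "pme": 1}

if __name__ == "__main__":
    print(encopts(BASE))                          # pmode=1:pme=1
    print(encopts({**BASE, "frame-threads": 5}))  # pmode=1:pme=1:frame-threads=5
```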
rollin_eng
Veteran User
Posts: 3554
Joined: Wed May 04, 2011 11:06 pm

Post by rollin_eng »

8Ringer wrote: Mon Jun 22, 2020 6:22 am Just as a quick followup, I misunderstood what you were trying to accomplish with the thread numbers: you were trying to diagnose the root cause, not get full utilization.

Anyway, I've played around, and adding --encopts 'pmode=1:pme=1' has kicked CPU utilization up to 1800%. Further adding 'frame-threads=5' kicked that up to almost 2000% but actually seems to have reduced throughput (fps). Do those options add any quality to the resulting stream, or are they just burning more resources for no real gain?
If you don't know what things do, don't change them.

Could you please post your logs.
mduell
Veteran User
Posts: 7211
Joined: Sat Apr 21, 2007 8:54 pm

Post by mduell »

If you're getting 1100% for one encode on a machine with only 12 hardware threads, there's not much more to be had.
8Ringer
Posts: 14
Joined: Sun Jun 21, 2020 7:16 pm

Post by 8Ringer »

It’s a dual-socket 2620 v3, so 2x 6 cores with Hyper-Threading: 24 hardware threads, verified at boot in FreeBSD. Sorry if that wasn’t clear.
8Ringer
Posts: 14
Joined: Sun Jun 21, 2020 7:16 pm

Post by 8Ringer »

rollin_eng wrote: Mon Jun 22, 2020 10:42 am
8Ringer wrote: Mon Jun 22, 2020 6:22 am Just as a quick followup, I misunderstood what you were trying to accomplish with the thread numbers: you were trying to diagnose the root cause, not get full utilization.

Anyway, I've played around, and adding --encopts 'pmode=1:pme=1' has kicked CPU utilization up to 1800%. Further adding 'frame-threads=5' kicked that up to almost 2000% but actually seems to have reduced throughput (fps). Do those options add any quality to the resulting stream, or are they just burning more resources for no real gain?
If you don't know what things do, don't change them.

Could you please post your logs.
Well, I’ve read the x265 technical documentation fairly extensively. It’s a bit over my head, and the practical effect of many of the options discussed in that section isn’t made clear, hence my asking.

I can post a log tomorrow, though, if that will help. I suspect my x265 binaries aren’t compiled with NUMA support; I see zero mention of it anywhere in my encoder output.
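One rough way to check is to compare the x265 thread-pool line across logs. In nhyone's Linux log, x265 reports which NUMA nodes each pool uses (`Thread pool 0 using 40 threads on numa nodes 0,1`), while the FreeBSD log posted later in this thread only says `Thread pool created using 24 threads`. The sketch below just looks for that wording; reading its absence as "no NUMA support in this build" is an assumption based on these two logs, not documented x265 behavior.

```python
import re

# Matches the two thread-pool line formats seen in this thread's logs
POOL_LINE = re.compile(
    r"x265 \[info\]: Thread pool (?:\d+ using|created using) (\d+) threads"
    r"(?: on numa nodes ([\d,]+))?"
)


def pool_info(log: str) -> list[tuple[int, list[int]]]:
    """Return (threads, numa_nodes) per x265 thread-pool line; nodes is empty if not reported."""
    pools = []
    for match in POOL_LINE.finditer(log):
        nodes = [int(n) for n in match.group(2).split(",")] if match.group(2) else []
        pools.append((int(match.group(1)), nodes))
    return pools


if __name__ == "__main__":
    linux = "x265 [info]: Thread pool 0 using 40 threads on numa nodes 0,1"
    freebsd = "x265 [info]: Thread pool created using 24 threads"
    print(pool_info(linux))    # [(40, [0, 1])]
    print(pool_info(freebsd))  # [(24, [])]
```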
mduell
Veteran User
Posts: 7211
Joined: Sat Apr 21, 2007 8:54 pm

Post by mduell »

8Ringer wrote: Tue Jun 23, 2020 6:29 am It’s a dual-socket 2620 v3, so 2x 6 cores with Hyper-Threading: 24 hardware threads, verified at boot in FreeBSD. Sorry if that wasn’t clear.
You've only got 12 actual cores, they're just duplicated with SMT to squeeze out another ~10% in some situations.
8Ringer
Posts: 14
Joined: Sun Jun 21, 2020 7:16 pm

Post by 8Ringer »

OK, here is some more detailed info. To remove variables and keep this data relevant to the thread (and now that built-in preset references are fixed in the CLI), here is the encoding log from the "Apple 2160p60 4K HEVC Surround" preset:

root@Handbrake:~ # nohup HandBrakeCLI -i /mnt/movierips/BenchmarkVideo.mp4 -o /mnt/Media/PlexMedia/Movies/BMVNASapl.mp4 --preset="Apple 2160p60 4K HEVC Surround" &
[1] 29205
root@Handbrake:~ # [09:21:07] hb_init: starting libhb thread
[09:21:07] thread 808a18500 started ("libhb")
HandBrake 1.3.3 (2020062000) - FreeBSD amd64 - https://handbrake.fr
24 CPUs detected
Opening /mnt/movierips/BenchmarkVideo.mp4...
[09:21:07] CPU: Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz
[09:21:07]  - Intel microarchitecture Haswell
[09:21:07]  - logical processor count: 24
[09:21:07] Intel Quick Sync Video support: no
[09:21:07] hb_scan: path=/mnt/movierips/BenchmarkVideo.mp4, title_index=1
udfread ERROR: ECMA 167 Volume Recognition failed
disc.c:323: failed opening UDF image /mnt/movierips/BenchmarkVideo.mp4
disc.c:424: error opening file BDMV/index.bdmv
disc.c:424: error opening file BDMV/BACKUP/index.bdmv
bluray.c:2585: nav_get_title_list(/mnt/movierips/BenchmarkVideo.mp4/) failed
[09:21:07] bd: not a bd - trying as a stream/file instead
libdvdnav: Using dvdnav version 6.0.1
libdvdread:DVDOpenFileUDF:UDFFindFile /VIDEO_TS/VIDEO_TS.IFO failed
libdvdread:DVDOpenFileUDF:UDFFindFile /VIDEO_TS/VIDEO_TS.BUP failed
libdvdread: Can't open file VIDEO_TS.IFO.
libdvdnav: vm: failed to read VIDEO_TS.IFO
[09:21:07] dvd: not a dvd - trying as a stream/file instead
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/movierips/BenchmarkVideo.mp4':
  Metadata:
    major_brand     : mp42
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    creation_time   : 2020-05-30T17:28:53.000000Z
    title           : Ghostbusters
    encoder         : HandBrake 1.3.2 2020050300
  Duration: 00:05:00.05, start: 0.000000, bitrate: 21151 kb/s
    Chapter #0:0: start 0.000000, end 291.964000
    Metadata:
      title           : Chapter 5
    Chapter #0:1: start 291.964000, end 299.972000
    Metadata:
      title           : Chapter 6
    Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p(tv, bt709), 1920x800 [SAR 1:1 DAR 12:5], 20340 kb/s, 23.98 fps, 23.98 tbr, 90k tbn, 180k tbc (default)
    Metadata:
      creation_time   : 2020-05-30T17:28:53.000000Z
      handler_name    : VideoHandler
    Stream #0:1(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 164 kb/s (default)
    Metadata:
      creation_time   : 2020-05-30T17:28:53.000000Z
      handler_name    : Surround 5.1
      title           : Surround 5.1
    Stream #0:2(eng): Audio: ac3 (ac-3 / 0x332D6361), 48000 Hz, 5.1(side), fltp, 640 kb/s
    Metadata:
      creation_time   : 2020-05-30T17:28:53.000000Z
      handler_name    : Surround 5.1
      title           : Surround 5.1
    Side data:
      audio service type: main
    Stream #0:3(eng): Data: bin_data (text / 0x74786574), 0 kb/s
    Metadata:
      creation_time   : 2020-05-30T17:28:53.000000Z
      handler_name    : SubtitleHandler
[09:21:07] scan: decoding previews for title 1
[09:21:07] scan: audio 0x1: aac, rate=48000Hz, bitrate=164794 English (AAC LC) (2.0 ch) (164 kbps)
[09:21:07] scan: audio 0x2: ac3, rate=48000Hz, bitrate=640000 English (AC3) (5.1 ch) (640 kbps)
Scanning title 1 of 1, preview 9, 90.00 %[09:21:08] scan: 10 previews, 1920x800, 23.976 fps, autocrop = 0/0/0/0, aspect 2.40:1, PAR 1:1
[09:21:08] scan: supported video decoders: avcodec qsv
[09:21:08] libhb: scan thread found 1 valid title(s)
+ Using preset: Apple 2160p60 4K HEVC Surround
+ title 1:
  + stream: /mnt/movierips/BenchmarkVideo.mp4
  + duration: 00:05:00
  + size: 1920x800, pixel aspect: 1/1, display aspect: 2.40, 23.976 fps
  + autocrop: 0/0/0/0
  + chapters:
    + 1: duration 00:04:52
    + 2: duration 00:00:08
  + audio tracks:
    + 1, English (AAC LC) (2.0 ch) (164 kbps) (iso639-2: eng)
    + 2, English (AC3) (5.1 ch) (640 kbps) (iso639-2: eng), 48000Hz, 640000bps
  + subtitle tracks:
[09:21:09] Starting work at: Tue Jun 23 09:21:09 2020

[09:21:09] 1 job(s) to process
[09:21:09] json job:
{
    "Audio": {
        "AudioList": [
            {
                "Bitrate": 160,
                "CompressionLevel": -1.0,
                "DRC": 0.0,
                "DitherMethod": "auto",
                "Encoder": "av_aac",
                "Gain": 0.0,
                "Mixdown": "stereo",
                "Name": "Surround 5.1",
                "NormalizeMixLevel": false,
                "PresetEncoder": "av_aac",
                "Quality": -3.0,
                "Samplerate": 0,
                "Track": 0
            },
            {
                "Bitrate": 640,
                "CompressionLevel": -1.0,
                "DRC": 0.0,
                "DitherMethod": "auto",
                "Encoder": "ac3",
                "Gain": 0.0,
                "Mixdown": "stereo",
                "Name": "Surround 5.1",
                "NormalizeMixLevel": false,
                "PresetEncoder": "copy:ac3",
                "Quality": -3.0,
                "Samplerate": 0,
                "Track": 0
            }
        ],
        "CopyMask": [
            "copy:aac",
            "copy:ac3"
        ],
        "FallbackEncoder": "av_aac"
    },
    "Destination": {
        "AlignAVStart": false,
        "ChapterList": [
            {
                "Duration": {
                    "Hours": 0,
                    "Minutes": 4,
                    "Seconds": 52,
                    "Ticks": 26276760
                },
                "Name": "Chapter 1"
            },
            {
                "Duration": {
                    "Hours": 0,
                    "Minutes": 0,
                    "Seconds": 8,
                    "Ticks": 728100
                },
                "Name": "Chapter 2"
            }
        ],
        "ChapterMarkers": true,
        "File": "/mnt/Media/PlexMedia/Movies/BMVNASapl.mp4",
        "InlineParameterSets": false,
        "Mp4Options": {
            "IpodAtom": false,
            "Mp4Optimize": false
        },
        "Mux": "m4v"
    },
    "Filters": {
        "FilterList": [
            {
                "ID": 3,
                "Settings": {
                    "block-height": "16",
                    "block-thresh": "40",
                    "block-width": "16",
                    "filter-mode": "2",
                    "mode": "3",
                    "motion-thresh": "1",
                    "spatial-metric": "2",
                    "spatial-thresh": "1"
                }
            },
            {
                "ID": 4,
                "Settings": {
                    "mode": "7"
                }
            },
            {
                "ID": 6,
                "Settings": {
                    "mode": 2,
                    "rate": "27000000/450000"
                }
            },
            {
                "ID": 12,
                "Settings": {
                    "crop-bottom": 0,
                    "crop-left": 0,
                    "crop-right": 0,
                    "crop-top": 0,
                    "height": 800,
                    "width": 1920
                }
            }
        ]
    },
    "Metadata": {
        "Name": "Ghostbusters"
    },
    "PAR": {
        "Den": 1,
        "Num": 1
    },
    "SequenceID": 0,
    "Source": {
        "Angle": 0,
        "Path": "/mnt/movierips/BenchmarkVideo.mp4",
        "Range": {
            "End": 2,
            "Start": 1,
            "Type": "chapter"
        },
        "Title": 1
    },
    "Subtitle": {
        "Search": {
            "Burn": true,
            "Default": false,
            "Enable": false,
            "Forced": false
        },
        "SubtitleList": []
    },
    "Video": {
        "ColorFormat": 0,
        "ColorMatrix": 1,
        "ColorPrimaries": 1,
        "ColorRange": 1,
        "ColorTransfer": 1,
        "Encoder": "x265",
        "Level": "auto",
        "Options": "strong-intra-smoothing=0:rect=0:aq-mode=1",
        "Preset": "slow",
        "Profile": "main",
        "QSV": {
            "AsyncDepth": 4,
            "Decode": false
        },
        "Quality": 24.0,
        "Tune": "",
        "Turbo": false,
        "TwoPass": false
    }
}
[09:21:09] Starting Task: Encoding Pass
[09:21:09] Skipping crop/scale filter
[09:21:09] job configuration:
[09:21:09]  * source
[09:21:09]    + /mnt/movierips/BenchmarkVideo.mp4
[09:21:09]    + title 1, chapter(s) 1 to 2
[09:21:09]    + container: mov,mp4,m4a,3gp,3g2,mj2
[09:21:09]    + data rate: 21151 kbps
[09:21:09]  * destination
[09:21:09]    + /mnt/Media/PlexMedia/Movies/BMVNASapl.mp4
[09:21:09]    + container: MPEG-4 (libavformat)
[09:21:09]      + chapter markers
[09:21:09]  * video track
[09:21:09]    + decoder: h264
[09:21:09]      + bitrate 20340 kbps
[09:21:09]    + filters
[09:21:09]      + Comb Detect (mode=3:spatial-metric=2:motion-thresh=1:spatial-thresh=1:filter-mode=2:block-thresh=40:block-width=16:block-height=16)
[09:21:09]      + Decomb (mode=39)
[09:21:09]      + Framerate Shaper (mode=2:rate=27000000/450000)
[09:21:09]        + frame rate: 23.976 fps -> peak rate limited to 60.000 fps
[09:21:09]    + Output geometry
[09:21:09]      + storage dimensions: 1920 x 800
[09:21:09]      + pixel aspect ratio: 1 : 1
[09:21:09]      + display dimensions: 1920 x 800
[09:21:09]    + encoder: H.265 (libx265)
[09:21:09]      + preset:  slow
[09:21:09]      + options: strong-intra-smoothing=0:rect=0:aq-mode=1
[09:21:09]      + profile: main
[09:21:09]      + level:   auto
[09:21:09]      + quality: 24.00 (RF)
[09:21:09]      + color profile: 1-1-1
[09:21:09]  * audio track 1
[09:21:09]    + name: Surround 5.1
[09:21:09]    + decoder: English (AAC LC) (2.0 ch) (164 kbps) (track 1, id 0x1)
[09:21:09]      + bitrate: 164 kbps, samplerate: 48000 Hz
[09:21:09]    + mixdown: Stereo
[09:21:09]    + dither: triangular
[09:21:09]    + encoder: AAC (libavcodec)
[09:21:09]      + bitrate: 160 kbps, samplerate: 48000 Hz
[09:21:09]  * audio track 2
[09:21:09]    + name: Surround 5.1
[09:21:09]    + decoder: English (AAC LC) (2.0 ch) (164 kbps) (track 1, id 0x1)
[09:21:09]      + bitrate: 164 kbps, samplerate: 48000 Hz
[09:21:09]    + mixdown: Stereo
[09:21:09]    + dither: triangular
[09:21:09]    + encoder: AC3 (libavcodec)
[09:21:09]      + bitrate: 640 kbps, samplerate: 48000 Hz
[09:21:09] sync: expecting 7194 video frames
x265 [info]: HEVC encoder version 3.2.1+1-b5c86a64bbbe
x265 [info]: build info [Unk-OS][clang 8.0.0][64 bit] 8bit+10bit+12bit
x265 [info]: using cpu capabilities: MMX2 SSE2Fast LZCNT SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x265 [info]: Main profile, Level-4 (Main tier)
x265 [info]: Thread pool created using 24 threads
x265 [info]: Slices                              : 1
x265 [info]: frame threads / pool features       : 4 / wpp(13 rows)
x265 [info]: Coding QT: max CU size, min CU size : 64 / 8
x265 [info]: Residual QT: max TU size, max depth : 32 / 1 inter / 1 intra
x265 [info]: ME / range / subpel / merge         : star / 57 / 3 / 3
x265 [info]: Keyframe min / max / scenecut / bias: 24 / 240 / 40 / 5.00
x265 [info]: Lookahead / bframes / badapt        : 25 / 4 / 2
x265 [info]: b-pyramid / weightp / weightb       : 1 / 1 / 0
x265 [info]: References / ref-limit  cu / depth  : 4 / on / on
x265 [info]: AQ: mode / str / qg-size / cu-tree  : 1 / 1.0 / 32 / 1
x265 [info]: Rate Control / qCompress            : CRF-24.0 / 0.60
x265 [info]: tools: limit-modes rd=4 psy-rd=2.00 rdoq=2 psy-rdoq=1.00 rskip
x265 [info]: tools: signhide tmvp lslices=4 deblock sao
[09:21:09] sync: first pts audio 0x1 is 0
[09:21:09] sync: first pts audio 0x1 is 0
[09:21:09] sync: first pts video is 570
[09:21:09] sync: "Chapter 1" (1) at frame 1 time 570
Encoding: task 1 of 1, 97.36 % (11.54 fps, avg 12.00 fps, ETA 00h00m16s)[09:30:53] sync: "Chapter 2" (2) at frame 7006 time 26295588
Encoding: task 1 of 1, 98.85 % (12.21 fps, avg 11.99 fps, ETA 00h00m08s)[09:31:02] reader: done. 1 scr changes
Encoding: task 1 of 1, 99.97 % (10.92 fps, avg 12.00 fps, ETA 00h00m04s)[09:31:10] work: average encoding speed for job is 12.003880 fps
Encoding: task 1 of 1, 99.97 % (10.92 fps, avg 12.00 fps, ETA 00h00m04s)[09:31:11] comb detect: heavy 1 | light 62 | uncombed 7130 | total 7193
[09:31:11] decomb: deinterlaced 1 | blended 62 | unfiltered 7130 | total 7193
[09:31:11] vfr: 7193 frames output, 0 dropped and 0 duped for CFR/PFR
[09:31:11] vfr: lost time: 0 (0 frames)
[09:31:11] vfr: gained time: 0 (0 frames) (0 not accounted for)
[09:31:11] aac-decoder done: 14063 frames, 0 decoder errors
[09:31:11] aac-decoder done: 14063 frames, 0 decoder errors
[09:31:11] h264-decoder done: 7193 frames, 0 decoder errors
[09:31:11] sync: got 7193 frames, 7194 expected
[09:31:11] sync: framerate min 23.976 fps, max 23.976 fps, avg 23.976 fps
x265 [info]: frame I:     70, Avg QP:24.96  kb/s: 11513.62
x265 [info]: frame P:   1633, Avg QP:26.02  kb/s: 6369.47 
x265 [info]: frame B:   5490, Avg QP:29.37  kb/s: 1946.16 
x265 [info]: Weighted P-Frames: Y:9.6% UV:5.6%
x265 [info]: consecutive B-frames: 4.9% 1.9% 7.6% 36.8% 48.7% 

encoded 7193 frames in 602.10s (11.95 fps), 3043.48 kb/s, Avg QP:28.57
[09:31:11] mux: track 0, 7193 frames, 114162260 bytes, 3043.82 kbps, fifo 1024
[09:31:11] mux: track 1, 14064 frames, 6078566 bytes, 162.07 kbps, fifo 2048
[09:31:11] mux: track 2, 9375 frames, 24000000 bytes, 639.89 kbps, fifo 2048
[09:31:11] Finished work at: Tue Jun 23 09:31:11 2020

[09:31:11] libhb: work result = 0

Encode done!
And here is the htop output while encoding:

  1  [|||||||||||||||||||||||||||||||||||                   57.6%]   Tasks: 81, 0 thr; 2 running
  2  [||||||||||||||||||||                                  32.7%]   Load average: 11.55 3.85 1.56 
  3  [||||||||||||||||||||||||||||||||||                    56.5%]   Uptime: 2 days, 18:17:17
  4  [||||||||||||||||||||||||||                            43.3%]
  5  [||||||||||||||||||||||||||||||||                      53.2%]
  6  [||||||||||||||||||||||||                              39.0%]
  7  [|||||||||||||||||||||||||||||||||                     53.0%]
  8  [||||||||||||||||||||||||||||||                        48.5%]
  9  [||||||||||||||||||||||||||||                          45.2%]
  10 [||||||||||||||||||||||||||||||                        48.4%]
  11 [|||||||||||||||||||||||||||||                         45.7%]
  12 [|||||||||||||||||||||||||||||                         46.6%]
  13 [||||||||||||||||||||||||||||||||||                    55.9%]
  14 [|||||||||||||||||||||||||||||                         48.7%]
  15 [||||||||||||||||||||||||||                            41.9%]
  16 [|||||||||||||||||||||||||||                           44.4%]
  17 [|||||||||||||||||||||||||||||                         47.6%]
  18 [||||||||||||||||||||||||||||                          44.5%]
  19 [||||||||||||||||||||||||||||||||||||                  59.1%]
  20 [|||||||||||||||||||||||||||                           42.7%]
  21 [|||||||||||||||||||||||||||||                         47.0%]
  22 [|||||||||||||||||||||||||||||                         47.6%]
  23 [|||||||||||||||||||||||||||||||||                     53.3%]
  24 [|||||||||||||||||||||||||                             39.5%]
  Mem[||||||||||||||||||||||||||||||||||||||||||||||||3.64G/15.9G]
  Swp[                                                   0K/6.00G]

  PID USER      PRI  NI  VIRT   RES S CPU% MEM%   TIME+  Command
29205 root       40  20 1450M  893M S 1216  5.5 18:41.77 HandBrakeCLI -i /mnt/movierips/BenchmarkVideo.mp4
CPU usage varies between 1200% and 1450%, which is 50-60% of the 2400% available from 24 logical CPUs. Is this helpful, and is it what you were asking for, rollin_eng?
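htop counts 100% per logical CPU, so converting its per-process figure into a share of the machine is a single division. The sketch below does that for the range quoted above and cross-checks the log's reported speed from its frame count and encode time (7193 frames in 602.10 s).

```python
LOGICAL_CPUS = 24  # 2 sockets x 6 cores x 2 Hyper-Threading siblings


def utilization(htop_percent: float, logical_cpus: int = LOGICAL_CPUS) -> float:
    """Fraction of total machine capacity used; htop counts 100% per logical CPU."""
    return htop_percent / (100 * logical_cpus)


def busy_cpus(htop_percent: float) -> float:
    """Equivalent number of fully busy logical CPUs."""
    return htop_percent / 100


if __name__ == "__main__":
    for pct in (1200, 1216, 1450):
        print(f"{pct}% -> {utilization(pct):.1%} of {LOGICAL_CPUS} logical CPUs, "
              f"~{busy_cpus(pct):.1f} busy")
    # Cross-check the x265 summary line: 7193 frames in 602.10 s
    print(f"{7193 / 602.10:.2f} fps")  # 11.95 fps
```

Roughly 12 to 14.5 fully busy logical CPUs, on a machine with 12 physical cores.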
8Ringer
Posts: 14
Joined: Sun Jun 21, 2020 7:16 pm

Post by 8Ringer »

mduell wrote: Tue Jun 23, 2020 4:30 pm
8Ringer wrote: Tue Jun 23, 2020 6:29 am It’s a dual-socket 2620 v3, so 2x 6 cores with Hyper-Threading: 24 hardware threads, verified at boot in FreeBSD. Sorry if that wasn’t clear.
You've only got 12 actual cores, they're just duplicated with SMT to squeeze out another ~10% in some situations.
I know what SMT is, and I know there are only 2x6 "true" cores, but the fact is my system is loaded up only 50%. In Linux/BSD (which my machine is running), each hyperthreaded core is recognized as a logical core, so my system thinks I have 24 CPUs, because that's actually accurate as far as thread scheduling goes. If I run mprime, my CPU usage spikes to 2400%; it loads up all 24 threads/cores. If I run an encode on this same file with the "Apple 1080p60 Surround" H.264 preset, I see 2100% usage. I know H.265 has some issues scaling beyond a certain point due to how the encoding engine does its thing, but it just seems odd that H.264 has no problem doing it while H.265 falls flat on its face at roughly 1/2 to 3/4 of the CPU usage.
s55
HandBrake Team
Posts: 9829
Joined: Sun Dec 24, 2006 1:05 pm

Post by s55 »

@8Ringer:
Couple of points:
1. If you don't need interlace detection and decomb, turn them off. You probably don't if you're dealing with 1080p. Filters can be, and usually are, a bottleneck.
2. Another possible bottleneck is decode, particularly on lower-clock-speed chips. Ideally, chips with >3.5 GHz base clocks will help here, as there are a limited number of decode threads.
3. Single socket is definitely preferred. Few of the encoders support NUMA.
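For point 1, HandBrakeCLI can switch off a preset's filters on the command line instead of editing the preset. The sketch below reuses the input and output paths and preset name from the command in 8Ringer's log; the `--no-comb-detect` and `--no-decomb` flag names are as I recall them from `HandBrakeCLI --help` in the 1.3 series, so verify them against your own build before relying on this.

```python
import shlex


def progressive_cmd(src: str, dst: str, preset: str = "Apple 2160p60 4K HEVC Surround") -> list[str]:
    """HandBrakeCLI call for a progressive source with the preset's interlace filters disabled."""
    return [
        "HandBrakeCLI",
        "-i", src,
        "-o", dst,
        "--preset", preset,
        "--no-comb-detect",  # skip interlace (comb) detection
        "--no-decomb",       # skip the decomb filter that acts on detected combing
    ]


if __name__ == "__main__":
    print(shlex.join(progressive_cmd("/mnt/movierips/BenchmarkVideo.mp4",
                                     "/mnt/Media/PlexMedia/Movies/BMVNASapl.mp4")))
```

`shlex.join` quotes the preset name for you, since it contains spaces.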
8Ringer
Posts: 14
Joined: Sun Jun 21, 2020 7:16 pm

Post by 8Ringer »

s55 wrote: Tue Jun 23, 2020 5:33 pm @8Ringer:
Couple of points:
1. If you don't need interlace detection and decomb, turn them off. You probably don't if you're dealing with 1080p. Filters can be, and usually are, a bottleneck.
2. Another possible bottleneck is decode, particularly on lower-clock-speed chips. Ideally, chips with >3.5 GHz base clocks will help here, as there are a limited number of decode threads.
3. Single socket is definitely preferred. Few of the encoders support NUMA.
Appreciate your response!

1) I'll turn those off and give them a go. All my content is going to be non-interlaced so, you're right, turning those off seems like the right idea as they aren't performing a useful job.

2) That's not something that had occurred to me; I never really thought about the decode stage. My benchmark file is actually an h264-encoded file (I'm usually encoding BD rips, so they're usually VC-1 or AVC/h.264).

3) This IS something that had occurred to me, but I always thought video encoding could scale well beyond 12 cores/24 threads. If 14-15 threads really is the ceiling for x265, then it's not the end of the world, I suppose. Single video encoding isn't the only role this machine is tasked with: it's a FreeNAS box, so Plex is a big part of its use and it could see a decent transcoding load from time to time, as well as Time Machine backups and file sharing.

Either way, I'd LOVE to be able to get x265 to scale closer to 24-thread usage, but I'm also not willing to jump through too many hoops to get there (I'm not going to try to compile x265 with NUMA flags, for instance; that's likely above my pay grade), since it's damn fast as it is, IMO. Or at least FAR, FAR faster than the 4-core i5-3450 I was running previously, with only minimally higher power consumption.

(Update: I ran a new encode while I was typing my response with the same preset with decomb and deinterlace off and it seems I managed to snag a bit more performance out of the encoder, thanks for the tip!)
mduell
Veteran User
Posts: 7211
Joined: Sat Apr 21, 2007 8:54 pm

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by mduell »

8Ringer wrote: Tue Jun 23, 2020 5:31 pm
mduell wrote: Tue Jun 23, 2020 4:30 pm
8Ringer wrote: Tue Jun 23, 2020 6:29 am It’s a dual-socket 2620 v3, so 2x6 cores with hyperthreading, 24 hardware threads, verified at boot in FreeBSD. Sorry if that wasn’t clear.
You've only got 12 actual cores; they're just duplicated with SMT to squeeze out another ~10% in some situations.
I know what SMT is, and I know there are only 2x6 "true" cores, but the fact is my system is loaded up only 50%. In Linux/BSD (which my machine is running) each hyperthread is recognized as a logical CPU, so my system thinks I have 24 CPUs, because that's actually accurate as far as thread scheduling goes. If I run mprime, my CPU usage spikes to 2400%; it loads up all 24 threads/cores. If I run an encode on this same file with the "Apple 1080p60 Surround" h264 preset, I see 2100% usage. I know h265 has some issues scaling beyond a certain point due to how the encoding engine does its thing, but it just seems odd that h264 has no problem doing it while h265 falls flat on its face at roughly 1/2 to 3/4 the CPU usage.
But once you're at 1200%, you don't have much in the way of execution resources left. Pushing to 2400% by double-scheduling the cores with SMT only buys you about 10% more actual work.
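mduell's point can be turned into a back-of-the-envelope model of why CPU% overstates the work SMT adds. A minimal sketch, assuming the scheduler fills one thread per physical core before using SMT siblings, and using the ~10% SMT yield quoted above as a hypothetical parameter (`core_equivalents` is an illustrative helper, not a HandBrake or OS API):

```python
def core_equivalents(cpu_usage_pct: float, physical_cores: int = 12,
                     smt_yield: float = 0.10) -> float:
    """Rough 'real cores of work' estimate from a top-style CPU% figure.

    Hypothetical model: busy threads fill physical cores first; each extra
    SMT sibling thread adds only `smt_yield` of a core's throughput.
    """
    busy_threads = cpu_usage_pct / 100.0
    on_cores = min(busy_threads, physical_cores)
    on_siblings = min(max(busy_threads - physical_cores, 0.0), physical_cores)
    return on_cores + on_siblings * smt_yield


if __name__ == "__main__":
    for pct in (1100, 1200, 1400, 2100, 2400):
        print(f"{pct:5d}% -> {core_equivalents(pct):.1f} core-equivalents")
```

Under this model 1200% is already 12 core-equivalents, while 2400% is only about 13.2, so doubling the reported CPU% adds roughly a tenth more work.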
8Ringer
Posts: 14
Joined: Sun Jun 21, 2020 7:16 pm

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by 8Ringer »

With respect, I'm not interested in debating the merits of hyperthreading. I understand what it is, and I understand it's not "double" the performance. However, it can absolutely provide a meaningful performance increase over a non-hyperthreaded CPU, and the difference is often a lot more than the 10% you're stating.

As a quick test I ran a couple encode runs:
2 concurrent encodes, 12 threads each: 24.38 fps (CPU usage ~2200%)
1 encode, 12 threads: 18.62 fps (CPU usage ~1100%)
1 encode, 24 threads: 19.64 fps (CPU usage ~1400%)
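To compare these runs it helps to express each one as a gain over the single 12-thread encode. A small Python sketch using the fps figures above; it assumes the 2-encode figure is the combined rate of both jobs, which the post doesn't state explicitly:

```python
# Average fps from the quick test above (the 2-encode figure is assumed to be
# the combined rate of both jobs).
RUNS = {
    "1 encode, 12 threads": 18.62,
    "1 encode, 24 threads": 19.64,
    "2 concurrent encodes, 12 threads each": 24.38,
}


def gain_pct(fps: float, baseline_fps: float) -> float:
    """Percentage throughput gain of `fps` over `baseline_fps`."""
    return (fps / baseline_fps - 1.0) * 100.0


if __name__ == "__main__":
    base = RUNS["1 encode, 12 threads"]
    for name, fps in RUNS.items():
        print(f"{name}: {gain_pct(fps, base):+.1f}%")
```

Relative to one 12-thread encode, this works out to about +5.5% for one 24-thread encode and about +30.9% for two concurrent encodes.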

I'll need to check my BIOS and see if I can disable HT and run the tests again. IMO, choosing 12 vs. 24 threads isn't the same as running 12 threads with HT disabled, as the scheduler is still managing 24 CPU thread slots and likely loads them all up however it sees fit. My point here is that I believe the difference between 12 and 24 is due to a bottleneck: whether that's the decoder, x265 itself, or x265 not being compiled for NUMA, I don't know, and that's sort of the thing I'm trying to figure out here. As it stands, it would seem that x265 just can't take advantage of more than 14 threads on any given encode; I don't really think this is a hyperthreading issue at all. If Supermicro doesn't allow disabling HT, I might be able to test the theory with my work laptop (6-core/12-thread 9th-gen i7 MacBook Pro), but that's for another day, as it's late...
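For controlled thread-count tests like these, x265's `pools` option (the thread-limiting method used in the earlier test in this thread) can be passed through HandBrakeCLI's `--encopts`; a value like `pools=12` asks x265 for a single worker pool of up to 12 threads. A minimal Python sketch that only builds such a command; it assumes the chosen preset uses the x265 encoder, the file names are placeholders, and `handbrake_x265_cmd` is a hypothetical helper. It's worth checking whether `--encopts` overrides options the preset already sets before comparing results:

```python
from __future__ import annotations

import shlex


def handbrake_x265_cmd(src: str, dst: str, preset: str, threads: int) -> list[str]:
    """Build a HandBrakeCLI command whose x265 worker pool is capped at `threads`.

    Assumes `preset` uses the x265 encoder; `pools=N` is passed to x265
    through HandBrake's --encopts option.
    """
    if threads < 1:
        raise ValueError("threads must be >= 1")
    return [
        "HandBrakeCLI",
        "-i", src,
        "-o", dst,
        "--preset", preset,
        "--encopts", f"pools={threads}",
    ]


if __name__ == "__main__":
    # Placeholder file names; to actually encode: subprocess.run(cmd, check=True).
    for n in (12, 24):
        cmd = handbrake_x265_cmd("input.mkv", f"output-{n}t.mp4",
                                 "Apple 2160p60 4K HEVC Surround", n)
        print(shlex.join(cmd))
```

Keeping the thread count as the only variable between runs makes the fps comparison cleaner than relying on the OS scheduler.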
mduell
Veteran User
Posts: 7211
Joined: Sat Apr 21, 2007 8:54 pm

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by mduell »

8Ringer wrote: Sat Jun 27, 2020 6:38 am With respect, I'm not interested in debating the merits of hyperthreading. I understand what it is, and I understand it's not "double" the performance. However, it can absolutely provide a meaningful performance increase over a non-hyperthreaded CPU, and the difference is often a lot more than the 10% you're stating.
Oh, let's look at the results:
1 encode, 12 threads: 18.62 fps (CPU usage ~1100%)
1 encode, 24 threads: 19.64 fps (CPU usage ~1400%)
Oh gee, 5.5%.
8Ringer
Posts: 14
Joined: Sun Jun 21, 2020 7:16 pm

Re: Optimal cores for Apple 2160p60 4K HEVC Surround preset?

Post by 8Ringer »

You being a bit of an insufferable tit aside (really, man, I don't need your arrogant attitude), I've got more/better numbers if anyone else is curious.

I dug into the BIOS and made some adjustments to my CPU power settings (gaining a nice ~10% performance increase; my guess is that Turbo Boost is now operating properly) and was able to turn HT on/off for more accurate testing. Limiting to 12 threads isn't actually the same as disabling HT and running the same test. Yes, limiting to 12 threads (on 12 cores) approximates the result, but I wanted definitive data, so I went and got it with, you know, testing.

HT ON (run conditions / avg fps / relative performance):
2 concurrent encodes, 12 threads each: 24.27 fps, 112%
1 encode, 12 threads: 18.6 fps, 86%
1 encode, 24 threads: 21.75 fps, 100% (baseline)
HT OFF:
1 encode, 12 threads: 19.15 fps, 88%
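The relative-performance column can be reproduced from the fps figures, which also makes the HT-on vs. HT-off gain and the headroom left by a single encode explicit. A Python sketch, again assuming the 2-encode rate is the combined total for both jobs:

```python
# Average fps from the HT ON / HT OFF table above. The 2-encode figure is
# assumed to be the combined rate of both jobs.
RESULTS = {
    ("HT on", "2 concurrent encodes, 12 threads each"): 24.27,
    ("HT on", "1 encode, 12 threads"): 18.6,
    ("HT on", "1 encode, 24 threads"): 21.75,
    ("HT off", "1 encode, 12 threads"): 19.15,
}
BASELINE = ("HT on", "1 encode, 24 threads")


def relative_pct(fps: float, baseline_fps: float) -> int:
    """Throughput as a whole-number percentage of the baseline run."""
    return round(fps / baseline_fps * 100)


if __name__ == "__main__":
    base = RESULTS[BASELINE]
    for (ht, run), fps in RESULTS.items():
        print(f"{ht:6} | {run:38} | {fps:6.2f} fps | {relative_pct(fps, base):3d}%")
    ht_gain = base / RESULTS[("HT off", "1 encode, 12 threads")] - 1
    headroom = RESULTS[("HT on", "2 concurrent encodes, 12 threads each")] / base - 1
    print(f"HT gain (single encode): {ht_gain:.1%}")
    print(f"Left on the table vs. two encodes: {headroom:.1%}")
```

Normalizing to the HT-on, 24-thread run gives 112/86/100/88%; the same data put the single-encode HT gain at about 13.6% and the two-encode headroom at about 11.6%.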

So now at least I've definitively established that, yes, you're correct that HT only adds ~10-15% fps (21.75 vs. 19.15 fps here, about 14%). I questioned that value, yes, and getting my hopes up for more was clearly a letdown. Moreover, running with HT off is more efficient than limiting to 12 threads with HT on.

Which brings me back to the initial question: given that running 2 encodes brings a further 12% more performance and is about as close to maxed out on CPU load as one is realistically likely to get, is it possible to get that performance bump with a single encode? Or is x265 encoding really just limited to not fully using all 24 threads? And before you snipe back (as I know you're going to do), the fact that there is 12% more performance left on the table with 2 encodes proves that the CPU isn't out of execution resources, as you glibly stated above, mduell. Are there tweaks that can be done, or is this the best I can hope for? If that's it, then that's fine; I'm just curious whether I can optimize my workflow and wring everything out of these CPUs. While 12% is nothing in the scheme of things, if it's on the table I'd like to leverage it if I can.
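One practical way to capture that ~12% is at the workflow level rather than inside a single encode: run two jobs side by side. A minimal Python sketch that launches commands concurrently and waits for all of them; the demo uses harmless placeholder commands, and in practice each would be a HandBrakeCLI job on a different (hypothetical) source file, for example each limited with `--encopts pools=12`. `run_concurrently` is an illustrative helper:

```python
from __future__ import annotations

import subprocess
import sys


def run_concurrently(commands: list[list[str]]) -> list[int]:
    """Start every command at once, wait for all of them, return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [proc.wait() for proc in procs]


if __name__ == "__main__":
    # Placeholder commands standing in for two HandBrakeCLI jobs.
    demo = [[sys.executable, "-c", "print('job 1 done')"],
            [sys.executable, "-c", "print('job 2 done')"]]
    print(run_concurrently(demo))
```

This raises total throughput for a batch of files; each individual file still finishes later than it would alone, so it doesn't help the original goal of shortening a single encode.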