Handbrake and Haswell Quicksync

gmb · Post by **gmb** » Sat Jun 15, 2013 10:35 am

I thought Lookahead was a driver feature, but the whitepaper says otherwise. Does that mean it needs app support? Which encoder did Intel use in the whitepaper?

Lookahead is an advanced feature available at the SDK level that can provide further quality improvements, especially when the contents have many scene changes.

I forgot to mention in my last test that the quality preset doubled the encoding time on Haswell but does not really look better. Not sure if this is the intended behaviour. That's why I use Balanced on Haswell; its encoding time is more comparable to the Ivy Bridge quality preset. Now I'm curious how Handbrake and MediaEspresso (press version for Haswell) perform. I will do a test today.

Post by **s55** » Sat Jun 15, 2013 11:56 am

| Preset | 1080P, Animation, 10 Minutes: avg fps | 1080P: MB | 576P, Action, 10 Minutes: avg fps | 576P: MB |
|---|---|---|---|---|
| Speed | 197 | 552 | 858 | 235 |
| Balanced | 169 | 504 | 698 | 221 |
| Quality | 136 | 502 | 513 | 220 |

This is with the latest SVN. It's quite a drop off to Quality and right now, I'd tend to agree with you that it's not substantially better.
Rodeo and Maxym are still busy tuning the code at the moment. I was certainly expecting more of a difference with Haswell, but it hasn't materialized yet, so it may be that we are doing something wrong.
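
To put the drop-off in numbers, here is a rough sketch that expresses each preset's frame rate and output size relative to the Speed preset. The figures are copied from the table in the post above; the `results` dictionary and the `relative_to_speed` helper are illustrative names, not part of HandBrake.

```python
# (avg fps, output size in MB) per preset, copied from the table in the post above.
results = {
    "1080P Animation": {"Speed": (197, 552), "Balanced": (169, 504), "Quality": (136, 502)},
    "576P Action": {"Speed": (858, 235), "Balanced": (698, 221), "Quality": (513, 220)},
}


def relative_to_speed(clip):
    """Return {preset: (fps ratio, size ratio)} relative to the Speed preset."""
    base_fps, base_mb = results[clip]["Speed"]
    return {
        preset: (fps / base_fps, mb / base_mb)
        for preset, (fps, mb) in results[clip].items()
    }


for clip in results:
    for preset, (fps_ratio, size_ratio) in relative_to_speed(clip).items():
        print(f"{clip:16s} {preset:9s} fps x{fps_ratio:.2f}  size x{size_ratio:.2f}")
```

On these figures the Quality preset runs at roughly 0.69x (1080P) and 0.60x (576P) the frame rate of Speed, while the output is only about 6 to 9% smaller.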

gmb · Post by **gmb** » Sat Jun 15, 2013 12:43 pm

I finished the test. I still have to do screenshots, but here are the encoding times first.

Video 1
Handbrake Balanced HSW= 1:53 (*60 fps encoding, QSTranscode and MediaEspresso 30 fps encoding)
MediaEspresso Faster HSW= 1:07

Video 2
Handbrake Balanced HSW= 0:11
MediaEspresso Faster HSW= 0:08

Video 3
Handbrake Balanced HSW= 1:27
MediaEspresso Faster HSW= 0:53

Video 4
Handbrake Balanced HSW= 0:34
MediaEspresso Faster HSW= 0:18

Video 5
Handbrake Balanced HSW= 0:34
MediaEspresso Faster HSW= 0:34

Video 6
Handbrake Balanced HSW= 1:02
MediaEspresso Faster HSW= 0:49

Video 7
Handbrake Balanced HSW= 1:13
MediaEspresso Faster HSW= 1:03

Handbrake encoding times with the Ivy Bridge quality preset were basically the same as with the Haswell balanced preset now. Video 6 and Video 7 are slightly slower on Haswell, but overall it is basically the same. Compared to MediaEspresso 6.7.3521 or QSTranscode, Handbrake is clearly the slowest though. Screenshots will follow later.
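
For a quick sense of the gap, the sketch below converts the `m:ss` timings listed above into seconds and computes how much longer Handbrake took than MediaEspresso per video. The timings are copied from the post; `to_seconds` and `slowdown` are illustrative helper names. Note that Video 1 is not a like-for-like comparison, since Handbrake encoded it at 60 fps and the others at 30 fps.

```python
def to_seconds(mss):
    """Convert an 'm:ss' string such as '1:53' into seconds."""
    minutes, seconds = mss.split(":")
    return int(minutes) * 60 + int(seconds)


# video number -> (Handbrake Balanced HSW, MediaEspresso Faster HSW), from the post above.
timings = {
    1: ("1:53", "1:07"),  # Handbrake encoded at 60 fps here, MediaEspresso at 30 fps
    2: ("0:11", "0:08"),
    3: ("1:27", "0:53"),
    4: ("0:34", "0:18"),
    5: ("0:34", "0:34"),
    6: ("1:02", "0:49"),
    7: ("1:13", "1:03"),
}


def slowdown(video):
    """Handbrake time divided by MediaEspresso time for one video."""
    hb, me = timings[video]
    return to_seconds(hb) / to_seconds(me)


total_hb = sum(to_seconds(hb) for hb, _ in timings.values())
total_me = sum(to_seconds(me) for _, me in timings.values())

for video in timings:
    print(f"Video {video}: Handbrake took {slowdown(video):.2f}x as long")
print(f"Total: {total_hb} s vs {total_me} s ({total_hb / total_me:.2f}x)")
```

Summed over all seven videos this comes to 414 s for Handbrake against 292 s for MediaEspresso, about 1.42x.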

gmb · Post by **gmb** » Sat Jun 15, 2013 5:04 pm

First of all, the MediaEspresso samples are taken from m2ts output. If someone uses MediaEspresso with the common mp4 output, it gives much worse results.

Video 1
Handbrake Balanced HSW= http://imageshack.us/a/img708/1882/9hd.png
MediaEspresso Faster HSW= http://imageshack.us/a/img153/3997/2kgr.png

Video 2
Handbrake Balanced HSW= http://imageshack.us/a/img593/2659/lcpk.png
MediaEspresso Faster HSW= http://imageshack.us/a/img577/3572/hbi.png

Video 3
Handbrake Balanced HSW= http://imageshack.us/a/img89/4572/yqt.png
MediaEspresso Faster HSW= http://imageshack.us/a/img9/5576/wdt.png

Video 4
Handbrake Balanced HSW= http://imageshack.us/a/img829/1876/o9n.png
MediaEspresso Faster HSW= http://imageshack.us/a/img442/9686/qcx.png

Video 5
Handbrake Balanced HSW= http://imageshack.us/a/img839/2328/8mq.png
MediaEspresso Faster HSW= http://imageshack.us/a/img692/8819/z2j.png

Video 6
Handbrake Balanced HSW= http://imageshack.us/a/img35/4071/qpc.png
MediaEspresso Faster HSW= http://imageshack.us/a/img801/1641/c9r.png

Video 7
Handbrake Balanced HSW= http://imageshack.us/a/img4/6272/bn7e.png
MediaEspresso Faster HSW= http://imageshack.us/a/img838/484/zutb.png

Video 1: QSTranscode clearly the best.
Video 2: No big differences here, maybe QSTranscode slightly better.
Video 3: No big differences visible.
Video 4: QSTranscode and Handbrake both much improved to what we got from Ivy Bridge. MediaEspresso doing much worse here.
Video 5: Biggest surprise here. Handbrake doing much better than QSTranscode. MediaEspresso slightly worse than QSTranscode.
Video 6: Not much between QSTranscode and Handbrake. MediaEspresso slightly worse here.
Video 7: QSTranscode slightly better than the rest.

gmb · Post by **gmb** » Tue Jun 18, 2013 8:48 pm

The lookahead feature requires a driver and SDK update: http://software.intel.com/en-us/forums/topic/393873

A new SDK should come soon. It looks to me like the new SDK is mandatory for full Haswell support, because a couple of Haswell-exclusive features are missing in the current SDK.

gmb · Post by **gmb** » Sun Jun 23, 2013 8:09 pm

http://www.missingremote.com/review/int ... erformance

Another Quicksync review.

gmb · Post by **gmb** » Fri Jul 12, 2013 3:39 pm

ftp://ftp.ts.fujitsu.com/pub/Mainboard- ... _Win64.zip

Looks like we have an API 1.7 driver now.

gmb · Post by **gmb** » Fri Jul 12, 2013 9:15 pm

My first test. You can see the difference. QSTranscode 1011 used.

CBR: http://img856.imageshack.us/img856/604/iu6.png
VBR: http://img43.imageshack.us/img43/7281/okyx.png
AVBR: http://img33.imageshack.us/img33/5505/xdq2.png
CQP: http://img11.imageshack.us/img11/7277/bvbw.png
LA_Depth 60: http://img11.imageshack.us/img11/5739/mzry.png

Deleted User 11865 · Post by **Deleted User 11865** » Sat Jul 13, 2013 6:56 pm

gmb wrote:ftp://ftp.ts.fujitsu.com/pub/Mainboard- ... _Win64.zip

Looks like we have an API 1.7 driver now.

FWIW, the next HandBrake QSV Beta will have support for lookahead.

gmb · Post by **gmb** » Sun Jul 14, 2013 9:32 am

For a proper evaluation I need at least 2 Quicksync encoders.

It looks like CBR/VBR/AVBR look so bad in the screenshots above because the frame was at a scene change. A frame later they look almost comparable to CQP. CQP is obviously more robust at scene changes. Without the scene change VBR is usually better than CQP. LA is the most robust bitrate control mode of course. It is also the slowest: encoding time was about 50% longer than VBR in this video.

Post by **s55** » Sat Jul 27, 2013 1:56 pm

The new beta has been pushed to sourceforge now.

gmb · Post by **gmb** » Sat Jul 27, 2013 2:11 pm

Is this version with lookahead support? How can I enable it?

Deleted User 11865 · Post by **Deleted User 11865** » Sat Jul 27, 2013 2:14 pm

gmb wrote:Is this version with lookahead support? How can I enable it?

HandBrakeCLI.exe --help

gmb · Post by **gmb** » Sat Jul 27, 2013 2:54 pm

Then this site should use the proper commands: https://trac.handbrake.fr/wiki/QuickSyncOptions

la did nothing. lookahead=1 works now. Custom lookahead-depth values over 40 are not working here; is there another way to enforce higher values like 60? In most of my videos 60 did a little better than 40.

Deleted User 11865 · Post by **Deleted User 11865** » Sat Jul 27, 2013 3:31 pm

You mean encoding hangs with a lookahead depth of 60? It's a bug that needs fixing (no ETA).

Lookahead is default as long as target usage is <= 2 and average bitrate is used.
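
The default described above can be sketched as a small predicate. This is only an illustration of the stated rule, not HandBrake's actual code: the function name, the `"average_bitrate"` label and the `explicit` override parameter are all assumptions. The override reflects the report earlier in the thread that setting lookahead=1 by hand works.

```python
from typing import Optional


def lookahead_active(target_usage: int, rate_control: str,
                     explicit: Optional[bool] = None) -> bool:
    """Hypothetical helper mirroring the rule stated in the post above."""
    if explicit is not None:
        # Assumption: an explicit lookahead=0/1 setting wins over the default.
        return explicit
    # Default per the post: on when target usage <= 2 and average bitrate is used.
    return target_usage <= 2 and rate_control == "average_bitrate"


print(lookahead_active(2, "average_bitrate"))                 # default at TU2
print(lookahead_active(4, "average_bitrate"))                 # default at TU4
print(lookahead_active(4, "average_bitrate", explicit=True))  # forced on at TU4
```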

gmb · Post by **gmb** » Sat Jul 27, 2013 4:16 pm

Yes it hangs. I don't use very high in Handbrake, I use TU4 via command line which is a good tradeoff. The best quality preset (I guess TU2) is so much slower that the small quality enhancement isn't worth it. For a default setting TU4 is the right choice.

Deleted User 11865 · Post by **Deleted User 11865** » Sat Jul 27, 2013 4:42 pm

My testing with lookahead was that the lookahead is a bit of a bottleneck (so TU 2 is not significantly slower than TU 4 or even 7 when the lookahead is enabled). YMMV.

gmb · Post by **gmb** » Sat Jul 27, 2013 5:07 pm

LA with one of Intel's demo movies:

TU4= 1:25 min
TU2= 2:12 min

TU1 is a doubling on my system, TU2 still pretty close. I'm struggling to see a difference, slightly better than TU4 but not worth it for the big speed penalty. TU4 is simply the best tradeoff. I see some great results with LA in Handbrake. Very competitive speed and quality when I compare to QSTranscode. I think there are further improvements possible with gop tweaking.

Deleted User 11865 · Post by **Deleted User 11865** » Sat Jul 27, 2013 5:13 pm

Interesting. I guess I need to re-test stuff.

gmb · Post by **gmb** » Tue Jul 30, 2013 2:07 am

Did they use the old Handbrake Beta?

http://abload.de/img/handbrake8fonv.png

http://download-software.intel.com/site ... -Video.pdf

Deleted User 11865 · Post by **Deleted User 11865** » Tue Jul 30, 2013 9:02 am

Not sure what version of HandBrake they used for the x264 portion, but hopefully they used something else to demonstrate the QSV part. I'm not quite sure all our Haswell issues are fixed.

gmb · Post by **gmb** » Tue Jul 30, 2013 10:31 am

Obviously they used this for demonstration.

Intel® Media Demo Booth
- Ultra HD (4K) Decode and Encode
- HandBrake Quick Sync Video Enabling

Internally they have had a QS transcoder with lookahead support for quite a while. In one of the presentations:

The software used in comparisons is non-commercial Intel test application MFX transcoder.

Also in the same paper one page back

Demo Clip transcoded with Intel internal test application with 2Mbps bitrate encoding

For lookahead comparisons they probably used the internal transcoder.

gmb · Post by **gmb** » Tue Jul 30, 2013 2:33 pm

On page 12 btw there is another confirmation that the quality difference between TU4 and TU2 is very very tiny. That's why I don't consider TU2 as a useful preset, at least not with such a huge encoding slowdown.

TU1 is a bit strange as well; afaik the only difference to TU2 is that Trellis is enabled in TU1. With Trellis enabled my output loses details, so maybe Trellis is useful only for high bitrate encoding. The slide says Trellis Quantization: high bit rate encoding quality improvement. My test videos in 1080p at ~5 Mbps were on the lower bitrate side. (I mean the output; my original videos have a much higher bitrate.)

gmb · Post by **gmb** » Thu Sep 12, 2013 10:01 pm

New Quicksync slides from IDF San Francisco.

And an API 1.7 driver for those who prefer downloadcenter drivers: https://downloadcenter.intel.com/Detail ... 864-bit%29

xooyoozoo · Post by **xooyoozoo** » Fri Sep 13, 2013 6:41 am

Thanks for the slides. Slide 12 is especially interesting because it comes so close to being real, useful information.

It's strange that they chose to give PSNR decibels, instead of filesize percentages, as a measurement. It's grudgingly acceptable if they only focused on a single bitrate or QP, but the slide itself noted that on the CQP graph, they did the full 4-point measurement necessary for reliable delta-filesize summarization. That meant they had the RD curve right in front of them, but they chose to measure the less "real" axis (PSNR dB). Well, let's try our best to speculate anyway, as Intel's methods are similar to the "official" MPEG/JCT ones and I highly doubt they reinvented the wheel with regard to how they approach quantization parameters.

We first need to reproduce Intel's PSNR vs log-bitrate graph. Old JCT docs testing the reference JM encoder very reliably produce a ~9 dB PSNR gap between QP22 and QP37. Filesize spreads there are harder to summarize, but I'd say an ~8x (mostly between 6x and 10x) size spread between smallest and largest encode per clip is a good average. Assuming there isn't something wrong with the encoder, this log-log RD graph (PSNR-to-logSize) should then be linear in this range, which means each extra 0.1 dB encoder quality bump would correspond to a new file being ~97.7% of the original size for the same quality. The full 0.7 dB increase going from Haswell-TU7 to TU1 would then suggest that TU1 files are ~85% of TU7's size for the same quality. (Using a 6x spread changes the last number to 87% instead; 10x gives 83.6%.)
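
The arithmetic above can be reproduced with a few lines. This is a minimal sketch under the same assumptions as the post (a straight-line PSNR-to-log-size curve, a ~9 dB span between QP22 and QP37, and a 6x to 10x size spread); the function name and parameter names are made up for illustration.

```python
def same_quality_size_ratio(psnr_gain_db, psnr_span_db=9.0, size_spread=8.0):
    """Fraction of the original file size needed for equal quality after a PSNR gain.

    Assumes PSNR is linear in log(size): psnr_span_db of PSNR corresponds to a
    size_spread-fold change in file size, so a gain of psnr_gain_db is worth a
    size factor of size_spread ** (psnr_gain_db / psnr_span_db).
    """
    return size_spread ** (-psnr_gain_db / psnr_span_db)


# 0.1 dB step with the 8x spread assumed in the post.
print(f"0.1 dB, 8x spread:  {same_quality_size_ratio(0.1):.1%}")

# Full 0.7 dB TU7-to-TU1 gain, for the 6x, 8x and 10x spreads mentioned in the post.
for spread in (6.0, 8.0, 10.0):
    ratio = same_quality_size_ratio(0.7, size_spread=spread)
    print(f"0.7 dB, {spread:.0f}x spread: {ratio:.1%}")
```

The results line up with the figures quoted above: about 97.7% per 0.1 dB, and about 87%, 85% and 83.6% for the 6x, 8x and 10x spreads.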

Algorithmic metrics have their downfalls, of course, but having a numeric starting point as rough guidance is better than having nothing at all. Additionally, while the above is entirely speculative, based on the limits we know exist, I doubt independent testing with better metrics (SSIM or preferably MS-SSIM) would produce something much different. Still, it'd be nice to know specific numbers for ourselves. I've been waiting for the compression folks at MSU to release their yearly report, but they've been silent lately.

If anyone has Haswell and can share encodes spread over 4 points on some common raw test sequences, I can help with the number crunching.

And after that's done, it'd also be nice to aggregate encoding speed, combine it with quality/delta numbers, and compare it to x264 and produce something like this.
