Handbrake and Haswell Quicksync

gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

I thought lookahead was a driver feature, but the whitepaper says otherwise. Does that mean it needs app support? Which encoder did Intel use in the whitepaper?
"Lookahead is an advanced feature available at the SDK level that can provide further quality improvements, especially when the contents have many scene changes."

I forgot to mention in my last test that the quality preset doubled the encoding time on Haswell but does not really look better. Not sure if this is the intended behaviour. That's why I use Balanced on Haswell: its encoding time is more comparable to the Ivy Bridge quality preset. Now I'm curious how Handbrake and MediaEspresso (press version for Haswell) perform. I will do a test today.
s55
HandBrake Team
Posts: 10357
Joined: Sun Dec 24, 2006 1:05 pm

Re: Handbrake and Haswell Quicksync

Post by s55 »

1080P, Animation, 10 Minutes
Preset     avg fps   MB
Speed      197       552
Balanced   169       504
Quality    136       502

576P, Action, 10 Minutes
Preset     avg fps   MB
Speed      858       235
Balanced   698       221
Quality    513       220


This is with the latest SVN. It's quite a drop off to Quality and right now, I'd tend to agree with you that it's not substantially better.
Rodeo and Maxym are still busy tuning the code at the moment. I was certainly expecting more of a difference with Haswell but it's not materialized yet so it may be we are doing something wrong.
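
To put the drop-off in the table above into numbers, here is a rough Python sketch that computes the change in fps and output size relative to the Speed preset. The figures are copied from the table; the function names are just illustrative.

```python
# (avg fps, output size in MB) per preset, copied from the table above.
results = {
    "1080P animation": {"Speed": (197, 552), "Balanced": (169, 504), "Quality": (136, 502)},
    "576P action": {"Speed": (858, 235), "Balanced": (698, 221), "Quality": (513, 220)},
}


def relative_change(baseline, value):
    """Percentage change of value relative to baseline (negative means lower)."""
    return (value - baseline) / baseline * 100.0


def summarize(results, baseline="Speed"):
    """Return (clip, preset, fps change %, size change %) rows against the baseline preset."""
    rows = []
    for clip, presets in results.items():
        base_fps, base_mb = presets[baseline]
        for preset, (fps, mb) in presets.items():
            rows.append((clip, preset,
                         relative_change(base_fps, fps),
                         relative_change(base_mb, mb)))
    return rows


for clip, preset, fps_pct, mb_pct in summarize(results):
    print(f"{clip:16s} {preset:9s} fps {fps_pct:+6.1f}%  size {mb_pct:+5.1f}%")
```

On these numbers Quality costs roughly 31% (1080P) to 40% (576P) of the Speed preset's frame rate, while the output is only marginally smaller than Balanced.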
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

I finished the test. I still have to do the screenshots, but here are the encoding times first.

Video 1
Handbrake Balanced HSW= 1:53 (*60 fps encoding, QSTranscode and MediaEspresso 30 fps encoding)
MediaEspresso Faster HSW= 1:07

Video 2
Handbrake Balanced HSW= 0:11
MediaEspresso Faster HSW= 0:08

Video 3
Handbrake Balanced HSW= 1:27
MediaEspresso Faster HSW= 0:53

Video 4
Handbrake Balanced HSW= 0:34
MediaEspresso Faster HSW= 0:18

Video 5
Handbrake Balanced HSW= 0:34
MediaEspresso Faster HSW= 0:34

Video 6
Handbrake Balanced HSW= 1:02
MediaEspresso Faster HSW= 0:49

Video 7
Handbrake Balanced HSW= 1:13
MediaEspresso Faster HSW= 1:03


Handbrake encoding times with the Ivy Bridge quality preset were basically the same as with the Haswell balanced preset now. Video 6 and Video 7 are slightly slower on Haswell, but overall it's basically the same. Compared to MediaEspresso 6.7.3521 or QSTranscode, Handbrake is clearly the slowest though. Screenshots to follow later.
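
For anyone who wants the ratios rather than raw times, a small Python sketch (the helper names are made up for illustration) that converts the mm:ss values above to seconds and divides HandBrake's time by MediaEspresso's. Note that Video 1 is not like-for-like, since HandBrake encoded at 60 fps there and the others at 30 fps.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string such as '1:53' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


# (HandBrake Balanced, MediaEspresso Faster) on Haswell, copied from the post above.
times = {
    "Video 1": ("1:53", "1:07"),  # HandBrake at 60 fps, MediaEspresso at 30 fps
    "Video 2": ("0:11", "0:08"),
    "Video 3": ("1:27", "0:53"),
    "Video 4": ("0:34", "0:18"),
    "Video 5": ("0:34", "0:34"),
    "Video 6": ("1:02", "0:49"),
    "Video 7": ("1:13", "1:03"),
}


def slowdown(times):
    """Ratio of HandBrake time to MediaEspresso time per video (1.0 means equal)."""
    return {name: to_seconds(hb) / to_seconds(me) for name, (hb, me) in times.items()}


for name, ratio in slowdown(times).items():
    print(f"{name}: HandBrake takes {ratio:.2f}x the MediaEspresso time")
```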
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

First of all, the MediaEspresso samples are taken from m2ts output. If someone uses MediaEspresso with the common mp4 output, it gives much worse results.

Video 1
Handbrake Balanced HSW= http://imageshack.us/a/img708/1882/9hd.png
MediaEspresso Faster HSW= http://imageshack.us/a/img153/3997/2kgr.png

Video 2
Handbrake Balanced HSW= http://imageshack.us/a/img593/2659/lcpk.png
MediaEspresso Faster HSW= http://imageshack.us/a/img577/3572/hbi.png

Video 3
Handbrake Balanced HSW= http://imageshack.us/a/img89/4572/yqt.png
MediaEspresso Faster HSW= http://imageshack.us/a/img9/5576/wdt.png

Video 4
Handbrake Balanced HSW= http://imageshack.us/a/img829/1876/o9n.png
MediaEspresso Faster HSW= http://imageshack.us/a/img442/9686/qcx.png

Video 5
Handbrake Balanced HSW= http://imageshack.us/a/img839/2328/8mq.png
MediaEspresso Faster HSW= http://imageshack.us/a/img692/8819/z2j.png

Video 6
Handbrake Balanced HSW= http://imageshack.us/a/img35/4071/qpc.png
MediaEspresso Faster HSW= http://imageshack.us/a/img801/1641/c9r.png

Video 7
Handbrake Balanced HSW= http://imageshack.us/a/img4/6272/bn7e.png
MediaEspresso Faster HSW= http://imageshack.us/a/img838/484/zutb.png


Video 1: QSTranscode clearly the best.
Video 2: No big differences here, maybe QSTranscode slightly better.
Video 3: No big differences visible.
Video 4: QSTranscode and Handbrake both much improved over what we got from Ivy Bridge. MediaEspresso doing much worse here.
Video 5: Biggest surprise here. Handbrake doing much better than QSTranscode. MediaEspresso slightly worse than QSTranscode.
Video 6: Not much between QSTranscode and Handbrake. MediaEspresso slightly worse here.
Video 7: QSTranscode slightly better than the rest.
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

The lookahead feature requires a driver and SDK update: http://software.intel.com/en-us/forums/topic/393873

The new SDK should come soon. It looks to me like the new SDK is mandatory for full Haswell support, because a couple of Haswell-exclusive features are missing from the current SDK.
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

ftp://ftp.ts.fujitsu.com/pub/Mainboard- ... _Win64.zip


Looks like we have an API 1.7 driver now.
Deleted User 11865

Re: Handbrake and Haswell Quicksync

Post by Deleted User 11865 »

gmb wrote:
ftp://ftp.ts.fujitsu.com/pub/Mainboard- ... _Win64.zip

Looks like we have an API 1.7 driver now.

FWIW, the next HandBrake QSV Beta will have support for lookahead.
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

For a proper evaluation I need at least 2 Quicksync encoders.


It looks like CBR/VBR/AVBR look so bad in the screenshots above because the frame was at a scene change. One frame later they look almost comparable to CQP. CQP is obviously more robust at scene changes. Without a scene change, VBR is usually better than CQP. LA is of course the most robust bitrate control mode. It is also the slowest: encoding time was about 50% longer than VBR in this video.
s55
HandBrake Team
Posts: 10357
Joined: Sun Dec 24, 2006 1:05 pm

Re: Handbrake and Haswell Quicksync

Post by s55 »

The new beta has been pushed to sourceforge now.
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

Is this version with lookahead support? How can I enable it?
Deleted User 11865

Re: Handbrake and Haswell Quicksync

Post by Deleted User 11865 »

gmb wrote:Is this version with lookahead support? How can I enable it?

HandBrakeCLI.exe --help
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

Then this site should list the proper commands: https://trac.handbrake.fr/wiki/QuickSyncOptions

la did nothing; lookahead=1 works now. Custom lookahead-depth values over 40 are not working here. Is there another way to force higher values like 60? In most of my videos 60 did a little better than 40.
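
A minimal sketch of how a script could build that option string while staying at or below the depth that works here. The option names lookahead and lookahead-depth are the ones from this post; the colon-separated key=value layout and the cap of 40 are assumptions (the cap is based only on the behaviour reported in this thread), and the helper name is hypothetical. How the string is passed to the encoder is left out on purpose; check HandBrakeCLI.exe --help for that.

```python
# Assumption: 40 is the largest depth that worked in this thread, not a documented limit.
MAX_WORKING_DEPTH = 40


def qsv_lookahead_opts(depth=None, max_depth=MAX_WORKING_DEPTH):
    """Build an option string that enables lookahead, clamping the depth if one is given.

    Assumes options are joined as colon-separated key=value pairs.
    """
    opts = ["lookahead=1"]
    if depth is not None:
        if depth < 1:
            raise ValueError("lookahead depth must be a positive integer")
        opts.append(f"lookahead-depth={min(depth, max_depth)}")
    return ":".join(opts)


print(qsv_lookahead_opts())    # lookahead only, encoder picks the depth
print(qsv_lookahead_opts(60))  # requested 60, clamped to the working maximum
```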
Deleted User 11865

Re: Handbrake and Haswell Quicksync

Post by Deleted User 11865 »

You mean encoding hangs with a lookahead depth of 60? It's a bug that needs fixing (no ETA).

Lookahead is enabled by default as long as target usage is <= 2 and average bitrate is used.
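
Spelled out as a predicate, the rule above looks roughly like this. It is only an illustration of the stated rule, not HandBrake's actual code; the function and parameter names are made up, and the 1 to 7 range for target usage is taken from the TU values mentioned in this thread.

```python
def lookahead_on_by_default(target_usage, uses_average_bitrate):
    """Return True when lookahead would be the default under the rule stated above.

    target_usage: 1 (best quality) to 7 (fastest), as used in this thread.
    uses_average_bitrate: True when the encode targets an average bitrate.
    """
    if not 1 <= target_usage <= 7:
        raise ValueError("target usage is expected to be between 1 and 7")
    return target_usage <= 2 and uses_average_bitrate


print(lookahead_on_by_default(2, True))   # TU2 with average bitrate
print(lookahead_on_by_default(4, True))   # TU4 needs lookahead=1 set explicitly
```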
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

Yes, it hangs. I don't use the highest quality setting in Handbrake; I use TU4 via the command line, which is a good tradeoff. The best quality preset (I guess TU2) is so much slower that the small quality enhancement isn't worth it. For a default setting TU4 is the right choice.
Deleted User 11865

Re: Handbrake and Haswell Quicksync

Post by Deleted User 11865 »

In my testing, the lookahead was a bit of a bottleneck (so TU 2 is not significantly slower than TU 4 or even 7 when the lookahead is enabled). YMMV.
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

LA with one of Intel's demo movies:

TU4= 1:25 min
TU2= 2:12 min

TU1 doubles the encoding time on my system, and TU2 is still pretty close to that. I'm struggling to see a difference: TU2 is slightly better than TU4 but not worth the big speed penalty. TU4 is simply the best tradeoff. I see some great results with LA in Handbrake. Very competitive speed and quality when I compare to QSTranscode. I think further improvements are possible with GOP tweaking.
Deleted User 11865

Re: Handbrake and Haswell Quicksync

Post by Deleted User 11865 »

Interesting. I guess I need to re-test stuff.
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

Deleted User 11865

Re: Handbrake and Haswell Quicksync

Post by Deleted User 11865 »

Not sure what version of HandBrake they used for the x264 portion, but hopefully they used something else to demonstrate the QSV part. I'm not quite sure all our Haswell issues are fixed.
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

Obviously they used this for demonstration:
Intel® Media Demo Booth
- Ultra HD (4K) Decode and Encode
- HandBrake Quick Sync Video Enabling

Internally they have had a QS transcoder with lookahead support for quite a while. From one of the presentations:
"The software used in comparisons is non-commercial Intel test application MFX transcoder."

Also in the same paper, one page back:
"Demo Clip transcoded with Intel internal test application with 2Mbps bitrate encoding"
For the lookahead comparisons they probably used the internal transcoder.
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

On page 12, btw, there is another confirmation that the quality difference between TU4 and TU2 is very, very tiny. That's why I don't consider TU2 a useful preset, at least not with such a huge encoding slowdown.

TU1 is a bit strange as well. Afaik the only difference from TU2 is that Trellis is enabled in TU1. With Trellis enabled my output loses details, so maybe Trellis is useful only for high bitrate encoding. The slide says "Trellis Quantization: high bit rate encoding quality improvement". My test videos in 1080p and ~5 Mbps were on the lower bitrate side. (I mean the output; my original videos have a much higher bitrate.)
gmb
Bright Spark User
Posts: 350
Joined: Thu Mar 28, 2013 12:49 pm

Re: Handbrake and Haswell Quicksync

Post by gmb »

New Quicksync slides from IDF San Francisco.

[19 slide images, not preserved in this archive]


And an API 1.7 driver for those who prefer downloadcenter drivers: https://downloadcenter.intel.com/Detail ... 864-bit%29
xooyoozoo
Posts: 6
Joined: Thu Mar 28, 2013 2:09 am

Re: Handbrake and Haswell Quicksync

Post by xooyoozoo »

Thanks for the slides. Slide 12 is especially interesting because it comes so close to being real, useful information.

It's strange that they chose to give PSNR decibels, instead of filesize percentages, as a measurement. It would be grudgingly acceptable if they had only focused on a single bitrate or QP, but the slide itself notes that on the CQP graph they did the full 4-point measurement necessary for reliable delta-filesize summarization. That means they had the RD curve right in front of them, but they chose to measure the less "real" axis (PSNR dB). Well, let's try our best to speculate anyway, as Intel's methods are similar to the "official" MPEG/JCT ones and I highly doubt they reinvented the wheel in how they approach quantization parameters.

We first need to reproduce Intel's PSNR vs log-bitrate graph. Old JCT docs testing the reference JM encoder very reliably produce a ~9 dB PSNR gap between QP22 and QP37. Filesize spreads there are harder to summarize, but I'd say an ~8x size spread (mostly between 6x and 10x) between the smallest and largest encode per clip is a good average. Assuming there isn't something wrong with the encoder, this log-log RD graph (PSNR-to-logSize) should then be linear in this range, which means each extra 0.1 dB of encoder quality would correspond to a new file being ~97.7% of the original size for the same quality. The full 0.7 dB increase going from Haswell TU7 to TU1 would then suggest that TU1 files are ~85% of TU7's size for the same quality. (Using a 6x spread changes the last number to 87% instead; 10x gives 83.6%.)
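
The arithmetic behind those percentages can be checked with a few lines of Python. This is just a sketch under the same assumption as above, namely that PSNR is linear in log(filesize) over the QP22 to QP37 range; the function and parameter names are mine. If PSNR rises by the full span (9 dB) when the size grows by the full spread (8x), then a gain of g dB at equal size is worth a size ratio of spread^(-g/span) at equal quality.

```python
def size_ratio(psnr_gain_db, psnr_span_db=9.0, size_spread=8.0):
    """Relative file size at equal quality for an encoder that gains psnr_gain_db.

    Assumes PSNR is linear in log(size): psnr_span_db of PSNR is bought by a
    size_spread-fold increase in file size, so a gain of g dB is equivalent to
    shrinking the file by a factor of size_spread ** (g / psnr_span_db).
    """
    return size_spread ** (-psnr_gain_db / psnr_span_db)


print(f"0.1 dB, 8x spread:  {size_ratio(0.1):.1%}")
print(f"0.7 dB, 8x spread:  {size_ratio(0.7):.1%}")
print(f"0.7 dB, 6x spread:  {size_ratio(0.7, size_spread=6.0):.1%}")
print(f"0.7 dB, 10x spread: {size_ratio(0.7, size_spread=10.0):.1%}")
```

The four results line up with the 97.7%, ~85%, 87% and 83.6% figures quoted above.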

Algorithmic metrics have their downfalls, of course, but having a numeric starting point as rough guidance is better than having nothing at all. Additionally, while the above is entirely speculative, based on the limits we know exist, I doubt independent testing with better metrics (SSIM or preferably MS-SSIM) would produce something much different. Still, it'd be nice to know specific numbers for ourselves. I've been waiting for the compression folks at MSU to release their yearly report, but they've been silent lately.

If anyone has Haswell and can share encodes spread over 4 points on some common raw test sequences, I can help with the number crunching. :) And after that's done, it'd also be nice to aggregate encoding speed, combine it with quality/delta numbers, and compare it to x264 and produce something like this.