Page 1 of 2

[PATCH] Universal Text Subtitle Input

Posted: Mon Apr 26, 2010 1:04 am
by davidfstr
This patch adds support for reading TEXT subtitle tracks from file inputs. In practice this usually means embedded SRT files. I have also added some documentation for hb_buffer and hb_subtitle.

I have tested this with:
* MKV input -> MKV output = Good

Less luck testing with:
* MKV input -> MP4 output = No subtitle track visible when playing

I think this is an issue with the MP4 muxer, but I would need a DVD with CC608 subtitles in order to exercise the muxer and verify this.

I would also like to test:
* MP4 input -> {MKV,MP4} output = ?

To do so I would need one of the following:
1. An MP4 with an embedded TEXT track.
2. A DVD with CC608 subtitles, with which I could reliably generate such an MP4 with HandBrake.
3. A bugfix for either my code or the MP4 demuxer so that I could generate such an MP4 (from, say, an MKV) using this patch.

Anyway, here's the patch:
http://handbrake.fr/pastebin/pastebin.php?show=1376

Re: [PATCH] Universal Text Subtitle Input

Posted: Tue Apr 27, 2010 3:56 am
by dynaflash
Please pastebin your patch and link it here as the forum code blocks tend to munge up the patch code quite badly.

Re: [PATCH] Universal Text Subtitle Input

Posted: Tue Apr 27, 2010 5:15 am
by davidfstr
Done - see original post.

Is there a mailing list or something similar I could subscribe to that would alert me of checkins to SVN?

Re: [PATCH] Universal Text Subtitle Input

Posted: Tue Apr 27, 2010 10:10 am
by Ritsuka
The subtitles are muxed in the mp4 file, but QuickTime X does not show anything and QuickTime 7 crashes. For some reasons there is a lot of garbage in the muxed file. Are the strings from ffmpeg null terminated?

Re: [PATCH] Universal Text Subtitle Input

Posted: Tue Apr 27, 2010 1:28 pm
by Rodeo
davidfstr wrote:Done - see original post.

Is there a mailing list or something similar I could subscribe to that would alert me of checkins to SVN?
http://trac.handbrake.fr/timeline

Also, we don't have a mailing list so developer discussion takes place on IRC:

http://trac.handbrake.fr/wiki/IRC

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed Apr 28, 2010 6:40 am
by davidfstr
The packets from FFMPEG are not null-terminated. Makes sense, as this would imply that an unnecessary NULL character was stored in the packet from the input file.

For the time being I've modified the UTF-8 work-object to manually null-terminate subtitle packets (if they aren't already). Naturally this stores an extra NULL character in every subtitle packet we output. Not the cleanest, but it works.

New patch:
http://handbrake.fr/pastebin/pastebin.php?show=1381

Sadly I'm still not able to output MP4s that either VLC or QuickTime likes.

Random observation: hb_buffer_realloc doesn't look like its designed to handle out-of-memory situations. In particular, it will never return an error.

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed Apr 28, 2010 7:11 am
by Ritsuka
It works here, make sure to use the m4v file extension to watch them in QuickTIme 7.
Another issue you could find with mp4 is that there must be no overlapping samples, but there should already be some code in handbrake to deal with it, and this is more a problem with that mess called advanced substation alpha.

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed Apr 28, 2010 3:15 pm
by dynaflash
davidfstr, thanks for the patches ... one other thing can you in the future create all of your patches from the root hb directory and when ammending patches do it as a new complete diff if possible. It just makes it easier if we all do it the same. Thanks!

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed Apr 28, 2010 3:17 pm
by JohnAStebbins
that was two things :mrgreen:

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed Apr 28, 2010 4:18 pm
by dynaflash
JohnAStebbins wrote:that was two things :mrgreen:
Heh, so it is ! I always said you could count. :lol:

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed Apr 28, 2010 4:24 pm
by Rodeo
Also, it's not "one other thing". The proper way to say it is "one more thing" :D

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed Apr 28, 2010 4:47 pm
by dynaflash
Just compiled and ran an mkv clip with embedded text subs and in fact it looks good in quicktimX from MKV -> .m4v. The source subs were forced english and qt even picked em up by default. Nice! Compared it the same clip with the subs demuxed and added as an srt and it is the same. Saves a few steps to be sure.

fwiw, here is my diff of davidfstr's patches from hb root:

http://handbrake.fr/pastebin/pastebin.php?show=1382

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed Apr 28, 2010 4:59 pm
by dynaflash
davidfstr wrote: I would also like to test:
* MP4 input -> {MKV,MP4} output = ?

To do so I would need one of the following:
1. An MP4 with an embedded TEXT track.
2. A DVD with CC608 subtitles, with which I could reliably generate such an MP4 with HandBrake.
3. A bugfix for either my code or the MP4 demuxer so that I could generate such an MP4 (from, say, an MKV) using this patch.
Now, taking that same mp4 created by hb from that mkv and feeding it back into hb ( to try to read the 3GGP text track hb created) is a no go. No sub track detected by hb. So in other words in this case we can not eat our own dog food. Show stopper ... not in my opinion but worth noting here in my tests.

Edit: Here btw is the relevant part of the scan code when scanning the mp4 which was previously made from the mkv:

Code: Select all

  Duration: 00:01:01.48, start: 0.000000, bitrate: 2866 kb/s
    Stream #0.0(und): Video: h264, yuv420p, 1280x720, 2531 kb/s, 23.96 fps, 2k tbr, 90k tbn, 180k tbc
    Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 328 kb/s
    Stream #0.2(eng): Subtitle: tx3g / 0x67337874, 0 kb/s
[11:54:47] add_ffmpeg_subtitle: unknown subtitle stream type: 0x17005

Re: [PATCH] Universal Text Subtitle Input

Posted: Thu Apr 29, 2010 4:57 am
by davidfstr
dynaflash wrote: Now, taking that same mp4 created by hb from that mkv and feeding it back into hb ( to try to read the 3GGP text track hb created) is a no go. No sub track detected by hb. So in other words in this case we can not eat our own dog food. Show stopper ... not in my opinion but worth noting here in my tests.

Edit: Here btw is the relevant part of the scan code when scanning the mp4 which was previously made from the mkv:

Code: Select all

  Duration: 00:01:01.48, start: 0.000000, bitrate: 2866 kb/s
    Stream #0.0(und): Video: h264, yuv420p, 1280x720, 2531 kb/s, 23.96 fps, 2k tbr, 90k tbn, 180k tbc
    Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 328 kb/s
    Stream #0.2(eng): Subtitle: tx3g / 0x67337874, 0 kb/s
[11:54:47] add_ffmpeg_subtitle: unknown subtitle stream type: 0x17005
Well that's interesting. Based on that log, it looks like TX3G/3GGP encoded by the MP4 muxer is treated as a new type of subtitle track by FFMPEG (CODEC_ID_MOV_TEXT) that I'd have to convert and passthru. Requires:
  • add TX3GSUB to the current {PICTURESUB, TEXTSUB} subtitle types
  • update the FFMPEG subtitle demuxer: add_ffmpeg_subtitle
  • add a dectx3gsub work-object to pass it through almost as-is (similar to the decutf8sub work-object I introduced)
  • modify the MP4 muxer to pass it thru to a TX3G track in the output MP4, as-is
  • modify the MKV muxer to either (1) downconvert it to a UTF8 track, or (2) upconvert it to SSA.
  • no other muxers understand subtitles, so they wouldn't need updating
Unfortunately MKV does not appear to support TX3G as a native subtitle track. It only has support for {ASCII, UTF8, SSA, A-S-S, USF, VOBSUB, BMP}, according to libmkv.h.

I would prefer to add the TX3G support in a separate patch on top of this one. Acceptable?

Re: [PATCH] Universal Text Subtitle Input

Posted: Thu Apr 29, 2010 5:14 am
by davidfstr
Actually it would be cleaner, in the case of converting TX3G to be put into an MKV output container, to instead have the TX3G->UTF-8 or TX3G->SSA logic in work-objects. That way we don't get any more subtitle-specific stuff in the muxers.

So that would mean:
  • add a dec-tx3g-2-tx3g-sub work-object, to be used when MP4 is the destination container
  • add a dec-tx3g-2-utf8-sub or dec-tx3g-2-ssa-sub work-object, to be used when MKV is the destination container
  • add conditional logic in do_work() that selects an appropriate work-object to use depending on the input subtitle format and the output container
And then the natural followup refactoring:
  • extract existing TEXT->TX3G conversion logic from the MP4 muxer out to a separate dec-utf8-2-tx3g-sub work-object
  • update the conditional logic to use this work-object when input subtitle is a TEXTSUB and the output container is MP4

Re: [PATCH] Universal Text Subtitle Input

Posted: Thu Apr 29, 2010 2:24 pm
by dynaflash
davidfstr wrote:I would prefer to add the TX3G support in a separate patch on top of this one. Acceptable?
Sure.

Re: [PATCH] Universal Text Subtitle Input

Posted: Thu Apr 29, 2010 6:10 pm
by JohnAStebbins
As a first step, you might to want to just decode the tx3g to srt mark-up utf8 and pass that to the muxers. This is what the current muxers expect for text tracks anyway. They re-encode the text to the appropriate text subtitle format for the container. It would be ideal if the tx3g were passed through unmodified to the mp4 muxer, but since you have to decode it to utf8 anyway for mkv, you might as well do that first.

Re: [PATCH] Universal Text Subtitle Input

Posted: Sun May 02, 2010 6:15 am
by davidfstr
Since I see that this patch hasn't been submitted, there's one small bug in it that I'll fix now: I had assumed incorrectly that hb_buffer_realloc was updating the buf->size of its parameter. Thus the original NULL-termination code incorrectly lopped-off the last character of every subtitle.

I've also made the patch relative to the root of the trunk.

New patch is:
http://handbrake.fr/pastebin/pastebin.php?show=1389

Re: [PATCH] Universal Text Subtitle Input

Posted: Mon May 03, 2010 7:23 pm
by dynaflash
Okay, after some testing I did speak with ritsuka on irc and he did not see any downside to committing this into the HB svn as well as hopefully further developed in the future possibly intially as per JohnAStebbins suggestion. However I would still like a bit of input from JohnAStebbins and possibly others before actually committing this patch.

Re: [PATCH] Universal Text Subtitle Input

Posted: Tue May 04, 2010 6:43 am
by davidfstr
I've expanded the previous patch to also include the TX3G decoder support requested by JohnAStebbins. So both parts could be added at once if you wish. Or we could just focus on the original patch if you find that easier.

With this latest patch, I have successfully tested:
  • MKV UTF-8 -> MKV UTF-8 (passthru)
  • MKV UTF-8 -> MP4 TX3G (upconvert)
  • MP4 TX3G -> MKV UTF-8 (downconvert)
  • MP4 TX3G -> MP4 TX3G (downconvert to UTF-8 then upconvert)
The set of test subtitles was manipulated to include multi-byte UTF-8 characters (which have to be handled specially by the TX3G encoder in the MP4 muxer and by the TX3G decoder that I wrote). Various combinations of overlapping styles were also used.

The new patch with TX3G decoder support:
http://handbrake.fr/pastebin/pastebin.php?show=1392

Re: [PATCH] Universal Text Subtitle Input

Posted: Tue May 04, 2010 2:40 pm
by dynaflash
Okay, great will run a couple tests on your latest. JohnAStebbins gave the nod on irc yesterday so if this all looks good we'll commit the whole ball of wax!

Re: [PATCH] Universal Text Subtitle Input

Posted: Tue May 04, 2010 5:02 pm
by dynaflash
Committed latest patch as http://trac.handbrake.fr/changeset/3283.
Nice work davidfstr ! Thanks!

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed May 05, 2010 5:04 am
by davidfstr
Great! Now I just have to address those new TODOs, probably in the following order:
1. TODO(davidfstr): get universal VOB sub input working
2. TODO(davidfstr): For MP4 containers, an alternate work-object should be used that just passes the packets through, instead of downconverting to UTF-8 subtitles.
3. TODO(davidfstr): implement SSA subtitle support

Of course FFMPEG has other subtitle formats that support should be added for if I truly want 100% universal subtitle input:
  • CODEC_ID_XSUB (DivX 6 bitmap subtitles)
  • CODEC_ID_HDMV_PGS_SUBTITLE (HDMV/PGS bitmap subtitles = HD bitmap subtitles)
  • CODEC_ID_DVB_TELETEXT (DVB subtitles)
Of course that would take awhile. :roll:

Better to concentrate on the important 20% first, from the top list.

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed May 05, 2010 5:09 am
by dynaflash
Well, no one is gonna hold you to it ... but if you want to no one will stop you! ;)

Thanks again!

Re: [PATCH] Universal Text Subtitle Input

Posted: Wed May 05, 2010 3:25 pm
by cvk_b
This is awesome.