[PATCH] Universal Text Subtitle Input

Archive of historical development discussions
Discussions / Development has moved to GitHub
Forum rules
*******************************
Please be aware we are now using GitHub for issue tracking and feature requests.
- This section of the forum is now closed to new topics.

*******************************
davidfstr
Enlightened
Posts: 149
Joined: Sun Apr 12, 2009 7:41 pm

[PATCH] Universal Text Subtitle Input

Post by davidfstr »

This patch adds support for reading TEXT subtitle tracks from file inputs. In practice this usually means embedded SRT files. I have also added some documentation for hb_buffer and hb_subtitle.

I have tested this with:
* MKV input -> MKV output = Good

Less luck testing with:
* MKV input -> MP4 output = No subtitle track visible when playing

I think this is an issue with the MP4 muxer, but I would need a DVD with CC608 subtitles in order to exercise the muxer and verify this.

I would also like to test:
* MP4 input -> {MKV,MP4} output = ?

To do so I would need one of the following:
1. An MP4 with an embedded TEXT track.
2. A DVD with CC608 subtitles, with which I could reliably generate such an MP4 with HandBrake.
3. A bugfix for either my code or the MP4 demuxer so that I could generate such an MP4 (from, say, an MKV) using this patch.

Anyway, here's the patch:
http://handbrake.fr/pastebin/pastebin.php?show=1376
Last edited by davidfstr on Tue Apr 27, 2010 5:10 am, edited 1 time in total.
dynaflash
Veteran User
Posts: 3820
Joined: Thu Nov 02, 2006 8:19 pm

Re: [PATCH] Universal Text Subtitle Input

Post by dynaflash »

Please pastebin your patch and link it here as the forum code blocks tend to munge up the patch code quite badly.
davidfstr
Enlightened
Posts: 149
Joined: Sun Apr 12, 2009 7:41 pm

Re: [PATCH] Universal Text Subtitle Input

Post by davidfstr »

Done - see original post.

Is there a mailing list or something similar I could subscribe to that would alert me of checkins to SVN?
User avatar
Ritsuka
HandBrake Team
Posts: 1650
Joined: Fri Jan 12, 2007 11:29 am

Re: [PATCH] Universal Text Subtitle Input

Post by Ritsuka »

The subtitles are muxed in the mp4 file, but QuickTime X does not show anything and QuickTime 7 crashes. For some reasons there is a lot of garbage in the muxed file. Are the strings from ffmpeg null terminated?
Deleted User 11865

Re: [PATCH] Universal Text Subtitle Input

Post by Deleted User 11865 »

davidfstr wrote:Done - see original post.

Is there a mailing list or something similar I could subscribe to that would alert me of checkins to SVN?
http://trac.handbrake.fr/timeline

Also, we don't have a mailing list so developer discussion takes place on IRC:

http://trac.handbrake.fr/wiki/IRC
davidfstr
Enlightened
Posts: 149
Joined: Sun Apr 12, 2009 7:41 pm

Re: [PATCH] Universal Text Subtitle Input

Post by davidfstr »

The packets from FFMPEG are not null-terminated. Makes sense, as this would imply that an unnecessary NULL character was stored in the packet from the input file.

For the time being I've modified the UTF-8 work-object to manually null-terminate subtitle packets (if they aren't already). Naturally this stores an extra NULL character in every subtitle packet we output. Not the cleanest, but it works.

New patch:
http://handbrake.fr/pastebin/pastebin.php?show=1381

Sadly I'm still not able to output MP4s that either VLC or QuickTime likes.

Random observation: hb_buffer_realloc doesn't look like its designed to handle out-of-memory situations. In particular, it will never return an error.
User avatar
Ritsuka
HandBrake Team
Posts: 1650
Joined: Fri Jan 12, 2007 11:29 am

Re: [PATCH] Universal Text Subtitle Input

Post by Ritsuka »

It works here, make sure to use the m4v file extension to watch them in QuickTIme 7.
Another issue you could find with mp4 is that there must be no overlapping samples, but there should already be some code in handbrake to deal with it, and this is more a problem with that mess called advanced substation alpha.
dynaflash
Veteran User
Posts: 3820
Joined: Thu Nov 02, 2006 8:19 pm

Re: [PATCH] Universal Text Subtitle Input

Post by dynaflash »

davidfstr, thanks for the patches ... one other thing can you in the future create all of your patches from the root hb directory and when ammending patches do it as a new complete diff if possible. It just makes it easier if we all do it the same. Thanks!
User avatar
JohnAStebbins
HandBrake Team
Posts: 5712
Joined: Sat Feb 09, 2008 7:21 pm

Re: [PATCH] Universal Text Subtitle Input

Post by JohnAStebbins »

that was two things :mrgreen:
dynaflash
Veteran User
Posts: 3820
Joined: Thu Nov 02, 2006 8:19 pm

Re: [PATCH] Universal Text Subtitle Input

Post by dynaflash »

JohnAStebbins wrote:that was two things :mrgreen:
Heh, so it is ! I always said you could count. :lol:
Deleted User 11865

Re: [PATCH] Universal Text Subtitle Input

Post by Deleted User 11865 »

Also, it's not "one other thing". The proper way to say it is "one more thing" :D
dynaflash
Veteran User
Posts: 3820
Joined: Thu Nov 02, 2006 8:19 pm

Re: [PATCH] Universal Text Subtitle Input

Post by dynaflash »

Just compiled and ran an mkv clip with embedded text subs and in fact it looks good in quicktimX from MKV -> .m4v. The source subs were forced english and qt even picked em up by default. Nice! Compared it the same clip with the subs demuxed and added as an srt and it is the same. Saves a few steps to be sure.

fwiw, here is my diff of davidfstr's patches from hb root:

http://handbrake.fr/pastebin/pastebin.php?show=1382
dynaflash
Veteran User
Posts: 3820
Joined: Thu Nov 02, 2006 8:19 pm

Re: [PATCH] Universal Text Subtitle Input

Post by dynaflash »

davidfstr wrote: I would also like to test:
* MP4 input -> {MKV,MP4} output = ?

To do so I would need one of the following:
1. An MP4 with an embedded TEXT track.
2. A DVD with CC608 subtitles, with which I could reliably generate such an MP4 with HandBrake.
3. A bugfix for either my code or the MP4 demuxer so that I could generate such an MP4 (from, say, an MKV) using this patch.
Now, taking that same mp4 created by hb from that mkv and feeding it back into hb ( to try to read the 3GGP text track hb created) is a no go. No sub track detected by hb. So in other words in this case we can not eat our own dog food. Show stopper ... not in my opinion but worth noting here in my tests.

Edit: Here btw is the relevant part of the scan code when scanning the mp4 which was previously made from the mkv:

Code: Select all

  Duration: 00:01:01.48, start: 0.000000, bitrate: 2866 kb/s
    Stream #0.0(und): Video: h264, yuv420p, 1280x720, 2531 kb/s, 23.96 fps, 2k tbr, 90k tbn, 180k tbc
    Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 328 kb/s
    Stream #0.2(eng): Subtitle: tx3g / 0x67337874, 0 kb/s
[11:54:47] add_ffmpeg_subtitle: unknown subtitle stream type: 0x17005
davidfstr
Enlightened
Posts: 149
Joined: Sun Apr 12, 2009 7:41 pm

Re: [PATCH] Universal Text Subtitle Input

Post by davidfstr »

dynaflash wrote: Now, taking that same mp4 created by hb from that mkv and feeding it back into hb ( to try to read the 3GGP text track hb created) is a no go. No sub track detected by hb. So in other words in this case we can not eat our own dog food. Show stopper ... not in my opinion but worth noting here in my tests.

Edit: Here btw is the relevant part of the scan code when scanning the mp4 which was previously made from the mkv:

Code: Select all

  Duration: 00:01:01.48, start: 0.000000, bitrate: 2866 kb/s
    Stream #0.0(und): Video: h264, yuv420p, 1280x720, 2531 kb/s, 23.96 fps, 2k tbr, 90k tbn, 180k tbc
    Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 328 kb/s
    Stream #0.2(eng): Subtitle: tx3g / 0x67337874, 0 kb/s
[11:54:47] add_ffmpeg_subtitle: unknown subtitle stream type: 0x17005
Well that's interesting. Based on that log, it looks like TX3G/3GGP encoded by the MP4 muxer is treated as a new type of subtitle track by FFMPEG (CODEC_ID_MOV_TEXT) that I'd have to convert and passthru. Requires:
  • add TX3GSUB to the current {PICTURESUB, TEXTSUB} subtitle types
  • update the FFMPEG subtitle demuxer: add_ffmpeg_subtitle
  • add a dectx3gsub work-object to pass it through almost as-is (similar to the decutf8sub work-object I introduced)
  • modify the MP4 muxer to pass it thru to a TX3G track in the output MP4, as-is
  • modify the MKV muxer to either (1) downconvert it to a UTF8 track, or (2) upconvert it to SSA.
  • no other muxers understand subtitles, so they wouldn't need updating
Unfortunately MKV does not appear to support TX3G as a native subtitle track. It only has support for {ASCII, UTF8, SSA, A-S-S, USF, VOBSUB, BMP}, according to libmkv.h.

I would prefer to add the TX3G support in a separate patch on top of this one. Acceptable?
Last edited by davidfstr on Thu Apr 29, 2010 5:14 am, edited 1 time in total.
davidfstr
Enlightened
Posts: 149
Joined: Sun Apr 12, 2009 7:41 pm

Re: [PATCH] Universal Text Subtitle Input

Post by davidfstr »

Actually it would be cleaner, in the case of converting TX3G to be put into an MKV output container, to instead have the TX3G->UTF-8 or TX3G->SSA logic in work-objects. That way we don't get any more subtitle-specific stuff in the muxers.

So that would mean:
  • add a dec-tx3g-2-tx3g-sub work-object, to be used when MP4 is the destination container
  • add a dec-tx3g-2-utf8-sub or dec-tx3g-2-ssa-sub work-object, to be used when MKV is the destination container
  • add conditional logic in do_work() that selects an appropriate work-object to use depending on the input subtitle format and the output container
And then the natural followup refactoring:
  • extract existing TEXT->TX3G conversion logic from the MP4 muxer out to a separate dec-utf8-2-tx3g-sub work-object
  • update the conditional logic to use this work-object when input subtitle is a TEXTSUB and the output container is MP4
dynaflash
Veteran User
Posts: 3820
Joined: Thu Nov 02, 2006 8:19 pm

Re: [PATCH] Universal Text Subtitle Input

Post by dynaflash »

davidfstr wrote:I would prefer to add the TX3G support in a separate patch on top of this one. Acceptable?
Sure.
User avatar
JohnAStebbins
HandBrake Team
Posts: 5712
Joined: Sat Feb 09, 2008 7:21 pm

Re: [PATCH] Universal Text Subtitle Input

Post by JohnAStebbins »

As a first step, you might to want to just decode the tx3g to srt mark-up utf8 and pass that to the muxers. This is what the current muxers expect for text tracks anyway. They re-encode the text to the appropriate text subtitle format for the container. It would be ideal if the tx3g were passed through unmodified to the mp4 muxer, but since you have to decode it to utf8 anyway for mkv, you might as well do that first.
davidfstr
Enlightened
Posts: 149
Joined: Sun Apr 12, 2009 7:41 pm

Re: [PATCH] Universal Text Subtitle Input

Post by davidfstr »

Since I see that this patch hasn't been submitted, there's one small bug in it that I'll fix now: I had assumed incorrectly that hb_buffer_realloc was updating the buf->size of its parameter. Thus the original NULL-termination code incorrectly lopped-off the last character of every subtitle.

I've also made the patch relative to the root of the trunk.

New patch is:
http://handbrake.fr/pastebin/pastebin.php?show=1389
dynaflash
Veteran User
Posts: 3820
Joined: Thu Nov 02, 2006 8:19 pm

Re: [PATCH] Universal Text Subtitle Input

Post by dynaflash »

Okay, after some testing I did speak with ritsuka on irc and he did not see any downside to committing this into the HB svn as well as hopefully further developed in the future possibly intially as per JohnAStebbins suggestion. However I would still like a bit of input from JohnAStebbins and possibly others before actually committing this patch.
davidfstr
Enlightened
Posts: 149
Joined: Sun Apr 12, 2009 7:41 pm

Re: [PATCH] Universal Text Subtitle Input

Post by davidfstr »

I've expanded the previous patch to also include the TX3G decoder support requested by JohnAStebbins. So both parts could be added at once if you wish. Or we could just focus on the original patch if you find that easier.

With this latest patch, I have successfully tested:
  • MKV UTF-8 -> MKV UTF-8 (passthru)
  • MKV UTF-8 -> MP4 TX3G (upconvert)
  • MP4 TX3G -> MKV UTF-8 (downconvert)
  • MP4 TX3G -> MP4 TX3G (downconvert to UTF-8 then upconvert)
The set of test subtitles was manipulated to include multi-byte UTF-8 characters (which have to be handled specially by the TX3G encoder in the MP4 muxer and by the TX3G decoder that I wrote). Various combinations of overlapping styles were also used.

The new patch with TX3G decoder support:
http://handbrake.fr/pastebin/pastebin.php?show=1392
dynaflash
Veteran User
Posts: 3820
Joined: Thu Nov 02, 2006 8:19 pm

Re: [PATCH] Universal Text Subtitle Input

Post by dynaflash »

Okay, great will run a couple tests on your latest. JohnAStebbins gave the nod on irc yesterday so if this all looks good we'll commit the whole ball of wax!
dynaflash
Veteran User
Posts: 3820
Joined: Thu Nov 02, 2006 8:19 pm

Re: [PATCH] Universal Text Subtitle Input

Post by dynaflash »

Committed latest patch as http://trac.handbrake.fr/changeset/3283.
Nice work davidfstr ! Thanks!
davidfstr
Enlightened
Posts: 149
Joined: Sun Apr 12, 2009 7:41 pm

Re: [PATCH] Universal Text Subtitle Input

Post by davidfstr »

Great! Now I just have to address those new TODOs, probably in the following order:
1. TODO(davidfstr): get universal VOB sub input working
2. TODO(davidfstr): For MP4 containers, an alternate work-object should be used that just passes the packets through, instead of downconverting to UTF-8 subtitles.
3. TODO(davidfstr): implement SSA subtitle support

Of course FFMPEG has other subtitle formats that support should be added for if I truly want 100% universal subtitle input:
  • CODEC_ID_XSUB (DivX 6 bitmap subtitles)
  • CODEC_ID_HDMV_PGS_SUBTITLE (HDMV/PGS bitmap subtitles = HD bitmap subtitles)
  • CODEC_ID_DVB_TELETEXT (DVB subtitles)
Of course that would take awhile. :roll:

Better to concentrate on the important 20% first, from the top list.
dynaflash
Veteran User
Posts: 3820
Joined: Thu Nov 02, 2006 8:19 pm

Re: [PATCH] Universal Text Subtitle Input

Post by dynaflash »

Well, no one is gonna hold you to it ... but if you want to no one will stop you! ;)

Thanks again!
cvk_b
Veteran User
Posts: 527
Joined: Sun Mar 18, 2007 2:11 am

Re: [PATCH] Universal Text Subtitle Input

Post by cvk_b »

This is awesome.
Post Reply