[PATCH] Universal Text Subtitle Input
Forum rules
*******************************
Please be aware we are now using GitHub for issue tracking and feature requests.
- This section of the forum is now closed to new topics.
*******************************
*******************************
Please be aware we are now using GitHub for issue tracking and feature requests.
- This section of the forum is now closed to new topics.
*******************************
[PATCH] Universal Text Subtitle Input
This patch adds support for reading TEXT subtitle tracks from file inputs. In practice this usually means embedded SRT files. I have also added some documentation for hb_buffer and hb_subtitle.
I have tested this with:
* MKV input -> MKV output = Good
Less luck testing with:
* MKV input -> MP4 output = No subtitle track visible when playing
I think this is an issue with the MP4 muxer, but I would need a DVD with CC608 subtitles in order to exercise the muxer and verify this.
I would also like to test:
* MP4 input -> {MKV,MP4} output = ?
To do so I would need one of the following:
1. An MP4 with an embedded TEXT track.
2. A DVD with CC608 subtitles, with which I could reliably generate such an MP4 with HandBrake.
3. A bugfix for either my code or the MP4 demuxer so that I could generate such an MP4 (from, say, an MKV) using this patch.
Anyway, here's the patch:
http://handbrake.fr/pastebin/pastebin.php?show=1376
I have tested this with:
* MKV input -> MKV output = Good
Less luck testing with:
* MKV input -> MP4 output = No subtitle track visible when playing
I think this is an issue with the MP4 muxer, but I would need a DVD with CC608 subtitles in order to exercise the muxer and verify this.
I would also like to test:
* MP4 input -> {MKV,MP4} output = ?
To do so I would need one of the following:
1. An MP4 with an embedded TEXT track.
2. A DVD with CC608 subtitles, with which I could reliably generate such an MP4 with HandBrake.
3. A bugfix for either my code or the MP4 demuxer so that I could generate such an MP4 (from, say, an MKV) using this patch.
Anyway, here's the patch:
http://handbrake.fr/pastebin/pastebin.php?show=1376
Last edited by davidfstr on Tue Apr 27, 2010 5:10 am, edited 1 time in total.
Re: [PATCH] Universal Text Subtitle Input
Please pastebin your patch and link it here as the forum code blocks tend to munge up the patch code quite badly.
Re: [PATCH] Universal Text Subtitle Input
Done - see original post.
Is there a mailing list or something similar I could subscribe to that would alert me of checkins to SVN?
Is there a mailing list or something similar I could subscribe to that would alert me of checkins to SVN?
Re: [PATCH] Universal Text Subtitle Input
The subtitles are muxed in the mp4 file, but QuickTime X does not show anything and QuickTime 7 crashes. For some reasons there is a lot of garbage in the muxed file. Are the strings from ffmpeg null terminated?
Re: [PATCH] Universal Text Subtitle Input
http://trac.handbrake.fr/timelinedavidfstr wrote:Done - see original post.
Is there a mailing list or something similar I could subscribe to that would alert me of checkins to SVN?
Also, we don't have a mailing list so developer discussion takes place on IRC:
http://trac.handbrake.fr/wiki/IRC
Re: [PATCH] Universal Text Subtitle Input
The packets from FFMPEG are not null-terminated. Makes sense, as this would imply that an unnecessary NULL character was stored in the packet from the input file.
For the time being I've modified the UTF-8 work-object to manually null-terminate subtitle packets (if they aren't already). Naturally this stores an extra NULL character in every subtitle packet we output. Not the cleanest, but it works.
New patch:
http://handbrake.fr/pastebin/pastebin.php?show=1381
Sadly I'm still not able to output MP4s that either VLC or QuickTime likes.
Random observation: hb_buffer_realloc doesn't look like its designed to handle out-of-memory situations. In particular, it will never return an error.
For the time being I've modified the UTF-8 work-object to manually null-terminate subtitle packets (if they aren't already). Naturally this stores an extra NULL character in every subtitle packet we output. Not the cleanest, but it works.
New patch:
http://handbrake.fr/pastebin/pastebin.php?show=1381
Sadly I'm still not able to output MP4s that either VLC or QuickTime likes.
Random observation: hb_buffer_realloc doesn't look like its designed to handle out-of-memory situations. In particular, it will never return an error.
Re: [PATCH] Universal Text Subtitle Input
It works here, make sure to use the m4v file extension to watch them in QuickTIme 7.
Another issue you could find with mp4 is that there must be no overlapping samples, but there should already be some code in handbrake to deal with it, and this is more a problem with that mess called advanced substation alpha.
Another issue you could find with mp4 is that there must be no overlapping samples, but there should already be some code in handbrake to deal with it, and this is more a problem with that mess called advanced substation alpha.
Re: [PATCH] Universal Text Subtitle Input
davidfstr, thanks for the patches ... one other thing can you in the future create all of your patches from the root hb directory and when ammending patches do it as a new complete diff if possible. It just makes it easier if we all do it the same. Thanks!
- JohnAStebbins
- HandBrake Team
- Posts: 5681
- Joined: Sat Feb 09, 2008 7:21 pm
Re: [PATCH] Universal Text Subtitle Input
that was two things 

Re: [PATCH] Universal Text Subtitle Input
Heh, so it is ! I always said you could count.JohnAStebbins wrote:that was two things

Re: [PATCH] Universal Text Subtitle Input
Also, it's not "one other thing". The proper way to say it is "one more thing" 

Re: [PATCH] Universal Text Subtitle Input
Just compiled and ran an mkv clip with embedded text subs and in fact it looks good in quicktimX from MKV -> .m4v. The source subs were forced english and qt even picked em up by default. Nice! Compared it the same clip with the subs demuxed and added as an srt and it is the same. Saves a few steps to be sure.
fwiw, here is my diff of davidfstr's patches from hb root:
http://handbrake.fr/pastebin/pastebin.php?show=1382
fwiw, here is my diff of davidfstr's patches from hb root:
http://handbrake.fr/pastebin/pastebin.php?show=1382
Re: [PATCH] Universal Text Subtitle Input
Now, taking that same mp4 created by hb from that mkv and feeding it back into hb ( to try to read the 3GGP text track hb created) is a no go. No sub track detected by hb. So in other words in this case we can not eat our own dog food. Show stopper ... not in my opinion but worth noting here in my tests.davidfstr wrote: I would also like to test:
* MP4 input -> {MKV,MP4} output = ?
To do so I would need one of the following:
1. An MP4 with an embedded TEXT track.
2. A DVD with CC608 subtitles, with which I could reliably generate such an MP4 with HandBrake.
3. A bugfix for either my code or the MP4 demuxer so that I could generate such an MP4 (from, say, an MKV) using this patch.
Edit: Here btw is the relevant part of the scan code when scanning the mp4 which was previously made from the mkv:
Code: Select all
Duration: 00:01:01.48, start: 0.000000, bitrate: 2866 kb/s
Stream #0.0(und): Video: h264, yuv420p, 1280x720, 2531 kb/s, 23.96 fps, 2k tbr, 90k tbn, 180k tbc
Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 328 kb/s
Stream #0.2(eng): Subtitle: tx3g / 0x67337874, 0 kb/s
[11:54:47] add_ffmpeg_subtitle: unknown subtitle stream type: 0x17005
Re: [PATCH] Universal Text Subtitle Input
Well that's interesting. Based on that log, it looks like TX3G/3GGP encoded by the MP4 muxer is treated as a new type of subtitle track by FFMPEG (CODEC_ID_MOV_TEXT) that I'd have to convert and passthru. Requires:dynaflash wrote: Now, taking that same mp4 created by hb from that mkv and feeding it back into hb ( to try to read the 3GGP text track hb created) is a no go. No sub track detected by hb. So in other words in this case we can not eat our own dog food. Show stopper ... not in my opinion but worth noting here in my tests.
Edit: Here btw is the relevant part of the scan code when scanning the mp4 which was previously made from the mkv:Code: Select all
Duration: 00:01:01.48, start: 0.000000, bitrate: 2866 kb/s Stream #0.0(und): Video: h264, yuv420p, 1280x720, 2531 kb/s, 23.96 fps, 2k tbr, 90k tbn, 180k tbc Stream #0.1(eng): Audio: aac, 48000 Hz, stereo, s16, 328 kb/s Stream #0.2(eng): Subtitle: tx3g / 0x67337874, 0 kb/s [11:54:47] add_ffmpeg_subtitle: unknown subtitle stream type: 0x17005
- add TX3GSUB to the current {PICTURESUB, TEXTSUB} subtitle types
- update the FFMPEG subtitle demuxer: add_ffmpeg_subtitle
- add a dectx3gsub work-object to pass it through almost as-is (similar to the decutf8sub work-object I introduced)
- modify the MP4 muxer to pass it thru to a TX3G track in the output MP4, as-is
- modify the MKV muxer to either (1) downconvert it to a UTF8 track, or (2) upconvert it to SSA.
- no other muxers understand subtitles, so they wouldn't need updating
I would prefer to add the TX3G support in a separate patch on top of this one. Acceptable?
Last edited by davidfstr on Thu Apr 29, 2010 5:14 am, edited 1 time in total.
Re: [PATCH] Universal Text Subtitle Input
Actually it would be cleaner, in the case of converting TX3G to be put into an MKV output container, to instead have the TX3G->UTF-8 or TX3G->SSA logic in work-objects. That way we don't get any more subtitle-specific stuff in the muxers.
So that would mean:
So that would mean:
- add a dec-tx3g-2-tx3g-sub work-object, to be used when MP4 is the destination container
- add a dec-tx3g-2-utf8-sub or dec-tx3g-2-ssa-sub work-object, to be used when MKV is the destination container
- add conditional logic in do_work() that selects an appropriate work-object to use depending on the input subtitle format and the output container
- extract existing TEXT->TX3G conversion logic from the MP4 muxer out to a separate dec-utf8-2-tx3g-sub work-object
- update the conditional logic to use this work-object when input subtitle is a TEXTSUB and the output container is MP4
Re: [PATCH] Universal Text Subtitle Input
Sure.davidfstr wrote:I would prefer to add the TX3G support in a separate patch on top of this one. Acceptable?
- JohnAStebbins
- HandBrake Team
- Posts: 5681
- Joined: Sat Feb 09, 2008 7:21 pm
Re: [PATCH] Universal Text Subtitle Input
As a first step, you might to want to just decode the tx3g to srt mark-up utf8 and pass that to the muxers. This is what the current muxers expect for text tracks anyway. They re-encode the text to the appropriate text subtitle format for the container. It would be ideal if the tx3g were passed through unmodified to the mp4 muxer, but since you have to decode it to utf8 anyway for mkv, you might as well do that first.
Re: [PATCH] Universal Text Subtitle Input
Since I see that this patch hasn't been submitted, there's one small bug in it that I'll fix now: I had assumed incorrectly that hb_buffer_realloc was updating the buf->size of its parameter. Thus the original NULL-termination code incorrectly lopped-off the last character of every subtitle.
I've also made the patch relative to the root of the trunk.
New patch is:
http://handbrake.fr/pastebin/pastebin.php?show=1389
I've also made the patch relative to the root of the trunk.
New patch is:
http://handbrake.fr/pastebin/pastebin.php?show=1389
Re: [PATCH] Universal Text Subtitle Input
Okay, after some testing I did speak with ritsuka on irc and he did not see any downside to committing this into the HB svn as well as hopefully further developed in the future possibly intially as per JohnAStebbins suggestion. However I would still like a bit of input from JohnAStebbins and possibly others before actually committing this patch.
Re: [PATCH] Universal Text Subtitle Input
I've expanded the previous patch to also include the TX3G decoder support requested by JohnAStebbins. So both parts could be added at once if you wish. Or we could just focus on the original patch if you find that easier.
With this latest patch, I have successfully tested:
The new patch with TX3G decoder support:
http://handbrake.fr/pastebin/pastebin.php?show=1392
With this latest patch, I have successfully tested:
- MKV UTF-8 -> MKV UTF-8 (passthru)
- MKV UTF-8 -> MP4 TX3G (upconvert)
- MP4 TX3G -> MKV UTF-8 (downconvert)
- MP4 TX3G -> MP4 TX3G (downconvert to UTF-8 then upconvert)
The new patch with TX3G decoder support:
http://handbrake.fr/pastebin/pastebin.php?show=1392
Re: [PATCH] Universal Text Subtitle Input
Okay, great will run a couple tests on your latest. JohnAStebbins gave the nod on irc yesterday so if this all looks good we'll commit the whole ball of wax!
Re: [PATCH] Universal Text Subtitle Input
Committed latest patch as http://trac.handbrake.fr/changeset/3283.
Nice work davidfstr ! Thanks!
Nice work davidfstr ! Thanks!
Re: [PATCH] Universal Text Subtitle Input
Great! Now I just have to address those new TODOs, probably in the following order:
1. TODO(davidfstr): get universal VOB sub input working
2. TODO(davidfstr): For MP4 containers, an alternate work-object should be used that just passes the packets through, instead of downconverting to UTF-8 subtitles.
3. TODO(davidfstr): implement SSA subtitle support
Of course FFMPEG has other subtitle formats that support should be added for if I truly want 100% universal subtitle input:
Better to concentrate on the important 20% first, from the top list.
1. TODO(davidfstr): get universal VOB sub input working
2. TODO(davidfstr): For MP4 containers, an alternate work-object should be used that just passes the packets through, instead of downconverting to UTF-8 subtitles.
3. TODO(davidfstr): implement SSA subtitle support
Of course FFMPEG has other subtitle formats that support should be added for if I truly want 100% universal subtitle input:
- CODEC_ID_XSUB (DivX 6 bitmap subtitles)
- CODEC_ID_HDMV_PGS_SUBTITLE (HDMV/PGS bitmap subtitles = HD bitmap subtitles)
- CODEC_ID_DVB_TELETEXT (DVB subtitles)

Better to concentrate on the important 20% first, from the top list.
Re: [PATCH] Universal Text Subtitle Input
Well, no one is gonna hold you to it ... but if you want to no one will stop you! 
Thanks again!

Thanks again!
Re: [PATCH] Universal Text Subtitle Input
This is awesome.