Suggestions For CC Handling

kevvo
Posts: 10
Joined: Tue May 19, 2009 9:49 pm

Suggestions For CC Handling

Post by kevvo »

Hi all!

Good job on getting closed captions into HandBrake; I'm really impressed with the progress y'all have made. I have a few suggestions for their implementation that could make them cleaner.
1. I suggest adding an option to enable/disable auto-capitalization correction. The current correction is not very sophisticated, and I know that I personally would prefer to select the method. That way, users could choose what happens to their captions. I think it would be relatively simple to add a checkbox that appears when CCs are selected as the subtitle source; there's already a variable in place to take care of this function.
2. If handling of tags (<i>, <u>, etc.) is not possible through MP4 or MKV, I would suggest slightly altering the source code of ccextractor to remove automatic tagging. I went in and edited my copy of the source code (SVN 2424) and found that it builds correctly and stops adding the HTML-style tags to the captions. If MP4/MKV does support formatted text in captions, a subroutine could be written to translate the tags into a compatible form.

I'm sorry I can't be of more help in writing the code for this; my coding skills are limited to high-school-level procedural C++, which was just enough to remove tagging and auto-capitalization. Thanks a lot and keep up the good work.

TedJ
Veteran User
Posts: 5388
Joined: Wed Feb 20, 2008 11:25 pm

Re: Suggestions For CC Handling

Post by TedJ »

Moving to Ponies.

eddyg
Veteran User
Posts: 798
Joined: Mon Apr 23, 2007 3:34 am

Re: Suggestions For CC Handling

Post by eddyg »

kevvo wrote: Hi all!

Good job on getting closed captions into HandBrake; I'm really impressed with the progress y'all have made. I have a few suggestions for their implementation that could make them cleaner.
1. I suggest adding an option to enable/disable auto-capitalization correction. The current correction is not very sophisticated, and I know that I personally would prefer to select the method. That way, users could choose what happens to their captions. I think it would be relatively simple to add a checkbox that appears when CCs are selected as the subtitle source; there's already a variable in place to take care of this function.
Would you prefer a better default? Eventually we can make this configurable - but probably not worth the UI complexity at present. But I can at least make it visible to the UIs so that they can make it a preference option if they so wish.

Could you give me a code snippet of exactly which knob in deccc608.c you are twiddling here?
kevvo wrote: 2. If handling of tags (<i>, <u>, etc.) is not possible through MP4 or MKV, I would suggest slightly altering the source code of ccextractor to remove automatic tagging. I went in and edited my copy of the source code (SVN 2424) and found that it builds correctly and stops adding the HTML-style tags to the captions. If MP4/MKV does support formatted text in captions, a subroutine could be written to translate the tags into a compatible form.
I started off not using the markup - using transcript mode - but given that both MKV and MP4 support markup I left it in, so that in time the guys doing the muxing can interpret those tags if they want. Bear in mind that this is a work in progress.

Cheers, Ed.

JohnAStebbins
HandBrake Team
Posts: 5575
Joined: Sat Feb 09, 2008 7:21 pm

Re: Suggestions For CC Handling

Post by JohnAStebbins »

If I understood eddyg correctly, the markup being used is standard srt. And since mkv supports srt directly, the markup should be doing the right thing as it is when used in the mkv container.

kevvo
Posts: 10
Joined: Tue May 19, 2009 9:49 pm

Re: Suggestions For CC Handling

Post by kevvo »

Thanks for the quick reply!
eddyg wrote: Would you prefer a better default? Eventually we can make this configurable - but probably not worth the UI complexity at present. But I can at least make it visible to the UIs so that they can make it a preference option if they so wish.

Could you give me a code snippet of exactly which knob in deccc608.c you are twiddling here?
I was changing the value of static int sentence_cap = 1 to 0 at the beginning of the file, in the initial variable declarations. I figured this fix would be helpful because, at present, the auto-caps feature decimates all capitalization that doesn't occur at the beginning of a sentence, such as proper nouns and acronyms; e.g., M.D. becomes m.D. and all names and titles are switched to lowercase. It would probably be helpful if this were turned into a variable that is accessible for configuration; later, someone could go in and add the ability to configure it in the CLI and GUI.
eddyg wrote: I started off not using the markup - using transcript mode - but given that both MKV and MP4 support markup I left it in, so that in time the guys doing the muxing can interpret those tags if they want. Bear in mind that this is a work in progress.
I will have to look at how the subtitles come out in the MKV container. I had only tested MP4, as I was unaware MKV had support for captions. I must admit, I am unfamiliar with the patching system, but I suppose a patch could be written to remove markup for those who prefer it that way, at least until the feature is completed.
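
To illustrate the kind of damage described above, here is a minimal sketch of a naive sentence-capitalization pass. It is not the logic in deccc608.c; the function name convert and the regex are made up for illustration, and only the sentence_cap name is borrowed from the variable mentioned above.

```python
import re

def convert(text: str, sentence_cap: bool = True) -> str:
    """Hypothetical caption filter: optionally apply naive sentence capitalization."""
    if not sentence_cap:
        # Knob switched off: pass the caption text through untouched.
        return text
    lowered = text.lower()
    # Upper-case the first letter of the text, and the first letter that
    # follows '.', '!' or '?' plus whitespace. Everything else stays lowercase.
    return re.sub(
        r"(^|[.!?]\s+)([a-z])",
        lambda m: m.group(1) + m.group(2).upper(),
        lowered,
    )

print(convert("THE FBI SENT AGENT SMITH TO PARIS. HE LEFT TODAY."))
# prints: The fbi sent agent smith to paris. He left today.
```

With sentence_cap=False the input comes back unchanged, which is the behaviour being asked for; with it on, the acronym and both proper nouns are lost.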

eddyg
Veteran User
Posts: 798
Joined: Mon Apr 23, 2007 3:34 am

Re: Suggestions For CC Handling

Post by eddyg »

mp4 does do markup - we haven't implemented it yet.

kneeslasher
Regular User
Posts: 85
Joined: Tue Mar 06, 2007 8:40 pm

Re: Suggestions For CC Handling

Post by kneeslasher »

For end users, one of the main attractions of using CCs as the soft-sub source is that it is a text track as-God-intended (or rather, as the DVD creators / film studio intended). I.e., the user doesn't have to make any corrections, he/she can rely on what comes straight off the DVD, which is always mostly free of errors. No OCR needed. No human interaction step.

If we then start messing with the CC track, changing case (I know that many CC tracks are all caps), trying to implement bold/italic, then unless the implementation is perfect, this may well introduce either errors (incorrect de-capitalisation) or artefacts (bits of formatting being left visible: <i>, etc.). The latter I have seen first hand when using CC-extraction tools off DVDs. Either way, we lose the peace of mind that the original sub-track hasn't been messed with.

My tuppence would therefore be to have default output just like kevvo suggests: no capitalisation changes and with stripping of all formatting tags. At least that way we end up with a clean text track. Any enhancement (corrections, formatting, etc.) to this should be implemented only when there is also a tick box for turning said enhancement off, hence always preserving the option of a clean text track.
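
For what "stripping of all formatting tags" could look like, here is a rough sketch. The set of tags (i, u, b, font) is an assumption based on the SRT-style markup mentioned in this thread, and strip_markup is a hypothetical helper, not anything in HandBrake or ccextractor.

```python
import re

# Assumed tag set: SRT-style emphasis tags, with optional attributes (e.g. font color).
TAG_RE = re.compile(r"</?(?:i|u|b|font)(?:\s[^>]*)?>", re.IGNORECASE)

def strip_markup(caption: str) -> str:
    """Remove the assumed formatting tags and leave all other text untouched."""
    return TAG_RE.sub("", caption)

print(strip_markup("<i>Hello</i> there, <u>John</u>."))
# prints: Hello there, John.
```

Only the listed tags are matched, so a literal "<" or ">" in dialogue is left alone rather than being swallowed by an over-eager pattern.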

kneeslasher
Regular User
Posts: 85
Joined: Tue Mar 06, 2007 8:40 pm

Re: Suggestions For CC Handling

Post by kneeslasher »

Er, if this has erroneously ended up in the Development forum, my apologies. I saw it in Ponies so thought it OK to post. I know we shouldn't be spamming up the Development forum with chat.

eddyg
Veteran User
Posts: 798
Joined: Mon Apr 23, 2007 3:34 am

Re: Suggestions For CC Handling

Post by eddyg »

It's in Ponies; all discussion and input are welcome here.

It's a new feature - so we are interested in feedback. I want to be careful not to have too many knobs, which confuse people, and instead have sensible defaults.

At the moment I'm tending towards turning off the auto-capitalisation, but leaving the emphasis for now with the intention of it being implemented properly later. Has anyone tried CCs in an mkv and seen what happens with the emphasis?

Cheers Ed

JohnAStebbins
HandBrake Team
Posts: 5575
Joined: Sat Feb 09, 2008 7:21 pm

Re: Suggestions For CC Handling

Post by JohnAStebbins »

I just found a segment of CC with emphasis and tested in mkv. mplayer ignores the markup. vlc does the right thing.

eddyg
Veteran User
Posts: 798
Joined: Mon Apr 23, 2007 3:34 am

Re: Suggestions For CC Handling

Post by eddyg »

Thanks John. We'll leave the markup in, then, with a view to adding it for mp4.

Cheers Ed

kneeslasher
Regular User
Posts: 85
Joined: Tue Mar 06, 2007 8:40 pm

Re: Suggestions For CC Handling

Post by kneeslasher »

This may be a redundant question since you've doubtless covered it in internal discussion. But in case not...

As far as I recall, all CCs are in capitals. So let us say we have CCs for the following speech:

"Hello John. I need to travel to Europe."

Straight off a DVD, this would be:

"HELLO JOHN. I NEED TO TRAVEL TO EUROPE."

Effectively, spelling and grammar correction is needed to convert the latter into the former. And a vocabulary which includes "John" and "Europe", i.e., proper nouns. What about:

"GDAY MCCOY AND EL-FADL. I HEAR YOU'VE RECEIVED SOME POST FROM SHAHKOT."

where the proper nouns are more obscure, can have multiple forms of capitalisation and are guaranteed not to be in any dictionary? How can this ever be auto-capitalised correctly?

Surely Occam's Razor would lead one to think that the best thing would be to leave well alone and stick with caps.

Or have I got the wrong end of the stick and there are markup tags for capitalisation in the CC stream, which renders my fears groundless?

JohnAStebbins
HandBrake Team
Posts: 5575
Joined: Sat Feb 09, 2008 7:21 pm

Re: Suggestions For CC Handling

Post by JohnAStebbins »

CC does not need to be all capitals. The available character set can be seen here http://en.wikipedia.org/wiki/EIA-608
However, often the closed captions are all caps. So ideally, we should detect if the captions are using lowercase, and leave them as is if they are. When we detect that they are all uppercase, the options are to leave them all upper case (CAN YOU SAY UGLY), or make some errors in capitalization for the sake of aesthetics. Really, the capitalization is already incorrect (IT'S ALL UPPER CASE FOR GOODNESS SAKES), so we are reducing the amount of overall error by only screwing up the occasional proper noun instead of leaving 80% of the characters capitalized incorrectly.
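
The "detect, then decide" idea above could be sketched roughly like this. Everything here is hypothetical (is_all_caps, normalise and the fix_case callback are illustrative names), and a real decoder would see captions arrive incrementally rather than as a complete list.

```python
def is_all_caps(lines):
    """True if the captions contain letters and none of them are lowercase."""
    letters = [ch for line in lines for ch in line if ch.isalpha()]
    return bool(letters) and not any(ch.islower() for ch in letters)

def normalise(lines, fix_case):
    """Apply fix_case only when the source is all caps; otherwise leave it alone."""
    if is_all_caps(lines):
        return [fix_case(line) for line in lines]
    return list(lines)

print(normalise(["Hello John."], str.capitalize))   # mixed case: untouched
print(normalise(["HELLO JOHN."], str.capitalize))   # all caps: case-corrected
```

The first call returns the line as-is because lowercase letters were found; the second runs the (deliberately crude) str.capitalize correction and so lowercases "John", which is the trade-off being discussed.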

kneeslasher
Regular User
Posts: 85
Joined: Tue Mar 06, 2007 8:40 pm

Re: Suggestions For CC Handling

Post by kneeslasher »

Sure, all caps are wrong too. From my own DVDs, I hadn't realised that CCs could be lower case too. Obviously I'm seeing all caps because the DVD people couldn't be bothered to use lower case in that case, rather than a limitation of the character set. And I take your point about the subjective nature of preferring all caps to un-capitalised proper nouns.

Will we get an option though? A check box to have no case correction on CCs?

eddyg
Veteran User
Posts: 798
Joined: Mon Apr 23, 2007 3:34 am

Re: Suggestions For CC Handling

Post by eddyg »

No knobs for CC in the next release - but I'm not against them. I just don't want to sidetrack the UI folks at the moment.

Although - you never know maybe we will :)

Cheers Ed

kneeslasher
Regular User
Posts: 85
Joined: Tue Mar 06, 2007 8:40 pm

Re: Suggestions For CC Handling

Post by kneeslasher »

Sure, we're all just grateful for the CC option to be there at all :).

What will be the no-knobs default though? My 2p would be to leave the CCs all capitalised but maybe that's just me (and kevvo).

rhester
Veteran User
Posts: 2888
Joined: Tue Apr 18, 2006 10:24 pm

Re: Suggestions For CC Handling

Post by rhester »

I'd prefer (in a no-knobs solution) that whatever capitalization was used on the source be maintained.

Rodney

kneeslasher
Regular User
Posts: 85
Joined: Tue Mar 06, 2007 8:40 pm

Re: Suggestions For CC Handling

Post by kneeslasher »

Chaps, I've just tested out the experimental binary on an R1 DVD of The Good Life (known as Good Neighbors in the US). What can I say? Flawless encode, giving selectable subs on the iPhone. It all works and this HB user couldn't be more pleased. For what it's worth, you've made this nitpicking cantankerous user very happy.

However, I notice that capitalisation correction is on by default. Whilst one's preference had already been stated in this thread, having seen and experienced it in the flesh, I must again add my voice for vociferously advocating that untouched CC-subs be the default (in the absence of a choice).

Apart from that, HB works stably and without problems. I didn't even have any GUI issues. I tried out the new preview option as well. Very useful. It even gives a preview of subs! Great work!

eddyg
Veteran User
Posts: 798
Joined: Mon Apr 23, 2007 3:34 am

Re: Suggestions For CC Handling

Post by eddyg »

We will add the capitalisation options - all in time...

eddyg
Veteran User
Posts: 798
Joined: Mon Apr 23, 2007 3:34 am

Re: Suggestions For CC Handling

Post by eddyg »

All done - markup and capitals in SVN now.

kneeslasher
Regular User
Posts: 85
Joined: Tue Mar 06, 2007 8:40 pm

Re: Suggestions For CC Handling

Post by kneeslasher »

Great stuff. In my impatience, I thought I'd try to build the most recent svn.

Right at the very end of the compile (Intel Mac), I get:


=== BUILDING NATIVE TARGET HandBrake WITH CONFIGURATION standard ===


With a ton of errors of the sort:


[? various files] depends on itself. This target might include its own product.


finishing with:


: ** BUILD FAILED **
: make: *** [macosx.build] Error 1
-------------------------------------------------------------------------------
time end: Tue Jun 30 09:36:33 2009
duration: 11 minutes, 3 seconds (663.48s)
result: FAILURE (code 2)
-------------------------------------------------------------------------------
Build is finished!
You may now cd into ./build and examine the output.


Of course I can wait until the next binary snapshot (or 0.94). But any pointers in the meanwhile would be appreciated; I'm sure I've done something trivially stupid since this is the first time I've tried to build HB.

TedJ
Veteran User
Posts: 5388
Joined: Wed Feb 20, 2008 11:25 pm

Re: Suggestions For CC Handling

Post by TedJ »

The only two dependencies are an up-to-date version of Xcode 3 (3.1.2+) and yasm 0.8.0+, as mentioned in the Compile Guide.

It's possible that the latest SVN version is broken - I'll be updating my install shortly and will confirm.

:EDIT: Just updated to svn2647 without issue - my guess would be a missing or out of date yasm install.

kneeslasher
Regular User
Posts: 85
Joined: Tue Mar 06, 2007 8:40 pm

Re: Suggestions For CC Handling

Post by kneeslasher »

I installed yasm as described in the link, so my yasm is definitely as it should be. It's been a few months since I installed the most recent version of XCode though, surely that can't be the problem?

Rodeo
HandBrake Team
Posts: 12509
Joined: Tue Mar 03, 2009 8:55 pm

Re: Suggestions For CC Handling

Post by Rodeo »

kneeslasher wrote: I installed yasm as described in the link, so my yasm is definitely as it should be. It's been a few months since I installed the most recent version of XCode though, surely that can't be the problem?
You will need Xcode 3.1 or later.

kneeslasher
Regular User
Posts: 85
Joined: Tue Mar 06, 2007 8:40 pm

Re: Suggestions For CC Handling

Post by kneeslasher »

That (XCode version) was it. Compiled OK after that.

Feedback on the latest svn:

- Capitalisation and subs-from-CCs works flawlessly. Feature-perfect in this user's opinion.

- Maybe too specific to be of use. But encoding an episode (30 mins) of The Good Life (Good Neighbors in the US) left a detectable audio/video sync problem. I then tried encoding a single chapter (5 mins), and the lack of sync was even more pronounced.
