Two unicode characters not rendered correctly in subtitle

TeeJT
Posts: 3
Joined: Sun Sep 23, 2018 11:57 pm

Post by TeeJT »

Description of problem or question:
Two Unicode characters are not rendered correctly in burned-in subtitles: ZWJ (zero width joiner, U+200D) and ZWNJ (zero width non-joiner, U+200C).


Steps to reproduce the problem (If Applicable):
You can also copy this text which has zwj character- परमेश्‍वर in srt subtitle file
I chose to burn in the subtitle
You will find that it shows it so differently – परमेश्‌वर
The zwj will cause the characters to be joined correctly as in परमेश्‍वर
For this sample I used Arial Unicode MS font which also supports Nepali text.
I believe this could be some library that is not interpreting those characters correctly.
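
To check which invisible character a subtitle line actually contains, its code points can be listed. This is only a minimal sketch and not part of HandBrake. It assumes the sample word is spelled with the code points below (written as escapes so the joiners stay visible), and the helper names `describe` and `joiners` are made up for illustration.

```python
import unicodedata

# Assumed spelling of the sample word; only the joiner differs.
# U+200D is ZERO WIDTH JOINER, U+200C is ZERO WIDTH NON-JOINER.
WITH_ZWJ = "\u092a\u0930\u092e\u0947\u0936\u094d\u200d\u0935\u0930"
WITH_ZWNJ = "\u092a\u0930\u092e\u0947\u0936\u094d\u200c\u0935\u0930"


def describe(text):
    """Return (code point, Unicode name) pairs for every character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def joiners(text):
    """Return the Unicode names of any ZWJ/ZWNJ characters found in text."""
    return [unicodedata.name(ch) for ch in text if ch in "\u200c\u200d"]


if __name__ == "__main__":
    for label, word in (("with ZWJ", WITH_ZWJ), ("with ZWNJ", WITH_ZWNJ)):
        print(label)
        for code_point, name in describe(word):
            print("  ", code_point, name)
```

Running the same check on a line copied out of the .srt file shows whether the joiner survived saving and copying, before the renderer is blamed.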

HandBrake version (e.g., 1.0.0):
Version is 1.1.2 (2018090500) - 64 bit - Windows

Operating system and version (e.g., Ubuntu 16.04 LTS, macOS 10.13 High Sierra, Windows 10 Creators Update):
Windows
Windows 10 Home Single Language
Version 10.0.17134 Build 17134

HandBrake Activity Log ***required*** (see How-to get an activity log)

Code: Select all

HandBrake 1.1.2 (2018090500) - 64bit
OS: Microsoft Windows NT 10.0.17134.0 - 64bit
CPU: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
Ram: 12173 MB, 
GPU Information:
  Radeon(TM) 520 - 21.19.412.1280
  Intel(R) HD Graphics 620 - 22.20.16.4691
Screen: 1366x768
Temp Dir: C:\Users\HP\AppData\Local\Temp\
Install Dir: C:\Program Files\HandBrake
Data Dir: C:\Users\HP\AppData\Roaming\HandBrake

-------------------------------------------


# Starting Encode ...

[07:38:39] hb_init: starting libhb thread
[07:38:39] 1 job(s) to process
[07:38:39] json job:
{
  "Audio": {
    "AudioList": [
      {
        "Bitrate": 160,
        "DRC": 0.0,
        "Encoder": 65536,
        "Gain": 0.0,
        "Mixdown": 4,
        "NormalizeMixLevel": false,
        "Samplerate": 48000,
        "Track": 0,
        "DitherMethod": 0
      }
    ],
    "CopyMask": [
      1073807360,
      1073743872,
      1074003968,
      1073750016,
      1090519040,
      1074790400,
      1074266112,
      1107296256
    ],
    "FallbackEncoder": 2048
  },
  "Destination": {
    "ChapterList": [
      {
        "Name": "Chapter 1"
      }
    ],
    "ChapterMarkers": true,
    "AlignAVStart": true,
    "File": "d:\\MyDocs\\Church\\Nepali\\Seremban\\Sermons\\Mohan Sijali\\Blessed is the one whose sins are forgiven\\Lk7.36-50.mp4",
    "Mp4Options": {
      "IpodAtom": false,
      "Mp4Optimize": false
    },
    "Mux": 131072
  },
  "Filters": {
    "FilterList": [
      {
        "ID": 4,
        "Settings": {
          "mode": "7"
        }
      },
      {
        "ID": 3,
        "Settings": {
          "block-height": "16",
          "block-thresh": "40",
          "block-width": "16",
          "filter-mode": "2",
          "mode": "3",
          "motion-thresh": "1",
          "spatial-metric": "2",
          "spatial-thresh": "1"
        }
      },
      {
        "ID": 11,
        "Settings": {
          "crop-bottom": "38",
          "crop-left": "0",
          "crop-right": "0",
          "crop-top": "38",
          "height": "404",
          "width": "720"
        }
      },
      {
        "ID": 6,
        "Settings": {
          "mode": "2",
          "rate": "27000000/900000"
        }
      }
    ]
  },
  "PAR": {
    "Num": 1,
    "Den": 1
  },
  "Metadata": {},
  "SequenceID": 0,
  "Source": {
    "Angle": 1,
    "Range": {
      "Type": "chapter",
      "Start": 1,
      "End": 1
    },
    "Title": 1,
    "Path": "d:\\MyDocs\\Church\\Nepali\\Seremban\\Sermons\\Mohan Sijali\\Blessed is the one whose sins are forgiven\\Jesus Anointed by a Sinful Woman - Lk7.36-50.mp4"
  },
  "Subtitle": {
    "Search": {
      "Burn": false,
      "Default": false,
      "Enable": true,
      "Forced": true
    },
    "SubtitleList": [
      {
        "Burn": true,
        "Default": false,
        "Forced": false,
        "ID": 0,
        "Offset": 0,
        "Track": -1,
        "SRT": {
          "Codeset": "UTF-8",
          "Filename": "d:\\MyDocs\\Church\\Nepali\\Seremban\\Sermons\\Mohan Sijali\\Blessed is the one whose sins are forgiven\\Jesus Anointed by a Sinful Woman - Lk7.36-50 - Copy.srt",
          "Language": "eng"
        }
      }
    ]
  },
  "Video": {
    "Encoder": 65536,
    "Level": "4.0",
    "TwoPass": false,
    "Turbo": false,
    "ColorMatrixCode": 0,
    "Options": "",
    "Preset": "fast",
    "Profile": "main",
    "Quality": 22.0,
    "HWDecode": false,
    "QSV": {
      "Decode": false,
      "AsyncDepth": 0
    }
  }
}
[07:38:39] CPU: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
[07:38:39]  - Intel microarchitecture Kaby Lake
[07:38:39]  - logical processor count: 4
[07:38:39] Intel Quick Sync Video support: yes
[07:38:39]  - Intel Media SDK hardware: API 1.23 (minimum: 1.3)
[07:38:39]  - H.264 encoder: yes
[07:38:39]     - preferred implementation: hardware (any) via D3D11
[07:38:39]     - capabilities (hardware):  breftype icq+la+i+downs vsinfo opt1 opt2+mbbrc+extbrc+trellis+ib_adapt+nmpslice
[07:38:39]  - H.265 encoder: yes (8bit: yes, 10bit: yes)
[07:38:39]     - preferred implementation: hardware (any) via D3D11
[07:38:39]     - capabilities (hardware):  bpyramid icq vsinfo opt1
[07:38:39] hb_scan: path=d:\MyDocs\Church\Nepali\Seremban\Sermons\Mohan Sijali\Blessed is the one whose sins are forgiven\Jesus Anointed by a Sinful Woman - Lk7.36-50.mp4, title_index=1
udfread ERROR: ECMA 167 Volume Recognition failed
src/libbluray/disc/disc.c:323: failed opening UDF image d:\MyDocs\Church\Nepali\Seremban\Sermons\Mohan Sijali\Blessed is the one whose sins are forgiven\Jesus Anointed by a Sinful Woman - Lk7.36-50.mp4
src/libbluray/disc/disc.c:424: error opening file BDMV\index.bdmv
src/libbluray/disc/disc.c:424: error opening file BDMV\BACKUP\index.bdmv
[07:38:39] bd: not a bd - trying as a stream/file instead
libdvdnav: Using dvdnav version 6.0.0
libdvdread: Encrypted DVD support unavailable.
libdvdread:DVDOpenFileUDF:UDFFindFile /VIDEO_TS/VIDEO_TS.IFO failed
libdvdread:DVDOpenFileUDF:UDFFindFile /VIDEO_TS/VIDEO_TS.BUP failed
libdvdread: Can't open file VIDEO_TS.IFO.
libdvdnav: vm: failed to read VIDEO_TS.IFO
[07:38:39] dvd: not a dvd - trying as a stream/file instead
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'd:\MyDocs\Church\Nepali\Seremban\Sermons\Mohan Sijali\Blessed is the one whose sins are forgiven\Jesus Anointed by a Sinful Woman - Lk7.36-50.mp4':
  Metadata:
    major_brand     : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp42
    encoder         : Lavf55.33.100
  Duration: 00:02:19.23, start: 0.000000, bitrate: 1693 kb/s
    Stream #0:0(und): Video: h264 (High) [avc1 / 0x31637661]
      yuv420p, 720x480 [PAR 1:1 DAR 3:2], 1497 kb/s
      30 fps, 15360 tbn (default)
    Metadata:
      handler_name    : VideoHandler
    Stream #0:1(und): Audio: aac (LC) [mp4a / 0x6134706D]
      44100 Hz, stereo, fltp, 189 kb/s (default)
    Metadata:
      handler_name    : SoundHandler
[07:38:39] scan: decoding previews for title 1
[07:38:39] scan: audio 0x1: aac, rate=44100Hz, bitrate=189350 Unknown (AAC) (2.0 ch)
[07:38:39] scan: 10 previews, 720x480, 30.000 fps, autocrop = 38/38/0/0, aspect 1.50:1, PAR 1:1
[07:38:39] scan: supported video decoders: avcodec qsv
[07:38:39] libhb: scan thread found 1 valid title(s)
[07:38:39] Skipping subtitle scan.  No suitable subtitle tracks.
[07:38:39] starting job
[07:38:39] decomb filter thread started for segment 0
[07:38:39] decomb filter thread started for segment 1
[07:38:39] decomb filter thread started for segment 2
[07:38:39] decomb filter thread started for segment 3
[07:38:39] decomb check thread started for segment 0
[07:38:39] decomb check thread started for segment 1
[07:38:39] [ass] Shaper: FriBidi 0.19.7 (SIMPLE) HarfBuzz-ng 1.7.2 (COMPLEX)
[07:38:39] decomb check thread started for segment 2
[07:38:39] decomb check thread started for segment 3
[07:38:39] mask filter thread started for segment 0
[07:38:39] mask filter thread started for segment 1
[07:38:39] mask filter thread started for segment 2
[07:38:39] mask filter thread started for segment 3
[07:38:39] mask erode thread started for segment 0
[07:38:39] mask erode thread started for segment 1
[07:38:39] mask erode thread started for segment 2
[07:38:39] mask erode thread started for segment 3
[07:38:39] mask dilate thread started for segment 0
[07:38:39] mask dilate thread started for segment 1
[07:38:39] mask dilate thread started for segment 2
[07:38:39] mask dilate thread started for segment 3
[07:38:39] yadif thread started for segment 0
[07:38:39] yadif thread started for segment 1
[07:38:39] yadif thread started for segment 2
[07:38:39] yadif thread started for segment 3
[07:38:39] [ass] Using font provider directwrite
[07:38:39] work: track 1, dithering not supported by codec
[07:38:39] work: only 1 chapter, disabling chapter markers
[07:38:39] job configuration:
[07:38:39]  * source
[07:38:39]    + d:\MyDocs\Church\Nepali\Seremban\Sermons\Mohan Sijali\Blessed is the one whose sins are forgiven\Jesus Anointed by a Sinful Woman - Lk7.36-50.mp4
[07:38:39]    + title 1, chapter(s) 1 to 1
[07:38:39]    + container: mov,mp4,m4a,3gp,3g2,mj2
[07:38:39]    + data rate: 1693 kbps
[07:38:39]  * destination
[07:38:39]    + d:\MyDocs\Church\Nepali\Seremban\Sermons\Mohan Sijali\Blessed is the one whose sins are forgiven\Lk7.36-50.mp4
[07:38:39]    + container: MPEG-4 (libavformat)
[07:38:39]      + align initial A/V stream timestamps
[07:38:39]  * video track
[07:38:39]    + decoder: h264
[07:38:39]      + bitrate 1497 kbps
[07:38:39]    + filters
[07:38:39]      + Comb Detect (mode=3:spatial-metric=2:motion-thresh=1:spatial-thresh=1:filter-mode=2:block-thresh=40:block-width=16:block-height=16)
[07:38:39]      + Decomb (mode=39)
[07:38:39]      + Framerate Shaper (mode=2:rate=27000000/900000)
[07:38:39]        + frame rate: 30.000 fps -> peak rate limited to 30.000 fps
[07:38:39]      + Subtitle renderer ()
[07:38:39]      + Crop and Scale (width=720:height=404:crop-top=38:crop-bottom=38:crop-left=0:crop-right=0)
[07:38:39]        + source: 720 * 480, crop (38/38/0/0): 720 * 404, scale: 720 * 404
[07:38:39]    + Output geometry
[07:38:39]      + storage dimensions: 720 x 404
[07:38:39]      + pixel aspect ratio: 1 : 1
[07:38:39]      + display dimensions: 720 x 404
[07:38:39]    + encoder: H.264 (libx264)
[07:38:39]      + preset:  fast
[07:38:39]      + profile: main
[07:38:39]      + level:   4.0
[07:38:39]      + quality: 22.00 (RF)
[07:38:39]  * subtitle track 1, English [SRT] (track 0, id 0xff, Text) -> Render/Burn-in, offset: 0, charset: UTF-8
[07:38:39]  * audio track 1
[07:38:39]    + decoder: Unknown (AAC) (2.0 ch) (track 1, id 0x1)
[07:38:39]      + bitrate: 189 kbps, samplerate: 44100 Hz
[07:38:39]    + mixdown: Stereo
[07:38:39]    + encoder: AAC (libavcodec)
[07:38:39]      + bitrate: 160 kbps, samplerate: 48000 Hz
[07:38:39] sync: expecting 4177 video frames
[07:38:39] encx264: min-keyint: 30, keyint: 300
[07:38:39] encx264: encoding at constant RF 22.000000
[07:38:39] encx264: unparsed options: level=4.0:ref=2:8x8dct=0:weightp=1:subme=6:vbv-bufsize=25000:vbv-maxrate=20000:rc-lookahead=30
x264 [info]: using SAR=1/1
x264 [info]: using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.2 AVX FMA3 BMI2 AVX2
x264 [info]: profile Main, level 4.0
[07:38:40] sync: first pts video is 0
[07:38:40] sync: first pts audio 0x1 is 0
[07:38:40] sync: first pts subtitle 0xff is 0
[07:38:40] sync: "Chapter 1" (1) at frame 3 time 6000
[07:38:40] [ass] fontselect: (sans-serif, 400, 0) -> ArialMT, 0, ArialMT
[07:38:40] [ass] Glyph 0x969 not found, selecting one more font for (sans-serif, 400, 0)
[07:38:40] [ass] fontselect: (sans-serif, 400, 0) -> NirmalaUI, 0, NirmalaUI
[07:39:46] reader: done. 1 scr changes
[07:39:47] work: average encoding speed for job is 61.521420 fps
[07:39:47] comb detect: heavy 0 | light 0 | uncombed 4177 | total 4177
[07:39:47] decomb: deinterlaced 0 | blended 0 | unfiltered 4177 | total 4177
[07:39:47] vfr: 4177 frames output, 0 dropped and 0 duped for CFR/PFR
[07:39:47] vfr: lost time: 0 (0 frames)
[07:39:47] vfr: gained time: 0 (0 frames) (0 not accounted for)
[07:39:48] aac-decoder done: 5992 frames, 0 decoder errors
[07:39:48] h264-decoder done: 4175 frames, 0 decoder errors
[07:39:48] sync: got 4177 frames, 4177 expected
[07:39:48] sync: framerate min 30.000 fps, max 30.000 fps, avg 30.000 fps
x264 [info]: frame I:15    Avg QP:17.44  size: 30129
x264 [info]: frame P:1254  Avg QP:20.54  size:  4594
x264 [info]: frame B:2908  Avg QP:23.19  size:   668
x264 [info]: consecutive B-frames:  2.6% 10.7%  9.0% 77.7%
x264 [info]: mb I  I16..4: 21.4%  0.0% 78.6%
x264 [info]: mb P  I16..4:  3.7%  0.0%  3.2%  P16..4: 37.4% 15.7%  8.8%  0.0%  0.0%    skip:31.3%
x264 [info]: mb B  I16..4:  1.6%  0.0%  0.1%  B16..8: 15.5%  2.7%  0.1%  direct: 5.1%  skip:74.8%  L0:42.4% L1:50.3% BI: 7.2%
x264 [info]: coded y,uvDC,uvAC intra: 31.9% 50.1% 12.9% inter: 6.4% 13.5% 0.0%
x264 [info]: i16 v,h,dc,p: 53% 21% 18%  9%
x264 [info]: i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 34% 16% 16%  5%  6%  7%  5%  6%  4%
x264 [info]: i8c dc,h,v,p: 60% 15% 22%  3%
x264 [info]: Weighted P-Frames: Y:0.2% UV:0.0%
x264 [info]: ref P L0: 71.9% 28.1%
x264 [info]: ref B L0: 84.2% 15.8%
x264 [info]: ref B L1: 96.3%  3.7%
x264 [info]: kb/s:468.51
[07:39:48] mux: track 0, 4177 frames, 8153487 bytes, 468.37 kbps, fifo 4096
[07:39:48] mux: track 1, 6522 frames, 2753545 bytes, 158.17 kbps, fifo 8192
[07:39:48] libhb: work result = 0

# Encode Completed ...

BradleyS
Moderator
Posts: 1860
Joined: Thu Aug 09, 2007 12:16 pm

Re: Two unicode characters not rendered correctly in subtitle

Post by BradleyS »

Code: Select all

[07:38:40] [ass] fontselect: (sans-serif, 400, 0) -> ArialMT, 0, ArialMT
[07:38:40] [ass] Glyph 0x969 not found, selecting one more font for (sans-serif, 400, 0)
[07:38:40] [ass] fontselect: (sans-serif, 400, 0) -> NirmalaUI, 0, NirmalaUI
The subtitle library can't seem to find one of the characters in ArialMT, so it used a different font as a fallback.

Anyway, there is a good chance we can update the subtitle libraries again; I will take a look. Thanks for the heads up.
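
The "Glyph 0x969 not found" line can be decoded to see which character triggered the font fallback. The sketch below assumes the hexadecimal number libass prints is a Unicode code point; `missing_glyphs` is a hypothetical helper, not a libass or HandBrake function.

```python
import re
import unicodedata

# The three [ass] lines quoted from the activity log above.
LOG = """\
[07:38:40] [ass] fontselect: (sans-serif, 400, 0) -> ArialMT, 0, ArialMT
[07:38:40] [ass] Glyph 0x969 not found, selecting one more font for (sans-serif, 400, 0)
[07:38:40] [ass] fontselect: (sans-serif, 400, 0) -> NirmalaUI, 0, NirmalaUI
"""

MISSING = re.compile(r"\[ass\] Glyph 0x([0-9A-Fa-f]+) not found")


def missing_glyphs(log_text):
    """Return (code point, Unicode name) for each glyph reported as missing."""
    found = []
    for match in MISSING.finditer(log_text):
        code_point = int(match.group(1), 16)
        found.append((code_point, unicodedata.name(chr(code_point), "<unnamed>")))
    return found


if __name__ == "__main__":
    for code_point, name in missing_glyphs(LOG):
        print(f"U+{code_point:04X} {name}")
```

Under that assumption, the glyph missing from ArialMT is a Devanagari digit rather than ZWJ or ZWNJ itself, which would explain why the text ended up in NirmalaUI.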

Post by BradleyS »

There doesn't appear to be a newer release of libass. Perhaps raise the issue on their GitHub: https://github.com/libass/libass

Post by TeeJT »

Sorry for the trouble. The problem was the font, not the library.
The default font does not support the ZWJ and ZWNJ Unicode characters; I read in the activity log that the program could not find the glyph for a character.
The SRT subtitle format does not let me specify the font, so the default font is used. The ASS subtitle format (Advanced SubStation Alpha) does let me specify the font and size, so I am using that format, although it took a small workaround to get the end result.
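
As an illustration of how an ASS file carries the font, here is a minimal sketch that rewrites the font name and size of one style. It assumes the usual ASS field order, in which a "Style:" line starts with Name, Fontname, Fontsize. The function name `set_style_font` and the shortened sample are invented for this example; real style lines have many more fields.

```python
def set_style_font(ass_text, font, size, style_name="Default"):
    """Return ass_text with the font name and size of one style replaced.

    Assumes each "Style:" line begins with the fields Name, Fontname,
    Fontsize, followed by further comma-separated fields.
    """
    out = []
    for line in ass_text.splitlines(keepends=True):
        if line.startswith("Style:"):
            head, _, rest = line.partition(":")
            fields = rest.split(",")
            # Require more than three fields so the line ending, which
            # lives in the last field, is never overwritten.
            if len(fields) > 3 and fields[0].strip() == style_name:
                fields[1] = font
                fields[2] = str(size)
                line = head + ":" + ",".join(fields)
        out.append(line)
    return "".join(out)


# Shortened, hypothetical sample of a styles section.
SAMPLE = (
    "[V4+ Styles]\n"
    "Format: Name, Fontname, Fontsize, PrimaryColour\n"
    "Style: Default,Arial,20,&H00FFFFFF\n"
)

if __name__ == "__main__":
    print(set_style_font(SAMPLE, "Kokila", 26), end="")
```

Splitting on fields rather than matching a fixed colour value means the style is found even when its primary colour is not &H00FFFFFF.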

What I am doing: given an mp4 video, I want to add Nepali lyrics to the song.
This works, but MKVToolNix (64-bit) and HandBrakeCLI.exe must be installed first.
1) First I use Subtitle Edit to create the subtitle and save it in ASS format (Advanced SubStation Alpha).
2) I merge the video and the ASS file into an intermediate file using mkvmerge.exe, which comes with MKVToolNix. The script names this file temp.mp4, although mkvmerge always writes a Matroska container.
3) I then use HandBrakeCLI.exe to burn in the ASS subtitle, with the font changed to Kokila and the size set to 26.
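
Steps 2 and 3 can be sketched as two command lines built as argument lists, which avoids manual quoting of file names with spaces. This is only an outline under assumptions: the install paths are the ones used in this post, the options are the ones the script in this post passes, and the intermediate file is named temp.mkv here (my choice, since mkvmerge writes Matroska; the script in this post uses temp.mp4). `build_commands` and `run` are hypothetical names.

```python
import subprocess

# Install locations as given in this post; adjust for your machine.
MKVMERGE = "C:/Program Files/MKVToolNix/mkvmerge.exe"
HANDBRAKE_CLI = "C:/Program Files/HandBrake/HandBrakeCLI.exe"


def build_commands(base, temp="temp.mkv"):
    """Return the mkvmerge and HandBrakeCLI argument lists for one video.

    base is the file name without extension; base.mp4 and base.ass are
    expected to sit in the current directory.
    """
    merge = [MKVMERGE, "-o", temp, f"{base}.mp4", f"{base}.ass"]
    burn = [
        HANDBRAKE_CLI, "-i", temp, "-o", f"{base}_final.mp4",
        "-f", "av_mp4", "--subtitle", "1", "--subtitle-burned",
    ]
    return merge, burn


def run(base):
    """Run both steps in order, stopping if either tool reports failure."""
    for command in build_commands(base):
        subprocess.run(command, check=True)
```

Calling `run("Kohi Chaina")` would mux the subtitle first and then burn in subtitle track 1 of the intermediate file.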

I wrote a Perl script to do it. The script changes the font from Arial, size 20, to Kokila, size 26, and then runs the command lines that create the mp4. It works like a charm. Thank you very much for such a wonderful program with a command-line interface.

AddSub.pl

Code: Select all

use utf8;
use File::Copy;
use Cwd;

$pathToMkMerge = "C:/Program Files/MKVToolNix/mkvmerge.exe";
$pathToHandbrakeCli = "C:/Program Files/HandBrake/HandbrakeCLI.exe";

my $myDir = Cwd::abs_path($0);
$myDir =~ s/[\/\\][^\/\\]+$//;
chdir($myDir);

&main;
exit;

sub main {	
	print "Please key in mp4 filename (e.g. Kohi Chaina):";
	my $fName;
	chomp($fName = <STDIN>);
	#$fName = "lk7.36-50";
	# Stop if the video file is missing.
	if (!-e "./$fName\.mp4") {
		print "No such mp4 file - $fName\.mp4!\n";
		getc(STDIN);
		exit;
	}
	if (!-e "./$fName\.ass") {
		print "No such subtitle file - $fName\.ass!\n"; 
		getc(STDIN); 
		exit;
	}
	
	print "$fName\n";

	&ModifyImportTxt($fName);
	my $cmdline;
	$cmdline = "\"$pathToMkMerge\" -o temp.mp4 \"$fName.mp4\" \"$fName\.ass\"";
	print "$cmdline\n";
	system($cmdline);
	#exit;
	
	$cmdline = "\"$pathToHandbrakeCli\" -i temp.mp4 -o \"$fName\_final.mp4\" -f av_mp4 --subtitle 1 --subtitle-burned";
	system($cmdline);
	
	print "Finished\n";
	getc(STDIN);
	exit;
}

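sub ModifyImportTxt{
	my($base)=@_;
	my $fname = "./$base\.ass";

	# Read the whole subtitle file. Lines are handled as raw bytes, which is
	# safe here because the substitution below only touches ASCII text.
	open(my $in, "<", $fname) or die "Cannot read $fname: $!";
	my @lines = <$in>;
	close($in);
	my $num_lines = @lines;
	print "num_lines= $num_lines - fname=$fname\n";

	# Switch the Default style from Arial (any size) to Kokila.
	# The post mentions size 26; this pattern sets 28, so adjust as needed.
	for my $line (@lines) {
		$line =~ s/[\n\r]//g;
		$line =~ s/Default,Arial,\d+,&H00FFFFFF/Default,Kokila,28,&H00FFFFFF/ig;
	}

	# Write the file back in place, newline-separated, no trailing newline.
	open(my $out, ">", $fname) or die "Cannot write $fname: $!";
	print $out join("\n", @lines);
	close($out);
}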


Woodstock
Veteran User
Posts: 4614
Joined: Tue Aug 27, 2013 6:39 am

Post by Woodstock »

The SRT import in HandBrake does have limited support for SSA font commands. John detailed a few of the supported commands in a message a couple of years ago, but I have no clue where it is now.

Post by BradleyS »

A very competent solution. Nice work.

Post by TeeJT »

Regarding the mkvmerge license, its README states: This code comes under the GPL v2 (see www.gnu.org or the file COPYING - https://gitlab.com/mbunkus/mkvtoolnix/b ... er/COPYING). Modify as needed.
https://gitlab.com/mbunkus/mkvtoolnix/b ... /README.md
With this license, the source could be incorporated to merge an external ASS subtitle file and burn the subtitle into the final mp4.