0 votes

We have a Burmese project that contains invisible Zero Width Non-Joiner (U+200C) characters, which I would like to mark as Invalid in the Characters inventory, but we cannot. We get a message:
[image: error message]
However, the character does not exist in the Alphabetic Characters tab or any other tab of the Language settings.
I also found that the same was true for Zero Width Joiner (U+200D).

When I examined the ldml.xml file, I found these curious lines:

<!--sil:punctuation-pattern pattern="‌" context="medial" /-->
<sil:punctuation-pattern pattern="‍" context="medial" />

Inside the quotes were the ZWNJ 200C and ZWJ 200D characters. I tried deleting these lines but that did not change Paratext’s behavior.

I also did an experiment in an English project and tried adding ZWNJ and ZWJ to the language settings. Even though they were added as non-standard diacritics, I was not able to see them in the character inventory.

So I am wondering: is this a feature or a bug? Is the behavior of ZWNJ and ZWJ built into the behavior of certain languages so that it cannot be changed? If not, how can I mark ZWNJ as Invalid in my Burmese project? (Note: The language code for Burmese is my.)

Paratext by (1.8k points)

3 Answers

0 votes
Best answer

To get that message, the item you’re wanting to mark as invalid has to be in either the characters list or the medial punctuation list. It might be worth copy/pasting those fields into another application that can show you invisible Unicode characters.

by [Expert]
(16.2k points)
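
The check suggested in the answer above (viewing pasted text in a tool that shows invisible characters) can be approximated with a few lines of Python. This is only a sketch and not part of Paratext; the function name `reveal_invisibles` is made up for illustration, and it assumes the text has been pasted into a Python string. It lists every character in the Unicode "Cf" (format) category, which includes ZWNJ (U+200C) and ZWJ (U+200D).

```python
import unicodedata


def reveal_invisibles(text):
    """Return (index, code point, Unicode name) for each format (Cf) character."""
    found = []
    for index, char in enumerate(text):
        # Category "Cf" covers invisible format characters such as ZWNJ and ZWJ.
        if unicodedata.category(char) == "Cf":
            code_point = f"U+{ord(char):04X}"
            found.append((index, code_point, unicodedata.name(char, "UNKNOWN")))
    return found


# Example: a string with a ZWNJ after "a" and a ZWJ after "b".
sample = "a\u200cb\u200d"
for hit in reveal_invisibles(sample):
    print(hit)
```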

That is what I thought as well. But that does not seem to be the case. After repeatedly searching for those characters, including doing a search in the Settings and ldml files, the problem persisted.

I did an experiment today and added ZWNJ and ZWJ (200C and 200D) to an English project, then opened the Character Inventory. Both were marked as Valid and could not be marked as Invalid. The same error message as above appeared. I will file a bug report…

Proceeding with caution, if you really want to get rid of them, you could check whether they are in the project's settings files (either the Settings or ldml XML file) and remove them from there. Back up the files somewhere safe first, of course.

Blessings,
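
For searching the settings files as described above, a small script can report where ZWNJ and ZWJ occur, since they are hard to spot in an editor. This is a sketch under assumptions: the function names are hypothetical, the files are assumed to be UTF-8, and the path in the usage comment is a placeholder rather than a real Paratext location. It only reads the file and changes nothing.

```python
# Characters to look for, with short labels for the report.
TARGETS = {"\u200c": "ZWNJ", "\u200d": "ZWJ"}


def scan_text(text):
    """Return (line number, column, label) for each target character in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if char in TARGETS:
                hits.append((line_no, col, TARGETS[char]))
    return hits


def scan_file(path):
    """Read a file as UTF-8 (an assumption) and scan its contents."""
    with open(path, encoding="utf-8") as handle:
        return scan_text(handle.read())


# Usage (placeholder path, adjust to your own project folder):
# print(scan_file("path/to/project/ldml.xml"))
```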

0 votes

Hey CrazyRocky, I ran into an issue with \u200f (which Paratext inserted in some of my RTL projects after the verse numbers). To remove it I did this:
Search: (\v [\d-]*)\u200f( )
Replace: \1\2

I don’t know if this will help with your situation, because it is a little different.

by (192 points)
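
The search and replace above can also be tried outside Paratext on a copy of the text. The following Python sketch makes one assumption that should be checked: that `\v` in the search pattern is meant to match the literal USFM verse marker, which in Python's `re` module has to be written as `\\v`. The function name is hypothetical.

```python
import re

# Verse marker, verse number or bridge (e.g. 12 or 1-2), then U+200F, then a space.
RLM_AFTER_VERSE = re.compile(r"(\\v [\d-]*)\u200f( )")


def strip_rlm_after_verse(usfm_text):
    """Remove U+200F only where it directly follows a verse number."""
    return RLM_AFTER_VERSE.sub(r"\1\2", usfm_text)


cleaned = strip_rlm_after_verse("\\v 12\u200f In the beginning")
print(repr(cleaned))
```

Occurrences of U+200F elsewhere in the text are left alone, because the pattern is anchored to the verse marker.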
0 votes

I wonder if the zero-width characters exist in any of the alphabet characters themselves. When you look in the character inventories and select “show combinations,” does the Unicode value “09CD” show up in the “Unicode value” column? This occurs in the Bangla font as a zero-width joiner character.

by (175 points)
