0 votes

Reading about Unicode equivalence here, on SIL’s NRSI pages, and on Wikipedia, it would appear that Unicode software can choose to work in NFC or NFD, but is supposed to treat all canonically equivalent sequences as the same and can transform whatever underlying form there is at will. Given how the character inventory lists character codes (and sequences of characters if you tick the Combinations box), I would like to know how Paratext deals with Unicode normalisation internally: in Find/Replace, in checks/inventories, in the word list, and in its output.

Some projects I have seen end up with a mixture of combined forms (single code point for a complex character, such as ‘a’ with a tilde above it) and the separate forms (‘a’ + combining tilde). Visually identical words then are treated as distinct words in the Wordlist (7.5 at least).
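To make the symptom concrete, here is a minimal Python sketch of why two visually identical words compare as different, and how normalising before comparison groups them. It uses only the standard `unicodedata` module. The sample word and the `group_words` helper are made up for illustration and say nothing about how Paratext itself is implemented.

```python
import unicodedata
from collections import defaultdict

composed = "\u00e3"     # 'a' with tilde as a single precomposed code point
decomposed = "a\u0303"  # 'a' followed by COMBINING TILDE

def group_words(words, form="NFC"):
    """Group canonically equivalent words under one normalised key (hypothetical helper)."""
    groups = defaultdict(list)
    for word in words:
        groups[unicodedata.normalize(form, word)].append(word)
    return dict(groups)

# Two spellings of the same made-up word, one per encoding.
words = ["m" + composed + "e", "m" + decomposed + "e"]

raw_distinct = len(set(words))                 # compared code point by code point
normalized_distinct = len(group_words(words))  # compared after NFC normalisation

print(raw_distinct, normalized_distinct)
```

Without normalisation the two spellings count as separate entries; after normalisation they fall under one key.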

Does Paratext try to avoid normalising its text? If so, for what reasons? It would just be nice to know what the software is trying to do, so that input (keyboarding systems, autocorrect, etc.) and output (apps, etc.) can be set up in the most helpful way.

Paratext by (506 points)

3 Answers

+1 vote
Best answer

Yes, in 7.5 and before, Paratext did not work well with non-normalized text. We tried to fix many of these problems in 7.6 so that the Wordlist doesn’t show multiple words that are the same, Find/Replace finds stuff correctly, Biblical Terms don’t fail to find renderings, etc.
In addition, 7.6 also includes a new option for new projects to select the normalization mode for the entire project so that the files are always saved in a certain normalized format. We don’t allow projects to change normalization mid-way because it will likely create massive merge conflicts during S/R (If I recall correctly, you can use Convert Project to change the normalization of a project mid-way if it’s important).

So, the short answer is that Paratext 7.6+ accepts input as-is from the keyboard and keeps it that way for any existing projects. The tools have been updated to handle non-normalized text much better. New projects can also select a normalization form so that all text is saved to disk in the same normalized form.

EDIT: Aww, @John+Wickberg beat me. :disappointed:

by [Expert]
(16.2k points)

+1 vote

Prior to Paratext 7.6, Paratext just saved whatever the user typed - there
was no normalization of the data.

In Paratext 7.6, we added a value for normalization on the Advanced tab of
the Project Properties and Settings dialog. The default value for new
projects is NFC, but that can be changed to NFD or None. The value for
existing projects can’t be changed, so any project created before Paratext
7.6 will be left as None.

We at first were just going to go with NFC, but found that NFD was required
for some projects to display correctly. The None option was later added
since even NFD does some ordering that was causing problems for some
projects.
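As a general illustration of the "ordering" that NFD performs (not the specific case that caused trouble in any Paratext project, which is not stated here), the sketch below shows canonical reordering of combining marks. The typed sequence is a made-up example, and only Python's standard `unicodedata` module is used.

```python
import unicodedata

# Typed order: base letter, acute accent above, then dot below.
typed = "a\u0301\u0323"

# NFD sorts adjacent combining marks by canonical combining class,
# so the dot below (class 220) moves in front of the acute (class 230).
nfd = unicodedata.normalize("NFD", typed)

# Combining class of each character in the result; 0 marks a base character.
classes = [unicodedata.combining(ch) for ch in nfd]

print([hex(ord(ch)) for ch in typed])
print([hex(ord(ch)) for ch in nfd])
print(classes)
```

The stored order of the marks can therefore differ from the order that was typed, even though no characters are composed.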

If NFC or NFD is used, any typed input will be normalized before it is used.
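A rough sketch of what "normalized before it is used" could look like, assuming a per-project setting with the three values named above. The `normalize_input` function and its `mode` parameter are hypothetical and are not taken from Paratext's code.

```python
import unicodedata

def normalize_input(text, mode):
    """Apply a project-level normalization setting to typed text (hypothetical)."""
    if mode == "None":
        # Leave the text exactly as it was typed.
        return text
    if mode in ("NFC", "NFD"):
        return unicodedata.normalize(mode, text)
    raise ValueError("unknown normalization mode: " + repr(mode))

typed = "a\u0303"  # 'a' followed by COMBINING TILDE

as_nfc = normalize_input(typed, "NFC")    # one precomposed code point
as_nfd = normalize_input(typed, "NFD")    # stays as two code points
untouched = normalize_input(typed, "None")

print(len(as_nfc), len(as_nfd), len(untouched))
```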

The Tools > Advanced > Convert Project command can be used to create a copy
of a project with a new normalization setting.

John+Wickberg

by [Administrator]
(3.1k points)

0 votes

Adding keywords for search:
composed decomposed

by (1.4k points)