Projects quite commonly make wholesale orthography changes while work is already underway. Many of these changes look superficial to users but are deeply problematic for Paratext, because many entries in the Wordlist, Biblical Terms renderings and Interlinearizer become obsolete or are left orphaned in the old orthography. What tips are there for rescuing the data stored there?

Some changes made with text editors to the Lexicon.xml and Wordlist files cause Paratext to complain that the files are corrupt. It would be helpful to know what makes a file corrupt in Paratext’s eyes.

2 Answers

The problem with editing the Biblical Terms and Wordlist decisions outside
of Paratext with a regular-expression type of tool is that languages are
quite stubborn when it comes to conforming to regular expressions, and you
are making blanket, blind choices that the team is not able to interact
with. For many teams, once a word is marked as correct in this way it will
be ignored forever.

I recommend a different approach.

Before the change is made, first work through the Wordlist spelling tool
till it finds no more errors and the Biblical Terms tool till all the words
are consistent. That sounds daunting, but you can limit this in various,
controlled ways. For example, for the wordlist tool, set the filter so that
it only contains words affected by the characters in question. And set the
book filter for only the books where significant work has been done which
you wish preserved. Once all the words have been corrected for errors, I
would actually mark the words you expect to change as Wrong so that when
the team accidentally spells them the old way they will be flagged
immediately. For the Biblical Terms tool, you can also set the filters for
only the active books and for only the terms you have done significant work
on.

After that, you can make the orthography changes and again work through the
two tools with those filters set. I would keep both tools open at the same
time so you can refer to them together as you approve the new spellings. I
expect you will find quite a number of Biblical Terms desired by the team
(names especially) that won’t quite conform to the new system. I also expect
that the house-cleaning will either uncover lots of hidden things that
needed correcting, and so be well worth the time taken, or, if not much is
found, take less time than you thought.

Blessings,

Shegnada J.

Language Technology and Publishing Coordinator

Wycliffe Nigeria



Shegnada’s recommendation to go slowly and consider all the unforeseen effects as they come up may be the best approach for a lot of projects.

As for the technical question of whether you can edit the SpellingStatus.xml file outside of Paratext and preserve your spelling status while applying orthography changes, I think it should be possible. When I was experimenting, though, I frequently got the “spelling status is corrupt and has been removed” message. But when I applied the changes to the spelling status and to the book files at the same time, then started Paratext and brought up the Wordlist, it worked. So I think what makes Paratext detect corruption in SpellingStatus.xml is a significant mismatch between that file and the words it finds in the text.
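If that hypothesis is right, it may help to check, before starting Paratext, how many words in the status list no longer occur in the converted text. The following is only a rough Python sketch of such a check. The function names and sample strings are made up for illustration, the tokenizer is deliberately naive (it strips USFM-style backslash markers and keeps runs of letters), and Paratext's own word-breaking rules are certainly more involved.

```python
import re

def text_words(usfm):
    """Return the set of lowercased words in a piece of USFM text (naive)."""
    # Remove backslash markers such as \v, \p, \nd* or \+nd before tokenizing.
    without_markers = re.sub(r"\\\+?[a-z0-9]+\*?", " ", usfm)
    # Keep runs of letters only; digits (verse numbers) and punctuation drop out.
    return set(re.findall(r"[^\W\d_]+", without_markers.lower()))

def orphaned(status_words, usfm):
    """List the status-file words that do not occur anywhere in the text."""
    present = text_words(usfm)
    return sorted(w for w in status_words if w.lower() not in present)

# Hypothetical sample data, not taken from a real project.
sample_text = "\\v 1 Na terniawꞌekepna na."
missing = orphaned(["na", "terniawꞌekepna", "oldword"], sample_text)
print(missing)
```

A long list of orphaned words after a conversion would suggest the two sets of files have drifted apart, which is the situation that seemed to trigger the corruption message.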

Similarly, I could change the spelling of words and morphemes in the Lexicon.xml file and in the interlinear XML files (one per book, in the folder named for the glossing language). In the same way, one can change the renderings in the BiblicalTerms.xml file.

One challenge: if the orthography changes would also match the English words or bits of words inside the XML files, you would have to develop a process that leaves those words unmodified.

For instance, a bit of SpellingStatus.xml looks like this:

<?xml version="1.0" encoding="utf-8"?>
<SpellingStatus>
  <Status Word="na" State="…" />
  <Status Word="terniawꞌekepna" State="…" />
</SpellingStatus>

Any rule that could apply to “SpellingStatus” or “Status Word” or “State” or “encoding”, etc. would have to be constrained so that it is applied only to the language text within the quote marks.
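One way to constrain a rule like that is to parse the file and rewrite only the attribute values, instead of running a search-and-replace over the raw text. Here is a minimal Python sketch of the idea, not a tested Paratext procedure. It assumes each entry is a `Status` element with the word held in a `Word` attribute, as the names above suggest. The `convert` rule (every "e" becomes "ɛ"), the `State="R"` values and the sample data are purely hypothetical.

```python
import xml.etree.ElementTree as ET

def convert(word):
    """Hypothetical orthography rule: write every 'e' as 'ɛ'."""
    return word.replace("e", "ɛ")

def convert_words(xml_text, attr="Word"):
    """Apply convert() to the given attribute only; leave all markup alone."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        if attr in elem.attrib:
            elem.set(attr, convert(elem.get(attr)))
    return ET.tostring(root, encoding="unicode")

# Assumed structure, for illustration only.
sample = (
    '<SpellingStatus>'
    '<Status Word="na" State="R" />'
    '<Status Word="terniawꞌekepna" State="R" />'
    '</SpellingStatus>'
)
result = convert_words(sample)
print(result)
```

Because the rule only ever sees the attribute value, names such as "State" are never touched even though they contain an "e". For a real file you would read it with `ET.parse(path)` and write it back with `tree.write(path, encoding="utf-8", xml_declaration=True)`. Note that ElementTree may alter whitespace and drops comments, and whether Paratext accepts the rewritten file is something to verify on a backup copy first.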
