+4 votes

Paratext 8 seems to have a problem migrating interlinear glosses if you used the Interlinearizer in Paratext 7 to gloss text with the NT Greek as your model text. A user asked for help in this situation, and I saw that after migration the language code in lexicon.xml and in the per-book gloss files was “el”, while the code for NT Greek should be “grc”. (In a test project, I glossed a few words in Greek and then migrated; in the migrated project, the code for NT Greek became “lbj”, the code for a language from India. I have reported this issue to the developers.)

The language code is used in three places in the interlinearizer data.

  1. Inside the lexicon.xml file, in the Language attribute of each Gloss element. This attribute occurs for every word that is given a gloss in that language.
  2. In the subfolder name and the file name of the per-book interlinearizer file. For example, “Interlinear_el_MAT.xml” is the file for glosses in the “el” language for Matthew.
  3. Inside each per-book file, in the GlossLanguage attribute in the second line of the file.
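Before changing anything, it can help to see which codes a project actually contains. The following is a minimal sketch, not a Paratext tool: the function name `report_codes` is made up for illustration, and it assumes the file is named lexicon.xml, that the folders follow the `Interlinear_<code>` pattern, and that the files are UTF-8. It only reads the files.

```python
import re
from collections import Counter
from pathlib import Path

# Patterns taken from the examples in this post; other attribute layouts
# are not handled (assumption).
LEXICON_CODE = re.compile(r'<Gloss Language="([^"]+)"')
BOOK_CODE = re.compile(r'GlossLanguage="([^"]+)"')


def report_codes(project_dir):
    """List the gloss language codes found in the three places described above."""
    project = Path(project_dir)
    report = {"lexicon": Counter(), "folders": [], "book_files": Counter()}

    lexicon = project / "lexicon.xml"
    if lexicon.exists():
        # "utf-8-sig" reads UTF-8 with or without a byte order mark.
        text = lexicon.read_text(encoding="utf-8-sig")
        report["lexicon"].update(LEXICON_CODE.findall(text))

    for folder in sorted(project.glob("Interlinear_*")):
        if not folder.is_dir():
            continue
        report["folders"].append(folder.name)
        for book in sorted(folder.glob("*.xml")):
            text = book.read_text(encoding="utf-8-sig")
            report["book_files"].update(BOOK_CODE.findall(text))

    return report
```

Calling `report_codes` on your project folder returns the codes in lexicon.xml with their counts, the names of the Interlinear folders, and the codes inside the per-book files, so a mismatch between the three is easy to spot.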

How did I find out that “grc” was the right code for NT Greek? I glossed one word in Paratext 8 with Greek as the model, saved the change, and then looked at the files.

To convert this data manually, I did the following:
0) Close Paratext if it is open.

  1. Do a search-and-replace in lexicon.xml, replacing “el” with “grc”. For example:

    <Gloss Language="el">δέ</Gloss> 
    

    becomes

    <Gloss Language="grc">δέ</Gloss>
    

To limit the change just to the codes and not any “el” strings inside a larger word, include the quote marks (straight double quotes) in the search string and in the replace string.

2a) Change the name of the “Interlinear_el” folder inside the project folder to “Interlinear_grc”. (If you created a test file with the desired code, delete that “Interlinear_grc” folder and its file first.)

2b) Change the file names inside this folder from “Interlinear_el_[Bookcode].xml” to “Interlinear_grc_[Bookcode].xml”.

  3. Change the GlossLanguage code in the second line of each per-book file to the desired code. For instance:

     <InterlinearData ScrTextName="MP8" GlossLanguage="el" BookId="MAT">
    
     becomes
    
     <InterlinearData ScrTextName="MP8" GlossLanguage="grc" BookId="MAT">
    
  4. Start Paratext and see if it worked.
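If there are many books, the steps above could be scripted. This is only a sketch under assumptions, not an official Paratext utility: the function name `convert_gloss_language` is hypothetical, the file and folder names are the ones quoted in this post, and the search string is `Language="el"` (quotes included), which matches both the `Language` attribute in lexicon.xml and the `GlossLanguage` attribute in the per-book files. It works on raw bytes so that the encoding and line endings are left untouched. Run it only with Paratext closed and after backing up the project.

```python
from pathlib import Path


def convert_gloss_language(project_dir, old, new):
    """Change the gloss language code from `old` to `new` in one project folder."""
    project = Path(project_dir)
    old_attr = b'Language="' + old.encode("ascii") + b'"'
    new_attr = b'Language="' + new.encode("ascii") + b'"'

    lexicon = project / "lexicon.xml"
    old_folder = project / f"Interlinear_{old}"
    new_folder = project / f"Interlinear_{new}"

    # Check everything first so nothing is left half converted.
    if not old_folder.is_dir():
        raise FileNotFoundError(f"{old_folder} not found")
    if new_folder.exists():
        # For example a test folder made in Paratext 8; remove it by hand first.
        raise FileExistsError(f"{new_folder} already exists")

    # Step 1: lexicon.xml. The quotes keep "el" inside longer words safe.
    lexicon.write_bytes(lexicon.read_bytes().replace(old_attr, new_attr))

    # Step 2a: rename the folder.
    old_folder.rename(new_folder)

    # Steps 2b and 3: rename each per-book file and fix the code inside it.
    renamed = []
    old_prefix, new_prefix = f"Interlinear_{old}_", f"Interlinear_{new}_"
    for book in sorted(new_folder.glob(f"{old_prefix}*.xml")):
        book.write_bytes(book.read_bytes().replace(old_attr, new_attr))
        target = book.with_name(new_prefix + book.name[len(old_prefix):])
        book.rename(target)
        renamed.append(target.name)
    return renamed
```

For the case described here, the call would be `convert_gloss_language(project_folder, "el", "grc")`, where `project_folder` is the path to your own project. The function returns the new per-book file names so you can check that every book was picked up.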

When editing the XML files, make sure you don’t change any < or > or </ or /> codes; these are like backslashes in USFM. If you make a mistake, Paratext may reject your edited lexicon.xml, rename it to lexicon.xmlcorrupt, and start creating a new one. If you save a copy of your lexicon.xml file in another location before editing, you can bring that back if you hit this problem and cannot identify what went wrong in your edited file.
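A small sketch of both precautions, the backup copy and a check that the angle-bracket codes are still intact, might look like this. The helper names `backup_file` and `check_well_formed` are made up for illustration. The check uses Python's standard XML parser, so it only tells you whether the file is well-formed XML; it is an assumption that this catches the same tag mistakes Paratext objects to, and it will not catch other problems Paratext may reject.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def backup_file(path, backup_dir):
    """Copy a file into another folder before editing it; return the copy's path."""
    backup_dir = Path(backup_dir)
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / Path(path).name
    shutil.copy2(path, target)
    return target


def check_well_formed(path):
    """Return (True, "") if the file parses as XML, else (False, error message)."""
    try:
        ET.parse(str(path))
    except ET.ParseError as err:
        # The message includes the line and column where parsing stopped.
        return False, str(err)
    return True, ""
```

Running `check_well_formed` on the edited lexicon.xml before starting Paratext gives a line and column to look at if a tag was damaged, and the copy made by `backup_file` is there to restore if the problem cannot be found.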

Paratext by [Expert]
(3.1k points)


4 Answers

0 votes
Best answer

Another example: using the NIV84 as the model text.
After migration, the project has “en” as the language code. But the actual language code for the NIV84 in Paratext 8 is “en-US”. So you have to run through the steps to change “en” to “en-US” in the lexicon, in the file names of the per-book files, and inside the per-book files.

by [Expert]
(3.1k points)

Is there a good reason to differentiate between the US and Anglicized versions of a translation using the en-US and en-UK language codes? If not, would it be a good idea to remove the language code distinction between the usNIV11 and ukNIV11 texts, as well as the usNIV84? (We do not have a ukNIV84 project.)

I doubt it is important to distinguish between US and UK English. You’d have “favor” vs “favour”, “honor” vs “honour” but I think these words are not significant enough in number to warrant distinguishing the varieties.

I suspect there may be rather more differences than a few spellings. I don’t know the NIV variants that well but there are many differences of usage and idiom across the US and UK TEV/GNB versions.

JR

0 votes

It can be simpler to use a different model text that has the same language code as the one that was used in PT 7, if there is an acceptable alternative. Thanks for the tips on editing the XML files, sewhite. I had tried this for a user who had an orthography change, but Paratext rejected my new file in the way that you described.

by [Expert]
(2.9k points)

Update: I had the same problem anon044949 had with the orthography change project. The task was a search/replace on five vowels in the language to replace them with different characters. It turned out that a few glosses had already been entered in the lexicon in the new orthography after the conversion, so when I converted all the old entries there were a few duplicates: two instances of the same word, each with a different sense ID or gloss ID. When Paratext loaded this file into memory, it protested and marked the lexicon file as corrupt. So besides damaging the < > and </ > codes, there is a second way to “corrupt” a lexicon: ending up with duplicate words. This was a different situation from changing the language codes, though; it required changing the words and morphemes inside the lexicon file to match the new orthography.

0 votes

Yesterday I ran into a situation where in PT7 the language had been “Spanish” for the RV60 and in PT8 the language for RVR1960 is “spa”. I followed the instructions of Steven in an earlier post to make the appropriate changes, but the glosses still did not appear as approved (as they were in PT7).

Tim S. pointed out to me that in PT8 there are certain languages that display the three letter code (in this case spa), but internally use the two letter code (in this case es) for matching the interlinear data. Once I made the appropriate changes and used “es” the glosses appeared as approved.

So, if you try using the language code of the model text and it doesn’t work, you might try the corresponding two-letter code.

A chart of these codes can be found at: https://www.loc.gov/standards/iso639-2/php/code_list.php
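As a rough illustration of that fallback, here is a sketch that lists the codes worth trying for a given three-letter code. The table is a small hand-picked subset of the ISO 639 chart, not the full list, and the function name `codes_to_try` is made up; which code Paratext actually matches on for a given text is something you still have to confirm by testing.

```python
# A few three-letter codes and their two-letter equivalents (partial list).
THREE_TO_TWO = {
    "spa": "es",  # Spanish
    "eng": "en",  # English
    "fra": "fr",  # French
    "deu": "de",  # German
    "por": "pt",  # Portuguese
}


def codes_to_try(code):
    """Return the displayed code first, then its two-letter form if one is known."""
    candidates = [code]
    short = THREE_TO_TWO.get(code.lower())
    if short and short not in candidates:
        candidates.append(short)
    return candidates
```

For the RVR1960 case above, `codes_to_try("spa")` gives “spa” and then “es”. Some languages, such as Ancient Greek (“grc”), have no two-letter code, so only the original code comes back.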

by (8.4k points)
+1 vote

This process was necessary to relink our BT project with the Interlinearizer after upgrading to PT9.2. (We had originally set the language code to be the same as the main project’s rather than English (en), and PT couldn’t identify it.)
The only step I would add is that in the BT menu under “project settings” in PT you are able to switch the language code even after creating the project (in a standard project you cannot do this). We had multiple teams with BT projects that were created with the same language code as their main project. Changing the code to English (eng) allowed us to select our BT project in the Interlinearizer setup.

(NOTE: we also figured out that you cannot use “Create Glosses for ZZZ with no model text” and then choose “output to” to select a BT project. We don’t use a model text for our BTs, so this seemed ideal, but it will only let you choose a Standard Project for output. So the option to create a back translation using the BT project AS THE MODEL was the option we needed. It’s all working great now!)

by (161 points)
