Importing a revision of the translation (as a programmer)

Question

Our team embarked on revising a published translation of the New Testament. In the beginning we were optimistic, hoping that it wouldn’t take long and that not many changes would be needed. We have now found that over 90% of the verses have been modified.

We did all our work on the revision outside Paratext, for many reasons that I won’t go into here. However, now that we are nearing the finish line, we need to import the work back into Paratext.

The text we have is structured very simply. You can think of it as a table where the first column is the verse reference (e.g. Mark 1:1) and the second column is the content of the verse in plain text. There are no headings, no paragraph markers, no footnotes, nothing like that. I wanted to take the existing USFM and update it with the new text. I developed some code that did this, mainly using usfm-grammar: I took the existing USFM, converted it to JSON, modified the JSON to update the translation, and then converted the result back to USFM using the same tool.

However, I’m finding that preserving the headers, paragraphs and cross-references is tricky. I spent a day or two working on this (including the process of deciding upon this approach), and I don’t really want to spend a lot of time programming something if it is not a good use of my time. When I imported the result back into Paratext, it complained about invalid USFM, so it looks like usfm-grammar can produce invalid USFM.
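For comparison, here is a minimal sketch of the same merge done directly on the USFM text, without the JSON round trip. It is not the usfm-grammar approach described above, and it rests on loud assumptions: every \v marker sits at the start of its own line, verse numbers are plain integers, and inline footnotes and cross-references (\f ... \f*, \x ... \x*) may be moved to the end of the verse. The function name and the shape of new_text are hypothetical. Verse text that continues onto later lines (for example poetry lines) is left untouched and would need manual review.

```python
import re

CHAPTER_LINE = re.compile(r"^\\c (\d+)")
VERSE_LINE = re.compile(r"^\\v (\d+)\s+(.*)$")
# Inline footnotes (\f ... \f*) and cross-references (\x ... \x*).
NOTE = re.compile(r"\\(f|x) .*?\\\1\*")

def replace_verse_text(usfm, new_text):
    """Swap verse text for revised text, leaving all other lines as they are.

    new_text maps (chapter, verse) -> revised plain text (assumed shape).
    """
    chapter = None
    out = []
    for line in usfm.splitlines():
        c = CHAPTER_LINE.match(line)
        if c:
            chapter = int(c.group(1))
        v = VERSE_LINE.match(line)
        key = (chapter, int(v.group(1))) if v else None
        if key in new_text:
            # Keep existing notes, but append them after the new text
            # because their original position no longer applies.
            notes = "".join(m.group(0) for m in NOTE.finditer(v.group(2)))
            line = f"\\v {v.group(1)} {new_text[key]}{notes}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Headings, paragraph markers and any verse missing from new_text pass through unchanged, which makes it easy to diff the result against the original file before importing it.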

I looked for libraries that can write USFM. I found quite a few options for parsing USFM and for converting USFM to other formats, but usfm-grammar was the only library I found that could write USFM, by converting JSON to USFM.

I’m now looking at two options:

1. Continue to develop a tool that can merge plain text revisions with the existing USFM.
2. Effectively start a new project in Paratext, importing the plain text as USFM, without attempting to merge it with the headers and other information in the existing translation.

I’ve already tried option 2. As you can imagine, it was quite easy to generate output like this:

\id MRK
\c 1
\p
\v 1 the first verse
\v 2 the second verse
\v 3 the third verse
\v 4 etc
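As a minimal sketch, output of this kind could be generated from the two-column table along the following lines. The BOOK_CODES mapping, the function names and the one-paragraph-per-chapter layout are assumptions made for illustration, not a description of the code actually used.

```python
import re
from itertools import groupby

# Hypothetical name-to-code mapping; a real one would cover every book.
BOOK_CODES = {"Mark": "MRK"}
REF = re.compile(r"^(.+?)\s+(\d+):(\d+)$")

def parse_ref(ref):
    """Split a reference such as 'Mark 1:1' into (book code, chapter, verse)."""
    match = REF.match(ref.strip())
    if match is None:
        raise ValueError(f"unrecognised reference: {ref!r}")
    name, chapter, verse = match.groups()
    return BOOK_CODES[name], int(chapter), int(verse)

def table_to_usfm(rows):
    """Turn (reference, text) rows into one bare USFM string per book code."""
    parsed = sorted((parse_ref(ref) + (text.strip(),) for ref, text in rows),
                    key=lambda r: r[:3])
    books = {}
    for book, book_rows in groupby(parsed, key=lambda r: r[0]):
        lines = [f"\\id {book}"]
        for chapter, chapter_rows in groupby(book_rows, key=lambda r: r[1]):
            lines += [f"\\c {chapter}", "\\p"]  # assumed: one paragraph per chapter
            lines += [f"\\v {verse} {text}" for _, _, verse, text in chapter_rows]
        books[book] = "\n".join(lines) + "\n"
    return books

rows = [("Mark 1:1", "the first verse"), ("Mark 1:2", "the second verse")]
print(table_to_usfm(rows)["MRK"])
```

Raising on an unrecognised reference, rather than skipping the row, means a typo in the table shows up before anything is imported.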

What advice would you give me? I’m currently leaning towards option 2, and just doing the work of adding back the headers and cross-references manually. The headers need a manual revision anyway.

Paratext Dec 2, 2020 asked by bit (443 points)

4 Answers

Interesting! I am using PT9, Linux version 9.0, 0.99. In the toolbar I see the two project dropdown boxes as per anon291708’s screenshot (from Windows PT). I can choose another project name in either dropdown list, but the project text displayed does not update. The Version buttons are both active. So indeed there seems to be a Linux bug.

fwiw–In my currently installed PT8 the Tools > Compare Texts menu item brings up a window with no project dropdown boxes, like the shot “bit” posted from his PT9 screen; I’m guessing maybe my PT9-Linux version is more recent than bit’s.

So, I currently cannot compare two different project texts in either PT8 or PT9. There used to be two menu items in PT: Compare versions on the Project menu, and Compare texts on the Tools menu. I know there was some confusion about these two tools, so maybe they were merged into one? But the functions of both are very important.

Dec 9, 2020 commented by KimB (632 points)

Fool Running · Answer 1 · 2020-12-02T18:28:16+0000

Thanks. I didn’t realise that it was possible to compare two projects.

I searched for “How do I compare two different projects?” in the help inside Paratext, and I found this result:

“How do I find where punctuation marks differ in two projects?”

If I search for “differ in two projects”, I get more results:

“How do I find where quotation marks differ in two projects?”
“How do I find where markers differ in two projects?”
“How do I find where punctuation marks differ in two projects?”

It looks like you run one of the checklists and compare the result against another project. Click on the hamburger icon for the project, expand the menu, click “Checklists” under “Tools”, and then pick one of the submenu items. It seems to work with these checklists:

Verse text
Word or phrase
Section headings
Book titles
References
Footnotes
Markers
Quotation marks
Punctuation
Relatively long verses
Relatively short verses
Long sentences
Long paragraphs

I’m recording this for the lurkers and for when I come back to this later.

Now that I know about this, I think I will go with option #2, and use these tools within Paratext that can compare two projects. That does sound easier than trying to code a tool to merge two translations.

Dec 4, 2020 commented by bit (443 points)

anon806807 · Answer 2 · 2020-12-02T19:20:18+0000

This is a really long shot, but your description of getting table-based data to standard format made me think of SheetSwiper (https://software.sil.org/sheetswiper/). Suppose you export the headers, paragraph markers, etc (arranged in appropriate columns) along with their reference indications, then sort that together with the existing verse information, could you generate a spreadsheet that would give the output you need? Unfortunately Scripture text is not as hierarchical as the lexical data that SheetSwiper targets.
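The sort-together idea can be sketched without any particular tool. The row layout below is hypothetical and is not SheetSwiper's format: each structural marker carries the reference of the verse it precedes plus a rank of 0, each verse carries rank 1, and a single sort interleaves them.

```python
# Each row: (chapter, verse, rank, usfm_line). Structural markers take the
# reference of the verse they precede and rank 0, so they sort ahead of it.
structure = [
    (1, 1, 0, "\\c 1"),
    (1, 1, 0, "\\s Heading"),  # placeholder heading text
    (1, 1, 0, "\\p"),
    (1, 3, 0, "\\p"),
]
verses = [
    (1, 1, 1, "\\v 1 the first verse"),
    (1, 2, 1, "\\v 2 the second verse"),
    (1, 3, 1, "\\v 3 the third verse"),
]

def interleave(structure, verses):
    """Merge marker rows and verse rows into USFM lines in reference order."""
    # sorted() is stable, so markers that share a reference keep the order
    # in which they were exported (\c before \s before \p here).
    merged = sorted(structure + verses, key=lambda row: row[:3])
    return "\n".join(line for *_, line in merged) + "\n"

print(interleave(structure, verses))
```

This only handles markers that fall between verses. Anything inside a verse, such as a footnote or a mid-verse paragraph break, does not fit the row model, which is one concrete form of the hierarchy problem mentioned above.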

anon892024 · Answer 3 · 2020-12-07T16:36:56+0000

What advice would you give me? I’m currently leaning towards option 2, and just doing the work of adding back the headers and cross-references manually. The headers need a manual revision anyway.

That does look like the easier approach. Especially since the headers need a manual revision anyway.

But I wonder if you could just do this as a transformation, since you seem close to that now. I would try to get a feel for how long it takes to do the headers manually and then decide whether it’s just faster to get the transformation right.

I gather you do not need to round-trip, this is just an import? Is there anything in the existing Paratext USFM that needs to be preserved, or is the text you are importing a current version that can replace whatever is there?

