0 votes

In our team, we embarked on revising a published translation of the New Testament. In the beginning we were optimistic, hoping that it wouldn’t take long and that not many changes would need to be made. Now we have found that over 90% of the verses have been modified.

We did all our work on the revision outside Paratext, for many reasons that I won’t get into here. However, now that we are nearing the finish line, we need to work on importing the work back into Paratext.

The text we have is structured very simply. You can think of it as a table, where the first column is the verse reference (e.g. Mark 1:1), and the second column is the contents of the verse in plain text. There are no headings, no paragraph markers, no footnotes, nothing like that. I wanted to take the existing USFM and update it with the new text. I developed some code that did this, mainly using usfm-grammar. I took the existing USFM, converted it to JSON, modified the JSON to update the translation, and then converted the result back to USFM using the same tool.
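To illustrate the merge idea, here is a minimal sketch in plain Python. It is not the usfm-grammar code described above: it uses regular expressions instead, and it assumes that every verse sits on its own `\v` line and that chapters are marked with `\c`. The function name `merge_verses` and the `(chapter, verse)` keys are hypothetical, chosen only for this example. Any inline footnote or cross-reference inside a replaced verse line would be lost, so this only shows the easy part of the problem.

```python
import re

# Assumption: one verse per line, e.g. "\v 1 some text".
VERSE_LINE = re.compile(r"^\\v (\d+)\s+(.*)$")
# Assumption: chapters are marked on their own line, e.g. "\c 1".
CHAPTER_LINE = re.compile(r"^\\c (\d+)\s*$")


def merge_verses(usfm: str, new_text: dict[tuple[int, int], str]) -> str:
    """Replace verse text in `usfm` with the revised text in `new_text`.

    `new_text` maps (chapter, verse) to plain text. Lines that are not
    verse lines (headings, paragraph markers, chapter markers) are kept
    unchanged. Verses missing from `new_text` are also kept unchanged.
    """
    chapter = 0
    out = []
    for line in usfm.splitlines():
        chapter_match = CHAPTER_LINE.match(line)
        if chapter_match:
            chapter = int(chapter_match.group(1))
        verse_match = VERSE_LINE.match(line)
        if verse_match:
            key = (chapter, int(verse_match.group(1)))
            if key in new_text:
                # Overwrites the whole line, including any inline notes.
                line = f"\\v {key[1]} {new_text[key]}"
        out.append(line)
    return "\n".join(out)
```

A call such as `merge_verses(existing_usfm, {(1, 1): "revised text"})` would leave `\s` and `\p` lines in place and change only the matching `\v` line.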

However, I’m noticing that keeping the headers, paragraphs and cross-references preserved is tricky. I spent a day or two working on this (including the process of deciding upon this approach), and I don’t want to spend a lot of time programming something if it is not a good use of my time. When I imported the result back into Paratext, it complained about invalid USFM. So it looks like usfm-grammar can produce invalid USFM.

I had a look for libraries to write USFM. I found quite a few options for parsing USFM and for converting USFM to other formats, but usfm-grammar was the only library that I found that could write USFM, converting JSON to USFM.

I’m now looking at two options:

  1. Continue to develop a tool that can merge plain text revisions with the existing USFM.
  2. or… effectively start a new project in Paratext, importing the plain text as USFM, without attempting to merge it with the existing headers and other information in the existing translation.

I’ve already tried option 2. As you can imagine, it was quite easy to generate output like this:

\id MRK
\c 1
\p
\v 1 the first verse
\v 2 the second verse
\v 3 the third verse
\v 4 etc
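For anyone wanting to reproduce this, a minimal sketch of the conversion in Python might look like the following. It assumes the rows are `(reference, text)` pairs for a single book, already in order, with references shaped like `Mark 1:1`. The function name `rows_to_usfm` and the `book_id` parameter are hypothetical; mapping a book name such as "Mark" to its USFM code such as `MRK` is left to the caller.

```python
import re

# Assumption: references look like "Mark 1:1" or "1 John 2:3".
REF = re.compile(r"^(?P<book>.+?) (?P<chapter>\d+):(?P<verse>\d+)$")


def rows_to_usfm(book_id: str, rows: list[tuple[str, str]]) -> str:
    """Build bare USFM for one book from (reference, text) rows.

    Emits \\id once, then \\c and a single \\p at the start of each
    chapter, then one \\v line per row. No headings or notes are added.
    """
    lines = [f"\\id {book_id}"]
    current_chapter = None
    for ref, text in rows:
        match = REF.match(ref.strip())
        if match is None:
            raise ValueError(f"Unrecognised reference: {ref!r}")
        chapter, verse = match["chapter"], match["verse"]
        if chapter != current_chapter:
            # New chapter: add the chapter marker and one paragraph marker.
            lines += [f"\\c {chapter}", "\\p"]
            current_chapter = chapter
        lines.append(f"\\v {verse} {text.strip()}")
    return "\n".join(lines) + "\n"
```

Calling `rows_to_usfm("MRK", rows)` with the table rows for Mark would produce text of the same shape as the sample above.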

What advice would you give me? I’m currently leaning towards option 2, and just doing the work of adding back the headers and cross-references manually. The headers need a manual revision anyway.

Paratext by (440 points)

4 Answers

0 votes
Best answer

In the Compare versions tool, you can select different projects on the left and right sides. So you can compare your main project to your minimal project (selecting Current Version on both sides). You can then right-click on the differences to get your main project to have the changes from your minimal project:

by [Expert]
(16.2k points)

Thank you for the screenshot. I now realise that I seem to be experiencing a bug on Paratext for Linux. This is what I see:

When I open Paratext on Windows, I do see those toolbars.

Interesting! I am using PT9, Linux version 9.0, 0.99. In the toolbar I see the two project dropdown boxes as per anon291708’s screenshot (from Windows PT). I can choose another project name in either dropdown list, but the project text displayed does not update. The Version buttons are both active. So indeed there seems to be a Linux bug.

FWIW, in my currently installed PT8, the Tools > Compare Texts menu item brings up a window with no project dropdown boxes, like the shot “bit” posted from his PT9 screen; I’m guessing maybe my PT9-Linux version is more recent than bit’s.

So, I currently cannot compare two different project texts in either PT8 or PT9. There used to be two menu items in PT: Compare versions on the Project menu, and Compare texts on the Tools menu. I know there was some confusion about these two tools, so maybe they were merged into one? But the functions of both are very important.

+1 vote

You might be able to do a combination of #1 and #2. Paratext includes a tool that shows the differences between two texts. You can create the simple project, then use that tool to merge in the changes from the simple project to the main project. It would still be a lot of manual effort, but it’s probably easier than developing a tool to do it automatically.

You can find help with the tool by searching for “How do I compare two different projects?” in the help inside Paratext. You can then right-click on changes to accept/revert them (e.g. to put the headings and titles back).

by [Expert]
(16.2k points)

Thanks. I didn’t realise that it was possible to compare two projects.

I searched for “How do I compare two different projects?” in the help inside Paratext, and I found this result:

  • “How do I find where punctuation marks differ in two projects?”

If I search for “differ in two projects”, I get more results:

  • “How do I find where quotation marks differ in two projects?”
  • “How do I find where markers differ in two projects?”
  • “How do I find where punctuation marks differ in two projects?”

It looks like you run various checklists and compare them against another project. Click on the hamburger icon for the project, then expand the menu, then click “Checklists” under “Tools”, then one of the submenu items. It seems to work with these checklists:

  • Verse text
  • Word or phrase
  • Section headings
  • Book titles
  • References
  • Footnotes
  • Markers
  • Quotation marks
  • Punctuation
  • Relatively long verses
  • Relatively short verses
  • Long sentences
  • Long paragraphs

I’m recording this for the lurkers and for when I come back to this later.

Now that I know about this, I think I will go with option #2, and use these tools within Paratext that can compare two projects. That does sound easier than trying to code a tool to merge two translations.

This is not at all what I meant. Which version of Paratext are you using?

I was talking about the Compare Versions tool (Tools > Compare Texts in PT8 - I think, Project > Compare versions in PT9).

I’m using Paratext 9.0

If I click on the hamburger icon in the project window, I can find the menu item labelled “Compare versions”. This opens a window with two panes, side by side. I can press toolbar buttons that look like arrows to navigate between changes. To me, it looks like this tool lets me view the historical changes in a project (I think). I don’t see how it can let me compare two projects, as it doesn’t even ask me which two projects I want to compare.

+1 vote

This is a really long shot, but your description of getting table-based data to standard format made me think of SheetSwiper (https://software.sil.org/sheetswiper/). Suppose you export the headers, paragraph markers, etc (arranged in appropriate columns) along with their reference indications, then sort that together with the existing verse information, could you generate a spreadsheet that would give the output you need? Unfortunately Scripture text is not as hierarchical as the lexical data that SheetSwiper targets.

by (296 points)

Thanks for suggesting SheetSwiper. I clicked the link and tried to find the GitHub repo, but it looks like the link on that page is broken. I think this tool converts a spreadsheet to USFM, which is useful, but not too difficult to code myself. The part that is difficult is merging it with existing USFM.

Glad you have a path forward without SheetSwiper. It looks as though there is no source available, just a Windows .exe download.

0 votes

What advice would you give me? I’m currently leaning towards option 2, and just doing the work of adding back the headers and cross-references manually. The headers need a manual revision anyway.

That does look like the easier approach. Especially since the headers need a manual revision anyway.

But I wonder if you could just do this as a transformation; you seem close to that now. I would try to get a feel for how long it takes to do the headers manually and decide if it’s just faster to get the transformation right.

I gather you do not need to round-trip, this is just an import? Is there anything in the existing Paratext USFM that needs to be preserved, or is the text you are importing a current version that can replace whatever is there?

anon892024

by (448 points)

Initially, I was hoping to preserve headings, cross-references, and the like, from the existing Paratext USFM. But I’m now thinking that it is easier to port these over manually, for these reasons:

  • We are probably going to be making quite a few changes to the headers manually anyway
  • Anything which links to particular parts of verses, such as cross-references, needs to be manually checked anyway
  • I will save the time it would take to develop code for the transformations, the quality of which could only be verified by hand anyway