On our team, we embarked on revising a published translation of the New Testament. At the outset we were optimistic, hoping that it wouldn’t take long and that not many changes would need to be made. As it turns out, over 90% of the verses have been modified.
We did all our revision work outside of Paratext, for many reasons that I won’t get into here. However, now that we are nearing the finish line, we need to import the work back into Paratext.
The text we have is structured very simply: you can think of it as a table, where the first column is the verse reference (e.g. Mark 1:1) and the second column is the contents of the verse in plain text. There are no headings, no paragraph markers, no footnotes, nothing like that. I wanted to take the existing USFM and update it with the new text. I developed some code that did this, mainly using usfm-grammar. I took the existing USFM, converted it to JSON, modified the JSON to update the translation, and then converted the result back to USFM using the same tool.
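As a rough illustration of the merge idea, here is a minimal, library-free Python sketch. It is not the usfm-grammar JSON round-trip described above, just a simplified line-based stand-in. The function name `merge_revision` and the `revised` dictionary are hypothetical, and the sketch assumes that every verse starts on its own line and fits on that one line.

```python
import re

# Matches a chapter marker such as "\c 1" and a single-verse marker such as "\v 12 ".
CHAPTER_RE = re.compile(r"^\\c (\d+)")
VERSE_RE = re.compile(r"^\\v (\d+) ")


def merge_revision(usfm, revised):
    """Replace verse text in `usfm` with entries from `revised`.

    `revised` (hypothetical structure) maps (chapter, verse) -> new plain text.
    Lines that are not verse lines (headings, paragraph markers, ...) are
    copied through unchanged.
    """
    out = []
    chapter = None
    for line in usfm.splitlines():
        chapter_match = CHAPTER_RE.match(line)
        if chapter_match:
            chapter = int(chapter_match.group(1))
        verse_match = VERSE_RE.match(line)
        if verse_match:
            verse = int(verse_match.group(1))
            if (chapter, verse) in revised:
                # The whole line is replaced, so any inline notes on it are lost.
                line = f"\\v {verse} {revised[(chapter, verse)]}"
        out.append(line)
    return "\n".join(out) + "\n"
```

Even this small sketch shows where the difficulty lies: inline footnotes or cross-references on a replaced verse line are dropped, verse ranges such as `\v 1-2` are left untouched, and a verse whose text continues after a paragraph break would keep its old continuation text.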
However, I’m noticing that preserving the headers, paragraphs and cross-references is tricky. I spent a day or two working on this (including the time spent deciding on this approach), and I don’t want to spend much more time programming if it is not a good use of my time. When I imported the result back into Paratext, it complained about invalid USFM. So it looks like usfm-grammar can produce invalid USFM.
I looked for libraries that can write USFM. I found quite a few options for parsing USFM and for converting USFM to other formats, but usfm-grammar was the only library I found that could write USFM, by converting JSON to USFM.
I’m now looking at two options:
- Continue to develop a tool that can merge plain text revisions with the existing USFM.
- Or effectively start a new project in Paratext by importing the plain text as USFM, without attempting to merge it with the headers and other information in the existing translation.
I’ve already tried option 2. As you can imagine, it was quite easy to generate output like this:
\id MRK
\c 1
\p
\v 1 the first verse
\v 2 the second verse
\v 3 the third verse
\v 4 etc
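For reference, generating that kind of output from the two-column table can be sketched in a few lines of Python. This is a minimal sketch under some assumptions: the names `BOOK_IDS`, `parse_reference` and `rows_to_usfm` are made up for illustration, references look like `Mark 1:1`, rows are already in canonical order, and each call handles the rows of a single book.

```python
# Hypothetical mapping from the book names used in the table to USFM book IDs.
BOOK_IDS = {"Mark": "MRK"}


def parse_reference(ref):
    """Split a reference such as 'Mark 1:1' into ('MRK', 1, 1)."""
    book, chapter_verse = ref.rsplit(" ", 1)
    chapter, verse = chapter_verse.split(":")
    return BOOK_IDS[book], int(chapter), int(verse)


def rows_to_usfm(rows):
    """Build bare-bones USFM from (reference, text) rows of one book.

    Emits \\id once, then \\c and a single \\p at the start of each chapter,
    followed by one \\v line per verse. No headings, notes or other markers.
    """
    lines = []
    current_chapter = None
    for ref, text in rows:
        book, chapter, verse = parse_reference(ref)
        if not lines:
            lines.append(f"\\id {book}")
        if chapter != current_chapter:
            lines.append(f"\\c {chapter}")
            lines.append("\\p")
            current_chapter = chapter
        lines.append(f"\\v {verse} {text.strip()}")
    return "\n".join(lines) + "\n"
```

Calling `rows_to_usfm([("Mark 1:1", "the first verse"), ("Mark 1:2", "the second verse")])` gives the same shape as the sample above: one `\p` per chapter and nothing else to preserve.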
What advice would you give me? I’m currently leaning towards option 2, and just doing the work of adding back the headers and cross-references manually. The headers need a manual revision anyway.