0 votes
We have been approached by a community that has been translating the Bible on its own and now wants help finishing it. The text is in Word, with basic formatting but no paragraph styles, and it contains no USFM markers. To help them effectively, we need to import the text into Paratext. Where do you begin marking up the text with USFM? I know someone out there has done this before. What tips can you give us?
Paratext by (108 points)

2 Answers

0 votes
I just did this recently with nearly 100 chapters in individual doc/rtf files. This was my basic procedure:

1. I first converted the files to txt using the LibreOffice command line.

2. I wrote a Python script to parse the txt files and identify section headings, cross-references and verses. It also cleaned out some junk and converted some code points to update the encoding. Some files still had \v markers (originally exported from Paratext) and others didn't, so for those I had to go by the numbers in the text. The Python script output SFM.

3. I iterated back and forth between updating/refining the code and making some manual edits to the txt files to clean things up until I had it in pretty good shape.

4. The last step was to concatenate the chapters into one SFM file, and then I copied and pasted that book by book into the Standard view in Paratext. (I had been recommended this years ago as the safest way to paste data, as PT will do some checking and tidying up of the formatting when you do this.)

5. If there were major issues (e.g. \s markers with no \p after them, so that half the chapter was formatted as a section heading), I'd go back to step 3. Otherwise I'd fix any small issues manually within PT itself.
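
To give a rough idea of what step 2 can look like, here is a minimal sketch. It is not the script described above, just an illustration under some loud assumptions: one chapter per txt file, verse numbers appearing as standalone numbers in running text, and section headings being short lines with no digits and no final punctuation. The names chapter_to_sfm and looks_like_heading are made up for this example. A number is only treated as a verse number if it is the next one expected, which keeps things like "40 days" from becoming \v 40 in most cases.

```python
import re

# Assumption: a verse number is a standalone 1-3 digit number followed by text.
VERSE_NUMBER = re.compile(r"(?<![\w\\])(\d{1,3})\s+(?=\S)")


def looks_like_heading(line):
    # Assumption: headings are short, have no digits and no final punctuation.
    return (len(line) < 60
            and not re.search(r"\d", line)
            and line[-1] not in ".,;:!?")


def chapter_to_sfm(lines, chapter):
    """Turn the plain-text lines of one chapter into SFM text."""
    out = [f"\\c {chapter}"]
    need_paragraph = True   # emit \p before the next verse text
    expected = 1            # next verse number we expect to see

    def mark_verse(match):
        nonlocal expected
        if int(match.group(1)) != expected:
            return match.group(0)  # probably a number inside the text
        expected += 1
        return f"\n\\v {match.group(1)} "

    for raw in lines:
        line = raw.strip()
        if not line:
            need_paragraph = True  # blank line starts a new paragraph
            continue
        if looks_like_heading(line):
            out.append(f"\\s {line}")
            need_paragraph = True  # a \s must be followed by \p
            continue
        if need_paragraph:
            out.append("\\p")
            need_paragraph = False
        if "\\v " in line:
            # Line already has markers: keep it, just track the numbering.
            for number in re.findall(r"\\v (\d+)", line):
                expected = int(number) + 1
        else:
            line = VERSE_NUMBER.sub(mark_verse, line)
        out.extend(part.strip() for part in line.split("\n") if part.strip())
    return "\n".join(out) + "\n"
```

Calling chapter_to_sfm(open("chapter01.txt", encoding="utf-8"), 1) (file name hypothetical) returns the SFM for that chapter as a string. Because every \s forces a \p before the next verse, this avoids the missing-\p problem mentioned in step 5, but the heading guess will still misfire on short lines of poetry, so the output needs checking by eye.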

I've been encouraged to put my code/procedure online and hope to do that soon. If it might be useful to you, I can get onto that sooner rather than later...
by (237 points)
Thanks Craig, it sounds about as complex as I expected, but it would be good to have your procedure. I don't know for definite if I will need to do this yet - maybe I will get back to you if it is confirmed.
0 votes

I am doing something similar, but starting from Bibles in txt format. I have no programming skills, so I imported the text into Excel and used formulas to put basic USFM markers in the right places (header, remarks, chapter numbers, verse numbers, paragraphs, section headings). There were no footnotes to worry about. It worked OK, but there was a lot of cleaning up to do in Paratext. Regex find/replace helped a lot.

NB: a time-saver I found is that Paratext's Import Books feature can import a single file containing all the Bible books, provided each book has a correct \id line. So I used the command line to combine the book SFM files and then imported all the books in one go.
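
For anyone who prefers a script to the command line, the combining step could be sketched in Python roughly as follows. This is only an illustration, not what I actually ran: the function name combine_sfm and the file names are hypothetical, and it assumes the book files are UTF-8 and are passed in the order you want them in the combined file. It refuses to continue if a file does not start with an \id line, since that is what the import depends on.

```python
from pathlib import Path


def combine_sfm(book_files, output_file):
    """Join per-book SFM files into one file; return the number of books."""
    parts = []
    for path in book_files:
        # utf-8-sig drops a byte-order mark if the file has one.
        text = Path(path).read_text(encoding="utf-8-sig").strip()
        if not text.startswith("\\id "):
            raise ValueError(f"{path} does not start with an \\id line")
        parts.append(text)
    Path(output_file).write_text("\n".join(parts) + "\n", encoding="utf-8")
    return len(parts)
```

Usage would be something like combine_sfm(["01GEN.sfm", "02EXO.sfm"], "all_books.sfm"), after which all_books.sfm is the single file to import.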

Info in these links might be helpful: 
https://support.bible/604/how-to-convert-microsoft-word-document-with-footnotes-usfm
https://support.bible/5672/importing-text-to-paratext
https://lingtran.net/Import-TXT-or-Word-DOC-Files-into-Paratext-Using-SILAS
https://paratext.org/paratext-training/tutorials/create-and-register-a-new-project/ (Section 4)

Good luck!

ago by (116 points)