0 votes

[reposting from another list - TCOP]

I occasionally concatenate USFM files into a complete Bible text file in order to process them outside of Paratext. This consists of all of the files copied into one text file in sequential order.

One of the cumbersome manual steps in this process has been to separate them back into individual book files for long term storage.

Today, while importing a processed file back into Paratext, I found that I hadn’t separated the file properly: each book file exists, but the text file for Mark also contains John through Revelation instead of ending with Mark 16. Paratext reported this as “multiple files contain information about the same book.”

When I try to import only the concatenated file which contains the books Mark-Rev, I get all these books in Paratext, with no apparent loss or added information.

This is fundamentally different from the USFM I know, where 1 Bible book == 1 text file. Is this book aggregation (file concatenation) a feature? Does Paratext officially support importing concatenated files?

Taking it one step further: my post-processing usually includes file concatenation, so aggregation is to some degree part of our local implementation of USFM. Is book aggregation actually part of the USFM specification? (And if not, could it be?)

I might owe someone a coffee if this is true … 1 bible = 1 text file would save so so much file manipulation time for me.

Thanks,

Michael Hart
Senior Publishing Services Specialist
Bible League International

Paratext by (149 points)

2 Answers

+1 vote
Best answer

I believe that an attempt was made many years ago to allow importing multiple USFM books from a single file. Most people don’t know about this and (I believe) it is rarely done, so I can’t guarantee that you won’t find some way to break it.

Another thing that is probably mostly unknown is that you can import USX.

by (646 points)
0 votes

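After I posted this I did more research and discovered that the Linux/Unix command csplit provides most of the functionality I need to split a concatenated USFM file back into its component pieces. That is, the Linux command

$ csplit -k joined.sfm '/^\\id /' '{100}'

(where joined.sfm stands for the concatenated file) breaks a joined file at each \id line into SFM-compatible files (xx00, xx01, and so on) that still need to be named properly. The file to split comes first, the pattern is quoted so the shell leaves the backslash and braces alone, and -k keeps the pieces even though fewer than 100 matches are found.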

Is there a Windows equivalent to csplit, that is, a known way to split a file by its contents from the command line?
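One platform-independent way to do the same kind of split, including on Windows, is a short Python script. The following is only a sketch under a few assumptions: the file is UTF-8, every book starts with an \id line followed by a three-character book code, and the output naming (CODE.sfm) is my own hypothetical convention, not something Paratext or USFM prescribes. The names split_usfm and split_file are made up for this example.

```python
import re
from pathlib import Path

# Matches an \id line and captures the three-character book code (e.g. MRK).
ID_LINE = re.compile(r"^\\id ([A-Z0-9]{3})\b", re.MULTILINE)


def split_usfm(text):
    """Return {book_code: book_text} for a concatenated USFM string.

    Anything before the first \\id line is dropped, and if a book code
    occurs twice the later occurrence replaces the earlier one.
    """
    matches = list(ID_LINE.finditer(text))
    books = {}
    for i, m in enumerate(matches):
        # A book runs from its \id line up to the next \id line (or the end).
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        books[m.group(1)] = text[m.start():end]
    return books


def split_file(src, out_dir):
    """Write one <CODE>.sfm file per book found in src (assumed naming)."""
    # utf-8-sig drops a leading byte order mark; newline="" keeps the
    # original line endings untouched on both read and write.
    with open(src, encoding="utf-8-sig", newline="") as f:
        text = f.read()
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for code, body in split_usfm(text).items():
        with open(out / f"{code}.sfm", "w", encoding="utf-8", newline="") as f:
            f.write(body)
```

Calling split_file("joined.sfm", "books") would write books/MRK.sfm, books/JHN.sfm and so on, so the renaming step that csplit leaves open is handled by reading the book code from each \id line.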

Note that I’m working with American English files. In the past, I’ve had problems with Chinese characters being corrupted during the join step using the Linux command line on some systems, even those that claimed POSIX compliance. If you try this at home and are working with Unicode < U+2100, I’d like to hear how it works.
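For the join step, one option that sidesteps character-set handling altogether is to copy the files as raw bytes, so nothing is decoded or re-encoded along the way. This is a minimal sketch, not a confirmed fix for the corruption I saw; the function name join_usfm and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def join_usfm(paths, dest):
    """Concatenate USFM files, in the order given, into dest byte for byte."""
    with open(dest, "wb") as out:
        for p in paths:
            data = Path(p).read_bytes()
            out.write(data)
            # Make sure the next book's \id marker starts on its own line.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
```

For example, join_usfm(["41MRK.sfm", "43JHN.sfm"], "joined.sfm") would produce one file with Mark followed by John. Because the bytes are never interpreted as text, the result should be identical whatever the system locale is, provided the inputs all use the same encoding.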

by (149 points)
