According to my old notes, I get the cleanest text via copy and paste, with my “View Setting” set to “unformatted”.
From PT8 I need entire chapters as plain text in UTF-8, with clean, original SFM (or USFM, if you prefer) markers, for post-processing with another tool.
It mostly works, but I typically get output like this (note the \p marker):
\c 2\zblo 7974\s1 Marii a ŋʊm Yeesu\p\v 1 Kaŋkǝlǝ́ nɖe na bʊtǝnyarɩ 'baya Ogʊsto a shee ganɔ wàà, ba tʊ̂r bʊtǝna mbʊɖee Room kǝbaja ba jɩ ma, kǝbɛrɛ baŋunii.\v 2 Ŋkǝlǝ́ nɖee ba lee ʊrɩtʊr ʊsǝbaka ɖe ma, Kiriniyɔsɩ a lee ka Nɖiyar ashee Sirii kaatǝna.
As you can see, my paragraph markers never carry a “closing space”. I believe this goes against the definition of proper SFM data. I looked it up again in usfmReference2_4.pdf:
SFMs start with a backslash character “\” and end with the next space.
My other tool also works with regexes, so it would be very helpful if I could reliably define each SFM as “backslash, whatever, closing space”.
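As a stopgap, a post-processing pass could insert the missing space before the text reaches the other tool. Here is a minimal Python sketch, assuming that marker names consist of letters, optional digits and an optional closing asterisk, and that the only problem is a marker directly followed by the next backslash. The names MARKER_WITHOUT_SPACE and add_closing_space are made up for illustration and are not part of PT8 or any USFM library. Nested character markers such as \+nd are not handled.

```python
import re

# Hypothetical pattern: a marker (\p, \s1, \zblo, \nd*) that is immediately
# followed by another backslash instead of a space.
MARKER_WITHOUT_SPACE = re.compile(r"\\[A-Za-z]+[0-9]*\*?(?=\\)")

def add_closing_space(text: str) -> str:
    """Append a space to every marker that runs straight into the next marker."""
    return MARKER_WITHOUT_SPACE.sub(lambda m: m.group(0) + " ", text)

if __name__ == "__main__":
    sample = r"\c 2\zblo 7974\s1 Marii a ŋʊm Yeesu\p\v 1 Kaŋkǝlǝ́ nɖe na"
    print(add_closing_space(sample))
```

After this pass, every marker in the sample matches the simple pattern "backslash, name, space", so a regex like `\\[A-Za-z]+[0-9]*\*? ` finds \c, \zblo, \s1, \p and \v alike. Markers at the very end of a line are left untouched, since a line break already terminates them.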
Please advise on the best way to get clean text with valid markers out of PT8.