
I have a TECkit mapping (made with the help of the MSEA non-Roman script initiative software section) for converting our non-Roman script (Tibetan) to Latin. It includes a rule for capitalisation after a full stop, and we have set it up in Paratext 8 for automatic transliteration.

It works fine when the full stop occurs in the middle of a verse or within a single marker's text; however, the rule fails when a backslash marker comes between the full stop and the next character, which should have been capitalised.
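The underlying problem is that a "capitalise the next letter" state has to survive across any markers, spaces and verse numbers that come after the full stop, which a purely character-level rule does not do on its own. A minimal sketch of that idea in Python, outside Paratext and TECkit (it is not a TECkit rule), assuming USFM markers are a backslash followed by letters/digits and an optional `*`:

```python
import re

# A USFM-style marker (e.g. \v, \q1, \nd, \nd*, \+nd), a run of whitespace,
# or any other single character.
TOKEN = re.compile(r"\\\+?[A-Za-z0-9]+\*?|\s+|.", re.DOTALL)


def capitalize_sentences(text: str, enders: str = ".?!") -> str:
    """Uppercase the first letter after sentence-final punctuation, carrying the
    'capitalise next' state across markers, whitespace, digits and quotes."""
    out = []
    pending = False  # set by a sentence ender, cleared by the next letter
    for tok in TOKEN.findall(text):
        if tok.startswith("\\") or tok.isspace():
            out.append(tok)  # markers and whitespace leave the state unchanged
        elif tok in enders:
            pending = True
            out.append(tok)
        elif tok.isalpha():
            out.append(tok.upper() if pending else tok)
            pending = False
        else:
            out.append(tok)  # digits (verse numbers), quotes, commas, etc.
    return "".join(out)
```

For example, `capitalize_sentences(r"kho song. \v 2 \q1 nga yin.")` returns `kho song. \v 2 \q1 Nga yin.`. It will also capitalise after abbreviations ending in a full stop and after a verse number with a letter suffix such as `\v 2a`, so the output still needs checking.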

Is there anything in Paratext 8 that we can use as an automatic tool to convert a lowercase letter after a full stop to its capitalised form at a sentence boundary?

Or is there a facility in SIL Converters that can do this?

Paratext by (115 points)

4 Answers


From your comments, I surmise that any character styling or a new line of poetry (\q1, \q2) will block the rule from applying. A verse number (\v #) may likewise be a problem.

I do not have an answer, but I have a similar problem that may have a related solution, so I am piggybacking on your post.

What I am trying to do is change the capitalization of US English headings to UK title capitalization. US title style capitalizes the first word and all other words except a limited set of short words, whereas UK style capitalizes only the first word and proper nouns. I have the logic to do this, but I need a way to apply it only to text in headings (\s1, \s2, etc.).
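As a rough sketch of restricting such a change to headings when working on the USFM text directly (rather than through an encoding converter), assuming headings are lines beginning with \s, \s1, \s2, etc., and with a hypothetical `PROPER_NOUNS` set standing in for the proper-noun logic mentioned above:

```python
import re

# A section-heading line: \s, \s1, \s2, ... then spaces/tabs, then the heading text.
# [ \t]+ (not \s+) so an empty \s line cannot swallow the following line.
HEADING = re.compile(r"^(\\s\d*[ \t]+)(.*)$", re.MULTILINE)

# Hypothetical stand-in for real proper-noun detection.
PROPER_NOUNS = {"God", "Jesus", "Christ", "Jerusalem"}


def to_uk_heading(heading: str) -> str:
    """Keep the first word as-is; lowercase every later word unless it is a proper noun."""
    words = heading.split(" ")
    out = words[:1]
    for word in words[1:]:
        core = word.strip(".,;:!?“”‘’\"'()")  # ignore surrounding punctuation
        out.append(word if core in PROPER_NOUNS else word.lower())
    return " ".join(out)


def convert_headings(usfm: str) -> str:
    """Apply to_uk_heading only to heading text; all other lines are untouched."""
    return HEADING.sub(lambda m: m.group(1) + to_uk_heading(m.group(2)), usfm)
```

Character markers inside a heading (for example \nd ... \nd*) are not treated specially here, so their contents are lowercased too unless listed.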

What I have found is that transliteration projects apply encoding converter changes only to vernacular text, and they are not aware of markup context. I have a CC table that applies capitalization changes properly outside of Paratext, but when I load it into a transliteration project it does not work correctly: the table makes changes to all vernacular text fields, while non-vernacular fields (\id, \rem) are not affected.

For the transliteration functionality to really be useful, we need a way to make changes that are aware of the markup context. Is that possible?

by (1.8k points)

Before Paratext allowed transliteration projects within Paratext itself, we used the TECkit Bulk SFM Converter tool, which is part of SIL Converters (http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&id=enccnvtrs). This tool uses the same *.tec file as the transliteration project in Paratext, but it can apply the conversion to specific SFMs only. The only problem with this tool is that it must be applied to the text manually, outside Paratext, whenever the text or the converter changes, and each Scripture file (book) must be saved manually: a pain when there are 66+ books to deal with.

Blessings,

Shegnada J.

Language Technology & Publishing Coordinator, SIL Nigeria

Complex Script Layout Specialist, GPS Dallas

Skype: Shegnada.james.


I am replying to this conversation rather than starting a new one, since it seems closely related to my question:

We have an NT published in Arabic script; there is demand to publish the same text in Latin script.

This will be a one-time transliteration, so it does not need to be updated dynamically within Paratext. But has anyone developed a RegEx to make sentence-initial characters uppercase? I have the same challenge anon024386 mentioned in the initial post: getting the pattern to look past all the possible SFMs to the next word following final punctuation…

by (175 points)

This is a side response to your email, not an answer to your question but a comment on your process.

In my experience, transliteration is never the simple one-time event we all wish it could be, and it is worth using the transliteration tool in Paratext. You will want to run all the Paratext checks again on the transliterated project, and you are going to find issues that will make you edit your converter and your source text multiple times. As well, the TECkit mapping is much stronger and more flexible than RegEx and can include your capitalization need within itself. Having done the process both outside Paratext (when doing it inside was not possible) and inside Paratext, I highly encourage you to do it within Paratext.

Thank you, Shegnada! Yes, I completely agree that no transliteration protocol will give a perfect result, and we will need to run the checks in Paratext.

We are using TECkit for transliteration, but it is inadequate because of some ambiguities when moving from Arabic to Latin script that cannot (so far) be resolved by an algorithm. So we are going to have to use some kind of dictionary- or word-list-based approach for the initial transliteration.

This solves the capitalization of proper names. But we are looking for an automated way to add capitalization conventions for the beginning of sentences, quotations, etc.

I hope this clarifies!


You can use the following code to capitalize letters in RegexPal:
^^^
Here is a regex that will capitalize the first letter of a sentence following [.?!]. It allows for intervening closing quotes, paragraph, poetry or list markup, blank lines, verse numbers, and sentence-initial quotation marks. It also capitalizes after \s and \p, which may occur chapter-initially or following a heading that does not have final punctuation. You will need to check words following ?" or !" manually, as these are sometimes capitalized and sometimes not.
Find:
(?<=([.?!][”’]*(\s+\\(b|p|q|li))*|\\(p|s)?)(\s+\\v\s\S+)?\s+[“‘]*)(\p{Ll})
Replace:
^^^\6
This code will capitalize words in lists marked with \li and poetry marked with \q. If you use more complex markup (footnotes, cross-references, \q1, \q2, \li1, \li2, \pm, \pc, \qc, \qr, etc.), you will need to make adjustments.
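For anyone who wants to run the same logic outside Paratext (for a one-time conversion, say), here is a rough Python translation. Python's built-in `re` module supports neither variable-width lookbehind nor `\p{Ll}`, so this sketch matches the prefix as group 1, puts it back unchanged, and uppercases the letter in group 2. It follows the description above (capitalizing after \p and \s) and has not been checked against RegexPal's own behaviour:

```python
import re

# Sentence-final punctuation with optional closing quotes, then any \b, \p, \q or \li
# markers; OR a \p or \s marker on its own (chapter start, or after an unpunctuated heading).
ENDING = r"(?:[.?!][”’]*(?:\s+\\(?:b|p|q|li))*|\\(?:p|s))"
VERSE = r"(?:\s+\\v\s\S+)?"  # optional verse number, e.g. \v 12
OPENING = r"\s+[“‘]*"        # whitespace (including blank lines), then any opening quotes

# Group 1 is everything before the letter, group 2 the letter itself.
# [^\W\d_] matches any Unicode letter (Python's re has no \p{Ll}).
SENTENCE_START = re.compile(rf"({ENDING}{VERSE}{OPENING})([^\W\d_])")


def capitalize_sentence_starts(usfm: str) -> str:
    # Uppercasing a letter that is already uppercase changes nothing.
    return SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), usfm)
```

As with the original pattern, paragraph and poetry markers are expected before the verse number (`\q \v 2`), and numbered markers such as \q1 are not covered until you add them.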
Note: curly quotes do not mirror, so they need to be reversed in the encoding converter when transliterating from RTL to LTR.
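To illustrate what "reversed" means, here is the swap as a standalone Python step. This is a sketch only: it assumes the RTL source uses ” / ’ where the LTR output needs “ / ‘ and vice versa, and in practice the swap belongs in the encoding converter as the note says.

```python
# Swap opening and closing curly quotes when going from RTL to LTR text.
# Assumption: every “ ” ‘ ’ in the input is a quotation mark, not an apostrophe.
SWAP_QUOTES = str.maketrans({"“": "”", "”": "“", "‘": "’", "’": "‘"})


def swap_curly_quotes(text: str) -> str:
    return text.translate(SWAP_QUOTES)
```

Because ’ also serves as an apostrophe in many Latin orthographies, a blanket swap of the single quotes may need restricting to quotation contexts.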

by (1.8k points)

CrazyRocky, very belated thanks for this RegEx for capitalization. (I am finally returning to this task after a long interruption.) I was able to modify the string as you suggested to accommodate our quotation marks « » as well as \q1 and \q2 markers, and it works great.

I noticed today that at Luke 2:49-50 we have a footnote between the closing punctuation and the closing quotation mark, which seems to prevent the capitalization of the first word of v. 50.

The footnote contains an alternate translation of the last phrase of the quotation. Would you suggest modifying the search string, or moving the footnote?
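If you would rather not move the footnote, one option is to allow an optional footnote immediately after the sentence-final punctuation. Here is a sketch in Python (whose `re` module needs a consuming prefix group rather than a lookbehind), assuming the « » quotation marks and \q1/\q2 markers mentioned above and a footnote written as \f ... \f*; whether the same change carries over to RegexPal's syntax is untested:

```python
import re

# Optional USFM footnote directly after the punctuation: \f ... \f*
# (lazy, so it stops at the first \f*).
FOOTNOTE = r"(?:\\f\s.*?\\f\*)?"

# Sentence-final punctuation, optional footnote, optional closing quotes (” ’ »),
# then any \b, \p, \q, \q1, \q2, \li, \li1 ... markers; OR a \p or \s marker on its own.
ENDING = rf"(?:[.?!]{FOOTNOTE}[”’»]*(?:\s+\\(?:b|p|q\d?|li\d?))*|\\(?:p|s))"
VERSE = r"(?:\s+\\v\s\S+)?"  # optional verse number
OPENING = r"\s+[“‘«]*"        # whitespace, then any opening quotes (“ ‘ «)

SENTENCE_START = re.compile(rf"({ENDING}{VERSE}{OPENING})([^\W\d_])")


def capitalize_sentence_starts(usfm: str) -> str:
    # Group 1 (the prefix) is kept as-is; group 2 (the first letter) is uppercased.
    return SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), usfm)
```

A footnote placed after the closing quotation mark instead would need a second optional footnote group after `[”’»]*`.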

by (175 points)

CrazyRocky, I’m away from my desk this week, but wanted to let you know that you can run an external process using a batch file (with a loop over each of the 66+ books) that converts only specific markers, using either CC or TECkit mappings. The only caveat is that you need to switch chapters in Paratext (which can even remain open while it runs) to refresh the displayed text.
My experience is that it is fast AND reliable.
I could send you a sample at the end of the week, but I’m guessing you’ve already got the know-how.
Blessings,
Mark
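The batch-file approach described above could equally be done in a small script. A rough Python sketch of the same idea, assuming the book files sit in one folder and match a `*.SFM` pattern (adjust to your project's actual file names), with `convert` standing in for whatever CC or TECkit call you use; calling those converters from Python is not shown here:

```python
import re
from pathlib import Path
from typing import Callable

# A line that starts with a USFM marker: group 1 = marker, group 2 = separator,
# group 3 = the rest of the line.
LINE = re.compile(r"^(\\[A-Za-z0-9]+)(\s?)(.*)$")


def convert_book(text: str, markers: set[str], convert: Callable[[str], str]) -> str:
    """Apply `convert` only to the text of lines whose opening marker is in `markers`."""
    out = []
    for line in text.split("\n"):
        m = LINE.match(line)
        if m and m.group(1) in markers:
            line = m.group(1) + m.group(2) + convert(m.group(3))
        out.append(line)
    return "\n".join(out)


def convert_project(folder: Path, markers: set[str], convert: Callable[[str], str],
                    pattern: str = "*.SFM") -> int:
    """Rewrite every matching book file in `folder` in place; return how many were processed."""
    count = 0
    for path in sorted(folder.glob(pattern)):
        text = path.read_text(encoding="utf-8")
        path.write_text(convert_book(text, markers, convert), encoding="utf-8")
        count += 1
    return count
```

For example, `convert_project(Path("path/to/project"), {"\\s1", "\\s2"}, my_converter)` (with a hypothetical `my_converter`) would touch only \s1 and \s2 lines. Only text on the same line as the marker is converted, so inline markers such as \v inside a paragraph need more care; back up the project before writing files in place.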
