All Cap text and undocumented RegEx

Question

ALL CAPITAL TEXT in a project is anathema to typesetters for various reasons. They can just use styles in InDesign to make text All Cap. However, when doing a Print Draft or putting text up on the DBL for YouVersion, there is no way to ensure that the text is All Cap without it actually being ALL CAPITALIZED. (An AllCap setting in the style sheet would be nice.)

So, I came up with a clever plan to use the undocumented RegEx Pal capitalization feature ^^^ in the DBLChanges.txt and PrintDraftChanges.txt files to capitalize lowercase letters in selected fields.

For instance, our Swahili translation wants inscription text, marked \sc …\sc*, to be All Cap rather than Small Cap. So I thought I could use this code in PrintDraftChanges.txt to export \sc text as All Cap:

in '(?<=\\sc\s)[^\\]*?(?=\\)': '(\p{Ll})' > '^^^\1'

But instead of Bwana becoming BWANA, it becomes B^^^w^^^a^^^n^^^a

So obviously the RegEx engine for the Changes.txt files does not support this nifty undocumented feature that works in RegEx Pal.
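
For reference, the effect the rule above was aiming for can be sketched outside Paratext. This is only a Python illustration of the intended transformation, not Paratext's own engine: the function name `uppercase_sc` is invented for the example, and it assumes that \sc spans contain no nested markers.

```python
import re

# Matches the text between "\sc " and the next backslash (normally "\sc*").
# This mirrors the lookbehind/lookahead context used in the rule above.
SC_SPAN = re.compile(r"(?<=\\sc\s)[^\\]*?(?=\\)")

def uppercase_sc(usfm: str) -> str:
    """Return the text with the contents of every \\sc span uppercased."""
    return SC_SPAN.sub(lambda m: m.group(0).upper(), usfm)

sample = r"\sc Bwana\sc*"
print(uppercase_sc(sample))  # prints: \sc BWANA\sc*
```

A script like this would have to run on exported text, so it does not replace a case-changing feature inside the Changes.txt files themselves.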

How very nice it would be if case changing were a documented feature of both Changes.txt files and RegEx Pal. I really do not want to keep All Cap text in our master publishing files, but if I don’t, there are times I will need to create it on the fly.

Any suggestions? Have you run into this and how have you decided to handle it?
Would anyone else love to get case changing capability in Paratext regular expressions?
Ought I submit this as a feature request?

Paratext Dec 20, 2017 asked by Kent Spielmann (1.8k points)
Dec 20, 2017 reshown

4 Answers

Best answer

It’s not stupid if it works:

in '(?<=\\sc\s)[^\\]*?(?=\\)': 'a' > 'A'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'b' > 'B'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'c' > 'C'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'd' > 'D'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'e' > 'E'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'f' > 'F'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'g' > 'G'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'h' > 'H'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'i' > 'I'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'j' > 'J'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'k' > 'K'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'l' > 'L'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'm' > 'M'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'n' > 'N'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'o' > 'O'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'p' > 'P'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'q' > 'Q'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'r' > 'R'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 's' > 'S'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 't' > 'T'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'u' > 'U'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'v' > 'V'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'w' > 'W'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'x' > 'X'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'y' > 'Y'
in '(?<=\\sc\s)[^\\]*?(?=\\)': 'z' > 'Z'
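
Typing one rule per letter gets tedious for alphabets that go beyond a-z. As a rough sketch, a short Python script can print rules in the same shape for any set of lowercase letters. The names `make_rules` and `letters` are invented here, and the output format simply copies the lines above; whether Paratext accepts every generated character is an assumption to test.

```python
import string

# The context expression, copied from the rules above.
CONTEXT = r"(?<=\\sc\s)[^\\]*?(?=\\)"

def make_rules(letters: str = string.ascii_lowercase) -> list[str]:
    """Build one "in CONTEXT: 'lower' > 'UPPER'" rule per letter."""
    rules = []
    for ch in letters:
        upper = ch.upper()
        if upper == ch or len(upper) != 1:
            # Skip caseless characters and one-to-many mappings such as ß.
            continue
        rules.append(f"in '{CONTEXT}': '{ch}' > '{upper}'")
    return rules

# Paste the printed lines into PrintDraftChanges.txt or DBLChanges.txt.
print("\n".join(make_rules()))
```

Passing a longer string, for example `make_rules(string.ascii_lowercase + "ŋɓɗ")`, would add rules for the extra letters of a given orthography.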

Dec 21, 2017 answered by lichti (864 points)

Hi CrazyRocky & anon719148

You can store multiple find/replace pairs in a file and run an
expression in RegExPal that points to that file. The syntax is to type
"-->" followed by the name of the file that holds the replacements in
the Find box, similar to the following:

-->\Users\HastyDR\Desktop\smallcaps.txt

Note: I don’t think it works if the path name contains any spaces, which
is why I chose to place the file on my desktop.

In that file place statements similar to the following for all lower to
uppercase combinations in your language.

(?<=\\sc\s[^\\]*?)a-->A
(?<=\\sc\s[^\\]*?)e-->E
(?<=\\sc\s[^\\]*?)i-->I

Once the file is created, run RegExPal, place -->filename in the Find
box, and run it as a replace. Note that this kind of replace ignores any
text that may be in the Replace box.

See the sample in the screen capture of the RegExPal screen below.

The way this change works is that it stops on the first match in a
chapter and selects the rest of the chapter. You will see each of the
changes in the replacement highlighted in pink. Notice the lower case
vowels in the highlighted yellow become uppercase in the highlighted pink.

Note that it changes the actual text in the project. Undo the changes by
switching the cases on the match and replace to get back to the original
encoding.
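
The list file and its "undo" counterpart described above can be generated rather than typed. The following Python sketch assumes the `find-->replace` line format shown earlier; `list_lines` and its parameters are hypothetical names, not part of RegExPal.

```python
def list_lines(letters: str, marker: str = "sc", undo: bool = False) -> list[str]:
    """Build 'find-->replace' lines for a RegExPal list file.

    With undo=True the two sides are swapped (uppercase back to lowercase).
    """
    # Same lookbehind context as the sample lines above.
    context = rf"(?<=\\{marker}\s[^\\]*?)"
    lines = []
    for ch in letters:
        upper = ch.upper()
        if upper == ch or len(upper) != 1:
            continue  # skip characters without a single uppercase form
        find, repl = (upper, ch) if undo else (ch, upper)
        lines.append(f"{context}{find}-->{repl}")
    return lines

print("\n".join(list_lines("aei")))
print("\n".join(list_lines("aei", undo=True)))
```

One caveat with the undo list: it lowercases every capital inside the marker, so a word that started out as Bwana would presumably come back as bwana. Keeping a backup of the project text is safer than relying on the reverse list alone.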

D anon467281

Global Publishing Services
Scripture Typesetting trainer & Regular Expression “specialist”
Dallas, TX

Jan 1, 2018 commented by anon467281 (571 points)
Jan 1, 2018 reshown

lichti · Answer 1 · 2017-12-21T07:48:04+0000

This seems to be wholly undocumented. Where can we get more info about it?
(Replying here, but this should be a topic on its own.)

LivingField · Answer 2 · 2022-11-08T22:42:50+0000

I did some testing, since this seems like it could be a useful feature, but it seems very buggy to me. In my tests, however, there were times when using Unicode character numbers seemed to work better (use BabelPad to convert), e.g.:

(?<=\\xt\s[^\\]*?)\u092F\u0942\u0939-->यूहन्‌ना

But I was never able to get RegExPal to find both lines; it seemed that it would search the top line only if “baba” doesn’t exist.

Phil_Leckrone · Answer 3 · 2022-11-09T14:44:44+0000

Here is another post on the use of the list in RegEx Pal.

All Cap text and undocumented RegEx
