Expressions for RegEx Pal

Question

Is there a way to get the expressions on the User menu in RegEx Pal as a text file? The list of user expressions is very long, and it would be useful to be able to search it, for example.

I’ve had problems adding my own expressions: it seems that you’re allowed to add a search/replace pair of expressions, but when I did this a while back, the replace expression got munged.

A question I have about regexes is: what is the definition of a word-forming character (\w)? Does it include every alphabet, or only Latin? What’s the best way to search for characters in one alphabet only? For Cyrillic, I could keep the string [абвг…АБВГ…] in a test file, and copy it into a regex, but that would result in some very long strings. Is there not a better way? Can you take a step back and define strings that can be referenced in a regex with codes as short as \w ?

Despite the large number of user expressions that are included, I’m sure there are loads more that users might like. For some things I’m working on, I could do with one to replace straight quotes with curly quotes, and choose the right one for opening or closing a quotation.

Maybe there could be two categories added to this forum: one for questions like “somebody help me write an expression to do this”; and another where people could post expressions that have been tested on real project data so that they’re watertight. The latter could help us build a library of useful expressions.

Paratext Sep 10, 2015 asked by wdavidhj (1.4k points)

10 Answers

I have lots of expressions added to my usermenu.txt file that I use
for training. However, I have not recently validated them and I’m midway
through cleaning up my file and have not completed this. I am teaching
next week, but after that I will try to clean them up.

I am looking to make sets of markers in separate files that can be added
to a usermenu as desired by an end user. I will also explain a bit about
the syntax of the usermenu file.

Once done I will post them as a file(s) in a Typesetters Community of
Practice (TCoP) google group. I am trying to make them sort of self
documenting. I have over 150 menu selections in groups like:

[VARIANT TEXT] (IMPLIED TEXT)
POSSIBLE TRANSLITERATIONS to get multiple words (up to 30 chars)
REMOVE space after the [^
INTROS \mt, \toc and OUTLINES
DASHES BETWEEN #'s versus WORDS
SCRIPTURE REF PREPARATION
BOOK NAMES
CH/VS SYNTAX
MARKERS
\r PARALLEL PASSAGE
QUOTES MARKS
\f FOOTNOTE BEFORE MAKING CHANGES MARK POINT IN PROJECT HISTORY IN
PARATEXT FIRST
\x XREFs

These all help with analysis, cleanup, and common conversions.

For example: footnotes/xrefs have to do with cleanup and standardizing
the syntax.

I guess these are a teaser of something that could be available in October.

anon467281
Global Publishing Services
Scripture Typesetting trainer & Regular Expression Specialist
Dallas TX

Sep 11, 2015 commented by anon467281 (571 points)

I don’t have either of these files in 7.5, and only the latter in 7.6.

I do have “c:\Program Files (x86)\Paratext 7\ParatextRegExPal\userMenuStd.txt” in both, but no userMenu.txt file in that folder.

And those files that I list contain very little text (20 lines, excluding blank lines) – nowhere near what was in the User menu when I first installed PT. Or maybe those 20 lines are the default set?

So is there something strange with my PT installation? Where is it getting the 100+ items it displays on the User menu?

Sep 17, 2015 commented by wdavidhj (1.4k points)

Doesn’t look like the attachment was added. I have placed the contents of my UserMenu.txt file after the line of equal signs at the bottom of this reply.

Here’s the latest. I am still not totally finished testing out all of the selections.

D anon467281

Global Publishing Services
Scripture Typesetting trainer & Regular Expression "specialist"
Dallas, TX

———CH/VS SYNTAX—————————#f#
———\v VS NUMBER SYNTAX—————————#f#
CV-01 –IlO— - find invalid characters in \v verse number (found letters I l O endash emdash)#ei#\v\s?\S*:::\S*([–IlO—]|–+)\S*
CV-02 L->1 - change letter lowercase “L” and uppercase “I” in verse number to number “1”#r#\v\s?\S+:::l#1
CV-03 o->0 - change letter uppercase “o” in verse number to number “0”#r#\v\s?\S+:::O#0
CV-04 en/em->dash - change en/em dash and duplicate dashes verse range character in verse number to singledash “-”#r#\v\s?\S+:::[\u2013\u2014]|–+#-
———CH:VS SEPARATOR—————————#f#
CV-11 a- ANALYZE ch:vs separators#csd#\d[:.] ?\d
CV-12 :- STANDARDIZE to : colon ch:vs separators#r#(\d)[:.] ?(\d)#\1:\2
CV-13 .- STANDARDIZE to . period/full stop as ch:vs separators#r#(\d)[:.] ?(\d)#\1.\2
———CHAPTER SEPARATORS “;”—————————#f#
CV-21 ; c- COUNT missing chapter separator#csd#([:.]\d+)( \d+[:.]\d+)
CV-22 ; a- ADD missing chapter separator#r#([:.]\d+)( \d+[:.])#\1;\2
CV-23 9;9 c- COUNT missing spaces after chapter separators “;”#csd#(\d+[:.][\d, \p{Pd}abc\p{Lu}]+)(\d?;)\d[^\]+
CV-24 9; 9 a- ADD missing space after chapter separator “;”–>"; "#r#(\d+[:.][\d, \p{Pd}abc\p{Lu}]+)(\d?;)(?=\d[^\]+)#\1\2
———CHAPTER BRIDGE—————————#f#
CV-31 a- ANALYZE chapter bridge separators#csd#(\d+[.:][\dabc]+(, ?)?[\dabc])\s[\p{Pd}]+\s*(\d+[.:]\d)
CV-32 n- STANDARDIZE to N-dash (u2013) as chapter bridge separator#r#(\d+[.:][\dabc]+(, ?)?[\dabc])\s[\p{Pd}]+\s*(\d+[.:]\d)#\1\u2013\3
CV-33 m- STANDARDIZE to M-dash (u2014) as chapter bridge separator#r#(\d+[.:][\dabc]+(, ?)?[\dabc])\s[\p{Pd}]+\s*(\d+[.:]\d)#\1\u2014\3
———BOOK SEPARATORS/BRIDGES—————————#f#
BS-41 ; L- List invalid book sep - then run CV-42 1st then CV-43#edi#(([12] ?)?\p{Lu}[\w/~]+ \d+[:.][\d,-—a-f]+)[^;\p{Pd}] ?([\w/~]+ \d+([:.][\d,-—a-f]+))
BS-42 ,_ 1st- DO FIRST-Change book separator from ", " -> "; "#r#([123] ?)?(\p{Lu}[\w/~]+ \d+[:.][\d,-—a-f]+), ?(?=([123] ?)?\p{Lu}[\w/~]+ \d+([:.][\d,-—a-f]+))#\1\2;
BS-43 ,|_ 2nd- DO SECOND-Add "; " missing book separator#r#([123] ?)?(\p{Lu}[\w/~]+ \d+[:.][\d,-—a-f]+) (?=([123] ?)?\p{Lu}[\w/~]+ \d+([:.][\d,-—a-f]+))#\1\2;
BS-44 +; - insert missing ; bk sep between vs-no. and \xdc#r#(?<=\x .?\x(dc|ot|nt|t) ).?\x*:::(\d+)( \x(dc|ot|nt|t))#\1;\2
BB-45 – - List all book bridges#edi#(?<=[12] ?\p{Lu}[\w]* \d+[:.][\d,-a-f]+)\p{Pd}+(?=[12] ?\p{Lu}[\w]+ \d+[:.])
BB-46 – - Make all book bridges be – en-dash (u2013) as in 2Sa 9.9–1Ki 9.9#r#(?<=[12] ?\p{Lu}[\w]* \d+[:.][\d,-a-f]+)\p{Pd}+(?=[12] ?\p{Lu}[\w]+ \d+[:.])#\u2013
BB-47 — - Make all book bridges be — em-dash (u2014) as in 2Sa 9.9—1Ki 9.9#r#(?<=[12] ?\p{Lu}[\w]* \d+[:.][\d,-a-f]+)\p{Pd}+(?=[12] ?\p{Lu}[\w]+ \d+[:.])#\u2014
———VERSE SEPARATOR—————————#f#
CV-51 c- ANALYZE verse separators ", " or “,” inside ch/vs refs–Are there more ", " or “,”? Go with majority.#cd#[:.]([\d, -:.abc]+):::,.
CV-52 ,- REMOVE space in ", " vs. sep.#r#[:.]([\d, \p{Pd}:.abc]+):::,\s#,
CV-53 , - ADD space after “,” in vs. sep.#r#[:.]([\d, :.abc]+):::,(?=\S)#,
———DASHES BETWEEN #'s—————————#f#
BV-61 - list verse bridges#csd#[\S-[(]]+\d[-]\d[\S-[\);]]+
BC-62 - list chapter bridges#csd#\S+\d[\u2013]\d[\S-[\);]]+

———MARKERS: ADD MISSING 1 TO LEVEL 1————————#f#
M-01 - identify level markers—such as \q,\q1,q2—in order to see level-1 inconsistencies#cs#(\(q|qm|li|mt|mte|ms|s|imt|is|iq|io))(\b|\d)
M-02 - change \q to \q1 (only if there are \q2)–modify to \mt and \s and rerun as needed#r#(\q\b)#\11
M-03 - Move section head & parallel passage ref from BEFORE to AFTER chapter numbers#r#(\c .?\r\n)(\s .?\r\n)(\r .?\r\n)?#\2\3\1
M-04 - Move section head & parallel passage ref from AFTER to BEFORE chapter numbers#r#(\s .?\r\n)(\r .?\r\n)?(\c .?\r\n)#\3\1\2

———DASHES BETWEEN WORDS—————————#f#
D-01 - analyze dashes between words#cs#[\p{Pd}]+
D-02 - analyze dashes between numbers#cs#(?<=\d)[\p{Pd}]+(?=\d)

———QUOTE MARKS—————————#f#
QT-00 all - view common QUOTATION marks#cu#[’<">\p{Pi}\p{Pf}]
QT-01 seq - list quote mark sequences (check for Valid/Invalid white space)#csd#[`’<">\p{Pi}\p{Pf}]+\s*[’<">\p{Pi}\p{Pf}]*
QT-02 mid - quotes used mid-word#csd#(?<=[\p{L}\p{M}])’<">\p{Pi}\p{Pf}
———LEGACY ENCODING—————————#f#
QT-10 <<<? - <<< must change these manually to either << < or < << BEFORE converting to curly quotes#f#<<<|>>>
QT-11 << to “#r#<<#“
QT-12 >> to ”#r#>>#”
QT-13 < to ‘#r#<#‘
QT-14 > to ’#r#>#’
——FIX DOUBLE QUOTES ENTERED AS “—————————#f#
QT-20 " - view open double inch mark#cs#(.)”([^\\s])|" (’)
QT-21 “->“ - fix open double inch mark#r#(.)”([^\\s])|" (’)#\1“\2
QT-30 " - view close double inch mark#cs#(.)"
QT-31 “->” - fix close double inch mark#r#(.)”#\1”
——FIX GLOTTALS ENTERED AS APOSTROPHES ꞌ—————————#f#
QT-40 a’a - find midword apostrophe/curly close#cs#(\w)’’
QT-41 ‘->ꞌ - chg midword apostrophe/curly close to curly close#r#(\w)’’#\1\uA78C\2
——FIX QUOTES ENTERED AS ‘—————————#f#
QT-50 ‘’=’ - apostrophes that are quote marks#cs# ‘([^’]?)’(\W)
QT-51 ‘’->’ - change apostrophes behaving like single quotes to single quotes#r# ‘([^’]?)’(\W)# ‘\1’\2
QT-52 ‘->‘ - convert word initial straight apostrophe ’ to open curly quote ‘#r#(\s)’(\p{L}*)#\1\u2018\2
——SPACES IN BETWEEN—————————#f#
QT-60 “ ‘ - add space between “‘#r#“‘#“ ‘
QT-61 “ ‘ - add space between ’”#r#’”#’ ”
QT-62 “ ‘ - add space between ‘“#r#‘“#‘ “
QT-63 “ ‘ - add space between ”’#r#”’#” ’
———APOSTROPHES—————————#f#
AP-70 wd’wd - display mid-word straight apostrophe ’ words#csd#(\p{L}+)’(\p{L}+)
AP-71 ‘->ʼ - convert mid-word straight apostrophe ’ to curly apostrophe ʼ \u02bc#r#(\p{L}+)’(\p{L}+)#\1\u02bc\2
AP-71 wd’->ʼ - convert word ending straight apostrophe ’ to curly apostrophe ʼ \u02bc#r#(\p{L}+)’(\W)#\1\u02bc\2

———\f FOOTNOTE————————— BEFORE MAKING CHANGES MARK POINT IN PROJECT HISTORY IN PARATEXT FIRST —————————#f#
.have you discovered and changed footnotes (\f) that are really cross refs to \x markup?#f#
.if not, under ———\x XREF——— below, run ——IS FOOTNOTE AN XREF?—— steps first.#f#

  ——CALLER ID—————————#f#

A \f + examine \f caller ids (prefer +)#csd#\f [^\ ]+
B add missing space after fn caller#r#(\f \S+)(\\w+)#\1 \2
C make \f caller + (auto generated) for all#r#(?<=\f )[^\+ ]+ ?#+
D1 \fr find original references missing \fr #f#(?<=\f \S )([\d:.,\p{Pd}a-d]+)
D2 \fr add missing \fr #r#(?<=\f \S )([\d:.,\p{Pd}a-d]+)#\fr \1
D3 no \fr find missing \fr and missing origin ref#f#(?<=\f \S )\f[^r]
D4 no \fr 9.9 add missing \fr with origin reference cv-sep . and ending :#r#(?s)(\c )(\d+)(.?)(\v )(\S+)([^\r]\f \S )(\f[^r])#\1\2\3\4\5\6\fr \2.\5: \7

  ——REMOVE UNNEEDED FOOTNOTE MARKERS AND SPACES—————————#f#

E \f?* remove unneeded embedded close markers followed by open embedded marker#r#\f[a-uw-z]*(\f.)(?!*)#\1

E \f?* remove unnecessary footnote closing markers (keep \f)#r#\f[\w-[iv]]+#

F \f?…\f? remove repeated \f? (duplicate with text in between)#r#(\f\w )([^\])\1(([^\])\1)?#\1\2\4
G sp\f* remove space from end of \f* closing marker#r# (\f*)#\1
H sp\f remove space before a footnote#r#(?s)(?<!\v \S+)\s+(\f\s)#\1
L \fk…\ft -> …\fk* replace closing fnote key markup with closing " \fk*"#r#(?<=\fk [^\])(\S) ?(\\S+)( \ft)?#\1\fk

  ——ORIGIN REF \fr—————————#f#

I \fr 9.9\ add missing ending space to end of \fr#r#(\fr \S[^\ ]\S)(\)#\1 \2
J1 9.9? examine \fr ch:vs syntax (just stuff before the following #csd#\fr \d+\D[^ \] +
J2 CV : make \fr ch/vs separator : (colon)#rd#(\fr \d+)[^:\d]([^ :\]+)#\1:\2
J3 CV . make \fr ch/vs separator . (period/full stop)#rd#(\fr \d+)[^.\d]([^ \]+)#\1.\2
K \f x \f*… examine footnote marker patterns “\f + \fr … \ft … \f*”#csn#\f .*?\f*

———SCRIPTURE REF PREPARATION—————————#f#
———BOOK NAMES——————possible short name and abbreviations for Scripture Reference Settings… in Paratext 7#f#
.Since \TOC2 most often matches \h, you are looking for abbrev. to use in \TOC3.#f#
.Extract pos

May 12, 2016 commented by anon467281 (571 points)
May 12, 2016 reshown

I wonder where I got this userMenu.txt – maybe it’s of use to some readers, since it seems to cover different ground to anon467281’s one.

It starts after the line of equal signs at the bottom of this reply.

================================================
1-Create TOC1 for 4 part book titles#r#\mt(\d* )(.?)\r\n\mt(\d )(.?)\r\n\mt(\d )(.?)\r\n\mt(\d )(.?)\r\n#\toc1 \2 \4 \6 \8\r\n\a\1 \2\r\n\a\3 \4\r\n\a\5 \6\r\n\a\7 \8\r\n
2-Create TOC1 for 3 part book titles#r#\mt(\d )(.?)\r\n\mt(\d )(.?)\r\n\mt(\d )(.?)\r\n#\toc1 \2 \4 \6\r\n\a\1 \2\r\n\a\3 \4\r\n\a\5 \6\r\n
3-Create TOC1 for 2 part book titles#r#\mt(\d )(.?)\r\n\mt(\d )(.?)\r\n#\toc1 \2 \4\r\n\a\1 \2\r\n\a\3 \4\r\n
4-Create TOC1 for 1 part book titles#r#\mt(\d )(.?)\r\n#\toc1 \2\r\n\a\1 \2\r\n
5-Create TOC1 - restore \mt’s from temporary \a’s#r#\a(?<=\d)#\mt
7-cleanup TOC1 extra spaces#r#( +)#
8-Create TOC2 and TOC3#r#(\h )(.?)(\r\n)(\toc1.?\r\n)#\1\2\3\4\toc2 \2\3\toc3 \3
9-swap \toc1 & \toc2 contents & add empty \toc3#r#(\toc1 )(.?\r\n)(\toc2 )(.?\r\n)#\1\4\3\2\toc3 \r\n
10-create \toc2 from \h & an empty \toc3#r#(?s)(\h )(.?)(\r\n)(\.?)(?=\mt)#\1\2\3\4\toc2 \2\3\toc3 \3
99-remove TOC’s#r#\toc\d.?\r\n#
————————————#f##
11-extract \r booknames (to be \toc2)#cu#(?<=\r ((|.?; ))[123][\p{L} ]{2,99}
12-extract \f, \ft book names (to be \toc3)#csu#(?<=\f + )(\d |\p{L})[^\;\d\s]|(?<=\f + (\w |\p{L})[^\;\d\s][^\;]; )(\w )\w{3,99}|(?<=\ft [^\]{1,40}; )(\d )\p{L}{3,99}|(?<=\ft )(\d )\p{L}{3,99}(?= \d)
Find lines that do not start with backslash code#f#\r\n[^\]
Find close codes preceded by a space# \\w+* ?
Find long poetic lines#f#\q(\s+\v )?[^\r]{70,}
Find non-word characters before footnote callers#f#[^\w]\f\s
Find missing capitals after period#f#.\s+["¿¡’?][a-z]
Count SFM clusters#c#\\S+
Count all cap words#cr#\b[A-Z][A-Z]+\b
Count footnote marker patterns#cni#\f .\f*
Count cross reference marker patterns#cni#\x .\x*
Count book reference abbreviations#c#[A-Z]\w\w?.(?=\s\d)
Count verse number patterns#ci#\v\s+\S+
Count chapter/verse patterns#crd#\d[-\d.:;, ]+
Extract and sort all lines#es#\.*
Extract outlines#e#(\id …|\io.)
Replace missing space after \v#r#\v(\d)#\v \1
Reformat paragraphs#r#\r\n(?!\)#
Extract (…)#e#([^)]+)|<[^>]+>
Extract parallel refs#ei#\r .
Extract cross references#e#\x .\x*
Extract all footnotes#e#\f .\f*
Change verse bridge , to -#r#(\v \d+),\s|,#\1-\2
Remove italics in intros#r#\ip \it (.)\it*#\ip \1
Convert hyphen to n-dash in chapter range#r#(.\d+)-(\d+.)#\1–\2
Add ID info#r#(\id …).\r\n#\1 - ??? NT [???] -Papua New Guinea 19?? (web version -2013 bd) \r\n
Add DBL ID info#r#(\id …).\r\n#\1 - ??? NT -country 19?? (DBL -2013)\r\n
Add tocs from \h \mt1#r#\h (.)\r\n\mt1 (.)\r\n#\h \1\r\n\toc1 \2\r\n\toc2 \1\r\n\toc3 \r\n\mt1 \2\r\n
Add tocs from \h \mt1 \mt2#r#\h (.)\r\n\mt1 (.)\r\n\mt2 (.)\r\n#\h \1\r\n\toc1 \2 \3\r\n\toc2 \1\r\n\toc3 \r\n\mt1 \2\r\n\mt2 \3\r\n
Add tocs from \h \mt2 \mt1#r#\h (.)\r\n\mt2 (.)\r\n\mt1 (.)\r\n#\h \1\r\n\toc1 \2 \3\r\n\toc2 \1\r\n\toc3 \r\n\mt2 \2\r\n\mt1 \3\r\n
Add tocs from \h \mt2 \mt1 \mt2#r#\h (.)\r\n\mt2 (.)\r\n\mt1 (.)\r\n\mt2 (.)\r\n#\h \1\r\n\toc1 \2 \3 \4\r\n\toc2 \1\r\n\toc3 \r\n\mt2 \2\r\n\mt1 \3\r\n\mt2 \4\r\n
Add tocs from \h \mt1 \mt2 \mt1#r#\h (.)\r\n\mt1 (.)\r\n\mt2 (.)\r\n\mt1 (.)\r\n#\h \1\r\n\toc1 \2 \3 \4\r\n\toc2 \1\r\n\toc3 \r\n\mt1 \2\r\n\mt2 \3\r\n\mt1 \4\r\n
Add tocs from \h \toc1 \mt1#r#\h (.)\r\n\toc1 (.)\r\n\mt1 (.)\r\n#\h \1\r\n\toc1 \3\r\n\toc2 \2\r\n\toc3 \r\n\mt1 \3\r\n
Add tocs from \h \toc1 \toc2 \mt1#r#\h (.)\r\n\toc1 (.)\r\n\toc2 (.)\r\n\mt1 (.)\r\n#\h \1\r\n\toc1 \4\r\n\toc2 \2\r\n\toc3 \3\r\n\mt1 \4\r\n
Add tocs from \h \toc1 \toc2 \mt1 \mt2#r#\h (.)\r\n\toc1 (.)\r\n\toc2 (.)\r\n\mt1 (.)\r\n\mt2 (.)\r\n#\h \1\r\n\toc1 \4 \5\r\n\toc2 \2\r\n\toc3 \3\r\n\mt1 \4\r\n\mt2 \5\r\n
Change Mdash to Ndash#r#(\d)—(\d)#\1–\2
Change \qr > \rq…\rq#r#\r\n\qr (.)\r\n# \rq \1\rq\r\n
Extract \r booknames#cu#(?<=\r ((|.?; ))[123][\p{L} ]{2,99}
Convert \ft to \x#r#(\f )(+ )(\fr )(\S+ )(\ft )(?!Kiñeke |Tati )([^\]\d)( LXX)?.?\f*#\x * \xo \4\xt \6\7\8.\x*
Convert \f to \x where book abbrevs are less than 4 letters long#r#(\f )(+ )(\fr )(\S+ )(\ft )(?!([a-z]|[A-Z]){4})([^\]{1,50}\d).?\f*#\x * \xo \4\xt \6\7\8\x*
Convert \f to \x where the lines are less than 50 chars#r#(\f )(+ )(\fr )(\S+ )(\ft )([^\]{1,50}\d).?\f*#\x * \xo \4\xt \6\7\8.\x*
——— most common DBL checks——————#f##
* \mt --> \mt1 #r#(\mt\b)#\11
* \q --> \q1 #r#(\q\b)#\11
* Find repeated \s(9) and \r (repeated marker used for a line break?)#cd#\(s\d?|r) .+\r\n\\1
* Replace repeated \s(9) or \r marker with a space#r#(\[rs]\d? [^\r]?)\s\r\n\s*(?=[^\])#\1
* Find line break in text following a \s or \r#ei#\[rs]\d? [^\r]?(?=\r\n[^\$])
* Remove hard line break in text of \s or \r#r#(\[rs]\d? [^\r]?)\s*\r\n(?=[^\])#
———White Space———#f##
* CNT _ \f*_ SPACE before closing note marker#c# +\[fx]+*
DELE \f* #r# +(?=\[fx]+*)#
* CNT _ \r_ SPACE before linebreak#c# +(?=\r)
DELE \r #r# +(?=\r)#
* CNT _ _ LINE INITIAL SPACE before any SFM#c#(?<=\r\n) +
DELE __ #r#(?<=\r\n) +#
* CNT “\r\nA” HARD LINE BREAKS in marker text#c# ?\r\n (?=[^\])
REPL " A" #r#\s\r\n(?=[^\])#
* CNT ~ PARATEXT NOBREAK SPACE)#c#~
REPL space #r#~#
* CNT // PARATEXT SOFT RETURN#c#\s*//\s*
REPL space #r#\s*//\s*#
* CNT \u00A0 & ~ NOBREAK SPACES#c#[\u00a0~]
REPL space #r#[\u00a0~]#
———\f FOOTNOTE—————————#f##
* Examine footnote marker patterns [\f + \fr … \ft … \f*]#csn#\f .?\f*
* Extract all footnotes#e#\f .?\f*
* Examine \f callers (prefer +)#c#\f \S+
* Make fn caller + (when it is something else)#r#(?<=\f )[^+]\S+ #+
* Find \f callers with missing space before next #c#\f \S+\\w+
* Examine \fr ch:vs patterns#cd#\fr [^\]*
* Find \fr with missing space before next #c#\fr \S+\
* Add missing \fr when reference already exists#r#(?<=\f + )([\d:.,-a-z]+)#\fr \1
* Examine how fn ends (with or without a “.”)#csd#.\f*
* Add the missing . when fn ends with mostly “.\f*” #r#([^.])(?=\f*)#\1.
* Remove space at end of footnote " \f*"–> “\f*#r# (\f*)#\1
* Count footnotes that are NOT after a word#c#[^\p{L}\p{M}]\f\s
———\r PARALLEL PASSAGE—————————#f##
* Count standalone \r’s versus \s \r sequences\r#ei#(?<!\s\d? .\r\n)\r .(?=\r\n)
* Remove hard line break from \r#r#(?<=\r .?)\r\n([^\r])# \1
———\x XREF—————————#f##
* Extract all cross references#e#\x .?\x*
* Examine cross reference marker patterns [\x + \xo … \xt … \x*]#csn#\x .?\x*
* Examine \x callers#c#\x \S+
Make xref caller + (when it is something else)#r#(?<=\x )[^+]\S #+
Make xref caller - (when it is something else)#r#(?<=\x )[^-]\S* #-
* Find \x callers with missing space before next #c#\x \S+\\w+
* Examine \xo ch:vs patterns#cd#\xo [^\]*
* Examine how \xt’s end (with or without a “.”)#csd#(?<=\xt [^\]).\x*
Remove space at end of xref " \x”–> “\x*#r# (\x*)#\1
———DASHES BETWEEN #‘s VS WORDS—————————#f##
* Find dashes#cu#[–-—\u2011]+
* Find dashes between numbers#cu#(?<=\d)[–-—\u2011]+(?=\d)
———QUOTES MARKS—————————#f##
common QUOTATION marks#cu#[’<”>\p{Pi}\p{Pf}]
list quote mark sequences (check for Valid/Invalid white space)#csd#[’<">\p{Pi}\p{Pf}]+\s*[’<">\p{Pi}\p{Pf}]+
quotes used mid-word#csd#(?<=[\p{L}\p{M}])’<">\p{Pi}\p{Pf}
<<< must change these to either << < or < << BEFORE converting to curly quotes#f#<<<
<< to “#r#<<#“
>> to ”#r#>>#”
< to ‘#r#<#‘
> to ’#r#>#’
add space between “‘#r#“‘#“ ‘
add space between ‘“#r#‘“#‘ “
add space between ”’#r#”’#” ’
add space between ’”#r#’”#’ ”
———INTROS \mt, \toc and OUTLINES—————————#f##
review outlines#e#(\id …|\io.)
Show outlines that don’t start with a \iot (maybe a \is)#ei#\r\n\[^i][^o][^t].?\r\n\io1.*
extract refs missing \ior at end of outline#ei#\io\d? .:::(\S+\d(?=\r))
find \ior type references in outlines#csd# \io\d? .\r:::[(]?(\ior )?(\d[\d.:-\u2013\u2014,abc]+)(\ior*)?[)]?
find \ior type references in outline without closing )#csd#\io\d .\r\n:::(\d+[;.]\d+[\d.:abc-\u2013\u2014]+)\r\n
add MISSING \ior markup around references in outlines#r#\io\d .:::(\S+\d+)(?=\r)#\ior \1\ior*
add MISSING ( ) to outline references#r#(?<=\io\d ).\r\n:::[(]?((\d+[;.]\d+[\d.:abc-\u2013\u2014]+)|\d+(-\d+)?)\r\n#(\1)\r\n
———ETEN (not found through ParaTExt checks————————#f##
unmarked text following a chapter (avoids schema

Jun 16, 2016 commented by wdavidhj (1.4k points)
Jun 16, 2016 reshown

Phil_Leckrone · Answer 1 · 2015-09-10T17:50:34+0000

NOTE: Some things that work in one tool don’t work in another tool. Some regex features that work in RegEx Pal do not work in Paratext searches.

I’m not sure about expressions getting “munged”.

\w Matches any word character. Equivalent to the Unicode character categories [\p{Ll}\p{Lu}\p{Lt}\p{Lo}\p{Nd}\p{Pc}]. If ECMAScript-compliant behavior is specified with the ECMAScript option, \w is equivalent to [a-zA-Z_0-9].

\p{name} Matches any character in the named character class specified by {name}. Supported names are Unicode groups and block ranges. For example, Ll, Nd, Z, IsGreek, IsBoxDrawing.

So for Cyrillic you could search for \p{Cyrillic}

The following website gives a list of the groups and block ranges: http://www.regular-expressions.info/unicode.html
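The Unicode-versus-ASCII behaviour of \w can also be observed outside Paratext. The sketch below uses Python's `re` module, not the engine inside RegEx Pal, so it only illustrates the idea: by default \w matches letters from any script, and an ASCII-only flag (comparable in spirit to the ECMAScript option mentioned above) narrows it to [a-zA-Z0-9_]. Python's `re` has no \p{...} support, so that part is not shown. The sample string is made up.

```python
import re

# Made-up sample: one Cyrillic word and one Latin word with digit/underscore.
sample = "слово word_1"

# Default (Unicode-aware) matching: \w covers Cyrillic as well as Latin.
unicode_words = re.findall(r"\w+", sample)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the Cyrillic word is skipped.
ascii_words = re.findall(r"\w+", sample, flags=re.ASCII)

print(unicode_words)
print(ascii_words)
```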

Sep 10, 2015 answered by Phil_Leckrone (8.8k points)
Sep 10, 2015 reshown

The website you reference says:

But neither PT nor RegEx Pal seem to work without the “Is” prefix on the group name.

[quote=“anon848905, post:2, topic:852”]
The following website gives a list of the groups and block ranges: http://www.regular-expressions.info/unicode.html[/quote]

The bit that gives you that list is quite a way down the page at http://www.regular-expressions.info/unicode.html#script (far enough down that I thought I was on an introductory page, and went on a wild goose chase elsewhere in the site, looking for the list – maybe I was just tired.) The sections immediately above and below this section are also worth reading.

That site has very good explanations of the various regex codes. But for the list of scripts, I found a site with more – with the character ranges also listed: https://msdn.microsoft.com/en-us/library/20bw873z(v=vs.110).aspx .

BUT … what about restricting it to the alphabet for a particular language? Can I define my own named class for our language somehow? Or would I need some third party regex app like RegexBuddy to do that?

Sep 17, 2015 commented by wdavidhj (1.4k points)

Pretty much every language has a different alphabet: English has 26 letters, and they all fall in two contiguous blocks in most character sets (EBCDIC excluded). Polish uses Q, V & X only for loan words, but has 9 extra letters, so if you’re looking for pure Polish words, you have 8 blocks from the Latin 26-letter alphabet (lower- and upper-case), and 18 individual letters, so your set becomes:

[a-pr-uwyząćęłńóśżźA-PR-UWYZĄĆĘŁŃÓŚŻŹ]

Now imagine that you write a regex where that set is used many times – it will become very unwieldy.

So, when I’m talking about sets for a particular language, I’m talking about tight ones, ones that exclude characters that are not used. For our (Cyrillic-alphabet) language, the set would look similar to the above, and I could write a different one for every language that uses Cyrillic.
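One way to keep such a set short when it is used many times is to build the regex in a scripting language and store the set in a named string. This is a minimal sketch in Python, not a RegEx Pal feature: the name `PL` and the helper `polish_only_words` are made up for illustration, and Polish is used only as the example alphabet.

```python
import re

# Polish alphabet (32 letters; q, v and x are not part of it).
POLISH_LOWER = "a-pr-uwyząćęłńóśźż"
POLISH_UPPER = "A-PR-UWYZĄĆĘŁŃÓŚŹŻ"

# Hypothetical short name for the tight character class.
PL = f"[{POLISH_LOWER}{POLISH_UPPER}]"

# A run of Polish letters that is not glued to any other word character.
polish_word = re.compile(rf"(?<!\w){PL}+(?!\w)")

def polish_only_words(text):
    """Return the words in text that consist solely of Polish letters."""
    return polish_word.findall(text)

print(polish_only_words("żółw taxi quiz łódź"))
```

The long set is written once; every pattern that needs it refers to `PL`, much as a built-in pattern refers to \w.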

Sep 17, 2015 commented by wdavidhj (1.4k points)

I presume that you can add both Find and Find-and-replace items to the User menu. Just now tried to add:

\[typed in Find box\]

… and got the error “Invalid userMenu.txt file entry” when I tried to use it.

If I try to add a Find-and-replace item and then use it, Pal takes me back to the Find function and enters a completely different string in the box to the one I saved.

Sep 17, 2015 commented by wdavidhj (1.4k points)
Sep 17, 2015 reshown

anon716631 · Answer 2 · 2018-03-15T11:04:22+0000

While we wait for the gurus, perhaps an easy way to do that would be in Paratext directly rather than RegexPal. There is a tick box in the Paratext Find menu which restricts searches to the text. Click on the More box and then tick “Match only in Verse Text”. Then just do a replace for \it > \add (no space or * will do it in one pass and it should be safe since it has )

I would test it past a known footnote \it location, though, I’ve never actually used that tick and so can’t quite attest to its exact behavior.

Blessings,

Shegnada

Language Technology and Publishing Coordinator, SIL Nigeria

Text Processing Specialist – Complex Script, GPS, SIL Intl

Skype: Shegnada..

[Email Removed]

+1 972 974 8146

Mar 15, 2018 commented by Shegnada (1.3k points)
Mar 15, 2018 reshown

Phil_Leckrone · Answer 3 · 2018-03-15T21:55:46+0000

Sorry again. The * after the period got clobbered.

The format should be:

\\f .*?\\f\:::\\add

So if you want to find the \it you could search for:

\\f .*?\\f\:::\\it

Phil_Leckrone · Answer 4 · 2018-03-16T10:38:54+0000

I give up! I notice that the * before the colon got clobbered too

\\f .*?\\f\*:::\\add

I guess the important part to notice is that you can use ::: to separate what is being searched for (on the right) from a context (on the left).
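The context idea can be imitated in a script, which may help when checking what such an expression is meant to do. The sketch below is Python, not RegEx Pal: `replace_in_context` is a made-up helper that first finds each context match (here a whole footnote) and then applies the search and replacement only inside it. The sample USFM line is invented.

```python
import re

def replace_in_context(text, context, find, replacement):
    """Apply find -> replacement only inside spans matched by context."""
    context_re = re.compile(context, flags=re.DOTALL)
    find_re = re.compile(find)
    # The outer sub visits each context span; the inner sub edits within it.
    return context_re.sub(
        lambda m: find_re.sub(replacement, m.group(0)), text
    )

# Invented sample: \it occurs once in verse text and once inside a footnote.
usfm = r"\v 1 Some \it text\it* \f + \fr 1.1 \ft a \it note\it*\f* end"

# Context (left of :::) is a footnote; the search (right of :::) is \it.
result = replace_in_context(usfm, r"\\f .*?\\f\*", r"\\it", r"\\add")
print(result)
```

Only the \it pair inside the footnote changes to \add; the pair in the verse text is left alone.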

Mar 16, 2018 answered by Phil_Leckrone (8.8k points)

I was converting it from a very small print 2-page landscape document to 4-page portrait. Should be a little easier to read. Not sure if it is valid to call a 4-page document a cheat sheet, but it lives on as such.

I have attached the Word document & PDF of the Regular Expression Cheat sheet.

Mar 26, 2018 commented by anon467281 (571 points)
Mar 26, 2018 reshown

Somehow the attachment did not make it on the copy of the email to [PT Support Site]. So I have created a Google doc & PDF at the following links:

PDF: https://drive.google.com/open?id=1bXP0j6Mv_UrD678QGXCdlp6XzdjqUUhX

DOC:

Everyone with an SIL/Wycliffe email should be able to access these links. If you do not have an SIL/Wycliffe email, let me know if clicking on the link works or fails.

Mar 27, 2018 commented by anon467281 (571 points)
Mar 27, 2018 reshown

Fool Running · Answer 5 · 2018-03-27T21:08:49+0000

This site only allows files with the following extensions: jpg, jpeg, png, gif, zip

I guess when sending a response via e-mail you don’t get a message about that restriction.

Kent Spielmann · Answer 6 · 2018-08-31T19:31:47+0000

Here is some RegEx code in userMenu.txt format you may find useful:
———Contexts—————————#r##
In footnotes#f#(?<=\\f\s).*?(?=\\f\*):::
Not in footnotes#f#(?<=\A|\\f\*)(?s).*?(?=\\f\s|\Z):::
In text (ignores markers and headings) #f#(?<=((\\\+?(i[^de]\w*|[smlpq]\w*|r|d|nb|cl|s|tr|tc\w+|v\s+\S+|f[^r]\w*|x\w+|add|bk|tl|sc|nd)[\*\s])|(f|x|fe|ef|c|fr)\s+\S\s*(?=\\)|\\[fx]\*))[^\\]*?(?=\\|\Z):::
In Ref fields#f#(?<=(\\\+?(xt|ior|fig|rq|zpa-xb)\s|\$r|mr|sr|ipr)\s))[^\\]*?(?=(\\\+?(xt|ior|fig|rq|zpa-xb)\*|\s\$):::
Not in Ref fields#f#(?<=\A|\\\+?(xt|ior|fig|rq)\*|\$?!(fr|cl|r|mr|sr|ipr|toc\d|(\+?(xt|ior|fig|rq)))\s))(?s).*?(?=(\\\+?(xt|ior|fig)\s|(\\(fr|r|mr|sr|rq|ipr|toc\d)\s[^\\]*?(?=\$))|\Z):::
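Entries like those posted in this thread appear to follow a `label#options#find#replace` layout, which makes a long userMenu.txt easy to search with a short script. The sketch below is Python and rests on that assumed layout, inferred only from the lines shown here: the names `MenuEntry`, `parse_menu_line` and `search_entries` are made up, and a label that itself contains `#` (such as “DASHES BETWEEN #'s”) would be split wrongly.

```python
from typing import NamedTuple

class MenuEntry(NamedTuple):
    label: str
    options: str
    find: str
    replace: str

def parse_menu_line(line):
    """Split one line into label, options, find and replace fields."""
    # Assumed layout: label#options#find#replace (trailing fields optional).
    parts = line.rstrip("\r\n").split("#", 3)
    parts += [""] * (4 - len(parts))
    return MenuEntry(*parts)

def search_entries(lines, keyword):
    """Return entries whose label contains keyword (case-insensitive)."""
    keyword = keyword.lower()
    entries = (parse_menu_line(line) for line in lines)
    return [e for e in entries if e.label and keyword in e.label.lower()]

lines = [
    r"CV-12 :- STANDARDIZE to : colon ch:vs separators#r#(\d)[:.] ?(\d)#\1:\2",
    "———CH:VS SEPARATOR—————————#f#",
    r"D-01 - analyze dashes between words#cs#[\p{Pd}]+",
]
for entry in search_entries(lines, "dashes"):
    print(entry.label, "->", entry.find)
```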
