0 votes

I have on occasion found a blank space following a proper noun when using the pn USFM marker.

For example: \pn Jesus \pn* . You can see the blank space following Jesus’ name. How can I search for proper nouns with a space following the name?

When I search for: \pn* I get about 7000 hits. When I search for \pn* with a space before it, I get about 1000 hits. However, the items it discovers do not have a space following the name. This really confuses me. Do you have any suggestions for how to search for this? One would think this should be an easy search but it doesn’t seem to be.

Paratext by (239 points)

3 Answers

0 votes
Best answer

The best way to search for this is with regular expressions. In the Paratext Find window, paste this:
regex:\s\\pn\*
You must “escape” both the \ and the * by putting a \ in front of each.
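To try the same pattern outside Paratext, here is a minimal Python sketch. It assumes Python's `re` module reads `\s`, `\\` and `\*` the same way as Paratext does for this pattern. The sample string and the variable names are made up for illustration, and the `regex:` prefix is left out because it belongs to the Paratext Find window, not to the pattern.

```python
import re

# Pattern from the answer, without the Paratext-only "regex:" prefix.
# \s = one whitespace character, \\ = a literal backslash, \* = a literal asterisk.
SPACE_BEFORE_PN_END = re.compile(r"\s\\pn\*")

# Hypothetical sample: the first name has a stray space before \pn*, the second does not.
sample = r"\pn Jesus \pn* went with \pn Peter\pn* ."

# Collect the text of every match so we can see exactly what was found.
hits = [m.group() for m in SPACE_BEFORE_PN_END.finditer(sample)]
print(len(hits), "hit(s):", hits)
```

In Python, `\s` also matches tabs and line breaks, not only the space character, so a closing marker that starts a new line would be reported as well.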
I have a general solution I use on all projects, which is to never allow a space before end character tags. You can find all possible instances using this in the Paratext Find window:
regex:\s\\\+?\w+\*
You can fix them with this find and replace in RegEx Pal.
Find:
(\s)(\u200F?\\\+?([^xf]\w*|[xf]\w+)\*)
Replace:
\2\1
Note that this regex skips spaces before the closing markers of footnotes (\f*) and cross-references (\x*). It also makes provision for the Unicode right-to-left mark (U+200F). Fixing extra spaces in RTL text is particularly difficult without this tool.
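The find and replace pair can be tried outside Paratext with a short Python sketch. This is only an illustration: it assumes Python's `re` module behaves like RegEx Pal for this pattern, and the function name and sample strings are hypothetical.

```python
import re

# Find pattern from the answer: group 1 is the stray whitespace, group 2 is the
# closing marker (optionally preceded by U+200F, optionally embedded with "+").
FIND = re.compile(r"(\s)(\u200F?\\\+?([^xf]\w*|[xf]\w+)\*)")
REPLACE = r"\2\1"  # closing marker first, then the whitespace


def move_space_after_end_marker(text: str) -> str:
    """Move whitespace that precedes a closing marker to just after it."""
    return FIND.sub(REPLACE, text)


# Hypothetical samples: a proper name, and a footnote whose closing \f* must stay untouched.
fixed = move_space_after_end_marker(r"\pn Jesus \pn* said")
footnote = move_space_after_end_marker(r"\f + \fr 1:1 \ft Some note \f* next")

print(repr(fixed))
print(repr(footnote))
```

The moved space lands next to the space that already followed the marker, which gives the doubled space that Paratext then merges into one.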

by (1.8k points)

Actually, the quickest way to find \b before \pn is with the Marker Inventory. Tick show preceding marker and sort on \b or \pn. It is a very handy tool. Don’t forget to right-click and copy the references to the list window.
Blessings, Shegnada

I think there is some confusion here about the wording “blank space”.
My understanding is that @Clear7419 was asking about “searching for a space before \pn*”, not “searching for a blank line”.

@CrazyRocky You are correct, I was just searching for a blank space and not a blank line. Thanks for your helpful regular expression. Can you explain what you mean by escaping both \ and * with a \ ? I pasted the expression but it found examples with no blanks.

From a Regex Tutorial - Literal Characters and Special Characters:

Because we want to do more than simply search for literal pieces of text, we need to reserve certain characters for special use. There are at least 12 characters with special meanings:

the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {. These special characters are often called “metacharacters”. Most of them are errors when used alone.

If you want to use any of these characters as a literal in a regex, you need to escape them with a backslash. If you want to match 1+1=2, the correct regex is 1\+1=2. Otherwise, the plus sign has a special meaning.

And in Paratext, we often need a literal \ in a regex, because every USFM marker starts with one. So to search for \p in a regex, you type \\p.
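To see what escaping does to a marker, here is a small Python sketch. Python is used only as a convenient way to experiment and is not part of Paratext; `re.escape` is a standard-library helper that adds the backslashes for you, and the variable names are hypothetical.

```python
import re

marker = r"\pn*"             # the literal USFM text we want to find
pattern = re.escape(marker)  # let Python put a \ in front of each special character

print(pattern)

# The escaped pattern matches the marker text exactly as written.
found = re.search(pattern, r"\pn Jesus\pn* said")
print(found.group())
```

The escaped result is the same `\\pn\*` that appears in the Paratext searches in this thread.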

(It is useful to learn about escaping, as even in this forum-tool, certain characters will not render, unless you either escape them or use the “preformatted text” feature via the </> icon.)

I believe you are referring to this code:

My generalized expression also searches for embedded markers, which are particularly common in footnotes. For example, valid USFM requires that in footnotes the \pn ...\pn* marker be marked as embedded, using a plus character after the backslash:
\f + \fr 8:5 \ft I want to put \+pn Jesus \+pn* in a footnote.\f*
In this example there is a space before and after \+pn*. The RegexPal code above will find the space before \+pn* and move it after \+pn*. If that results in two spaces in a row, Paratext will automatically combine them into one.
As @Tim states, the plus sign + must be escaped, but in these searches it is also optional, so the code for this is \+?.
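The effect of the optional `\+?` can be checked with a minimal Python sketch. It assumes Python's `re` module treats this pattern the same way as the Paratext Find window; the variable names are hypothetical, and the footnote is the example from this post.

```python
import re

# General search from the answer: whitespace, backslash, optional "+", marker name, "*".
SPACE_BEFORE_END_MARKER = re.compile(r"\s\\\+?\w+\*")

plain = r"\pn Jesus \pn* said"
embedded = r"\f + \fr 8:5 \ft I want to put \+pn Jesus \+pn* in a footnote.\f*"

# The same pattern finds the closing marker with and without the "+".
plain_hit = SPACE_BEFORE_END_MARKER.search(plain)
embedded_hit = SPACE_BEFORE_END_MARKER.search(embedded)

print(repr(plain_hit.group()))
print(repr(embedded_hit.group()))
```

Opening markers such as `\fr` or `\+pn` are skipped because the pattern requires the closing `*` directly after the marker name.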
I hope this addresses your question.

+1 vote

I did this the old-fashioned way, searching for each final letter + a space + the USFM marker. It did the job.

by (239 points)
0 votes

Hello.

Different formulas can be constructed. One that answers your question and that you can use directly in the Paratext search engine is:
regex:\p{L}\s\\pn\*
where:
regex: is the command to use regular expressions.
\p{L} will search for any letter (in this case the end of a word).
\s will search for a space
\\pn\* will search for \pn*. It must be written like this for the regex to recognize it as literal text.
This way you will have a list of all the places where you have a letter followed by a space and \pn*.

If you want the search to extract the whole word before the space, you just need to add a + sign like here: regex:\p{L}+\s\\pn\*
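The difference between the two formulas can be seen in a short Python sketch. One caveat: Python's built-in `re` module does not support `\p{L}`, so the sketch uses `[^\W\d_]` as an approximate stand-in for "any letter". That substitution, the sample string and the variable names are assumptions made for illustration and are not part of the Paratext search.

```python
import re

LETTER = r"[^\W\d_]"  # approximate stand-in for \p{L}: a word character that is not a digit or "_"

LAST_LETTER = re.compile(LETTER + r"\s\\pn\*")       # like regex:\p{L}\s\\pn\*
WHOLE_WORD = re.compile(LETTER + r"+\s\\pn\*")       # like regex:\p{L}+\s\\pn\*

# Hypothetical sample: only the first name has a space before \pn*.
sample = r"\pn Jesus \pn* and \pn Peter\pn*"

last = LAST_LETTER.search(sample).group()
whole = WHOLE_WORD.search(sample).group()

print(repr(last))
print(repr(whole))
```

Without the `+`, the match starts at the last letter of the name; with it, the match covers the whole word before the space.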

Translated with DeepL.

by (840 points)
