0 votes

After delving into the laborious, cluttersome demands of tagging the Words of Jesus with \wj … \wj* to make a Paratext custom tool to apply them to a Paratext project (according to character assignment in a related Glyssen project), I longed for a tagging system that was more intuitive for the translator.

Here’s my gripe with the system of \wj … \wj* tagging: It’s designed for the convenience of the rendering software, not for the convenience of the translator. To successfully apply these tags, you must close and reopen the tags at every verse number. And at every paragraph break. And then, you need to add a plus (+) symbol to every word/character marker within it, like \+w …\+w* and \+nd …\+nd*.

The most surprising to me is that if you want to pass the markers check, you even need to close and reopen the tags at every footnote!! Is that really necessary, or is the markers check generating all these warnings for nothing?
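
To make the rules above concrete, here is a minimal Python sketch of what "close and reopen" means in practice. It is an illustration only, not the custom tool mentioned in this post: the names `BREAK`, `wrap`, and `expand_wj` are hypothetical, and the set of interrupting markers is a simplified assumption rather than the full USFM rule set.

```python
import re

# Markers assumed (for illustration) to interrupt a \wj span. This is a
# simplified subset, not the full USFM marker inventory.
BREAK = re.compile(
    r"""
    \s*(?:
        \\(?:p|m|q\d?|b)\b           # paragraph and poetry markers
      | \\v[ ]\S+[ ]?                # verse number
      | \\(f|x)[ ].*?\\\1\*          # a whole footnote or cross-reference
    )
    """,
    re.VERBOSE | re.DOTALL,
)


def wrap(text: str) -> str:
    """Wrap one run of text in \\wj ... \\wj*, leaving outer whitespace outside."""
    core = text.strip()
    if not core:
        return text
    start = text.index(core)
    return text[:start] + "\\wj " + core + "\\wj*" + text[start + len(core):]


def expand_wj(span: str) -> str:
    """Give every text run between interrupting markers its own \\wj pair."""
    out = []
    pos = 0
    for m in BREAK.finditer(span):
        out.append(wrap(span[pos:m.start()]))  # spoken text before the marker
        out.append(m.group(0))                 # the marker itself, untouched
        pos = m.end()
    out.append(wrap(span[pos:]))
    return "".join(out)


sample = (
    "Blessed are the poor.\n"
    "\\v 4 Blessed are those who mourn."
    "\\f + \\fr 5.4 \\ft Or grieve.\\f* They will be comforted."
)
print(expand_wj(sample))
```

Even this three-sentence sample needs three \wj … \wj* pairs, which is the clutter being described.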

This made me think that what the translator really needs is a way to just tag where a speaker starts and stops, never mind the paratextual content or formatting issues. This is also important for keeping track of speakers for multi-voice audio recordings, since revisions to the text can shift things around and force Glyssen to resynchronize.

So I was delighted to be introduced to the milestones feature in USFM3 which is designed for exactly this purpose. But now I have so many more questions…

First, will this replace \wj marking? If a project using \qt-s …\qt-e markers wants a red-letter publication, will it need to add \wj marking in addition? That seems so redundant and messy.

Second, can we assume that paratextual fields, such as \s, \r, \f, and \x, constitute inherent exceptions to the active \qt-s? That is, you could put \qt-s |who="Jesus"\* at the start of the Sermon on the Mount, and put \qt-e\* at the end of it, and you’d be done. One pair of tags rather than 120 or more, depending on paragraphs and footnotes, and no need to mess with nesting all the glossary-tagged terms. It seems that whatever the computer can figure out automatically (to infer red lettering) doesn’t need markers redundantly cluttering the text.
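
As a rough illustration of the inference being asked about, the Python sketch below reads one pair of milestones and works out which text runs belong to the speaker, skipping headings and notes. Everything here is an assumption for discussion: `TOKEN`, `spoken_runs`, and `SAMPLE` are hypothetical names, the marker list is deliberately small, and no existing tool is claimed to behave this way.

```python
import re

# One alternative per kind of marker the sketch cares about.
TOKEN = re.compile(
    r"""
      (?P<start>\\qt-s\s*\|[^\\]*?who="(?P<who>[^"]*)"[^\\]*\\\*)   # opening milestone
    | (?P<end>\\qt-e[^\\]*\\\*)                                     # closing milestone
    | (?P<note>\\(?P<n>f|x)\s.*?\\(?P=n)\*)                         # footnote or cross-reference
    | (?P<heading>^\\(?:s\d?|r)\s[^\n]*)                            # section heading or reference line
    | (?P<struct>\\(?:p|m|q\d?|b)\b|\\[cv]\s\S+)                    # paragraph, chapter, verse markers
    """,
    re.VERBOSE | re.DOTALL | re.MULTILINE,
)


def spoken_runs(usfm: str) -> list[tuple[str, str]]:
    """Return (speaker, text) runs lying between \\qt-s and \\qt-e milestones.

    Headings and notes are treated as exceptions to the active speaker,
    which is the assumption under discussion, not a documented rule.
    """
    runs = []
    speaker = None
    pos = 0
    for m in TOKEN.finditer(usfm):
        text = usfm[pos:m.start()].strip()
        if speaker and text:
            runs.append((speaker, text))
        if m.group("start"):
            speaker = m.group("who")
        elif m.group("end"):
            speaker = None
        pos = m.end()
    tail = usfm[pos:].strip()
    if speaker and tail:
        runs.append((speaker, tail))
    return runs


SAMPLE = r"""\c 5
\p
\v 2 He began to teach them: \qt-s |who="Jesus"\*Blessed are the poor in spirit.\f + \fr 5.3 \ft Or humble.\f*
\s A heading
\p
\v 4 Blessed are those who mourn.\qt-e\* The crowd listened.
"""

for who, text in spoken_runs(SAMPLE):
    print(who, "|", text)
```

A publishing tool could, in principle, red-letter exactly those runs without any \wj markers in the source text.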

Third, what are the best practices for the “who” attribute? Will tools expect these to be English names/descriptors, such as “Jesus” and “spies from Pharisees and Herodians”? Should these be the same character names as used by Glyssen?

Fourth, to what extent should these be marked in contexts where no disambiguation is required? In certain verses, Jesus is the only character who could be speaking, while in others, more than one person speaks, and disambiguation is actually necessary. Is it better not to unduly clutter the text in such cases with speaker tagging?

OK, that’s all my questions for now.

I’d also be interested in testing my WordsOfJesus tagging-tool on a wider variety of projects that have done character assignments in a Glyssen project. If you have such a project, please let me know if you would be willing to share a copy with me. -Thanks!

Paratext by (286 points)

3 Answers

0 votes
Best answer

This is really useful feedback, thank you.

I apologise that my reply doesn’t offer solutions to your questions; rather, it raises questions for you about your issues and concerns.

I assume you’re a project admin and it sounds like you’re comfortable using markup and learning the associated rules. I’d like to ask a few questions about you and your team (if you have one) and how you use Paratext:

  1. Are there other team members on your project who find USFM markup to be an obstacle they can’t overcome (or prefer not to learn about)?
  2. Do you mostly use unformatted view or standard view when working on more complicated nested markup?
  3. Would you find it helpful to not need to edit markup and instead highlight a portion of text to be given a certain attribute or set of attributes?
  4. If we did something like the above, would it be helpful if Paratext figured out how to most simply represent your choices about attributes behind the scenes?
  5. Would the item above seem like a loss of control over the text?
  6. Would you want to be able to edit the markup that Paratext created in the point above (and see the usual warnings if edits caused errors)? Or would simply viewing it be sufficient?

I’m interested in knowing whether teams care about the markup behind their text, or whether they care more about ensuring the correct attributes end up attached to words or portions of text and would prefer not to worry about the markup (and the obstacle of learning USFM, which is mostly documented in English).

by [Moderator]
(1.1k points)


Hello, IanH. There are people on other projects who are terminally confused by the + notation rules, and will probably never get them right without someone like me to clean up the mess. As a software developer, I understand the reasons for the + notation, but still find it to be an unnecessary kludge, like the requirement to stop and start \wj …\wj* at every footnote, etc. It gets SUPER messy when \w is used for Strong’s number tagging. And yes, there are real people doing that with minority languages. So, highlighting and applying attributes and asking Paratext to deal with the markup in the background might be an improvement. It could be offered as an option, alongside the current standard and unformatted views, which could still be there. I often drop back to unformatted view when I’m trying to figure out what is causing a schema error. So I don’t think going all in on highlighted ranges without the option to see and edit the markup would be great at first, but if done well it would be an improvement. It would be more like Microsoft Word and less like the old WordPerfect and its codes, and we know how the market played out between those two. Of course, compatibility with existing texts was key to Microsoft’s success, but so was good UI design.
– Paratext user and developer of Haiola

Hi IanH,

To answer your questions:

I do Paratext training and support for a wide variety of teams from multiple partner organizations across South Asia. I also train typesetters and app developers, and do some other language-technology training. It’s not all that often that I build custom tools to plug into Paratext, like I’m doing at the moment. I tend to work more with intermediate and advanced teams that are comfortable with USFM than with the MTT teams that probably struggle more with the USFM learning curve.

Personally, I hop back and forth all the time.

I can imagine that for some users, that would be lovely.

Absolutely, for many of the teams I work with. You can’t apply changes widely if you can’t get your fingers on the markers. They’d be back to clicking around forever to manually make changes.

Teams definitely care about the markup behind their text. It’s just that in some cases, like \wj, the formatting rules are quite a pain. It would be nice if Paratext took care of some of those details for us and hid the clutter.

0 votes

That depends on the software used for publishing. I don’t think Publishing Assistant handles milestones like you want (you’d have to check with that team to make sure).

USFM 3.0 milestones were mostly created to aid in doing audio productions (e.g. interfacing with Glyssen) to help determine who needs to speak at a particular point and is not designed to handle the formatting of a printed publication.

Yes, it’s necessary. It’s a limitation of the USFM format.

by [Expert]
(16.2k points)


Thanks for this. Please help me understand: Does the USFM format have this limitation when a cross-reference falls within the words of Jesus? They are structured the same. And PA and SAB render the words of Jesus just fine regardless of whether or not there’s a cross-reference or footnote that appears within the \wj tags. But the marker check generates a warning for the footnote but not the cross reference. What’s the difference?

0 votes

Perhaps I can help out with some regex to insert all of those intermediate \wj and \wj* markers.
The regex closes and then reopens \wj before and after paragraph markers, verse numbers, footnotes, and section headings.
To use:

  1. Put a \wj at the beginning and \wj* at the end of any text you want marked.
  • If the words of Jesus span paragraphs, you do not need to close the \wj.
  • However, for multi-chapter monologues you will need to put a closing \wj* at the end of the chapter and an opening \wj at the beginning of the next chapter.
    You can do this for as many sections of text as you want.
  2. Once you have marked up the text, run the following in Regex Pal:
    Find:
    (?<=\\wj\s)([^\\]|\\(?!wj\*))*?(?=\\wj\*):::((\s*(\\(b|p\w*|mi?|q\w*)\s|(\\(m?r|m?s\w*)\s.*)+|\s+\\v\s\S+\s|\\(x|ef|f|add)\s.*?\\(x|ef|f|add)\*))+)(\s*)
    Replace:
    \\wj*\1\9\\wj (be sure to include a space at the end of the replace)

  3. If you happen to have character markup inside your red letter text, like \w Pharisees|Pharisee\w* or \tl Talitha cum \tl*, you will need to add embedded character markup to the tags, which is to say add a + after the \:
    Run this regex:
    Find:
    (?<=\\wj\s)([^\\]|\\(?!wj\*))*?(?=\\wj\*):::\\(\w+)([^\\].*?)(\s*)\\\1\*
    Replace:
    \\+\1\2\\+\1*\3
    This will produce: \+w Pharisees|Pharisee\+w* and \+tl Talitha cum\+tl*
    Note the space after cum is moved to the right side of \tl*

This is the code to be inserted in userMenu.txt so you can run it from the RegexPal User menu:
———\wj Cleanup—————————#f#
\wj*...\wj#r#(?<=\\wj\s)([^\\]|\\(?!wj\*))*?(?=\\wj\*):::((\s*(\\(b|p\w*|mi?|q\w*)\s|(\\(m?r|m?s\w*)\s.*)+|\s+\\v\s\S+\s|\\(x|ef|f|add)\s.*?\\(x|ef|f|add)\*))+)(\s*)#\\wj*\1\9\\wj
fix embedded markers#r#(?<=\\wj\s)([^\\]|\\(?!wj\*))*?(?=\\wj\*):::\\(\w+)(?s)(.*?)(\s*)\\\1\*#\\+\1\2\\+\1*\3
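
For anyone who wants to try the embedded-marker step outside RegexPal, here is a rough Python sketch of the same idea. It is not a port of the regex above: `WJ_SPAN`, `NESTABLE`, and `nest_markers` are hypothetical names, the list of character markers (w, nd, tl, add) is an assumed subset, and it does not move the trailing space the way the RegexPal version does.

```python
import re

# A complete \wj ... \wj* span, split into opener, content, and closer.
WJ_SPAN = re.compile(r"(\\wj\s)(.*?)(\\wj\*)", re.DOTALL)

# Opening or closing character markers to nest. Markers already written
# as \+w are not matched, so running the function twice is harmless.
NESTABLE = re.compile(r"\\(w|nd|tl|add)(?=[\s*])")


def nest_markers(usfm: str) -> str:
    """Add the + prefix to character markers found inside \\wj spans only."""
    def fix(m: re.Match) -> str:
        return m.group(1) + NESTABLE.sub(r"\\+\1", m.group(2)) + m.group(3)
    return WJ_SPAN.sub(fix, usfm)


line = r"\wj Woe to you \w Pharisees|Pharisee\w* and teachers of the Law!\wj*"
print(nest_markers(line))
```

Markers outside a \wj span are left alone, so the function can be run over a whole chapter rather than one span at a time.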

by (1.8k points)

Thanks, CrazyRocky! Great to hear from you again! And what cool regexes!

Hey, a couple things that caught my eye:

I see that you exclude the \add text from \wj marking. Is that because it’s not really an explicit word of Jesus? I’ve been treating it like other character-level formatting, like \nd and \w. But I don’t know whether it’s customary to take the red ink away when a translator adds an implicit word to Jesus’ quote.

Also, I’ve never thought of nesting \wj inside \w like this before. Interesting! I’m curious why you prefer it this way. I’ve always encouraged users to nest it the other way around:
\wj Woe to you \+w Pharisees|Pharisee\+w* and teachers of the Law!\wj*
Are there particular advantages to doing it with \wj on the inside?

Actually, I’ve just put together a much faster and more reliable way to insert all the \wj tags in the project and nest the \w markers and such. It’s a custom tool for Paratext’s Custom Tools menu. Basically, Glyssen already knows which verses Jesus should be speaking in, and there are just a handful of ambiguous cases to manually disambiguate, so there’s no need to manually go through the text inserting any \wj markers. We use Glyssen’s work to apply the \wj tags wherever they are needed.

But where are they really needed? That was my question, and I saw no instructions in the USFM docs telling users what these markers must close and reopen around. Cross-references work fine without such clutter, as do footnotes, but the markers check complains in the case of footnotes. Now as I’m looking into this further, I realize that even though the marker check does not complain about cross-references, the schema check does. I still think it would be best if the marker check treated footnotes and cross-references consistently. Anyway, I guess I’ll have to make closing and reopening \wj around cross-references and footnotes both standard in my tool. Let me know if you’d be interested in testing it out. -Thanks!

That depends I suppose. I have a version where the \add content was commentary so I wanted to not mark these as red. So just remove \add from the regex and it will be included in the red letter.

I looked at my project where this is an issue and found that that is the way I do it too. This must be old code. I will remove it from my userMenu.txt file and edit it out of the above.
I will replace with code to add a + to all of the embedded character tags.

I’ll jump in and say this system of having to open/close \wj markers repeatedly has always bothered me for the exact reasons given. In order to pass checks I’ve gone through and fixed markers, but no one else on the team is able to understand the logic of why or when closing and reopening a \wj marker is necessary. Or why nesting is necessary (or exactly when it is and isn’t necessary).

I think the end goal of PT should be that normal users can use it without constant tweaking by high level support staff. Marking the words of Jesus definitely seems like something that the regular user ought to be able to do.

The fact that CrazyRocky has been able to create more or less successful regular expressions to automate this should tell us that it’s possible to hide all that markup behind the scenes: the USFM files could be marked in a minimal way, and PT or publishing programs could be programmed to fill in the extra information when needed and on the fly.
