0 votes

Should characters used in foreign words (sometimes used in footnotes) be included in the character inventory? Or is there some way to accept the “errors” when they show up in the basic checks as invalid?

Paratext by (105 points)

6 Answers

+1 vote
Best answer

Coming to this conversation a bit late, but I completely agree with davidc78. USFM should have (minimally) a marker (maybe \fo … \fo* - “o” for “other language”) for words used in a footnote that are in a different language, whether it’s Greek, Hebrew, a regional LWC, the national language, an international language, or whatever. Words marked in that way should be excluded from the wordlist, and also excluded from character checking. Some users might even want a set of markers, like \foh for Hebrew words, \fog for Greek words, \foa for Aramaic words, \fon for national-language words, \for for regional LWC (or maybe related-variety) words, etc., along with making Paratext capable of handling multiple wordlists (vernacular, Greek, Hebrew, Aramaic, etc.). Wouldn’t this also make it easier to link Paratext to FLEx, by excluding words that the user doesn’t want included in a FLEx lexicon? In fact, having multiple markers as I described could allow Paratext to be linked to multiple wordlists within Paratext, and maybe even multiple FLEx lexicons.
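To make that concrete, a footnote using these proposed markers (hypothetical - they are not part of current USFM) might look something like this, with \foh marking a Hebrew word and \fon a national-language word (the reference and wording are invented):

    \f + \fr 3.16 \ft The Hebrew word here is \foh hesed\foh*; the national-language term is \fon grâce\fon*.\f*

A check that understood the markers could then route each marked span to its own wordlist instead of the vernacular one.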

by (260 points)

I realized later that \tl is a USFM marker that can be used within a footnote, to mark words from another language (whether a biblical language, a national language, or whatever). However, words that I’ve marked with \tl still show up in the wordlist. This is not what I want, and I’d be surprised if anyone wants this, since they are not words in the vernacular. (If some users do want to see such words in a wordlist, I’d suggest that Paratext have the ability to generate a secondary wordlist of only those words that are marked with \tl.)
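For example, a Greek word in a footnote can be marked like this (a minimal sketch; the reference and wording are invented):

    \f + \fr 1.1 \ft The Greek term \tl logos\tl* can also mean “message”.\f*

Even marked this way, logos still shows up in the project wordlist alongside the vernacular words, which is the behavior I’m describing.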

I agree that it might be nice for Paratext to have an option to distinguish words in the vernacular from words in other languages (as well as the same distinction for characters). However, the other side of this is that Paratext is checking the words that exist (and will be published) in the text. Whether or not a word is vernacular is immaterial to the checking process; the word still needs to be spelled correctly and possibly hyphenated correctly. The Paratext wordlist is reporting the words that exist in the publishable text.

Good point.

0 votes

anon070973,

There are some varied opinions on this, but I think the consensus would be that you only include characters in the language settings that occur in the language. So, I would not include Greek or Hebrew in the language settings since they do not occur in the language. However, if I have the word Cristo in my text and the letter c does not normally occur in my language, I would still include the letter c in my language settings since Cristo is now a word “used” in my language.
For the other foreign letters you would simply mark them as valid in the Character Inventory if they are “valid” in the project.

Having offered that opinion, I’m sure there will be those who disagree with me for various reasons.

by (8.4k points)
0 votes

I agree with anon848905 that one should include letters in the language settings if they occur in borrowed words. Adding to what anon848905 said, I worked in a language where “b” occurred only in borrowed words, glottalized b (“b’”) and “v” occurred in native words, and it was common to forget to type the glottal, or to confuse “b” with “v”.
Once “b’”, “b”, and “v” were all included as legitimate characters, a very helpful way to find incorrect words was the “Find similar words” command in the Wordlist. By entering pairs of these three sounds in the “Letters that sound alike” box, the Wordlist would list pairs of words that differed only by those sounds. (For example, if “bolt” and “volt” both occurred in the translation, comparing “b” and “v” would list those two words and tell how many times each occurred.)
Furthermore, looking through the Wordlist to see the words that occur in the translation would help find words containing spelling errors - whether those errors were due to letters from borrowed words or due to imperfect people inadvertently making mistakes.

by [Expert]
(733 points)
0 votes

Shouldn’t there be a way to mark a foreign word with SFMs? Our translation teams sometimes want to include a French word in a footnote to explain something, and it would be nice to set it off with SFMs, and ideally to have the Paratext character inventory exclude words marked that way. Text marked like that wouldn’t need to be typeset differently, unless desired.

by (1.3k points)

Yes, we’ve done the same – using the LWC in footnotes – and need to specially mark those characters as Valid. SFMs to mark those would aid choosing an appropriate font at typesetting time, I presume.

0 votes

I found a marker \tl … \tl* in the USFM reference:
http://ubs-icap.org/chm/usfm/2.4/special_text_character_styles.htm#tl
(You may have to click on the \tl … \tl* link to get to the explanation of that particular SFM…)

It is defined as:

Transliterated (or foreign) word(s).

That SFM can be used (as in the example given on that page) for words that you wish to appear differently (e.g. in italics) in the typesetting, but they wouldn’t have to appear differently. I can imagine that we would want to use that SFM in a footnote where a French word or two is part of the explanation; whether it is typeset differently or not would be a question for the team. But if we marked those foreign words, then maybe they could be excluded from the character inventory.
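For instance, a footnote with a French word in the explanation might be marked like this (a sketch only; the reference and wording are invented):

    \f + \fr 2.4 \ft This is roughly what is called un \tl boisseau\tl* in French.\f*

Whether \tl boisseau\tl* then comes out in italics would be purely a typesetting decision.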

I tested the character inventory, and it DOES always include characters that are within a \tl … \tl* marker. It seems to me that if these are transliterated and/or foreign words, the text should be excluded from the character inventory. Or maybe better yet, add a checkbox under the “Show combinations” checkbox called “Include foreign words (\tl)”; when it is unchecked, text marked with \tl would be excluded. Sounds like a fairly simple, and quite useful, feature request, @anon291708?

by (1.3k points)

Yes, filtering out characters in certain styles in the Characters Check/Inventory would be straightforward.

0 votes

I think there are two different issues being considered here. One is marking foreign words and being able to manipulate them. That already exists. The second issue is whether or not those characters should be listed in the character inventory.

In my opinion, any character that will be printed should be listed in the character inventory so that you can verify whether those characters are valid or not. In the Language Settings you identify the characters that belong to the language, but you really need to be able to identify any other characters that will be printed. I’m not clear on what would be gained by optionally hiding those characters from the inventory. When you mark characters as valid in the Character Inventory, you are not saying that these are characters in the language, but that they are valid characters to be printed in this project.

by (8.4k points)

I work in a project that has a number of user-defined markers (\zeng, \zgrk, \zheb, etc.) to mark different languages. This allows us to use different fonts and writing directions for different languages. But an added bonus was that marking them as \nonpublishable in the stylesheet prevents them from showing up in the character inventory, and more importantly, the spelling wordlist.

Of course as people above have commented you lose some functionality and you eventually do want to check that all the characters are valid, so I find myself every once in a while removing that \nonpublishable tag to run a check. And obviously you’ll want to remove it if your publishing system reads that tag.
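For anyone wanting to try the same trick, here is a sketch of what one of those entries might look like in the project’s custom.sty, assuming the usual Paratext stylesheet syntax (the \Name text is just a label; adapt the marker to your project):

    \Marker zgrk
    \Endmarker zgrk*
    \Name zgrk...zgrk* - Greek words
    \StyleType Character
    \TextType VerseText
    \TextProperties nonpublishable

Removing the nonpublishable keyword (or replacing it with publishable) is the temporary change I make when I want those spans back in the checks.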

The basic problem here is that we are working in a multilingual world (Greek, Hebrew, local language, language of wider communication), but the editor in Paratext does not recognize that. So “everything” in project X is in the X language, and that approach is just too simplistic.

What is needed is a way to mark Greek, Hebrew, and LWC text with unique SFM styles (one for each), and for these to be maintained separately in the “Valid/Invalid” lists, NOT mixed in with language X’s character, spelling, or wordlist inventories. (Even valid LWC punctuation might be invalid for language X.)

Where we work today, we might mark as valid some unusual letter found in an LWC word in a footnote. But does doing this open the possibility that someone may introduce a typo in language X using that “valid” character, precisely because we marked it as valid? I.e., it is valid for the LWC, but INVALID if used in language X.

So how do we differentiate this? Why can’t we just mark these as different languages? That seems the most obvious solution. Every other reasonably powerful editor I know recognizes “language”; surely we should in this business of translation.

Granted, this isn’t needed a lot, but it is needed often enough that a solution would be very helpful. (And I agree with anon848905 that just “ignoring” them is a less-than-ideal solution.) These characters are “valid” in certain contexts; we need Paratext to be smart enough to recognize the context (the SFM brackets) and validate them against the appropriate subset.
