One rendering not working - how to debug?

Question

Project turned over to “match based on stems” a few weeks ago, after helpful input from this forum.

The transition is still creating some work, but we will not look back. The renderings are much more human-friendly now (for visiting consultants and team members) and there is not really more work for new entries.

Now one term is “messed up”. In the text it shows as bagʊbɔrɩfɔ (their flocks), in the wordlist it is entered as ba+ gʊbɔrɩfɔ, and in the renderings window as gʊbɔrɩfɔ. So it should create a hit, but it does not. I tried miscellaneous tricks from a few decades of computer use, like deleting the entry and entering it again. Trying without the possessive works, and trying the combined form in the renderings window works too. So the morpheme setup is messed up somehow. In the Interlinearizer it looks correct, though: no additional word parses or other complications.

Now I am asking for ideas on how to tackle this very isolated bug, please. All other terms are behaving as expected and this one does not. I hope it is just something like an accidentally inserted invisible zero-width space, but my first rounds of debugging did not show anything like that.
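One way to rule out the invisible-character theory is to copy the word out of the text and out of the wordlist and inspect the code points outside Paratext. The following is a minimal sketch in Python, using only the standard library; the `suspect` string is a made-up example with a zero-width space pasted in, not data from the project.

```python
import unicodedata

# Unicode general categories that normally render as nothing:
# Cf = format characters (zero-width space/joiner, BOM, soft hyphen),
# Cc = control characters.
INVISIBLE_CATEGORIES = {"Cf", "Cc"}


def find_invisible(text):
    """Return (index, code point, name) for each invisible character."""
    hits = []
    for index, char in enumerate(text):
        if unicodedata.category(char) in INVISIBLE_CATEGORIES:
            name = unicodedata.name(char, "UNNAMED")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


clean = "gʊbɔrɩfɔ"
suspect = "gʊbɔ\u200brɩfɔ"  # hypothetical: zero-width space after the 4th letter

print(find_invisible(clean))    # []
print(find_invisible(suspect))  # [(4, 'U+200B', 'ZERO WIDTH SPACE')]
```

If both copies of the word come back with an empty list, a stray invisible character is unlikely to be the cause.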

How for example would I be able to “totally delete” a word from the main text and from the wordlist and from any deeper memory-levels of PT8?

Paratext Apr 2, 2019 asked by Tim (855 points)
Apr 3, 2019 reshown

3 Answers

Best answer

@anon291708 I am still very thankful for your previous input, which helped me find a solution. But with your last reply you gave me some sleepless nights. It should not happen that I misunderstand how an important tool for my work operates. I do not mind challenges or corrections if it helps the work.

This is what I understood so far (for projects which are run in stem-based setup):

Greek (or Hebrew) is the basis. In the Biblical Terms tool, we can record one or more translation renderings. Since Greek grammar works differently from the target language, we keep nice clean information in the renderings window (stems). For example “mother”. I believe the Biblical Terms window is the place where a consultant would go, or where team members would go and look at difficult cases.

And the Wordlist with its morphology-magic is linking to all other occurrences of the same Greek term, which look different in the target language, for example “mymother”, “yallsmother”, “themother”, “motherly”, “motherness”. I believe the Wordlist is useful but not well suited for human consumption, except for users who have a firm grasp and enjoyment of regexes.

Since I really need to figure this out, I just spent some time in the inbuilt help and found this:

                Using Match based on stems - Demands most computing power, most accurate results
             
                ...
                In either the Wordlist or the Interlinearizer, specify morphology for a word.
                In the Biblical Terms tool, approve the stem of the word as a rendering for a Biblical Term. If other words contain the stem you just approved and those words have their morphology specified, Paratext also approves them as renderings for the Biblical Term. For example: Specify morphology for baptized (baptize +d) and baptizes (baptize +s). Approve baptize as a rendering for βαπτίζω. Paratext also approves baptized and baptizes as renderings for βαπτίζω.

In the above quote, I tried to put emphasis on “In the Biblical Terms tool, approve the stem of the word as a rendering” but did not manage because it is in the middle of a block-quote. It feels to me I am getting conflicting input about how to handle renderings in a stem-based setup, in the context of certain key terms having several different morphological shapes and possibly no (or very few) pure-stem appearances in the text.
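As I read the Help example quoted above, approving a stem pulls in every word whose morphology names that stem. A toy model of that reading in Python (this is only an illustration of the Help text, not how Paratext is implemented; the tuple layout is my own assumption):

```python
from collections import defaultdict

# (word, stem, affixes) exactly as given in the Help example.
analyses = [
    ("baptized", "baptize", ["+d"]),
    ("baptizes", "baptize", ["+s"]),
]

# Build an index from each stem to the word forms that contain it.
words_by_stem = defaultdict(list)
for word, stem, _affixes in analyses:
    words_by_stem[stem].append(word)

approved_stem = "baptize"  # the rendering approved for βαπτίζω
print(words_by_stem[approved_stem])  # ['baptized', 'baptizes']
```

In this reading the stem itself never has to occur as a word in the text; it only has to appear in the morphology of other words.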

I love how the Biblical Terms tool looks much cleaner and much more “like the actual” language when using stem-based matching, as compared to a setup with lots and lots of affixes and wildcards. Typically there are only a handful of renderings, depending on the type of word. Verbs typically show a naked stem, an infinitive form and an irrealis form. Nouns show a singular and a plural form but no possessed forms.

I would prefer to keep the concerned project this way, because it feels the most logical. In Flex we run it somewhat the same: The dictionary is roughly the equivalent of the renderings window, having the “pure data”. And the word analysis list is the equivalent of the PT Wordlist, showing all the (many) occurrences out there in the wild (with their respective analysis or morphology).

If I really got it wrong, please somebody help me see my errors or show me how it means a risk or can trigger problems in PT8. I would gladly get help and take this deeper - and no problem to take it off this forum if too specific and not helpful to other users.

Apr 6, 2019 answered by Tim (855 points)
Apr 6, 2019 reshown

I agree, @Tim. My understanding is that the stems should be put into the Biblical Terms tool as renderings (as the Help article clearly states). If the Help were written the way that @anon291708 describes the tool, perhaps it would say something like, “In the Wordlist, approve the stem of the Biblical Tool rendering as a word.”

In my understanding, inflected forms in the project’s renderings are errors; I instruct teams to fix those errors by removing genitive endings, plural markers, etc., from their renderings.

True, as @anon291708 has mentioned, the tool will work by extracting perceived stems (i.e., analyzed by PT) from renderings given. (That was new to me, but it does work.) However, as a best practice, this should be avoided, especially for thoroughly amalgamating languages. The reason, in my opinion, is that it creates more work (and/or confusion) later.

In the process of making the vocabulary consistent in a text, the person making sure renderings are “found” in the text does need to have an understanding of why certain renderings are still “missing” (whether or not that person is also one who is entering renderings). In order to solve that puzzle, if the rendering has been entered as a stem, that person will only need to look up the Wordlist morphology entry of the offending term in the text. If, however, an inflected form has been entered as the rendering, the person trying to make sure all renderings are “found” would have to first look up the rendering’s morphology in the Wordlist, and then check the Wordlist morphology entry of the offending term in the text.

Also, checking key terms in a list of variously inflected renderings would be very distracting, compared to having only stems listed in the renderings column.

@anon291708, are you saying that Paratext double-checks to make sure that every word marked as a stem in the Wordlist’s morphology actually exists as a stand-alone word in the text (and then, if a stem is not found in the text, it is rejected as being an inaccurate stem)? If that’s the case, then I definitely agree with @Tim:

Apr 10, 2019 commented by Alex W. (187 points)

Paratext takes the word entered into the renderings field and looks it up in the Wordlist morphology list. If it is not found, then it cannot do stem-based matching. Thus, if the word you entered into the renderings is actually a stem that never occurs as a word, it cannot find matching morphology in the Wordlist.

I’m not a translator and do not know a lot about the translation process, I’m just telling you how it is currently designed to work based on what the code actually does.
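The lookup described above can be sketched as a toy model in Python. This is not Paratext's code; the function names are hypothetical, and the morphology notation (a prefix ends with "+", a suffix starts with "+") is assumed from the examples in this thread.

```python
def stems_of(analysis):
    """Keep the parts of an analysis that are not marked as affixes."""
    return {
        part
        for part in analysis.split()
        if not (part.startswith("+") or part.endswith("+"))
    }


def matching_words(rendering, wordlist):
    """Look the rendering up as a Wordlist word, then match by its stems."""
    analysis = wordlist.get(rendering)
    if analysis is None:
        return []  # rendering is not a Wordlist entry: nothing to match on
    wanted = stems_of(analysis)
    return sorted(w for w, a in wordlist.items() if wanted & stems_of(a))


wordlist = {"bagʊbɔrɩfɔ": "ba+ gʊbɔrɩfɔ"}

print(matching_words("gʊbɔrɩfɔ", wordlist))    # []
print(matching_words("bagʊbɔrɩfɔ", wordlist))  # ['bagʊbɔrɩfɔ']

# Once the bare stem exists as a word of its own, the lookup succeeds.
wordlist["gʊbɔrɩfɔ"] = "gʊbɔrɩfɔ"
print(matching_words("gʊbɔrɩfɔ", wordlist))    # ['bagʊbɔrɩfɔ', 'gʊbɔrɩfɔ']
```

Under this model the bare stem fails as a rendering only because it is missing as a Wordlist word, which would fit the behaviour reported in the question.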

Apr 10, 2019 commented by [Expert] Fool Running (16.2k points)
Apr 10, 2019 reshown

This I do not understand. Let me take an example. For one Biblical term I have the stems kat, kāt and tāloo. If I take any of these stems and search all books in the project restricted to whole words, there are no hits at all, because these are stems, not words. They never occur as whole words in the project, nor should they occur. In the Biblical Terms window, if the cursor hovers over the last column with the 3 stems, I can see that the stem tāloo occurs in the word tāloosyēēt two times. The stem kāt occurs in six different words, and the stem kat in two words (kuukataak and akatakiisye both occur in the same verse). The total number of occurrences (10) for these 3 stems is not the same as the number of verses (9) in the counts column, since the same stem may occur more than once in a verse, and there may be verses where none of them occur, even if the Hebrew word is there.
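To show why a per-occurrence total and a per-verse total can differ, here is a small Python sketch. The verse labels and groupings are hypothetical placeholders, not the real references from the project.

```python
# Hypothetical data: each verse lists the stems found in its words.
stems_found = {
    "verse A": ["kat", "kat"],  # two words with the same stem in one verse
    "verse B": ["tāloo"],
    "verse C": ["kāt"],
    "verse D": [],              # source term present, no rendering found
}

# Every stem hit counts once.
occurrences = sum(len(stems) for stems in stems_found.values())

# A verse counts once, however many hits it contains.
verses_with_hit = sum(1 for stems in stems_found.values() if stems)

print(occurrences)      # 4
print(verses_with_hit)  # 3
```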

Apr 11, 2019 commented by Iver Larsen (869 points)
Apr 11, 2019 reshown

I think that the conversation may be about two different things, one of which I am not familiar with. Iver Larsen is talking about the part I use regularly. You can put a stem or an inflected word into the list as a rendering and Paratext will recognize both or either. To recognize the stem, the relevant affixes must be listed in the Biblical Terms/Tools/Affixes menu. You would put the stem when the affixes are not relevant to the specific meaning of the Biblical terms. It does save a lot of time.

anon291708 is talking about a different function of the Biblical Terms tool but I’ve lost the thread and am not sure which. Perhaps when guessing renderings, when Paratext guesses stems and adds an asterisk to indicate affixes? If not, I’m interested to hear what it is.

Blessings,

Shegnada James

Language Technology and Publishing Coordinator, SIL Nigeria

Text Processing Specialist – Complex Script, GPS, SIL Intl

Apr 11, 2019 commented by Shegnada (1.3k points)
Apr 11, 2019 reshown

Just to complement what Shegnada says: I am using Match based on stems, so the affixes that may or may not be listed are ignored.
The following is from the Helps:
On the lower edge of the Biblical Terms tool, if Match based on stems: ON appears:
Paratext will ignore any Word Prefixes and Suffixes specified in the Tools menu of the Biblical Terms tool.
The matching process will take longer, perhaps twice as long on the average.

              **Advantages and Disadvantages of using Match based on stems**
              The advantage of using Match based on stems is the high degree of accuracy of the results. The biggest disadvantage of using Match based on stems is that specifying morphology can be as much work as entering all the specific word forms individually.

End of quote.
Yes, specifying morphology was a huge task, but it helped a lot in correcting wrong spellings for this very complex language where translators would in the beginning quite often mark a wrongly spelled word as correctly spelled. I have so far done the morphology breakdown for the 20404 words in NT and Psalms. Most word forms only occur once.
We started out by entering prefixes and suffixes, but when we reached 41 prefixes and 82 suffixes we gave up and went to Based on Stems, since this works better for languages with complex morphology.

Apr 11, 2019 commented by Iver Larsen (869 points)

Yes, @anon291708 has clearly understood our problem. I still claim that this is unexpected behaviour from a tool which is doing powerful morphology. You can see my personal workaround in my previous post from 3rd April 2019.

Still, it should be relatively easy for the devs to fix this issue, that stems which a user has provided indirectly are presently not considered. Otherwise there should be a clear message (on mouse-over) why no hits are being found.

(user enters “my+ mother” so PT8 should internally note “mother” as a valid stem)
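What I have in mind could look roughly like the following Python sketch. It is not Paratext code; `parse` and `indirect_stems` are hypothetical helpers, and the notation (prefix ends with "+", suffix starts with "+") is assumed from the examples in this thread. It lists the stems that users have provided only indirectly, which is the information a mouse-over message could show.

```python
def parse(analysis):
    """Split an analysis such as 'ba+ gʊbɔrɩfɔ' into prefixes, stems, suffixes."""
    prefixes, stems, suffixes = [], [], []
    for part in analysis.split():
        if part.endswith("+"):
            prefixes.append(part[:-1])
        elif part.startswith("+"):
            suffixes.append(part[1:])
        else:
            stems.append(part)
    return prefixes, stems, suffixes


def indirect_stems(wordlist):
    """Stems named in some analysis that are not Wordlist words themselves."""
    found = set()
    for analysis in wordlist.values():
        found.update(parse(analysis)[1])
    return sorted(found - set(wordlist))


wordlist = {
    "bagʊbɔrɩfɔ": "ba+ gʊbɔrɩfɔ",
    "mymother": "my+ mother",
    "mother": "mother",
}

print(parse("ba+ gʊbɔrɩfɔ"))     # (['ba'], ['gʊbɔrɩfɔ'], [])
print(indirect_stems(wordlist))  # ['gʊbɔrɩfɔ']
```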

People, this has become a very good thread. I am learning a lot, even if some stuff is not precisely focused on my initial problem. So thank you all for your input and for sharing from your respective work flows.

Apr 19, 2019 commented by Tim (855 points)

Fool Running · Answer 1 · 2019-04-03T12:50:50+0000

Out there in the wild, the word “flocks” does exist on its own (gʊbɔrɩfɔ). But so far we do not have it in the wordlist by itself, because it is rare in Scripture.

Are you saying one of our team members giving the morphology is “not enough”, and PT8 cannot figure out from ba+ gʊbɔrɩfɔ that gʊbɔrɩfɔ is a valid stem?

How and why would I put “a word containing that stem” into the renderings window? That would be defeating the entire setup as being stem-based and managed from the wordlist. Imagine that later there will be a verse where the translators will use the stem by itself. Then nothing would trigger a note that “the hack” can be undone for bagʊbɔrɩfɔ.

Still, your question has given me another idea: since we are using the book XXA to store extra words for hyphenation-info management, even beyond Scripture, I could just enter the stem, i.e. the word gʊbɔrɩfɔ (flocks), there. And that brought it into the wordlist.

And indeed: the issue of the non-rendering has gone away. I am glad for this solution.

But this is totally unexpected behaviour!

If PT8 needs each stem listed by itself (and what about each affix??), then that should be very clearly documented. And there should be a “normal” way to bring such “helper stems” into the wordlist. In this language (and possibly in many other languages) there are certain words which will almost always carry a possessive or other prefix. Like (mymother, yourmother, ourmother). The naked mother is so rare, that a user will certainly stumble over the same problem as I just did, because statistically a normal related mother will be entered sooner than the rare pure mother.

Maybe I am missing something. I will probably never forget this little adventure ever. But other users should be warned.

Apr 3, 2019 commented by Tim (855 points)

I think you misunderstand how Match based on stems works in the Biblical Terms tool. It is designed so you can just find a word in the text that is used for the term, and you can use that word as your rendering. Paratext then uses the morphology to look at the stem of that word to find words in other verses that have the same stem. It is not designed to require the user to know the stem of the word they are wanting to use as a rendering (since that is defined in the Wordlist - maybe by a completely different person). Thus it is actually slightly incorrect to only put in the stem of the word into the renderings list (especially if it never appears as a “word” in the text).

Apr 4, 2019 commented by [Expert] Fool Running (16.2k points)
Apr 4, 2019 reshown

Iver Larsen · Answer 2 · 2019-04-10T13:39:40+0000

Working with a language with very complex morphology, I have found the stem-based Biblical Terms very useful. In the Renderings box I can get by with 1 to 6 different stems, most of which never occur as words on their own. That is not a problem, since PT looks for stems in the Wordlist rather than words. These few stems cover dozens of different verb and noun forms. For instance, the stem for resurrect covers 21 different verb forms. Before using the stem-based option, we listed dozens of prefixes and suffixes and combinations, but it is much easier to just list the stem. However, the morphological breakdown needs to be correct in the Wordlist so that the stem can be found. The program may guess a breakdown, but that guess is usually wrong for our language, so it needs to be corrected. Sometimes we have the same rendering for different Greek or Hebrew words. Sometimes one Greek word needs several stems in the language. It would be nice if the morphology column could be sorted according to which breakdowns have been approved as correct and which are only guessed by PT. Wading through 20,000 words to spot those that are not yet approved is cumbersome. One problem I have is that none of the translators in the current project understand the morphology well enough to make these breakdowns, so I need to do them.
