Our project switched to “match based on stems” a few weeks ago, after helpful input from this forum.

The transition is still creating some work, but we will not look back. The renderings are much more human-friendly now (for visiting consultants and team members) and there is not really more work for new entries.

Now one term is “messed up”. In the text it shows as bagʊbɔrɩfɔ (their flocks), in the wordlist it is entered as ba+ gʊbɔrɩfɔ, and in the renderings window as gʊbɔrɩfɔ. So it should create a hit, but it does not. I tried miscellaneous tricks from a few decades of computer use: deleting the entry and entering it again, trying without the possessive (works), and trying the combined form in the renderings window (works too). So the morpheme setup is messed up somehow. In the Interlinearizer it looks correct though, with no additional word parses or other complications.

Now I am asking for ideas on how to tackle this very isolated bug, please. All other terms are behaving as expected and this one does not. I hope it is just something like an accidentally inserted invisible zero-width space, but my first rounds of debugging did not show anything like that.
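For readers who want to rule out invisible characters in a case like this, here is a minimal Python sketch. It is not part of Paratext; the function name `find_invisible_chars` is made up for illustration, and it assumes you can copy the suspect word out of the text or the Wordlist and paste it into a script.

```python
import unicodedata

# Unicode general categories that usually render as nothing or as blank space:
# Cf = format (zero-width space, joiners, soft hyphen), Cc = control, Zs = space separators.
SUSPECT_CATEGORIES = {"Cf", "Cc", "Zs"}

def find_invisible_chars(text):
    """Return (index, code point, Unicode name) for each suspicious character in text."""
    hits = []
    for index, ch in enumerate(text):
        if ch == " ":
            continue  # an ordinary space is visible enough and not suspicious
        if unicodedata.category(ch) in SUSPECT_CATEGORIES:
            hits.append((index, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")))
    return hits

clean = "bagʊbɔrɩfɔ"
dirty = "ba\u200bgʊbɔrɩfɔ"  # same word with a zero-width space after the prefix

print(find_invisible_chars(clean))  # []
print(find_invisible_chars(dirty))  # [(2, 'U+200B', 'ZERO WIDTH SPACE')]
```

An empty list for the word as it appears in both the text and the Wordlist would suggest the problem lies elsewhere, which is how this thread turned out.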

How for example would I be able to “totally delete” a word from the main text and from the wordlist and from any deeper memory-levels of PT8?

Paratext by (855 points)

3 Answers

0 votes
Best answer

@anon291708 I am still very thankful for your previous input, which helped me find a solution. But your last reply gave me some sleepless nights. It should not happen that I misunderstand how an important tool for my work operates. I do not mind challenges or corrections if they help the work.

This is what I understood so far (for projects which are run in stem-based setup):

Greek (or Hebrew) is the basis. In the Biblical Terms tool, we can record one or more translation renderings. Since Greek grammar works differently from the target language, we keep nice clean information in the renderings window (stems). For example “mother”. I believe the Biblical Terms window is the place where a consultant would go, or where team members would go to look at difficult cases.

And the Wordlist with its morphology magic links to all other occurrences of the same Greek term, which look different in the target language, for example “mymother”, “yallsmother”, “themother”, “motherly”, “motherness”. I believe the Wordlist is useful but not well suited for human consumption, except for users who have a firm grasp and enjoyment of regexes.

Since I really need to figure this out, I just spent some time in the inbuilt help and found this:

                Using Match based on stems - Demands most computing power, most accurate results
             
                ...
                In either the Wordlist or the Interlinearizer, specify morphology for a word.
                In the Biblical Terms tool, approve the stem of the word as a rendering for a Biblical Term. If other words contain the stem you just approved and those words have their morphology specified, Paratext also approves them as renderings for the Biblical Term. For example: Specify morphology for baptized (baptize +d) and baptizes (baptize +s). Approve baptize as a rendering for βαπτίζω. Paratext also approves baptized and baptizes as renderings for βαπτίζω.

In the above quote, I tried to put emphasis on “In the Biblical Terms tool, approve the stem of the word as a rendering” but did not manage because it is in the middle of a block-quote. It feels to me I am getting conflicting input about how to handle renderings in a stem-based setup, in the context of certain key terms having several different morphological shapes and possibly no (or very few) pure-stem appearances in the text.
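The behaviour that the Help passage describes can be sketched in a few lines of Python. This is only a toy model of the documented behaviour, not Paratext's code: the names `stem_of` and `approved_words` are invented for illustration, and the breakdown notation (prefixes end in `+`, suffixes start with `+`) is assumed from the examples in this thread.

```python
def stem_of(morphology):
    """Return the stem from a breakdown such as 'baptize +d' or 'ba+ stem'."""
    # Anything that starts or ends with '+' is treated as an affix.
    parts = [p for p in morphology.split()
             if not (p.startswith("+") or p.endswith("+"))]
    return parts[0] if parts else None

def approved_words(approved_stem, wordlist):
    """Words whose specified morphology contains the approved stem."""
    return sorted(word for word, morph in wordlist.items()
                  if stem_of(morph) == approved_stem)

# Morphology as specified in the Help example.
wordlist = {"baptized": "baptize +d", "baptizes": "baptize +s"}

print(approved_words("baptize", wordlist))  # ['baptized', 'baptizes']
```

In this model the stem "baptize" never has to appear as a Wordlist entry of its own, which is the reading of the Help text that the question relies on.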

I love how the Biblical Terms tool looks much cleaner and much more “like the actual” language when using stem-based matching, as compared to a setup with lots and lots of affixes and wildcards. Typically there are only a handful of renderings, depending on the type of word. Verbs typically show a naked stem, an infinitive form and an irrealis form. Nouns show a singular and a plural form but no possessed forms.

I would prefer to keep the project in question this way, because it feels the most logical. In FLEx we run it much the same way: the dictionary is roughly the equivalent of the renderings window, holding the “pure data”, and the word analysis list is the equivalent of the PT Wordlist, showing all the (many) occurrences out there in the wild (with their respective analysis or morphology).

If I really got it wrong, please somebody help me see my errors or show me how it poses a risk or can trigger problems in PT8. I would gladly get help and take this deeper, and it is no problem to take it off this forum if it is too specific and not helpful to other users.

by (855 points)

I agree, @Tim. My understanding is that the stems should be put into the Biblical Terms tool as renderings (as the Help article clearly states). If the Help were written the way that @anon291708 describes the tool, perhaps it would say something like, “In the Wordlist, approve the stem of the Biblical Terms rendering as a word.”

In my understanding, inflected forms in the project’s renderings are errors; I instruct teams to fix those errors by removing genitive endings, plural markers, etc., from their renderings.

True, as @anon291708 has mentioned, the tool will work by extracting perceived stems (i.e., analyzed by PT) from renderings given. (That was new to me, but it does work.) However, as a best practice, this should be avoided, especially for thoroughly amalgamating languages. The reason, in my opinion, is that it creates more work (and/or confusion) later.

In the process of making the vocabulary consistent in a text, the person making sure renderings are “found” in the text does need to have an understanding of why certain renderings are still “missing” (whether or not that person is also one who is entering renderings). In order to solve that puzzle, if the rendering has been entered as a stem, that person will only need to look up the Wordlist morphology entry of the offending term in the text. If, however, an inflected form has been entered as the rendering, the person trying to make sure all renderings are “found” would have to first look up the rendering’s morphology in the Wordlist, and then check the Wordlist morphology entry of the offending term in the text.

Also, checking key terms in a list of variously inflected renderings would be very distracting, compared to having only stems listed in the renderings column.

@anon291708, are you saying that Paratext double-checks to make sure that every word marked as a stem in the Wordlist’s morphology actually exists as a stand-alone word in the text (and then, if a stem is not found in the text, it is rejected as being an inaccurate stem)? If that’s the case, then I definitely agree with @Tim:

Inflected forms DO have a legitimate use as a rendering even if you are normally using stems and have your affixes listed. They should be used when they impact the meaning of the biblical term in question. For example, if an affix indicates a person or a location, you should use the properly inflected form for Moabite (a person) or Moab (a location). This could be especially important in other biblical terms. So the decision on whether to use a stem or an inflected form is made on a case by case basis. (IMHO)

Blessings,

Shegnada James

Paratext takes the word entered into the renderings field and looks it up in the Wordlist morphology list. If it is not found, then it cannot do stem-based matching. Thus, if the word you entered into the renderings is actually a stem that never occurs as a word, it cannot find matching morphology in the Wordlist.

I’m not a translator and do not know a lot about the translation process. I’m just telling you how it is currently designed to work, based on what the code actually does.
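The lookup described in this reply can be modelled with a short Python sketch. It is a toy model built only from the description in this thread, not Paratext's actual implementation; `stem_of` and `find_matches` are hypothetical names, and the Wordlist is reduced to a dictionary from word to morphology breakdown.

```python
def stem_of(morphology):
    """Return the stem from a breakdown such as 'ba+ stem' or 'stem +d'."""
    parts = [p for p in morphology.split()
             if not (p.startswith("+") or p.endswith("+"))]
    return parts[0] if parts else None

def find_matches(rendering, wordlist):
    """Model of the described lookup: the rendering must itself be a Wordlist word."""
    if rendering not in wordlist:
        return []  # no Wordlist entry for the rendering, so no stem-based matching
    stem = stem_of(wordlist[rendering])
    return sorted(w for w, m in wordlist.items() if stem_of(m) == stem)

wordlist = {"bagʊbɔrɩfɔ": "ba+ gʊbɔrɩfɔ"}

print(find_matches("gʊbɔrɩfɔ", wordlist))    # [] because the bare stem is not a word yet
print(find_matches("bagʊbɔrɩfɔ", wordlist))  # ['bagʊbɔrɩfɔ']

# Entering the bare stem as a word (for example in an extra book) adds it to the Wordlist.
wordlist["gʊbɔrɩfɔ"] = "gʊbɔrɩfɔ"
print(find_matches("gʊbɔrɩfɔ", wordlist))    # ['bagʊbɔrɩfɔ', 'gʊbɔrɩfɔ']
```

Under this model the rendering fails until the stem exists as a word in its own right, which matches the symptom and the workaround reported in the question. Other posters in this thread report stems matching without ever occurring as words, so the real behaviour may depend on details the sketch leaves out.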

Great point, @Shegnada. But I’m now wondering if “match based on stems” undoes that distinction in the actual process of finding renderings. It seems that the inflected forms will not be what is “found” in the text; the tool will find and highlight any word that shares a stem with that inflected form.

@anon291708, if a project is set to have “match based on stems” turned on, would the rendering “Moabite” find “Moab” if the morphology for “Moabite” was “Moab +ite”?

Phil, let me try to explain how I understand it. In the Hebrew Biblical Terms, Moab is a different entry from Moabite, corresponding to English Moab (name of individual or country) and Moabite (title). You could give the stem Moab in both entries, and verses with either Moab or Moabite would be considered as found as long as Moabite is broken down as Moab +ite. If you want to maintain the distinction between the two, the rendering for Moab would be Moab, and for Moabite it would be Moabite. Forms like Moabites would be broken down as Moabite +s in the morphology. In our language we call a Moabite a person from Moab, so we would use Moab as the stem for both entries.
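The two setups described above can be compared with a small Python sketch. It is an illustration only: `stem_of` and `words_found` are invented names, the Wordlist is modelled as a plain dictionary, and the matching rule (a word is found when its stem equals the rendering) is an assumption based on this thread.

```python
def stem_of(morphology):
    """Return the stem from a breakdown such as 'Moab +ite'."""
    parts = [p for p in morphology.split()
             if not (p.startswith("+") or p.endswith("+"))]
    return parts[0] if parts else None

def words_found(rendering, wordlist):
    """Words whose stem equals the rendering."""
    return sorted(w for w, m in wordlist.items() if stem_of(m) == rendering)

# Setup 1: one shared stem, Moabite is broken down as Moab +ite.
shared = {"Moab": "Moab", "Moabite": "Moab +ite"}

# Setup 2: the distinction is kept, Moabite is its own stem.
distinct = {"Moab": "Moab", "Moabite": "Moabite", "Moabites": "Moabite +s"}

print(words_found("Moab", shared))       # ['Moab', 'Moabite']
print(words_found("Moab", distinct))     # ['Moab']
print(words_found("Moabite", distinct))  # ['Moabite', 'Moabites']
```

The choice is made in the Wordlist morphology rather than in the renderings list: the same rendering "Moab" finds different words depending on how "Moabite" has been broken down.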

Yes.

What I was attempting to say (and what @Tim’s problem was) is that Paratext does not currently find stems unless that stem appears as a word somewhere in the project.

Thank you for the clarifications.

This I do not understand. Let me take an example. For one Biblical term I have the stems kat, kāt and tāloo. If I take any of these stems and search all books in the project restricted to whole words, there are no hits at all, because these are stems, not words. They never occur as whole words in the project, nor should they. In the Biblical Terms window, if the cursor hovers over the last column with the 3 stems, I can see that the stem tāloo occurs in the word tāloosyēēt two times. The stem kāt occurs in six different words, and the stem kat in two words (kuukataak and akatakiisye both occur in the same verse). The total number of occurrences (10) for these 3 stems is not the same as the number of verses (9) in the counts column, since the same stem may occur more than once in a verse, and there may be verses where none of them occur, even if the Hebrew word is there.

I think that the conversation may be about two different things, one of which I am not familiar with. Iver+Larsen is talking about the part I use regularly. You can put a stem or an inflected word into the list as a rendering and Paratext will recognize both or either. To recognize the stem, the relevant affixes must be listed in the Biblical Terms/Tools/Affixes menu. You would put the stem when the affixes are not relevant to the specific meaning of the Biblical term. It does save a lot of time.

anon291708 is talking about a different function of the Biblical Terms tool but I’ve lost the thread and am not sure which. Perhaps when guessing renderings, when Paratext guesses stems and adds an asterisk to indicate affixes? If not, I’m interested to hear what it is.

Blessings,

Shegnada James

Language Technology and Publishing Coordinator, SIL Nigeria

Text Processing Specialist – Complex Script, GPS, SIL Intl


Just to complement what Shegnada says: I am using Match based on stems, so the affixes that may or may not be listed are ignored.
The following is from the Helps:
On the lower edge of the Biblical Terms tool, if Match based on stems: ON appears:
Paratext will ignore any Word Prefixes and Suffixes specified in the Tools menu of the Biblical Terms tool.
The matching process will take longer, perhaps twice as long on the average.

              **Advantages and Disadvantages of using Match based on stems**
              The advantage of using Match based on stems is the high degree of accuracy of the results. The biggest disadvantage of using Match based on stems is that specifying morphology can be as much work as entering all the specific word forms individually.

End of quote.
Yes, specifying morphology was a huge task, but it helped a lot in correcting wrong spellings for this very complex language where translators would in the beginning quite often mark a wrongly spelled word as correctly spelled. I have so far done the morphology breakdown for the 20404 words in NT and Psalms. Most word forms only occur once.
We started out by entering prefixes and suffixes, but when we reached 41 prefixes and 82 suffixes we gave up and went to Based on Stems, since this works better for languages with complex morphology.

I’m just telling you what the code seems to do. If it’s finding a match for a stem, it must have found a matching word. Maybe a stem was entered into the text as a “word” at one point and has since been deleted. You could open the Wordlist and click on View > Show reviewed words which no longer appear in project to see if the stems were words at one point in time. Otherwise, I’m not sure what is going on either.

Ok, I have now looked at the words that are no longer in the project, and the stems have never been entered as complete words, because they do not exist as words. However, it is not a problem for me, since the program does what I need it to do. It finds all the words that include the stem in question with one or more affixes attached. It would be a problem if it behaved as you think it should behave. For instance, some suffixes change the vowels inside the root, so that I need to give both kat and kāt as stems, even though kāt could never occur alone without the suffix that has changed the vowel.

Yes, @anon291708 has clearly understood our problem. I still claim that this is unexpected behaviour from a tool which is doing powerful morphology. You can see my personal workaround in my previous post from 3rd April 2019.

Still, it should be relatively easy for the devs to fix this issue, namely that stems which a user has provided indirectly are presently not considered. Otherwise there should be a clear message (on mouse-over) explaining why no hits are being found.

(user enters “my+ mother” so PT8 should internally note “mother” as a valid stem)

People, this has become a very good thread. I am learning a lot, even if some stuff is not precisely focused on my initial problem. So thank you all for your input and for sharing from your respective work flows.

+1 vote

A quick question (first thought that came to my mind): Is gʊbɔrɩfɔ in the wordlist as a separate entry (i.e. a word) or is it only ever a stem? If it’s only ever a stem, then you might need to put a word containing that stem into the renderings window instead of just the stem.

If that doesn’t seem to be the issue, then I think the best bet would be to send in a problem report so we can check the project out and see what is going on.

by [Expert]
(16.2k points)

Out there in the wild, the word “flocks” does exist on its own (gʊbɔrɩfɔ). But so far we do not have it in the wordlist by itself, because it is rare in Scripture.

Are you saying one of our team members giving the morphology is “not enough”, and PT8 cannot figure out from ba+ gʊbɔrɩfɔ that gʊbɔrɩfɔ is a valid stem?

How and why would I put “a word containing that stem” into the renderings window? That would defeat the entire setup as being stem-based and managed from the wordlist. Imagine that later there will be a verse where the translators use the stem by itself. Then nothing would trigger a note that “the hack” can be undone for bagʊbɔrɩfɔ.

Still, your question has given me another idea: since we are using the book XXA to store extra words for hyphenation-info management, even beyond Scripture, I could just enter the stem there, i.e. the word gʊbɔrɩfɔ (flocks). And that brought it into the wordlist.

And indeed: the issue of the non-rendering has gone away. I am glad for this solution.

But this is totally unexpected behaviour!

If PT8 needs each stem listed by itself (and what about each affix??), then that should be very clearly documented. And there should be a “normal” way to bring such “helper stems” into the wordlist. In this language (and possibly in many other languages) there are certain words which will almost always carry a possessive or other prefix, like mymother, yourmother, ourmother. The naked mother is so rare that a user will certainly stumble over the same problem as I just did, because statistically a normal related mother will be entered sooner than the rare pure mother.

Maybe I am missing something. I will probably never forget this little adventure ever. But other users should be warned.

I think you misunderstand how Match based on stems works in the Biblical Terms tool. It is designed so you can just find a word in the text that is used for the term, and you can use that word as your rendering. Paratext then uses the morphology to look at the stem of that word to find words in other verses that have the same stem. It is not designed to require the user to know the stem of the word they are wanting to use as a rendering (since that is defined in the Wordlist - maybe by a completely different person). Thus it is actually slightly incorrect to only put in the stem of the word into the renderings list (especially if it never appears as a “word” in the text).

0 votes

Working with a language with very complex morphology, I have found the stem-based Biblical Terms very useful. In the Renderings box I can get by with 1 to 6 different stems, most of which never occur as words on their own. That is not a problem, since PT looks for stems in the Wordlist rather than words. These few stems cover dozens of different verb and noun forms. For instance, the stem for resurrect covers 21 different verb forms. Before using the stem-based option, we listed dozens of prefixes and suffixes and combinations, but it is much easier to just list the stem. However, the morphological breakdown needs to be correct in the Wordlist so that the stem can be found. The program may guess a breakdown, but that guess is usually wrong for our language, so it needs to be corrected. Sometimes we have the same rendering for different Greek or Hebrew words. Sometimes one Greek word needs several stems in the language. It would be nice if the morphology column could be sorted according to which breakdowns have been approved as correct and which are only guessed by PT. Wading through 20,000 words to spot those that are not yet approved is cumbersome. One problem I have is that none of the translators in the current project understand the morphology well enough to make these breakdowns, so I need to do them.

by (869 points)
