0 votes

Project turned over to “match based on stems” a few weeks ago, after helpful input from this forum.

The transition is still creating some work, but we will not look back. The renderings are much more human-friendly now (for visiting consultants and team members), and new entries do not really take more work.

Now one term is “messed up”. In the text it shows as bagʊbɔrɩfɔ (their flocks), in the wordlist it is entered as ba+ gʊbɔrɩfɔ, and in the rendering window as gʊbɔrɩfɔ. So it should create a hit, but it does not. I tried miscellaneous tricks from a few decades of computer use, like deleting the entry and entering it again. I tried it without the possessive (works) and tried the combined form in the rendering window (works too). So the morpheme setup is messed up somehow. In the Interlinearizer it looks correct, though: no additional word parses or other complications.

Now I am asking for ideas on how to tackle this very isolated bug, please. All other terms are behaving as expected and this one does not. I hope it is just something like an accidentally inserted invisible zero-width space, but my first rounds of debugging did not show anything like that.
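One way to rule out the invisible-character theory is to paste the word as it appears in the text and as it appears in the wordlist into a small script and compare them. The following is a minimal sketch in Python using only the standard `unicodedata` module; the function names are made up for illustration and nothing here is specific to Paratext.

```python
import unicodedata

# Characters in these Unicode categories are usually invisible on screen:
# Cf = format (zero-width space/joiner, BOM, soft hyphen), Cc = control.
INVISIBLE_CATEGORIES = {"Cf", "Cc"}


def find_invisible(text):
    """Return (index, code point, name) for every invisible character in text."""
    hits = []
    for index, char in enumerate(text):
        if unicodedata.category(char) in INVISIBLE_CATEGORIES:
            name = unicodedata.name(char, "UNNAMED")
            hits.append((index, f"U+{ord(char):04X}", name))
    return hits


def same_after_normalizing(a, b):
    """True if two strings are equal once both are converted to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

If `find_invisible` returns an empty list for both copies of the word and `same_after_normalizing` returns `True`, a stray hidden character or a composed/decomposed mismatch is unlikely to be the cause.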

How for example would I be able to “totally delete” a word from the main text and from the wordlist and from any deeper memory-levels of PT8?

Paratext by (842 points)

3 Answers

0 votes
Best answer

@anon291708 I am still very thankful for your previous input, which helped me find a solution. But your last reply gave me some sleepless nights. It should not happen that I misunderstand how an important tool for my work operates. I do not mind challenges or corrections if they help the work.

This is what I understood so far (for projects which are run in stem-based setup):

Greek (or Hebrew) is the basis. In the Biblical Terms tool, we can record one or more translation renderings. Since Greek grammar works differently from that of the target language, we keep nice clean information in the renderings window (stems). For example, “mother”. I believe the Biblical Terms window is the place where a consultant would go, or where team members would go to look at difficult cases.

And the Wordlist, with its morphology magic, links to all other occurrences of the same Greek term that look different in the target language, for example “mymother”, “yallsmother”, “themother”, “motherly”, “motherness”. I believe the Wordlist is useful but not well suited for human consumption, except for users who have a firm grasp and enjoyment of regexes.

Since I really need to figure this out, I just spent some time in the inbuilt help and found this:

                Using Match based on stems - Demands most computing power, most accurate results
             
                ...
                In either the Wordlist or the Interlinearizer, specify morphology for a word.
                In the Biblical Terms tool, approve the stem of the word as a rendering for a Biblical Term. If other words contain the stem you just approved and those words have their morphology specified, Paratext also approves them as renderings for the Biblical Term. For example: Specify morphology for baptized (baptize +d) and baptizes (baptize +s). Approve baptize as a rendering for βαπτίζω. Paratext also approves baptized and baptizes as renderings for βαπτίζω.

In the above quote, I tried to put emphasis on “In the Biblical Terms tool, approve the stem of the word as a rendering” but did not manage because it is in the middle of a block-quote. It feels to me I am getting conflicting input about how to handle renderings in a stem-based setup, in the context of certain key terms having several different morphological shapes and possibly no (or very few) pure-stem appearances in the text.

I love how the Biblical Terms tool looks much cleaner and much more “like the actual” language when using stem-based matching, compared to a setup with lots and lots of affixes and wildcards. Typically there are only a handful of renderings, depending on the type of word. Verbs typically show a naked stem, an infinitive form and an irrealis form. Nouns show a singular and a plural form but no possessed forms.

I would prefer to keep the concerned project this way, because it feels the most logical. In Flex we run it somewhat the same: The dictionary is roughly the equivalent of the renderings window, having the “pure data”. And the word analysis list is the equivalent of the PT Wordlist, showing all the (many) occurrences out there in the wild (with their respective analysis or morphology).

If I really got it wrong, could somebody please help me see my errors, or show me how this setup poses a risk or can trigger problems in PT8? I would gladly get help and take this deeper, and it is no problem to take it off this forum if it is too specific and not helpful to other users.

by (842 points)

I think that the conversation may be about two different things, one of which I am not familiar with. Iver+Larsen is talking about the part I use regularly. You can put a stem or an inflected word into the list as a rendering and Paratext will recognize both or either. To recognize the stem, the relevant affixes must be listed in the Biblical Terms/Tools/Affixes menu. You would put the stem when the affixes are not relevant to the specific meaning of the Biblical terms. It does save a lot of time.

anon291708 is talking about a different function of the Biblical Terms tool but I’ve lost the thread and am not sure which. Perhaps when guessing renderings, when Paratext guesses stems and adds an asterisk to indicate affixes? If not, I’m interested to hear what it is.

Blessings,

Shegnada James

Language Technology and Publishing Coordinator, SIL Nigeria

Text Processing Specialist – Complex Script, GPS, SIL Intl

Skype: Shegnada.james.

[Email Removed]

+1 972 974 8146

Just to complement what Shegnada says: I am using Match based on stems, so any affixes that may or may not be listed are ignored.
The following is from the Helps:
On the lower edge of the Biblical Terms tool, if Match based on stems: ON appears:
Paratext will ignore any Word Prefixes and Suffixes specified in the Tools menu of the Biblical Terms tool.
The matching process will take longer, perhaps twice as long on the average.

              **Advantages and Disadvantages of using Match based on stems**
              The advantage of using Match based on stems is the high degree of accuracy of the results. The biggest disadvantage of using Match based on stems is that specifying morphology can be as much work as entering all the specific word forms individually.

End of quote.
Yes, specifying morphology was a huge task, but it helped a lot in correcting wrong spellings for this very complex language where translators would in the beginning quite often mark a wrongly spelled word as correctly spelled. I have so far done the morphology breakdown for the 20404 words in NT and Psalms. Most word forms only occur once.
We started out by entering prefixes and suffixes, but when we reached 41 prefixes and 82 suffixes we gave up and went to Based on Stems, since this works better for languages with complex morphology.

I’m just telling you what the code seems to do. If it’s finding a match for a stem, it must have found a matching word; maybe a stem was entered into the text as a “word” at one point and has since been deleted. You could open the Wordlist and click on View > Show reviewed words which no longer appear in project to see if the stems were words at one point in time. Otherwise, I’m not sure what is going on either.

Ok, I have now looked at the words that are no longer in the project, and the stems have never been entered as complete words, because they do not exist as words. However, it is not a problem for me, since the program does what I need it to do. It finds all the words that include the stem in question with one or more affixes attached. It would be a problem if it behaved as you think it should behave. For instance, some suffixes change the vowels inside the root, so that I need to give both kat and kāt as stems, even though kāt could never occur alone without the suffix that has changed the vowel.

Yes, @anon291708 has clearly understood our problem. I still claim that this is unexpected behaviour from a tool which is doing powerful morphology. You can see my personal workaround in my previous post from 3rd April 2019.

Still, it should be relatively easy for the devs to fix this issue: stems which a user has provided only indirectly are presently not considered. Otherwise there should be a clear message (on mouse-over) explaining why no hits are being found.

(user enters “my+ mother” so PT8 should internally note “mother” as a valid stem)
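The request above can be sketched in a few lines of Python. This is only an illustration of the idea, not how PT8 works internally: it assumes the notation seen in this thread, where a prefix is written with a trailing `+` (`ba+ gʊbɔrɩfɔ`) and a suffix with a leading `+` (`baptize +d`), and the function names and the dictionary layout are hypothetical.

```python
def split_morphology(analysis):
    """Split an analysis such as 'ba+ gʊbɔrɩfɔ' into (prefixes, stems, suffixes)."""
    prefixes, stems, suffixes = [], [], []
    for token in analysis.split():
        if token.endswith("+"):
            prefixes.append(token[:-1])   # 'ba+' -> prefix 'ba'
        elif token.startswith("+"):
            suffixes.append(token[1:])    # '+d'  -> suffix 'd'
        else:
            stems.append(token)           # anything unmarked is a stem
    return prefixes, stems, suffixes


def stems_from_wordlist(wordlist):
    """Collect every stem found in a {word: analysis} mapping (assumed layout)."""
    found = set()
    for analysis in wordlist.values():
        found.update(split_morphology(analysis)[1])
    return found
```

With a wordlist containing only `{"bagʊbɔrɩfɔ": "ba+ gʊbɔrɩfɔ"}`, `stems_from_wordlist` would already yield `gʊbɔrɩfɔ` as a known stem, even though that word never appears on its own.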

People, this has become a very good thread. I am learning a lot, even if some stuff is not precisely focused on my initial problem. So thank you all for your input and for sharing from your respective work flows.

+1 vote

A quick question (first thought that came to my mind): Is gʊbɔrɩfɔ in the wordlist as a separate entry (i.e. a word) or is it only ever a stem? If it’s only ever a stem, then you might need to put a word containing that stem into the renderings window instead of just the stem.

If that doesn’t seem to be the issue, then I think the best bet would be to send in a problem report so we can check the project out and see what is going on.

by [Expert]
(16.2k points)

Out there in the wild, the word “flocks” does exist on its own (gʊbɔrɩfɔ). But so far we do not have it in the wordlist by itself, because it is rare in Scripture.

Are you saying one of our team members giving the morphology is “not enough”, and PT8 cannot figure out from ba+ gʊbɔrɩfɔ that gʊbɔrɩfɔ is a valid stem?

How and why would I put “a word containing that stem” into the renderings window? That would defeat the entire setup as being stem-based and managed from the wordlist. Imagine that later there will be a verse where the translators use the stem by itself. Then nothing would trigger a note that “the hack” can be undone for bagʊbɔrɩfɔ.

Still, your question has given me another idea: since we are using the book XXA to store extra words for hyphenation-info management, even beyond Scripture, I could just enter the stem there, i.e. the word gʊbɔrɩfɔ (flocks). And that brought it into the wordlist.

And indeed: the issue of the non-rendering has gone away. I am glad for this solution.

But this is totally unexpected behaviour!

If PT8 needs each stem listed by itself (and what about each affix??), then that should be very clearly documented. And there should be a “normal” way to bring such “helper stems” into the wordlist. In this language (and possibly in many other languages) there are certain words which will almost always carry a possessive or other prefix. Like (mymother, yourmother, ourmother). The naked mother is so rare, that a user will certainly stumble over the same problem as I just did, because statistically a normal related mother will be entered sooner than the rare pure mother.

Maybe I am missing something. I will probably never forget this little adventure ever. But other users should be warned.

I think you misunderstand how Match based on stems works in the Biblical Terms tool. It is designed so you can just find a word in the text that is used for the term, and you can use that word as your rendering. Paratext then uses the morphology to look at the stem of that word to find words in other verses that have the same stem. It is not designed to require the user to know the stem of the word they are wanting to use as a rendering (since that is defined in the Wordlist - maybe by a completely different person). Thus it is actually slightly incorrect to only put in the stem of the word into the renderings list (especially if it never appears as a “word” in the text).
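To make the difference between the two readings concrete, here is a toy model in Python. It is only one interpretation of the explanation above, which other posters in this thread report not matching their experience, and it is not Paratext's actual code. The wordlist is reduced to a hypothetical mapping from each surface word to its stem.

```python
def matches_word_first(rendering, wordlist):
    """Reading described above: the rendering must itself be a word in the
    wordlist; its stem is then used to find every word sharing that stem."""
    if rendering not in wordlist:
        return set()                      # unknown word, so nothing to look up
    stem = wordlist[rendering]
    return {word for word, s in wordlist.items() if s == stem}


def matches_stem_direct(rendering, wordlist):
    """Reading the original poster expected: the rendering is taken to be a
    stem and compared directly against the stems in the wordlist."""
    return {word for word, s in wordlist.items() if s == rendering}
```

Under the first reading, a wordlist holding only `bagʊbɔrɩfɔ` (stem `gʊbɔrɩfɔ`) gives no hit for the rendering `gʊbɔrɩfɔ`, and the hit appears once `gʊbɔrɩfɔ` is added as a word in its own right, which is consistent with the XXA workaround reported earlier. Under the second reading the hit is found without that extra entry.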

0 votes

Working with a language with very complex morphology, I have found the stem-based Biblical Terms very useful. In the Renderings box I can get by with 1 to 6 different stems, most of which never occur as a word on their own. That is not a problem, since PT looks for stems in the Wordlist rather than words. These few stems cover dozens of different verb and noun forms. For instance, the stem for resurrect covers 21 different verb forms. Before using the stem-based option, we listed dozens of prefixes and suffixes and combinations, but it is much easier to just list the stem. However, the morphological breakdown needs to be correct in the Wordlist so that the stem can be found. The program may guess a breakdown, but that guess is usually wrong for our language, so it needs to be corrected. Sometimes we have the same rendering for different Greek or Hebrew words. Sometimes one Greek word needs several stems in the language. It would be nice if the morphology column could be sorted according to which breakdowns have been approved as correct and which are only guessed by PT. Wading through 20,000 words to spot those that are not yet approved is cumbersome. One problem I have is that none of the translators in the current project understand the morphology well enough to make these breakdowns, so I need to do them.

by (869 points)
