Malformed interlinear layout related to \qt

Question

I'm supporting an interlinear layout (Greek text with Urdu glosses). In some cases (not all), poetic passages that contain text marked with \qt come out malformed in PTXprint's output.

Take, for example, the following section from MRK 11:9.

In Paratext, the Greek text is marked as follows:

In the interlinearizer window, it appears as follows:

In the layout from PTXprint:

There are other locations where the \q1, \q2 and \qt …\qt* combinations seem to be working OK.

I've tried to investigate the cause, but because this is an interlinear, much of how PTXprint merges the texts and performs the layout is opaque to me. I could use some guidance on how to investigate this, or on what might be going wrong.

Any help appreciated.

PTXprint · asked Jul 13, 2023 by jmkla (285 points)

5 Answers

Mark P · Answer 1 · 2023-07-14T00:39:48+0000

@jklassen Thanks for what you’re doing to support other PTXprint users as they tackle some interesting issues with formatting of RTL + LTR interlinear texts.

I don't yet have any answers for you on this question, and both of the more technical developers are either in transition or on vacation right now, so it might be a few more days before they can respond. To help them dig deeper, it would be easiest if they had an Archive file to work from, so please create one (from the Help tab) and send the resulting .zip file to the [Email Removed] address. I'll make sure at least one of them looks into this for you when they are back online.

DG · Answer 2 · 2023-07-20T09:52:40+0000

Hi, When I try to run the archive, I’m getting this error report:

That isn't a message you can ignore. For instance, at 5:23, this happens:

The problem is that without the information that Paratext saves when you give approval to the interlinears, there’s just not enough information for PTXprint to try to synchronise the two lines, so it gets very confused.
I expect that if you resolve that, then your layout problems will go away.

@mjpenny Can I suggest that the error message be expanded to say something like 'Failure to approve interlinear texts will result in layout errors'?

DG · Answer 3 · 2023-07-21T12:22:21+0000

I forgot to mention: I did look for \qt markers elsewhere in the text and found quite a few places where the interlinear was merged correctly.

jmkla · Answer 4 · 2023-07-21T13:02:09+0000

OK. I didn't mean to claim that every \qt was failing, just that wherever the layout failed, there is a \qt.

I'm not sure how to get to the cause of the failed interlinear layout, David. If it's a fault in the text or the configuration, I haven't been able to identify it yet. I'll keep looking myself.

mhosken · Answer 5 · 2023-07-26T10:07:39+0000

The problem, I believe, lies in what Paratext's XML specifies and what it doesn't. For each glossed unit it specifies a chunk of text, a gloss ID (cross-referencing the Lexicon.xml file) and the position of the original text in some representation of the input stream. It does not preserve the order of occurrence, and the verses are not in sequence either.
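A minimal sketch of the data model described above, assuming one record per glossed unit. The class name `GlossEntry`, its fields, and the integer chapter/verse fields are hypothetical illustrations, not Paratext's actual XML schema. Because the entries arrive without order, a consumer has to rebuild reading order itself, here by sorting numerically on chapter, verse and offset (a plain string sort would put 11:10 before 11:9).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossEntry:
    """One glossed unit (hypothetical field names, not Paratext's schema)."""
    chapter: int
    verse: int
    text: str      # the chunk of original text that was glossed
    gloss_id: str  # cross-reference into Lexicon.xml
    offset: int    # position in Paratext's internal representation of the verse


def in_reading_order(entries):
    """Sort entries numerically by chapter, verse and offset.

    The XML does not preserve order of occurrence, so order has to be
    rebuilt from the positional data alone.
    """
    return sorted(entries, key=lambda e: (e.chapter, e.verse, e.offset))


entries = [
    GlossEntry(11, 10, "b", "g3", 0),
    GlossEntry(11, 9, "a", "g2", 8),
    GlossEntry(11, 9, "c", "g1", 0),
]
print([e.gloss_id for e in in_reading_order(entries)])  # ['g1', 'g2', 'g3']
```

Nothing in such a record says which homograph or which glossing level (word versus morpheme) is meant; only `offset` carries that information.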

A given word or phrase may be glossed either as whole word(s) or as stem and morphemes (with different glosses, of course), and the only way to tell which homograph is which, or which level of glossing the user expects, is via the positioning data. The glossing file seems to ignore case, punctuation and SFM markers, and probably other things too.

Thus, if the glossing file matches the input, and assuming that Martin has understood the undocumented way in which Paratext counts characters in its internal representation of the file, the right thing to do is to cut out each word/phrase unit by position and replace it with the gloss, possibly transposing case as appropriate.
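A minimal sketch of that position-based substitution, under stated assumptions: offsets are treated as plain Python string indices into the verse text (Paratext's real character counting is undocumented), each unit is a hypothetical `(offset, surface, gloss)` tuple, and case transposition is reduced to capitalising the gloss when the original word starts with a capital. The Romani sentence from later in this answer serves as test data. Units are applied right to left so that earlier offsets stay valid, and a mismatch raises an error instead of guessing.

```python
def apply_glosses(verse_text, units):
    """Replace each (offset, surface, gloss) unit in verse_text by position.

    Assumption: offsets are Python string indices, not Paratext's own
    (undocumented) character counts.
    """
    out = verse_text
    # Apply from the end of the verse backwards so that replacing one unit
    # does not shift the offsets of the units that come before it.
    for offset, surface, gloss in sorted(units, key=lambda u: u[0], reverse=True):
        found = out[offset:offset + len(surface)]
        # The glossing data ignores case, so compare case-insensitively.
        if found.lower() != surface.lower():
            raise ValueError(f"expected {surface!r} at {offset}, found {found!r}")
        # Simple case transposition: keep an initial capital from the source.
        if found[:1].isupper():
            gloss = gloss[:1].upper() + gloss[1:]
        out = out[:offset] + gloss + out[offset + len(surface):]
    return out


units = [(0, "o", "the"), (2, "del", "God"), (6, "te", "SUBJ"),
         (9, "del", "give"), (13, "tut", "you"), (17, "haro", "grace")]
print(apply_glosses("O Del te del tut haro", units))
# The God SUBJ give you grace
```

If the verse has been edited since the glosses were approved, the text at a recorded offset no longer matches and the function refuses to continue, which is the safe outcome.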

If the verse has changed, then that doesn’t work. If the internal representation of the file is wrong, then it doesn’t work.

Any unconstrained search and replace is going to get false matches on homographs, so that can't be used as a general approach. Some guessing could be done, for instance looking for fuzzy matches within a few characters either side of the recorded position and breaking at word gaps. That would take a fairly large investment of programmer time, and there could still be problems. If the spacing has changed because someone has reordered the words in the verse, the homograph 3 characters away might not be the same word. E.g. Romani has fairly free word order:

 O    Del  te  del tut haro
(the) God SUBJ give you grace
"May God give you grace"

vs

 Te  del   o   Del  tut haro
SUBJ give (the) God you grace
"May God give you grace"

PTXprint can’t go into the linguistics rules of the language. Not even Paratext remembers / asks what parts of speech things are. The only reliable way to get trustworthy interlinear data is for the user to approve glosses.
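To make the homograph risk concrete, here is a sketch of the kind of fuzzy guessing described above, using a hypothetical helper `fuzzy_find` that looks for the glossed form as a whole word, case-insensitively, within a few characters of its recorded position. It illustrates the proposed heuristic only; it is not something PTXprint does. Run on the reordered Romani sentence, the gloss recorded for "Del" ('God') at offset 2 lands on "del" ('give'), and vice versa.

```python
import re


def fuzzy_find(verse_text, surface, offset, window=3):
    """Return the start of the whole-word match of `surface` nearest to
    `offset`, or None if no match starts within +/- `window` characters.

    Matching is case-insensitive, mirroring the glossing file.
    """
    pattern = re.compile(r"\b" + re.escape(surface) + r"\b", re.IGNORECASE)
    starts = [m.start() for m in pattern.finditer(verse_text)
              if abs(m.start() - offset) <= window]
    return min(starts, key=lambda s: abs(s - offset), default=None)


original = "O Del te del tut haro"   # "Del" (God) at 2, "del" (give) at 9
reordered = "Te del o Del tut haro"  # "del" (give) at 3, "Del" (God) at 9

print(fuzzy_find(original, "del", 2))   # 2: the right word
print(fuzzy_find(reordered, "del", 2))  # 3: "give", not "God"
print(fuzzy_find(reordered, "del", 9))  # 9: "God", not "give"
```

Both lookups on the reordered sentence succeed without any error, which is what makes the heuristic dangerous: nothing signals that the glosses have been swapped.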

Jul 27, 2023 commented by DG (907 points)
