I’m having a problem with text presentation of a resource in a Text Collection. To see this yourself, download the resource ShuLatn, language Arabic (shu-Latn), and add it to a Text Collection window. Click on the blue ShuLatn link in the list of texts, and the expanded text that appears to the right no longer has hyphens:

All of these are normal U+002D hyphen characters. You can see that the hyphens have not simply been deleted, since, for example, al-makruubiin is still broken across a line at that position. But the hyphen itself is not displayed.

I see in the language properties that hyphen is listed as a word-breaking character (instead of maybe punctuation inside a word), but if that were the only problem, then at least when a word is broken across a line that hyphen should appear.

I don’t have direct access to this project (which is in the DBL), but if the solution requires a change to the project, I believe we could get it changed.

Actually, now that I read the word break definition in the Guide, I think this behavior makes sense:
“If the script of your project does not normally indicate word breaks with a space, enter the character here that you will use in your project text to indicate word breaks. Immediately following the character in the box, enter (00AD) – include the parentheses – if you want a hyphen to appear when a long line wraps.”
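To make that description concrete, here is a rough Python model of the behavior as I understand it from the Guide text. It is not Paratext's actual code: the function `render` and its parameters are made up for illustration, and `hyphen_on_wrap` stands in for the "(00AD)" option. The word break character is treated as an invisible break opportunity, so it is never drawn, and a hyphen shows only at a wrap point when that option is on.

```python
import re

def render(text, break_char, width, hyphen_on_wrap=False):
    """Greedy line layout (hypothetical model, not Paratext's implementation).

    break_char is treated as an invisible place where a line may wrap.
    """
    # Split on spaces and on the break character, keeping the separators.
    parts = re.split(f"( |{re.escape(break_char)})", text)
    lines, line, pending = [], "", ""
    for part in parts:
        if part == " " or part == break_char:
            pending = part  # remember which separator came before the next segment
            continue
        if not part:
            continue
        glue = " " if pending == " " else ""  # the break character itself is never drawn
        if line and len(line) + len(glue) + len(part) > width:
            # Wrap here; the added hyphen is not counted against width in this sketch.
            if pending == break_char and hyphen_on_wrap:
                line += "-"
            lines.append(line)
            line = part
        else:
            line = line + glue + part if line else part
        pending = ""
    if line:
        lines.append(line)
    return lines

print(render("al-makruubiin ja", "-", 20))
print(render("al-makruubiin ja", "-", 8))
print(render("al-makruubiin ja", "-", 8, hyphen_on_wrap=True))
```

Under these assumptions, the wide layout gives "almakruubiin ja" with no hyphen, and the narrow layout breaks between "al" and "makruubiin" with no hyphen unless the wrap option is on. That matches what I am seeing in the Text Collection.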

So I think I should remove the hyphen from the word break characters field. My guess is that if I add it to the punctuation inside a word field, “l-iid” and “al-waali” would each be treated as a single word. It might be better not to list it anywhere, in which case I imagine it would just be treated as ordinary punctuation. (It’s not in the alphabetical characters list.) Does this sound like an accurate analysis?

jeffh - if you want the hyphen to always appear in the word then you need to add it in the box for Word Medial Punctuation. If you don’t include it in Word Medial Punctuation then you will not see these words in the word list.

In this language, the hyphen basically separates two words, but that’s the way they are written in the orthography. E.g. “al-raajil” means “the man”. So having “al” as a separate word from “raajil” in the word list shouldn’t really be a problem.
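For anyone weighing the two options, here is a small Python sketch of the difference in the word list. It is only an illustration of the idea, not how Paratext builds its word list: the function `word_list` and its `word_medial` parameter are hypothetical names, and the sample text is just the forms mentioned in this thread strung together.

```python
import re

def word_list(text, word_medial=""):
    """Split text into words (hypothetical model, not Paratext's implementation).

    Characters in word_medial are allowed between letters inside a word.
    """
    letters = r"[^\W\d_]+"  # one or more Unicode letters
    if word_medial:
        medial = re.escape(word_medial)
        pattern = rf"{letters}(?:[{medial}]{letters})*"
    else:
        pattern = letters
    return re.findall(pattern, text)

sample = "al-raajil l-iid al-waali"
print(word_list(sample))                   # hyphen not listed anywhere
print(word_list(sample, word_medial="-"))  # hyphen as word-medial punctuation
```

With the hyphen left out, this model lists "al", "raajil", "l", "iid", and "waali" as separate entries. With the hyphen as word-medial punctuation, it lists "al-raajil", "l-iid", and "al-waali" as whole words.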

To my knowledge, the only orthographies that need these characters are in SE Asia (Thai, Lao, Burmese, and Khmer scripts). Perhaps there should be a warning in the language settings that a character entered in the Word Break Character field will not be visible in Preview mode or in typeset output.

