I would like two morphemes* which are separated by a ~ to be treated as a single word. Is that possible?

This comes up particularly in the wordlist. If I find those two morphemes written with no space between them and mark that form as incorrect, I cannot enter the correct spelling with the ~ in the middle, because the wordlist currently treats that as changing one word into two.

[The 'find incorrectly joined or split words' tool helps to some extent when dealing with this, but doesn't fully solve the problem.]

* Okay, technically these are a bit more than morphemes, but I don't know the correct terminology. Maybe "morpheme clusters" or "word parts".

1 Answer

The tilde is reserved as a character to indicate a no-break space. Paratext sees this as two words that you want to keep together, but not as a single word.

One option would be to use (at least temporarily) a character like U+223C (TILDE OPERATOR, ∼) or some other unique character. You would need to add this character to the word-medial characters.

This will cause the joined forms to show up in the wordlist as single words.
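To illustrate why adding a word-medial character changes what counts as a word, here is a minimal Python sketch of a hypothetical tokenizer. It is not Paratext's actual wordlist code; the `tokenize` function and its `word_medial` parameter are assumptions made for illustration. Letters form words, a word-medial character joins two words only when it sits between letters, and everything else (spaces, and the tilde that marks a no-break space) separates words.

```python
import re


def tokenize(text, word_medial=""):
    """Split text into words (hypothetical model of a wordlist tokenizer).

    Runs of letters form words. A character listed in word_medial joins
    two letter runs into one word only when it sits between letters.
    Anything else, including '~', acts as a word separator.
    """
    letters = r"[^\W\d_]+"  # one or more letters (no digits or underscore)
    if word_medial:
        medial = "[" + re.escape(word_medial) + "]"
        pattern = f"{letters}(?:{medial}{letters})*"
    else:
        pattern = letters
    return re.findall(pattern, text)


# '~' splits the form into two words:
print(tokenize("abc~def ghi"))  # ['abc', 'def', 'ghi']
# With U+223C declared word-medial, the joined form is one word:
print(tokenize("abc\u223cdef ghi", word_medial="\u223c"))  # ['abc∼def', 'ghi']
```

The placeholder strings `abc`, `def` and `ghi` stand in for real morphemes.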

Once you are satisfied that the spelling is handled correctly, you could change the U+223C characters back to the tilde.
Thank you for your reply. I had hoped that there was some way to set the style of ~ to be word-medial, but I guess that's not possible.

Just to give some background, these are classes of words where at least some native speakers interpret them as separate words, hence the space between them. But from a linguistic standpoint I'd argue that they are single words, and there are reasons in PT to treat them as such.

Anyway, a couple of follow-up questions regarding the idea of using U+223C.

1) I remember that at some point in the past, when doing a find/replace with ~, things didn't work as I expected and regular spaces were added where I wasn't expecting them. As far as you know, if I do a global change U+007E → U+223C and later change them all back, should it round-trip without any changes?

2) Can you think of any downsides if I just changed all U+007E characters into U+223C (and set up a changes.txt rule to keep doing that in the future), and then changed them back only at the time of publication (using another changes rule)?
I just did a test in version 9.4 where I added a regular ~ (U+007E) after certain words. I then did a global replace of ~ with U+223C and verified that the words now show up in my wordlist (I had added U+223C as word-medial).

Then I did a global replace of U223c back to a ~ and did not encounter any problem.
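The round trip tested above can be sketched in Python, assuming the text is handled as a plain string. This only models the character substitution; it is not how Paratext's global replace or changes.txt works internally, and `to_working` and `to_publication` are hypothetical helper names. What it makes explicit is the one condition under which the round trip is lossless: U+223C must not already occur in the text before the first replace, otherwise the reverse step would also turn those original characters into tildes.

```python
TILDE = "\u007e"     # '~', used by Paratext to mark a no-break space
TILDE_OP = "\u223c"  # '∼' TILDE OPERATOR, the word-medial stand-in


def to_working(text: str) -> str:
    """Replace every '~' with U+223C before editing (hypothetical helper)."""
    # If U+223C is already present, the reverse step could not tell those
    # characters apart from converted tildes, so refuse to convert.
    if TILDE_OP in text:
        raise ValueError("text already contains U+223C; round trip would not be lossless")
    return text.replace(TILDE, TILDE_OP)


def to_publication(text: str) -> str:
    """Turn every U+223C back into '~' for publication (hypothetical helper)."""
    return text.replace(TILDE_OP, TILDE)


if __name__ == "__main__":
    sample = "abc~def ghi~jkl mno"
    working = to_working(sample)
    assert TILDE not in working and working.count(TILDE_OP) == 2
    # str.replace swaps one character for another and inserts no spaces,
    # so converting back restores the original text exactly.
    assert to_publication(working) == sample
    print("round trip OK")
```

If spaces do appear after a replace in Paratext, that would come from how the tool handles the tilde, not from the substitution itself, which is one more reason to test on a copy of the project first.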

Before doing anything this radical, I generally recommend creating a copy of the project and testing it yourself!