Yes, I agree with Matthew_Lee that this is an important issue, especially in the French-speaking world. There are a number of things I want to mention in my analysis, but I will try to summarize (TLDR) at the bottom of this post.
A little online research shows some interesting things, which are not really “beside the point”:
And some humorous failures (where they obviously were using a normal space - that broke in this case):
That’s exactly what we want to avoid: bits of punctuation not connected to their associated text. So if we are going to use some sort of space character to set off punctuation, we must ALWAYS use a non-breaking space of some sort.
The two main options are the full No-Break Space (NBSP, U+00A0) and the Narrow No-Break Space (NNBSP, U+202F), whose definitions can be found in the Unicode standard, at https://unicode.org/charts/PDF/U0080.pdf and https://unicode.org/charts/PDF/U2000.pdf, respectively:
As you can see, the definition of the NNBSP says it is typically the width of a thin space, which is defined in that same chart as:
So a NNBSP would typically be a fifth of an em (0.2em). How big is a normal space or a NBSP? These metrics are font-dependent, but a rough calculation with the Charis SIL font shows that the space and NBSP characters are about 0.34em. The NNBSP is about 0.22em. This is a significant difference, and if you use a NBSP (or as a temporary measure a regular space, which has the disadvantage of breaking across lines) around punctuation, the typesetters I know will say that that space is too large. Using the NNBSP helps significantly, and can be done fairly easily in PTXprint with changes like the following lines in PrintDraftChanges.txt:
' *:' > '\u202f:' # Place non-breaking thin space before colon
'« *' > '«\u202f' # Place non-breaking thin space after opening guillemets
' *»' > '\u202f»' # Place non-breaking thin space before closing guillemets
'‹ *' > '‹\u202f' # Place non-breaking thin space after opening single guillemets
' *›' > '\u202f›' # Place non-breaking thin space before closing single guillemets
This puts a NNBSP before or after (as necessary) the punctuation, and also removes any regular spaces that are already there. That means that whether or not the team puts in spaces, they will be normalized to NNBSP characters. E.g. in this project the team is inconsistent and uses (regular) spaces around question marks and colons, but not around quote marks (guillemets):
[Image: screenshot of the project text, with regular spaces around question marks and colons but none around the guillemets]
Note that you can see that these are just regular spaces if you adjust the zoom and/or pane size just right, as they will allow a break across a line, like this:
[Image: the same text with a line break falling between a word and its punctuation]
But the changes above should be able to handle both those cases OK, and insert the NNBSP for the typesetting.
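To check that claim outside PTXprint, here is a minimal Python sketch that mirrors the five rules above with the standard `re` module. The names `RULES` and `normalize_spacing` are just for illustration, and this is not how PTXprint itself applies PrintDraftChanges.txt; it only assumes the patterns behave like ordinary regular expressions.

```python
import re

NNBSP = "\u202f"  # NARROW NO-BREAK SPACE

# (pattern, replacement) pairs mirroring the five change rules above.
# " *" matches zero or more regular spaces, so text with or without
# spaces ends up in the same form.
RULES = [
    (r" *:", NNBSP + ":"),   # before colon
    (r"« *", "«" + NNBSP),   # after opening guillemets
    (r" *»", NNBSP + "»"),   # before closing guillemets
    (r"‹ *", "‹" + NNBSP),   # after opening single guillemets
    (r" *›", NNBSP + "›"),   # before closing single guillemets
]


def normalize_spacing(text: str) -> str:
    """Apply each rule once, in order (hypothetical helper)."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


spaced = normalize_spacing("Il dit : « Viens »")
unspaced = normalize_spacing("Il dit: «Viens»")
print(spaced == unspaced)  # both inputs normalize to the same string
```

Two caveats about this sketch: it touches every colon, including ones in strings like 3:16, and it is meant to be run once, since a second pass would add another NNBSP next to the one already there.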
In a similar way, you would want to put change rules in your SAB projects, to make sure that your Scripture apps handle the spaces properly. Check out this post for sample rules: https://community.scripture.software.sil.org/t/suggestions-for-changes-gallery/590/3.
Note that the rules in this post do not handle the space or no-space as elegantly as the rules above, but you can adjust them with tricks like the " *" used above.
And one further point before we get to Paratext… In recent typesetting jobs we have actually used one tenth of an em (0.1em) as space around the punctuation, i.e. smaller than NNBSP. Here is the punctuation definition we used:
\catcode`\:=\active \def:{\unskip\kern0.1em\char`\:{}} % colon
Note: this was done in XeTeX, but the same could be done with PTXprint. I believe you would want it defined in the ptxprint-mods.tex configuration file available on the Advanced tool tab. This gives a fairly minimal space around the punctuation, as seen in this sample:
[Image: typeset sample with 0.1em of space around the punctuation]
But the teams have felt that this is enough space to meet their need for the spacing around punctuation that French requires. (Of course, the French may disagree, but it’s not their language!)
Conclusion (TLDR): So what does this mean for Paratext?
If the team uses regular spaces in the text to offset their punctuation, then sometimes it will appear incorrectly on their screen in Paratext (i.e. with punctuation not properly attached to its text, as shown above), which is distracting but not the end of the world. In this case, the onus is on the typesetter or app builder to change those regular spaces appropriately. Unfortunately, if this is the form that is put into the DBL (highly likely), then apps like YouVersion are going to have problems, because they notoriously DON’T handle those spaces appropriately.
Given this tendency to smaller and smaller no-break spaces to set off the punctuation (first the NNBSP at 0.2em, then manually typesetting at 0.1em with PTXprint) that I’ve seen in my typesetting projects, I almost always recommend that teams put NO spaces around their punctuation in Paratext, and then just trust the typesetting or app building to do the right thing around those punctuation marks. This means that when the text is put into the DBL, YouVersion is not going to have any hanging punctuation. (It won’t have spaces around the punctuation either, but that is a lesser problem IMHO.)
So with this specific plan of action, no changes are required in Paratext. If you wanted, as Matthew_Lee suggested, a way to show NNBSP or NBSP characters, I think that would be a good idea, but our keyboards would also need a way to type those characters (which they don’t always), and Paratext would need to know not to mess with those characters. (And punctuation inventories would need to show all of the combinations with those spaces, to make sure that they were being used consistently, e.g. always with a NNBSP.)
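To illustrate the kind of inventory I mean, here is a rough Python sketch that counts which kind of space (if any) comes right before each punctuation mark. It is not a Paratext feature; the function name, the default set of marks, and the labels are all my own assumptions for the example.

```python
from collections import Counter

# Labels for the space characters discussed in this post.
SPACE_NAMES = {" ": "SPACE", "\u00a0": "NBSP", "\u202f": "NNBSP"}


def preceding_space_inventory(text: str, marks: str = ":;?!»›") -> Counter:
    """Count (kind of space before the mark, mark) pairs (hypothetical helper)."""
    counts = Counter()
    for i, ch in enumerate(text):
        if ch in marks:
            prev = text[i - 1] if i > 0 else ""
            # Anything that is not one of the three spaces counts as "none".
            counts[(SPACE_NAMES.get(prev, "none"), ch)] += 1
    return counts


sample = "Quoi ? Oui\u202f! Non: si\u00a0?"
for (space, mark), n in sorted(preceding_space_inventory(sample).items()):
    print(space, mark, n)
```

A project that is consistent would show only one kind of space per mark; a mix like the one in the sample is what you would want the inventory to flag.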
This post isn’t so much proposing solutions as providing more background and information. I really don’t like the way this tilde / NBSP stuff works in Paratext now, and agree that it should change. It seems like Paratext should assume that it should take every character in the text at face value, whether it is a tilde, NBSP or NNBSP. And a way to see them (subtly) would be nice. Should two or more spaces be automatically combined (responding to @anon942452)? Maybe if they are identical characters? That would still allow the automated Paratext spacing fix, but also provide some options for getting around it. And one would also need to come up with a way to deal with all of the legacy projects that have tildes for non-breaking spaces, maybe just a conversion, to convert them all to NBSP, once that’s handled properly in Paratext.
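For the legacy conversion, something as simple as the following Python sketch might be enough. It assumes that every tilde in the project text is meant as a non-breaking space, which would need checking per project, and `tildes_to_nbsp` is a made-up name, not an existing Paratext tool.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE


def tildes_to_nbsp(usfm_text: str) -> str:
    """Replace every tilde with a real NBSP character (hypothetical helper).

    Assumption: no tilde in the text is meant as a literal tilde.
    """
    return usfm_text.replace("~", NBSP)


line = "\\v 1 Jésus~Christ"
converted = tildes_to_nbsp(line)
print(line.count("~"), "tilde(s) converted")
```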
Anyway, some more food for thought…