Understanding how different ways of encoding characters and representing words affect language projects

Years ago I saw a Paratext newsletter announcing a new feature that helps users visualize the Unicode codepoints in their text. It said something like “We know that there is often more than one combination of Unicode codepoints that can be used to represent a given character. Often, the way a character or word is composed depends a lot on the keyboard the translator is using and the particular keystrokes used. If you’re having trouble finding words in search results or when trying to perform regex operations, you can use this tool we made for you to visualize how each word and character is represented underneath.”

I’m familiar with Alt+X for seeing the codepoint of a single character, and also with Tools > Checking Inventories > Characters Inventory, but neither of these shows entire words with their codepoints, which is what I remember seeing in the screenshot in the announcement.
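To illustrate the kind of view I mean, here is a minimal Python sketch that uses only the standard `unicodedata` module. It is not the Paratext feature, and the helper name `word_codepoints` and the sample word are my own, chosen just for illustration. It lists the codepoints of a whole word and shows how two spellings that look identical on screen can fail a plain comparison until both are normalized.

```python
import unicodedata


def word_codepoints(word):
    """Return (character, 'U+XXXX', Unicode name) for each codepoint in word."""
    # Hypothetical helper for illustration only.
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in word
    ]


# Two spellings of the same visible word.
composed = "caf\u00e9"      # final letter is one precomposed codepoint
decomposed = "cafe\u0301"   # plain e followed by a combining acute accent

for word in (composed, decomposed):
    # Print the whole word as a row of codepoints.
    print(" ".join(code for _, code, _ in word_codepoints(word)))

# A plain comparison (and therefore a naive search) treats them as different.
same_raw = composed == decomposed

# After normalizing both to the same form (NFC here), they compare equal.
same_nfc = (
    unicodedata.normalize("NFC", composed)
    == unicodedata.normalize("NFC", decomposed)
)
print(same_raw, same_nfc)
```

If a search or regex misses a word that is visibly present, comparing the codepoint rows of the search term and the text this way would show whether a difference in composition is the cause.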

Does anyone know where to find this announcement or the feature it describes? I’ve searched around and can’t find it.

My ultimate goal is to better understand how different ways of encoding characters and representing words affect language projects, whether by reading the accompanying documentation or by talking with experts on the topic.

Thank you!
