+1 vote

Exporting the Wordlist to HTML is a nice feature, except that for many users including me, working with HTML is one of the “mysteries of the universe.” I would very much like to have a list in plain text, that could be used as the basis for a dictionary in Word, or for other such features. Anyone have a process to get a plain text listing of words that are currently marked as spelled correctly in PT?

anon101508

Paratext by (117 points)

3 Answers

+1 vote
Best answer

All possible disclaimers apply: these are old notes, you will destroy your data, only do this if you can handle all the consequences, blablabla.

Open a new sheet in LibreOffice Calc.

Create a working copy of the spelling file from PT, normally found in

…path to PT projects\your-lang-CODE\SpellingStatus.xml

and put it somewhere outside the PT folder structure.

In Calc, open SpellingStatus.xml via menu > Data > XML Source…

If XML Source is greyed out (unavailable) it needs to be activated:
menu > Tools > Options > LibreOffice > Advanced > tick box at the bottom of window: “Enable experimental features”.

In the little import window click in the left “Map to Document” pane on “Word” and five lines get selected in blue.

In “Mapped cell” fill in $Sheet1.$A$1 or use the clicker. No worries, you only need to provide one cell, even if you have got lots of spelling data; it will spill over.

Hit Import.

Now you have nice, pure language data.

You can have all correct words nicely listed like this:

Select the entire sheet (click top left) and then unselect the top row with the headings.

menu > Data > Sort…

Sort Key 1 must be the column B (for State)

Sort Key 2 is optional but column A (Word) is helpful

Hit OK.
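If you would rather script the import-and-sort steps than click through Calc, here is a minimal Python sketch. Big caveat: the element name `Status`, the attributes `Word` and `State`, and the value `"R"` for "correct" are my assumptions about the file layout, and the sample data is made up. Open your own copy of SpellingStatus.xml first and adjust the names to what you actually see there.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: the real SpellingStatus.xml may use other names.
SAMPLE_XML = """<SpellingStatus>
  <Status Word="wine" State="R" />
  <Status Word="bredd" State="W" />
  <Status Word="bread" State="R" />
</SpellingStatus>"""


def rows_sorted_by_state(xml_text):
    """Return (state, word) pairs sorted by state first, then by word."""
    root = ET.fromstring(xml_text)
    rows = [(e.get("State", ""), e.get("Word", "")) for e in root.iter("Status")]
    return sorted(rows)


def correct_words(xml_text, correct_state="R"):
    """Keep only the words whose state equals the (assumed) 'correct' code."""
    return [word for state, word in rows_sorted_by_state(xml_text)
            if state == correct_state]


print("\n".join(correct_words(SAMPLE_XML)))
```

To run it on a real file, read the working copy into a string and pass it in place of `SAMPLE_XML`. As with the Calc route, work on a copy outside the PT folders.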

The NNNNNN project has got many proper names with Specific Case (and a few sentence-initial “false hits”) and with this trick, you can pull them all into a free column:

Just go to the top free cell (normally E2) and enter this formula: =IF(D2<>"";D2;A2)

The formula is simple: if the Specific Case cell (column D) is not empty, it gets copied; otherwise the default word from column A gets copied.
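For anyone more at home in a script than in a spreadsheet, the same rule can be sketched in Python. The rows below are invented stand-ins for columns A (Word) and D (Specific Case), and the function name is just something I picked for illustration.

```python
def preferred_form(word, specific_case):
    # Same logic as the Calc formula: use the Specific Case value
    # unless that cell is empty, in which case fall back to the word.
    return specific_case if specific_case != "" else word


# Hypothetical rows: (column A word, column D specific case).
rows = [("jerusalem", "Jerusalem"), ("bread", ""), ("paul", "Paul")]

# Applying the rule to every row is the equivalent of filling the formula down.
column_e = [preferred_form(word, specific) for word, specific in rows]
print(column_e)
```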

Now select your top-cell (E2) and select downwards (for example with Shift+arrow key or Shift+page-down) as many rows as you have correct words. Then do menu > Sheet > Fill Cells > Down

Now you have lots of correct words in column E for whatever use (splice them into your Hunspell file, but adapt the line count and remove duplicates).
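The Hunspell housekeeping can also be scripted. A Hunspell .dic file starts with a line giving the (approximate) number of entries, followed by one word per line. The sketch below removes duplicates and empty entries and writes that count line. The word list is made up, and this only builds a fresh word list; merging into an existing .dic that has affix flags is not covered.

```python
def hunspell_dic_text(words):
    """Build .dic content: a count line, then unique words, one per line."""
    unique = sorted({w.strip() for w in words if w.strip()})
    return "\n".join([str(len(unique))] + unique) + "\n"


# Hypothetical harvest from column E, with a duplicate and an empty cell.
words = ["bread", "Jerusalem", "bread", "wine", ""]
print(hunspell_dic_text(words), end="")
```

Write the returned string to a file with UTF-8 encoding if you want to try it with Hunspell.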

Note: I have not spent time investigating whether it is worth harvesting the “corrections” from the wrong words. Those are also all good words, but I suspect they are all listed under good words anyway. To be checked.

If this does not make sense to you, please just forget it; at present I have no time to do “support” for this little hack. These are basically notes to myself from a while back. If you have a better solution, please let us know here.

anon334662

by (842 points)
+1 vote

Do you mean exporting as XML (not HTML)? That’s the only option I see.

Typically I just open up the resulting .xml file and do find/replace to get rid of all the information I don’t want cluttering up the text. With regular expressions you can find and delete more complex patterns.

So, to get the list of words that are spelled correctly, I’d search for ^.*?Incorrect.*?$ and replace it with nothing (i.e. delete it). Similarly, search for and delete ^.*?Unknown.*?$ (there’s unquestionably a way to do this in one search, but I generally use what I know and don’t do the research to find out how to search for “anything but Correct”).
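One way to fold the two searches into a single pass is an alternation, (?:Incorrect|Unknown). Here is a small Python sketch of that idea. The sample lines are invented, since the exact layout of the exported file is not shown in this thread; the only thing assumed is that each entry sits on its own line and contains its status word.

```python
import re

# One pattern for both searches: any whole line mentioning either status.
# The trailing \n? also swallows the line break so no blank lines remain.
UNWANTED = re.compile(r"^.*(?:Incorrect|Unknown).*\n?", re.MULTILINE)


def drop_unwanted_lines(text):
    """Delete every line that contains 'Incorrect' or 'Unknown'."""
    return UNWANTED.sub("", text)


# Hypothetical export lines, for illustration only.
sample = (
    '<item word="bread" spelling="Correct" />\n'
    '<item word="bredd" spelling="Incorrect" />\n'
    '<item word="brd" spelling="Unknown" />\n'
    '<item word="wine" spelling="Correct" />\n'
)
print(drop_unwanted_lines(sample), end="")
```

The search is case-sensitive, so lines with "Correct" are left alone even though "Incorrect" contains the letters "correct".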

I would then use find/replace to delete information I don’t need, like hyphenation status and maybe the count.

by (1.6k points)
+1 vote

anon101508, if you export the XML, then tell Excel to open the file and select ‘As an XML table’, it presents a pretty reasonable table from which I would think you could sort and select the columns you want to put into Word.

JG

by (218 points)

That’s much better and easier. Just did it two days ago, but I had hands on help from someone much more experienced in this area.
