# Blame: Jean Chalard, a424ff0, 2012-10-31 17:34:47 +0900
# (the line mentioning the allowlist: Adrian Roos, 444da56, 2020-08-12 13:06:32 +0200)
#
# This is a sample wordlist that can be converted to a binary dictionary
# for use by the Latin IME.
# The file is essentially a CSV file, with indent level denoting nesting.
#
# The file starts with a single CSV line with the header attributes. Whatever
# the content, these are included as is in the binary file. The first attribute
# of the file should be `dictionary'. Usual fields are `locale', `description',
# `date', `version', `options'.
#
# Each word has a `word' entry and at least an `f' argument denoting its
# probability, as an integer between 0 and 255 on a logarithmic scale, with
# 255 meaning 1 and each decrement of 1 dividing the probability by 1.15.
# As a special case, a weight of 0 is taken to mean profanity: a word that
# should not be considered a typo, but that should never be suggested
# explicitly. An entry may be marked as not a word by adding a `not_a_word'
# field with a value of `true'. The main reason for putting such entries
# into the dictionary is to add shortcut targets and maybe an allowlist
# replacement.
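#
# As a rough worked example of the scale described above (an illustration
# derived from the stated 1.15 ratio, not a statement about how the binary
# format stores values): an `f' value n stands for about 1.15^-(255 - n).
#   f=255  ->  1.15^0    = 1
#   f=243  ->  1.15^-12  ~ 0.19
#   f=200  ->  1.15^-55  ~ 0.00046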
#
# Each word may have any number of shortcut target lines, each starting
# with a `shortcut' entry and having at least an `f' frequency value
# between 0 and 14, or the special value `whitelist', which becomes 15
# and marks the shortcut as the whitelist target of this word.
#
# Each word may also have any number of bigram lines, each starting with a
# `bigram' entry containing the following word. The bigram's `f' value
# overrides that word's unigram frequency when it follows the word this
# bigram is for.
#
dictionary=main:en,locale=en,description=Sample wordlist,date=1351495318,version=1
 word=sample,f=200
  bigram=wordlist,f=243
 word=wordlist,f=180
 word=shortcut,f=176
  shortcut=target,f=10
 word=witelisted,f=10,not_a_word=true
  shortcut=whitelisted,f=whitelist
 word=profanity,f=0