*spell.txt*     For Vim version 7.0aa.  Last change: 2005 Apr 15


                  VIM REFERENCE MANUAL    by Bram Moolenaar


Spell checking                                          *spell*

1. Quick start                  |spell-quickstart|
2. Generating a spell file      |spell-mkspell|
9. Spell file format            |spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start                                          *spell-quickstart*

This command switches on spell checking: >

        :setlocal spell spelllang=en_us

This switches on the 'spell' option and selects US English as the language to
check.

Words that are not recognized are highlighted with one of these:
        SpellBad        word not recognized
        SpellRare       rare word
        SpellLocal      wrong spelling for selected region

Vim only checks the spelling of words; there is no grammar check.

To search for the next misspelled word:

                                                        *]s* *E756*
]s                      Move to the next misspelled word after the cursor.

                                                        *[s*
[s                      Move to the previous misspelled word before the
                        cursor.  DOESN'T WORK YET!


PERFORMANCE

Note that Vim does on-the-fly spell checking. To make this work fast the
word list is loaded in memory. Thus this uses a lot of memory (1 Mbyte or
more). There might also be a noticeable delay when the word list is loaded,
which happens when 'spelllang' is set. Each word list is loaded only once; it
is not deleted when 'spelllang' is made empty. When 'encoding' is set the
word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions. For example, English
comes in (at least) these variants:

        en              all regions
        en_us           US
        en_gb           Great Britain
        en_ca           Canada

Words that are not used in one region but are used in another region are
highlighted with SpellLocal.

Always use lowercase letters for the language and region names.
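
For example, to check British English, or to accept the spelling of all
regions (a sketch that only uses the names from the list above): >

        :setlocal spell spelllang=en_gb
        :setlocal spell spelllang=en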


SPELL FILES

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'. The name is: LL-XXX.EEE.spl, where:
        LL      the language name
        -XXX    optional addition
        EEE     the value of 'encoding'

Exception: Vim uses "latin1" when 'encoding' is "iso-8859-15". The euro sign
doesn't matter for spelling.

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'. See
|spell-mkspell| about how to create a spell file. Converting a spell file
with "iconv" will NOT work.

If a spell file only uses ASCII characters the encoding can be omitted. This
is useful for English: "en.spl". The file with encoding is checked first, thus
you could have one with encoding that includes words with non-ASCII characters
and use the ASCII file as a fall-back.
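
To see which spell files can be found, the 'runtimepath' directories can be
searched by hand. A minimal sketch, using the standard globpath() function
(the file names it prints depend on your installation): >

        :echo globpath(&runtimepath, "spell/*.spl")

Following the rules above, with 'encoding' set to "latin1" and 'spelllang' set
to "en" the file "en.latin1.spl" is checked before "en.spl".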

                                                        *E758* *E759*
When loading a spell file Vim checks that it is properly formatted. If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word. This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'. The word characters do depend on
'encoding'.

A word that starts with a digit is always ignored.


SYNTAX HIGHLIGHTING

Files that use syntax highlighting can specify where spell checking should be
done:

        everywhere                      default
        in specific items               use "contains=@Spell"
        everywhere but specific items   use "contains=@NoSpell"

Note that mixing @Spell and @NoSpell doesn't make sense.
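
A minimal sketch of a syntax file that restricts spell checking to comments.
The group name "myComment" and the C-style comment delimiters are only
assumptions for this example: >

        :syntax region myComment start="/\*" end="\*/" contains=@Spell
        :highlight link myComment Comment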

==============================================================================
2. Generating a spell file                              *spell-mkspell*

Vim uses a binary file format for spelling. This greatly speeds up loading
the word list and keeps it small.

You can create a Vim spell file from the .aff and .dic files that Myspell
uses. Myspell is used by OpenOffice.org and Mozilla. You should be able to
find them here:
        http://lingucomponent.openoffice.org/spell_dic.html

:mksp[ell] {outname} {inname} ...                       *:mksp* *:mkspell*
                        Generate spell file {outname}.spl from Myspell files
                        {inname}.aff and {inname}.dic.
                        Multiple {inname} arguments can be given to combine
                        regions into one Vim spell file. Example: >
        :mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<                       This combines the English word lists for US, CA and AU
                        into one en.spl file.
                        Up to eight regions can be combined. *E754* *E755*

Since you might want to change the word list for use with Vim the following
procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, etc.
4. Use |:mkspell| to generate the Vim spell file and try it out.

When the Myspell files are updated you can merge the differences:
5. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
6. Use Vimdiff to see what changed: >
        vimdiff xx_YY.orig.dic xx_YY.new.dic
7. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
8. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.
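
Steps 4 and 8 can also be done from within Vim. A sketch, assuming that the
edited files are in the current directory and that "xx" is the language name
(both are placeholders, as in the steps above): >

        :mkspell ~/.vim/spell/xx xx_YY
        :setlocal spell spelllang=xx
        :call rename("xx_YY.new.dic", "xx_YY.orig.dic")
        :call rename("xx_YY.new.aff", "xx_YY.orig.aff")

The first two commands belong to step 4, the rename() calls to step 8. Do not
run the rename() calls before the changes have been merged in step 7.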

==============================================================================
9. Spell file format                                    *spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here. That is because the goal of
spell checking differs from writing a dictionary (as in the book). For
spelling we need a list of words that are OK, thus need not be highlighted.
Names will not appear in a dictionary, but do appear in a word list. And
some old words are rarely used and are common misspellings. These do appear
in a dictionary but not in a word list.

There are two files: the basic word list and an affix file. The affixes are
used to modify the basic words to get the full word list. This significantly
reduces the number of words, especially for a language like Polish. This is
called affix compression.

The format for the affix and word list files is mostly identical to what
Myspell uses (the spell checker of Mozilla and OpenOffice.org). A description
can be found here:
        http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive; this isn't obvious from the description.
Vim supports a few extras. Hopefully Myspell will support these too some day.
See |spell-affix-vim|.

The basic word list and the affix file are combined and turned into a binary
spell file. All the preprocessing has been done, thus this file loads fast.
The binary spell file format is described in the source code (src/spell.c).
But only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made. The tools for this can be found in the
"src/spell" directory.


WORD LIST FORMAT                                *spell-wordlist-format*

A very short example, with line numbers:

        1       1234
        2       aan
        3       Als
        4       Etten-Leur
        5       et al.
        6       's-Gravenhage
        7       's-Gravenhaags
        8       bedel/P
        9       kado/1
        10      cadeau/2

The first line contains the number of words. Vim ignores it.    *E760*

What follows is one word per line. There should be no white space after the
word.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position. The same word with a lower-case letter at this
position will not match. When some of the other letters are upper-case it will
not match either.

The same word with all upper-case characters will always be OK.

        word list       matches         does not match ~
        als             als Als ALS     ALs AlS aLs aLS
        Als             Als ALS         als ALs AlS aLs aLS
        ALS             ALS             als Als ALs AlS aLs aLS
        AlS             AlS ALS         als Als ALs aLs aLS
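
The rules in this table can be written down as a small function. This is
only a sketch to illustrate the rules, not the code that Vim uses. The name
WordListMatch() is made up for this example, and only ASCII words that start
with a letter are handled: >

        function! WordListMatch(entry, text)
          " The word exactly as it appears in the word list is always OK.
          if a:text ==# a:entry
            return 1
          endif
          " The same word with all upper-case characters is always OK.
          if a:text ==# toupper(a:entry)
            return 1
          endif
          " A word with only lower-case letters also matches when it starts
          " with an upper-case letter.
          if a:entry ==# tolower(a:entry)
            let cap = toupper(strpart(a:entry, 0, 1)) . strpart(a:entry, 1)
            return a:text ==# cap
          endif
          return 0
        endfunction

For example, ":echo WordListMatch('als', 'Als')" gives 1 and
":echo WordListMatch('Als', 'als')" gives 0.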

Note that lines 5 to 7 of the example word list contain non-word characters.
You can include any character in a word. When checking the text a word still
only matches when it appears with a non-word character before and after it.
For Myspell a word starting with a non-word character probably won't work.

After the word there is an optional slash and flags. Most of these flags are
letters that indicate the affixes that can be used with this word.

                                                        *spell-affix-vim*
A flag that Vim adds and is not in Myspell is the "=" flag. This has the
meaning that case matters. This can be used if the word does not have the
first letter in upper case at the start of a sentence. Example:

        word list       matches         does not match ~
        's morgens/=    's morgens      'S morgens 's Morgens
        's Morgens      's Morgens      'S morgens 's morgens

                                                        *spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file. The affix file must always be in the same encoding as the
word list. This is compatible with Myspell. For Vim the encoding may also be
something else, any encoding that "iconv" supports. The "SET" line must
specify the name of the encoding. When using a multi-byte encoding it's
possible to use a larger number of different affixes.

Performance hint: Although using affixes reduces the number of words, it
reduces the speed. It's a good idea to put all the often used words in the
word list with the affixes prepended/appended.


 vim:tw=78:sw=4:ts=8:ft=help:norl: