*spell.txt*     For Vim version 7.0aa. Last change: 2005 Apr 19


                  VIM REFERENCE MANUAL    by Bram Moolenaar


Spell checking                                          *spell*

1. Quick start                  |spell-quickstart|
2. Generating a spell file      |spell-mkspell|
9. Spell file format            |spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start                                          *spell-quickstart*

This command switches on spell checking: >

        :setlocal spell spelllang=en_us

This switches on the 'spell' option and selects US English as the language to
check.

The words that are not recognized are highlighted with one of these groups:
        SpellBad        word not recognized
        SpellRare       rare word
        SpellLocal      wrong spelling for selected region

Vim only checks words for spelling; there is no grammar check.

To move to the next or previous misspelled word:

                                                        *]s* *E756*
]s                      Move to the next misspelled word after the cursor.
                        NOTE: This does not take syntax highlighting into
                        account yet, so it may stop at more places than are
                        highlighted.

                                                        *[s*
[s                      Move to the previous misspelled word before the
                        cursor.
                        NOTE: This does not work yet!


PERFORMANCE

Note that Vim does on-the-fly spell checking. To make this fast, the word
list is loaded into memory, which uses a lot of memory (1 Mbyte or more).
There might also be a noticeable delay when the word list is loaded, which
happens when 'spelllang' is set. Each word list is loaded only once; it is
not unloaded when 'spelllang' is made empty. When 'encoding' is set, the word
lists are reloaded, so you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions. For example, English
comes in (at least) these variants:

        en              all regions
        en_au           Australia
        en_ca           Canada
        en_gb           Great Britain
        en_nz           New Zealand
        en_us           USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal.

Always use lowercase letters for the language and region names.
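
For example, to check British English instead of US English (assuming a
spell file that includes the Great Britain region is available): >

        :setlocal spell spelllang=en_gb
<
Words that are only used in another English region are then expected to be
highlighted with SpellLocal.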


SPELL FILES

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'. The name is: LL-XXX.EEE.spl, where:
        LL      the language name
        -XXX    optional addition
        EEE     the value of 'encoding'

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15". The euro sign doesn't
  matter for spelling.
- When no spell file for the current 'encoding' is found, "ascii" is tried.
  This only works for languages where nearly all words are ASCII, such as
  English. It helps when 'encoding' is not "latin1", such as iso-8859-2, and
  English text is being edited.
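
The lookup rules above can be sketched in Vim script. SpellFilesFor() is a
hypothetical helper, not a Vim function. It ignores the optional "-XXX" part
of the name and does not reproduce the exact search Vim does internally: >

        " Hypothetical helper: return the spell files for language a:lang
        " that follow the naming rules above, separated by newlines.
        function! SpellFilesFor(lang)
          let enc = &encoding
          " "iso-8859-15" uses the "latin1" spell file
          if enc ==# 'iso-8859-15'
            let enc = 'latin1'
          endif
          let base = 'spell/' . a:lang . '.'
          let files = globpath(&runtimepath, base . enc . '.spl')
          if files == ''
            " No file for 'encoding': try the ASCII-only word list
            let files = globpath(&runtimepath, base . 'ascii.spl')
          endif
          return files
        endfunction

        :echo SpellFilesFor('en')
<
An empty result means that no matching spell file was found in 'runtimepath'.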

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'. See
|spell-mkspell| about how to create a spell file. Converting a spell file
with "iconv" will NOT work!

                                                        *E758* *E759*
When loading a spell file, Vim checks that it is properly formatted. If you
get an error, the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word. This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'. The word characters do depend on
'encoding'.

A word that starts with a digit is always ignored.


SYNTAX HIGHLIGHTING

Files that use syntax highlighting can specify where spell checking should be
done:

        everywhere                      default
        in specific items               use "contains=@Spell"
        everywhere but specific items   use "contains=@NoSpell"

Note that mixing @Spell and @NoSpell doesn't make sense.
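
A minimal sketch for a hypothetical syntax file; the group names "myComment"
and "myString" are made up. To check spelling only inside comments: >

        :syntax region myComment start="/\*" end="\*/" contains=@Spell
<
To check everywhere except inside strings, a different syntax file could
use: >

        :syntax region myString start=+"+ end=+"+ contains=@NoSpell
<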

==============================================================================
2. Generating a spell file                              *spell-mkspell*

Vim uses a binary file format for spelling. This greatly speeds up loading
the word list and keeps it small.

You can create a Vim spell file from the .aff and .dic files that Myspell
uses. Myspell is used by OpenOffice.org and Mozilla. You should be able to
find them here:
        http://lingucomponent.openoffice.org/spell_dic.html

:mksp[ell] [-ascii] {outname} {inname} ...              *:mksp* *:mkspell*
                        Generate a Vim spell file from the Myspell files
                        {inname}.aff and {inname}.dic. The name of the
                        resulting file starts with {outname}.
                        When the [-ascii] argument is present, words with
                        non-ASCII characters are skipped and the resulting
                        file ends in "ascii.spl". Otherwise the resulting
                        file ends in "ENC.spl", where ENC is the value of
                        'encoding'.
                        Multiple {inname} arguments can be given to combine
                        regions into one Vim spell file. Example: >
        :mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<                       This combines the English word lists for US, CA and
                        AU into one "en" spell file.
                        Up to eight regions can be combined. *E754* *E755*

Since you might want to change the word list for use with Vim, the following
procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Copy these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, etc.
4. Use |:mkspell| to generate the Vim spell file and try it out.

When the Myspell files are updated, you can merge the differences:
5. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
6. Use Vimdiff to see what changed: >
        vimdiff xx_YY.orig.dic xx_YY.new.dic
7. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
8. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to
   xx_YY.orig.aff.
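
Step 8 can also be done from within Vim with the rename() function, which
returns zero on success and overwrites an existing target file without a
warning. Here "xx_YY" stands for the actual language and region names: >

        :call rename('xx_YY.new.dic', 'xx_YY.orig.dic')
        :call rename('xx_YY.new.aff', 'xx_YY.orig.aff')
<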

==============================================================================
9. Spell file format                                    *spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here. That is because the goal of
spell checking differs from writing a dictionary (as in the book). For
spelling we need a list of words that are OK and thus should not be
highlighted. Names will not appear in a dictionary, but do appear in a word
list. And some old words are rarely used and are the same as common
misspellings. These do appear in a dictionary but not in a word list.

There are two files: the basic word list and an affix file. The affixes are
used to modify the basic words to get the full word list. This significantly
reduces the number of words, especially for a language like Polish. This is
called affix compression.
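
To illustrate the idea of affix compression only, here is a toy model in Vim
script. ExpandSuffixes() and the flag table are hypothetical. Real Myspell
affix rules can also strip characters and have conditions, which this sketch
ignores, and it is not how Vim applies affixes: >

        " Toy model: expand a:word with the suffixes of each flag in
        " a:flags.  a:suffixes maps a flag to a list of suffixes.
        function! ExpandSuffixes(word, flags, suffixes)
          let words = [a:word]
          for flag in split(a:flags, '\zs')
            for suffix in get(a:suffixes, flag, [])
              call add(words, a:word . suffix)
            endfor
          endfor
          return words
        endfunction

        " A made-up flag "P" that adds the suffix "s":
        :echo ExpandSuffixes('bedel', 'P', {'P': ['s']})
<
One stored entry stands for two words here; the more suffixes a flag has, the
more words a single entry represents.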

The format for the affix and word list files is mostly identical to what
Myspell uses (the spell checker of Mozilla and OpenOffice.org). A description
can be found here:
        http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive; this isn't obvious from the description.
Vim supports a few extras. Hopefully Myspell will support these too some day.
See |spell-affix-vim|.

The basic word list and the affix file are combined and turned into a binary
spell file. Because all the preprocessing has already been done, this file
loads fast. The binary spell file format is described in the source code
(src/spell.c). Only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made. The tools for this can be found in the
"src/spell" directory.


WORD LIST FORMAT                                *spell-wordlist-format*

A very short example, with line numbers:

        1       1234
        2       aan
        3       Als
        4       Etten-Leur
        5       et al.
        6       's-Gravenhage
        7       's-Gravenhaags
        8       bedel/P
        9       kado/1
        10      cadeau/2

The first line contains the number of words. Vim ignores it.        *E760*

What follows is one word per line. There should be no white space after the
word.

When the word only has lower-case letters, it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position. The same word with a lower-case letter at this
position will not match. When other letters are upper-case in the text but
not in the word list, it will not match either.

The same word with all upper-case characters will always be OK.

        word list       matches                 does not match ~
        als             als Als ALS             ALs AlS aLs aLS
        Als             Als ALS                 als ALs AlS aLs aLS
        ALS             ALS                     als Als ALs AlS aLs aLS
        AlS             AlS ALS                 als Als ALs aLs aLS
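
The rules in the table can be expressed as a small Vim script function.
CaseMatches() is a hypothetical helper written to mirror the table above; it
is not Vim's implementation and does not handle the "=" flag described
below: >

        " Return 1 when a:text matches word list entry a:word according to
        " the case rules above, 0 otherwise.
        function! CaseMatches(word, text)
          " The word exactly as it appears in the word list
          if a:text ==# a:word
            return 1
          endif
          " The same word in all upper-case is always OK
          if a:text ==# toupper(a:word)
            return 1
          endif
          " An all lower-case entry also matches with an upper-case first
          " letter
          if a:word ==# tolower(a:word)
                \ && a:text ==# substitute(a:word, '^.', '\u&', '')
            return 1
          endif
          return 0
        endfunction

        :echo CaseMatches('als', 'Als') CaseMatches('AlS', 'ALs')
<
The last command should echo "1 0", matching the first and last rows of the
table.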

Note in lines 5 to 7 that non-word characters are used. You can include any
character in a word. When checking the text, a word still only matches when
it appears with a non-word character before and after it. For Myspell, a
word starting with a non-word character probably won't work.

After the word there is an optional slash and flags. Most of these flags are
letters that indicate the affixes that can be used with this word.
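
A minimal sketch of reading this format, using a hypothetical helper
ReadWordList() that is not part of Vim. It skips the word count on the first
line and splits each remaining line at the first slash into the word and its
flags; a word that itself contains a slash is not handled: >

        " Return a list of [word, flags] pairs read from the word list
        " file a:fname.  The flags are empty when there is no slash.
        function! ReadWordList(fname)
          let result = []
          let lines = readfile(a:fname)
          " The first line holds the number of words; ignore it like Vim
          for entry in lines[1:]
            if entry == ''
              continue
            endif
            let slash = stridx(entry, '/')
            if slash < 0
              call add(result, [entry, ''])
            else
              let word = strpart(entry, 0, slash)
              call add(result, [word, strpart(entry, slash + 1)])
            endif
          endfor
          return result
        endfunction
<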

                                                        *spell-affix-vim*
Vim adds one flag that is not in Myspell: the "=" flag. It means that case
matters. Use it for a word that keeps its lower-case first letter even at the
start of a sentence. Example:

        word list       matches                 does not match ~
        's morgens/=    's morgens              'S morgens 's Morgens
        's Morgens      's Morgens              'S morgens 's morgens

                                                        *spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file. The affix file must always be in the same encoding as the
word list. This is compatible with Myspell. For Vim the encoding may also be
something else: any encoding that "iconv" supports. The "SET" line must
specify the name of the encoding. When using a multi-byte encoding, it's
possible to use a larger number of distinct affixes.

Performance hint: Although using affixes reduces the number of words, it
slows down checking. It's a good idea to put all the frequently used words in
the word list with the affixes already prepended/appended.


vim:tw=78:sw=4:ts=8:ft=help:norl: