*spell.txt*     For Vim version 7.0aa.  Last change: 2005 Jun 06


                  VIM REFERENCE MANUAL    by Bram Moolenaar


Spell checking                                          *spell*

1. Quick start                  |spell-quickstart|
2. Generating a spell file      |spell-mkspell|
9. Spell file format            |spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start                                          *spell-quickstart*

This command switches on spell checking: >

        :setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.

The words that are not recognized are highlighted with one of these:
        SpellBad        word not recognized
        SpellRare       rare word
        SpellLocal      wrong spelling for selected region
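
The appearance of these groups can be changed with |:highlight|.  A minimal
sketch; the colors below are arbitrary choices for illustration and are not
claimed to be Vim's defaults: >

        :highlight SpellBad   ctermfg=White ctermbg=DarkRed guibg=DarkRed
        :highlight SpellRare  ctermfg=White ctermbg=DarkMagenta guibg=DarkMagenta
        :highlight SpellLocal ctermfg=Black ctermbg=DarkCyan guibg=DarkCyan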

Vim only checks the spelling of words; there is no grammar check.

To search for the next misspelled word:

                                                        *]s* *E756*
]s                      Move to next misspelled word after the cursor.
                        A count before the command can be used to repeat.
                        This uses the @Spell and @NoSpell clusters from syntax
                        highlighting, see |spell-syntax|.

                                                        *[s*
[s                      Like "]s" but search backwards, find the misspelled
                        word before the cursor.

                                                        *]S*
]S                      Like "]s" but only stop at bad words, not at rare
                        words or words for another region.

                                                        *[S*
[S                      Like "]S" but search backwards.
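
A small sketch of using these commands with a count from the command line.
It assumes that spell checking is already switched on for the buffer and that
a count works for all four commands in the same way as described for "]s": >

        " Jump to the third misspelled word after the cursor.
        :normal! 3]s
        " Jump back to the second bad word before the cursor, skipping rare
        " words and words for another region.
        :normal! 2[S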


PERFORMANCE

Note that Vim does on-the-fly spell checking.  To make this work fast the
word list is loaded in memory.  Thus this uses a lot of memory (1 Mbyte or
more).  There might also be a noticeable delay when the word list is loaded,
which happens when 'spelllang' is set.  Each word list is only loaded once;
it is not deleted when 'spelllang' is made empty.  When 'encoding' is set
the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions.  For example, English
comes in (at least) these variants:

        en              all regions
        en_au           Australia
        en_ca           Canada
        en_gb           Great Britain
        en_nz           New Zealand
        en_us           USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal.

Always use lowercase letters for the language and region names.
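
For example, to check against one region only or against all regions.  This
is a minimal sketch that assumes an English spell file containing these
regions is installed: >

        " Words that are only used in another region get SpellLocal.
        :setlocal spell spelllang=en_gb
        " Accept the words of all regions.
        :setlocal spell spelllang=en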


SPELL FILES

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'.  The name is: LL-XXX.EEE.spl, where:
        LL      the language name
        -XXX    optional addition
        EEE     the value of 'encoding'

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15".  The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried.  This only
  works for languages where nearly all words are ASCII, such as English.  It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited.
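
As an illustration of the naming scheme: with 'encoding' set to "latin1" and
'spelllang' set to "en", the file looked for is "spell/en.latin1.spl".  To see
which spell files are present in 'runtimepath', something like the following
can be used (a sketch; the result depends on your installation): >

        :echo globpath(&runtimepath, "spell/*.spl")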

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'.  See
|spell-mkspell| about how to create a spell file.  Converting a spell file
with "iconv" will NOT work!

                                                        *E758* *E759*
When loading a spell file Vim checks that it is properly formatted.  If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word.  This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'.  The word characters do depend on
'encoding'.

A word that starts with a digit is always ignored.


SYNTAX HIGHLIGHTING                                     *spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

        everywhere                      default
        in specific items               use "contains=@Spell"
        everywhere but specific items   use "contains=@NoSpell"

Note that mixing @Spell and @NoSpell doesn't make sense.
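
A sketch of the second case, where spelling is only checked inside comments.
The group name "myComment" and the comment delimiters are made up for this
example: >

        :syntax region myComment start="/\*" end="\*/" contains=@Spell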

==============================================================================
2. Generating a spell file                              *spell-mkspell*

Vim uses a binary file format for spelling.  This greatly speeds up loading
the word list and keeps it small.

You can create a Vim spell file from the .aff and .dic files that Myspell
uses.  Myspell is used by OpenOffice.org and Mozilla.  You should be able to
find them here:
        http://lingucomponent.openoffice.org/spell_dic.html

:mksp[ell] [-ascii] {outname} {inname} ...              *:mksp* *:mkspell*
                        Generate spell file {outname}.spl.

                        When the [-ascii] argument is present, words with
                        non-ascii characters are skipped.  The resulting file
                        ends in "ascii.spl".  Otherwise the resulting file
                        ends in "ENC.spl", where ENC is the value of
                        'encoding'.

                        The input can be the Myspell format files {inname}.aff
                        and {inname}.dic.  If {inname}.aff does not exist then
                        {inname} is used as the file name of a plain word
                        list.

                        Multiple {inname} arguments can be given to combine
                        regions into one Vim spell file.  Example: >
        :mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<                       This combines the English word lists for US, CA and AU
                        into one en.spl file.
                        Up to eight regions can be combined. *E754* *E755*

                        After the spell file has been written, all spell files
                        currently in use are reloaded.
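
Two more sketches of |:mkspell| usage.  The file names are placeholders.  The
second command assumes that /tmp/mywords.aff does not exist, so that
/tmp/mywords is read as a plain word list: >

        " Skip words with non-ASCII characters; the result ends in
        " "ascii.spl".
        :mkspell -ascii ~/.vim/spell/en /tmp/en_US
        " Use a plain word list; the result ends in "ENC.spl".
        :mkspell ~/.vim/spell/mywords /tmp/mywords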

Since you might want to change the word list for use with Vim the following
procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc.  The distributed
   "src/spell/*.diff" files can be used.
4. Set 'encoding' to the desired encoding and use |:mkspell| to generate the
   Vim spell file.
5. Try out the spell file with ":set spell spelllang=xx_YY".
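
Steps 4 and 5 can be sketched as follows.  The "latin1" encoding, the output
name and the /tmp location of the Myspell files are assumptions made for this
example; "xx_YY" remains a placeholder for a real language and region: >

        :set encoding=latin1
        :mkspell ~/.vim/spell/xx /tmp/xx_YY
        :set spell spelllang=xx_YY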

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use Vimdiff to see what changed: >
        vimdiff xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.

==============================================================================
9. Spell file format                                    *spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here.  That is because the goal of
spell checking differs from writing a dictionary (as in the book).  For
spelling we need a list of words that are OK, and thus need not be
highlighted.  Names will not appear in a dictionary, but do appear in a word
list.  And some old words are rarely used and are common misspellings.  These
do appear in a dictionary but not in a word list.

There are two formats: one with affix compression and one without.  The files
with affix compression are used by Myspell (Mozilla and OpenOffice.org).  This
requires two files, one with .aff and one with .dic extension.  The second
format is a list of words.


FORMAT OF WORD LIST

The words must appear one per line.  That is all that is required.  Optional
items are:
- Empty and blank lines are ignored.
- Lines starting with a # are ignored (comment lines).
- A line starting with "/encoding=", before any word, specifies the encoding
  of the file.  After the '=' comes an encoding name.  This tells Vim to set
  up conversion from the specified encoding to 'encoding'.
- A line starting with "/?" specifies a word that should be marked as rare.
- A line starting with "/!" specifies a word that should be marked as bad.
- A line starting with "/=" specifies a word where case must match exactly.
  A "?" or "!" may follow: "/=?" and "/=!".
- Other lines starting with '/' are special.  The ones that are not recognized
  are ignored (but you do get a warning message).


FORMAT WITH AFFIX COMPRESSION

There are two files: the basic word list and an affix file.  The affixes are
used to modify the basic words to get the full word list.  This significantly
reduces the number of words, especially for a language like Polish.  This is
called affix compression.

The format for the affix and word list files is mostly identical to what
Myspell uses (the spell checker of Mozilla and OpenOffice.org).  A description
can be found here:
        http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive; this isn't obvious from the description.
Vim supports a few extras.  Hopefully Myspell will support these too some day.
See |spell-affix-vim|.

The basic word list and the affix file are combined and turned into a binary
spell file.  All the preprocessing has been done, thus this file loads fast.
The binary spell file format is described in the source code (src/spell.c).
But only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made.  The tools for this can be found in the
"src/spell" directory.


WORD LIST FORMAT                                *spell-wordlist-format*

A very short example, with line numbers:

        1       1234
        2       aan
        3       Als
        4       Etten-Leur
        5       et al.
        6       's-Gravenhage
        7       's-Gravenhaags
        8       bedel/P
        9       kado/1
        10      cadeau/2

The first line contains the number of words.  Vim ignores it.   *E760*

What follows is one word per line.  There should be no white space after the
word.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position.  The same word with a lower-case letter at this
position will not match.  When some of the other letters are upper-case it
will not match either.

The same word with all upper-case characters will always be OK.

        word list       matches         does not match ~
        als             als Als ALS     ALs AlS aLs aLS
        Als             Als ALS         als ALs AlS aLs aLS
        ALS             ALS             als Als ALs AlS aLs aLS
        AlS             AlS ALS         als Als ALs aLs aLS

The HUH affix ID can be used to specifically match a word in identical case
only, see below.

Note that in lines 5 to 7 non-word characters are used.  You can include
any character in a word.  When checking the text a word still only matches
when it appears with a non-word character before and after it.  For Myspell a
word starting with a non-word character probably won't work.

After the word there is an optional slash and flags.  Most of these flags are
letters that indicate the affixes that can be used with this word.

                                                        *spell-affix-vim*
A flag that Vim adds and is not in Myspell is the flag defined with HUH in the
affix file.  This has the meaning that case matters.  This can be used if the
word does not have the first letter in upper case at the start of a sentence.
Example:

        word list       matches         does not match ~
        's morgens/=    's morgens      'S morgens 's Morgens
        's Morgens      's Morgens      'S morgens 's morgens

                                                        *spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file.  The affix file must always be in the same encoding as the
word list.  This is compatible with Myspell.  For Vim the encoding may also be
something else, any encoding that "iconv" supports.  The "SET" line must
specify the name of the encoding.  When using a multi-byte encoding it's
possible to use a larger number of different affixes.

Performance hint: Although using affixes reduces the number of words, it
reduces the speed.  It's a good idea to put all the often used words in the
word list with the affixes prepended/appended.

                                                        *spell-affix-chars*
The affix file should define the word characters when using an 8-bit encoding
(as specified with ENC).  This is because the system where ":mkspell" is used
may not support a locale with this encoding and isalpha() won't work.  For
example when using "cp1250" on Unix.

                                                        *E761* *E762*
Three lines in the affix file are needed.  Simplistic example:

        FOL  áëñáëñ ~
        LOW  áëñáëñ ~
        UPP  áëñÁËÑ ~

All three lines must have exactly the same number of characters.

The "FOL" line specifies the case-folded characters.  These are used to
compare words while ignoring case.  For most encodings this is identical to
the lower case line.

The "LOW" line specifies the characters in lower-case.  Mostly it's equal to
the "FOL" line.

The "UPP" line specifies the characters with upper-case.  That is, a character
is upper-case where it's different from the character at the same position in
"FOL".

ASCII characters should be omitted; Vim always handles these in the same way.
When the encoding is UTF-8 no word characters need to be specified.

                                                        *E763*
All spell files for the same encoding must use the same word characters,
otherwise they can't be combined without errors.  The XX.ascii.spl spell file
generated with the "-ascii" argument will not contain the table with
characters, so that it can be combined with spell files for any encoding.


In the affix file a HUH line can be used to define the affix name used for
keep-case words.  Example:

        HUH = ~

See above for an example |spell-affix-vim|.


In the affix file a RAR line can be used to define the affix name used for
rare words.  Example:

        RAR ? ~

Rare words are highlighted differently from bad words.  This is to be used for
words that are correct for the language, but are hardly ever used and could be
a typing mistake anyway.


 vim:tw=78:sw=4:ts=8:ft=help:norl:
Bram Moolenaar217ad922005-03-20 22:37:15 +0000370 vim:tw=78:sw=4:ts=8:ft=help:norl: