*spell.txt*     For Vim version 7.0aa. Last change: 2005 Jun 04


                  VIM REFERENCE MANUAL    by Bram Moolenaar


Spell checking                                          *spell*

1. Quick start                  |spell-quickstart|
2. Generating a spell file      |spell-mkspell|
9. Spell file format            |spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start                                          *spell-quickstart*

This command switches on spell checking: >

        :setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.

The words that are not recognized are highlighted with one of these:
        SpellBad        word not recognized
        SpellRare       rare word
        SpellLocal      wrong spelling for selected region

Vim only checks the spelling of words; there is no grammar check.

To search for the next misspelled word:

                                                        *]s* *E756*
]s                      Move to next misspelled word after the cursor.
                        A count before the command can be used to repeat.
                        This uses the @Spell and @NoSpell clusters from syntax
                        highlighting, see |spell-syntax|.

                                                        *[s*
[s                      Like "]s" but search backwards, find the misspelled
                        word before the cursor.

                                                        *]S*
]S                      Like "]s" but only stop at bad words, not at rare
                        words or words for another region.

                                                        *[S*
[S                      Like "]S" but search backwards.
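
As a rough illustration of the commands above (the count is an arbitrary
example), a session could look like this.  The text in the right column only
describes what each command does: >

        :setlocal spell spelllang=en_us
        ]s              go to the next misspelled word
        3]s             go to the third misspelled word after the cursor
        [S              go back to the previous bad word, skipping rare words
                        and words for another region
        :setlocal nospell

The last command switches spell checking off again.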


PERFORMANCE

Note that Vim does on-the-fly spellchecking. To make this work fast the
word list is loaded in memory. Thus this uses a lot of memory (1 Mbyte or
more). There might also be a noticeable delay when the word list is loaded,
which happens when 'spelllang' is set. Each word list is only loaded once;
it is not deleted when 'spelllang' is made empty. When 'encoding' is set
the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions. For example, English
comes in (at least) these variants:

        en              all regions
        en_au           Australia
        en_ca           Canada
        en_gb           Great Britain
        en_nz           New Zealand
        en_us           USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal.

Always use lowercase letters for the language and region names.
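
For example, assuming the English spell file includes these regions, a region
can be selected with the same command as in |spell-quickstart|: >

        :setlocal spell spelllang=en_gb
        :setlocal spell spelllang=en

With "en_gb" a word that is only used in another region, such as the US
spelling "color", would be expected to show up as SpellLocal. With "en"
the words from all regions are accepted.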


SPELL FILES

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'. The name is: LL-XXX.EEE.spl, where:
        LL      the language name
        -XXX    optional addition
        EEE     the value of 'encoding'

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15". The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried. This only
  works for languages where nearly all words are ASCII, such as English. It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited.

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'. See
|spell-mkspell| about how to create a spell file. Converting a spell file
with "iconv" will NOT work!

                                                        *E758* *E759*
When loading a spell file Vim checks that it is properly formatted. If you
get an error the file may be truncated, modified or intended for another Vim
version.
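
To see which spell files Vim can find, something like the following can be
used. This is only a sketch: it relies on the |globpath()| function, and the
file names it prints depend on your installation: >

        :echo &encoding
        :echo globpath(&runtimepath, "spell/*.spl")

With 'encoding' set to "latin1" a name such as "en.latin1.spl" would fit the
naming scheme described in this section.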


WORDS

Vim uses a fixed method to recognize a word. This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'. The word characters do depend on
'encoding'.

A word that starts with a digit is always ignored.


SYNTAX HIGHLIGHTING                                     *spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

        everywhere                         default
        in specific items                  use "contains=@Spell"
        everywhere but specific items      use "contains=@NoSpell"

Note that mixing @Spell and @NoSpell doesn't make sense.
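
A minimal sketch of both approaches. The group names "myComment" and "myURL"
and the patterns are made up for this example. The two commands are meant for
different syntax files, since the two clusters should not be mixed.

Check spelling only inside comments: >

        :syntax region myComment start="/\*" end="\*/" contains=@Spell

Check spelling everywhere except in URLs: >

        :syntax match myURL "http://\S\+" contains=@NoSpell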

==============================================================================
2. Generating a spell file                              *spell-mkspell*

Vim uses a binary file format for spelling. This greatly speeds up loading
the word list and keeps it small.

You can create a Vim spell file from the .aff and .dic files that Myspell
uses. Myspell is used by OpenOffice.org and Mozilla. You should be able to
find them here:
        http://lingucomponent.openoffice.org/spell_dic.html

:mksp[ell] [-ascii] {outname} {inname} ...              *:mksp* *:mkspell*
                        Generate spell file {outname}.spl.

                        When the [-ascii] argument is present, words with
                        non-ascii characters are skipped. The resulting file
                        ends in "ascii.spl". Otherwise the resulting file
                        ends in "ENC.spl", where ENC is the value of
                        'encoding'.

                        The input can be the Myspell format files {inname}.aff
                        and {inname}.dic. If {inname}.aff does not exist then
                        {inname} is used as the file name of a plain word
                        list.

                        Multiple {inname} arguments can be given to combine
                        regions into one Vim spell file. Example: >
        :mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<                       This combines the English word lists for US, CA and AU
                        into one en.spl file.
                        Up to eight regions can be combined. *E754* *E755*
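
Two more examples, as a sketch only. The paths and the "my" and "words.txt"
names are made up. The first command generates an ASCII-only spell file from
the Myspell files /tmp/en_US.aff and /tmp/en_US.dic. The second one uses a
plain word list, assuming that the file /tmp/words.txt.aff does not exist: >

        :mkspell -ascii ~/.vim/spell/en /tmp/en_US
        :mkspell ~/.vim/spell/my /tmp/words.txt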

Since you might want to change the word list for use with Vim the following
procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc. The distributed
   "src/spell/*.diff" files can be used.
4. Set 'encoding' to the desired encoding and use |:mkspell| to generate the
   Vim spell file.
5. Try out the spell file with ":set spell spelllang=xx_YY".

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use Vimdiff to see what changed: >
        vimdiff xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.
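
The file handling in both procedures can also be done from within Vim. A
sketch, assuming a Unix-like system where the "cp" and "mv" commands are
available. Making the copies for step 2 of the first procedure: >

        :!cp xx_YY.aff xx_YY.orig.aff
        :!cp xx_YY.dic xx_YY.orig.dic

Renaming the files in the last step of the merge: >

        :!mv xx_YY.new.dic xx_YY.orig.dic
        :!mv xx_YY.new.aff xx_YY.orig.aff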

==============================================================================
9. Spell file format                                    *spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here. That is because the goal of
spell checking differs from writing a dictionary (as in the book). For
spelling we need a list of words that are OK and thus need not be highlighted.
Names will not appear in a dictionary, but do appear in a word list. And
some old words are rarely used and are common misspellings. These do appear
in a dictionary but not in a word list.

There are two formats: one with affix compression and one without. The files
with affix compression are used by Myspell (Mozilla and OpenOffice.org). This
requires two files, one with .aff and one with .dic extension. The second
format is a list of words.


FORMAT OF WORD LIST

The words must appear one per line. That is all that is required. Optional
items are:
- Empty and blank lines are ignored.
- Lines starting with a # are ignored (comment lines).
- A line starting with "=encoding=" before any word. After the second '='
  comes an encoding name. This tells Vim to set up conversion from the
  specified encoding to 'encoding'.
- Other lines starting with '=' are special. The ones that are not recognized
  are ignored (but you do get a warning message).
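
A small made-up example of such a word list, assuming the words are stored in
the latin1 encoding: >

        # words used in our project
        =encoding=latin1

        Vim
        runtimepath
        Moolenaar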


FORMAT WITH AFFIX COMPRESSION

There are two files: the basic word list and an affix file. The affixes are
used to modify the basic words to get the full word list. This significantly
reduces the number of words, especially for a language like Polish. This is
called affix compression.

The format for the affix and word list files is mostly identical to what
Myspell uses (the spell checker of Mozilla and OpenOffice.org). A description
can be found here:
        http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive; this isn't obvious from the description.
Vim supports a few extras. Hopefully Myspell will support these too some day.
See |spell-affix-vim|.

The basic word list and the affix file are combined and turned into a binary
spell file. All the preprocessing has been done, thus this file loads fast.
The binary spell file format is described in the source code (src/spell.c).
But only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made. The tools for this can be found in the
"src/spell" directory.


WORD LIST FORMAT                                        *spell-wordlist-format*

A very short example, with line numbers:

        1       1234
        2       aan
        3       Als
        4       Etten-Leur
        5       et al.
        6       's-Gravenhage
        7       's-Gravenhaags
        8       bedel/P
        9       kado/1
        10      cadeau/2

The first line contains the number of words. Vim ignores it.     *E760*

What follows is one word per line. There should be no white space after the
word.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position. The same word with a lower-case letter at this
position will not match. When some of the other letters are upper-case it will
not match either.

The same word with all upper-case characters will always be OK.

        word list       matches                 does not match ~
        als             als Als ALS             ALs AlS aLs aLS
        Als             Als ALS                 als ALs AlS aLs aLS
        ALS             ALS                     als Als ALs AlS aLs aLS
        AlS             AlS ALS                 als Als ALs aLs aLS

Note that in lines 5 to 7 non-word characters are used. You can include
any character in a word. When checking the text a word still only matches
when it appears with a non-word character before and after it. For Myspell a
word starting with a non-word character probably won't work.

After the word there is an optional slash and flags. Most of these flags are
letters that indicate the affixes that can be used with this word.

                                                        *spell-affix-vim*
A flag that Vim adds and is not in Myspell is the "=" flag. This has the
meaning that case matters. This can be used if the word does not have the
first letter in upper case at the start of a sentence. Example:

        word list           matches             does not match ~
        's morgens/=        's morgens          'S morgens 's Morgens
        's Morgens          's Morgens          'S morgens 's morgens

                                                        *spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file. The affix file must always be in the same encoding as the
word list. This is compatible with Myspell. For Vim the encoding may also be
something else, any encoding that "iconv" supports. The "SET" line must
specify the name of the encoding. When using a multi-byte encoding it's
possible to use more different affixes.

Performance hint: Although using affixes reduces the number of words, it
reduces the speed. It's a good idea to put all the often used words in the
word list with the affixes prepended/appended.

                                                        *spell-affix-chars*
The affix file should define the word characters when using an 8-bit encoding
(as specified with ENC). This is because the system where ":mkspell" is used
may not support a locale with this encoding and isalpha() won't work, for
example when using "cp1250" on Unix.

                                                        *E761* *E762*
Three lines in the affix file are needed. Simplistic example:

        FOL  áëñáëñ
        LOW  áëñáëñ
        UPP  áëñÁËÑ

All three lines must have exactly the same number of characters.

The "FOL" line specifies the case-folded characters. These are used to
compare words while ignoring case. For most encodings this is identical to
the lower case line.

The "LOW" line specifies the characters in lower-case. Mostly it's equal to
the "FOL" line.

The "UPP" line specifies the characters with upper-case. That is, a character
is upper-case where it's different from the character at the same position in
"FOL".

ASCII characters should be omitted; Vim always handles these in the same way.
When the encoding is UTF-8 no word characters need to be specified.

                                                        *E763*
All spell files for the same encoding must use the same word characters,
otherwise they can't be combined without errors. The XX.ascii.spl spell file
generated with the "-ascii" argument will not contain the table with
characters, so that it can be combined with spell files for any encoding.


 vim:tw=78:sw=4:ts=8:ft=help:norl: