*spell.txt*     For Vim version 7.0aa.  Last change: 2005 Aug 24


                  VIM REFERENCE MANUAL    by Bram Moolenaar


Spell checking                                          *spell*

1. Quick start                  |spell-quickstart|
2. Remarks on spell checking    |spell-remarks|
3. Generating a spell file      |spell-mkspell|
4. Spell file format            |spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start                                          *spell-quickstart*

This command switches on spell checking: >

        :setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.

The words that are not recognized are highlighted with one of these:
        SpellBad        word not recognized                     |hl-SpellBad|
        SpellCap        word not capitalised                    |hl-SpellCap|
        SpellRare       rare word                               |hl-SpellRare|
        SpellLocal      wrong spelling for selected region      |hl-SpellLocal|

Vim only checks words for spelling, there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word, or it is "popup_setpos" and the mouse pointer is on a badly
spelled word, then the popup menu will contain a submenu to replace the bad
word.  Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

                                                        *]s* *E756*
]s                      Move to next misspelled word after the cursor.
                        A count before the command can be used to repeat.

                                                        *[s*
[s                      Like "]s" but search backwards, find the misspelled
                        word before the cursor.  Doesn't recognize words
                        split over two lines, thus may stop at words that are
                        not highlighted as bad.  Does not stop at a word with
                        a missing capital at the start of a line.

                                                        *]S*
]S                      Like "]s" but only stop at bad words, not at rare
                        words or words for another region.

                                                        *[S*
[S                      Like "]S" but search backwards.


To add words to your own word list:                     *E764*

                                                        *zg*
zg                      Add word under the cursor as a good word to the first
                        name in 'spellfile'.  In Visual mode the selected
                        characters are added as a word (including white
                        space!).  If the word is explicitly marked as a bad
                        word in another spell file the result is
                        unpredictable.
                        A count may precede the command to indicate the entry
                        in 'spellfile' to be used.  A count of two uses the
                        second entry.

                                                        *zG*
zG                      Like "zg" but add the word to the internal word list
                        |internal-wordlist|.

                                                        *zw*
zw                      Like "zg" but mark the word as a wrong (bad) word.

                                                        *zW*
zW                      Like "zw" but add the word to the internal word list
                        |internal-wordlist|.

                                                   *:spe* *:spellgood*
:[count]spe[llgood] {word}
                        Add {word} as a good word to 'spellfile', like with
                        "zg".  Without count the first name is used, with a
                        count of two the second entry, etc.

:spe[llgood]! {word}    Add {word} as a good word to the internal word list,
                        like with "zG".

                                                   *:spellw* *:spellwrong*
:[count]spellw[rong] {word}
                        Add {word} as a wrong (bad) word to 'spellfile', as
                        with "zw".  Without count the first name is used, with
                        a count of two the second entry, etc.

:spellw[rong]! {word}   Add {word} as a wrong (bad) word to the internal word
                        list.

After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded.  If you edit a file
named in 'spellfile' by hand you need to use the |:mkspell| command to
regenerate its ".spl" file.  This sequence of commands mostly works well: >
        :edit <file in 'spellfile'>
<       (make changes to the spell file) >
        :mkspell! %

More details about the 'spellfile' format below |spell-wordlist-format|.
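
As an illustration, a minimal sketch of using two word lists.  The file names
are made up and assume that 'encoding' is "latin1" and that ~/.vim/spell
exists; each name in 'spellfile' is expected to end in ".{encoding}.add": >

        " Two personal word lists: a general one and a per-project one.
        :setlocal spellfile=~/.vim/spell/en.latin1.add
        :setlocal spellfile+=~/.vim/spell/proj.latin1.add
        " Add a good word to the first list and a bad word to the second.
        :spellgood Moolenaar
        :2spellwrong teh

Typing "2zg" in Normal mode would likewise add the word under the cursor to
the second list.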

                                                        *internal-wordlist*
The internal word list is used for all buffers where 'spell' is set.  It is
not stored, it is lost when you exit Vim.  It is also cleared when 'encoding'
is set.


Finding suggestions for bad words:
                                                        *z?*
z?                      For the word under/after the cursor suggest correctly
                        spelled words.  This also works to find alternatives
                        for a word that is not highlighted as a bad word,
                        e.g., when the word after it is bad.
                        The results are sorted on similarity to the word
                        under/after the cursor.
                        This may take a long time.  Hit CTRL-C when you get
                        bored.

                        If the command is used without a count the
                        alternatives are listed and you can enter the number
                        of your choice or press <Enter> if you don't want to
                        replace.  You can also use the mouse to click on your
                        choice (only works if the mouse can be used in Normal
                        mode and when there are no line wraps).  Click on the
                        first line (the header) to cancel.

                        If a count is used that suggestion is used, without
                        prompting.  For example, "1z?" always takes the first
                        suggestion.

                        If 'verbose' is non-zero a score will be displayed
                        with the suggestions to indicate how much each one
                        differs from the badly spelled word (the higher the
                        score the more different).
                        When a word was replaced the redo command "." will
                        repeat the word replacement.  This works like "ciw",
                        the good word and <Esc>.  This does NOT work for Thai
                        and other languages without spaces between words.

                                        *:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]          Repeat the replacement done by |z?| for all matches
                        with the replaced word in the current window.

In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions.  This works like Insert mode completion.  Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted.  See |'spellsuggest'|.

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital.  This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed.  Use |CTRL-L| when needed.  Also see |set-spc-auto| for
how it can be set automatically when 'spelllang' is set.

==============================================================================
2. Remarks on spell checking                            *spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking.  To make this work fast the word list is
loaded in memory.  Thus this uses a lot of memory (1 Mbyte or more).  There
might also be a noticeable delay when the word list is loaded, which happens
when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
To minimize the delay each word list is only loaded once, it is not deleted
when 'spelllang' is made empty or 'spell' is reset.  When 'encoding' is set
all the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions.  For example, English
comes in (at least) these variants:

        en              all regions
        en_au           Australia
        en_ca           Canada
        en_gb           Great Britain
        en_nz           New Zealand
        en_us           USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.

Always use lowercase letters for the language and region names.
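
For example, a sketch of selecting British English, with the region written in
lowercase: >

        :setlocal spell spelllang=en_gb

With this setting a word that is only valid in another region (such as a US
spelling) is expected to be highlighted with SpellLocal rather than SpellBad.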

When adding a word with |zg| or another command it's always added for all
regions.  You can change that by manually editing the 'spellfile'.  See
|spell-wordlist-format|.  Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).

                                                        *spell-german*
Specific exception: For German these special regions are used:
        de              all German words accepted
        de_de           old and new spelling
        de_19           old spelling
        de_20           new spelling
        de_at           Austria
        de_ch           Switzerland

                                                        *spell-yiddish*
Yiddish requires using "utf-8" encoding, because of the special characters
used.  If you are using latin1 Vim will use transliterated (romanized) Yiddish
instead.  If you want to use transliterated Yiddish with utf-8 use "yi-tr".
In a table:
        'encoding'      'spelllang'
        utf-8           yi              Yiddish
        latin1          yi              transliterated Yiddish
        utf-8           yi-tr           transliterated Yiddish


SPELL FILES                                             *spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'.  The name is: LL.EEE.spl, where:
        LL      the language name
        EEE     the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
        'spelllang'     LL ~
        en_us           en
        en-rare         en-rare
        medical_ca      medical

Only the first file is loaded, the one that is first in 'runtimepath'.  If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded.  These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15".  The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried.  This only
  works for languages where nearly all words are ASCII, such as English.  It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited.  For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
        'runtimepath' is "~/.vim,/usr/share/vim70,~/.vim/after"
        'encoding' is "iso-8859-2"
        'spelllang' is "pl"

Vim will look for:
1. ~/.vim/spell/pl.iso-8859-2.spl
2. /usr/share/vim70/spell/pl.iso-8859-2.spl
3. ~/.vim/spell/pl.iso-8859-2.add.spl
4. /usr/share/vim70/spell/pl.iso-8859-2.add.spl
5. ~/.vim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.vim/spell/pl.latin1.spl
2. /usr/share/vim70/spell/pl.latin1.spl
3. ~/.vim/after/spell/pl.latin1.spl
4. ~/.vim/spell/pl.ascii.spl
5. /usr/share/vim70/spell/pl.ascii.spl
6. ~/.vim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).
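
To see which main spell files are actually present, a sketch like the
following may help.  It uses the globpath() function with the language "pl"
from the example above, and it does not apply the "latin1" and "ascii"
fallbacks: >

        " List the pl.{encoding}.spl files found in 'runtimepath', in order.
        :echo globpath(&runtimepath, "spell/pl." . &encoding . ".spl")

The first name listed should correspond to the file that Vim would load.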

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'.  See
|spell-mkspell| about how to create a spell file.  Converting a spell file
with "iconv" will NOT work!

                                                        *E758* *E759*
When loading a spell file Vim checks that it is properly formatted.  If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word.  This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'.  The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file.  Therefore it
matters what the current locale is when generating it!  A .add.spl file does
not contain a word table though.

A word that starts with a digit is always ignored.  That includes hex numbers
in the form 0xff and 0XFF.


WORD COMBINATIONS

It is possible to spell-check words that include a space.  This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant.  In most cases a line break may also
appear.  However, this makes it difficult to find out where to start checking
for spelling mistakes.  When you make a change to one line and only that line
is redrawn Vim won't look in the previous line, thus when "et" is at the end
of the previous line "al." will be flagged as an error.  And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away.  "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as '*', '>' and '"',
so that comments in C, shell and Vim code can be spell checked.


SYNTAX HIGHLIGHTING                                     *spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1.  everywhere                     default
2.  in specific items              use "contains=@Spell"
3.  everywhere but specific items  use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again.  This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.
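
A minimal sketch for a syntax file, assuming C-style comments and the made-up
group names "myComment" and "myURL": >

        " Spell check only inside comments (the second method).
        :syntax region myComment start="/\*" end="\*/" contains=@Spell,myURL
        " Do not check URLs that appear within those comments.
        :syntax match myURL "http://\S\+" contained contains=@NoSpell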


VIM SCRIPTS

If you want to write a Vim script that does something with spelling, you may
find these functions useful:

    spellbadword()      find badly spelled word at the cursor
    spellsuggest()      get list of spelling suggestions
    soundfold()         get the sound-a-like version of a word
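
As a rough sketch, this is how the latter two might be called.  The word
"recieve" and the limit of 5 are arbitrary choices, and spell checking is
assumed to be switched on in the current window.  Check the description of
each function for the exact arguments and return values in your version: >

        " Show up to five suggestions for a misspelled word.
        :echo spellsuggest("recieve", 5)
        " Show the sound-folded form of the same word.
        :echo soundfold("recieve")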
Bram Moolenaar30abd282005-06-22 22:35:10 +0000348
Bram Moolenaar90cfdbe2005-08-12 19:59:19 +0000349
350SETTING 'spellcapcheck' AUTOMATICALLY *set-spc-auto*
351
352After the 'spelllang' option has been set successfully, Vim will source the
353files "spell/LANG.vim" in 'runtimepath'. "LANG" is the value of 'spelllang'
354up to the first comma, dot or underscore. This can be used to set options
355specifically for the language, especially 'spellcapcheck'.
356
357The distribution includes a few of these files. Use this command to see what
358they do: >
359 :next $VIMRUNTIME/spell/*.vim
360
361Note that the default scripts don't set 'spellcapcheck' if it was changed from
362the default value. This assumes the user prefers another value then.
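
A sketch of what such a file could contain, for a made-up language name "xx"
whose script has no capital letters.  It assumes that an empty value switches
the check off, and unlike the default scripts it sets the option
unconditionally: >

        " File: ~/.vim/spell/xx.vim
        " Switch off the check for a capital at the start of a sentence.
        setlocal spellcapcheck=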


DOUBLE SCORING                                          *spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring.  This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something.  This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since scores for these two mistakes will be very different we use a list
for each and mix them.

The sound-folding is slow and people who know the language won't make the
second kind of mistake.  Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.
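
For example, to select this mechanism: >

        :set spellsuggest=double

See |'spellsuggest'| for the other values.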

==============================================================================
3. Generating a spell file                              *spell-mkspell*

Vim uses a binary file format for spelling.  This greatly speeds up loading
the word list and keeps it small.
                                                    *.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses.  Myspell is used by OpenOffice.org and Mozilla.  You should be able to
find them here:
        http://lingucomponent.openoffice.org/spell_dic.html
You can also use a plain word list.  The results are the same, the choice
depends on what word lists you can find.

If you install Aap (from www.a-a-p.org) you can use the recipes in the
runtime/spell/??/ directories.  Aap will take care of downloading the files,
applying the patches needed for Vim and building the .spl file.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters.  If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|.  If the .aff file doesn't define a table then the word
table of the currently active spelling is used.  If spelling is not active
then Vim will try to guess.

                                                        *:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
                        Generate a Vim spell file from word lists.  Example: >
                :mkspell /tmp/nl nl_NL.words
<                                                       *E751*
                        When {outname} ends in ".spl" it is used as the output
                        file name.  Otherwise it should be a language name,
                        such as "en", without the region name.  The file
                        written will be "{outname}.{encoding}.spl", where
                        {encoding} is the value of the 'encoding' option.

                        When the output file already exists [!] must be used
                        to overwrite it.

                        When the [-ascii] argument is present, words with
                        non-ASCII characters are skipped.  The resulting file
                        ends in "ascii.spl".

                        The input can be the Myspell format files {inname}.aff
                        and {inname}.dic.  If {inname}.aff does not exist then
                        {inname} is used as the file name of a plain word
                        list.

                        Multiple {inname} arguments can be given to combine
                        regions into one Vim spell file.  Example: >
                :mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<                       This combines the English word lists for US, CA and AU
                        into one en.spl file.
                        Up to eight regions can be combined. *E754* *E755*
                        The REP and SAL items of the first .aff file where
                        they appear are used. |spell-REP| |spell-SAL|

                        This command uses a lot of memory, required to find
                        the optimal word tree (Polish, Italian and Hungarian
                        require several hundred Mbyte).  The final result will
                        be much smaller, because compression is used.  To
                        avoid running out of memory compression will be done
                        now and then.  This can be tuned with the 'mkspellmem'
                        option.

                        After the spell file was written and it was being used
                        in a buffer it will be reloaded automatically.

:mksp[ell] [-ascii] {name}.{enc}.add
                        Like ":mkspell" above, using {name}.{enc}.add as the
                        input file and producing an output file in the same
                        directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
                        Like ":mkspell" above, using {name} as the input file
                        and producing an output file in the same directory
                        that has ".{enc}.spl" appended.

Vim will report the number of duplicate words.  This might be a mistake in the
list of words.  But sometimes it is used to have different prefixes and
suffixes for the same basic word to avoid them combining (e.g. Czech uses
this).  If you want Vim to report all duplicate words set the 'verbose'
option.

Since you might want to change a Myspell word list for use with Vim the
following procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc.  The distributed
   "src/spell/*.diff" files can be used.
4. Start Vim with the right locale and use |:mkspell| to generate the Vim
   spell file.
5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
   wrote it somewhere else.
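
Steps 4 and 5 might look like this, assuming the edited xx_YY.aff and
xx_YY.dic files are in the current directory and that ~/.vim/spell exists and
~/.vim is in 'runtimepath': >

        " Writes ~/.vim/spell/xx.{encoding}.spl, overwriting an older file.
        :mkspell! ~/.vim/spell/xx xx_YY
        " Try the result in the current buffer.
        :setlocal spell spelllang=xx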

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use Vimdiff to see what changed: >
        vimdiff xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.


SPELL FILE VERSIONS                                     *E770* *E771* *E772*

Spell checking is a relatively new feature in Vim, thus it's possible that the
.spl file format will be changed to support more languages.  Vim will check
the validity of the spell file and report anything wrong.

        E771: Old spell file, needs to be updated ~
This spell file is older than your Vim.  You need to update the .spl file.

        E772: Spell file is for newer version of Vim ~
This means the spell file was made for a later version of Vim.  You need to
update Vim.

        E770: Unsupported section in spell file ~
This means the spell file was made for a later version of Vim and contains a
section that is required for the spell file to work.  In this case it's
probably a good idea to upgrade your Vim.


SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

                                                        *:spelldump* *:spelld*
:spelld[ump]            Open a new window and fill it with all currently valid
                        words.
                        Note: For some languages the result may be enormous,
                        causing Vim to run out of memory.

The dump uses the word list format |spell-wordlist-format|.  You should be
able to read it with ":mkspell" to generate one .spl file that includes all
the words.

When all entries in 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words.  Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.

==============================================================================
4. Spell file format                                    *spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here.  That is because the goal of
spell checking differs from writing a dictionary (as in the book).  For
spelling we need a list of words that are OK, thus should not be
highlighted.  Person and company names will not appear in a dictionary, but do
appear in a word list.  And some old words are rarely used while they are
common misspellings.  These do appear in a dictionary but not in a word list.

There are two formats: A straight list of words and a list using affix
compression.  The files with affix compression are used by Myspell (Mozilla
and OpenOffice.org).  This requires two files, one with .aff and one with .dic
extension.


Bram Moolenaard042c562005-06-30 22:04:15 +0000551FORMAT OF STRAIGHT WORD LIST *spell-wordlist-format*
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000552
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000553The words must appear one per line. That is all that is required.
Bram Moolenaard042c562005-06-30 22:04:15 +0000554
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000555Additionally the following items are recognized:
Bram Moolenaard042c562005-06-30 22:04:15 +0000556
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000557- Empty and blank lines are ignored.
Bram Moolenaard042c562005-06-30 22:04:15 +0000558
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000559- Lines starting with a # are ignored (comment lines).
Bram Moolenaard042c562005-06-30 22:04:15 +0000560
Bram Moolenaar45eeb132005-06-06 21:59:07 +0000561- A line starting with "/encoding=", before any word, specifies the encoding
562 of the file. After the second '=' comes an encoding name. This tells Vim
Bram Moolenaard042c562005-06-30 22:04:15 +0000563 to setup conversion from the specified encoding to 'encoding'. Thus you can
564 use one word list for several target encodings.
565
Bram Moolenaar3638c682005-06-08 22:05:14 +0000566- A line starting with "/regions=" specifies the region names that are
567 supported. Each region name must be two ASCII letters. The first one is
568 region 1. Thus "/regions=usca" has region 1 "us" and region 2 "ca".
Bram Moolenaard042c562005-06-30 22:04:15 +0000569 In an addition word list the region names should be equal to the main word
570 list!
571
Bram Moolenaar82cf9b62005-06-07 21:09:25 +0000572- Other lines starting with '/' are reserved for future use. The ones that
573 are not recognized are ignored (but you do get a warning message).
Bram Moolenaar75c50c42005-06-04 22:06:24 +0000574
Bram Moolenaar1f8a5f02005-07-01 22:41:52 +0000575- A "/" may follow the word with the following items:
576 = Case must match exactly.
577 ? Rare word.
578 ! Bad (wrong) word.
579 digit A region in which the word is valid. If no regions are
580 specified the word is valid in all regions.
581
Bram Moolenaar3638c682005-06-08 22:05:14 +0000582Example:
583
584 # This is an example word list comment
585 /encoding=latin1 encoding of the file
586 /regions=uscagb regions "us", "ca" and "gb"
587 example word for all regions
Bram Moolenaar1f8a5f02005-07-01 22:41:52 +0000588 blah/12 word for regions "us" and "ca"
589 vim/! bad word
590 Campbell/?3 rare word in region 3 "gb"
591 's mornings/= keep-case word
Bram Moolenaar3638c682005-06-08 22:05:14 +0000592
Bram Moolenaar0dc065e2005-07-04 22:49:24 +0000593Note that when "/=" is used the same word with all upper-case letters is not
594accepted. This is different from a word with mixed case that is automatically
595marked as keep-case, those words may appear in all upper-case letters.
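
The rules for the straight word list can be sketched as a small reader. This
is only an illustration in Python, not how Vim reads the file. The function
name parse_wordlist() and the returned dictionary keys are made up for this
example, and the input is assumed to be already decoded text without the
explanations shown in the right-hand column of the example above.

```python
def parse_wordlist(lines):
    """Read a straight word list; returns (encoding, regions, entries)."""
    encoding = None
    regions = []
    entries = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # empty, blank and comment lines
        if line.startswith("/encoding="):
            encoding = line[len("/encoding="):]
            continue
        if line.startswith("/regions="):
            names = line[len("/regions="):]
            # Each region name is two ASCII letters; the first is region 1.
            regions = [names[i:i + 2] for i in range(0, len(names), 2)]
            continue
        if line.startswith("/"):
            continue                      # reserved for future use
        word, _, items = line.partition("/")
        entries.append({
            "word": word,
            "keepcase": "=" in items,
            "rare": "?" in items,
            "bad": "!" in items,
            # Region numbers; an empty list means "valid in all regions".
            "regions": [int(c) for c in items if c in "123456789"],
        })
    return encoding, regions, entries
```

Feeding it the left-hand column of the example gives encoding "latin1", the
regions "us", "ca" and "gb", and five word entries.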


FORMAT WITH AFFIX COMPRESSION

There are two files: the basic word list and an affix file. The affixes are
used to modify the basic words to get the full word list. This significantly
reduces the number of words, especially for a language like Polish. This is
called affix compression.

The basic word list and the affix file are combined and turned into a binary
spell file. All the preprocessing has been done, thus this file loads fast.
The binary spell file format is described in the source code (src/spell.c).
But only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made. The tools for this can be found in the
"src/spell" directory.

The format for the affix and word list files is based on what Myspell uses
(the spell checker of Mozilla and OpenOffice.org). A description can be found
here:
	http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive, this isn't obvious from the description.

Vim does not use the TRY item, it is ignored. For making suggestions the
possible characters in the words are used.

Vim supports quite a few extras. They are described below |spell-affix-vim|.
Attempts have been made to keep this compatible with other spell checkers, so
that the same files can be used.


WORD LIST FORMAT					*spell-dic-format*

A very short example, with line numbers:

	 1  1234
	 2  aan
	 3  Als
	 4  Etten-Leur
	 5  et al.
	 6  's-Gravenhage
	 7  's-Gravenhaags
	 8  bedel/P
	 9  kado/1
	10  cadeau/2
	11  TCP,IP

The first line contains the number of words. Vim ignores it, but you do get
an error message if it's not there.				*E760*

What follows is one word per line. There should be no white space before or
after the word. After the word there is an optional slash and flags. Most of
these flags are letters that indicate the affixes that can be used with this
word. These are specified with SFX and PFX lines in the .aff file. See the
Myspell documentation. Vim allows using other flag types with the FLAG item
in the affix file |spell-FLAG|.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position. The same word with a lower-case letter at this
position will not match. When some of the other letters are upper-case it will
not match either.

The word with all upper-case characters will always be OK.

	word list    matches        does not match ~
	als          als Als ALS    ALs AlS aLs aLS
	Als          Als ALS        als ALs AlS aLs aLS
	ALS          ALS            als Als ALs AlS aLs aLS
	AlS          AlS ALS        als Als ALs aLs aLS

The KEP affix ID can be used to specifically match a word with identical case
only, see below |spell-KEP|.
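
The case rules in the table above can be written down as one small function.
This is a sketch in Python of the rules as they are described here, not code
taken from Vim. The name matches() and the keep_case argument (standing in
for a word that carries the KEP flag) are made up for this example, and only
the first character is considered when capitalising.

```python
def matches(entry, word, keep_case=False):
    """Return True if "word" in the text is accepted for word list "entry"."""
    if word == entry:
        return True                 # identical case always matches
    if keep_case:
        return False                # KEP: nothing but identical case
    if word == entry.upper():
        return True                 # the all upper-case form is always OK
    # An all lower-case entry also matches with a leading capital.
    return entry == entry.lower() and word == entry[:1].upper() + entry[1:]
```

With this, matches("als", "Als") is accepted while matches("Als", "als") and
matches("AlS", "Als") are not, as in the table.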

Note in lines 5 to 7 that non-word characters are used. You can include
any character in a word. When checking the text a word still only matches
when it appears with a non-word character before and after it. For Myspell a
word starting with a non-word character probably won't work.

In line 11 the word "TCP/IP" is defined. Since the slash has a special
meaning the comma is used instead. This is defined with the SLASH item in the
affix file, see |spell-SLASH|. Note that without this SLASH item the
word will be "TCP,IP".

							*spell-affix-vim*
A flag that Vim adds and is not in Myspell is the flag defined with KEP in the
affix file. This has the meaning that case matters. This can be used if the
word does not have the first letter in upper case at the start of a sentence.
Example (assuming that = was used for KEP):

	word list      matches                  does not match ~
	's morgens/=   's morgens               'S morgens 's Morgens 'S MORGENS
	's Morgens     's Morgens 'S MORGENS    'S morgens 's morgens

The flag can also be used to avoid that the word matches when it is in all
upper-case letters.

							*spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file. The affix file must always be in the same encoding as the
word list. This is compatible with Myspell. For Vim the encoding may also be
something else, any encoding that "iconv" supports. The "SET" line must
specify the name of the encoding. When using a multi-byte encoding it's
possible to use more different affixes (but Myspell doesn't support that, thus
you may not want to use it anyway).


CHARACTER TABLES
							*spell-affix-chars*
When using an 8-bit encoding (as specified with SET) the affix file should
define what characters are word characters. This is because the system where
":mkspell" is used may not support a locale with this encoding and isalpha()
won't work. For example when using "cp1250" on Unix.

						*E761* *E762* *spell-FOL*
						*spell-LOW* *spell-UPP*
Three lines in the affix file are needed. Simplistic example:

	FOL  áëñ ~
	LOW  áëñ ~
	UPP  ÁËÑ ~

All three lines must have exactly the same number of characters.

The "FOL" line specifies the case-folded characters. These are used to
compare words while ignoring case. For most encodings this is identical to
the lower case line.

The "LOW" line specifies the characters in lower-case. Mostly it's equal to
the "FOL" line.

The "UPP" line specifies the characters with upper-case. That is, a character
is upper-case where it's different from the character at the same position in
"FOL".

An exception is made for the German sharp s ß. The upper-case version is
"SS". In the FOL/LOW/UPP lines it should be included, so that it's recognized
as a word character, but use the ß character in all three.
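
To make the relation between the three lines concrete, here is a sketch in
Python that checks them and builds lookup tables. The function name
build_case_tables() and the shape of the result are assumptions made for this
example; it only mirrors the rules stated above and says nothing about how
":mkspell" stores the tables.

```python
def build_case_tables(fol, low, upp):
    """Check the FOL, LOW and UPP strings and build lookup tables.

    Returns (to_fold, upper_chars): a mapping from each listed character
    to its case-folded form, and the set of upper-case characters.
    """
    if not (len(fol) == len(low) == len(upp)):
        raise ValueError("FOL, LOW and UPP must have the same length")
    to_fold = {}
    upper_chars = set()
    for f, l, u in zip(fol, low, upp):
        to_fold[f] = f
        to_fold[l] = f
        to_fold[u] = f
        # Upper-case means: different from the character in FOL.
        if u != f:
            upper_chars.add(u)
    return to_fold, upper_chars
```

For the simplistic example the three upper-case characters fold to their
lower-case counterparts. For the sharp s, which is the same in all three
lines, the character becomes a word character that is not upper-case.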

ASCII characters should be omitted, Vim always handles these in the same way.
When the encoding is UTF-8 no word characters need to be specified.

							*E763*
Vim allows you to use spell checking for several languages in the same file.
You can list them in the 'spelllang' option. As a consequence all spell files
for the same encoding must use the same word characters, otherwise they can't
be combined without errors. If you get a warning that the word tables differ
you may need to generate the .spl file again with |:mkspell|. Check the FOL,
LOW and UPP lines in the used .aff file.

The XX.ascii.spl spell file generated with the "-ascii" argument will not
contain the table with characters, so that it can be combined with spell files
for any encoding. The .add.spl files also do not contain the table.


MID-WORD CHARACTERS
							*spell-midword*
Some characters are only to be considered word characters if they are used in
between two ordinary word characters. An example is the single quote: It is
often used to put text in quotes, thus it can't be recognized as a word
character, but when it appears in between word characters it must be part of
the word. This is needed to detect a spelling error such as they'are. That
should be they're, but since "they" and "are" are words themselves that would
go unnoticed.

These characters are defined with MIDWORD in the .aff file:

	MIDWORD '- ~


FLAG TYPES						*spell-FLAG*

Flags are used to specify the affixes that can be used with a word and for
other properties of the word. Normally single-character flags are used. This
limits the number of possible flags, especially for 8-bit encodings. The FLAG
item can be used if more affixes are to be used. Possible values:

	FLAG long	use two-character flags
	FLAG num	use numbers, from 1 up to 65000
	FLAG huh	use one-character flags without A-Z and two-character
			flags that start with A-Z

With "FLAG num" the numbers in a list of affixes need to be separated with a
comma: "234,2143,1435". This method is inefficient, but useful if the file is
generated with a program.

When using "huh" the two-character flags all start with a capital: "Aa", "B1",
"BB", etc. This is useful to use one-character flags for the most common
items and two-character flags for uncommon items.
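
How the flags after a word are split up depends on the FLAG item. The sketch
below, in Python, shows one way to do the splitting for each type. The name
split_flags() and the choice to return a list are assumptions for this
example only.

```python
def split_flags(flags, flag_type=None):
    """Split the flag text that follows a word into separate flags."""
    if flag_type is None:                 # default: one character per flag
        return list(flags)
    if flag_type == "long":               # two characters per flag
        if len(flags) % 2:
            raise ValueError("odd number of characters for FLAG long")
        return [flags[i:i + 2] for i in range(0, len(flags), 2)]
    if flag_type == "num":                # comma separated numbers
        numbers = [int(n) for n in flags.split(",")]
        if any(n < 1 or n > 65000 for n in numbers):
            raise ValueError("flag number out of range")
        return numbers
    if flag_type == "huh":                # A-Z starts a two-character flag
        result = []
        i = 0
        while i < len(flags):
            width = 2 if "A" <= flags[i] <= "Z" else 1
            if i + width > len(flags):
                raise ValueError("incomplete two-character flag")
            result.append(flags[i:i + width])
            i += width
        return result
    raise ValueError("unknown FLAG type: " + flag_type)
```

For example, with "huh" the text "aAaB1b" is split into "a", "Aa", "B1" and
"b".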

Note: When using utf-8 only characters up to 65000 may be used for flags.


AFFIXES
						*spell-PFX* *spell-SFX*
The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
documentation or the Aspell manual:
http://aspell.net/man-html/Affix-Compression.html).

Note that Myspell ignores any extra text after the relevant info. Vim
requires this text to start with a "#" so that mistakes don't go unnoticed.
Example:

	SFX F 0 in   [^i]n      # Spion > Spionin ~
	SFX F 0 nen  in         # Bauerin > Bauerinnen ~
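
To show what the fields of the two SFX lines above do, here is a sketch in
Python that applies a single suffix rule to a word. The function name
apply_suffix() is made up, and using a Python regular expression for the
condition is an assumption that works for simple conditions like the ones in
the example; the Myspell condition syntax is not identical to Python's.

```python
import re

def apply_suffix(word, strip, add, cond):
    """Apply one SFX rule; returns the new word or None if it doesn't apply.

    strip  text removed from the end of the word ("0" for nothing)
    add    text appended to the word ("0" for nothing)
    cond   condition the end of the word must match ("." for any word)
    """
    if cond != "." and not re.search("(?:" + cond + ")$", word):
        return None
    if strip != "0":
        if not word.endswith(strip):
            return None
        word = word[:len(word) - len(strip)]
    return word + ("" if add == "0" else add)
```

The first rule turns "Spion" into "Spionin" and does not apply to "Bauerin",
because that word ends in "in". The second rule turns "Bauerin" into
"Bauerinnen".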

							*spell-affix-rare*
An extra item for Vim is the "rare" flag. It must come after the other
fields, before a comment. When used then all words that use the affix will be
marked as rare words. Example:

	PFX F 0 nene .     rare ~
	SFX F 0 oin  n     rare   # hardly ever used ~

However, if the word also appears as a good word in another way it won't be
marked as rare.

							*spell-affix-nocomp*
Another extra item for Vim is the "nocomp" flag. It must come after the other
fields, before a comment. It can be either before or after "rare". When
present then all words that use the affix will not be part of a compound word.
Example:
	affix file:
		COMPOUNDFLAG c ~
		SFX a Y 2 ~
		SFX a 0 s   . ~
		SFX a 0 ize . nocomp ~
	dictionary:
		word/c ~
		util/ac ~

This allows for "wordutil" and "wordutils" but not "wordutilize".

							*spell-PFXPOSTPONE*
When an affix file has very many prefixes that apply to many words it's not
possible to build the whole word list in memory. This applies to Hebrew (a
list with all words is over a Gbyte). In that case applying prefixes must be
postponed. This makes spell checking slower. It is indicated by this keyword
in the .aff file:

	PFXPOSTPONE ~

Only prefixes without a chop string can be postponed, prefixes with a chop
string will still be included in the word list. An exception is made when the
chop string is one character and equal to the last character of the added
string, but in lower case. That is the case when the chop string is used to
allow the following word to start with an upper case letter.


WORDS WITH A SLASH					*spell-SLASH*

The slash is used in the .dic file to separate the basic word from the affix
letters that can be used. Unfortunately, this means you cannot use a slash in
a word. Thus "TCP/IP" cannot be a word. To work around that you can define a
replacement character for the slash. Example:

	SLASH , ~

Now you can use "TCP,IP" to add the word "TCP/IP".

Of course, the letter used should itself not appear in any word! The letter
must be ASCII, thus a single byte.
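
The effect of the SLASH item on reading a line of the .dic file can be
sketched like this in Python. The name parse_dic_line() is made up for this
example and the flags are returned as plain text.

```python
def parse_dic_line(line, slash_char=None):
    """Split a .dic line into (word, flags).

    slash_char is the character given with the SLASH item, or None when
    the affix file has no SLASH item.
    """
    # The first real slash separates the word from its flags.
    word, _, flags = line.partition("/")
    if slash_char:
        # Only now put back the slashes that belong to the word itself.
        word = word.replace(slash_char, "/")
    return word, flags
```

With "SLASH ," the line "TCP,IP" gives the word "TCP/IP". Without a SLASH
item the same line gives the word "TCP,IP".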


KEEP-CASE WORDS						*spell-KEP*

In the affix file a KEP line can be used to define the affix name used for
keep-case words. Example:

	KEP = ~

See above for an example |spell-affix-vim|.


RARE WORDS						*spell-RAR*

In the affix file a RAR line can be used to define the affix name used for
rare words. Example:

	RAR ? ~

Rare words are highlighted differently from bad words. This is to be used for
words that are correct for the language, but are hardly ever used and could be
a typing mistake anyway. When the same word is found as good it won't be
highlighted as rare.


BAD WORDS						*spell-BAD*

In the affix file a BAD line can be used to define the affix name used for
bad words. Example:

	BAD ! ~

This can be used to exclude words that would otherwise be good. For example
"the the" in the .dic file:

	the the/! ~

Once a word has been marked as bad it won't be undone by encountering the same
word as good.

							*spell-NEEDAFFIX*
The NEEDAFFIX flag is used to require that a word is used with an affix. The
word itself is not a good word. Example:

	NEEDAFFIX + ~


COMPOUND WORDS						*spell-compound*

A compound word is a longer word made by concatenating words that appear in
the .dic file. To specify which words may be concatenated a character is
used. This character is put in the list of affixes after the word. We will
call this character a flag here. Obviously these flags must be different from
any affix IDs used.

							*spell-COMPOUNDFLAG*
The Myspell compatible method uses one flag, specified with COMPOUNDFLAG.
All words with this flag combine in any order. This means there is no control
over which word comes first. Example:
	COMPOUNDFLAG c ~

							*spell-COMPOUNDFLAGS*
A more advanced method to specify how compound words can be formed uses
multiple items with multiple flags. This is not compatible with Myspell 3.0.
Let's start with an example:
	COMPOUNDFLAGS c+ ~
	COMPOUNDFLAGS se ~

The first line defines that words with the "c" flag can be concatenated in any
order. The second line defines compound words that are made of one word with
the "s" flag and one word with the "e" flag. With this dictionary:
	bork/c ~
	onion/s ~
	soup/e ~

You can make these words:
	bork
	borkbork
	borkborkbork
	(etc.)
	onion
	soup
	onionsoup

The COMPOUNDFLAGS item may appear multiple times. The argument is made out of
one or more groups, where each group can be:
	one flag			e.g., c
	alternate flags inside []	e.g., [abc]
Optionally this may be followed by:
	*	the group appears zero or more times, e.g., sm*e
	+	the group appears one or more times, e.g., c+

This is similar to the regexp pattern syntax (but not the same!). A few
examples with the sequence of word flags they require:
	COMPOUNDFLAGS x+	x xx xxx etc.
	COMPOUNDFLAGS yz	yz
	COMPOUNDFLAGS x+z	xz xxz xxxz etc.
	COMPOUNDFLAGS yx+	yx yxx yxxx etc.

	COMPOUNDFLAGS [abc]z	az bz cz
	COMPOUNDFLAGS [abc]+z	az aaz abaz bz baz bcbz cz caz cbaz etc.
	COMPOUNDFLAGS a[xyz]+	ax axx axyz ay ayx ayzz az azy azxy etc.
	COMPOUNDFLAGS sm*e	se sme smme smmme etc.
	COMPOUNDFLAGS s[xyz]*e	se sxe sxye sxyxe sye syze sze szye szyxe etc.
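
Because the COMPOUNDFLAGS argument resembles a regexp, the examples above can
be checked by translating each item into a real regular expression and
matching it against the sequence of flags of the words in a compound. The
following Python sketch does that for single-character flags. The function
names compound_regex() and compound_allowed() are made up, and this is not a
description of how Vim stores or matches the items.

```python
import re

def compound_regex(item):
    """Translate one COMPOUNDFLAGS argument into a compiled regexp."""
    parts = []
    i = 0
    while i < len(item):
        c = item[i]
        if c == "[":
            # Alternate flags inside []; escape each one, it is a literal.
            end = item.index("]", i)
            flags = "".join(re.escape(f) for f in item[i + 1:end])
            parts.append("[" + flags + "]")
            i = end + 1
        elif c in "*+" and parts:
            parts.append(c)          # repeat count for the preceding group
            i += 1
        else:
            parts.append(re.escape(c))
            i += 1
    return re.compile("".join(parts))

def compound_allowed(items, flag_sequence):
    """True if the flags of the concatenated words match one of the items."""
    return any(compound_regex(it).fullmatch(flag_sequence) for it in items)
```

With the items "c+" and "se" the flag sequence "ccc" (borkborkbork) and "se"
(onionsoup) are allowed, while "es" (souponion) is not.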

A specific example: Allow a compound to be made of two words and a dash:
	In the .aff file:
		COMPOUNDFLAGS sde ~
		NEEDAFFIX x ~
		COMPOUNDMAX 3 ~
		COMPOUNDMIN 1 ~
	In the .dic file:
		start/s ~
		end/e ~
		-/xd ~

This allows for the word "start-end", but not "startend".

							*spell-COMPOUNDMIN*
The minimal byte length of a word used for concatenation is specified with
COMPOUNDMIN. Example:
	COMPOUNDMIN 5 ~

When omitted a minimal length of 3 bytes is used. Obviously you could just
leave out the compound flag from short words instead, this feature is present
for compatibility with Myspell.

							*spell-COMPOUNDMAX*
The maximum number of words that can be concatenated into a compound word is
specified with COMPOUNDMAX. Example:
	COMPOUNDMAX 3 ~

When omitted there is no maximum. It applies to all compound words.

To set a limit for words with specific flags make sure the items in
COMPOUNDFLAGS where they appear don't allow too many words.

							*spell-COMPOUNDSYLMAX*
The maximum number of syllables that a compound word may contain is specified
with COMPOUNDSYLMAX. Example:
	COMPOUNDSYLMAX 6 ~

This has no effect if there is no SYLLABLE item. Without COMPOUNDSYLMAX there
is no limit on the number of syllables.

If both COMPOUNDMAX and COMPOUNDSYLMAX are defined, a compound word is
accepted if it fits one of the criteria, thus is either made from up to
COMPOUNDMAX words or contains up to COMPOUNDSYLMAX syllables.

							*spell-SYLLABLE*
The SYLLABLE item defines characters or character sequences that are used to
count the number of syllables in a word. Example:
	SYLLABLE aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui ~

Before the first slash is the set of characters that are counted for one
syllable, also when repeated and mixed, until the next character that is not
in this set. After the slash come sequences of characters that are counted
for one syllable. These are preferred over using characters from the set.
With the example "ideeen" has three syllables, counted by "i", "ee" and "e".
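
The counting rule can be tried out with a short Python sketch. It follows one
reading of the description above: at each position the longest matching
sequence is tried first and a run of characters from the set counts once.
The function name count_syllables() is made up and the word is assumed to be
case-folded already. Details such as what happens right after a sequence are
an interpretation, chosen so that the "ideeen" example comes out as stated.

```python
def count_syllables(word, syllable_item):
    """Count syllables in "word" using the argument of a SYLLABLE item."""
    chars, *sequences = syllable_item.split("/")
    count = 0
    in_run = False      # True while inside a run of characters from the set
    i = 0
    while i < len(word):
        # Sequences are preferred over single characters from the set.
        best = max((len(s) for s in sequences
                    if s and word.startswith(s, i)), default=0)
        if best:
            count += 1
            in_run = False
            i += best
            continue
        if word[i] not in chars:
            in_run = False          # a run ends at a character not in the set
        elif not in_run:
            count += 1              # first character of a new run
            in_run = True
        i += 1
    return count
```

With the SYLLABLE item of the example this counts three syllables in "ideeen"
and one in "soup".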

Only case-folded letters need to be included.

Another way to restrict compounding was mentioned above: adding "nocomp"
after an affix causes all words that are made with that affix not to be used
for compounding. |spell-affix-nocomp|


UNLIMITED COMPOUNDING					*spell-NOBREAK*

For some languages, such as Thai, there is no space in between words. This
looks like all words are compounded. To specify this use the NOBREAK item in
the affix file, without arguments:
	NOBREAK ~

Vim will try to figure out where one word ends and a next starts. When there
are spelling mistakes this may not be quite right.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
NOTE: The following has not been implemented yet, because there are no word
lists that support this.
>							*spell-CMP*
> Sometimes it is necessary to change a word when concatenating it to another,
> by removing a few letters, inserting something or both. It can also be useful
> to restrict concatenation to words that match a pattern. For this purpose CMP
> items can be used. They look like this:
>	CMP {flag} {flags} {strip} {strip2} {add} {cond} {cond2}
>
>	{flag}		the flag, as used in COMPOUNDFLAGS for the lead word
>	{flags}		accepted flags for the following word ('.' to accept
>			all)
>	{strip}		text to remove from the end of the lead word (zero
>			for no stripping)
>	{strip2}	text to remove from the start of the following word
>			(zero for no stripping)
>	{add}		text to insert between the words (zero for no
>			addition)
>	{cond}		condition to match at the end of the lead word
>	{cond2}		condition to match at the start of the following word
>
> This is the same as what is used for SFX and PFX items, with the extra {flags}
> and {cond2} fields. Example:
>	CMP f mrt 0 - . . ~
>
> When used with the food and dish word list above, this means that a dash is
> inserted after each food item. Thus you get "onion-soup" and
> "onion-tomato-salat".
>
> When there are CMP items for a compound flag the concatenation is only done
> when a CMP item matches.
>
> When there are no CMP items for a compound flag, then all words will be
> concatenated, as if there was an item:
>	CMP {flag} . 0 0 . .
>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


REPLACEMENTS						*spell-REP*

In the affix file REP items can be used to define common mistakes. This is
used to make spelling suggestions. The items define the "from" text and the
"to" replacement. Example:

	REP 4 ~
	REP f ph ~
	REP ph f ~
	REP k ch ~
	REP ch k ~

The first line specifies the number of REP lines following. Vim ignores the
number, but it must be there.

Don't include simple one-character replacements or swaps. Vim will try these
anyway. You can include whole words if you want to, but you might want to use
the "file:" item in 'spellsuggest' instead.
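
As an illustration of what REP items are for, the Python sketch below
generates candidate words by applying each "from"/"to" pair at every position
where "from" occurs. The name rep_candidates() is made up, and the real
suggestion mechanism also scores and filters the candidates, which is left
out here.

```python
def rep_candidates(word, rep_items):
    """Return candidate words made by applying one REP item once.

    rep_items is a list of (from, to) pairs, one for each REP line after
    the count line.
    """
    candidates = []
    for frm, to in rep_items:
        start = word.find(frm)
        while start != -1:
            candidate = word[:start] + to + word[start + len(frm):]
            if candidate not in candidates:
                candidates.append(candidate)
            start = word.find(frm, start + 1)
    return candidates
```

With the four REP items of the example, the bad word "fone" gives the
candidate "phone". A candidate would still have to be looked up in the word
list before it is offered as a suggestion.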
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001096
1097
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001098SIMILAR CHARACTERS *spell-MAP*
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001099
Bram Moolenaard042c562005-06-30 22:04:15 +00001100In the affix file MAP items can be used to define letters that are very much
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001101alike. This is mostly used for a letter with different accents. This is used
1102to prefer suggestions with these letters substituted. Example:
1103
1104 MAP 2 ~
1105 MAP eéëêè ~
1106 MAP uüùúû ~
1107
Bram Moolenaar6e7c7f32005-08-24 22:16:11 +00001108The first line specifies the number of MAP lines following. Vim ignores the
1109number, but the line must be there.
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001110
Bram Moolenaard042c562005-06-30 22:04:15 +00001111Each letter must appear in only one of the MAP items. It's a bit more
1112efficient if the first letter is ASCII or at least one without accents.
Bram Moolenaare7566042005-06-17 22:00:15 +00001113
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001114
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001115SOUND-A-LIKE *spell-SAL*
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001116
1117In the affix file SAL items can be used to define the sounds-a-like mechanism
1118to be used. The main items define the "from" text and the "to" replacement.
Bram Moolenaard042c562005-06-30 22:04:15 +00001119Simplistic example:
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001120
1121 SAL CIA X ~
1122 SAL CH X ~
1123 SAL C K ~
1124 SAL K K ~
1125
Bram Moolenaar7d1f5db2005-07-03 21:39:27 +00001126There are a few rules and this can become quite complicated. An explanation
Bram Moolenaard042c562005-06-30 22:04:15 +00001127how it works can be found in the Aspell manual:
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001128http://aspell.net/man-html/Phonetic-Code.html.
Bram Moolenaar9ba0eb82005-06-13 22:28:56 +00001129
1130There are a few special items:
1131
1132 SAL followup true ~
1133 SAL collapse_result true ~
1134 SAL remove_accents true ~
1135
1136"1" has the same meaning as "true". Any other value means "false".
1137
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001138
Bram Moolenaar6f16eb82005-08-23 21:02:42 +00001139SIMPLE SOUNDFOLDING *spell-SOFOFROM* *spell-SOFOTO*
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001140
1141The SAL mechanism is complex and slow. A simpler mechanism is mapping all
1142characters to another character, mapping similar sounding characters to the
1143same character. At the same time this does case folding. You can not have
Bram Moolenaard042c562005-06-30 22:04:15 +00001144both SAL items and simple soundfolding.
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001145
Bram Moolenaar7d1f5db2005-07-03 21:39:27 +00001146There are two items required: one to specify the characters that are mapped
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001147and one that specifies the characters they are mapped to. They must have
1148exactly the same number of characters. Example:
1149
1150 SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
1151 SOFOTO ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~
1152
1153In the example all vowels are mapped to the same character 'e'. Another
Bram Moolenaard042c562005-06-30 22:04:15 +00001154method would be to leave out all vowels. Some characters that sound nearly
1155the same and are often mixed up, such as 'm' and 'n', are mapped to the same
1156character. Don't do this too much, all words will start looking alike.
Bram Moolenaar42eeac32005-06-29 22:40:58 +00001157
1158Characters that do not appear in SOFOFROM will be left out, except that all
1159white space is replaced by one space. Sequences of the same character in
1160SOFOFROM are replaced by one.
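
The mapping can be imitated with a few lines of Python to see what a pair of
SOFOFROM and SOFOTO lines does to a word. The name simple_soundfold() is made
up for this sketch. Collapsing is done on the mapped result, so that two
different characters that map to the same character also end up as one; that
is an interpretation of the last rule above, not a statement about Vim's
code.

```python
SOFOFROM = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
SOFOTO = "ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes"

def simple_soundfold(text, sofofrom=SOFOFROM, sofoto=SOFOTO):
    """Sound-fold "text" using a SOFOFROM/SOFOTO pair."""
    if len(sofofrom) != len(sofoto):
        raise ValueError("SOFOFROM and SOFOTO must have the same length")
    table = dict(zip(sofofrom, sofoto))
    result = []
    for ch in text:
        # White space becomes a space, unknown characters are left out.
        mapped = " " if ch.isspace() else table.get(ch)
        if mapped is None:
            continue
        # Assumption: a repeated result character is stored only once.
        if result and result[-1] == mapped:
            continue
        result.append(mapped)
    return "".join(result)
```

With the example lines "Hello World" is folded to "hele verlt", which shows
the case folding, the vowels all becoming 'e' and the double 'l' collapsing
into one.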

You can use the |soundfold()| function to try out the results. Or set the
'verbose' option to see the score in the output of the |z?| command.


 vim:tw=78:sw=4:ts=8:ft=help:norl: