*spell.txt*	For Vim version 7.0aa. Last change: 2005 Aug 22


		  VIM REFERENCE MANUAL	  by Bram Moolenaar


Spell checking						*spell*

1. Quick start			|spell-quickstart|
2. Remarks on spell checking	|spell-remarks|
3. Generating a spell file	|spell-mkspell|
4. Spell file format		|spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start						*spell-quickstart*

This command switches on spell checking: >

	:setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.

The words that are not recognized are highlighted with one of these:
	SpellBad	word not recognized			|hl-SpellBad|
	SpellCap	word not capitalised			|hl-SpellCap|
	SpellRare	rare word				|hl-SpellRare|
	SpellLocal	wrong spelling for selected region	|hl-SpellLocal|

Vim only checks words for spelling; there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word or it is "popup_setpos" and the mouse pointer is on a badly
spelled word, then the popup menu will contain a submenu to replace the bad
word. Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

							*]s* *E756*
]s			Move to next misspelled word after the cursor.
			A count before the command can be used to repeat.

							*[s*
[s			Like "]s" but search backwards, find the misspelled
			word before the cursor. Doesn't recognize words
			split over two lines, thus may stop at words that are
			not highlighted as bad. Does not stop at a word with
			a missing capital at the start of a line.

							*]S*
]S			Like "]s" but only stop at bad words, not at rare
			words or words for another region.

							*[S*
[S			Like "]S" but search backwards.


To add words to your own word list:			*E764*

							*zg*
zg			Add word under the cursor as a good word to the first
			name in 'spellfile'. In Visual mode the selected
			characters are added as a word (including white
			space!). If the word is explicitly marked as a bad
			word in another spell file the result is
			unpredictable.
			A count may precede the command to indicate the entry
			in 'spellfile' to be used. A count of two uses the
			second entry.

							*zG*
zG			Like "zg" but add the word to the internal word list
			|internal-wordlist|.

							*zw*
zw			Like "zg" but mark the word as a wrong (bad) word.

							*zW*
zW			Like "zw" but add the word to the internal word list
			|internal-wordlist|.

						*:spe* *:spellgood*
:[count]spe[llgood] {word}
			Add {word} as a good word to 'spellfile', like with
			"zg". Without count the first name is used, with a
			count of two the second entry, etc.

:spe[llgood]! {word}	Add {word} as a good word to the internal word list,
			like with "zG".

						*:spellw* *:spellwrong*
:[count]spellw[rong] {word}
			Add {word} as a wrong (bad) word to 'spellfile', as
			with "zw". Without count the first name is used, with
			a count of two the second entry, etc.

:spellw[rong]! {word}	Add {word} as a wrong (bad) word to the internal word
			list, like with "zW".

After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded. If you edit a file in
'spellfile' manually you need to use the |:mkspell| command. This sequence of
commands mostly works well: >
	:edit <file in 'spellfile'>
<	(make changes to the spell file) >
	:mkspell! %

More details about the 'spellfile' format below |spell-wordlist-format|.
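
As an illustration of the count, a sketch with two hypothetical word list
files. The file names and the words are made up for this example; it is
assumed here that each name in 'spellfile' ends in ".{encoding}.add": >
	:set spellfile=~/.vim/spell/en.latin1.add,~/.vim/spell/proj.latin1.add
	:spellgood Moolenaar
	:2spellgood frobnicate
<
The first word goes to the first file, the second word to the second file.
With the cursor on a word, "zg" and "2zg" select the files in the same way.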

							*internal-wordlist*
The internal word list is used for all buffers where 'spell' is set. It is
not stored, it is lost when you exit Vim. It is also cleared when 'encoding'
is set.


Finding suggestions for bad words:
							*z?*
z?			For the word under/after the cursor suggest correctly
			spelled words. This also works to find alternatives
			for a word that is not highlighted as a bad word,
			e.g., when the word after it is bad.
			The results are sorted on similarity to the word
			under/after the cursor.
			This may take a long time. Hit CTRL-C when you get
			bored.

			If the command is used without a count the
			alternatives are listed and you can enter the number
			of your choice or press <Enter> if you don't want to
			replace. You can also use the mouse to click on your
			choice (only works if the mouse can be used in Normal
			mode and when there are no line wraps). Click on the
			first line (the header) to cancel.

			If a count is used that suggestion is used, without
			prompting. For example, "1z?" always takes the first
			suggestion.

			If 'verbose' is non-zero a score will be displayed
			with the suggestions to indicate the similarity to
			the badly spelled word (the higher the score the more
			different).
			When a word was replaced the redo command "." will
			repeat the word replacement. This works like "ciw",
			the good word and <Esc>.

					*:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]		Repeat the replacement done by |z?| for all matches
			with the replaced word in the current window.
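
For example, this sketch takes the first suggestion for the bad word under the
cursor without prompting and then repeats that replacement for the other
matches in the window. Both commands are the ones described above; ":normal"
is only used here to run the Normal mode command from the command line: >
	:normal 1z?
	:spellrepall
<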

In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions. This works like Insert mode completion. Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted. See |'spellsuggest'|.

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital. This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed. Use |CTRL-L| when needed. Also see |set-spc-auto| for
how it can be set automatically when 'spelllang' is set.

==============================================================================
2. Remarks on spell checking				*spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking. To make this work fast the word list is
loaded in memory. Thus this uses a lot of memory (1 Mbyte or more). There
might also be a noticeable delay when the word list is loaded, which happens
when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
To minimize the delay each word list is only loaded once, it is not deleted
when 'spelllang' is made empty or 'spell' is reset. When 'encoding' is set
all the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions. For example, English
comes in (at least) these variants:

	en		all regions
	en_au		Australia
	en_ca		Canada
	en_gb		Great Britain
	en_nz		New Zealand
	en_us		USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.

Always use lowercase letters for the language and region names.

When adding a word with |zg| or another command it's always added for all
regions. You can change that by manually editing the 'spellfile'. See
|spell-wordlist-format|. Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).
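
A short sketch of selecting a region, using names from the table above: >
	:setlocal spell spelllang=en_gb
<
With this setting a word that the word list marks as valid only for another
region (for example "color", assuming it is listed for "us" only) would be
highlighted with SpellLocal instead of SpellBad. Using "en" without a region
accepts the words of all regions: >
	:setlocal spell spelllang=en
<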

							*spell-german*
Specific exception: For German these special regions are used:
	de		all German words accepted
	de_de		old and new spelling
	de_19		old spelling
	de_20		new spelling
	de_at		Austria
	de_ch		Switzerland

							*spell-yiddish*
Yiddish requires using "utf-8" encoding, because of the special characters
used. If you are using latin1 Vim will use transliterated (romanized) Yiddish
instead. If you want to use transliterated Yiddish with utf-8 use "yi-tr".
In a table:
	'encoding'	'spelllang'
	utf-8		yi		Yiddish
	latin1		yi		transliterated Yiddish
	utf-8		yi-tr		transliterated Yiddish


SPELL FILES						*spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'. The name is: LL.EEE.spl, where:
	LL	the language name
	EEE	the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
	'spelllang'	LL ~
	en_us		en
	en-rare		en-rare
	medical_ca	medical

Only the first file is loaded, the one that is first in 'runtimepath'. If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded. These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15". The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried. This only
  works for languages where nearly all words are ASCII, such as English. It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited. For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
	'runtimepath' is "~/.vim,/usr/share/vim70,~/.vim/after"
	'encoding' is "iso-8859-2"
	'spelllang' is "pl"

Vim will look for:
1. ~/.vim/spell/pl.iso-8859-2.spl
2. /usr/share/vim70/spell/pl.iso-8859-2.spl
3. ~/.vim/spell/pl.iso-8859-2.add.spl
4. /usr/share/vim70/spell/pl.iso-8859-2.add.spl
5. ~/.vim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.vim/spell/pl.latin1.spl
2. /usr/share/vim70/spell/pl.latin1.spl
3. ~/.vim/after/spell/pl.latin1.spl
4. ~/.vim/spell/pl.ascii.spl
5. /usr/share/vim70/spell/pl.ascii.spl
6. ~/.vim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).
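
The search can be reproduced by hand with |globpath()|, which expands a file
pattern in every directory of a comma-separated path. This is only a sketch
of the lookup described above, not the code Vim uses internally. It prints
every matching file, while Vim loads only the first main .spl file it finds
and does not apply the "latin1" and "ascii" exceptions here: >
	:echo globpath(&runtimepath, 'spell/pl.' . &encoding . '.spl')
	:echo globpath(&runtimepath, 'spell/pl.' . &encoding . '.add.spl')
<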

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'. See
|spell-mkspell| about how to create a spell file. Converting a spell file
with "iconv" will NOT work!

							*E758* *E759*
When loading a spell file Vim checks that it is properly formatted. If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word. This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'. The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file. Therefore it
matters what the current locale is when generating it! A .add.spl file does
not contain a word table though.

A word that starts with a digit is always ignored. That includes hex numbers
in the form 0xff and 0XFF.


WORD COMBINATIONS

It is possible to spell-check words that include a space. This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant. In most cases a line break may also
appear. However, this makes it difficult to find out where to start checking
for spelling mistakes. When you make a change to one line and only that line
is redrawn Vim won't look in the previous line, thus when "et" is at the end
of the previous line "al." will be flagged as an error. And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away. "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as '*', '>' and '"',
so that comments in C, shell and Vim code can be spell checked.


SYNTAX HIGHLIGHTING					*spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1. everywhere				default
2. in specific items			use "contains=@Spell"
3. everywhere but specific items	use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again. This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.
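
A minimal sketch of the second method for a hypothetical syntax file. The
group names "myComment" and "myCommentURL" and the patterns are made up for
this example: >
	:syntax region myComment start="/\*" end="\*/"
			\ contains=@Spell,myCommentURL
	:syntax match myCommentURL "http://\S\+" contained contains=@NoSpell
<
Text inside the comment is spell checked, except for the part matched by the
URL item.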


VIM SCRIPTS

If you want to write a Vim script that does something with spelling, you may
find these functions useful:

    spellbadword()	find badly spelled word at the cursor
    spellsuggest()	get list of spelling suggestions
    soundfold()		get the sound-a-like version of a word
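
A small sketch of how two of these could be combined. The function name
"ShowSuggestions" is hypothetical. It is assumed that 'spell' is set in the
current window, that spellsuggest() accepts a maximum number of suggestions
as its second argument and that soundfold() takes a word and returns a
string; check the description of each function before relying on this: >
	function! ShowSuggestions(word)
	  " List up to five suggestions, each with its sound-folded form.
	  for sugg in spellsuggest(a:word, 5)
	    echo sugg . "\t" . soundfold(sugg)
	  endfor
	endfunction
<
Usage: >
	:call ShowSuggestions("speling")
<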


SETTING 'spellcapcheck' AUTOMATICALLY			*set-spc-auto*

After the 'spelllang' option has been set successfully, Vim will source the
files "spell/LANG.vim" in 'runtimepath'. "LANG" is the value of 'spelllang'
up to the first comma, dot or underscore. This can be used to set options
specifically for the language, especially 'spellcapcheck'.

The distribution includes a few of these files. Use this command to see what
they do: >
	:next $VIMRUNTIME/spell/*.vim

Note that the default scripts don't set 'spellcapcheck' if it was changed from
the default value. This assumes the user prefers another value then.
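
As an illustration, a hypothetical file "~/.vim/spell/xx.vim" for a language
"xx" in which a sentence does not need to start with a capital could contain
the lines below. This assumes that an empty 'spellcapcheck' switches the
check off. Unlike the distributed scripts, this sketch does not test whether
the option was changed by the user: >
	" Sourced after 'spelllang' was set to "xx", "xx_yy", etc.
	setlocal spellcapcheck=
<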


DOUBLE SCORING						*spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring. This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something. This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since scores for these two mistakes will be very different we use a list
for each and mix them.

The sound-folding is slow and people who know the language won't make the
second kind of mistake. Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.
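
For example, to select the scoring described here. This assumes the method
is selected by using its name as the option value; see |'spellsuggest'| for
the values that are accepted: >
	:set spellsuggest=double
<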

==============================================================================
3. Generating a spell file				*spell-mkspell*

Vim uses a binary file format for spelling. This greatly speeds up loading
the word list and keeps it small.
						    *.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses. Myspell is used by OpenOffice.org and Mozilla. You should be able to
find them here:
	http://lingucomponent.openoffice.org/spell_dic.html
You can also use a plain word list. The results are the same, the choice
depends on what word lists you can find.

If you install Aap (from www.a-a-p.org) you can use the recipes in the
runtime/spell/??/ directories. Aap will take care of downloading the files,
applying patches needed for Vim and building the .spl file.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters. If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|. If the .aff file doesn't define a table then the word
table of the currently active spelling is used. If spelling is not active
then Vim will try to guess.

							*:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
			Generate a Vim spell file from word lists. Example: >
		:mkspell /tmp/nl nl_NL.words
<								*E751*
			When {outname} ends in ".spl" it is used as the output
			file name. Otherwise it should be a language name,
			such as "en", without the region name. The file
			written will be "{outname}.{encoding}.spl", where
			{encoding} is the value of the 'encoding' option.

			When the output file already exists [!] must be used
			to overwrite it.

			When the [-ascii] argument is present, words with
			non-ascii characters are skipped. The resulting file
			ends in "ascii.spl".

			The input can be the Myspell format files {inname}.aff
			and {inname}.dic. If {inname}.aff does not exist then
			{inname} is used as the file name of a plain word
			list.

			Multiple {inname} arguments can be given to combine
			regions into one Vim spell file. Example: >
		:mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<			This combines the English word lists for US, CA and AU
			into one en.spl file.
			Up to eight regions can be combined. *E754* *E755*
			The REP and SAL items of the first .aff file where
			they appear are used. |spell-affix-REP|
			|spell-affix-SAL|

			This command uses a lot of memory, required to find
			the optimal word tree (Polish, Italian and Hungarian
			require several hundred Mbyte). The final result will
			be much smaller, because compression is used. To
			avoid running out of memory compression will be done
			now and then. This can be tuned with the 'mkspellmem'
			option.

			If the spell file that was written is in use in a
			buffer, it is reloaded automatically.

:mksp[ell] [-ascii] {name}.{enc}.add
			Like ":mkspell" above, using {name}.{enc}.add as the
			input file and producing an output file in the same
			directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
			Like ":mkspell" above, using {name} as the input file
			and producing an output file in the same directory
			that has ".{enc}.spl" appended.
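
Examples of the two short forms, with hypothetical file names. Going by the
descriptions above, the first would write "~/.vim/spell/en.latin1.add.spl"
and the second "/tmp/nl_NL.words.{enc}.spl", where {enc} stands for the value
of 'encoding': >
	:mkspell ~/.vim/spell/en.latin1.add
	:mkspell /tmp/nl_NL.words
<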

Vim will report the number of duplicate words. This might be a mistake in the
list of words. But sometimes it is used to have different prefixes and
suffixes for the same basic word to avoid them combining (e.g. Czech uses
this). If you want Vim to report all duplicate words set the 'verbose'
option.

Since you might want to change a Myspell word list for use with Vim the
following procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc. The distributed
   "src/spell/*.diff" files can be used.
4. Start Vim with the right locale and use |:mkspell| to generate the Vim
   spell file.
5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
   wrote it somewhere else.

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use Vimdiff to see what changed: >
	vimdiff xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.


SPELL FILE VERSIONS					*E770* *E771* *E772*

Spell checking is a relatively new feature in Vim, thus it's possible that the
.spl file format will be changed to support more languages. Vim will check
the validity of the spell file and report anything wrong.

	E771: Old spell file, needs to be updated ~
This spell file is older than your Vim. You need to update the .spl file.

	E772: Spell file is for newer version of Vim ~
This means the spell file was made for a later version of Vim. You need to
update Vim.

	E770: Unsupported section in spell file ~
This means the spell file was made for a later version of Vim and contains a
section that is required for the spell file to work. In this case it's
probably a good idea to upgrade your Vim.


SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

							*:spelldump* *:spelld*
:spelld[ump]		Open a new window and fill it with all currently valid
			words.
			Note: For some languages the result may be enormous,
			causing Vim to run out of memory.

The format used is that of a word list |spell-wordlist-format|. You should be
able to read it with ":mkspell" to generate one .spl file that includes all
the words.

When all entries to 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words. Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.
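
A sketch of that round trip, with hypothetical file names under /tmp. It
assumes the window opened by ":spelldump" is the current window when the
":write" command is used: >
	:spelldump
	:write /tmp/all.words
	:mkspell /tmp/all /tmp/all.words
<
Going by the description of |:mkspell|, the last command would write
"/tmp/all.{encoding}.spl".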

==============================================================================
4. Spell file format					*spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here. That is because the goal of
spell checking differs from writing a dictionary (as in the book). For
spelling we need a list of words that are OK, thus should not be
highlighted. Person and company names will not appear in a dictionary, but do
appear in a word list. And some old words are rarely used while they are
common misspellings. These do appear in a dictionary but not in a word list.

There are two formats: A straight list of words and a list using affix
compression. The files with affix compression are used by Myspell (Mozilla
and OpenOffice.org). This requires two files, one with .aff and one with .dic
extension.


FORMAT OF STRAIGHT WORD LIST                            *spell-wordlist-format*

The words must appear one per line. That is all that is required.

Additionally the following items are recognized:

- Empty and blank lines are ignored.

- Lines starting with a # are ignored (comment lines).

- A line starting with "/encoding=", before any word, specifies the encoding
  of the file. After the "=" comes an encoding name. This tells Vim to set
  up conversion from the specified encoding to 'encoding'. Thus you can use
  one word list for several target encodings.

- A line starting with "/regions=" specifies the region names that are
  supported. Each region name must be two ASCII letters. The first one is
  region 1. Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an addition word list the region names should be equal to the main word
  list!

- Other lines starting with '/' are reserved for future use. The ones that
  are not recognized are ignored (but you do get a warning message).

- A "/" may follow the word with the following items:
    =           Case must match exactly.
    ?           Rare word.
    !           Bad (wrong) word.
    digit       A region in which the word is valid. If no regions are
                specified the word is valid in all regions.

Example:

        # This is an example word list          comment
        /encoding=latin1                        encoding of the file
        /regions=uscagb                         regions "us", "ca" and "gb"
        example                                 word for all regions
        blah/12                                 word for regions "us" and "ca"
        vim/!                                   bad word
        Campbell/?3                             rare word in region 3 "gb"
        's mornings/=                           keep-case word

Note that when "/=" is used the same word with all upper-case letters is not
accepted. This is different from a word with mixed case that is automatically
marked as keep-case; those words may appear in all upper-case letters.
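
The items above can be sketched as a small reader. This is only an
illustration in Python, not how Vim implements it: the function name and the
returned structure are made up for this example, and the warning for
unrecognized "/" lines is left out.

```python
def parse_wordlist(lines):
    """Parse a straight word list; returns (encoding, regions, entries)."""
    encoding = None
    regions = []
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue                      # blank line or comment line
        if line.startswith("/encoding="):
            encoding = line[len("/encoding="):]
            continue
        if line.startswith("/regions="):
            names = line[len("/regions="):]
            # Each region name is two ASCII letters; the first is region 1.
            regions = [names[i:i + 2] for i in range(0, len(names), 2)]
            continue
        if line.startswith("/"):
            continue                      # reserved for future use
        word, _, flags = line.partition("/")
        entries.append({
            "word": word,
            "keep_case": "=" in flags,
            "rare": "?" in flags,
            "bad": "!" in flags,
            # An empty list means the word is valid in all regions.
            "regions": [int(c) for c in flags if c.isdigit()],
        })
    return encoding, regions, entries
```

Feeding it the lines of the example above (without the explanation column)
gives the regions "us", "ca" and "gb", and "blah" is valid in regions 1 and 2.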


FORMAT WITH AFFIX COMPRESSION

There are two files: the basic word list and an affix file. The affixes are
used to modify the basic words to get the full word list. This significantly
reduces the number of words, especially for a language like Polish. This is
called affix compression.

The format for the affix and word list files is mostly identical to what
Myspell uses (the spell checker of Mozilla and OpenOffice.org). A description
can be found here:
        http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive; this isn't obvious from the description.

Vim supports a few extras. Hopefully Myspell will support these too some day.
See |spell-affix-vim|.

The basic word list and the affix file are combined and turned into a binary
spell file. All the preprocessing has been done, thus this file loads fast.
The binary spell file format is described in the source code (src/spell.c).
But only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made. The tools for this can be found in the
"src/spell" directory.


WORD LIST FORMAT                                        *spell-dic-format*

A very short example, with line numbers:

        1       1234
        2       aan
        3       Als
        4       Etten-Leur
        5       et al.
        6       's-Gravenhage
        7       's-Gravenhaags
        8       bedel/P
        9       kado/1
        10      cadeau/2

The first line contains the number of words. Vim ignores it, but you do get
an error message if it's not there. *E760*

What follows is one word per line. There should be no white space before or
after the word.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position. The same word with a lower-case letter at this
position will not match. When some of the other letters are upper-case it will
not match either.

The word with all upper-case characters will always be OK.

        word list       matches                 does not match ~
        als             als Als ALS             ALs AlS aLs aLS
        Als             Als ALS                 als ALs AlS aLs aLS
        ALS             ALS                     als Als ALs AlS aLs aLS
        AlS             AlS ALS                 als Als ALs aLs aLS

The KEP affix ID can be used to specifically match a word with identical case
only, see below |spell-affix-KEP|.
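
The case rules in the table above can be written down as a short check. This
is a sketch in Python for a word without the KEP flag; the function name is
made up, and Python's own upper() and lower() stand in for Vim's character
tables.

```python
def dic_word_matches(entry, text):
    """Return True when "text" is accepted for the word list entry "entry"."""
    if text == entry:
        return True                 # identical case always matches
    if text == entry.upper():
        return True                 # all upper-case is always OK
    if entry == entry.lower():
        # An all lower-case entry also matches with a capitalised first letter.
        return text == entry[:1].upper() + entry[1:]
    return False                    # an upper-case letter is required in place
```

With the entry "als" this accepts "als", "Als" and "ALS" but not "aLs"; with
the entry "AlS" only "AlS" and "ALS" are accepted.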

Note in lines 5 to 7 that non-word characters are used. You can include
any character in a word. When checking the text a word still only matches
when it appears with a non-word character before and after it. For Myspell a
word starting with a non-word character probably won't work.

After the word there is an optional slash and flags. Most of these flags are
letters that indicate the affixes that can be used with this word. These are
specified with SFX and PFX lines in the .aff file. See the Myspell
documentation.

                                                        *spell-affix-vim*
A flag that Vim adds and is not in Myspell is the flag defined with KEP in the
affix file. This has the meaning that case matters. This can be used if the
word does not have the first letter in upper case at the start of a sentence.
Example (assuming that = was used for KEP):

        word list       matches                 does not match ~
        's morgens/=    's morgens              'S morgens 's Morgens 'S MORGENS
        's Morgens      's Morgens 'S MORGENS   'S morgens 's morgens

The flag can also be used to prevent the word from matching when it is in all
upper-case letters.

                                                        *spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file. The affix file must always be in the same encoding as the
word list. This is compatible with Myspell. For Vim the encoding may also be
something else, any encoding that "iconv" supports. The "SET" line must
specify the name of the encoding. When using a multi-byte encoding it's
possible to use more different affixes (but Myspell doesn't support that, thus
you may not want to use it anyway).


CHARACTER TABLES
                                                        *spell-affix-chars*
When using an 8-bit encoding (as specified with SET) the affix file should
define what characters are word characters. This is because the system where
":mkspell" is used may not support a locale with this encoding and isalpha()
won't work. For example when using "cp1250" on Unix.

                                                *E761* *E762* *spell-affix-FOL*
                                                *spell-affix-LOW* *spell-affix-UPP*
Three lines in the affix file are needed. Simplistic example:

        FOL áëñ ~
        LOW áëñ ~
        UPP ÁËÑ ~

All three lines must have exactly the same number of characters.

The "FOL" line specifies the case-folded characters. These are used to
compare words while ignoring case. For most encodings this is identical to
the lower case line.

The "LOW" line specifies the characters in lower-case. Mostly it's equal to
the "FOL" line.

The "UPP" line specifies the characters with upper-case. That is, a character
is upper-case where it's different from the character at the same position in
"FOL".

ASCII characters should be omitted, Vim always handles these in the same way.
When the encoding is UTF-8 no word characters need to be specified.
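
How the three lines relate can be shown with a small sketch in Python. The
function name and the returned tables are made up for this illustration; it
only checks the length rule and derives case folding from the positions.

```python
def build_case_tables(fol, low, upp):
    """Build a fold table and an upper-case set from FOL, LOW and UPP."""
    if not (len(fol) == len(low) == len(upp)):
        raise ValueError("FOL, LOW and UPP must have the same number "
                         "of characters")
    fold = {}
    upper = set()
    for f, l, u in zip(fol, low, upp):
        fold[l] = f        # lower-case character folds to the FOL character
        fold[u] = f        # upper-case character folds to the same one
        if u != f:
            upper.add(u)   # upper-case where it differs from FOL
    return fold, upper
```

Two words can then be compared while ignoring case by folding each character
through the table first.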

                                                        *E763*
Vim allows you to use spell checking for several languages in the same file.
You can list them in the 'spelllang' option. As a consequence all spell files
for the same encoding must use the same word characters, otherwise they can't
be combined without errors. If you get a warning that the word tables differ
you may need to generate the .spl file again with |:mkspell|. Check the FOL,
LOW and UPP lines in the used .aff file.

The XX.ascii.spl spell file generated with the "-ascii" argument will not
contain the table with characters, so that it can be combined with spell files
for any encoding. The .add.spl files also do not contain the table.


MID-WORD CHARACTERS
                                                        *spell-midword*
Some characters are only to be considered word characters if they are used in
between two ordinary word characters. An example is the single quote: It is
often used to put text in quotes, thus it can't be recognized as a word
character, but when it appears in between word characters it must be part of
the word. This is needed to detect a spelling error such as they'are. That
should be they're, but since "they" and "are" are words themselves that would
go unnoticed.

These characters are defined with MIDWORD in the .aff file:

        MIDWORD '- ~
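
A rough sketch in Python of what this means for splitting text into words.
The function name is made up, and str.isalpha() is used as a stand-in for the
word characters of the spell file.

```python
def split_words(text, midword="'-"):
    """Split text into words, keeping mid-word characters inside words."""
    words = []
    current = []
    for i, ch in enumerate(text):
        if ch.isalpha():
            current.append(ch)
        elif (ch in midword and current
                and i + 1 < len(text) and text[i + 1].isalpha()):
            # Only a word character when between two word characters.
            current.append(ch)
        else:
            if current:
                words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words
```

For the text "they'are 'quoted'" this gives "they'are" and "quoted": the quote
inside the first word is kept, so the word can be flagged as bad, while the
quotes around the second word are dropped.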


AFFIXES
                                        *spell-affix-PFX* *spell-affix-SFX*
The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
documentation or the Aspell manual:
http://aspell.net/man-html/Affix-Compression.html).

Note that Myspell ignores any extra text after the relevant info. Vim
requires this text to start with a "#" so that mistakes don't go unnoticed.
Example:

        SFX F 0 in   [^i]n      # Spion > Spionin ~
        SFX F 0 nen  in         # Bauerin > Bauerinnen ~

                                                        *spell-affix-rare*
An extra item for Vim is the "rare" flag. It must come after the other
fields, before a comment. When used, all words that use the affix will be
marked as rare words. Example:

        PFX F 0 nene .     rare ~
        SFX F 0 oin  n     rare # hardly ever used ~

However, if the word also appears as a good word in another way it won't be
marked as rare.

                                                        *spell-affix-nocomp*
Another extra item for Vim is the "nocomp" flag. It must come after the other
fields, before a comment. It can be either before or after "rare". When
present, words that use the affix will not be part of a compound word.
Example:
        affix file:
                COMPOUNDFLAG c ~
                SFX a Y 2 ~
                SFX a 0 s   . ~
                SFX a 0 ize . nocomp ~
        dictionary:
                word/c ~
                util/ac ~

This allows for "wordutil" and "wordutils" but not "wordutilize".

                                                        *spell-affix-PFXPOSTPONE*
When an affix file has very many prefixes that apply to many words it's not
possible to build the whole word list in memory. This applies to Hebrew (a
list with all words is over a Gbyte). In that case applying prefixes must be
postponed. This makes spell checking slower. It is indicated by this keyword
in the .aff file:

        PFXPOSTPONE ~

Only prefixes without a chop string can be postponed, prefixes with a chop
string will still be included in the word list. An exception is when the chop
string is one character and equal to the last character of the added string,
but in lower case. This covers the case where the chop string is used to
allow the following word to start with an upper case letter.


WORDS WITH A SLASH                                      *spell-affix-SLASH*

The slash is used in the .dic file to separate the basic word from the affix
letters that can be used. Unfortunately, this means you cannot use a slash in
a word. Thus "TCP/IP" cannot be a word. To work around that you can define a
replacement character for the slash. Example:

        SLASH , ~

Now you can use "TCP,IP" to add the word "TCP/IP".

Of course, the letter used should itself not appear in any word! The letter
must be ASCII, thus a single byte.
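
The effect on reading a .dic line can be sketched like this in Python. The
function name and its argument are hypothetical; the point is only that the
split on "/" happens before the replacement character is turned back into a
slash.

```python
def parse_dic_entry(line, slash_char=None):
    """Split a .dic line into (word, flags), honouring a SLASH character."""
    # The first real slash separates the word from its affix flags.
    word, _, flags = line.partition("/")
    if slash_char is not None:
        # Only now put the slash back into the word itself.
        word = word.replace(slash_char, "/")
    return word, flags
```

With "SLASH ," the line "TCP,IP" results in the word "TCP/IP" without flags,
while "bedel/P" still gives the word "bedel" with the flag "P".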


KEEP-CASE WORDS                                         *spell-affix-KEP*

In the affix file a KEP line can be used to define the affix name used for
keep-case words. Example:

        KEP = ~

See above for an example |spell-affix-vim|.


RARE WORDS                                              *spell-affix-RAR*

In the affix file a RAR line can be used to define the affix name used for
rare words. Example:

        RAR ? ~

Rare words are highlighted differently from bad words. This is to be used for
words that are correct for the language, but are hardly ever used and could be
a typing mistake anyway. When the same word is found as good it won't be
highlighted as rare.


BAD WORDS                                               *spell-affix-BAD*

In the affix file a BAD line can be used to define the affix name used for
bad words. Example:

        BAD ! ~

This can be used to exclude words that would otherwise be good. For example
"the the" in the .dic file:

        the the/! ~

Once a word has been marked as bad it won't be undone by encountering the same
word as good.

                                                        *spell-affix-NEEDAFFIX*
The NEEDAFFIX flag is used to require that a word is used with an affix. The
word itself is not a good word. Example:

        NEEDAFFIX + ~


COMPOUND WORDS                                          *spell-affix-compound*

A compound word is a longer word made by concatenating words that appear in
the .dic file. To specify which words may be concatenated a character is
used. This character is put in the list of affixes after the word. We will
call this character a flag here. Obviously these flags must be different from
any affix IDs used.

                                                        *spell-COMPOUNDFLAG*
The Myspell compatible method uses one flag, specified with COMPOUNDFLAG.
All words with this flag combine in any order. This means there is no control
over which word comes first. Example:
        COMPOUNDFLAG c ~

                                                        *spell-COMPOUNDFLAGS*
A more advanced method to specify how compound words can be formed uses
multiple items with multiple flags. This is not compatible with Myspell 3.0.
Let's start with an example:
        COMPOUNDFLAGS c+ ~
        COMPOUNDFLAGS se ~

The first line defines that words with the "c" flag can be concatenated in any
order. The second line defines compound words that are made of one word with
the "s" flag and one word with the "e" flag. With this dictionary:
        bork/c ~
        onion/s ~
        soup/e ~

You can make these words:
        bork
        borkbork
        borkborkbork
        (etc.)
        onion
        soup
        onionsoup

The COMPOUNDFLAGS item may appear multiple times. The argument is made out of
one or more groups, where each group can be:
        one flag                        e.g., c
        alternate flags inside []       e.g., [abc]
Optionally this may be followed by:
        *       the group appears zero or more times, e.g., sm*e
        +       the group appears one or more times, e.g., c+

This is similar to the regexp pattern syntax (but not the same!). A few
examples with the sequence of word flags they require:
    COMPOUNDFLAGS x+        x xx xxx etc.
    COMPOUNDFLAGS yz        yz
    COMPOUNDFLAGS x+z       xz xxz xxxz etc.
    COMPOUNDFLAGS yx+       yx yxx yxxx etc.

    COMPOUNDFLAGS [abc]z    az bz cz
    COMPOUNDFLAGS [abc]+z   az aaz abaz bz baz bcbz cz caz cbaz etc.
    COMPOUNDFLAGS a[xyz]+   ax axx axyz ay ayx ayzz az azy azxy etc.
    COMPOUNDFLAGS sm*e      se sme smme smmme etc.
    COMPOUNDFLAGS s[xyz]*e  se sxe sxye sxyxe sye syze sze szye szyxe etc.
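
To try out which flag sequences an item accepts, the argument can be
translated to a regular expression. The sketch below does this with Python's
"re" module, which is an approximation and not what Vim does internally. The
function names are made up, and it assumes every word contributes exactly one
compound flag.

```python
import re

def compile_compound_item(item):
    """Translate one COMPOUNDFLAGS argument into a compiled regexp."""
    parts = []
    i = 0
    while i < len(item):
        ch = item[i]
        if ch == "[":
            # Alternate flags inside []: any one of them may appear here.
            end = item.index("]", i)
            flags = "".join(re.escape(c) for c in item[i + 1:end])
            parts.append("[" + flags + "]")
            i = end + 1
        elif ch in "*+" and parts:
            parts[-1] += ch            # repeat count for the previous group
            i += 1
        else:
            parts.append(re.escape(ch))  # a single flag
            i += 1
    return re.compile("".join(parts))

def flags_allowed(items, flag_sequence):
    """True when the sequence of word flags matches one of the items."""
    return any(compile_compound_item(item).fullmatch(flag_sequence)
               for item in items)
```

With the items "c+" and "se" from the example above, the flag sequence "ccc"
(borkborkbork) and "se" (onionsoup) are allowed, but "es" (souponion) is not.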

A specific example: Allow a compound to be made of two words and a dash:
        In the .aff file:
                COMPOUNDFLAGS sde ~
                NEEDAFFIX x ~
                COMPOUNDMAX 3 ~
                COMPOUNDMIN 1 ~
        In the .dic file:
                start/s ~
                end/e ~
                -/xd ~

This allows for the word "start-end", but not "startend".

                                                        *spell-COMPOUNDMIN*
The minimal byte length of a word used for concatenation is specified with
COMPOUNDMIN. Example:
        COMPOUNDMIN 5 ~

When omitted a minimal length of 3 bytes is used. Obviously you could just
leave out the compound flag from short words instead; this feature is present
for compatibility with Myspell.

                                                        *spell-COMPOUNDMAX*
The maximum number of words that can be concatenated into a compound word is
specified with COMPOUNDMAX. Example:
        COMPOUNDMAX 3 ~

When omitted there is no maximum. It applies to all compound words.

To set a limit for words with specific flags make sure the items in
COMPOUNDFLAGS where they appear don't allow too many words.

                                                        *spell-COMPOUNDSYLMAX*
The maximum number of syllables that a compound word may contain is specified
with COMPOUNDSYLMAX. Example:
        COMPOUNDSYLMAX 6 ~

This has no effect if there is no SYLLABLE item. Without COMPOUNDSYLMAX there
is no limit on the number of syllables.

If both COMPOUNDMAX and COMPOUNDSYLMAX are defined, a compound word is
accepted if it fits one of the criteria, thus is either made from up to
COMPOUNDMAX words or contains up to COMPOUNDSYLMAX syllables.
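
How the two limits combine can be spelled out in a few lines of Python. This
is one reading of the text above, not Vim's code: the function name is made
up and None stands for an item that is not present in the .aff file.

```python
def compound_size_ok(word_count, syllable_count,
                     compound_max=None, compound_syl_max=None):
    """Check a compound against COMPOUNDMAX and COMPOUNDSYLMAX."""
    if compound_max is None and compound_syl_max is None:
        return True                       # no limits defined at all
    if compound_max is not None and compound_syl_max is not None:
        # Both defined: fitting either one of the criteria is enough.
        return (word_count <= compound_max
                or syllable_count <= compound_syl_max)
    if compound_max is not None:
        return word_count <= compound_max
    return syllable_count <= compound_syl_max
```

Thus with COMPOUNDMAX 3 and COMPOUNDSYLMAX 6 a compound made of four words is
still accepted when it has no more than six syllables.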

                                                        *spell-SYLLABLE*
The SYLLABLE item defines characters or character sequences that are used to
count the number of syllables in a word. Example:
        SYLLABLE aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui ~

Before the first slash is the set of characters that are counted for one
syllable, also when repeated and mixed, until the next character that is not
in this set. After the slash come sequences of characters that are counted
for one syllable. These are preferred over using characters from the set.
With the example "ideeen" has three syllables, counted by "i", "ee" and "e".

Only case-folded letters need to be included.
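
The counting can be sketched as follows in Python. The function names are
made up and this is one reading of the rules above: at each position the
longest matching sequence is tried first, otherwise a run of characters from
the set counts as one syllable.

```python
def parse_syllable_item(arg):
    """Split a SYLLABLE argument into (set of characters, sequences)."""
    chars, *sequences = arg.split("/")
    # Longest sequences first, so that they are preferred when matching.
    return set(chars), sorted(sequences, key=len, reverse=True)

def count_syllables(word, chars, sequences):
    """Count the syllables in an already case-folded word."""
    count = 0
    in_run = False      # inside a run of characters from the set
    i = 0
    while i < len(word):
        seq = next((s for s in sequences if s and word.startswith(s, i)),
                   None)
        if seq is not None:
            count += 1              # a sequence counts for one syllable
            in_run = False
            i += len(seq)
            continue
        if word[i] in chars:
            if not in_run:
                count += 1          # first character of a run
                in_run = True
        else:
            in_run = False
        i += 1
    return count
```

With the SYLLABLE line of the example this counts "i", "ee" and "e" in
"ideeen", which gives three syllables.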

Another way to restrict compounding was mentioned above: adding "nocomp"
after an affix causes all words that are made with that affix not to be used
for compounding. |spell-affix-nocomp|

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
NOTE: The following has not been implemented yet, because there are no word
lists that support this.
>                                                       *spell-CMP*
> Sometimes it is necessary to change a word when concatenating it to another,
> by removing a few letters, inserting something or both. It can also be useful
> to restrict concatenation to words that match a pattern. For this purpose CMP
> items can be used. They look like this:
>       CMP {flag} {flags} {strip} {strip2} {add} {cond} {cond2}
>
>       {flag}          the flag, as used in COMPOUNDFLAGS for the lead word
>       {flags}         accepted flags for the following word ('.' to accept
>                       all)
>       {strip}         text to remove from the end of the lead word (zero
>                       for no stripping)
>       {strip2}        text to remove from the start of the following word
>                       (zero for no stripping)
>       {add}           text to insert between the words (zero for no
>                       addition)
>       {cond}          condition to match at the end of the lead word
>       {cond2}         condition to match at the start of the following word
>
> This is the same as what is used for SFX and PFX items, with the extra {flags}
> and {cond2} fields. Example:
>       CMP f mrt 0 - . . ~
>
> When used with the food and dish word list above, this means that a dash is
> inserted after each food item. Thus you get "onion-soup" and
> "onion-tomato-salat".
>
> When there are CMP items for a compound flag the concatenation is only done
> when a CMP item matches.
>
> When there are no CMP items for a compound flag, then all words will be
> concatenated, as if there was an item:
>       CMP {flag} . 0 0 . .
>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


REPLACEMENTS                                            *spell-affix-REP*

In the affix file REP items can be used to define common mistakes. This is
used to make spelling suggestions. The items define the "from" text and the
"to" replacement. Example:

        REP 4 ~
        REP f ph ~
        REP ph f ~
        REP k ch ~
        REP ch k ~

The first line specifies the number of REP lines following. Vim ignores it.
Don't include simple one-character replacements or swaps. Vim will try these
anyway. You can include whole words if you want to, but you might want to use
the "file:" item in 'spellsuggest' instead.
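
A sketch in Python of how REP items can produce candidates for suggestions.
The function name is made up; it only applies each replacement at each place
where the "from" text occurs. Checking the candidates against the word list
and scoring them, as Vim does, is left out.

```python
def rep_candidates(word, rep_items):
    """Return the words made by applying one REP item at one position."""
    candidates = set()
    for src, dst in rep_items:
        start = word.find(src)
        while start != -1:
            # Replace this one occurrence of the "from" text.
            candidates.add(word[:start] + dst + word[start + len(src):])
            start = word.find(src, start + 1)
    return candidates
```

With the four REP items of the example the bad word "fotograf" gives the
candidates "photograf" and "fotograph".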


SIMILAR CHARACTERS                                      *spell-affix-MAP*

In the affix file MAP items can be used to define letters that are very much
alike. This is mostly used for a letter with different accents. This is used
to prefer suggestions with these letters substituted. Example:

        MAP 2 ~
        MAP eéëêè ~
        MAP uüùúû ~

The first line specifies the number of MAP lines following. Vim ignores it.

Each letter must appear in only one of the MAP items. It's a bit more
efficient if the first letter is ASCII or at least one without accents.
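
The rule that a letter may appear in only one MAP item can be checked with a
few lines of Python. The function names are made up, and using the first
letter of an item to stand for the whole group is an assumption of this
sketch.

```python
def build_map_groups(map_items):
    """Map every letter to the first letter of the MAP item it is in."""
    group_of = {}
    for item in map_items:
        for ch in item:
            if ch in group_of:
                raise ValueError("letter appears more than once: " + ch)
            group_of[ch] = item[0]
    return group_of

def similar(a, b, group_of):
    """True when two letters are equal or in the same MAP item."""
    return a == b or (a in group_of and group_of.get(a) == group_of.get(b))
```

With the example items any two of the letters on the "e" line are similar,
while a letter from the "e" line and one from the "u" line are not.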


SOUND-A-LIKE                                            *spell-affix-SAL*

In the affix file SAL items can be used to define the sound-a-like mechanism
to be used. The main items define the "from" text and the "to" replacement.
Simplistic example:

        SAL CIA                  X ~
        SAL CH                   X ~
        SAL C                    K ~
        SAL K                    K ~

There are a few rules and this can become quite complicated. An explanation
of how it works can be found in the Aspell manual:
http://aspell.net/man-html/Phonetic-Code.html.

There are a few special items:

        SAL followup             true ~
        SAL collapse_result      true ~
        SAL remove_accents       true ~

"1" has the same meaning as "true". Any other value means "false".


SIMPLE SOUNDFOLDING             *spell-affix-SOFOFROM* *spell-affix-SOFOTO*

The SAL mechanism is complex and slow. A simpler mechanism is mapping all
characters to another character, mapping similar sounding characters to the
same character. At the same time this does case folding. You cannot have
both SAL items and simple soundfolding.

There are two items required: one to specify the characters that are mapped
and one that specifies the characters they are mapped to. They must have
exactly the same number of characters. Example:

    SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
    SOFOTO   ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~

In the example all vowels are mapped to the same character 'e'. Another
method would be to leave out all vowels. Some characters that sound nearly
the same and are often mixed up, such as 'm' and 'n', are mapped to the same
character. Don't do this too much, or all words will start looking alike.

Characters that do not appear in SOFOFROM will be left out, except that all
white space is replaced by one space. Sequences of the same character in
SOFOFROM are replaced by one.
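
The mapping can be sketched in Python as shown below. The function name is
made up, and collapsing is done on the resulting characters, which is one
reading of the last rule above.

```python
def make_soundfolder(sofofrom, sofoto):
    """Return a function that soundfolds a word with SOFOFROM and SOFOTO."""
    if len(sofofrom) != len(sofoto):
        raise ValueError("SOFOFROM and SOFOTO must have the same number "
                         "of characters")
    table = dict(zip(sofofrom, sofoto))

    def soundfold(word):
        result = []
        for ch in word:
            # White space becomes one space, other characters are mapped.
            mapped = " " if ch.isspace() else table.get(ch)
            if mapped is None:
                continue            # not in SOFOFROM: left out
            if result and result[-1] == mapped:
                continue            # repeated character: keep only one
            result.append(mapped)
        return "".join(result)

    return soundfold
```

With the SOFOFROM and SOFOTO lines of the example, "Hello  World" is folded to
"hele verlt".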

You can use the |soundfold()| function to try out the results. Or set the
'verbose' option to see the score in the output of the |z?| command.


 vim:tw=78:sw=4:ts=8:ft=help:norl: