*spell.txt*	For Vim version 7.0aa.  Last change: 2005 Aug 29


		  VIM REFERENCE MANUAL	  by Bram Moolenaar


Spell checking						*spell*

1. Quick start			|spell-quickstart|
2. Remarks on spell checking	|spell-remarks|
3. Generating a spell file	|spell-mkspell|
4. Spell file format		|spell-file-format|

{Vi does not have any of these commands}

Spell checking is not available when the |+syntax| feature has been disabled
at compile time.

==============================================================================
1. Quick start						*spell-quickstart*

This command switches on spell checking: >

	:setlocal spell spelllang=en_us

This switches on the 'spell' option and specifies to check for US English.
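
A few variations, as a sketch.  The language names are only examples and a
matching spell file is assumed to be installed for each: >

	" Check British English in the current buffer.
	:setlocal spell spelllang=en_gb
	" Accept words from both the US English and the Dutch word lists.
	:setlocal spell spelllang=en_us,nl
	" Switch spell checking off again for this buffer.
	:setlocal nospell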

The words that are not recognized are highlighted with one of these:
	SpellBad	word not recognized			|hl-SpellBad|
	SpellCap	word not capitalised			|hl-SpellCap|
	SpellRare	rare word				|hl-SpellRare|
	SpellLocal	wrong spelling for selected region	|hl-SpellLocal|

Vim only checks the spelling of words; there is no grammar check.

If the 'mousemodel' option is set to "popup" and the cursor is on a badly
spelled word or it is "popup_setpos" and the mouse pointer is on a badly
spelled word, then the popup menu will contain a submenu to replace the bad
word.  Note: this slows down the appearance of the popup menu.

To search for the next misspelled word:

							*]s* *E756*
]s			Move to next misspelled word after the cursor.
			A count before the command can be used to repeat.
			'wrapscan' applies.

							*[s*
[s			Like "]s" but search backwards, find the misspelled
			word before the cursor.  Doesn't recognize words
			split over two lines, thus may stop at words that are
			not highlighted as bad.  Does not stop at a word with
			a missing capital at the start of a line.

							*]S*
]S			Like "]s" but only stop at bad words, not at rare
			words or words for another region.

							*[S*
[S			Like "]S" but search backwards.


To add words to your own word list:			*E764*

							*zg*
zg			Add word under the cursor as a good word to the first
			name in 'spellfile'.  In Visual mode the selected
			characters are added as a word (including white
			space!).  If the word is explicitly marked as a bad
			word in another spell file the result is unpredictable.
			A count may precede the command to indicate the entry
			in 'spellfile' to be used.  A count of two uses the
			second entry.

							*zG*
zG			Like "zg" but add the word to the internal word list
			|internal-wordlist|.

							*zw*
zw			Like "zg" but mark the word as a wrong (bad) word.

							*zW*
zW			Like "zw" but add the word to the internal word list
			|internal-wordlist|.

							*:spe* *:spellgood*
:[count]spe[llgood] {word}
			Add {word} as a good word to 'spellfile', like with
			"zg".  Without count the first name is used, with a
			count of two the second entry, etc.

:spe[llgood]! {word}	Add {word} as a good word to the internal word list,
			like with "zG".

							*:spellw* *:spellwrong*
:[count]spellw[rong] {word}
			Add {word} as a wrong (bad) word to 'spellfile', as
			with "zw".  Without count the first name is used, with
			a count of two the second entry, etc.

:spellw[rong]! {word}	Add {word} as a wrong (bad) word to the internal word
			list, like with "zW".
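
A minimal sketch of how these commands fit together.  It assumes 'encoding'
is "latin1", that the "~/.vim/spell" directory exists, and that the file name
(made up for this example) is acceptable as a 'spellfile' entry: >

	:setlocal spellfile=~/.vim/spell/en.latin1.add
	:spellgood Moolenaar
	:spellwrong teh
	:spellgood! tmpword
<
The ":spellgood" and ":spellwrong" commands without "!" change the first file
named in 'spellfile'; the last command only changes the internal word list.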

After adding a word to 'spellfile' with the above commands its associated
".spl" file will automatically be updated and reloaded.  If you change a file
in 'spellfile' manually you need to use the |:mkspell| command.  This sequence
of commands mostly works well: >
	:edit <file in 'spellfile'>
<	(make changes to the spell file) >
	:mkspell! %

More details about the 'spellfile' format below |spell-wordlist-format|.

							*internal-wordlist*
The internal word list is used for all buffers where 'spell' is set.  It is
not stored, it is lost when you exit Vim.  It is also cleared when 'encoding'
is set.


Finding suggestions for bad words:
							*z?*
z?			For the word under/after the cursor suggest correctly
			spelled words.  This also works to find alternatives
			for a word that is not highlighted as a bad word,
			e.g., when the word after it is bad.
			The results are sorted on similarity to the word
			under/after the cursor.
			This may take a long time.  Hit CTRL-C when you get
			bored.

			If the command is used without a count the
			alternatives are listed and you can enter the number
			of your choice or press <Enter> if you don't want to
			replace.  You can also use the mouse to click on your
			choice (only works if the mouse can be used in Normal
			mode and when there are no line wraps).  Click on the
			first line (the header) to cancel.

			If a count is used that suggestion is used, without
			prompting.  For example, "1z?" always takes the first
			suggestion.

			If 'verbose' is non-zero a score will be displayed
			with the suggestions to indicate how similar each one
			is to the badly spelled word (the higher the score
			the more different).
			When a word was replaced the redo command "." will
			repeat the word replacement.  This works like "ciw",
			the good word and <Esc>.  This does NOT work for Thai
			and other languages without spaces between words.

					*:spellr* *:spellrepall* *E752* *E753*
:spellr[epall]		Repeat the replacement done by |z?| for all matches
			with the replaced word in the current window.
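
For example, to fix every occurrence of one misspelled word in the window,
place the cursor on an occurrence and type the following.  This is only a
sketch: whether the first suggestion is the wanted one depends on the word
lists in use: >

	1z?
	:spellrepall
<
Here "1z?" takes the first suggestion without prompting and ":spellrepall"
repeats that replacement for the other matches.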

In Insert mode, when the cursor is after a badly spelled word, you can use
CTRL-X s to find suggestions.  This works like Insert mode completion.  Use
CTRL-N to use the next suggestion, CTRL-P to go back. |i_CTRL-X_s|

The 'spellsuggest' option influences how the list of suggestions is generated
and sorted.  See |'spellsuggest'|.
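
For instance, to select the "double" scoring described at
|spell-double-scoring| and then check the current value (the other accepted
values are listed at |'spellsuggest'|): >

	:set spellsuggest=double
	:set spellsuggest?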

The 'spellcapcheck' option is used to check that the first word of a sentence
starts with a capital.  This doesn't work for the first word in the file.
When there is a line break right after a sentence the highlighting of the next
line may be postponed.  Use |CTRL-L| when needed.  Also see |set-spc-auto| for
how it can be set automatically when 'spelllang' is set.

==============================================================================
2. Remarks on spell checking				*spell-remarks*

PERFORMANCE

Vim does on-the-fly spell checking.  To make this work fast the word list is
loaded in memory.  Thus this uses a lot of memory (1 Mbyte or more).  There
might also be a noticeable delay when the word list is loaded, which happens
when 'spell' is set and when 'spelllang' is set while 'spell' was already set.
To minimize the delay each word list is only loaded once, it is not deleted
when 'spelllang' is made empty or 'spell' is reset.  When 'encoding' is set
all the word lists are reloaded, thus you may notice a delay then too.


REGIONS

A word may be spelled differently in various regions.  For example, English
comes in (at least) these variants:

	en		all regions
	en_au		Australia
	en_ca		Canada
	en_gb		Great Britain
	en_nz		New Zealand
	en_us		USA

Words that are not used in one region but are used in another region are
highlighted with SpellLocal |hl-SpellLocal|.

Always use lowercase letters for the language and region names.

When adding a word with |zg| or another command it's always added for all
regions.  You can change that by manually editing the 'spellfile'.  See
|spell-wordlist-format|.  Note that the regions as specified in the files in
'spellfile' are only used when all entries in 'spelllang' specify the same
region (not counting files specified by their .spl name).
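
As an illustration, assuming an English spell file that contains these
regions is installed: >

	:setlocal spell spelllang=en_gb
<
With this setting a British spelling such as "colour" should be accepted,
while a spelling that is only listed for another region, such as "color", is
expected to be highlighted with SpellLocal.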

							*spell-german*
Specific exception: For German these special regions are used:
	de		all German words accepted
	de_de		old and new spelling
	de_19		old spelling
	de_20		new spelling
	de_at		Austria
	de_ch		Switzerland

							*spell-yiddish*
Yiddish requires using "utf-8" encoding, because of the special characters
used.  If you are using latin1 Vim will use transliterated (romanized) Yiddish
instead.  If you want to use transliterated Yiddish with utf-8 use "yi-tr".
In a table:
	'encoding'	'spelllang'
	utf-8		yi		Yiddish
	latin1		yi		transliterated Yiddish
	utf-8		yi-tr		transliterated Yiddish


SPELL FILES						*spell-load*

Vim searches for spell files in the "spell" subdirectory of the directories in
'runtimepath'.  The name is: LL.EEE.spl, where:
	LL	the language name
	EEE	the value of 'encoding'

The value for "LL" comes from 'spelllang', but excludes the region name.
Examples:
	'spelllang'	LL ~
	en_us		en
	en-rare		en-rare
	medical_ca	medical

Only the first file is loaded, the one that is first in 'runtimepath'.  If
this succeeds then additionally files with the name LL.EEE.add.spl are loaded.
All the ones that are found are used.

Additionally, the files related to the names in 'spellfile' are loaded.  These
are the files that |zg| and |zw| add good and wrong words to.

Exceptions:
- Vim uses "latin1" when 'encoding' is "iso-8859-15".  The euro sign doesn't
  matter for spelling.
- When no spell file for 'encoding' is found "ascii" is tried.  This only
  works for languages where nearly all words are ASCII, such as English.  It
  helps when 'encoding' is not "latin1", such as iso-8859-2, and English text
  is being edited.  For the ".add" files the same name as the found main
  spell file is used.

For example, with these values:
	'runtimepath' is "~/.vim,/usr/share/vim70,~/.vim/after"
	'encoding' is "iso-8859-2"
	'spelllang' is "pl"

Vim will look for:
1. ~/.vim/spell/pl.iso-8859-2.spl
2. /usr/share/vim70/spell/pl.iso-8859-2.spl
3. ~/.vim/spell/pl.iso-8859-2.add.spl
4. /usr/share/vim70/spell/pl.iso-8859-2.add.spl
5. ~/.vim/after/spell/pl.iso-8859-2.add.spl

This assumes 1. is not found and 2. is found.

If 'encoding' is "latin1" Vim will look for:
1. ~/.vim/spell/pl.latin1.spl
2. /usr/share/vim70/spell/pl.latin1.spl
3. ~/.vim/after/spell/pl.latin1.spl
4. ~/.vim/spell/pl.ascii.spl
5. /usr/share/vim70/spell/pl.ascii.spl
6. ~/.vim/after/spell/pl.ascii.spl

This assumes none of them are found (Polish doesn't make sense when leaving
out the non-ASCII characters).
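
To see which spell files are available for a language, the search can be
approximated from the command line.  This is only a sketch: it lists every
match in 'runtimepath', including ".add.spl" files, while Vim itself stops at
the first main .spl file it finds.  "pl" is simply the language used in the
example above: >

	:echo globpath(&runtimepath, "spell/pl.*.spl")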

Spelling for EBCDIC is currently not supported.

A spell file might not be available in the current 'encoding'.  See
|spell-mkspell| about how to create a spell file.  Converting a spell file
with "iconv" will NOT work!

							*E758* *E759*
When loading a spell file Vim checks that it is properly formatted.  If you
get an error the file may be truncated, modified or intended for another Vim
version.


WORDS

Vim uses a fixed method to recognize a word.  This is independent of
'iskeyword', so that it also works in help files and for languages that
include characters like '-' in 'iskeyword'.  The word characters do depend on
'encoding'.

The table with word characters is stored in the main .spl file.  Therefore it
matters what the current locale is when generating it!  A .add.spl file does
not contain a word table though.

A word that starts with a digit is always ignored.  That includes hex numbers
in the form 0xff and 0XFF.


WORD COMBINATIONS

It is possible to spell-check words that include a space.  This is used to
recognize words that are invalid when used by themselves, e.g. for "et al.".
It can also be used to recognize "the the" and highlight it.

The number of spaces is irrelevant.  In most cases a line break may also
appear.  However, this makes it difficult to find out where to start checking
for spelling mistakes.  When you make a change to one line and only that line
is redrawn Vim won't look in the previous line, thus when "et" is at the end
of the previous line "al." will be flagged as an error.  And when you type
"the<CR>the" the highlighting doesn't appear until the first line is redrawn.
Use |CTRL-L| to redraw right away.  "[s" will also stop at a word combination
with a line break.

When encountering a line break Vim skips characters such as '*', '>' and '"',
so that comments in C, shell and Vim code can be spell checked.


SYNTAX HIGHLIGHTING					*spell-syntax*

Files that use syntax highlighting can specify where spell checking should be
done:

1.  everywhere			   default
2.  in specific items		   use "contains=@Spell"
3.  everywhere but specific items  use "contains=@NoSpell"

For the second method adding the @NoSpell cluster will disable spell checking
again.  This can be used, for example, to add @Spell to the comments of a
program, and add @NoSpell for items that shouldn't be checked.
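
A minimal sketch of the second method for a syntax file.  The group names
"myComment" and "myURL" are made up for this example: >

	" URLs inside comments should not be checked.
	:syntax match myURL "https\=://\S\+" contained contains=@NoSpell
	" C-style comments are checked, except for the URLs they contain.
	:syntax region myComment start="/\*" end="\*/" contains=@Spell,myURL
<
With these items spell checking should only happen inside the comments, and
the URLs in them should be skipped.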


VIM SCRIPTS

If you want to write a Vim script that does something with spelling, you may
find these functions useful:

    spellbadword()	find badly spelled word at the cursor
    spellsuggest()	get list of spelling suggestions
    soundfold()		get the sound-a-like version of a word
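
A small sketch that combines these functions.  The function name
ShowSuggestions is made up, and the type() check is an assumption that covers
both possibilities: depending on the Vim version spellbadword() may return a
List or a plain string: >

	function! ShowSuggestions()
	  " Find the badly spelled word at or after the cursor.
	  let result = spellbadword()
	  let word = type(result) == type([]) ? result[0] : result
	  if word == ''
	    echo 'No badly spelled word found'
	    return
	  endif
	  echo 'Bad word:    ' . word
	  echo 'Sounds like: ' . soundfold(word)
	  " Ask for at most five suggestions.
	  echo 'Suggestions: ' . join(spellsuggest(word, 5), ', ')
	endfunction
<
Use ":call ShowSuggestions()" in a buffer where 'spell' is set.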


SETTING 'spellcapcheck' AUTOMATICALLY			*set-spc-auto*

After the 'spelllang' option has been set successfully, Vim will source the
files "spell/LANG.vim" in 'runtimepath'.  "LANG" is the value of 'spelllang'
up to the first comma, dot or underscore.  This can be used to set options
specifically for the language, especially 'spellcapcheck'.

The distribution includes a few of these files.  Use this command to see what
they do: >
	:next $VIMRUNTIME/spell/*.vim

Note that the default scripts don't set 'spellcapcheck' if it was changed from
the default value.  This assumes the user prefers another value then.
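
A sketch of what such a file could contain, for a hypothetical language "xx"
in which sentences do not start with a capital.  The file would be stored as
"spell/xx.vim" in one of the 'runtimepath' directories: >

	" spell/xx.vim: sourced after 'spelllang' has been set to "xx".
	" An empty value switches the check for a capital off.
	setlocal spellcapcheck=
<
Unlike the distributed scripts, this sketch does not test whether the user
already changed the option.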


DOUBLE SCORING						*spell-double-scoring*

The 'spellsuggest' option can be used to select "double" scoring.  This
mechanism is based on the principle that there are two kinds of spelling
mistakes:

1. You know how to spell the word, but mistype something.  This results in a
   small editing distance (character swapped/omitted/inserted) and possibly a
   word that sounds completely different.

2. You don't know how to spell the word and type something that sounds right.
   The edit distance can be big but the word is similar after sound-folding.

Since scores for these two mistakes will be very different we use a list
for each and mix them.

The sound-folding is slow and people that know the language won't make the
second kind of mistakes.  Therefore 'spellsuggest' can be set to select the
preferred method for scoring the suggestions.

==============================================================================
3. Generating a spell file				*spell-mkspell*

Vim uses a binary file format for spelling.  This greatly speeds up loading
the word list and keeps it small.
						    *.aff* *.dic* *Myspell*
You can create a Vim spell file from the .aff and .dic files that Myspell
uses.  Myspell is used by OpenOffice.org and Mozilla.  You should be able to
find them here:
	http://lingucomponent.openoffice.org/spell_dic.html
You can also use a plain word list.  The results are the same, the choice
depends on what word lists you can find.

If you install Aap (from www.a-a-p.org) you can use the recipes in the
runtime/spell/??/ directories.  Aap will take care of downloading the files,
applying the patches needed for Vim and building the .spl file.

Make sure your current locale is set properly, otherwise Vim doesn't know what
characters are upper/lower case letters.  If the locale isn't available (e.g.,
when using an MS-Windows codepage on Unix) add tables to the .aff file
|spell-affix-chars|.  If the .aff file doesn't define a table then the word
table of the currently active spelling is used.  If spelling is not active
then Vim will try to guess.

							*:mksp* *:mkspell*
:mksp[ell][!] [-ascii] {outname} {inname} ...
			Generate a Vim spell file from word lists.  Example: >
		:mkspell /tmp/nl nl_NL.words
<								*E751*
			When {outname} ends in ".spl" it is used as the output
			file name.  Otherwise it should be a language name,
			such as "en", without the region name.  The file
			written will be "{outname}.{encoding}.spl", where
			{encoding} is the value of the 'encoding' option.

			When the output file already exists [!] must be used
			to overwrite it.

			When the [-ascii] argument is present, words with
			non-ascii characters are skipped.  The resulting file
			ends in "ascii.spl".

			The input can be the Myspell format files {inname}.aff
			and {inname}.dic.  If {inname}.aff does not exist then
			{inname} is used as the file name of a plain word
			list.

			Multiple {inname} arguments can be given to combine
			regions into one Vim spell file.  Example: >
		:mkspell ~/.vim/spell/en /tmp/en_US /tmp/en_CA /tmp/en_AU
<			This combines the English word lists for US, CA and AU
			into one en.spl file.
			Up to eight regions can be combined. *E754* *E755*
			The REP and SAL items of the first .aff file where
			they appear are used. |spell-REP| |spell-SAL|

			This command uses a lot of memory, required to find
			the optimal word tree (Polish, Italian and Hungarian
			require several hundred Mbyte).  The final result will
			be much smaller, because compression is used.  To
			avoid running out of memory compression will be done
			now and then.  This can be tuned with the 'mkspellmem'
			option.

			After the spell file was written and it was being used
			in a buffer it will be reloaded automatically.

:mksp[ell] [-ascii] {name}.{enc}.add
			Like ":mkspell" above, using {name}.{enc}.add as the
			input file and producing an output file in the same
			directory that has ".spl" appended.

:mksp[ell] [-ascii] {name}
			Like ":mkspell" above, using {name} as the input file
			and producing an output file in the same directory
			that has ".{enc}.spl" appended.
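
For example, after editing an addition word list by hand (the file name is
only an illustration and assumes 'encoding' is "latin1"): >

	:mkspell! ~/.vim/spell/en.latin1.add
<
This should write "~/.vim/spell/en.latin1.add.spl" in the same directory; the
"!" allows overwriting an existing file.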

Vim will report the number of duplicate words.  This might be a mistake in the
list of words.  But sometimes it is used to have different prefixes and
suffixes for the same basic word to avoid them combining (e.g. Czech uses
this).  If you want Vim to report all duplicate words set the 'verbose'
option.

Since you might want to change a Myspell word list for use with Vim the
following procedure is recommended:

1. Obtain the xx_YY.aff and xx_YY.dic files from Myspell.
2. Make a copy of these files to xx_YY.orig.aff and xx_YY.orig.dic.
3. Change the xx_YY.aff and xx_YY.dic files to remove bad words, add missing
   words, define word characters with FOL/LOW/UPP, etc.  The distributed
   "src/spell/*.diff" files can be used.
4. Start Vim with the right locale and use |:mkspell| to generate the Vim
   spell file.
5. Try out the spell file with ":set spell spelllang=xx" if you wrote it in
   a spell directory in 'runtimepath', or ":set spelllang=xx.enc.spl" if you
   wrote it somewhere else.

When the Myspell files are updated you can merge the differences:
1. Obtain the new Myspell files as xx_YY.new.aff and xx_YY.new.dic.
2. Use Vimdiff to see what changed: >
	vimdiff xx_YY.orig.dic xx_YY.new.dic
3. Take over the changes you like in xx_YY.dic.
   You may also need to change xx_YY.aff.
4. Rename xx_YY.new.dic to xx_YY.orig.dic and xx_YY.new.aff to xx_YY.orig.aff.


SPELL FILE VERSIONS					*E770* *E771* *E772*

Spell checking is a relatively new feature in Vim, thus it's possible that the
.spl file format will be changed to support more languages.  Vim will check
the validity of the spell file and report anything wrong.

	E771: Old spell file, needs to be updated ~
This spell file is older than your Vim.  You need to update the .spl file.

	E772: Spell file is for newer version of Vim ~
This means the spell file was made for a later version of Vim.  You need to
update Vim.

	E770: Unsupported section in spell file ~
This means the spell file was made for a later version of Vim and contains a
section that is required for the spell file to work.  In this case it's
probably a good idea to upgrade your Vim.


SPELL FILE DUMP

If for some reason you want to check what words are supported by the currently
used spelling files, use this command:

							*:spelldump* *:spelld*
:spelld[ump]		Open a new window and fill it with all currently valid
			words.  Compound words are not included.
			Note: For some languages the result may be enormous,
			causing Vim to run out of memory.

The dump is written in the word list format |spell-wordlist-format|.  You
should be able to read it with ":mkspell" to generate one .spl file that
includes all the words.
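
A sketch of that round trip.  The file names under /tmp are arbitrary, and
the resulting file is expected to be named "/tmp/all.{encoding}.spl": >

	:spelldump
	:write /tmp/all.words
	:mkspell /tmp/all /tmp/all.words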

When all entries to 'spelllang' use the same regions or no regions at all then
the region information is included in the dumped words.  Otherwise only words
for the current region are included and no "/regions" line is generated.

Comment lines with the name of the .spl file are used as a header above the
words that were generated from that .spl file.

==============================================================================
4. Spell file format                                    *spell-file-format*

This is the format of the files that are used by the person who creates and
maintains a word list.

Note that we avoid the word "dictionary" here.  That is because the goal of
spell checking differs from writing a dictionary (as in the book).  For
spelling we need a list of words that are OK, thus should not be highlighted.
Person and company names will not appear in a dictionary, but do appear in a
word list.  And some old words are rarely used while they are common
misspellings.  These do appear in a dictionary but not in a word list.

There are two formats: A straight list of words and a list using affix
compression.  The files with affix compression are used by Myspell (Mozilla
and OpenOffice.org).  This requires two files, one with .aff and one with .dic
extension.


FORMAT OF STRAIGHT WORD LIST                            *spell-wordlist-format*

The words must appear one per line.  That is all that is required.

Additionally the following items are recognized:

- Empty and blank lines are ignored.

- Lines starting with a # are ignored (comment lines).

- A line starting with "/encoding=", before any word, specifies the encoding
  of the file.  After the '=' comes an encoding name.  This tells Vim to set
  up conversion from the specified encoding to 'encoding'.  Thus you can use
  one word list for several target encodings.

- A line starting with "/regions=" specifies the region names that are
  supported.  Each region name must be two ASCII letters.  The first one is
  region 1.  Thus "/regions=usca" has region 1 "us" and region 2 "ca".
  In an additional word list the region names should be equal to those of
  the main word list!

- Other lines starting with '/' are reserved for future use.  The ones that
  are not recognized are ignored (but you do get a warning message).

- A "/" may follow the word with the following items:
    =           Case must match exactly.
    ?           Rare word.
    !           Bad (wrong) word.
    digit       A region in which the word is valid.  If no regions are
                specified the word is valid in all regions.

Example:

        # This is an example word list          comment
        /encoding=latin1                        encoding of the file
        /regions=uscagb                         regions "us", "ca" and "gb"
        example                                 word for all regions
        blah/12                                 word for regions "us" and "ca"
        vim/!                                   bad word
        Campbell/?3                             rare word in region 3 "gb"
        's mornings/=                           keep-case word

Note that when "/=" is used the same word with all upper-case letters is not
accepted.  This is different from a word with mixed case that is automatically
marked as keep-case; those words may appear in all upper-case letters.
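
The rules above can be sketched as a small reader for a straight word list.
This is an illustration in Python, not Vim's implementation: the function
name parse_wordlist() and the returned dictionary keys are made up for this
example, and it does not check that "/encoding=" comes before any word.

```python
def parse_wordlist(lines):
    """Parse the lines of a straight word list (illustrative sketch).

    Returns (encoding, regions, entries); each entry is a dict.
    """
    encoding = None
    regions = []
    entries = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue                    # blank and comment lines are ignored
        if line.startswith("/encoding="):
            encoding = line[len("/encoding="):].strip()
            continue
        if line.startswith("/regions="):
            names = line[len("/regions="):].strip()
            # Two ASCII letters per region; the first pair is region 1.
            regions = [names[i:i + 2] for i in range(0, len(names), 2)]
            continue
        if line.startswith("/"):
            continue                    # reserved; Vim warns, this sketch skips
        word, _, items = line.partition("/")
        entries.append({
            "word": word,
            "keep_case": "=" in items,
            "rare": "?" in items,
            "bad": "!" in items,
            # An empty list means "valid in all regions".
            "regions": [int(c) for c in items if c in "0123456789"],
        })
    return encoding, regions, entries
```

Feeding it the example list (without the explanatory right-hand column)
gives encoding "latin1", regions "us", "ca" and "gb", and one entry per word.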


FORMAT WITH AFFIX COMPRESSION

There are two files: the basic word list and an affix file.  The affixes are
used to modify the basic words to get the full word list.  This significantly
reduces the number of words, especially for a language like Polish.  This is
called affix compression.

The basic word list and the affix file are combined and turned into a binary
spell file.  All the preprocessing has been done, thus this file loads fast.
The binary spell file format is described in the source code (src/spell.c),
but only developers need to know about it.

The preprocessing also allows us to take the Myspell language files and modify
them before the Vim word list is made.  The tools for this can be found in the
"src/spell" directory.

The format for the affix and word list files is based on what Myspell uses
(the spell checker of Mozilla and OpenOffice.org).  A description can be found
here:
        http://lingucomponent.openoffice.org/affix.readme ~
Note that affixes are case sensitive; this isn't obvious from the description.

Vim does not use the TRY item, it is ignored.  For making suggestions the
possible characters in the words are used.

Vim supports quite a few extras.  They are described below |spell-affix-vim|.
Attempts have been made to keep this compatible with other spell checkers, so
that the same files can be used.


WORD LIST FORMAT                                        *spell-dic-format*

A very short example, with line numbers:

        1       1234
        2       aan
        3       Als
        4       Etten-Leur
        5       et al.
        6       's-Gravenhage
        7       's-Gravenhaags
        8       bedel/P
        9       kado/1
        10      cadeau/2
        11      TCP,IP

The first line contains the number of words.  Vim ignores it, but you do get
an error message if it's not there.  *E760*

What follows is one word per line.  There should be no white space before or
after the word.  After the word there is an optional slash and flags.  Most of
these flags are letters that indicate the affixes that can be used with this
word.  These are specified with SFX and PFX lines in the .aff file.  See the
Myspell documentation.  Vim allows using other flag types with the FLAG item
in the affix file |spell-FLAG|.

When the word only has lower-case letters it will also match with the word
starting with an upper-case letter.

When the word includes an upper-case letter, this means the upper-case letter
is required at this position.  The same word with a lower-case letter at this
position will not match.  When some of the other letters are upper-case it will
not match either.

The word with all upper-case characters will always be OK.

        word list       matches         does not match ~
        als             als Als ALS     ALs AlS aLs aLS
        Als             Als ALS         als ALs AlS aLs aLS
        ALS             ALS             als Als ALs AlS aLs aLS
        AlS             AlS ALS         als Als ALs aLs aLS

The KEP affix ID can be used to specifically match a word with identical case
only, see below |spell-KEP|.
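
The case rules in the table can be written down as a short predicate.  This
is only a sketch of the rules as described here, in Python; case_matches() is
a hypothetical name, keep-case (KEP) words are not handled, and Python's own
lower() and upper() stand in for Vim's character tables.

```python
def case_matches(entry, text):
    """Return True when `text` is an accepted casing of word list `entry`."""
    if entry.lower() != text.lower():
        return False            # not the same word at all
    if text == entry:
        return True             # identical case always matches
    if text == entry.upper():
        return True             # all upper-case is always OK
    if entry == entry.lower():
        # An all lower-case entry also matches with a capitalised first letter.
        return text == entry[:1].upper() + entry[1:]
    return False                # required upper-case letters are missing
```

For example, case_matches("als", "Als") is accepted while
case_matches("Als", "als") is not, as in the table.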

Note in lines 5 to 7 that non-word characters are used.  You can include
any character in a word.  When checking the text a word still only matches
when it appears with a non-word character before and after it.  For Myspell a
word starting with a non-word character probably won't work.

In line 11 the word "TCP/IP" is defined.  Since the slash has a special
meaning the comma is used instead.  This is defined with the SLASH item in the
affix file, see |spell-SLASH|.  Note that without this SLASH item the
word will be "TCP,IP".

                                                        *spell-affix-vim*
A flag that Vim adds and is not in Myspell is the flag defined with KEP in the
affix file.  This has the meaning that case matters.  This can be used if the
word does not have the first letter in upper case at the start of a sentence.
Example (assuming that = was used for KEP):

        word list       matches                 does not match ~
        's morgens/=    's morgens              'S morgens 's Morgens 'S MORGENS
        's Morgens      's Morgens 'S MORGENS   'S morgens 's morgens

The flag can also be used to prevent the word from matching when it is in all
upper-case letters.

                                                        *spell-affix-mbyte*
The basic word list is normally in an 8-bit encoding, which is mentioned in
the affix file.  The affix file must always be in the same encoding as the
word list.  This is compatible with Myspell.  For Vim the encoding may also be
something else, any encoding that "iconv" supports.  The "SET" line must
specify the name of the encoding.  When using a multi-byte encoding it's
possible to use more different affixes (but Myspell doesn't support that, thus
you may not want to use it anyway).


CHARACTER TABLES
                                                        *spell-affix-chars*
When using an 8-bit encoding (as specified with SET) the affix file should
define what characters are word characters.  This is because the system where
":mkspell" is used may not support a locale with this encoding and isalpha()
won't work.  For example when using "cp1250" on Unix.

                                        *E761* *E762* *spell-FOL*
                                        *spell-LOW* *spell-UPP*
Three lines in the affix file are needed.  Simplistic example:

        FOL  áëñ ~
        LOW  áëñ ~
        UPP  ÁËÑ ~

All three lines must have exactly the same number of characters.

The "FOL" line specifies the case-folded characters.  These are used to
compare words while ignoring case.  For most encodings this is identical to
the lower case line.

The "LOW" line specifies the characters in lower-case.  Mostly it's equal to
the "FOL" line.

The "UPP" line specifies the characters with upper-case.  That is, a character
is upper-case where it's different from the character at the same position in
"FOL".

An exception is made for the German sharp s ß.  The upper-case version is
"SS".  In the FOL/LOW/UPP lines it should be included, so that it's recognized
as a word character, but use the ß character in all three.

ASCII characters should be omitted, Vim always handles these in the same way.
When the encoding is UTF-8 no word characters need to be specified.
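
How the three lines relate can be illustrated with a few lines of Python.
This is a sketch under assumptions, not what ":mkspell" does internally: the
names build_case_tables() and fold_word() are invented here, and ASCII
letters are simply folded with Python's lower().

```python
def build_case_tables(fol, low, upp):
    """Build folding data from the arguments of the FOL, LOW and UPP lines."""
    if not (len(fol) == len(low) == len(upp)):
        raise ValueError("FOL, LOW and UPP must have the same length")
    to_fold = {}        # character -> case-folded character
    is_upper = set()    # characters that count as upper-case
    for f, l, u in zip(fol, low, upp):
        to_fold[l] = f
        to_fold[u] = f
        if u != f:
            # Upper-case means: differs from the same position in FOL.
            is_upper.add(u)
    return to_fold, is_upper


def fold_word(word, to_fold):
    """Case-fold a word; ASCII is handled without consulting the table."""
    return "".join(
        ch.lower() if ch.isascii() else to_fold.get(ch, ch) for ch in word
    )
```

With the example lines, "Ñ" is recorded as upper-case and "NIÑA" folds to
"niña".  A line with "ß" in all three positions adds it as a word character
without marking it as upper-case.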

                                                        *E763*
Vim allows you to use spell checking for several languages in the same file.
You can list them in the 'spelllang' option.  As a consequence all spell files
for the same encoding must use the same word characters, otherwise they can't
be combined without errors.  If you get a warning that the word tables differ
you may need to generate the .spl file again with |:mkspell|.  Check the FOL,
LOW and UPP lines in the .aff file that was used.

The XX.ascii.spl spell file generated with the "-ascii" argument will not
contain the table with characters, so that it can be combined with spell files
for any encoding.  The .add.spl files also do not contain the table.


MID-WORD CHARACTERS
                                                        *spell-midword*
Some characters are only to be considered word characters if they are used in
between two ordinary word characters.  An example is the single quote: It is
often used to put text in quotes, thus it can't be recognized as a word
character, but when it appears in between word characters it must be part of
the word.  This is needed to detect a spelling error such as they'are.  That
should be they're, but since "they" and "are" are words themselves that would
go unnoticed.

These characters are defined with MIDWORD in the .aff file:

        MIDWORD '- ~
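
The effect of MIDWORD on splitting text into words can be sketched as
follows.  This is an illustration only: split_words() is a made-up name and
Python's str.isalpha() is assumed as a stand-in for the word characters Vim
actually uses.

```python
def split_words(text, midword="'-"):
    """Split text into words, honouring mid-word characters."""
    words = []
    current = []
    for i, ch in enumerate(text):
        if ch.isalpha():
            current.append(ch)
        elif (ch in midword and current
              and i + 1 < len(text) and text[i + 1].isalpha()):
            # A mid-word character only counts when it sits between two
            # ordinary word characters.
            current.append(ch)
        else:
            if current:
                words.append("".join(current))
                current = []
    if current:
        words.append("".join(current))
    return words
```

With this, "they'are" stays one word (and can then be flagged as bad), while
the quotes around 'quoted' are not part of the word.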


FLAG TYPES                                              *spell-FLAG*

Flags are used to specify the affixes that can be used with a word and for
other properties of the word.  Normally single-character flags are used.  This
limits the number of possible flags, especially for 8-bit encodings.  The FLAG
item can be used if more affixes are to be used.  Possible values:

        FLAG long       use two-character flags
        FLAG num        use numbers, from 1 up to 65000
        FLAG caplong    use one-character flags without A-Z and two-character
                        flags that start with A-Z

With "FLAG num" the numbers in a list of affixes need to be separated with a
comma: "234,2143,1435".  This method is inefficient, but useful if the file is
generated with a program.

When using "caplong" the two-character flags all start with a capital: "Aa",
"B1", "BB", etc.  This makes it possible to use one-character flags for the
most common items and two-character flags for uncommon items.

Note: When using utf-8 only characters up to 65000 may be used for flags.
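
The flag types differ only in how the text after the slash is cut into
separate flags.  A minimal sketch in Python, where split_flags() is a
hypothetical helper and flag_type is the argument of the FLAG item (None when
there is no FLAG item):

```python
def split_flags(flags, flag_type=None):
    """Cut the flag text of a .dic entry into a list of flag strings."""
    if flag_type is None:
        return list(flags)                      # one character per flag
    if flag_type == "long":
        if len(flags) % 2:
            raise ValueError("odd number of characters for FLAG long")
        return [flags[i:i + 2] for i in range(0, len(flags), 2)]
    if flag_type == "num":
        numbers = [n for n in flags.split(",") if n]
        for n in numbers:
            if not 1 <= int(n) <= 65000:
                raise ValueError("flag number out of range: " + n)
        return numbers
    if flag_type == "caplong":
        result = []
        i = 0
        while i < len(flags):
            # A capital A-Z starts a two-character flag.
            size = 2 if "A" <= flags[i] <= "Z" else 1
            if i + size > len(flags):
                raise ValueError("incomplete two-character flag")
            result.append(flags[i:i + size])
            i += size
        return result
    raise ValueError("unknown FLAG type: " + flag_type)
```

For example, with "caplong" the text "aAaB1bBB" holds the flags "a", "Aa",
"B1", "b" and "BB".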


AFFIXES
                                                *spell-PFX* *spell-SFX*
The usual PFX (prefix) and SFX (suffix) lines are supported (see the Myspell
documentation or the Aspell manual:
http://aspell.net/man-html/Affix-Compression.html).

Note that Myspell ignores any extra text after the relevant info.  Vim
requires this text to start with a "#" so that mistakes don't go unnoticed.
Example:

        SFX F 0 in   [^i]n      # Spion > Spionin  ~
        SFX F 0 nen  in         # Bauerin > Bauerinnen ~
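
To make the fields of an SFX line concrete, here is a small sketch of
applying one suffix rule to a word.  It is an illustration in Python, not the
Myspell or Vim code; apply_suffix() is an invented name and the condition is
assumed to be usable as a Python regular expression anchored at the end of
the word.

```python
import re


def apply_suffix(word, strip, add, condition):
    """Apply one SFX rule; return the new word or None if it doesn't apply."""
    if not re.search("(?:" + condition + ")$", word):
        return None                     # condition at end of word not met
    if strip != "0":                    # "0" means: strip nothing
        if not word.endswith(strip):
            return None
        word = word[:-len(strip)]
    return word + ("" if add == "0" else add)
```

With the two example rules, "Spion" becomes "Spionin" through the first rule
and "Bauerin" becomes "Bauerinnen" through the second; the first rule does
not apply to "Bauerin" because of the "[^i]n" condition.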

Apparently Myspell allows an affix name to appear more than once.  Since this
might also be a mistake, Vim checks for an extra "S".  The affix files for
Myspell that use this feature apparently have this flag.  Example:

        SFX a Y 1 S ~
        SFX a 0 an . ~

        SFX a Y 2 S ~
        SFX a 0 en . ~
        SFX a 0 on . ~

                                                        *spell-affix-rare*
An extra item for Vim is the "rare" flag.  It must come after the other
fields, before a comment.  When used, all words that use the affix will be
marked as rare words.  Example:

        PFX F 0 nene . rare ~
        SFX F 0 oin n rare  # hardly ever used ~

However, if the word also appears as a good word in another way it won't be
marked as rare.

                                                        *spell-affix-nocomp*
Another extra item for Vim is the "nocomp" flag.  It must come after the other
fields, before a comment.  It can be either before or after "rare".  When
present, words that use the affix will not be part of a compound word.
Example:
        affix file:
                COMPOUNDFLAG c ~
                SFX a Y 2 ~
                SFX a 0 s   . ~
                SFX a 0 ize . nocomp ~
        dictionary:
                word/c ~
                util/ac ~

This allows for "wordutil" and "wordutils" but not "wordutilize".

                                                        *spell-PFXPOSTPONE*
When an affix file has very many prefixes that apply to many words it's not
possible to build the whole word list in memory.  This applies to Hebrew (a
list with all words is over a Gbyte).  In that case applying prefixes must be
postponed.  This makes spell checking slower.  It is indicated by this keyword
in the .aff file:

        PFXPOSTPONE ~

Only prefixes without a chop string can be postponed, prefixes with a chop
string will still be included in the word list.  An exception is made if the
chop string is one character and equal to the last character of the added
string, but in lower case.  That is the case when the chop string is used to
allow the following word to start with an upper case letter.


WORDS WITH A SLASH                                      *spell-SLASH*

The slash is used in the .dic file to separate the basic word from the affix
letters that can be used.  Unfortunately, this means you cannot use a slash in
a word.  Thus "TCP/IP" cannot be a word.  To work around that you can define a
replacement character for the slash.  Example:

        SLASH , ~

Now you can use "TCP,IP" to add the word "TCP/IP".

Of course, the letter used should itself not appear in any word!  The letter
must be ASCII, thus a single byte.
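
The substitution is easy to picture as a step in reading a .dic entry.  A
minimal sketch in Python; parse_dic_entry() and its slash_char parameter
(the argument of the SLASH item, or None when there is none) are names made
up for this example:

```python
def parse_dic_entry(line, slash_char=None):
    """Split a .dic entry into (word, flags), applying the SLASH character."""
    word, _, flags = line.partition("/")    # a real "/" always starts the flags
    if slash_char is not None:
        word = word.replace(slash_char, "/")
    return word, flags
```

With "SLASH ," the entry "TCP,IP" yields the word "TCP/IP"; without it the
word stays "TCP,IP".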


KEEP-CASE WORDS                                         *spell-KEP*

In the affix file a KEP line can be used to define the affix name used for
keep-case words.  Example:

        KEP = ~

See above for an example |spell-affix-vim|.


RARE WORDS                                              *spell-RAR*

In the affix file a RAR line can be used to define the affix name used for
rare words.  Example:

        RAR ? ~

Rare words are highlighted differently from bad words.  This is to be used for
words that are correct for the language, but are hardly ever used and could be
a typing mistake anyway.  When the same word is found as good it won't be
highlighted as rare.


BAD WORDS                                               *spell-BAD*

In the affix file a BAD line can be used to define the affix name used for
bad words.  Example:

        BAD ! ~

This can be used to exclude words that would otherwise be good.  For example
"the the" in the .dic file:

        the the/! ~

Once a word has been marked as bad it won't be undone by encountering the same
word as good.
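
Taken together, the RAR and BAD descriptions imply an order of precedence
when the same word is found more than once: bad wins over good, and good wins
over rare.  A tiny sketch of that rule in Python, with the status names and
combine_status() chosen for this example only:

```python
# Higher number wins when the same word is seen with several statuses.
PRIORITY = {"rare": 0, "good": 1, "bad": 2}


def combine_status(statuses):
    """Return the status that applies to a word seen with these statuses."""
    return max(statuses, key=PRIORITY.__getitem__)
```

So a word listed as both rare and good is treated as good, and a word listed
as both good and bad is treated as bad.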

                                                        *spell-NEEDAFFIX*
The NEEDAFFIX flag is used to require that a word is used with an affix.  The
word itself is not a good word.  Example:

        NEEDAFFIX + ~

                                                        *spell-NEEDCOMPOUND*
The NEEDCOMPOUND flag is used to require that a word is used as part of a
compound word.  The word itself is not a good word.  Example:

        NEEDCOMPOUND & ~


COMPOUND WORDS                                          *spell-compound*

A compound word is a longer word made by concatenating words that appear in
the .dic file.  To specify which words may be concatenated a character is
used.  This character is put in the list of affixes after the word.  We will
call this character a flag here.  Obviously these flags must be different from
any affix IDs used.

                                                        *spell-COMPOUNDFLAG*
The Myspell compatible method uses one flag, specified with COMPOUNDFLAG.
All words with this flag combine in any order.  This means there is no control
over which word comes first.  Example:
        COMPOUNDFLAG c ~

                                                        *spell-COMPOUNDFLAGS*
A more advanced method to specify how compound words can be formed uses
multiple items with multiple flags.  This is not compatible with Myspell 3.0.
Let's start with an example:
        COMPOUNDFLAGS c+ ~
        COMPOUNDFLAGS se ~

The first line defines that words with the "c" flag can be concatenated in any
order.  The second line defines compound words that are made of one word with
the "s" flag and one word with the "e" flag.  With this dictionary:
        bork/c ~
        onion/s ~
        soup/e ~

You can make these words:
        bork
        borkbork
        borkborkbork
        (etc.)
        onion
        soup
        onionsoup

The COMPOUNDFLAGS item may appear multiple times.  The argument is made out of
one or more groups, where each group can be:
        one flag                        e.g., c
        alternate flags inside []       e.g., [abc]
Optionally this may be followed by:
        *       the group appears zero or more times, e.g., sm*e
        +       the group appears one or more times, e.g., c+

This is similar to the regexp pattern syntax (but not the same!).  A few
examples with the sequence of word flags they require:
    COMPOUNDFLAGS x+        x xx xxx etc.
    COMPOUNDFLAGS yz        yz
    COMPOUNDFLAGS x+z       xz xxz xxxz etc.
    COMPOUNDFLAGS yx+       yx yxx yxxx etc.

    COMPOUNDFLAGS [abc]z    az bz cz
    COMPOUNDFLAGS [abc]+z   az aaz abaz bz baz bcbz cz caz cbaz etc.
    COMPOUNDFLAGS a[xyz]+   ax axx axyz ay ayx ayzz az azy azxy etc.
    COMPOUNDFLAGS sm*e      se sme smme smmme etc.
    COMPOUNDFLAGS s[xyz]*e  se sxe sxye sxyxe sye syze sze szye szyxe etc.
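
Because the syntax is close to a regexp, the table can be checked by
translating a COMPOUNDFLAGS argument into a Python regular expression over
the sequence of flags.  This is only a sketch: it assumes single-character
flags, and compound_pattern() and allows() are names invented here, not
anything Vim provides.

```python
import re


def compound_pattern(item):
    """Translate one COMPOUNDFLAGS argument into a compiled regex."""
    out = []
    i = 0
    while i < len(item):
        ch = item[i]
        if ch == "[":
            end = item.index("]", i)        # alternate flags inside []
            alternatives = item[i + 1:end]
            out.append("[" + "".join(re.escape(c) for c in alternatives) + "]")
            i = end + 1
        elif ch in "*+" and out:
            out.append(ch)                  # repeat the preceding group
            i += 1
        else:
            out.append(re.escape(ch))       # a single flag, taken literally
            i += 1
    return re.compile("".join(out))


def allows(item, flag_sequence):
    """True when the flags of the words, in order, fit the item."""
    return compound_pattern(item).fullmatch(flag_sequence) is not None
```

For "onionsoup" the flag sequence is "se", which fits "COMPOUNDFLAGS se";
"souponion" would give "es", which does not.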

A specific example: Allow a compound to be made of two words and a dash:
        In the .aff file:
                COMPOUNDFLAGS sde ~
                NEEDAFFIX x ~
                COMPOUNDMAX 3 ~
                COMPOUNDMIN 1 ~
        In the .dic file:
                start/s ~
                end/e ~
                -/xd ~

This allows for the word "start-end", but not "startend".

                                                        *spell-COMPOUNDMIN*
The minimal character length of a word used for compounding is specified with
COMPOUNDMIN.  Example:
        COMPOUNDMIN 5 ~

When omitted there is no minimal length.  Obviously you could just leave out
the compound flag from short words instead; this feature is present for
compatibility with Myspell.

                                                        *spell-COMPOUNDMAX*
The maximum number of words that can be concatenated into a compound word is
specified with COMPOUNDMAX.  Example:
        COMPOUNDMAX 3 ~

When omitted there is no maximum.  It applies to all compound words.

To set a limit for words with specific flags make sure the items in
COMPOUNDFLAGS where they appear don't allow too many words.

                                                        *spell-COMPOUNDSYLMAX*
The maximum number of syllables that a compound word may contain is specified
with COMPOUNDSYLMAX.  Example:
        COMPOUNDSYLMAX 6 ~

This has no effect if there is no SYLLABLE item.  Without COMPOUNDSYLMAX there
is no limit on the number of syllables.

If both COMPOUNDMAX and COMPOUNDSYLMAX are defined, a compound word is
accepted if it fits one of the criteria, thus is either made from up to
COMPOUNDMAX words or contains up to COMPOUNDSYLMAX syllables.

                                                        *spell-SYLLABLE*
The SYLLABLE item defines characters or character sequences that are used to
count the number of syllables in a word.  Example:
        SYLLABLE aáeéiíoóöõuúüûy/aa/au/ea/ee/ei/ie/oa/oe/oo/ou/uu/ui ~

Before the first slash is the set of characters that are counted for one
syllable, also when repeated and mixed, until the next character that is not
in this set.  After the slash come sequences of characters that are counted
for one syllable.  These are preferred over using characters from the set.
With the example "ideeen" has three syllables, counted by "i", "ee" and "e".

Only case-folded letters need to be included.
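
The counting rule can be sketched as follows.  This is an illustration of
the description above written in Python, not the code Vim uses;
parse_syllable() and count_syllables() are hypothetical names and the word is
assumed to be case-folded already.

```python
def parse_syllable(item):
    """Split the SYLLABLE argument into (set of characters, sequences)."""
    chars, *sequences = item.split("/")
    return set(chars), sequences


def count_syllables(word, chars, sequences):
    """Count syllables: sequences are preferred over single characters."""
    count = 0
    in_run = False      # inside a run of set characters already counted
    i = 0
    while i < len(word):
        longest = max(
            (len(s) for s in sequences if s and word.startswith(s, i)),
            default=0,
        )
        if longest:
            count += 1              # a listed sequence is one syllable
            in_run = False
            i += longest
            continue
        if word[i] not in chars:
            in_run = False          # a consonant ends the current run
        elif not in_run:
            count += 1              # first set character of a new run
            in_run = True
        i += 1
    return count
```

With the example item, "ideeen" is counted as "i", "ee" and "e", which gives
three.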

Another way to restrict compounding was mentioned above: adding "nocomp"
after an affix causes all words that are made with that affix not to be used
for compounding.  |spell-affix-nocomp|


UNLIMITED COMPOUNDING                                   *spell-NOBREAK*

For some languages, such as Thai, there is no space in between words.  This
looks like all words are compounded.  To specify this use the NOBREAK item in
the affix file, without arguments:
        NOBREAK ~

Vim will try to figure out where one word ends and the next starts.  When there
are spelling mistakes this may not be quite right.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
NOTE: The following has not been implemented yet, because there are no word
lists that support this.
>                                                       *spell-CMP*
> Sometimes it is necessary to change a word when concatenating it to another,
> by removing a few letters, inserting something or both.  It can also be useful
> to restrict concatenation to words that match a pattern.  For this purpose CMP
> items can be used.  They look like this:
>       CMP {flag} {flags} {strip} {strip2} {add} {cond} {cond2}
>
>       {flag}          the flag, as used in COMPOUNDFLAGS for the lead word
>       {flags}         accepted flags for the following word ('.' to accept
>                       all)
>       {strip}         text to remove from the end of the lead word (zero
>                       for no stripping)
>       {strip2}        text to remove from the start of the following word
>                       (zero for no stripping)
>       {add}           text to insert between the words (zero for no
>                       addition)
>       {cond}          condition to match at the end of the lead word
>       {cond2}         condition to match at the start of the following word
>
> This is the same as what is used for SFX and PFX items, with the extra {flags}
> and {cond2} fields.  Example:
>       CMP f mrt 0 - . . ~
>
> When used with the food and dish word list above, this means that a dash is
> inserted after each food item.  Thus you get "onion-soup" and
> "onion-tomato-salat".
>
> When there are CMP items for a compound flag the concatenation is only done
> when a CMP item matches.
>
> When there are no CMP items for a compound flag, then all words will be
> concatenated, as if there was an item:
>       CMP {flag} . 0 0 . .
>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>><<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<


REPLACEMENTS                                            *spell-REP*

In the affix file REP items can be used to define common mistakes.  This is
used to make spelling suggestions.  The items define the "from" text and the
"to" replacement.  Example:

        REP 4 ~
        REP f ph ~
        REP ph f ~
        REP k ch ~
        REP ch k ~

The first line specifies the number of REP lines following.  Vim ignores the
number, but it must be there.

Don't include simple one-character replacements or swaps.  Vim will try these
anyway.  You can include whole words if you want to, but you might want to use
the "file:" item in 'spellsuggest' instead.
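
How REP items turn a bad word into candidate suggestions can be sketched
like this.  It is a simplified illustration in Python, not Vim's suggestion
mechanism (which also scores and filters candidates); parse_rep() and
rep_candidates() are invented names.

```python
def parse_rep(lines):
    """Collect (from, to) pairs from REP lines; the count line is skipped."""
    pairs = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[0] == "REP":
            pairs.append((fields[1], fields[2]))
    return pairs


def rep_candidates(word, pairs):
    """Return the words made by applying one replacement at one position."""
    candidates = []
    for src, dst in pairs:
        start = word.find(src)
        while start != -1:
            candidate = word[:start] + dst + word[start + len(src):]
            if candidate not in candidates:
                candidates.append(candidate)
            start = word.find(src, start + 1)
    return candidates
```

With the example items, "fone" gives the candidate "phone" and "skool" gives
"school".  A candidate is only useful as a suggestion when it is a good word.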


SIMILAR CHARACTERS                                      *spell-MAP*

In the affix file MAP items can be used to define letters that are very much
alike.  This is mostly used for a letter with different accents.  This is used
to prefer suggestions with these letters substituted.  Example:

        MAP 2 ~
        MAP eéëêè ~
        MAP uüùúû ~

The first line specifies the number of MAP lines following.  Vim ignores the
number, but the line must be there.

Each letter must appear in only one of the MAP items.  It's a bit more
efficient if the first letter is ASCII or at least one without accents.
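
The MAP items amount to groups of interchangeable letters.  A small Python
sketch of that idea, with build_map() and similar() as made-up names; it
assumes that an argument consisting only of digits is the count line.

```python
def build_map(map_lines):
    """Map every letter of a MAP item to the first letter of that item."""
    base = {}
    for line in map_lines:
        fields = line.split()
        if len(fields) != 2 or fields[0] != "MAP" or fields[1].isdigit():
            continue                    # not a MAP item, or the count line
        letters = fields[1]
        for ch in letters:
            if ch in base:
                raise ValueError("letter in more than one MAP item: " + ch)
            base[ch] = letters[0]
    return base


def similar(a, b, base):
    """True when two words differ only in letters from the same MAP item."""
    return len(a) == len(b) and all(
        base.get(x, x) == base.get(y, y) for x, y in zip(a, b)
    )
```

With the example items "eleve" and "élève" are considered similar, so
"élève" would be a preferred suggestion for "eleve".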


SOUND-A-LIKE                                            *spell-SAL*

In the affix file SAL items can be used to define the sounds-a-like mechanism
to be used.  The main items define the "from" text and the "to" replacement.
Simplistic example:

        SAL CIA                 X ~
        SAL CH                  X ~
        SAL C                   K ~
        SAL K                   K ~

There are a few rules and this can become quite complicated.  An explanation
of how it works can be found in the Aspell manual:
http://aspell.net/man-html/Phonetic-Code.html.

There are a few special items:

        SAL followup            true ~
        SAL collapse_result     true ~
        SAL remove_accents      true ~

"1" has the same meaning as "true".  Any other value means "false".


SIMPLE SOUNDFOLDING                             *spell-SOFOFROM* *spell-SOFOTO*

The SAL mechanism is complex and slow.  A simpler mechanism is mapping all
characters to another character, mapping similar sounding characters to the
same character.  At the same time this does case folding.  You cannot have
both SAL items and simple soundfolding.

There are two items required: one to specify the characters that are mapped
and one that specifies the characters they are mapped to.  They must have
exactly the same number of characters.  Example:

    SOFOFROM abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ ~
    SOFOTO   ebctefghejklnnepkrstevvkesebctefghejklnnepkrstevvkes ~

In the example all vowels are mapped to the same character 'e'.  Another
method would be to leave out all vowels.  Some characters that sound nearly
the same and are often mixed up, such as 'm' and 'n', are mapped to the same
character.  Don't do this too much, or all words will start looking alike.

Characters that do not appear in SOFOFROM will be left out, except that all
white space is replaced by one space.  Sequences of the same character in
SOFOFROM are replaced by one.
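
The mapping can be tried out with a few lines of Python.  This is a sketch
and not the |soundfold()| function itself: the name soundfold_sofo() is made
up, and "sequences of the same character are replaced by one" is interpreted
here as collapsing repeated characters in the result, which is an assumption.

```python
def soundfold_sofo(word, sofofrom, sofoto):
    """Sound-fold a word with a SOFOFROM/SOFOTO character mapping."""
    if len(sofofrom) != len(sofoto):
        raise ValueError("SOFOFROM and SOFOTO must have the same length")
    table = dict(zip(sofofrom, sofoto))
    out = []
    for ch in word:
        if ch.isspace():
            mapped = " "                # all white space becomes one space
        else:
            mapped = table.get(ch)
            if mapped is None:
                continue                # not in SOFOFROM: left out
        if not out or out[-1] != mapped:
            out.append(mapped)          # collapse repeated result characters
    return "".join(out)
```

With the example items both "Moon" and "moan" fold to "nen", so each can be
suggested for the other.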

You can use the |soundfold()| function to try out the results.  Or set the
'verbose' option to see the score in the output of the |z?| command.


 vim:tw=78:sw=4:ts=8:ft=help:norl: